Dual Transfer Learning. Mingsheng Long 1,2, Jianmin Wang 2, Guiguang Ding 2, Wei Cheng, Xiang Zhang, and Wei Wang. 1 Department of Computer Science and Technology, 2 School of Software, Tsinghua University, Beijing 100084, China


TRANSCRIPT

Dual Transfer Learning

Mingsheng Long 1,2, Jianmin Wang 2, Guiguang Ding 2, Wei Cheng, Xiang Zhang, and Wei Wang

1 Department of Computer Science and Technology
2 School of Software, Tsinghua University, Beijing 100084, China

Outline

- Motivation
- The Framework: Dual Transfer Learning
- An Implementation: Joint Nonnegative Matrix Tri-Factorization
- Experiments
- Conclusion

Notations

Domain: a feature space X together with a marginal distribution P(x), x ∈ X. Two domains are different if X_s ≠ X_t or P_s(x) ≠ P_t(x).

Task: given a feature space X and a label space Y, learn f : x ↦ y (or estimate P(y | x)), where x ∈ X and y ∈ Y. Two tasks are different if Y_s ≠ Y_t or f_s ≠ f_t, i.e., P_s(y | x) ≠ P_t(y | x).

Motivation

Exploring the marginal distributions P_s(x) and P_t(x):

[Figure: the source domain (comp.os) and the target domain (comp.hardware) share latent factors such as task scheduling, performance, architecture, and power consumption.]

- Domain-specific latent factors cause the discrepancy between domains.
- Common latent factors represent the commonality between domains.

Motivation

Exploring the conditional distributions P_s(y | x) and P_t(y | x):

[Figure: in both the source domain (comp.os) and the target domain (comp.hardware), the latent factors (task scheduling, performance, architecture, power consumption) all map to the class "comp" through shared model parameters.]

- The model parameters represent the commonality between tasks.

The Framework: Dual Transfer Learning (DTL)

Simultaneously learn the marginal distribution and the conditional distribution:

- Marginal mapping: learns the marginal distribution P(x).
- Conditional mapping: learns the conditional distribution P(y | x).

Exploit the duality between the two for mutual reinforcement: learning one distribution can help to learn the other.

[Figure: source data X_s and target data X_t each pass through a marginal mapping, with a common view φ(X) shared across domains and distinct views φ_s(X_s), φ_t(X_t) per domain, followed by a shared conditional mapping.]

Nonnegative Matrix Tri-Factorization (NMTF)

min_{U, H, V ≥ 0} L = || X − U H V^T ||^2,  with X ∈ R^{m×n}, U ∈ R^{m×k}, H ∈ R^{k×c}, V ∈ R^{n×c}

- U (m × k): k feature clusters; these latent factors induce the marginal mapping.
- H (k × c): the association between the k feature clusters and the c example classes; these model parameters induce the conditional mapping.
- V (n × c): c example classes, representing the categorical information.

The tri-factorization splits into two coupled subproblems:

min_{U, X' ≥ 0} L' = || X − U X' ||^2, which induces the marginal mapping φ : R^m → R^k, x ↦ x', with P(x) = P(x').

min_{H, V ≥ 0} L'' = || X' − H V^T ||^2, which induces the conditional mapping ψ : R^k → R^c, x' ↦ v, with P(y | x) = P(y | x').
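To make the factorization concrete, here is a minimal NumPy sketch of NMTF with the generic multiplicative updates for the Frobenius objective (an illustrative simplification: it omits any normalization constraints, and the function name and defaults are choices of this sketch, not from the slides):

```python
import numpy as np

def nmtf(X, k, c, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix X (m x n) as X ~ U H V^T.

    U (m x k): feature clusters (latent factors) -> marginal mapping
    H (k x c): feature-cluster / class association -> conditional mapping
    V (n x c): example classes (categorical information)
    """
    m, n = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((m, k))
    H = rng.random((k, c))
    V = rng.random((n, c))
    for _ in range(n_iter):
        # Multiplicative updates keep all factors nonnegative while
        # decreasing ||X - U H V^T||_F^2.
        U *= (X @ V @ H.T) / (U @ H @ (V.T @ V) @ H.T + eps)
        H *= (U.T @ X @ V) / ((U.T @ U) @ H @ (V.T @ V) + eps)
        V *= (X.T @ U @ H) / (V @ H.T @ (U.T @ U) @ H + eps)
    return U, H, V
```

As on the slides, U carries the marginal part of the approximation (X ≈ U · (H V^T)) and H V^T carries the conditional part, matching the two subproblems above.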

An Implementation: Joint NMTF

Marginal mapping: learning the marginal distributions.

min_{U_0, U_π, H, V_π ≥ 0} L = Σ_{π ∈ {s,t}} || X_π − [U_0, U_π] H V_π^T ||^2

[Figure: the source domain (comp.os) and the target domain (comp.hardware) share latent factors such as task scheduling, performance, architecture, and power consumption.]

- The domain-specific latent factors U_s, U_t cause the discrepancy between domains.
- The common latent factors U_0 represent the commonality between domains.

An Implementation: Joint NMTF

Conditional mapping: learning the conditional distributions.

[Figure: in both domains, the latent factors (task scheduling, performance, architecture, power consumption) map to the class "comp".]

- The shared model parameters H represent the commonality between tasks.

An Implementation: Joint NMTF

Joint Nonnegative Matrix Tri-Factorization

Solution to the Joint NMTF optimization problem

min_{U_0, U_π, H, V_π ≥ 0} L = Σ_{π ∈ {s,t}} || X_π − [U_0, U_π] H V_π^T ||^2
s.t. [U_0, U_π]^T 1_m = 1,  V_π^T 1_{n_π} = 1,  π ∈ {s, t},

where X_π ∈ R^{m×n_π}, H ∈ R^{k×c}, V_π ∈ R^{n_π×c}.

Alternating multiplicative update rules (⊙: element-wise product; division: element-wise):

U_π ← U_π ⊙ ( X_π V_π H^T ) / ( [U_0, U_π] H V_π^T V_π H^T )   (the U_π columns)

U_0 ← U_0 ⊙ ( Σ_π X_π V_π H^T ) / ( Σ_π [U_0, U_π] H V_π^T V_π H^T )   (the U_0 columns, pooled over both domains)

V_π ← V_π ⊙ ( X_π^T [U_0, U_π] H ) / ( V_π H^T [U_0, U_π]^T [U_0, U_π] H )

H ← H ⊙ ( Σ_π [U_0, U_π]^T X_π V_π ) / ( Σ_π [U_0, U_π]^T [U_0, U_π] H V_π^T V_π )
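As a sketch of how the joint objective can be optimized, the following NumPy code alternates multiplicative updates over a shared common block U0 and per-domain blocks (a hypothetical simplification: it drops the normalization constraints, so it is not the paper's exact algorithm; all names and defaults are illustrative):

```python
import numpy as np

def joint_nmtf(Xs, Xt, k0, k1, c, n_iter=200, eps=1e-9, seed=0):
    """Joint NMTF sketch: X_pi ~ [U0, U_pi] H V_pi^T for pi in {s, t}.

    U0 (m x k0): common feature clusters shared by both domains
    Us, Ut (m x k1): domain-specific feature clusters
    H (k0+k1 x c): shared feature-cluster / class association (model parameters)
    Vs, Vt: per-domain class-assignment matrices
    """
    m = Xs.shape[0]
    rng = np.random.default_rng(seed)
    U0 = rng.random((m, k0))
    Us, Ut = rng.random((m, k1)), rng.random((m, k1))
    H = rng.random((k0 + k1, c))
    Vs, Vt = rng.random((Xs.shape[1], c)), rng.random((Xt.shape[1], c))
    for _ in range(n_iter):
        numU0 = np.zeros_like(U0); denU0 = np.full_like(U0, eps)
        numH = np.zeros_like(H); denH = np.full_like(H, eps)
        for X, Upi, V in ((Xs, Us, Vs), (Xt, Ut, Vt)):
            U = np.hstack([U0, Upi])
            # V_pi update: fit X ~ (U H) V^T
            V *= (X.T @ U @ H) / (V @ H.T @ (U.T @ U) @ H + eps)
            num = X @ V @ H.T                 # same shape as U
            den = U @ H @ (V.T @ V) @ H.T + eps
            Upi *= num[:, k0:] / den[:, k0:]  # domain-specific columns
            # The shared block U0 and H pool evidence from both domains.
            numU0 += num[:, :k0]; denU0 += den[:, :k0]
            numH += U.T @ X @ V
            denH += (U.T @ U) @ H @ (V.T @ V)
        U0 *= numU0 / denU0
        H *= numH / denH
    return U0, Us, Ut, H, Vs, Vt
```

Target labels can then be read off per target example as the class with the largest indicator value, e.g. `Vt.argmax(axis=1)`.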

Joint NMTF: Theoretical Analysis

Derivation:
- Formulate the Lagrangian of the optimization problem, adding multiplier terms tr(Γ ·) and tr(Λ ·) for the normalization constraints on [U_0, U_π] and V_π, respectively.
- Apply the KKT complementary slackness conditions, e.g. Γ_U ⊙ U = 0, Λ_V ⊙ V = 0, etc., to derive the multiplicative update rules.

Convergence:
- Proved by the auxiliary function approach [Ding et al. KDD'06].
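The auxiliary function approach guarantees monotone convergence in the following standard way (a generic sketch of the argument, not the paper's specific construction): G is an auxiliary function for the objective F if G(x, x') ≥ F(x) and G(x, x) = F(x); minimizing G at each step then never increases F.

```latex
% Generic auxiliary-function argument (cf. Ding et al., KDD'06):
% if G(x, x') >= F(x) and G(x, x) = F(x), then the update
%   x^{(t+1)} = \arg\min_x G(x, x^{(t)})
% yields a nonincreasing objective:
F\!\left(x^{(t+1)}\right)
  \le G\!\left(x^{(t+1)}, x^{(t)}\right)
  \le G\!\left(x^{(t)}, x^{(t)}\right)
  = F\!\left(x^{(t)}\right).
```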

Experiments

Open data sets:
- 20-Newsgroups
- Reuters-21578

Each cross-domain data set contains approximately 8,000 documents and 15,000 features.

Evaluation criterion:

Accuracy = |{ x ∈ D_t : f(x) = y(x) }| / |D_t|
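In code, the criterion is simply the fraction of correctly labeled target documents; a minimal sketch (names are illustrative):

```python
import numpy as np

def accuracy(y_pred, y_true):
    """Fraction of target examples x in D_t with f(x) = y(x)."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    return float(np.mean(y_pred == y_true))

# e.g., with predictions read off a class-indicator matrix Vt:
#   y_pred = Vt.argmax(axis=1)
```

For example, accuracy([1, 0, 1, 1], [1, 1, 1, 0]) returns 0.5 (two of four labels match).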

Experiments

Non-transfer methods: NMF, SVM, LR, TSVM

Transfer learning methods:
- Co-Clustering based Classification (CoCC) [Dai et al. KDD'07]
- Matrix Tri-Factorization based Classification (MTrick) [Zhuang et al. SDM'10]
- Dual Knowledge Transfer (DKT) [Wang et al. SIGIR'11]

Experiments

Parameter sensitivity and algorithm convergence

Conclusion

- We proposed a novel Dual Transfer Learning (DTL) framework, which exploits the duality between the marginal distribution and the conditional distribution for mutual reinforcement.
- We implemented a novel Joint NMTF algorithm based on the DTL framework.
- Experimental results validated that DTL is superior to the state-of-the-art single transfer learning methods.

Any Questions?

Thank you!