Transcript
Page 1: Actively Transfer Domain Knowledge

Actively Transfer Domain Knowledge

Xiaoxiao Shi† Wei Fan‡ Jiangtao Ren†

†Sun Yat-sen University ‡IBM T. J. Watson Research Center

Transfer when you can, otherwise ask and don’t stretch it

Page 2: Actively Transfer Domain Knowledge

Standard Supervised Learning

New York Times

training (labeled)

test (unlabeled)

Classifier

New York Times

85.5%

Page 3: Actively Transfer Domain Knowledge

In Reality……

New York Times

training (labeled)

test (unlabeled)

New York Times

Labeled data are insufficient!

47.3%

How to improve the performance?

Page 4: Actively Transfer Domain Knowledge

Solution I : Active Learning

New York Times

training (labeled)

test (unlabeled)

Classifier

New York Times

Label

Domain Expert

$

Labeling Cost

83.4%

Page 5: Actively Transfer Domain Knowledge

Solution II : Transfer Learning

Reuters

Out-of-domain training (labeled)

In-domain test (unlabeled)

Transfer Classifier

New York Times

No guarantee that transfer learning will help!

Significant differences

Accuracy drops

82.6% ?? 43.5%

Page 6: Actively Transfer Domain Knowledge

Motivation

• Active Learning:
  – Labeling cost

• Transfer Learning:
  – Domain difference risk

Both have disadvantages. Which one to choose?

Page 7: Actively Transfer Domain Knowledge

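Proposed Solution (AcTraK)

[Diagram: The active learner chooses an instance from the unlabeled in-domain training data. The transfer classifier, built from the labeled out-of-domain training data (Reuters), predicts its label, and the decision function judges that prediction. Reliable: label by the classifier. Unreliable: label by the domain expert. The labeled training data are used to train the classifier, which is applied to the test data to produce the classification result.]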
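The flow on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `transfer_predict`, `is_reliable`, and `ask_expert` are hypothetical callables standing in for the transfer classifier, the decision function, and the domain expert, and the loop simply walks the pool in order instead of letting an active learner choose the next instance.

```python
def actrak_label_pool(pool, transfer_predict, is_reliable, ask_expert):
    """Label every instance in `pool`, querying the expert only when needed.

    All three callables are hypothetical placeholders:
      transfer_predict(x)              -> label guessed by the transfer classifier
      is_reliable(x, guess, n_labeled) -> True if the guess can be trusted
      ask_expert(x)                    -> label given by the domain expert
    """
    labeled = []
    n_queries = 0
    for x in pool:
        guess = transfer_predict(x)
        if is_reliable(x, guess, len(labeled)):
            # Reliable: label by the classifier, no labeling cost.
            labeled.append((x, guess))
        else:
            # Unreliable: pay the labeling cost and ask the domain expert.
            labeled.append((x, ask_expert(x)))
            n_queries += 1
    return labeled, n_queries


# Toy usage with made-up one-dimensional data.
pool = [-2.0, -0.3, 0.4, 3.0]
labeled, n_queries = actrak_label_pool(
    pool,
    transfer_predict=lambda x: "+" if x > 0 else "-",
    is_reliable=lambda x, guess, n: abs(x) >= 1.0,
    ask_expert=lambda x: "+" if x > -0.5 else "-",
)
```

In this toy run the two instances near zero go to the expert and the other two are labeled by the classifier, which is the trade-off the slide describes: transfer when the prediction can be trusted, ask otherwise.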

Page 8: Actively Transfer Domain Knowledge

Transfer Classifier

Training:

• Train the out-of-domain classifier Mo on the out-of-domain dataset (labeled).

• Apply Mo to the in-domain labeled examples (very few) to split them into L+ and L-:
  L+ = { (x, y = +/-) | Mo(x) = 'L+' }; the true in-domain label y may be either '-' or '+'. L- is formed the same way from the examples with Mo(x) = 'L-'.

• Train the mapping classifiers ML+ on L+ and ML- on L-.

Mapping between the in-domain label and the output of Mo (in-domain label / Mo output): +/L+, +/L-, -/L+, -/L-

Prediction for X (in-domain, unlabeled):

1. Classify X by out-of-domain Mo: P(L+|X, Mo) and P(L-|X, Mo).

2. Classify X by mapping classifiers ML+ and ML-: P(+|X, ML+) and P(+|X, ML-).

3. Then the probability for X to be "+" is:

T(X) = P(+|X) = P(L+|X, Mo) × P(+|X, ML+) + P(L-|X, Mo) × P(+|X, ML-)
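The combination in step 3 is a weighted average of the two mapping classifiers, weighted by how Mo classifies X. A minimal Python sketch follows; the function name, argument names, and the numbers in the usage lines are illustrative assumptions, and P(L-|X, Mo) is assumed to equal 1 - P(L+|X, Mo).

```python
def transfer_probability(p_lplus_mo, p_plus_mlplus, p_plus_mlminus):
    """T(X) = P(L+|X,Mo) * P(+|X,ML+) + P(L-|X,Mo) * P(+|X,ML-)."""
    for p in (p_lplus_mo, p_plus_mlplus, p_plus_mlminus):
        if not 0.0 <= p <= 1.0:
            raise ValueError("all inputs must be probabilities in [0, 1]")
    # Assumption: Mo outputs only L+ or L-, so the two probabilities sum to 1.
    p_lminus_mo = 1.0 - p_lplus_mo
    return p_lplus_mo * p_plus_mlplus + p_lminus_mo * p_plus_mlminus


# Made-up numbers: Mo leans towards L+, and ML+ says L+ usually means "+".
t = transfer_probability(p_lplus_mo=0.8, p_plus_mlplus=0.9, p_plus_mlminus=0.2)
# 0.8 * 0.9 + 0.2 * 0.2 = 0.76 (up to floating-point rounding)
```

Because ML+ and ML- are trained on in-domain labels, the out-of-domain prediction is not taken at face value: if L+ examples turn out to be mostly '-' in the new domain, P(+|X, ML+) is low and T(X) follows it.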

Page 9: Actively Transfer Domain Knowledge

Our Solution (AcTraK): Active Learner

[Diagram, same layout as on Page 7: out-of-domain training data (labeled, Reuters), Transfer Classifier, Decision Function (reliable: label by the classifier; unreliable: label by the Domain Expert), labeled training data, Classifier, Test, Classification Result, unlabeled training data.]

Page 10: Actively Transfer Domain Knowledge

Decision Function

When the prediction by the transfer classifier is unreliable, ask the domain expert.

• In the following cases, ask the domain expert, not the transfer classifier, to label the instance:
  a) Conflict
  b) Low confidence
  c) Few labeled in-domain examples

Page 11: Actively Transfer Domain Knowledge

Decision Function

a) Conflict? b) Confidence? c) Size?

Decision Function: label by the transfer classifier, or label by the domain expert.

R: random number in [0, 1]

AcTraK asks the domain expert to label the instance with probability of [formula not preserved in the transcript]

T(x): prediction by the transfer classifier

ML(x): prediction given by the in-domain classifier
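The query-probability formula itself did not survive in this transcript, so the following Python sketch does not reproduce it. It is an illustrative stand-in that only shows how the three listed criteria (conflict, confidence, size) and the random number R could fit together; the function names, the `size_scale` parameter, and the way the terms are combined are all assumptions.

```python
import math
import random


def query_probability(t_x, ml_x, n_labeled, size_scale=10.0):
    """Hypothetical probability of asking the expert (NOT the slide's formula).

    t_x       -- P(+|x) from the transfer classifier, T(x)
    ml_x      -- P(+|x) from the in-domain classifier, ML(x)
    n_labeled -- number of labeled in-domain examples so far
    """
    # a) Conflict: the two classifiers predict different labels.
    if (t_x >= 0.5) != (ml_x >= 0.5):
        return 1.0
    # b) Confidence of the transfer classifier, in [0.5, 1].
    confidence = max(t_x, 1.0 - t_x)
    # c) Size: trust grows from 0 towards 1 as labeled examples accumulate.
    size_trust = 1.0 - math.exp(-n_labeled / size_scale)
    return 1.0 - confidence * size_trust


def choose_labeler(t_x, ml_x, n_labeled, rng=random):
    """Draw R in [0, 1) and compare it with the query probability."""
    r = rng.random()
    if r < query_probability(t_x, ml_x, n_labeled):
        return "domain expert"
    return "transfer classifier"


p_conflict = query_probability(0.9, 0.2, 50)  # classifiers disagree
p_agree = query_probability(0.9, 0.8, 50)     # classifiers agree, confident
```

With this stand-in, a conflict or an empty labeled set always sends the instance to the expert, and a confident, agreeing prediction backed by many labeled examples rarely does.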

Page 12: Actively Transfer Domain Knowledge

Properties

• It can reduce domain difference risk.
  – According to Theorem 2, the expected error is bounded.

• It can reduce labeling cost.
  – According to Theorem 3, the query probability is bounded.

Page 13: Actively Transfer Domain Knowledge

Theorems

[Theorem statements not preserved in the transcript. Surviving annotations: "expected error of the transfer classifier"; "maximum size".]

Page 14: Actively Transfer Domain Knowledge

Experiment setup

• Data Sets
  – Synthetic data sets
  – Remote Sensing: data collected from regions with a specific ground surface condition versus data collected from a new region
  – Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup)

• Models compared
  – Inductive Learning model: AdaBoost, SVM
  – Transfer Learning model: TrAdaBoost (ICML'07)
  – Active Learning model: ERS (ICML'01)

Page 15: Actively Transfer Domain Knowledge

Experiments on Synthetic Datasets

In-domain: 2 labeled training & testing

Out-of-domain: 4 labeled training

Page 16: Actively Transfer Domain Knowledge

Experiments on Real-World Datasets

Evaluation metric:

• Compared with transfer learning on accuracy.

• Compared with active learning on IEA (Integral Evaluation on Accuracy).
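The slides name IEA but do not define it. One plausible reading, assumed here and not confirmed by the transcript, is the area between the accuracy curves of two learners over the number of queried labels, so that a positive value means the first learner is ahead overall. A minimal Python sketch under that assumption, with made-up accuracy values:

```python
def iea(acc_a, acc_b, queries):
    """Area between two accuracy curves (trapezoidal rule).

    Assumed reading of "Integral Evaluation on Accuracy"; the curves are
    sampled at the same, increasing query counts given in `queries`.
    """
    if not (len(acc_a) == len(acc_b) == len(queries)):
        raise ValueError("all three sequences must have the same length")
    area = 0.0
    for i in range(1, len(queries)):
        width = queries[i] - queries[i - 1]
        gap_left = acc_a[i - 1] - acc_b[i - 1]
        gap_right = acc_a[i] - acc_b[i]
        area += 0.5 * (gap_left + gap_right) * width
    return area


# Made-up curves: learner A pulls ahead of learner B after the first queries.
queries = [0, 50, 100]
score = iea([0.5, 0.7, 0.8], [0.5, 0.6, 0.7], queries)
# 0.5 * (0 + 0.1) * 50 + 0.5 * (0.1 + 0.1) * 50 = 7.5 (up to rounding)
```

Summarizing the whole curve in one number avoids picking a single query budget at which to compare the two learners.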

Page 17: Actively Transfer Domain Knowledge

20 Newsgroup

1. Comparison with Transfer Learner

[Figure: "Accuracy Comparison". x-axis: Datasets (1 to 6); y-axis: Accuracy (0.45 to 0.85); series: SVM, TrAdaBoost, AcTraK.]

2. Comparison with Active Learner

[Figure: "IEA(AcTraK, ERS, 250)". x-axis: Datasets (1 to 6); y-axis: IEA (0 to 2). Comparison with active learner ERS.]

Page 18: Actively Transfer Domain Knowledge

Conclusions

• Actively Transfer Domain Knowledge
  – Reduce domain difference risk: transfer useful knowledge (Theorem 2)
  – Reduce labeling cost: query domain experts only when necessary (Theorem 3)
