Page 1: A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples

Dell Zhang (BBK) and Wee Sun Lee (NUS)

Page 2

Problem

Supervised Learning

Page 3

Problem

Semi-Supervised Learning

Page 4

Problem

PU Learning

Page 5

Problem

Unlabeled Examples Help

Page 6

Problem

PU Learning

To distinguish the interesting instances (the positive class C+) from the other instances (the negative class C-) by learning a classifier from a set of positive examples P and a set of unlabeled examples U

There is no labeled negative example!

Page 7

Applications

To automatically filter web pages according to a user's preference: the browsed or bookmarked pages can be used as positive examples, while unlabeled examples can be easily collected from the web

To automatically find machine learning literature: the ICML papers can be used as positive examples, while unlabeled examples can be easily collected from the ACM or IEEE digital library

To automatically identify cancer patients: the patients known to have cancer can be used as positive examples, while unlabeled examples can be easily collected from the patient database

To automatically discover future customers for direct marketing: the company's current customers can be used as positive examples, while unlabeled examples can be purchased at a low cost compared with obtaining negative examples

……

Page 8

Approaches

Existing Approaches

PNB (Denis et al. 2002); PNCT (Denis et al. 2003)

S-EM (Liu et al. 2002); RC-SVM (Li & Liu 2003)

PEBL (Yu et al. 2004); SVMC (Yu 2005)

PN-SVM (Fung et al. 2005)

W-LR (Lee & Liu 2003); B-SVM (Liu et al. 2003)

Our Proposed Approach

B-Pr

Page 9

Our Approach

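A Probabilistic Model

Each positive example (class C+) is put into P with probability p and into U with probability 1 - p; every negative example (class C-) is put into U. Hence

$$\Pr[P \mid x] = p \, \Pr[C_+ \mid x]$$

$$\Pr[U \mid x] = (1 - p) \, \Pr[C_+ \mid x] + \Pr[C_- \mid x]$$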
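The labeling mechanism of this model can be simulated directly. The sketch below is a minimal illustration, assuming each example is an (x, y) pair with y = True for C+ and y = False for C-; the helper names `split_pu` and `empirical_pr_P_given_x` are hypothetical. For a feature value whose positive rate is 0.8, the share of its copies that land in P should approach p × 0.8, in line with Pr[P|x] = p Pr[C+|x].

```python
import random

def split_pu(examples, p, seed=0):
    """Hypothetical helper: split (x, y) pairs into P and U.

    y is True for the positive class C+ and False for C-.
    Each positive is put into P with probability p (otherwise into U);
    every negative is put into U.
    """
    rng = random.Random(seed)
    P, U = [], []
    for x, y in examples:
        if y and rng.random() < p:
            P.append(x)
        else:
            U.append(x)
    return P, U

def empirical_pr_P_given_x(P, U, x):
    """Fraction of the copies of feature value x that ended up in P."""
    n_p, n_u = P.count(x), U.count(x)
    return n_p / (n_p + n_u)

if __name__ == "__main__":
    # Feature value "a" is 80% positive; feature value "b" is 20% positive.
    examples = ([("a", True)] * 8000 + [("a", False)] * 2000
                + [("b", True)] * 2000 + [("b", False)] * 8000)
    P, U = split_pu(examples, p=0.3, seed=1)
    # Should be close to p * Pr[C+|a] = 0.3 * 0.8 = 0.24
    print(empirical_pr_P_given_x(P, U, "a"))
    # Should be close to p * Pr[C+|b] = 0.3 * 0.2 = 0.06
    print(empirical_pr_P_given_x(P, U, "b"))
```

Note that P never receives a negative example, which is exactly the PU setting: there is no labeled negative example.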

Page 10

Our Approach

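Solving the model for Pr[C+|x] and Pr[C-|x] gives

$$\Pr[C_+ \mid x] - \Pr[C_- \mid x] = \frac{1 + (1 - p)}{p} \, \Pr[P \mid x] - \Pr[U \mid x]$$

so the Bayes classifier

$$f(x) = \operatorname{sgn}\big(\Pr[C_+ \mid x] - \Pr[C_- \mid x]\big)$$

can be computed from P and U alone:

$$f(x) = \operatorname{sgn}\big(b \, \Pr[P \mid x] - \Pr[U \mid x]\big), \qquad b = \frac{1 + (1 - p)}{p}$$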
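As a numerical check of this derivation, the sketch below computes b from p and applies the biased rule, assuming Pr[P|x] and Pr[U|x] are already available as numbers. All function names are hypothetical; `model_posteriors` exists only to build Pr[P|x] and Pr[U|x] from a chosen Pr[C+|x] under the labeling model (each positive put into P with probability p, every negative into U), so that the biased rule can be compared with sgn(Pr[C+|x] - Pr[C-|x]).

```python
def bias(p):
    """b = (1 + (1 - p)) / p, for a labeling probability p in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return (1 + (1 - p)) / p

def sgn(v):
    """Sign function returning +1, -1, or 0."""
    return (v > 0) - (v < 0)

def b_pr_decision(pr_P, pr_U, b):
    """Biased rule f(x) = sgn(b * Pr[P|x] - Pr[U|x])."""
    return sgn(b * pr_P - pr_U)

def model_posteriors(pr_pos, p):
    """Pr[P|x] and Pr[U|x] implied by Pr[C+|x] = pr_pos under the model."""
    pr_neg = 1 - pr_pos
    return p * pr_pos, (1 - p) * pr_pos + pr_neg

if __name__ == "__main__":
    for p in (0.3, 0.55, 0.7, 0.85):
        b = bias(p)
        for pr_pos in (0.1, 0.45, 0.55, 0.9):
            pr_P, pr_U = model_posteriors(pr_pos, p)
            biased = b_pr_decision(pr_P, pr_U, b)
            bayes = sgn(pr_pos - (1 - pr_pos))  # sgn(Pr[C+|x] - Pr[C-|x])
            print(p, pr_pos, biased, bayes)     # the two decisions agree
```

Algebraically, b Pr[P|x] - Pr[U|x] reduces to 2 Pr[C+|x] - 1 = Pr[C+|x] - Pr[C-|x], so the biased rule reproduces the Bayes decision exactly. In practice p is rarely known, which is presumably why B-Pr tunes b on held-out data rather than computing it from this formula.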

Page 11

Our Approach

Biased PrTFIDF (B-Pr)

Estimate Pr[P|x] and Pr[U|x]: PrTFIDF (Joachims 1997)

Estimate b: maximize $\frac{pr}{\Pr[C_+]} = \frac{r^2}{\Pr[f(x) = 1]}$ on a held-out validation set (Lee & Liu 2003), where p and r here denote the precision and recall of f

Linear Time Complexity!
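The two estimation steps can be sketched together. This is a simplified, two-category reading of PrTFIDF (Joachims 1997) that treats P and U as the categories, assuming tokenized documents, Pr[w|d] = TF(w, d) / |d|, Pr[w|C] as the average of Pr[w|d] over the documents of C, and Pr[C|x] as the sum over words w of Pr[C|w] Pr[w|x]; it omits any feature selection or smoothing the original classifier may use. Words never seen in training are skipped and the two scores are renormalized, which does not change the sign of b Pr[P|x] - Pr[U|x]. The names `PrTFIDFSketch`, `word_dist`, and `choose_b` are hypothetical, and estimating r on the positive validation documents and Pr[f(x)=1] on all validation documents is an assumption of this sketch.

```python
from collections import Counter

def word_dist(doc):
    """Pr[w|d] = TF(w, d) / |d| for a tokenized document."""
    if not doc:
        return {}
    return {w: c / len(doc) for w, c in Counter(doc).items()}

class PrTFIDFSketch:
    """Two-category (P vs. U) PrTFIDF-style estimator; simplified sketch."""

    def fit(self, P_docs, U_docs):
        classes = {"P": P_docs, "U": U_docs}
        total = len(P_docs) + len(U_docs)
        self.prior = {c: len(docs) / total for c, docs in classes.items()}
        # Pr[w|C]: average of Pr[w|d] over the documents of C (one pass)
        self.pw = {}
        for c, docs in classes.items():
            acc = Counter()
            for d in docs:
                for w, q in word_dist(d).items():
                    acc[w] += q / len(docs)
            self.pw[c] = acc
        return self

    def predict_proba(self, doc):
        """Return {'P': Pr[P|x], 'U': Pr[U|x]} as sum_w Pr[C|w] Pr[w|x]."""
        score = {"P": 0.0, "U": 0.0}
        for w, q in word_dist(doc).items():
            joint = {c: self.prior[c] * self.pw[c][w] for c in score}
            z = sum(joint.values())
            if z == 0:  # word unseen in training: skipped in this sketch
                continue
            for c in score:
                score[c] += joint[c] / z * q
        s = score["P"] + score["U"]
        if s == 0:  # no known words: fall back to the category priors
            return dict(self.prior)
        return {c: v / s for c, v in score.items()}

def choose_b(model, P_val, U_val, candidates):
    """Pick the candidate b maximizing r^2 / Pr[f(x)=1]."""
    probs_P = [model.predict_proba(d) for d in P_val]
    probs_all = probs_P + [model.predict_proba(d) for d in U_val]
    best_b, best_score = None, float("-inf")
    for b in candidates:
        def f(pr):
            return b * pr["P"] - pr["U"] > 0  # biased decision rule
        r = sum(map(f, probs_P)) / len(probs_P)             # recall on P_val
        pos_rate = sum(map(f, probs_all)) / len(probs_all)  # Pr[f(x)=1]
        if pos_rate == 0:
            continue
        score = r * r / pos_rate
        if score > best_score:
            best_b, best_score = b, score
    return best_b
```

Here `fit` makes a single pass over the training tokens and `predict_proba` is linear in the length of the document, which is consistent with the linear time complexity claimed for B-Pr.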

Page 12

Experiments

Reuters-21578

B-Pr>RC-SVM>PEBL (p=0.55)

RC-SVM>B-Pr>PEBL (p=0.85)

Page 13

Experiments

20NewsGroups

B-Pr>W-LR>S-EM (p=0.3)

B-Pr>W-LR>S-EM (p=0.7)

Page 14

Conclusion

A New Approach to Learning from Positive and Unlabeled Examples

As effective as the state-of-the-art approaches

Yet simpler and faster

Page 15

Thank you

Questions? Comments? Suggestions? ……