
Page 1:

Active and Proactive Machine Learning

Jaime Carbonell, Pinar Donmez, Jingrui He & Vamshi Ambati
Language Technologies Institute, Carnegie Mellon University
www.cs.cmu.edu/~jgc
27 October 2010

Page 2:

Why is Active Learning Important?

Labeled data volumes << unlabeled data volumes:
- 1.2% of all proteins have known structures
- < .01% of all galaxies in the Sloan Sky Survey have consensus type labels
- < .0001% of all web pages have topic labels
- << E-10% of all internet sessions are labeled as to fraudulence (malware, etc.)
- < .0001 of all financial transactions are investigated w.r.t. fraudulence

If labeling is costly or limited, select the instances with maximal impact for learning.

Page 3:

Is (Pro)Active Learning Relevant to Language Technologies?

- Text classification: by topic, genre, difficulty, …; in learning to rank search results
- Question answering: question-type classification; answer ranking
- Machine translation: selecting sentences to translate for LDLs (low-density languages); eliciting partial or full alignments

Page 4:

Active Learning

Training data: $\{(x_i, y_i)\}_{i=1,\dots,k} \cup \{x_i\}_{i=k+1,\dots,n}$, with an oracle $O: x_i \to y_i$
Special case: $k = 0$ (no initial labels)
Functional space: $\{f_j(p_l)\}$ (model families $f_j$ with parameters $p_l$)
Fitness criterion, a.k.a. loss function: $\arg\min_{j,l} \sum_i L\big(y_i, f_j(x_i, p_l)\big)$
Sampling strategy: $\arg\min_{\{x_1,\dots,x_k\}} E\big[L(\hat{f}(x_{test}), \hat{y}(x_{test})) \mid \{x_1,\dots,x_k\}\big]$

Page 5:

Sampling Strategies

- Random sampling (preserves the distribution)
- Uncertainty sampling (Lewis, 1996; Tong & Koller, 2000): proximity to the decision boundary; maximal distance to labeled x's
- Density sampling (kNN-inspired; McCallum & Nigam, 2004)
- Representative sampling (Xu et al., 2003)
- Instability sampling (probability-weighted): x's that maximally change the decision boundary
- Ensemble strategies:
  - Boosting-like ensemble (Baram, 2003)
  - DUAL (Donmez & Carbonell, 2007): dynamically switches strategies from density-based to uncertainty-based by estimating the derivative of the expected residual error reduction
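To make the selection rules above concrete, here is a minimal numpy sketch of uncertainty, density, and diversity scoring over a pool of unlabeled points; the Gaussian kernel for density and all function names are illustrative assumptions, not the cited papers' exact estimators.

```python
import numpy as np

def uncertainty_scores(proba):
    """Binary case: highest when P(y=1|x) is near 0.5 (decision boundary)."""
    return 1.0 - 2.0 * np.abs(proba - 0.5)

def density_scores(X_unlabeled, bandwidth=1.0):
    """Mean Gaussian-kernel similarity to the rest of the unlabeled pool."""
    d2 = ((X_unlabeled[:, None, :] - X_unlabeled[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth**2)).mean(axis=1)

def diversity_scores(X_unlabeled, X_labeled):
    """Distance to the nearest labeled point (maximal-diversity sampling)."""
    d2 = ((X_unlabeled[:, None, :] - X_labeled[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(axis=1))

# Pick the next query under each strategy.
rng = np.random.default_rng(0)
X_u, X_l = rng.normal(size=(100, 2)), rng.normal(size=(5, 2))
proba = rng.uniform(size=100)  # stand-in for a classifier's P(y=1|x)
print(np.argmax(uncertainty_scores(proba)),
      np.argmax(density_scores(X_u)),
      np.argmax(diversity_scores(X_u, X_l)))
```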

Page 6:

Which point to sample? (Grey = unlabeled, red = class A, brown = class B)

Page 7:

Density-Based Sampling: the centroid of the largest unsampled cluster.

Page 8:

Uncertainty Sampling: the point closest to the decision boundary.

Page 9:

Maximal Diversity Sampling: the point maximally distant from the labeled x's.

Page 10:

Ensemble-Based Possibilities: uncertainty + diversity criteria; density + uncertainty criteria.

Page 11:

Strategy Selection: No Universal Optimum

• Optimal operating range for AL sampling strategies differs

• How to get the best of both worlds?

• (Hint: ensemble methods, e.g. DUAL)

Page 12:

How does DUAL do better?

- Runs DWUS (density-weighted uncertainty sampling) until it estimates a cross-over: monitor the change in expected error at each iteration to detect when DWUS is stuck in a local minimum,

$\frac{\partial}{\partial t}\,\epsilon(\mathrm{DWUS}) \approx 0, \quad \text{where } \epsilon(\mathrm{DWUS}) = \frac{1}{n}\sum_i E[(\hat{y}_i - y_i)^2 \mid x_i]$

- DUAL uses a mixture model after the cross-over (saturation) point:

$x_s^* = \arg\max_{i \in I_U}\; \pi \cdot E[(\hat{y}_i - y_i)^2 \mid x_i] + (1 - \pi) \cdot p(x_i)$

- Our goal should be to minimize the expected future error. If we knew the future error of Uncertainty Sampling (US) to be zero, we would force $\pi = 1$; but in practice we do not know it.

Page 13:

More on DUAL [ECML 2007]

- After the cross-over, US does better, so the uncertainty score should be given more weight: $\pi$ should reflect how well US performs.
- $\hat{\epsilon}(US)$ can be calculated as the expected error of US on the unlabeled data,* giving $\pi = 1 - \hat{\epsilon}(US)$.
- Finally, we have the following selection criterion for DUAL:

$x_s^* = \arg\max_{i \in I_U}\; \big(1 - \hat{\epsilon}(US)\big) \cdot E[(\hat{y}_i - y_i)^2 \mid x_i] + \hat{\epsilon}(US) \cdot p(x_i)$

* US is allowed to choose data only from among the already sampled instances, and $\hat{\epsilon}(US)$ is calculated on the remaining unlabeled set.
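A minimal sketch of the DUAL selection rule reconstructed above, assuming per-instance expected-error estimates and a density estimate are already available; in the actual method err_us would be the expected error of US computed as in the footnote, so the values here are toy inputs.

```python
import numpy as np

def dual_select(expected_err, density, err_us):
    """DUAL mixture: weight uncertainty by (1 - err_us) and density by err_us."""
    scores = (1.0 - err_us) * expected_err + err_us * density
    return int(np.argmax(scores))

rng = np.random.default_rng(1)
expected_err = rng.uniform(size=50)  # E[(y_hat_i - y_i)^2 | x_i]
density = rng.uniform(size=50)       # p(x_i)
print("query instance:", dual_select(expected_err, density, err_us=0.3))
```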

Page 14:

Results: DUAL vs DWUS

Page 15:

Beyond DUAL

- Paired sampling with geodesic density estimation [Donmez & Carbonell, SIAM 2008]
- Active rank learning: for search results [Donmez & Carbonell, WWW 2008]; in general [Donmez & Carbonell, ICML 2008]
- Structure learning: inferring 3D protein structure from 1D sequence remains an open problem

Page 16:

Active Sampling for RankSVM I

- Consider a candidate instance; assume it is added to the training set with a hypothesized label.
- Compute the total loss on the pairs that include the candidate, where $n$ is the number of training instances with a label different from the candidate's.
- The objective function to be minimized then becomes the RankSVM objective over the enlarged training set (full equations in Donmez & Carbonell, ICML 2008).

Page 17:

Active Sampling for RankSVM II

- Let the current ranking function be given; there are two possible cases for the candidate's contribution to the pairwise loss.
- In each case, take the derivative of the objective at the single new point (details in Donmez & Carbonell, ICML 2008).

Page 18:

Active Sampling for RankSVM III

- Substitute into the previous equation to estimate the resulting change in the ranker.
- The magnitude of the total derivative estimates the ability of the candidate to change the current ranker if added to training.
- Finally, the candidate with the largest estimated change is selected (Donmez & Carbonell, ICML 2008).

Page 19:

Active Sampling for RankBoost I

- Again, estimate how the current ranker would change if a candidate were in the training set.
- Estimate this change by the difference in ranking loss before and after the candidate is added.
- The ranking loss is the exponential pairwise loss of Freund et al. (2003).

Page 20:

Active Sampling for RankBoost II

- The difference in ranking loss between the current and the enlarged set indicates how much the current ranker needs to change to compensate for the loss introduced by the new instance.
- Finally, the instance with the highest loss differential is sampled, as sketched below.
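The slide's formulas are given in Donmez & Carbonell (ICML 2008); as a generic illustration of the idea, this sketch scores a candidate by the growth in the exponential pairwise ranking loss of Freund et al. (2003) once the candidate joins the training set. Treating the candidate's score and hypothesized label as known inputs is a simplifying assumption.

```python
import numpy as np

def rank_loss(scores, labels):
    """Exponential pairwise ranking loss (Freund et al., 2003): sum over
    pairs (i, j) with labels[i] > labels[j] of exp(scores[j] - scores[i])."""
    li, lj = labels[:, None], labels[None, :]
    si, sj = scores[:, None], scores[None, :]
    return np.exp(sj - si)[li > lj].sum()

def loss_differential(scores, labels, cand_score, cand_label):
    """How much the training loss grows when the candidate joins the set."""
    s = np.append(scores, cand_score)
    l = np.append(labels, cand_label)
    return rank_loss(s, l) - rank_loss(scores, labels)

scores = np.array([2.0, 1.0, 0.5])
labels = np.array([2, 1, 0])  # graded relevance
print(loss_differential(scores, labels, cand_score=0.9, cand_label=2))
```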

Page 21:

Results on TREC03

Page 22:

Active vs Proactive Learning

- Number of oracles. Active: individual (only one). Proactive: multiple, with different capabilities, costs, and areas of expertise.
- Reliability. Active: infallible (100% right). Proactive: variable across oracles and queries, depending on difficulty, expertise, …
- Reluctance. Active: indefatigable (always answers). Proactive: variable across oracles and queries, depending on workload, certainty, …
- Cost per query. Active: invariant (free or constant). Proactive: variable across oracles and queries, depending on workload, difficulty, …

Note: "Oracle" ∈ {expert, experiment, computation, …}

Page 23:

Active Learning is Awesome, but … is it Enough?

[Diagram contrasting traditional active learning with what proactive learning goes beyond: a single perfect source vs multiple sources with differing expertise, labeling noise, and answer reluctance (JMLR '09, KDD '09, CIKM '08); a fixed labeling cost vs a varying-cost model driven by task difficulty, ambiguity, and expertise level; oracle properties fixed over time vs time-varying (SDM '10, submitted).]

Page 24:

Scenario 1: Reluctance

Two oracles:
- a reliable oracle: expensive, but always answers with a correct label
- a reluctant oracle: cheap, but may not respond to some queries

Define a utility score as the expected value of information at unit cost:

$U(x, k) = \frac{P(ans \mid x, k) \cdot V(x)}{C_k}$
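A sketch of the joint (oracle, instance) choice implied by the utility score above; $V(x)$ is left abstract (any value-of-information measure, e.g. an uncertainty score), and all numbers are toy inputs.

```python
import numpy as np

def select_query(p_ans, value, cost):
    """Maximize U(x, k) = P(ans | x, k) * V(x) / C_k over instances x, oracles k.
    p_ans: (n_oracles, n_instances); value: (n_instances,); cost: (n_oracles,)."""
    U = p_ans * value[None, :] / cost[:, None]
    k, i = np.unravel_index(np.argmax(U), U.shape)
    return int(k), int(i)

p_ans = np.array([[1.0, 1.0, 1.0],      # reliable oracle always answers
                  [0.9, 0.4, 0.7]])     # reluctant oracle
value = np.array([0.8, 0.9, 0.3])       # V(x), e.g. an uncertainty score
cost = np.array([5.0, 1.0])             # reliable oracle is expensive
print(select_query(p_ans, value, cost)) # -> (oracle, instance)
```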

Page 25:

How to estimate $\hat{P}(ans \mid x, k)$?

- Cluster the unlabeled data using k-means.
- Ask the reluctant oracle for the label of each cluster centroid $x_c$. If a label is received, increase $\hat{P}(ans \mid x, \text{reluctant})$ of nearby points; if no label is received, decrease $\hat{P}(ans \mid x, \text{reluctant})$ of nearby points.
- The update is an exponential function of the distance $\|x - x_c\|$ to the queried centroid, signed by $h(x_c, y_c) \in \{+1, -1\}$ ($+1$ when a label is received, $-1$ otherwise) and normalized by $Z$.
- The number of clusters depends on the clustering budget and the oracle fee.
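The exact update formula is garbled in the source, so this sketch keeps only its recoverable structure: query a k-means centroid, then move $\hat{P}(ans \mid x, \text{reluctant})$ of nearby points up or down ($h = +1$ or $-1$) with an effect that decays with distance to the centroid. The decay kernel and the rate are assumptions.

```python
import numpy as np

def update_answer_prob(p_ans, X, centroid, answered, scale=1.0, rate=0.2):
    """Raise/lower P(ans | x, reluctant) near a queried centroid.
    answered -> h = +1 (label received) or h = -1 (no response)."""
    h = 1.0 if answered else -1.0
    dist = np.linalg.norm(X - centroid, axis=1)
    p_ans = p_ans + rate * h * np.exp(-dist / scale)  # distance-decayed update
    return np.clip(p_ans, 0.0, 1.0)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
p = np.full(200, 0.5)  # uninformed prior on answering
p = update_answer_prob(p, X, centroid=X[0], answered=True)
print(p[:5])
```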

Page 26:

Algorithm for Scenario 1

Page 27:

Scenario 2: Fallibility

Two oracles:
- one perfect but expensive oracle
- one fallible but cheap oracle, which always answers

The algorithm is similar to Scenario 1, with slight modifications. During exploration, the fallible oracle provides the label together with its confidence, where confidence $= \hat{P}(y \mid x)$ of the fallible oracle. If $\hat{P}(y \mid x) \in [0.45, 0.5]$, then we do not use the label, but we still update $\hat{P}(\text{correct} \mid x, k)$.

Page 28:

Scenario 3: Non-uniform Cost

- Uniform cost: fraud detection, face recognition, etc.
- Non-uniform cost: text categorization, medical diagnosis, protein structure prediction, etc.
- Two oracles: a fixed-cost oracle and a variable-cost oracle, whose fee scales with instance difficulty:

$C_{non\text{-}unif}(x) = 1 - \frac{\max_{y \in Y} \hat{P}(y \mid x) - \frac{1}{|Y|}}{1 - \frac{1}{|Y|}}$

Page 29:

Underlying Sampling Strategy

Conditional-entropy-based sampling, weighted by a density measure; it captures the information content of a close neighborhood:

$U(x) = -\log\Big(\min_{y \in \{\pm 1\}} \hat{P}(y \mid x, \hat{w})\Big) \cdot \sum_{k \in N(x)} \exp\big(-\|x - k\|^2\big)\, \min_{y \in \{\pm 1\}} \hat{P}(y \mid k, \hat{w})$

where $N(x)$ is the set of close neighbors of $x$.
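A sketch of the reconstructed score for the binary case; the neighborhood radius and the exact kernel are assumptions where the source formula is ambiguous.

```python
import numpy as np

def sampling_scores(proba, X, radius=1.0):
    """Uncertainty (via the min-class probability) times the information
    content of each point's close neighborhood."""
    minp = np.minimum(proba, 1.0 - proba)
    unc = -np.log(minp + 1e-12)                        # high near P = 0.5
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    neigh = np.exp(-d2) * minp[None, :]                # kernel-weighted neighbor term
    neigh[d2 > radius**2] = 0.0                        # keep close neighbors only
    return unc * neigh.sum(axis=1)

rng = np.random.default_rng(3)
X, proba = rng.normal(size=(80, 2)), rng.uniform(size=80)
print("query:", int(np.argmax(sampling_scores(proba, X))))
```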

Page 30:

Results: Reluctance

Page 31:

Cost varies non-uniformly; improvements are statistically significant (p < 0.01).

Page 32:

Sequential Bayesian Filtering [SDM '10]

- Tracking the states of multiple systems as each evolves over time
- Sequentially arriving observations (noisy labels)
- Oracle accuracy changes with time t
- Goal: estimate the posterior distribution
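A minimal particle-filter sketch of the predict/update cycle for tracking one oracle's time-varying accuracy; the Gaussian random-walk drift and the Bernoulli observation model are illustrative assumptions about the dynamics, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(4)
particles = rng.uniform(0.5, 1.0, size=1000)  # hypotheses for oracle accuracy

def predict(particles, drift=0.02):
    """State evolution: accuracy drifts over time, so our belief diverges
    if the source goes unexplored."""
    return np.clip(particles + rng.normal(0, drift, particles.shape), 0.0, 1.0)

def update(particles, label_correct):
    """Reweight by the Bernoulli likelihood of the observed (noisy) label,
    then resample to keep the particle set unweighted."""
    w = particles if label_correct else (1.0 - particles)
    w = w / w.sum()
    return particles[rng.choice(len(particles), size=len(particles), p=w)]

for obs in [True, True, False, True]:  # noisy correctness observations
    particles = update(predict(particles), obs)
print("posterior mean accuracy:", particles.mean())
```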

Page 33:

A Closer Look at the Model [SDM '10]: the predict step followed by the update step.

Page 34:

Predictor Selection [SDM '10]

- Score each predictor by its accuracy at the last time it was selected, and by the probability of that accuracy.
- There is a chance that the accuracy might have increased since then.
- Our belief about the accuracy diverges over time as the source goes unexplored.

Page 35:

[Plot: red = true, blue = estimated, black = MLE.]

Page 36:

Does Tracking Predictor Accuracy Actually Help in Proactive Learning? [SDM '10]

Page 37:

Proactive Learning in General

- Multiple experts (a.k.a. oracles): different areas of expertise, different costs, different reliabilities, different availability
- What question to ask, and whom to query? Joint optimization of query and oracle selection
- Referrals among oracles (with referral fees)
- Learn about oracle capabilities while solving the active learning problem at hand
- Non-static oracle properties

Page 38:

Current Issues in Proactive Learning

- Large numbers of oracles [Donmez, Carbonell & Schneider, KDD 2009]: based on a multi-armed bandit approach
- Non-stationary oracles [Donmez, Carbonell & Schneider, SDM 2010]: expertise changes with time (improves or decays); exploration vs exploitation tradeoff
- What if the labeled set is empty for some classes? Minority-class discovery (unsupervised) [He & Carbonell, NIPS 2007, SIAM 2008, SDM 2009]
- After first-instance discovery: proactive learning, or minority-class characterization [He & Carbonell, SIAM 2010]

Page 39:

Minority Classes vs Outliers

- Rare classes: a group of points; clustered; non-separable from the majority classes
- Outliers: a single point; scattered; separable

Page 40:

The Big Picture

[Diagram: raw data (relational, temporal) feeds feature extraction, producing a feature representation; the resulting unbalanced, unlabeled data set goes through rare category detection and then learning in unbalanced settings, yielding a classifier.]

Page 41:

Minority Class Discovery Method

1. Calculate a problem-specific similarity radius $a$.
2. For each $x_i \in S$: $NN(x_i, a) = \{x \in A : \|x - x_i\| \le a\}$ and $n_i = |NN(x_i, a)|$.
3. Score each point: $s_i = \max_{x_j \in NN(x_i, t \cdot a)} (n_i - n_j)$.
4. Query $x = \arg\max_{x_i \in S} s_i$.
5. Does $x$ belong to a new class? If yes, go to step 6; if no, increase $t$ by 1 and return to step 3 (relevance feedback).
6. Output $x$.
7. Repeat until the labeling budget is exhausted (a code sketch of the scoring step follows).
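A sketch of the neighborhood-count scoring at the heart of steps 2-4 above: a point whose $a$-ball is much denser than that of some nearby neighbor sits at an abrupt density change, which is where a compact rare class tends to hide. The radius $a$ and growth factor $t$ are the slide's own knobs; everything else is a toy setup.

```python
import numpy as np

def rare_class_scores(X, a, t=1):
    """s_i = max over x_j within t*a of (n_i - n_j),
    where n_i counts points within radius a of x_i."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = (d <= a).sum(axis=1)                        # neighborhood counts
    diff = (n[:, None] - n[None, :]).astype(float)  # n_i - n_j
    diff[d > t * a] = -np.inf                       # restrict to the t*a ball
    return diff.max(axis=1)

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (200, 2)),   # majority class
               rng.normal(2, 0.1, (10, 2))]) # tight rare cluster
print("query:", int(np.argmax(rare_class_scores(X, a=0.3))))
```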

Page 42:

Summary of Real Data Sets

- Abalone: 4,177 examples; 7-dimensional features; 20 classes; largest class 16.50%, smallest class 0.34%
- Shuttle: 4,515 examples; 9-dimensional features; 7 classes; largest class 75.53%, smallest class 0.13%

Page 43:

Results on Real Data Sets

[Plots for Abalone and Shuttle comparing MALICE, Interleave, and random sampling.]

Page 44:

Active Learning for MT

[Diagram: a Source Language Corpus feeds the Active Learner, which selects sentences S for the Expert Translator; the sampled (S, T) pairs extend the parallel corpus, which the Model Trainer uses to retrain the MT System.]

Page 45:

ACT Framework: Active Crowd Translation

[Diagram: the Source Language Corpus feeds Sentence Selection; each selected sentence S is translated by multiple crowd workers, yielding (S, T1), (S, T2), …, (S, Tn); Translation Selection picks the best translation, which the Model Trainer uses to retrain the MT System.]

Page 46:

Active Learning Strategy: Diminishing Density Weighted Diversity Sampling

$density(S) = \frac{\sum_{x \in Phrases(S)} P(x/U) \cdot e^{-count(x/L)}}{|Phrases(S)|}$

$diversity(S) = \frac{\sum_{x \in Phrases(S)} count(x) \cdot \delta(x)}{|Phrases(S)|}, \qquad \delta(x) = \begin{cases} 1 & \text{if } x \notin L \\ 0 & \text{if } x \in L \end{cases}$

$Score(S) = \frac{(1 + \beta^2) \cdot density(S) \cdot diversity(S)}{\beta^2 \cdot density(S) + diversity(S)}$
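A sketch of the score above, assuming the F-beta-style combination reconstructed from the garbled source; "phrases" are reduced to word unigrams for brevity, which is an assumption rather than the paper's phrase-extraction definition, and diversity is simplified to the fraction of phrases unseen in L.

```python
import math

def phrases(sentence):
    """Toy 'phrases': word unigrams (a stand-in for SMT phrase extraction)."""
    return sentence.lower().split()

def score(sentence, p_unlabeled, count_labeled, beta=1.0):
    ph = phrases(sentence)
    dens = sum(p_unlabeled.get(x, 0.0) * math.exp(-count_labeled.get(x, 0))
               for x in ph) / max(len(ph), 1)           # diminishing density
    div = sum(count_labeled.get(x, 0) == 0
              for x in ph) / max(len(ph), 1)            # simplified diversity
    denom = beta**2 * dens + div
    return (1 + beta**2) * dens * div / denom if denom else 0.0

p_u = {"the": 0.1, "trade": 0.02, "agreement": 0.01}  # P(x/U): phrase probs in U
c_l = {"the": 50}                                     # count(x/L): counts in L
print(score("the trade agreement", p_u, c_l))
```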

Experiments:
- Language pair: Spanish-English
- Iterations: 20; batch size: 1,000 sentences each
- Translation: Moses phrase-based SMT
- Development set: 343 sentences; test set: 506 sentences

Graph: X: Performance (BLEU); Y: Data (thousands of words)

Page 47:

Translation Selection from AMT (Amazon Mechanical Turk)

• Translator reliability
• Translation selection

Page 48:

Parting Thoughts

- Proactive learning: a new field, just started. New work and full details: Donmez dissertation. Applications abound: e-science (computational biology), finance, network security, language technologies (MT), … Theory is still in the making (e.g., Liu Yang). Open challenge: proactive structure learning.
- Rare-class discovery and classification: dovetails with active/proactive learning. New work and full details: Jingrui He dissertation.

Page 49:

THANK YOU!

Page 50:

Specially Designed Exponential Families [Efron & Tibshirani 1996]

A favorable compromise between parametric and nonparametric density estimation. The estimated density is

$g(x) = g_0(x)\, \exp\big(\theta_0 + \theta_1^T t(x)\big)$

where $g_0(x)$ is the carrier density, $\theta_0$ the normalizing parameter, $\theta_1$ the $p \times 1$ parameter vector, and $t(x)$ the $p \times 1$ vector of sufficient statistics.
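A one-dimensional numeric sketch of this family with a kernel-density carrier, $t(x) = x^2$, and $\theta_0$ obtained by grid normalization; the grid shortcut is illustrative, not the paper's estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(6)
data = rng.normal(0.0, 1.0, size=200)
grid = np.linspace(-6, 6, 2001)
dx = grid[1] - grid[0]

def g0(x, h=0.3):
    """Carrier density g0: Gaussian kernel density estimate of the sample."""
    k = np.exp(-(x[:, None] - data[None, :]) ** 2 / (2 * h * h))
    return k.mean(axis=1) / (h * np.sqrt(2 * np.pi))

theta1 = np.array([-0.1])            # parameter vector (p = 1)
t_grid = grid[None, :] ** 2          # sufficient statistic t(x) = x^2
unnorm = g0(grid) * np.exp(theta1 @ t_grid)
theta0 = -np.log(unnorm.sum() * dx)  # normalizing parameter
g = unnorm * np.exp(theta0)
print("integral of g:", g.sum() * dx)  # ~1.0
```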

Page 51:

Page 52:

SEDER Algorithm

- Carrier density: a kernel density estimator.
- Sufficient statistics: $t(x) = \big((x^1)^2, \ldots, (x^d)^2\big)^T$.
- To decouple the estimation of the different parameters, decompose $\theta_0 = \sum_{j=1}^{d} \theta_0^j$ and relax the normalization constraint so that it holds per dimension; the estimated density then factors into one-dimensional exponentially tilted Gaussian kernels, one per feature $j$ (closed form in [SDM 2009]).

Page 53:

Parameter Estimation

Theorem 3 [SDM 2009]: the maximum likelihood estimates $\hat{\theta}_0^j$ and $\hat{\theta}_1^j$ of $\theta_0^j$ and $\theta_1^j$, $j = 1, \ldots, d$, satisfy conditions that match the model's expected sufficient statistics $E[(x^j)^2]$ to their empirical counterparts $\frac{1}{n}\sum_{k=1}^{n} (x_k^j)^2$ (the closed-form conditions are given in [SDM 2009]).

Page 54:

Parameter Estimation (cont.)

Let $b^j = \frac{1}{1 - 2\hat{\theta}_1^j \sigma_j^2}$, a positive parameter. Then for $j = 1, \ldots, d$:

$\hat{b}^j = \frac{-B + \sqrt{B^2 - 4AC}}{2A}$

where $A$, $B$, and $C$ are data-dependent quantities given in [SDM 2009]; in most cases $\hat{b}^j \geq 1$.

Page 55:

Scoring Function

The estimated density is

$\tilde{g}(x) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} \frac{1}{\sqrt{2\pi b^j}\, \sigma_j} \exp\left(-\frac{(x^j - b^j x_i^j)^2}{2\, b^j \sigma_j^2}\right)$

The scoring function is the norm of its gradient:

$s_k = \big\| \nabla \tilde{g}(x_k) \big\| = \left\| \sum_{i=1}^{n} D_i(x_k) \left[ \frac{x_k^l - b^l x_i^l}{b^l \sigma_l^2} \right]_{l=1,\ldots,d} \right\|$

where

$D_i(x) = \frac{1}{n} \prod_{j=1}^{d} \frac{1}{\sqrt{2\pi b^j}\, \sigma_j} \exp\left(-\frac{(x^j - b^j x_i^j)^2}{2\, b^j \sigma_j^2}\right)$

Page 56:

Summary of Real Data Sets

Data Set    | n    | d  | m  | Largest Class | Smallest Class
Ecoli       | 336  | 7  | 6  | 42.56%        | 2.68%
Glass       | 214  | 9  | 6  | 35.51%        | 4.21%
Page Blocks | 5473 | 10 | 5  | 89.77%        | 0.51%
Abalone     | 4177 | 7  | 20 | 16.50%        | 0.34%
Shuttle     | 4515 | 9  | 7  | 75.53%        | 0.13%

Ecoli and Glass are moderately skewed; Page Blocks, Abalone, and Shuttle are extremely skewed.

Page 57:

Moderately Skewed Data Sets

[Plots for Ecoli and Glass showing MALICE results.]

Page 58:

GRADE: Full Prior Information

1. For each rare class $c$, $2 \le c \le m$:
2. Calculate the class-specific similarity radius $a^c$.
3. For each $x_i \in S$: $NN(x_i, a^c) = \{x \in A : \|x - x_i\| \le a^c\}$ and $n_i^c = |NN(x_i, a^c)|$.
4. Score each point: $s_i = \max_{x_j \in NN(x_i, t \cdot a^c)} (n_i^c - n_j^c)$.
5. Query $x = \arg\max_{x_i \in S} s_i$.
6. Does $x$ belong to class $c$? If yes, go to step 7; if no, increase $t$ by 1 and return to step 4 (relevance feedback).
7. Output $x$.

Page 59:

Results on Real Data Sets

[Plots for Ecoli, Glass, Abalone, and Shuttle showing MALICE results.]

Page 60:

Performance Measures

- MAP (Mean Average Precision): the average of the AP values over all queries.
- NDCG (Normalized Discounted Cumulative Gain): the impact of each relevant document is discounted as a function of its rank position.
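For reference, a compact implementation of both measures; binary relevance for AP and the (2^rel - 1) gain with log2 discount for NDCG are standard conventions assumed here.

```python
import numpy as np

def average_precision(rels):
    """AP for one query; rels = binary relevance in ranked order."""
    rels = np.asarray(rels, dtype=float)
    hits = np.cumsum(rels)
    prec_at_k = hits / np.arange(1, len(rels) + 1)
    return (prec_at_k * rels).sum() / max(rels.sum(), 1)

def ndcg(rels, k=10):
    """NDCG@k with (2^rel - 1) gain and log2(rank + 1) discount."""
    rels = np.asarray(rels, dtype=float)[:k]
    disc = np.log2(np.arange(2, len(rels) + 2))
    dcg = ((2**rels - 1) / disc).sum()
    ideal = np.sort(rels)[::-1]
    idcg = ((2**ideal - 1) / disc).sum()
    return dcg / idcg if idcg > 0 else 0.0

queries = [[1, 0, 1, 0], [0, 1, 1, 1]]  # binary relevance per ranked list
print("MAP:", np.mean([average_precision(q) for q in queries]))
print("NDCG@4:", ndcg([3, 2, 0, 1], k=4))
```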