
http://www.reframe-d2k.org/

Rethinking the Essence, Flexibility and Reusability of Advanced Model Exploitation

•  Give an overview of threshold selection methods for multi-label classification: (i) context-independent and (ii) context-based.

•  Study different threshold settings: global, label-wise, instance-wise.

•  Analyse the performance of these methods using multi-label cost curves.

•  Training set:
   ◦  Data with multiple labels/target variables.
   ◦  Binary targets.
   ◦  Target costs.

•  Training context:
   ◦  A model.

•  Deployment context:
   ◦  Operating condition: costs.

•  Solution:
   ◦  Threshold tuning.

Multi-label Data

Threshold Selection Methods

•  Context-independent: cost is ignored (fixed)
   1.  Fixed-score (globally or label-wise)
   2.  RCut (instance-wise)
   3.  MCut (instance-wise)

Example: Global Fixed-score, threshold 0.5

Instance   Features (X1 X2 X3 … XF)   Y1    Y2    Y3    …   YL
1                                     0.2   0.6   0.7       0.9
2                                     0.1   0.4   0.5       0.7
3                                     0.0   0.2   0.3       0.4
.
N                                     .     .     .     .   .

Targets Y1 … YL hold the predicted scores; every score above 0.5 is predicted positive (+).
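As a minimal sketch of the global fixed-score rule, the snippet below applies one threshold to the three example rows. The function name is hypothetical, and treating a score exactly equal to the threshold as negative is an assumption; the slides do not say how ties are handled.

```python
def fixed_score_predictions(scores, threshold=0.5):
    """Predict a label as positive when its score is strictly above the threshold."""
    return [[score > threshold for score in row] for row in scores]


# Scores of instances 1 to 3 from the example (labels Y1, Y2, Y3, YL).
scores = [
    [0.2, 0.6, 0.7, 0.9],
    [0.1, 0.4, 0.5, 0.7],
    [0.0, 0.2, 0.3, 0.4],
]

predictions = fixed_score_predictions(scores, threshold=0.5)
for row in predictions:
    # Show each instance as a row of + (positive) and - (negative) marks.
    print(" ".join("+" if positive else "-" for positive in row))
```

A label-wise variant would simply pass a different fixed threshold for each column.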




Example: RCut, top 2 most relevant labels (threshold per instance)

Instance   Features (X1 X2 X3 … XF)   Y1    Y2    Y3    …   YL
1                                     0.2   0.6   0.7       0.9
2                                     0.1   0.4   0.5       0.7
3                                     0.0   0.2   0.3       0.4
.
N                                     .     .     .     .   .

For each instance, the 2 highest-scoring labels are predicted positive (+).
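The RCut rule can be sketched as a per-row ranking. This is an illustrative version with a hypothetical function name; breaking ties in favour of the earlier label is an assumption.

```python
def rcut_predictions(scores, k=2):
    """For every instance, mark the k highest-scoring labels as positive."""
    predictions = []
    for row in scores:
        # Label indices sorted by decreasing score; the sort is stable,
        # so tied scores keep their original label order.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        top = set(ranked[:k])
        predictions.append([j in top for j in range(len(row))])
    return predictions


# Scores of instances 1 to 3 from the example (labels Y1, Y2, Y3, YL).
scores = [
    [0.2, 0.6, 0.7, 0.9],
    [0.1, 0.4, 0.5, 0.7],
    [0.0, 0.2, 0.3, 0.4],
]

predictions = rcut_predictions(scores, k=2)
for row in predictions:
    print(" ".join("+" if positive else "-" for positive in row))
```

Because the cut is made by rank, every instance receives exactly k positive labels regardless of how low its scores are.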




Example: MCut (threshold per instance)

Instance   Features (X1 X2 X3 … XF)   Y1    Y2    Y3    …   YL
1                                     0.2   0.6   0.7       0.9
2                                     0.1   0.2   0.5       0.7
3                                     .     .     .         .
.
N                                     .     .     .     .   .
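A minimal sketch of MCut, assuming the usual definition in which each instance's threshold is placed in the middle of the largest gap between its consecutive sorted scores. The function names are hypothetical, and each row is assumed to hold at least two scores.

```python
def mcut_threshold(row):
    """Return the midpoint of the largest gap between consecutive sorted scores."""
    ordered = sorted(row, reverse=True)
    gaps = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    widest = max(range(len(gaps)), key=lambda i: gaps[i])
    return (ordered[widest] + ordered[widest + 1]) / 2


def mcut_predictions(scores):
    """Apply each instance's own MCut threshold to its scores."""
    return [[score > mcut_threshold(row) for score in row] for row in scores]


# Scores of instances 1 and 2 from the example (labels Y1, Y2, Y3, YL).
scores = [
    [0.2, 0.6, 0.7, 0.9],
    [0.1, 0.2, 0.5, 0.7],
]

for row, predicted in zip(scores, mcut_predictions(scores)):
    marks = " ".join("+" if positive else "-" for positive in predicted)
    print(f"threshold={mcut_threshold(row):.2f}  {marks}")
```

Unlike RCut, the number of positive labels can differ between instances: the widest gap falls after three labels for the first row and after two for the second.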

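•  Context-based: cost is considered
   1.  Score-driven: $t = c$
   2.  Rate-driven (PCut): $t = R^{-1}(c)$
   3.  Optimal (SCut)

Example: Score-driven label-wise (threshold per label), e.g. c = 0.5 for one label and c = 0.2 for another

Instance   Features (X1 X2 X3 … XF)   Y1    Y2    Y3    …   YL
1                                     0.3   0.2   0.1       0.0
2                                     0.4   0.3   0.2       0.0
3                                     0.5   0.5   0.4       0.1
.
N                                     0.9   0.7   0.6   .   0.2

Each label's scores are compared with that label's own threshold; scores above it are predicted positive (+).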
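The score-driven rule sets each label's threshold equal to that label's cost, t = c. The sketch below applies it label-wise to the four example rows. The cost vector is purely illustrative (the slide only shows c = 0.5 and c = 0.2 without saying which labels they belong to), the function name is hypothetical, and scores equal to the threshold are assumed to be negative.

```python
def score_driven_predictions(scores, costs):
    """Label-wise score-driven rule: label j is positive when score > costs[j]."""
    return [
        [score > cost for score, cost in zip(row, costs)]
        for row in scores
    ]


# Scores of instances 1, 2, 3 and N from the example (labels Y1, Y2, Y3, YL).
scores = [
    [0.3, 0.2, 0.1, 0.0],
    [0.4, 0.3, 0.2, 0.0],
    [0.5, 0.5, 0.4, 0.1],
    [0.9, 0.7, 0.6, 0.2],
]

# Hypothetical per-label costs, used directly as thresholds.
costs = [0.5, 0.2, 0.5, 0.2]

predictions = score_driven_predictions(scores, costs)
for row in predictions:
    print(" ".join("+" if positive else "-" for positive in row))
```

No training data is needed to pick the thresholds, which makes this rule sensible mainly when the scores are reasonably calibrated.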


Example: Rate-driven label-wise (threshold per label), c = R = 0.5

Instance   Features (X1 X2 X3 … XF)   Y1    Y2    Y3    …   YL
1                                     0.3   0.2   0.1       0.0
2                                     0.4   0.3   0.2       0.0
3                                     0.5   0.5   0.4       0.1
.
N                                     0.9   0.7   0.6   .   0.2
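The rate-driven rule, t = R^{-1}(c), can be sketched with an empirical rate. The sketch assumes R(t) is the fraction of a label's scores at or below t, so a fraction of about c of the instances falls below the threshold and the rest is predicted positive. That reading of R, the function name, and the midpoint placement of the threshold are all assumptions.

```python
def rate_driven_threshold(label_scores, cost):
    """Pick t so that roughly a `cost` fraction of the scores lies at or below t."""
    ordered = sorted(label_scores)
    n = len(ordered)
    below = min(max(int(cost * n), 0), n)  # number of scores to leave below t
    if below == 0:
        return float("-inf")  # everything is predicted positive
    if below == n:
        return float("inf")   # nothing is predicted positive
    # Place the threshold halfway between the two neighbouring scores.
    return (ordered[below - 1] + ordered[below]) / 2


# Scores of instances 1, 2, 3 and N from the example (labels Y1, Y2, Y3, YL).
score_rows = [
    [0.3, 0.2, 0.1, 0.0],
    [0.4, 0.3, 0.2, 0.0],
    [0.5, 0.5, 0.4, 0.1],
    [0.9, 0.7, 0.6, 0.2],
]
score_columns = [list(column) for column in zip(*score_rows)]

thresholds = [rate_driven_threshold(column, cost=0.5) for column in score_columns]
for column, threshold in zip(score_columns, thresholds):
    positives = sum(score > threshold for score in column)
    print(f"threshold={threshold:.2f}  predicted positives={positives}")
```

With c = 0.5 and four instances, each label ends up with two predicted positives, whatever the actual score values are.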

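For the third context-based option, the optimal threshold, one possible sketch is a search over candidate thresholds on labelled training data for a single label. The loss used here, c times the false positives plus (1 - c) times the false negatives, divided by the number of instances, is an assumed form chosen to match the score-driven rule t = c; the slides do not spell the loss out. The data and function names are hypothetical.

```python
def cost_weighted_loss(scores, labels, threshold, cost):
    """Assumed loss: cost weights false positives, (1 - cost) weights false negatives."""
    false_pos = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    return (cost * false_pos + (1 - cost) * false_neg) / len(scores)


def optimal_threshold(scores, labels, cost):
    """Return the candidate threshold with the lowest loss on the given data."""
    values = sorted(set(scores))
    # Candidates: below all scores, between neighbouring scores, above all scores.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)
    return min(candidates, key=lambda t: cost_weighted_loss(scores, labels, t, cost))


# Hypothetical training scores and true labels for one label.
train_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
train_labels = [1, 1, 0, 1, 0, 0]

for cost in (0.2, 0.8):
    best = optimal_threshold(train_scores, train_labels, cost)
    loss = cost_weighted_loss(train_scores, train_labels, best, cost)
    print(f"cost={cost}: threshold={best:.2f}, training loss={loss:.3f}")
```

A low cost makes false positives cheap, so the search settles on a lower threshold; a high cost pushes the threshold up.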


•  Datasets: 6 multi-label datasets from Mulan
   ◦  Names: Enron, Birds, Yeast, Flags, Scene, Emotions
   ◦  Number of labels: 6 to 53

•  Trained model: binary relevance (BR) + logistic regression

•  Different thresholding methods

•  Different settings: global, label-wise, instance-wise

•  Misclassification cost
   ◦  Case 1: one uniform cost shared by all labels (equal costs)
   ◦  Case 2: L uniform costs, one per label (unequal costs); their average is used for the global threshold

•  Evaluation: cost curves for multi-label classification
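To make the evaluation step concrete, here is a rough sketch of how a multi-label cost curve could be computed for the equal-cost case: for each cost on a grid, apply a thresholding rule to every label and average the per-label losses. The loss form (cost weights false positives, 1 - cost weights false negatives), the toy data, and all names are assumptions for illustration, not the experimental code behind the slides.

```python
def label_loss(scores, labels, threshold, cost):
    """Assumed per-label loss for one operating condition."""
    false_pos = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    return (cost * false_pos + (1 - cost) * false_neg) / len(scores)


def cost_curve(score_columns, label_columns, threshold_rule, cost_grid):
    """Average loss over all labels for every cost in the grid."""
    curve = []
    for cost in cost_grid:
        losses = [
            label_loss(scores, labels, threshold_rule(scores, cost), cost)
            for scores, labels in zip(score_columns, label_columns)
        ]
        curve.append((cost, sum(losses) / len(losses)))
    return curve


# Hypothetical scores and true labels for two labels over six instances.
score_columns = [[0.9, 0.7, 0.6, 0.4, 0.3, 0.1], [0.8, 0.2, 0.6, 0.3, 0.7, 0.1]]
label_columns = [[1, 1, 0, 1, 0, 0], [1, 0, 1, 0, 1, 0]]
cost_grid = [i / 10 for i in range(11)]

fixed_curve = cost_curve(score_columns, label_columns, lambda s, c: 0.5, cost_grid)
score_curve = cost_curve(score_columns, label_columns, lambda s, c: c, cost_grid)

for (cost, fixed), (_, driven) in zip(fixed_curve, score_curve):
    print(f"cost={cost:.1f}  fixed={fixed:.3f}  score-driven={driven:.3f}")
```

Plotting the two lists against the cost grid gives one curve per method; the fixed and score-driven rules coincide at cost 0.5, where both use the same threshold.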

[Figure: cost curves. x-axis: Cost (0.0 to 1.0); y-axis: Loss (0.0 to 0.5); methods: Score-driven, Rate-driven, Optimal, Fixed.]

[Figure, two panels: "Cost curves for equal costs" and "Scatter plots for unequal costs". x-axis: average label cost (0.0 to 1.0); y-axis: average loss over all labels (0.0 to 0.5).]

[Figure: "Enron Dataset: Global Threshold" (53 targets). x-axis: Average Cost (0.0 to 1.0); y-axis: Loss (0.0 to 0.5); methods: Score-driven, Optimal, Fixed. Inset: histogram of scores (0.0 to 1.0) against frequencies (0 to 25000).]

[Figure, four histograms of cost (x-axis) against frequency (y-axis): "1 uniform random variable" (cost from 0.0 to 1.0), "uniform random variables" (cost from 0.0 to 1.0), "5 uniform random variables" (cost from 0.2 to 0.8), "53 uniform random variables" (cost from 0.35 to 0.65).]
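The narrowing cost range in these histograms is what one would expect if each panel shows the average of several independent uniform costs, as used for the global threshold when label costs differ. That reading is an assumption, since the panel titles do not state it. A small simulation with hypothetical names illustrates the effect:

```python
import random
import statistics


def average_cost_spread(num_labels, num_draws=5_000, seed=0):
    """Draw `num_labels` uniform costs repeatedly and summarise their averages."""
    rng = random.Random(seed)
    averages = [
        sum(rng.random() for _ in range(num_labels)) / num_labels
        for _ in range(num_draws)
    ]
    return statistics.mean(averages), statistics.pstdev(averages)


for num_labels in (1, 5, 53):
    mean, spread = average_cost_spread(num_labels)
    print(f"{num_labels:>2} costs: mean={mean:.3f}, std={spread:.3f}")
```

The mean stays near 0.5 while the standard deviation shrinks roughly as 1 / sqrt(12 L), so with 53 labels the averaged cost rarely leaves a narrow band around 0.5.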

[Figure, two panels: "Enron Dataset: Label-wise Thresholds" and "Yeast Dataset: Label-wise Thresholds". x-axis: Average Cost (0.0 to 1.0); y-axis: Loss (0.0 to 0.5); methods: Fixed, Score-driven, Rate-driven, Optimal.]

[Figure: "Enron Dataset: Instance-wise Thresholds". x-axis: Average Cost (0.0 to 1.0); y-axis: Loss (0.0 to 0.5); methods: MCut, RCut.]

[Figure, three bar charts of Average Loss (0 to 0.5) per dataset (Enron, Birds, Yeast, Flags, Emotions, Scene):
   ◦  One global threshold when label costs are different: Score_driven, Fixed_score, Optimal_train
   ◦  Label-wise thresholds when label costs are different: Score_driven, Rate_driven, Optimal_train
   ◦  Instance-wise thresholds when label costs are different: RCut, MCut]

•  In the paper:
   ◦  A structured presentation of multi-label thresholding methods.
   ◦  A link to binary classification thresholding methods.
   ◦  Comparative experimental results on the performance of different thresholding methods.

•  Next time
   ◦  Depth, i.e.
      ▪  Study a possible link between thresholds and evaluation metrics
      ▪  Non-uniform cost
      ▪  Cost-based F-measure
      ▪  A cost per data point
   ◦  What do you think?
