wRACOG: A Gibbs Sampling-Based Oversampling Technique

DESCRIPTION

This paper was presented at the International Conference on Data Mining, 2013.

TRANSCRIPT

Barnan Das

School of Electrical Engineering and Computer Science

Washington State University

wRACOG: A Gibbs Sampling-Based Oversampling Technique

Barnan Das, Narayanan C. Krishnan, Diane J. Cook

Imbalanced Class Distribution

Automated Prompting for Older Adults

Class Distribution

Minority class: 149 data points

Majority class: 3831 data points

Total number of data points: 3980
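
To make the degree of imbalance concrete, the arithmetic on these counts can be sketched as follows. It assumes the smaller count (149) is the minority class; the variable names are illustrative only.

```python
# Counts taken from the class-distribution slide.
minority, majority = 149, 3831
total = minority + majority              # matches the stated total of 3980

minority_share = minority / total        # fraction of points in the minority class
imbalance_ratio = majority / minority    # majority points per minority point

print(f"{total} points, minority share {minority_share:.2%}, "
      f"ratio {imbalance_ratio:.1f}:1")
# prints "3980 points, minority share 3.74%, ratio 25.7:1"
```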

Solution?

Preprocessing

Sampling
• Over-sampling the minority class
• Under-sampling the majority class

Oversampling
• Spatial location of samples in Euclidean space
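
As a rough illustration of these preprocessing options, the sketch below shows random over-sampling, random under-sampling, and a synthetic point placed between two samples in Euclidean space. The function names are hypothetical, and the interpolation is only the generic idea behind spatial oversamplers, not the exact procedure of any specific method.

```python
import random

def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority points until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

def random_undersample(majority, target_size, rng):
    """Keep a random subset of the majority class, drawn without replacement."""
    return rng.sample(majority, target_size)

def interpolate(x, neighbor, rng):
    """Synthesize a point on the line segment between x and a neighbor."""
    u = rng.random()  # position along the segment, in [0, 1)
    return [a + u * (b - a) for a, b in zip(x, neighbor)]

rng = random.Random(0)
bigger_minority = random_oversample([1, 2], 5, rng)
smaller_majority = random_undersample(list(range(10)), 3, rng)
synthetic = interpolate([0.0, 0.0], [1.0, 2.0], rng)
```

Interpolation of this kind relies only on where samples sit in feature space, which is the property the slide attributes to existing oversampling approaches.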

Proposed Approach

Preprocessing technique to oversample minority class

• Approximate discrete probability distribution using Chow-Liu’s algorithm
• Generate new minority class data points using Gibbs sampling

Approximating Discrete Probability Distribution

Minority Class

Mutual Information Between Attributes

$I(x_i, x_j)$, for $i = 1, 2, \ldots, n-1$; $j = 2, 3, \ldots, n$; $i < j$

Maximum-weighted Dependence Tree

Chow-Liu Dependence Tree
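
The pipeline on this slide (pairwise mutual information, then a maximum-weighted dependence tree) can be sketched as below. This is a minimal illustration, not the authors' code: it assumes discrete attributes stored as tuples, uses empirical frequencies, builds the tree with Prim's algorithm, and roots it at attribute 0. The names `mutual_information` and `chow_liu_tree` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(rows, i, j):
    """Empirical mutual information I(x_i, x_j), in nats, of two discrete attributes."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    count_i = Counter(r[i] for r in rows)
    count_j = Counter(r[j] for r in rows)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi

def chow_liu_tree(rows):
    """Return {attribute: parent} for a maximum-weight spanning tree rooted at 0."""
    n_attrs = len(rows[0])
    weight = {(i, j): mutual_information(rows, i, j)
              for i, j in combinations(range(n_attrs), 2)}  # all pairs with i < j
    parent = {0: None}  # assumption: attribute 0 is chosen as the root
    while len(parent) < n_attrs:
        # Prim's step: heaviest edge from an in-tree node u to an out-of-tree node v
        u, v = max(((u, v) for u in parent for v in range(n_attrs) if v not in parent),
                   key=lambda e: weight[(min(e), max(e))])
        parent[v] = u
    return parent

# Toy minority-class data: attributes 0 and 1 always agree, attribute 2 is independent.
toy_rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
toy_tree = chow_liu_tree(toy_rows)
```

On the toy data the strongly dependent pair (0, 1) ends up connected, which is the behavior the dependence tree is meant to capture.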

Gibbs Sampling

For all attributes

Chow-Liu Dependence Tree

Gibbs Sampling

Minority Class Samples

Majority Class Samples

Markov Chains
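
One sweep of a Gibbs sampler over a dependence tree might look like the sketch below. It is an illustration under stated assumptions, not the presented implementation: the tree is given as a hypothetical `{attribute: parent}` mapping, conditional probabilities are Laplace-smoothed counts (the smoothing is an assumption made here), and each attribute is resampled given its parent and its children in the tree.

```python
import random
from collections import Counter

def fit_tables(rows, parent):
    """Collect counts for P(x_i | x_parent(i)); the root uses None as its parent value."""
    values = {i: sorted({r[i] for r in rows}) for i in parent}
    counts = {i: Counter() for i in parent}
    for r in rows:
        for i, p in parent.items():
            counts[i][(None if p is None else r[p], r[i])] += 1
    return values, counts

def cond_prob(i, value, parent_value, values, counts, alpha=1.0):
    """Laplace-smoothed estimate of P(x_i = value | x_parent(i) = parent_value)."""
    num = counts[i][(parent_value, value)] + alpha
    den = sum(counts[i][(parent_value, v)] for v in values[i]) + alpha * len(values[i])
    return num / den

def gibbs_sweep(state, parent, values, counts, rng):
    """Resample every attribute once, conditioned on the current values of the rest."""
    state = list(state)
    for i in parent:
        children = [c for c, p in parent.items() if p == i]
        parent_value = None if parent[i] is None else state[parent[i]]
        weights = []
        for v in values[i]:
            # Unnormalized conditional: P(v | parent) * product of P(child | v)
            w = cond_prob(i, v, parent_value, values, counts)
            for c in children:
                w *= cond_prob(c, state[c], v, values, counts)
            weights.append(w)
        state[i] = rng.choices(values[i], weights=weights)[0]
    return tuple(state)

# Toy run: one Markov chain started from a minority-class sample.
minority = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0)]
tree = {0: None, 1: 0, 2: 0}          # hypothetical dependence tree
values, counts = fit_tables(minority, tree)
rng = random.Random(0)
chain = [minority[0]]
for _ in range(5):
    chain.append(gibbs_sweep(chain[-1], tree, values, counts, rng))
```

Each entry of `chain` is one state of the Markov chain; which of these states become new minority samples is a separate selection step.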

(wrapper-based) RApidly COnverging Gibbs sampler: RACOG & wRACOG

RACOG and wRACOG differ in how samples are selected from the Markov chains.

RACOG:
• Based on burn-in and lag
• Stopping criterion: predefined number of iterations
• Effectiveness of new samples is not judged

wRACOG:
• Iterative training on dataset, addition of misclassified data points
• Stopping criterion: no further improvement of performance measure (TP rate)
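
The two selection strategies can be contrasted with a small sketch. Everything here is a simplified stand-in: `sweep`, `train`, `predict`, and `tp_rate` are hypothetical callables supplied by the caller, the stopping rule is reduced to "stop as soon as the TP rate fails to improve", and `max_rounds` is a safety cap assumed for the example.

```python
def racog_select(chain, burn_in, lag):
    """RACOG-style selection: drop the first burn_in states, then keep every lag-th."""
    return chain[burn_in::lag]

def wracog_loop(chains, sweep, train, predict, tp_rate, minority_label, max_rounds=100):
    """wRACOG-style wrapper: add only generated samples the current model misclassifies."""
    added = []
    model = train(added)
    best = tp_rate(model)
    for _ in range(max_rounds):
        chains = [sweep(s) for s in chains]   # advance every Markov chain by one step
        hard = [s for s in chains if predict(model, s) != minority_label]
        candidate = train(added + hard)       # retrain with the misclassified samples
        score = tp_rate(candidate)
        if score <= best:                     # no further improvement: stop
            break
        added, model, best = added + hard, candidate, score
    return added, model

# Toy stand-ins (purely illustrative): states are integers, the "model" is the set
# of memorized samples, and the TP rate saturates once four samples are stored.
selected = racog_select(list(range(10)), burn_in=4, lag=2)
added, model = wracog_loop(
    chains=[0, 10],
    sweep=lambda s: s + 1,
    train=lambda extra: set(extra),
    predict=lambda m, s: 1 if s in m else 0,
    tp_rate=lambda m: min(1.0, len(m) / 4),
    minority_label=1,
)
```

In the toy run the wrapper stops on the third round, when two more samples no longer raise the saturated TP rate, whereas the RACOG-style selection keeps states purely by position in the chain.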

Experimental Setup

Datasets

• prompting
• abalone
• car
• nursery
• letter
• connect-4

Classifiers

• C4.5 decision tree
• SVM
• k-Nearest Neighbor
• Logistic Regression

Other Methods

• SMOTE
• SMOTEBoost
• RUSBoost

Results (Sensitivity)

Results (G-mean)
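
For reference, sensitivity (TP rate) and G-mean can be computed from confusion-matrix counts as sketched below. The counts in the example are hypothetical and are not results from the paper.

```python
import math

def sensitivity(tp, fn):
    """True positive rate: fraction of minority (positive) points correctly found."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of majority (negative) points correctly found."""
    return tn / (tn + fp)

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))

# Hypothetical counts, for illustration only.
example = g_mean(tp=30, fn=10, tn=900, fp=100)

# A classifier that labels everything as majority has zero sensitivity,
# so its G-mean is zero even though its accuracy would look high.
all_majority = g_mean(tp=0, fn=40, tn=1000, fp=0)
```

Because G-mean multiplies the two rates, it rewards classifiers that do well on both classes rather than on the majority class alone.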

Results (ROC)

New Samples Generated

Iterations of Gibbs Sampler

Conclusion

• Oversampling technique to address imbalanced classes

• Takes probability distribution of minority class into account

• Performs better than the other methods compared in the experiments (SMOTE, SMOTEBoost, RUSBoost)

