
Page 1: Split Criterions for Variable Selection Using Decision Trees

Split Criterions for Variable Selection using Decision Trees

J. Abellán, A. R. Masegosa

Department of Computer Science and A.I. University of Granada

Spain

Page 2: Split Criterions for Variable Selection Using Decision Trees

Outline 1. Introduction

2. Previous knowledge

3. Experimentation

4. Conclusions & future work

Page 3: Split Criterions for Variable Selection Using Decision Trees

Introduction Information from a data base

Attribute variables Class variable

Data Base

Calcium   Tumor   Coma      Migraine   Cancer
normal    a1      absent    absent     absent
high      a1      present   absent     present
normal    a1      absent    absent     absent
normal    a1      absent    absent     absent
high      a0      present   present    absent
......    ......  ......    ......     ......

Page 4: Split Criterions for Variable Selection Using Decision Trees

Introduction Classification tree (decision tree)

[Figure: classification tree with root node Tumor and internal node Calcium; the leaves are labelled "Classification: absent" and "Classification: present"]

Node = attribute variable; Leaf = case (value) of the class variable

SPLIT CRITERION, STOP CRITERION, 1 LEAF = 1 RULE

Page 5: Split Criterions for Variable Selection Using Decision Trees

Introduction Classification tree: classifying a new observation. Observation: (high, a1, absent, present); Variables: [Calcium, Tumor, Coma, Migraine]; Classification: Cancer present

[Figure: the observation is classified by following the tree: Tumor = a1, then Calcium = high, reaching the leaf "Classification: present"; the other branches (Tumor = a0, Calcium = normal) lead to "Classification: absent"]
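As a purely illustrative aid, the following Python sketch shows how such a tree classifies an observation; the Node class, the attribute names and the toy tree are ours, not material from the slides.

```python
# Minimal sketch of classifying a new observation with a classification tree.
class Node:
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute      # attribute tested at this node (None for a leaf)
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # class value stored at a leaf

def classify(node, observation):
    """Follow the branches matching the observation until a leaf is reached."""
    while node.attribute is not None:
        node = node.children[observation[node.attribute]]
    return node.label

# Toy tree mirroring the slide: the root tests Tumor, then Calcium.
tree = Node("Tumor", {
    "a0": Node(label="absent"),
    "a1": Node("Calcium", {
        "normal": Node(label="absent"),
        "high":   Node(label="present"),
    }),
})

obs = {"Calcium": "high", "Tumor": "a1", "Coma": "absent", "Migraine": "present"}
print(classify(tree, obs))   # -> "present"
```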

Page 6: Split Criterions for Variable Selection Using Decision Trees

Introduction Main problems for classifiers

Redundant attribute variables Irrelevant attribute variables Excessive number of variables

Variable Selection Methods Filter methods Wrapper methods (classifier dependency)

Mark A. Hall and G. Holmes, Benchmarking Attribute Selection Techniques for Discrete Class Data Mining, IEEE TKDE (2003)

Page 7: Split Criterions for Variable Selection Using Decision Trees

Introduction Variable selection with classification trees

[Figure: decision tree with root Xa, second level Xd, Xc, Xb and lower levels Xe, Xf, Xg, Xh, Xi, Xj, Xk]

From the full set {Xa, Xb, ..., Xk}, the first levels retain the subset {Xa, Xb, Xc, Xd}.

FIRST LEVELS = MOST SIGNIFICANT VARIABLES
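The idea of keeping only the variables tested in the first levels can be sketched as follows; the nested-tuple tree representation and the helper name are hypothetical, not the authors' code.

```python
# Minimal sketch: collect the attribute variables tested in the first levels of a tree.
def variables_in_first_levels(tree, max_depth):
    """A node is ("node", attribute, {value: subtree}); a leaf is ("leaf", class_value)."""
    selected, frontier = set(), [(tree, 1)]
    while frontier:
        node, depth = frontier.pop()
        if node[0] == "leaf" or depth > max_depth:
            continue
        _, attribute, children = node
        selected.add(attribute)
        frontier.extend((child, depth + 1) for child in children.values())
    return selected

# Toy tree with root Xa and Xb, Xc on the second level, Xd deeper down.
tree = ("node", "Xa", {
    0: ("node", "Xb", {0: ("leaf", "c0"), 1: ("leaf", "c1")}),
    1: ("node", "Xc", {0: ("leaf", "c0"),
                       1: ("node", "Xd", {0: ("leaf", "c1"), 1: ("leaf", "c0")})}),
})

print(variables_in_first_levels(tree, max_depth=2))   # {'Xa', 'Xb', 'Xc'}
```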

Page 8: Split Criterions for Variable Selection Using Decision Trees

Introduction Variable selection with classification trees

[Figure: from the DB, m training sets are formed; a decision tree is built on each, yielding the variable subsets SET1, SET2, ..., SETm, whose union SET1 U ... U SETm gives the final selection]

Page 9: Split Criterions for Variable Selection Using Decision Trees

Introduction Variable selection with classification trees

[Figure: the same scheme as before, now also producing an INFORMATIVE ORDER FOR THE ROOT NODE (Abellán & Masegosa, 2007); the final selection is again SET1 U ... U SETm]
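A rough sketch of this composite scheme, under our reading of the figure: the training data are split into m parts, one tree is grown per part, and the per-tree selections are united. `build_tree` and `variables_in_first_levels` are assumed helpers (e.g. the earlier sketch), not code from the paper.

```python
import random

def composite_selection(training_set, m, levels, build_tree, variables_in_first_levels):
    """Union of the variables found in the first levels of m trees,
    each grown on a different part of the training data."""
    rows = list(training_set)
    random.shuffle(rows)
    subsets = [rows[i::m] for i in range(m)]               # m disjoint parts
    selected = set()
    for subset in subsets:
        tree = build_tree(subset)                          # one decision tree per part -> SETi
        selected |= variables_in_first_levels(tree, levels)
    return selected                                        # SET1 U ... U SETm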

Page 10: Split Criterions for Variable Selection Using Decision Trees

Introduction Approach of the work presented: establish the most suitable split criterion for building the decision trees used as the base of the composite VARIABLE SELECTION methods described above.

The variables in the first levels of a single decision tree are extracted.

The performance of these variables is evaluated with a Naive Bayes classifier (a sketch of this pipeline follows below).

We carry out EXPERIMENTS on a large set of databases using well-known split criteria (InfoGain, IGRatio and GiniIndex) and another one based on imprecise probabilities (Abellán & Moral, 2003), the Imprecise InfoGain.
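The pipeline can be illustrated with scikit-learn as a stand-in: grow a shallow tree with a given split criterion, keep the features it actually tests, and score a Naive Bayes classifier restricted to them. scikit-learn offers entropy- and Gini-based criteria but not Imprecise Info-Gain, and the dataset and Gaussian NB used here are only placeholders for the discrete databases of the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Shallow tree (3 levels) with an entropy-based split criterion.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0).fit(X, y)
used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])   # indices of tested features

# Naive Bayes before and after restricting to the selected features.
baseline = cross_val_score(GaussianNB(), X, y, cv=10).mean()
selected = cross_val_score(GaussianNB(), X[:, used], y, cv=10).mean()
print(f"features kept: {used}, NB accuracy {baseline:.3f} -> {selected:.3f}")
```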

Page 11: Split Criterions for Variable Selection Using Decision Trees

Outline 1. Introduction

2. Previous knowledge

3. Experimentation

4. Conclusions & future work

Page 12: Split Criterions for Variable Selection Using Decision Trees

Previous knowledge Naive Bayes (Duda & Hart, 1973)

Attribute variables {Xi | i = 1, ..., r}; class variable C with states in {c1, ..., ck}.

Select the state of C: arg max_ci P(ci | X).

Assuming the attributes are independent given the class variable:

arg max_ci ( P(ci) ∏_{j=1..r} P(xj | ci) )

[Graphical structure: C is the parent of X1, X2, ..., Xr]
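A minimal sketch of this decision rule for discrete attributes; the counting scheme and the Laplace-style smoothing are illustrative choices, not the exact estimator used in the experiments.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate class priors and per-attribute conditional counts from data."""
    n = len(labels)
    prior = {c: k / n for c, k in Counter(labels).items()}
    cond = defaultdict(Counter)                 # (attribute index, class) -> value counts
    for x, c in zip(rows, labels):
        for j, v in enumerate(x):
            cond[(j, c)][v] += 1
    return prior, cond

def predict_nb(x, prior, cond):
    """Pick the class maximizing P(c) * prod_j P(x_j | c), with simple smoothing."""
    def score(c):
        s = prior[c]
        for j, v in enumerate(x):
            counts = cond[(j, c)]
            s *= (counts[v] + 1) / (sum(counts.values()) + len(counts) + 1)
        return s
    return max(prior, key=score)

rows   = [("normal", "a1"), ("high", "a1"), ("normal", "a1"), ("high", "a0")]
labels = ["absent", "present", "absent", "absent"]
prior, cond = train_nb(rows, labels)
print(predict_nb(("high", "a1"), prior, cond))
```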

Page 13: Split Criterions for Variable Selection Using Decision Trees

Previous knowledge Split criteria for decision trees: Info-Gain (Quinlan, 1986)

Selects the attribute variable with the highest positive value of IG(Xi, C) = H(C) - H(C | Xi)

H(C) = -∑_j P(cj) log P(cj)   (SHANNON ENTROPY)

H(C | Xi) = -∑_t P(xi^t) ∑_j P(cj | xi^t) log P(cj | xi^t)

ID3: works only with discrete databases; tends to select variables with a large number of cases.
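A small sketch of the Info-Gain computation from empirical frequencies (the helper names are ours):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())

def info_gain(x_column, classes):
    """IG(Xi, C) = H(C) - H(C | Xi) estimated from a column and the class labels."""
    n = len(classes)
    cond = 0.0
    for value, k in Counter(x_column).items():
        subset = [c for xv, c in zip(x_column, classes) if xv == value]
        cond += (k / n) * entropy(subset)
    return entropy(classes) - cond

calcium = ["normal", "high", "normal", "normal", "high"]
cancer  = ["absent", "present", "absent", "absent", "absent"]
print(round(info_gain(calcium, cancer), 3))
```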

Page 14: Split Criterions for Variable Selection Using Decision Trees

Previous knowledge Split criteria for decision trees: Info-Gain Ratio (Quinlan, 1993)

Selects the attribute variable with the highest positive value of IGR(Xi, C) = IG(Xi, C) / H(Xi)

C4.5: works with continuous databases; has a posterior pruning process; penalizes variables with a higher number of cases.
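Info-Gain Ratio only adds a division by the split information of the attribute; this fragment reuses the `entropy` and `info_gain` helpers from the previous sketch (again hypothetical helpers, not the C4.5 code):

```python
def info_gain_ratio(x_column, classes):
    """IGR(Xi, C) = IG(Xi, C) / H(Xi); returns 0 for a constant attribute."""
    h_x = entropy(x_column)                 # split information of the attribute
    return info_gain(x_column, classes) / h_x if h_x > 0 else 0.0
```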

Page 15: Split Criterions for Variable Selection Using Decision Trees

Previous knowledge Split criteria for decision trees: Gini index (Breiman et al., 1984)

Selects the attribute variable with the highest positive value of GIx(Xi, C) = gini(C) - gini(C | Xi)

gini(C) = 1 - ∑_j P(cj)²

gini(C | Xi) = ∑_t P(xi^t) gini(C | xi^t)

GINI INDEX: quantifies the impurity degree of a partition (a "pure partition" has values in only one case of C).
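A sketch of the Gini-based criterion with the convention used above (impurity reduction, gini(C) = 1 - ∑ P(cj)²); the helper names and toy data are ours:

```python
from collections import Counter

def gini(values):
    """Gini impurity of the empirical distribution of `values`."""
    n = len(values)
    return 1.0 - sum((k / n) ** 2 for k in Counter(values).values())

def gini_gain(x_column, classes):
    """Reduction in Gini impurity obtained by splitting on the attribute column."""
    n = len(classes)
    cond = sum((k / n) * gini([c for xv, c in zip(x_column, classes) if xv == value])
               for value, k in Counter(x_column).items())
    return gini(classes) - cond

calcium = ["normal", "high", "normal", "normal", "high"]
cancer  = ["absent", "present", "absent", "absent", "absent"]
print(round(gini_gain(calcium, cancer), 3))
```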

Page 16: Split Criterions for Variable Selection Using Decision Trees

Previous knowledge Split criteria for decision trees: Imprecise Info-Gain (Abellán & Moral, 2003)

Representing the information from a database with the Imprecise Dirichlet Model (IDM). Probability estimation, where n_cj is the frequency of cj, N the sample size and s the IDM parameter:

P(cj) ∈ I_cj = [ n_cj / (N + s), (n_cj + s) / (N + s) ]

Credal sets:

K(C) = { q | q(cj) ∈ I_cj }

K(C | Xi = xi) = { q | q(cj) ∈ I_cj^{xi} }
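The IDM intervals themselves are easy to compute; a small sketch (the helper name is ours):

```python
from collections import Counter

def idm_intervals(classes, s=1.0):
    """IDM probability interval [n_j/(N+s), (n_j+s)/(N+s)] for each observed class value."""
    counts = Counter(classes)
    N = len(classes)
    return {c: (n / (N + s), (n + s) / (N + s)) for c, n in counts.items()}

cancer = ["absent", "present", "absent", "absent", "absent"]
print(idm_intervals(cancer, s=1.0))   # e.g. 'absent' -> (4/6, 5/6), 'present' -> (1/6, 2/6)
```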

Page 17: Split Criterions for Variable Selection Using Decision Trees

Previous knowledge Split criteria for decision trees: Imprecise Info-Gain (Abellán & Moral, 2003)

Selects the attribute variable with the highest positive value of:

IIG(Xi, C) = S(K(C)) - ∑_t P(xi^t) S(K(C | Xi = xi^t))

with S the maximum entropy of a credal set.

Maximum entropy is a global uncertainty measure ⊃ conflict & non-specificity: conflict favours ramification, while non-specificity tries to reduce it.
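A hedged sketch of IIG: the maximum entropy over the IDM credal set is computed with the simple rule that is valid for s = 1 (share the extra mass among the least frequent class values), following the description in Abellán & Moral (2003) but implemented by us; the branch weights P(xi^t) are taken as plain relative frequencies.

```python
from collections import Counter
from math import log2

def max_entropy_idm(subset, class_values, s=1.0):
    """Maximum entropy over the IDM credal set (s = 1 rule:
    the extra mass s is shared among the class values with minimal count)."""
    counts = {c: 0 for c in class_values}
    for c in subset:
        counts[c] += 1
    N, m = len(subset), min(counts.values())
    ties = sum(1 for n in counts.values() if n == m)
    probs = [(n + (s / ties if n == m else 0.0)) / (N + s) for n in counts.values()]
    return -sum(p * log2(p) for p in probs if p > 0)

def imprecise_info_gain(x_column, classes, s=1.0):
    """IIG(Xi, C) = S(K(C)) - sum_t P(xi^t) S(K(C | Xi = xi^t))."""
    values, n = set(classes), len(classes)
    cond = sum((k / n) * max_entropy_idm([c for xv, c in zip(x_column, classes) if xv == value],
                                         values, s)
               for value, k in Counter(x_column).items())
    return max_entropy_idm(classes, values, s) - cond

calcium = ["normal", "high", "normal", "normal", "high"]
cancer  = ["absent", "present", "absent", "absent", "absent"]
print(round(imprecise_info_gain(calcium, cancer), 3))
```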

Page 18: Split Criterions for Variable Selection Using Decision Trees

Outline 1. Introduction

2. Previous knowledge

3. Experimentation

4. Conclusions & future work

Page 19: Split Criterions for Variable Selection Using Decision Trees

Experimentation Databases

Preprocessing:

- Missing values filled in (average & mode)

- Discretization of continuous values (a small preprocessing sketch follows at the end of this slide)

Application of the selection methods

Application of NB on the original DBs with the set of selected variables

• Percentage of correct classifications of NB before and after the selection process
• Number of variables selected
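A hedged sketch of the preprocessing step with scikit-learn as a stand-in; the slide only states "average & mode" and discretization, so the mean imputation and equal-width binning below are assumptions (nominal columns would use strategy="most_frequent" instead).

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer

# Toy numeric data with missing entries.
X_numeric = np.array([[1.0, 10.0], [2.0, np.nan], [np.nan, 30.0], [4.0, 40.0]])

X_filled = SimpleImputer(strategy="mean").fit_transform(X_numeric)       # fill with column mean
X_binned = KBinsDiscretizer(n_bins=3, encode="ordinal",
                            strategy="uniform").fit_transform(X_filled)  # equal-width bins
print(X_binned)
```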

Page 20: Split Criterions for Variable Selection Using Decision Trees

Experimentation Results with 3 levels. Correct classifications

NB comparison:

Accumulated Comparison:

Evaluation: 10-fold cross-validation repeated 10 times; corrected paired t-test at a 5% significance level (a sketch of this test follows below).
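We assume "corrected paired t-test" refers to the corrected resampled t-test of Nadeau & Bengio (2003), as implemented for instance in Weka; a sketch under that assumption (for 10-fold CV, n_test / n_train = 1/9):

```python
from statistics import mean, variance
from scipy import stats

def corrected_paired_ttest(scores_a, scores_b, n_train, n_test):
    """Corrected resampled paired t-test over per-fold accuracies of two classifiers
    evaluated on the same folds (e.g. 10x10 CV -> 100 paired scores)."""
    d = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(d)
    var_d = variance(d)
    if var_d == 0:
        return 0.0, 1.0
    t = mean(d) / ((1 / k + n_test / n_train) * var_d) ** 0.5   # variance correction factor
    p = 2 * stats.t.sf(abs(t), df=k - 1)
    return t, p
```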

Page 21: Split Criterions for Variable Selection Using Decision Trees

Experimentation Results with 3 levels. Number of variables

Accumulated Comparison:

Page 22: Split Criterions for Variable Selection Using Decision Trees

Experimentation Results with 4 levels. Comparison of correct classifications:

Comparison of the number of variables:

Page 23: Split Criterions for Variable Selection Using Decision Trees

Experimentation Results analysis 1. Using only one tree, all the procedures obtain good results with a small number of variables.

2. The improvement from 3 to 4 levels is not very significant, except for IGR.

3. IGR excessively penalizes variables with a high number of cases (Audiology, Optdigits, ...).

4. Using 3 levels, IIG obtains better results than the other criteria, and its advantage grows with 4 levels.

Page 24: Split Criterions for Variable Selection Using Decision Trees

Outline 1. Introduction

2. Previous knowledge

3. Experimentation

4. Conclusions & future work

Page 25: Split Criterions for Variable Selection Using Decision Trees

Conclusions & future work Experiments over 27 DBs show IIG to be the best-performing split criterion, considering the trade-off between accuracy and number of selected variables.

Apply the IIG criterion, and other criteria based on Bayesian scores, to the composite methods explained in the introduction.

Study the use of combined criteria, i.e. using one criterion or another depending on the characteristics of the DB (size, number of variables, number of cases, etc.) and on the level of the tree we are at.