Entropy-based & ChiMerge Data Discretization Feb. 12, 2008 Team #4: Seunghyun Kim Craig Dunham Suryo Muljono Albert Lee

TRANSCRIPT

Page 1

Entropy-based & ChiMerge Data Discretization

Feb. 12, 2008

Team #4: Seunghyun Kim, Craig Dunham, Suryo Muljono, Albert Lee

Page 2

Entropy-based discretization

• Table 6.1 Class-labeled training tuples from the AllElectronics customer database (page 299).

RID  age          income  student  credit_rating  Class: buy_computer
1    Youth        High    No       Fair           No
2    Youth        High    No       Excellent      No
3    Middle_aged  High    No       Fair           Yes
4    Senior       Medium  No       Fair           Yes
5    Senior       Low     Yes      Fair           Yes
6    Senior       Low     Yes      Excellent      No
7    Middle_aged  Low     Yes      Excellent      Yes
8    Youth        Medium  No       Fair           No
9    Youth        Low     Yes      Fair           Yes
10   Senior       Medium  Yes      Fair           Yes
11   Youth        Medium  Yes      Excellent      Yes
12   Middle_aged  Medium  No       Excellent      Yes
13   Middle_aged  High    Yes      Fair           Yes
14   Senior       Medium  No       Excellent      No

Page 3

Entropy-based (Cont’d)

• Information gain

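• $\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$ bits (9 tuples of class yes, 5 of class no)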

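• $\mathrm{Info}_{\text{age}}(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j) = \frac{5}{14}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) + \frac{4}{14}\left(-\frac{4}{4}\log_2\frac{4}{4}\right) + \frac{5}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) = 0.694$ bits, where the partitions $D_j$ are youth (2 yes, 3 no), middle_aged (4 yes) and senior (3 yes, 2 no)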
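Both quantities can be checked with a short Python sketch. It assumes Table 6.1 is encoded as a list of tuples with the class label last; the helper names `info` and `info_attr` are hypothetical, chosen only for this example.

```python
from collections import Counter
from math import log2

# Table 6.1 (AllElectronics), one tuple per RID:
# (age, income, student, credit_rating, buy_computer)
DATA = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]
AGE, CLASS = 0, 4  # column positions used below


def info(labels):
    """Info(D): expected information (entropy), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_attr(rows, col):
    """Info_A(D): entropy of each partition D_j on column `col`, weighted by |D_j|/|D|."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[col], []).append(row[CLASS])
    return sum(len(p) / len(rows) * info(p) for p in partitions.values())


print(f"Info(D)     = {info([r[CLASS] for r in DATA]):.3f} bits")  # 0.940
print(f"Info_age(D) = {info_attr(DATA, AGE):.3f} bits")           # 0.694
```

Running it prints 0.940 bits for Info(D) and 0.694 bits for Info_age(D).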

Page 4

Entropy-based (Cont’d)

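• $\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$

• $\mathrm{Gain}(\text{age}) = \mathrm{Info}(D) - \mathrm{Info}_{\text{age}}(D) = 0.940 - 0.694 = 0.246$ bits

• $\mathrm{Gain}(\text{income}) = \mathrm{Info}(D) - \mathrm{Info}_{\text{income}}(D) = 0.940 - 0.911 = 0.029$ bits

• $\mathrm{Gain}(\text{student}) = \mathrm{Info}(D) - \mathrm{Info}_{\text{student}}(D) = 0.940 - 0.788 = 0.152$ bits

• $\mathrm{Gain}(\text{credit}) = \mathrm{Info}(D) - \mathrm{Info}_{\text{credit}}(D) = 0.940 - 0.892 = 0.048$ bits

• Age has the highest information gain, so it is selected as the splitting attribute at the root of the tree.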
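The same computation applies to every attribute. A minimal Python sketch, again assuming the table is encoded as tuples with the class label last (`gain` and `ATTRIBUTES` are illustrative names, not from the slides), ranks all four attributes and picks the one with the largest gain:

```python
from collections import Counter
from math import log2

ATTRIBUTES = ("age", "income", "student", "credit_rating")
# Table 6.1 rows: the four attribute values followed by the class label (buy_computer)
DATA = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def info(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    """Gain(A) = Info(D) - Info_A(D) for the attribute stored in column `col`."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[col], []).append(row[-1])
    info_a = sum(len(p) / len(rows) * info(p) for p in partitions.values())
    return info([r[-1] for r in rows]) - info_a


gains = {name: gain(DATA, i) for i, name in enumerate(ATTRIBUTES)}
for name, g in gains.items():
    print(f"Gain({name}) = {g:.3f} bits")
print("splitting attribute:", max(gains, key=gains.get))
```

Because nothing is rounded until printing, Gain(age) prints as 0.247 rather than 0.246 (0.246 is the difference of the already-rounded 0.940 and 0.694); the ranking is the same and age is still selected.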

Page 5

Entropy-based (Cont’d)

[Figure: AllElectronics customer database split on Age? into the branches Senior, Middle_aged and Youth]

Page 6

Entropy-based (Cont’d)

[Figure: decision tree for the AllElectronics customer database. Age? splits into Youth, Middle_aged and Senior. Youth → Student? (student: yes; non-student: no). Middle_aged → yes. Senior → Credit? (fair: yes; excellent: no).]
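Applying the same gain-based choice recursively to each partition yields the tree on this slide. The sketch below is a plain ID3-style induction, assuming all attributes are categorical and that no pruning is done; `build_tree` and `show` are hypothetical helpers written for this example.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ("age", "income", "student", "credit_rating")
# Table 6.1 rows: the four attribute values followed by the class label (buy_computer)
DATA = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def info(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    """Information gain of splitting `rows` on column `col`."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[col], []).append(row[-1])
    info_a = sum(len(p) / len(rows) * info(p) for p in partitions.values())
    return info([r[-1] for r in rows]) - info_a


def build_tree(rows, cols):
    """Greedy ID3-style induction: split on the column with the highest gain."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:   # pure partition: make a leaf
        return labels[0]
    if not cols:                # no attributes left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))
    remaining = [c for c in cols if c != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (ATTRIBUTES[best], branches)


def show(tree, indent=""):
    """Print the tree as indented text, one branch per line."""
    if isinstance(tree, str):
        print(f"{indent}-> {tree}")
        return
    name, branches = tree
    for value, subtree in branches.items():
        print(f"{indent}{name} = {value}")
        show(subtree, indent + "    ")


tree = build_tree(DATA, list(range(len(ATTRIBUTES))))
show(tree)
```

For this data the sketch puts age at the root, student under youth, credit_rating under senior, and a pure yes leaf under middle_aged.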