Fuzzy Interpretation of Discretized Intervals Author: Dr. Xindong Wu IEEE TRANSACTIONS ON FUZZY SYSTEM VOL. 7, NO. 6, DECEMBER 1999 Presented by: Gong Chen


Fuzzy Interpretation of Discretized Intervals

Author: Dr. Xindong Wu

IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 7, NO. 6, DECEMBER 1999

Presented by: Gong Chen

Outline

• Concepts Review
• Overview
• Problem
• Solution
• Related Techniques
• Algorithms Design in HCV
• Experimental Results
• Conclusions
• Answers for Final Exam

Concepts Review

• Induction: generalize rules from training data
• Deduction: apply the generalized rules to testing data
• Three possible results of deduction:

– Single match
– No match
– Multiple match

Concepts Review

• Discretization of Continuous domains

– Continuous numerical domains can be discretized into intervals

– The discretized intervals can be treated as nominal values


Concepts Review

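• Using the Information Gain Heuristic (IGH) for discretization (employed by HCV):

– Candidate cut points: $x = (x_i + x_{i+1})/2$ for $i = 1, \dots, n-1$, where $x_1, \dots, x_n$ are the sorted values of the attribute
– $x$ is a possible cut point if $x_i$ and $x_{i+1}$ are of different classes
– Use IGH to find the best $x$
– Recursively split on the left and on the right
– Stop the recursive splitting when some stopping criterion is met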
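The first three steps above can be sketched as follows. This is a minimal illustration, not HCV's actual code: the function names are hypothetical, entropy-based information gain is assumed as the IGH, and the recursive splitting and the stopping criterion are left out.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def candidate_cut_points(values, labels):
    """Midpoints between adjacent sorted values whose classes differ."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []
    for (x_i, c_i), (x_next, c_next) in zip(pairs, pairs[1:]):
        if c_i != c_next and x_i != x_next:
            cuts.append((x_i + x_next) / 2)
    return cuts


def information_gain(values, labels, cut):
    """Entropy reduction obtained by splitting the examples at `cut`."""
    left = [c for x, c in zip(values, labels) if x <= cut]
    right = [c for x, c in zip(values, labels) if x > cut]
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def best_cut_point(values, labels):
    """Candidate cut point with the highest information gain, or None."""
    cuts = candidate_cut_points(values, labels)
    if not cuts:
        return None
    return max(cuts, key=lambda cut: information_gain(values, labels, cut))


# Hypothetical toy data: six ages with two classes.
ages = [18, 25, 35, 49, 50, 60]
classes = ["young", "young", "young", "old", "old", "old"]
best = best_cut_point(ages, classes)
```

On this toy data the only class change lies between 35 and 49, so the single candidate is their midpoint. A full discretizer would call `best_cut_point` again on each side until its stopping criterion fires.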


Overview

[Flow diagram: Training Data → Discretization → Induction → Rules; Testing Data → Deduction → No match / Single match / Multiple match; Fuzzy Borders]


Problem

• Discretization of continuous domains does not always yield an accurate interpretation!

• Recall that Information Gain is a heuristic measure applied to the training data, so the cut points it selects cannot be guaranteed to fit “data in the real world”.

• Example

Problem

[Figure: the age axis discretized by two different heuristics, with the test value 49.49 marked on each]

• Heuristic 1 (e.g. Information Gain): ticks at 18, 35 and 49; intervals labeled young and old

• Heuristic 2 (e.g. Gain Ratio): ticks at 18, 35 and 50; intervals labeled young and old

Problem

• Suppose that after induction we get just one rule:

• If (age = old) then Class = MORE_EXPERIENCE

According to Heuristic 2,

Instance(age = 49.49) → No match!
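The example can be replayed with a small sharp-border matcher. This is a sketch under stated assumptions: the interval borders are read off the figure's ticks (old starting at 49 or 50, with no upper bound), the function names are hypothetical, and "multiple match" is simplified to "more than one rule fired".

```python
def matching_rules(instance, rules, intervals):
    """Return the rules whose every condition holds under sharp borders."""
    matched = []
    for conditions, label in rules:
        ok = True
        for attribute, interval_name in conditions.items():
            low, high = intervals[attribute][interval_name]
            if not (low <= instance[attribute] < high):
                ok = False
                break
        if ok:
            matched.append((conditions, label))
    return matched


def match_case(matched):
    """Name the deduction case for a list of matched rules."""
    if not matched:
        return "no match"
    return "single match" if len(matched) == 1 else "multiple match"


# The single induced rule from the slide.
rules = [({"age": "old"}, "MORE_EXPERIENCE")]
instance = {"age": 49.49}

# Assumed borders: "old" starts at 49 under Heuristic 1 and at 50 under Heuristic 2.
heuristic_1 = {"age": {"old": (49.0, float("inf"))}}
heuristic_2 = {"age": {"old": (50.0, float("inf"))}}

case_1 = match_case(matching_rules(instance, rules, heuristic_1))
case_2 = match_case(matching_rules(instance, rules, heuristic_2))
```

The same instance and the same rule give different outcomes purely because the border moved by one year, which is the sensitivity the slide is pointing at.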


Solution

• A safer way to describe age = 49.49 is to say: to some degree it is young; to some degree it is old.

• This replaces a single assertion that it is definitely young or definitely old.

• Thus, to some degree, the instance can obtain a rule and a classification result instead of a no match.
– No match → single match or multiple match with some degree

• This is the so-called fuzzy match!


Solution

• “Fuzziness is a type of deterministic uncertainty. It describes the event class ambiguity.”

• “Fuzziness works when there are the outcomes that belong to several event classes at the same time but to different degrees.”

• “Fuzziness measures the degree to which an event occurs.”

– Jim Bezdek, Didier Dubois, Bart Kosko, Henri Prade


Solution

• “To some degree”?
– A membership function describes the “degree”
– A membership function tells you to what degree an event belongs to one class
– A membership function calculates this degree

• Three widely used membership functions are employed by HCV:
– Linear
– Polynomial
– Arctan

Solution

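• Linear membership function

[Figure: an interval of length $l$ between $x_{left}$ and $x_{right}$, with a spread of $sl$ around each border]

$k = \dfrac{1}{2sl}$; $a = -k\,x_{left} + \tfrac{1}{2}$; $b = k\,x_{right} + \tfrac{1}{2}$

$\mathrm{lin}_{left}(x) = kx + a$

$\mathrm{lin}_{right}(x) = -kx + b$

$\mathrm{lin}(x) = \max(0, \min(1, \mathrm{lin}_{left}(x), \mathrm{lin}_{right}(x)))$

$s$ is a user-specified parameter, e.g. $s = 0.1$ indicates that the interval spreads out into adjacent intervals for 10% of its original length at each end.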
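The linear membership function translates almost directly into code. The sketch below follows the formulas on this slide; the function name and the example interval [50, 60] for "old" are hypothetical, and it assumes $s > 0$ and $x_{right} > x_{left}$.

```python
def linear_membership(x, x_left, x_right, s=0.1):
    """Degree in [0, 1] to which x belongs to the interval [x_left, x_right]."""
    length = x_right - x_left       # l: original interval length
    k = 1.0 / (2.0 * s * length)    # k = 1 / (2 s l)
    a = -k * x_left + 0.5
    b = k * x_right + 0.5
    lin_left = k * x + a            # rises through 1/2 at the left border
    lin_right = -k * x + b          # falls through 1/2 at the right border
    return max(0.0, min(1.0, lin_left, lin_right))


# Hypothetical interval for "old": [50, 60] with s = 0.1, so the spread s*l is 1.
degree_at_49_49 = linear_membership(49.49, 50.0, 60.0)
degree_at_border = linear_membership(50.0, 50.0, 60.0)
degree_inside = linear_membership(55.0, 50.0, 60.0)
degree_far_left = linear_membership(48.0, 50.0, 60.0)
```

With these numbers an age of 49.49 belongs to "old" to a degree of about 0.245, exactly 0.5 on the border itself, 1 well inside the interval, and 0 once it is more than $sl$ outside.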


Solution

• Polynomial membership function: uses a smoother curve instead of a linear function.

• Arctan membership function

• Experimental results show no significant difference between the three kinds of functions, so the polynomial membership function is chosen.

Solution

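$\mathrm{poly}_{side}(x) = a_{side}x^3 + b_{side}x^2 + c_{side}x + d_{side}$, $side \in \{left, right\}$

$a_{side} = \dfrac{1}{4(ls)^3}$

$b_{side} = -3a_{side}x_{side}$

$c_{side} = 3a_{side}\left(x_{side}^2 - (ls)^2\right)$

$d_{side} = -a_{side}\left(x_{side}^3 - 3x_{side}(ls)^2 + 2(ls)^3\right)$

$$
\mathrm{poly}(x) =
\begin{cases}
\mathrm{poly}_{left}(x), & \text{if } x_{left} - ls \le x \le x_{left} + ls \\
\mathrm{poly}_{right}(x), & \text{if } x_{right} - ls \le x \le x_{right} + ls \\
1, & \text{if } x_{left} + ls \le x \le x_{right} - ls \\
0, & \text{otherwise}
\end{cases}
$$

$\mathrm{poly}(x)$: the degree to which $x$ belongs to one interval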
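A polynomial membership function of this shape can be sketched as below. This is an illustration under assumptions rather than a transcription of the coefficients: it assumes the intended border curve is the cubic that is 0 at $x_{side} - ls$, 1/2 at $x_{side}$ and 1 at $x_{side} + ls$ with zero slope at both ends (mirrored on the right side), and it picks the signs to meet those conditions. The function name and the example interval are hypothetical, and $s < 0.5$ is assumed so the two border zones do not overlap.

```python
def poly_membership(x, x_left, x_right, s=0.1):
    """Smooth (cubic) degree in [0, 1] to which x belongs to [x_left, x_right]."""
    d = s * (x_right - x_left)  # ls: half-width of each fuzzy border zone

    def rising(u):
        # Cubic in the offset u from a border: 0 at u = -d, 1/2 at u = 0,
        # 1 at u = +d, with zero slope at u = -d and u = +d (assumed shape).
        return 0.5 + 3.0 * u / (4.0 * d) - u ** 3 / (4.0 * d ** 3)

    if x_left - d <= x <= x_left + d:
        return rising(x - x_left)           # left border zone
    if x_right - d <= x <= x_right + d:
        return 1.0 - rising(x - x_right)    # right border zone (mirrored)
    if x_left + d < x < x_right - d:
        return 1.0                          # core of the interval
    return 0.0                              # outside both border zones


# Hypothetical interval for "old": [50, 60] with s = 0.1, so ls = 1.
smooth_degree = poly_membership(49.49, 50.0, 60.0)
```

Compared with the linear version, the curve leaves 0 and reaches 1 without a kink, so values just inside a border zone get a slightly lower degree (about 0.151 at 49.49 here, against 0.245 for the linear function).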


Related Techniques

– No match
  • Largest Class
    – Assign all no match examples to the largest class, the default class

– Multiple match
  • Largest Rule
    – Assign examples to the rule that covers the largest number of examples
  • Estimate of Probability
    – Fuzzy borders can bring multiple match conflicts, so a hybrid method is desired for the whole process


Related Techniques

• Estimate of Probability

[Formulas not preserved; surviving annotations:]
– the number of examples in the training set covered by conj
– the probability that e belongs to class c_i
– Conj1 and Conj2 are two rules supporting that e belongs to c_i
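One way the estimate of probability could be computed is sketched below. Everything beyond the annotations above is an assumption: the weight of a rule is taken to be the fraction of training examples it covers, and two rules supporting the same class are combined as $p_1 + p_2 - p_1 p_2$ (the AQ15-style combination), which may differ from the exact formulas on the slide. All names and counts are hypothetical.

```python
def rule_probability(covered, total):
    """Assumed rule weight: fraction of training examples covered by the rule."""
    return covered / total


def combine(p1, p2):
    """Assumed combination of two rules supporting the same class."""
    return p1 + p2 - p1 * p2


def class_scores(matched_rules, total):
    """Score each class from (class_label, covered_count) pairs of matched rules."""
    scores = {}
    for label, covered in matched_rules:
        p = rule_probability(covered, total)
        scores[label] = combine(scores.get(label, 0.0), p)
    return scores


# Hypothetical multiple match: two rules vote for class A, one for class B,
# out of 100 training examples.
matched = [("A", 20), ("A", 10), ("B", 25)]
scores = class_scores(matched, total=100)
winner = max(scores, key=scores.get)
```

Here class A scores 0.2 + 0.1 - 0.02 = 0.28 against 0.25 for class B, so the conflict is resolved in favor of A even though B has the single largest rule.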


Algorithms Design in HCV

• HCV (large)
– No match: Largest Class
– Multiple match: Largest Rule

• HCV (fuzzy)
– No match: Fuzzy Match
– Multiple match: Fuzzy Match

• HCV (hybrid)
– No match: Fuzzy Match
– Multiple match: Estimate of Probability
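The three variants differ only in which technique handles each problematic deduction case, which can be captured as a lookup table. The sketch below is illustrative: the dictionary, the function name, and the treatment of a single match (simply applying the matched rule) are assumptions, not HCV's interface.

```python
# Technique used by each HCV variant for each problematic deduction case.
STRATEGIES = {
    "large": {"no match": "largest class", "multiple match": "largest rule"},
    "fuzzy": {"no match": "fuzzy match", "multiple match": "fuzzy match"},
    "hybrid": {"no match": "fuzzy match", "multiple match": "estimate of probability"},
}


def resolution_technique(variant, case):
    """Name the technique a variant applies to a deduction case."""
    if case == "single match":
        return "apply the matched rule"  # assumed: no conflict to resolve
    return STRATEGIES[variant][case]


hybrid_no_match = resolution_technique("hybrid", "no match")
hybrid_multiple = resolution_technique("hybrid", "multiple match")
```

Laid out this way, HCV (hybrid) is visibly the combination of the fuzzy variant's no match handling with a probabilistic resolution of multiple matches.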


Experimental Results

• Data:
– 17 datasets from the UCI Machine Learning Repository
– Why these were selected:

1) Numerical data

2) Situations where no rules clearly apply

• Test conditions
– The 68 parameters in HCV are all left at their defaults, except the deduction strategy
– The parameters for C4.5 and NewID are those recommended by their respective inventors


Experimental Results

Dataset           HCV (hybrid)  HCV (large)  HCV (fuzzy)  C4.5 (R 8)  C4.5 (R 5)  NewID
Anneal            98.00%        93.00%       93.00%       95.00%      93.00%      81.00%
Bupa              57.60%        55.90%       55.90%       71.20%      61.00%      73.00%
Cleveland 2       78.00%        68.10%       73.60%       71.40%      76.90%      67.00%
Cleveland 5       54.90%        56.00%       52.70%       51.60%      56.00%      47.30%
CRX               82.50%        72.50%       82.00%       83.00%      80.00%      79.00%
Glass (w/out ID)  72.30%        60.00%       60.00%       71.50%      64.60%      66.00%
Hungarian 2       86.30%        85.00%       85.00%       81.20%      80.00%      78.00%
Hypothyroid       97.80%        86.30%       96.30%       99.40%      99.40%      92.00%
Imports 85        62.70%        59.30%       61.00%       61.00%      67.80%      61.00%
Ionosphere        88.00%        81.20%       81.20%       86.30%      85.50%      82.00%
Labor Neg         76.50%        76.50%       76.50%       82.40%      82.40%      65.00%
Pima              73.90%        69.10%       69.10%       73.50%      75.50%      73.00%
Swiss 2           96.90%        96.90%       96.90%       96.90%      96.90%      97.00%
Swiss 5           28.10%        25.00%       28.10%       40.60%      31.20%      22.00%
Va 2              78.90%        78.90%       78.90%       77.50%      70.40%      77.00%
Va 5              28.20%        25.40%       29.60%       31.00%      26.80%      20.00%
Wine              90.40%        76.90%       76.90%       90.40%      90.00%      90.40%


Experimental Results

• Predictive accuracy
– HCV (hybrid) outperforms others in 9 datasets
– HCV (large): 3 datasets
– HCV (fuzzy): 2 datasets
– C4.5 (R 8): 7 datasets
– C4.5 (R 5): 6 datasets
– NewID: 3 datasets

– HCV (hybrid) clearly and significantly outperforms the other interpretation techniques (in HCV) for datasets with numerical data in “no match” and “multiple match” cases.

• C4.5 and NewID are included for reference, not for extensive comparison.


Conclusion

• Fuzziness is strongly domain dependent; HCV allows users to specify their own intervals and fuzzy functions.
– An important direction to take with specific domains

• The fuzzy borders design combined with probability estimation achieves better results in terms of predictive accuracy.
– Applicable to other machine learning and data mining algorithms


Answers for Final Exam Problems

• Q1: When doing deduction on real-world data, what are the three possible cases for each test example?
– Single match
– No match
– Multiple match

• Q2: Of the three cases during deduction, which ones does the HCV hybrid interpretation algorithm use fuzzy borders to classify?
– No match

• Q3: In the hybrid interpretation algorithm used in HCV,
– when are sharp borders set up?
  • “Sharp borders are set up as usual during induction”
– when are fuzzy borders defined?
  • In deduction, “only in the no match case, fuzzy borders are set up in order to find a rule which is closest to the test example in question”
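Finding "a rule which is closest to the test example" can be sketched as picking the rule with the highest membership degree. The sketch below is an assumption-laden illustration: it uses the linear membership function, takes the minimum over a rule's conditions as the rule's degree (a common fuzzy AND, not confirmed by the slides), and uses hypothetical interval borders and a hypothetical second rule.

```python
def linear_membership(x, x_left, x_right, s=0.1):
    """Linear fuzzy membership of x in [x_left, x_right] with spread s."""
    k = 1.0 / (2.0 * s * (x_right - x_left))
    lin_left = k * (x - x_left) + 0.5
    lin_right = -k * (x - x_right) + 0.5
    return max(0.0, min(1.0, lin_left, lin_right))


def fuzzy_match(instance, rules, intervals, s=0.1):
    """Return (label, degree) of the rule closest to the instance, or (None, 0.0)."""
    best_label, best_degree = None, 0.0
    for conditions, label in rules:
        # Assumed conjunction: a rule matches to the degree of its weakest condition.
        degree = min(
            linear_membership(instance[attr], *intervals[attr][name], s=s)
            for attr, name in conditions.items()
        )
        if degree > best_degree:
            best_label, best_degree = label, degree
    return best_label, best_degree


# Hypothetical sharp borders produced during induction.
intervals = {"age": {"young": (18.0, 35.0), "old": (50.0, 80.0)}}
rules = [
    ({"age": "old"}, "MORE_EXPERIENCE"),
    ({"age": "young"}, "LESS_EXPERIENCE"),  # hypothetical second rule
]
label, degree = fuzzy_match({"age": 49.49}, rules, intervals)
```

Under sharp borders age = 49.49 matches neither rule; with fuzzy borders it matches the "old" rule to a degree of about 0.415 and the "young" rule to degree 0, so the no match case resolves to MORE_EXPERIENCE.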


Thank You!