New Challenges in Learning Classifier Systems: Mining Rarities and Evolving Fuzzy Rules

Student: Albert Orriols-Puig

Supervisor: Ester Bernadó-Mansilla

Grup de Recerca en Sistemes Intel·ligents

Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Background

GRSI has been researching machine learning and data mining

Especially focused on data classification

Research aims at:

Improving learning methods

Applying learning methods to real-world applications

Application of LCS to classification problems is one of the main research lines

LCS are appealing because they mine streams of examples

Many applications make the data available in streams

Important challenges need to be addressed to deal with complex applications

Background: General schema of LCSs

Introduced by Holland

[Diagram: the environment sends the sensorial state and feedback to the learning classifier system, which returns an action]

Any representation (classifier 1, classifier 2, …, classifier n): production rules, genetic programs, perceptrons, SVMs

Online rule evaluator: apportionment of credit algorithms

XCS: Q-Learning (Sutton & Barto, 1998)

Uses Widrow-Hoff delta rule

Rule evolution: evolutionary algorithm

Typically, a GA (Holland, 75; Goldberg, 89) applied to the population

When this Work Started

In 2004, when Michigan-style LCSs were reaching maturity

First successful implementations (Wilson, 95; Wilson, 98)

Many other derivations: YCS, UCS, XCSF, and many others

Applications in important domains

Data mining (Bernadó et al., 02; Wilson, 02a; Bacardit & Butz, 04)

Function approximation (Wilson, 02b)

Reinforcement Learning (Lanzi, 02)

Theoretical analyses for design (Butz et al., 02, 03, 04b)

But still, there are important challenges to face


Two Key Challenges in ML and LCSs

1st challenge: Learning from domains that contain rare classes

Data classification: Extract interesting, useful, and hidden patterns

The most interesting knowledge resides in rare classes

Example: fraud detection in credit card transactions

Can learners model rare classes accurately? Maybe not!

[Diagram: Dataset, Learner, Knowledge model. The learner minimizes the learning error and maximizes generalization]

What about online learning?

More challenging: Model rare classes on the fly

Aim: Analyze and improve LCS for mining domains with rarities

Two Key Challenges in ML and LCSs

2nd challenge: Building more understandable models and bringing reasoning mechanisms close to human ones

In some domains, interpretability is more important than accuracy

LCSs most often use interval-based rules in domains described by continuous variables

Variables are “semantic-free”

Analyses of the inference mechanisms are scarce

Fuzzy logics provides a robust framework for knowledge representation and reasoning under uncertainty

Some fuzzy LCS approaches already exist

But no online fuzzy LCS for supervised learning has been designed

Aim: Incorporate fuzzy logics into LCS for supervised learning


Goal of this Work

General Goal: Address the two challenges with

The extended classifier system (XCS) (Wilson, 95, 98)

By far, the most influential Michigan-style LCS

The supervised classifier system (UCS) (Bernadó-Mansilla, 03)

Inherits XCS’s architecture and specializes it for data classification

Two challenges with two LCSs that lead to four objectives

XCS and UCS

1. Revise and update UCS and compare it with XCS

LCS and rare classes

2. Analyze and improve LCS for mining rarities

3. Apply LCSs for extracting models from real-world classification problems with rarities

Fuzzy logics in LCS

4. Design and implement an LCS with fuzzy logic reasoning for supervised learning


Outline

1. Description of XCS and UCS

2. Revisiting UCS: Fitness Sharing and Comparison with XCS

3. Facetwise Analysis of XCS for Imbalanced Domains

4. Carrying over the Facetwise Analysis into UCS

5. XCS and UCS in Imbalanced Real-World Classification Problems

6. Fuzzy-UCS: Evolving Fuzzy Rule Sets For Supervised Learning

7. Conclusions and Further Work


Description of XCS

In training mode for single step tasks (Wilson, 95)

[Diagram: the environment supplies a problem instance; match set generation builds the match set [M] from the population [P]; an action is selected randomly and defines the action set [A]; the reward is used to update the classifier parameters (Widrow-Hoff rule); a genetic algorithm (selection, reproduction, and mutation) and deletion act on the population]

Each classifier stores: C A P ε F num as ts exp

Designed for reinforcement learning:

Error: Error of the predicted payoff

Fitness: Computed as a function of the error

Fitness sharing

Competition in the niche
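The Widrow-Hoff rule named on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the XCS implementation: the function name `widrow_hoff`, the learning rate `beta = 0.2`, the reward values, and the order of the two updates are all choices made for the example.

```python
def widrow_hoff(estimate, target, beta=0.2):
    """Move an estimate a fraction beta of the way toward a target."""
    return estimate + beta * (target - estimate)


# Hypothetical classifier that is right 3 times out of 4
# (reward 1000 when right, 0 when wrong; values are assumptions).
prediction, error = 0.0, 0.0
for _ in range(200):
    for reward in (1000.0, 1000.0, 1000.0, 0.0):
        # Assumed order: update the prediction, then the error estimate.
        prediction = widrow_hoff(prediction, reward)
        error = widrow_hoff(error, abs(reward - prediction))
```

Because the estimate is a recency-weighted average, it keeps oscillating with the last rewards seen instead of settling on the exact mean, which is relevant when one class arrives much more often than the other.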


Description of UCS

In training mode (Bernadó-Mansilla & Garrell, 03)

[Diagram: the environment supplies a stream of examples; each problem instance and its output class drive match set generation, which builds the match set [M] from the population [P], and correct set generation, which builds the correct set [C]; the classifier parameters are updated; a genetic algorithm (selection, reproduction, and mutation) and deletion act on the population, with competition in the niche]

Each classifier stores: C A acc F num cs ts exp

Classifier parameters update: average of the parameter values

No fitness sharing

Key differences with respect to XCS

Accuracy computation as average of correct predictions

Exploration of the “correct class” instead of all classes

No fitness sharing
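The accuracy computation described above (average of correct predictions) could look like the sketch below. The class name `Classifier`, its fields, and the helper `update` are hypothetical names introduced for illustration; conditions and matching are left out, so every rule in the list is assumed to match.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    predicted_class: int
    exp: int = 0      # times the rule has been in a match set
    correct: int = 0  # times it was also in the correct set

    @property
    def acc(self) -> float:
        # Accuracy as the proportion of correct predictions.
        return self.correct / self.exp if self.exp else 0.0


def update(match_set, true_class):
    """Update the counters of all matching rules and return the correct set [C]."""
    for cl in match_set:
        cl.exp += 1
        if cl.predicted_class == true_class:
            cl.correct += 1
    return [cl for cl in match_set if cl.predicted_class == true_class]


# Two always-matching rules seeing three examples of class 0 and one of class 1.
a, b = Classifier(predicted_class=0), Classifier(predicted_class=1)
for y in (0, 0, 0, 1):
    correct_set = update([a, b], y)
```

Since the accuracy is a plain average, it equals the observed proportion exactly, with no dependence on a learning rate.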




Fitness Sharing in UCS

Sharing or not sharing, a key difference between XCS and UCS

Goal

Design a fitness sharing scheme

Empirically compare whether fitness sharing is beneficial to UCS

Empirically compare XCS with UCS

Incorporate a fitness sharing scheme into UCS

Take inspiration from XCS

[Equations not recovered. Their annotated terms: classifier accuracy, relative accuracy, classifier numerosity, learning rate]

And finally, fitness is shared in [M]
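The terms listed on this slide (classifier accuracy, relative accuracy, numerosity, learning rate) suggest an XCS-like sharing step. The sketch below is one plausible reading, not the slide's equations: the function name, the dict keys, the threshold `acc0`, and the constants `alpha`, `nu`, and `beta` are all assumptions made for the example.

```python
def shared_fitness_update(match_set, correct_ids, acc0=0.99, alpha=0.1,
                          nu=10.0, beta=0.2):
    """Update the fitness 'F' of every classifier dict in match_set in place.

    Each classifier is a dict with keys 'id', 'acc', 'num', and 'F'.
    """
    k = {}
    for cl in match_set:
        if cl["id"] not in correct_ids:
            k[cl["id"]] = 0.0   # rules outside the correct set get no share
        elif cl["acc"] > acc0:
            k[cl["id"]] = 1.0   # accurate rules get full weight
        else:
            k[cl["id"]] = alpha * (cl["acc"] / acc0) ** nu
    total = sum(k[cl["id"]] * cl["num"] for cl in match_set)
    for cl in match_set:
        # Relative accuracy: the rule's share of the weighted numerosity.
        relative = k[cl["id"]] * cl["num"] / total if total > 0 else 0.0
        cl["F"] += beta * (relative - cl["F"])


match_set = [
    {"id": 1, "acc": 1.0, "num": 3, "F": 0.0},
    {"id": 2, "acc": 1.0, "num": 1, "F": 0.0},
    {"id": 3, "acc": 0.5, "num": 4, "F": 0.0},
]
shared_fitness_update(match_set, correct_ids={1, 2})
```

In this toy match set the two accurate rules split the available fitness in proportion to their numerosity, and the rule outside the correct set receives none.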


Methodology of Analysis

Analysis divided into two comparisons

1. Compare UCS without fitness sharing (UCSns) and with fitness sharing (UCSs)

2. Compare UCSs with XCS

Comparison on four boundedly-difficult problems that permit moving the complexity along: number of classes, size of the building block, class imbalance, and proportion of noise.

The parity problem (par)

The decoder problem (dec)

The position problem (pos)

The 20-bit multiplexer with alternating noise (mux-an)


Does Fitness Sharing Benefit UCS?

Fitness sharing provides the following benefits:

Higher pressure toward deletion of over-general classifiers

Higher selective pressure toward the fittest classifiers in [C]

Better results in the four problems: par, dec, pos, and mux-an

[Figure: UCSns vs UCSs in Decoder]


Comparison of UCS with XCS

Advantages of UCS due to

The exploration regime

XCS explores all the classes while UCS explores only the “correct” class

The accuracy guidance

XCS may provide a misleading guidance toward the fittest classifiers, identified as the fitness dilemma (Butz et al., 2003)

UCS solves this problem by computing accuracy as the proportion of correct predictions

[Figure: UCSs vs XCS in Decoder]


Summary of the Comparison

The empirical study has shown that

UCS benefits from a fitness sharing scheme. Therefore, we use UCSs in the remainder of this work

Key differences between XCS and UCS reviewed and experimentally analyzed

Explore regime

Accuracy guidance

Population size

XCS is a more general architecture and can solve reinforcement learning problems











Motivation

So, do rare classes pose a challenge to XCS?

Test on the unbalanced 11-bit multiplexer

$$IR = \frac{\text{number of examples of the majority class}}{\text{number of examples of the minority class}}$$

[Figure: %[O] with XCS]
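As a small illustration of the imbalance ratio on the 11-bit multiplexer, the sketch below enumerates the full input space and then under-samples one class. The helper names and the under-sampling step (keeping 1 in 32 examples of class 1) are assumptions for the example, not the procedure used in the experiments.

```python
from collections import Counter
from itertools import product


def multiplexer11(bits):
    """11-bit multiplexer: 3 address bits select one of 8 data bits."""
    address = bits[0] * 4 + bits[1] * 2 + bits[2]
    return bits[3 + address]


def imbalance_ratio(labels):
    """Majority class count divided by minority class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# The complete input space is balanced: 1024 instances per class.
labels = [multiplexer11(b) for b in product((0, 1), repeat=11)]
balanced_ir = imbalance_ratio(labels)

# Hypothetical under-sampling: keep 1 in 32 examples of class 1.
zeros = [y for y in labels if y == 0]
ones = [y for y in labels if y == 1][::32]
unbalanced_ir = imbalance_ratio(zeros + ones)
```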


Design Decomposition

Aim

Analyze the challenges that rare classes pose to XCS

Improve XCS in problems with rare classes

Design decomposition approach (Goldberg, 02) proposes to

Decompose the problem into critical elements

Derive “little” models or facetwise models for each element, assuming that the others behave in an ideal manner

Integrate all the models (patchquilt integration)


Focusing the Problem

How should XCS partition the problem solution?

[Figure labels: nourished niche; small disjunct or starved niche; again, more small disjuncts; over-general classifier]


Critical Elements of LCS

Five critical elements to detect small niches were identified:

1. Estimate the classifier parameters correctly

2. Analyze whether representatives of starved niches can be provided in initialization

3. Ensure the generation and growth of representatives of starved niches

4. Adjust the GA application rate

5. Ensure that representatives of starved niches will take over their niches

Derivations studied according to the imbalance ratio (IR)


1. Estimate Classifier Parameters

Derive the maximum imbalance ratio

The error of over-general classifiers is: [equation not recovered]

However, empirical results did not agree with the theory

Error of the most over-general classifier tracked over time

[Figure: theoretical value vs. empirical error for ir = 100]

Deviation between theoretical and empirical error

Over-general classifiers may be considered accurate
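To see why an over-general classifier can look accurate, one can work the error out under simple assumptions; this is a hedged sketch, not necessarily the expression used on the slide. Assume the classifier receives a reward `r_max` on majority-class instances and 0 on minority-class ones, and that its prediction has converged to the average payoff. The mean absolute error is then 2 · r_max · ir / (1 + ir)². The function names, `r_max = 1000`, and the error threshold of 1 used in the comment are assumptions.

```python
def overgeneral_error(ir, r_max=1000.0):
    """Mean absolute payoff error of a rule that always predicts the majority class."""
    p_major = ir / (1.0 + ir)        # fraction of matched majority instances
    prediction = r_max * p_major     # converged payoff prediction
    return p_major * abs(r_max - prediction) + (1.0 - p_major) * prediction


def closed_form(ir, r_max=1000.0):
    """Same quantity, simplified algebraically."""
    return 2.0 * r_max * ir / (1.0 + ir) ** 2


# With a hypothetical error threshold of 1, the over-general rule is
# clearly inaccurate at ir = 100 but would pass as accurate at ir = 10000.
errors = {ir: closed_form(ir) for ir in (1, 10, 100, 10000)}
```

The error shrinks roughly as 2 · r_max / ir for large ir, so a fixed accuracy threshold is eventually crossed as the imbalance grows.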


1. Estimate Classifier Parameters

We proposed two alternatives to obtain better estimates

1. Tune the learning rate of the Widrow-Hoff rule according to ir

2. Apply gradient descent methods (Butz et al., 2005)

[Figures: estimated error against the theoretical value, for ir = 100 and ir = 10000]


2. Provide Representatives in Initialization

Can covering provide schemas of classifiers of starved niches?

Probability of activating covering in the first minority class instance

[Equation not recovered. Its annotated terms: specificity of [P], imbalance ratio, length of the classifier]

For large values of ir, covering will not provide schemas of the minority class

We continue the analysis assuming a covering failure


3. Ensure Growth of Representatives

How to size the population to ensure that representatives of starved niches will be supplied?

Assumptions:

Crossover is not considered. Only mutation (probability of mutation μ).

The time to create a representative of a starved niche is: [equation not recovered]

Random deletion

A GA is applied to [A] every time [A] is activated

Time to receive a genetic event: [equation not recovered]

Mixing all together: population size bound to ensure reproductive opportunity

[Equation not recovered. Its annotated terms: number of classes, imbalance ratio]

3. Ensure Growth of Representatives

Theory matches empirical results (parity problem)

Imbalanced parity problem with building block length from 1 to 4

Unbalanced by removing instances of one of the classes

Theory also matches when the assumptions of the model are not met

[Figures: results with the Widrow-Hoff rule, and with all assumptions satisfied]


4. Adjust GA Application Rate

Assumption in the previous model

A GA is applied to [A] every time [A] is activated

What is the effect of varying θGA?

To guarantee that all niches receive approximately the same number of genetic events: [condition not recovered]

If satisfied, all niches receive the same number of genetic opportunities

Hence, the time of deletion increases linearly with ir and the population size remains constant


5. Ensure Takeover of Representatives

The previous facets set the conditions to ensure that

1. Representatives of starved niches are created

2. Representatives of starved niches receive a genetic event

But still, to ensure full convergence we need that

Representatives of starved niches take over their niche

These representatives will not be extinguished

Study takeover time of representatives, which depends on

Initial stock of classifiers in the niche

Type of selection

Proportionate selection (Wilson, 95)

Tournament selection (Butz et al., 2005c)



Takeover time for proportionate selection

[Equation not recovered. Its annotated terms: population size, number of niches, initial proportion of classifiers, final proportion of classifiers, and the ratio of the accuracy of the over-general classifier to the accuracy of the best representative]

Condition for niche extinction

[Equation not recovered. Its annotated term: maximum acceptable error predicted by the niche extinction model]



Takeover time for tournament selection

[Equation not recovered. Its annotated terms: population size, initial proportion of classifiers, final proportion of classifiers, tournament size]

Condition for niche extinction

[Equation not recovered. Its annotated terms: number of classifiers in the niche predicted by the niche extinction model, number of representatives in the niche]

Key differences with respect to proportionate selection:

Independent of the fitness of the best and the over-general classifier

Highly dependent on the tournament size

Patchquilt Integration

Will XCS learn rare classes? Lessons learned from the models

1. Parameters need to be correctly estimated

Widrow-Hoff rule with auto-adjusted β

Gradient descent methods

2. Representatives need to be created and evolved

Covering may fail if ir is large

The challenge can be met by

Sizing the population according to the imbalance ratio

Setting θGA according to the imbalance ratio

3. Niche extinction models set the conditions under which XCS will fail

Indicate how parameters should be tuned to satisfy the model

Takeover time models to predict the time to convergence


Why Is this Analysis Important?

The lessons enable us to solve problems that previously eluded solution

Unbalanced 11-bit multiplexer problem

[Figures: %[O] with XCS before the analysis and after the analysis]

Before, we could solve up to ir=32

Now we can solve up to ir=1024 and more






Reviewing the Critical Elements

1. Estimate the classifier parameters correctly

Pure averages! We get the exact value

2. Analyze whether representatives of starved niches can be provided in initialization

Covering applied if the correct set is empty

If no mutation, covering will always be applied to the first minority class instances

Suppose the worst case: no provision

We derive maximum bounds


Reviewing the Critical Elements

3. Ensure the generation and growth of representatives of starved niches

[Figures: results against the imbalance ratio, with the default configuration and with all assumptions satisfied]

4. Adjust the GA application rate

XCS’s model is still valid

5. Ensure that representatives of starved niches will take over their niches

XCS’s takeover time models are still valid


Patchquilt Integration

The lessons enable us to solve problems that previously eluded solution

Results following the guidelines provided by the lessons

%[O] with UCS











Motivation

From boundedly-difficult problems to real-world problems (RWP)

RWP contain continuous attributes: interval-based rules

IF x1 in [l1, u1] and x2 in [l2, u2] and … and xn in [ln, un] THEN class_i

Key difference: problem characteristics are not known

Gap between theory and application to RWP

How can we apply the recommendations extracted from the analysis?

Aim

1. Start bridging the gap between theory and practice

2. Confirm that both LCSs are valuable for mining domains with rarities
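The interval-based rule format shown on this slide can be illustrated with a short matching check. The function name `matches`, the dict layout, and the interval values are hypothetical, chosen only for the example.

```python
def matches(intervals, x):
    """True if every attribute value lies inside its [lower, upper] interval."""
    return all(lower <= xi <= upper
               for (lower, upper), xi in zip(intervals, x))


# Hypothetical rule: IF x1 in [0.2, 0.6] and x2 in [0.0, 0.5] THEN class 1
rule = {"intervals": [(0.2, 0.6), (0.0, 0.5)], "cls": 1}

inside = matches(rule["intervals"], (0.4, 0.1))
outside = matches(rule["intervals"], (0.7, 0.1))
```

Each rule therefore covers an axis-parallel hyperrectangle of the input space, which is one reason the variables carry no linguistic meaning in this representation.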


What is Different in RWP

Imbalance ratio vs. niche imbalance ratio?

In boundedly-difficult problems, IR equaled the niche imbalance ratio

In RWP, this assumption may not hold

[Figure: same imbalance ratio, different niche imbalance ratio]

Niche imbalance ratio (NIR) in RWP depends on:

IR

Geometrical distribution of the examples

Knowledge representation


Self-Adaptation to Unknown Domains

Heuristic to estimate the niche imbalance ratio

Take the strongest over-general classifier

Assume NIR is the imbalance ratio of the over-general classifier

Tune parameters according to NIR and the recommendations extracted from the facetwise analysis

Empirical test on the 11-bit multiplexer problem

[Figures: %[O] with XCS and %[B] with UCS]


LCS in RWP

Comparison methodology

Comparison with:

C4.5 (Quinlan, 95)

SMO (Platt, 98)

IBk (Aha et al., 91)

Configured to maximize performance

Selection of 25 imbalanced real-world problems with different characteristics

10-fold cross validation

Performance measure: TP rate · TN rate

Statistical tests:

Friedman’s test (Friedman, 37, 40)

Nemenyi test (Nemenyi, 63)

Wilcoxon signed-ranks test (Wilcoxon, 45)

Id.     Data set          #Ins.  #At.  ir
bald1   balance disc. 1     625     4  11.76
bald2   balance disc. 2     625     4   1.17
bald3   balance disc. 3     625     4   1.17
bpa     bupa                345     6   1.38
glsd1   glass disc. 1       214     9  22.75
glsd2   glass disc. 2       214     9  15.47
glsd3   glass disc. 3       214     9  11.59
glsd4   glass disc. 4       214     9   6.38
glsd5   glass disc. 5       214     9   2.06
glsd6   glass disc. 6       214     9   1.82
h-s     heart-disease       270    13   1.25
pim     pima-indian         768     8   1.87
tao     tao-grid           1888     2   1.00
thyd1   thyroid disc. 1     215     5   6.17
thyd2   thyroid disc. 2     215     5   5.14
thyd3   thyroid disc. 3     215     5   2.31
wavd1   waveform disc. 1   5000    40   2.02
wavd2   waveform disc. 2   5000    40   1.96
wavd3   waveform disc. 3   5000    40   2.02
wbcd    Wis. B. cancer      699     9   1.90
wdbc    Wis. diag.          569    30   1.68
wined1  wine disc. 1        178    13   2.71
wined2  wine disc. 2        178    13   2.02
wined3  wine disc. 3        178    13   1.51
wpbc    Wis. prognostic     198    33   3.21
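The performance measure used here, the product of TP rate and TN rate, can be computed as in the sketch below. The function name and the toy label vectors are assumptions; the sketch also assumes a two-class problem in which both classes appear in `y_true`.

```python
def tp_tn_product(y_true, y_pred, positive=1):
    """Product of the true positive rate and the true negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return (tp / (tp + fn)) * (tn / (tn + fp))


# A learner that always predicts the majority class reaches 90% accuracy
# on this toy data but scores 0 on the product measure.
y_true = [0] * 90 + [1] * 10
always_majority = tp_tn_product(y_true, [0] * 100)

mixed = tp_tn_product([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Unlike plain accuracy, the product collapses to zero whenever one class is ignored entirely, which is why it suits imbalanced problems.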

Summary of the Results

TP rate · TN rate

XCS and UCS perform the best on average for the tested problems

However, no significant differences according to Friedman’s test

Pairwise analysis enables the extraction of further observations

XCS and UCS fail to create accurate models in problems such as bald2, bald3, and tao, which have low imbalance ratio

They have difficulty learning from domains with curved boundaries

Other complexities in addition to class imbalance

Discussion

When an ML practitioner has a new problem

Which learner should she or he apply?

The empirical analysis indicated that

She or he should bet on LCSs

But there is no guarantee of being the best performer on a particular problem

What is missing?

Evaluate problem complexity

Link problem complexity with domain of competence of LCS

How?

Complexity metrics are a good starting point (Ho & Basu, 02) to bridge the gap between theory and practice











Motivation

Competent data classification techniques should be able to

Evolve accurate models in some legible structure

LCS are very appealing since they evolve highly accurate models online

However:

They tend to evolve a large number of semantic-free interval-based rules

They use reasoning mechanisms that can be unintuitive

(Bernadó et al., 02)


Design of Fuzzy-UCS

Linguistic fuzzy representation

Rule: IF x1 is A1 and x2 is A2 … and xn is An THEN class1

Disjunction of linguistic fuzzy terms

Example: IF x1 is small and x2 is medium or large THEN class1

In our experiments, all variables shared the same semantics, which were defined by triangular membership functions

[Figure: triangular membership functions for the terms small, medium, and large]

Classifier parameters were changed to let them deal with fuzzy matching
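Fuzzy matching of a linguistic rule can be sketched as follows. Several details are assumptions made for the example rather than facts from the slides: attributes are normalized to [0, 1], the three triangles peak at 0, 0.5, and 1, the disjunction inside a variable uses a bounded sum, and the conjunction across variables uses the product.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed semantics shared by all variables, on attributes scaled to [0, 1].
TERMS = {
    "small": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "large": (0.5, 1.0, 1.5),
}


def matching_degree(rule, x):
    """Degree to which example x matches a rule given as one tuple of terms per variable."""
    degree = 1.0
    for terms, xi in zip(rule, x):
        # Disjunction of the variable's terms (bounded sum, an assumption).
        d = min(1.0, sum(triangular(xi, *TERMS[t]) for t in terms))
        degree *= d  # conjunction across variables (product, an assumption)
    return degree


# IF x1 is small and x2 is medium or large THEN class1
rule = [("small",), ("medium", "large")]
partial = matching_degree(rule, (0.25, 0.75))
none = matching_degree(rule, (0.75, 0.5))
```

A rule now matches an example to a degree between 0 and 1 instead of matching or not, which is why the classifier parameters have to be updated with weighted contributions.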


Design of Fuzzy-UCS

Three procedures designed to infer the class of test examples, which result in a tradeoff between interpretability and accuracy

Weighted average (wavg): based on average voting. All rules considered.

Action winner (awin): best rule decides the class. Only best matching rules considered.

Most numerous and fittest rules (nfit): based on average voting. Only most numerous rules considered.

Size of the rule set, from largest (+) to smallest (-): wavg, awin, nfit
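The difference between average voting and a single winning rule can be sketched as below. The vote weight (fitness times matching degree), the dict keys, and the rule values are assumptions for illustration; nfit is left out because the slides do not say how the most numerous rules are selected.

```python
from collections import defaultdict


def wavg(rules):
    """Every rule votes for its class; the class with the largest total wins."""
    votes = defaultdict(float)
    for r in rules:
        votes[r["cls"]] += r["F"] * r["mu"]
    return max(votes, key=votes.get)


def awin(rules):
    """The single rule with the strongest vote decides the class."""
    best = max(rules, key=lambda r: r["F"] * r["mu"])
    return best["cls"]


# Hypothetical rules matching one test example
# (F: fitness, mu: matching degree for that example).
rules = [
    {"cls": 0, "F": 0.9, "mu": 0.8},
    {"cls": 1, "F": 0.5, "mu": 0.9},
    {"cls": 1, "F": 0.6, "mu": 0.7},
]
voted = wavg(rules)
winner = awin(rules)
```

In this toy case the two procedures disagree: two moderately strong rules outvote the single best rule under wavg, while awin follows the best rule alone and so needs far fewer rules at prediction time.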


Methodology of Analysis

Comparison methodology

Two comparisons

Fuzzy learners

Non-fuzzy learners

Selection of 20 real-world problems

10-fold cross validation

Metrics

Test accuracy

Number of rules of the models

Statistical tests:

Friedman’s test (Friedman, 37, 40)

Nemenyi test (Nemenyi, 63)

Bonferroni-Dunn test (Dunn, 61)

Wilcoxon signed-ranks test (Wilcoxon, 45)

Id    Data set             #Ins  #At  #Cl.  %Min  %Maj  %MI
ann   Annealing             898   38     5   0.9  76.2   0.0
aut   Automobile            205   25     6   1.5  32.7  22.4
bal   Balance               625    4     3   7.8  46.1   0.0
bpa   Bupa                  345    6     2  42.0  58.0   0.0
cmc   Contrac. choice      1473    9     3  22.6  42.7   0.0
col   Horse colic           368   22     2  37.0  63.0  98.1
gls   Glass                 214    9     6   4.2  35.5   0.0
h-c   Heart-c               303   13     2  45.5  54.5   2.3
h-s   Heart-s               270   13     2  44.4  56.6   0.0
irs   Iris                  150    4     3  33.3  33.3   0.0
pim   Pima                  768    8     2  34.9  65.1   0.0
son   Sonar                 208   60     2  46.7  53.3   0.0
tao   Tao                  1888    2     2  50.0  50.0   0.0
thy   Thyroid               215    5     3  14.0  60.0   0.0
veh   Vehicle               846   18     4  23.5  25.8   0.0
wbcd  Wisc. breast-cancer   699    9     2  34.5  65.5   2.3
wdbc  Wisc. Diagnosis       569   30     2  37.3  62.7   0.0
wne   Wine                  178   13     3  27.0  39.9   0.0
wpbc  Wisc. Prognostic      198   33     2  23.7  76.3   2.0
zoo   Zoo                   101   17     7   4.0  40.6   0.0


Comparison with the Fuzzy Learners

1. Fuzzy GP (GP) (Sánchez et al., 01)

2. Fuzzy GAP (GAP) (Sánchez & Couso, 00)

3. Fuzzy SAP (SAP) (Sánchez et al., 01)

4. Fuzzy AdaBoost (AB) (del Jesus et al., 04)

5. Fuzzy LogitBoost (LB) (Otero & Sánchez, 06)

6. Fuzzy MaxLogitBoost (MLB) (Otero & Sánchez, 07)

All methods run using KEEL (Alcalá-Fdez et al., 08)

[Figure: accuracy vs. interpretability. Plotted: Fuzzy-UCS wavg (1000’s of rules), Fuzzy-UCS awin (< 100 rules), Fuzzy-UCS nfit (> 10 rules), Fuzzy GAP, Fuzzy SAP, Fuzzy AdaBoost, Fuzzy GP, Fuzzy MLB, Fuzzy LogitBoost]

Comparison with Non-Fuzzy Learners

1. C4.5 (Quinlan, 95)

2. IBk (Aha et al., 91)

3. Naïve Bayes (NB) (John & Langley, 95)

4. Part (Frank & Witten, 98)

5. SMO (Platt, 98)

6. GAssist (Bacardit, 04)

7. UCS (Bernadó & Garrell, 03)

[Figure: accuracy vs. interpretability. Plotted: Fuzzy-UCS wavg, Fuzzy-UCS awin, Fuzzy-UCS nfit, GAssist, Naïve Bayes, C4.5, Part, SMO, IBk, UCS]

Mining Large Volumes of Data

The last experiment

Fuzzy-UCS to extract models from the 1999 KDD Cup intrusion detection mechanism data set

494,022 examples with 41 features











Conclusions and Further Work

This work contributed to

Increasing the comprehension of how LCS work

Improving them to deal with problems that contain rare classes

Providing new implementations of LCS

Two challenges and four objectives addressed in the context of LCS

1. Revise and update UCS and compare it to XCS

New fitness sharing designed

Fitness sharing provides benefits to UCS

Key differences between UCS and XCS empirically studied

Further work: Complement the analysis with theory



2 & 3. Study LCS in domains with rare classes

Start with a systematic analysis validated with boundedly-difficult problems

Finish with its application to real-world problems with rare classes

Further work

Design measures to characterize real-world classification problems

Measure the difficulty of the problems

Link problem difficulty with domain of competence

Include problem difficulty in the study of re-sampling techniques, etc.

First steps taken in (Bernadó et al., 06; Orriols et al., 08a)

[Diagram labels: problem; complex systems; facetwise analysis; LCSs can learn from imbalanced domains; lots of interacting components; small models; domain of competence of LCSs; application of LCSs to a new real-world problem; problem characterization; heuristic to estimate the niche imbalance ratio; resampling techniques; complexity metrics; future research line]


4. Design and implement an LCS with fuzzy logic reasoning for supervised learning

Analysis to mix

Accurate online evaluation system of LCSs

Human-like representation and reasoning mechanisms of fuzzy logics

Robust discovery capabilities of GAs

Each of the three ideas was not novel itself, but the combination of them to create a supervised learning technique was.

Fuzzy-UCS

Evolved highly accurate models of moderate size

Was able to extract classification models from large volumes of data

Is prepared to deal with domains with uncertainty and vagueness

Further work

Adapt LCSs to extract association rules online

Many real-world applications generate data streams

LCS are appealing since they mine data streams

However, in most cases, unlabeled data

Aim: design an LCS that is able to extract association rules online

First steps taken in (Orriols et al., 2008f)


Lessons Learned on the Way

1. The importance of design decomposition

We need to improve LCS for mining rarities

   1. Mix existing, powerful techniques that solve problems that you intuitively identify

   The thesis started in this way (Orriols-Puig, 05a, 05b)

   Lesson: despite moderate success, poor understanding

   2. Build complete models of your system

   3. Design decomposition and facetwise analysis (Goldberg, 02)

   Key for success

   Not only for GAs or LCSs

2. The relevance of ideas crossbreeding

New complex real-world problems require the best practices of different fields

LCSs are friendly frameworks to ideas crossbreeding


Publications

This work has resulted in 35 publications:

7 journal papers (4 accepted/published and 3 currently submitted)

5 papers in LNCS/LNAI volumes

6 book chapters

15 international conference papers

2 national conference papers

Selected publications

Albert Orriols-Puig, Ester Bernadó-Mansilla, David E. Goldberg, Kumara Sastry, and Pier Luca Lanzi. Facetwise Analysis of XCS for Problems with Class Imbalances. IEEE Transactions on Evolutionary Computation, 2008, submitted

Albert Orriols-Puig, Jorge Casillas and Ester Bernadó-Mansilla. Fuzzy-UCS: A Michigan-style Fuzzy-Learning Classifier System for Supervised Learning. IEEE Transactions on Evolutionary Computation, 2008, doi=10.1109/TEVC.2008.925144

Albert Orriols-Puig, Ester Bernadó-Mansilla. Evolutionary Rule-Based Systems for Imbalanced Datasets. Soft Computing Journal. Special Issue on Evolutionary and Metaheuristic-based Data Mining, 2008, doi=10.1007/s00500-008-0319-7

Albert Orriols-Puig and Ester Bernadó-Mansilla. Revisiting UCS: Description, Fitness Sharing, and Comparison with XCS. In Advances at the frontier of LCS, LNCS series, volume 4998, pages 96–116, Springer, 2008

Albert Orriols-Puig, David E. Goldberg, Kumara Sastry, and Ester Bernadó-Mansilla. Modeling XCS in Class Imbalances: Population Size and Parameter Settings. In GECCO’07, pages 1838-1845, ACM Press, 2007

Albert Orriols-Puig, Kumara Sastry, Pier Luca Lanzi, David E. Goldberg, and Ester Bernadó-Mansilla. Modeling Selection Pressure in XCS for Proportionate and Tournament Selection. In GECCO’07, pages 1846-1853, ACM Press, 2007


Albert Orriols-Puig and Ester Bernadó-Mansilla. Bounding XCS’s Parameters for Unbalanced Datasets. Best paper nomination. In GECCO’06, pages 1561-1568. ACM Press, 2006


Acknowledgments

Enginyeria i Arquitectura La Salle

Prof. Ester Bernadó-Mansilla

My first “second home”: the IlliGAL

Prof. David E. Goldberg for accepting my visits and for all his valuable lessons

All labbies, and especially Kumara Sastry, Xavier Llorà, and Tian Li Yu

My second “second home”: the SCI2S group

Prof. Francisco Herrera for accepting my visits and for his time and advice

All labbies, and especially Jorge Casillas

My examining committee

Prof. David E. Goldberg, Prof. Francisco Herrera, Prof. Martin V. Butz, Prof. Xavier Llorà, and Prof. Xavier Vilasís

All the people I have worked with

Ester Bernadó-Mansilla, Jorge Casillas, David E. Goldberg, Pier Luca Lanzi, Francisco J. Martínez-López, Sergio Morales-Ortigosa, Núria Macià, Joaquim Rios-Boutin, Kumara Sastry, Francesc Teixidó-Navarro

The research was supported by

Departament d’Universitats, Recerca i Societat de la Informació (DURSI)

Under an FI scholarship with reference 2005FI-00252

Under two BE travel grants with references 2006BE-00299 and 2007BE2-00124

Generalitat de Catalunya, under grants 2002SGR-00155 and 2005SGR-00302

Ministerio de Educación y Ciencia, under projects KEEL and KEEL2 with references TIC2002-04036-C05-03 and TIN2005-08386-C05-04

