GECCO 2007: Modeling XCS in Class Imbalances: Population Size and Parameter Settings
Modeling XCS in Class Imbalances: Population Size and Parameter Settings
Albert Orriols-Puig1,2, David E. Goldberg2, Kumara Sastry2, Ester Bernadó-Mansilla1
1 Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University
2 Illinois Genetic Algorithms Laboratory, Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign
Framework
[Diagram: a Domain, consisting of examples and counter-examples, supplies information based on experience to a Learner. Knowledge extraction yields a Model, which maps a new instance to a predicted output.]
In real-world domains, typically:
– Higher cost to obtain examples of the concept to be learnt
– So, the distribution of examples in the training dataset is usually imbalanced
Applications:
– Fraud detection
– Medical diagnosis of rare illnesses
– Detection of oil spills in satellite images
Slide 2 GRSI Enginyeria i Arquitectura la Salle
Framework
Do learners suffer from class imbalances?
– Methods that do global optimization: Training Set → Learner → minimize the global error
$\text{error} = \frac{\text{num. errors}_{c_1} + \text{num. errors}_{c_2}}{\text{number of examples}}$
– Biased towards the majority (overwhelming) class
– Maximization of the majority class accuracy, to the detriment of the minority class
Slide 3
Motivation
And what about incremental learning?
Instances of the minority class are sampled less frequently
Rules that match instances of the minority class are rarely activated
Rules of the minority class would receive fewer genetic opportunities (Orriols & Bernadó, 2006)
Slide 4
Aim
Facetwise analysis of XCS for class imbalances
– Impact of class imbalances on the initialization process
  • How can XCS create rules of the minority class if the covering process fails?
– Population size bound with respect to the imbalance ratio
  • Up to which imbalance ratio would XCS be able to learn from the minority class?
Slide 5
Outline
1. Description of XCS
2. Facetwise Analysis
3. Design of Test Problems
4. XCS on the One-bit Problem
5. Analysis of Deviations
6. Results
7. Conclusions
Slide 6
Description of XCS
In single-step tasks:
[Diagram: the Environment supplies a problem instance, either a minority class instance or a majority class instance. Match set generation selects from the Population [P] the classifiers that match the instance, forming the Match Set [M]. Each classifier carries the fields C, A, P, ε, F, num, as, ts, exp. A Prediction Array over the classes c1, c2, …, cn is built from [M], and a random action is selected. The classifiers in [M] that advocate the selected action form the Action Set [A]. The environment returns a REWARD of 1000 or 0, which drives the classifier parameters update. The Genetic Algorithm (selection, reproduction, mutation) acts on [A], and deletion acts on [P]. The diagram marks the niches activated by minority class instances as starved niches and those activated by majority class instances as nourished niches.]
Problem niche: the schema defines the relevant attributes for a particular problem niche, e.g. 10**1*
Slide 7
Facetwise Analysis
Study XCS capabilities to provide representatives of starved niches:
– Population initialization
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers
Derive a bound on the population size
Start from the theory developed for XCS
– (Butz, Kovacs, Lanzi, Wilson, 04): Model of generalization pressures of XCS
– (Butz, Goldberg & Lanzi, 04): Learning time bound
– (Butz, Goldberg, Lanzi & Sastry, 07): Population size bound to guarantee niche support
– (Butz, 2006): Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design
Slide 9
Facetwise Analysis
Assumptions
– Problems consisting of n classes
– One class sampled with a lower frequency: minority class
– Imbalance ratio:
$ir = \frac{\text{num. instances of any class other than the minority class}}{\text{num. instances of the minority class}}$
– Probability of sampling an instance of the minority class and of the majority class:
$P_s(\text{min}) = \frac{1}{1 + ir}$
$P_s(\text{maj}) = \frac{ir}{1 + ir}$
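The two sampling probabilities follow directly from the imbalance ratio. A minimal sketch in Python, where the function name `sampling_probabilities` is only an illustrative choice and not part of the talk:

```python
from typing import Tuple


def sampling_probabilities(ir: float) -> Tuple[float, float]:
    """Return (Ps(min), Ps(maj)) for a given imbalance ratio ir."""
    if ir <= 0:
        raise ValueError("the imbalance ratio must be positive")
    ps_min = 1.0 / (1.0 + ir)  # probability of sampling a minority class instance
    ps_maj = ir / (1.0 + ir)   # probability of sampling any other instance
    return ps_min, ps_maj


if __name__ == "__main__":
    for ir in (1, 64, 256, 1024):
        print(ir, sampling_probabilities(ir))
```

With ir = 1 the problem is balanced and both probabilities are 0.5. As ir grows, Ps(min) shrinks roughly as 1/ir.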
Slide 10
Facetwise Analysis
– Population initialization
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers
– Population size bound
Slide 11
Population Initialization
Covering procedure
– Covering: generalize over the input with probability P#
– P# needs to satisfy the covering challenge (Butz et al., 01)
Would I trigger covering on minority class instances?
– The probability that one instance is covered by at least one rule (Butz et al., 01) depends on:
  • the input length
  • the population specificity (initially 1 – P#)
  • the population size
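The covering formula itself did not survive in the transcript. The sketch below assumes the form usually attributed to Butz et al., P(cover) = 1 − [1 − ((2 − σ[P]) / 2)^l]^N, with l the input length, σ[P] the population specificity and N the population size. The function and argument names are illustrative only.

```python
def cover_probability(input_length: int, specificity: float, population_size: int) -> float:
    """Probability that a random instance is matched by at least one rule.

    Assumed form (not recovered from the slide):
        1 - (1 - ((2 - specificity) / 2) ** input_length) ** population_size
    """
    # Probability that a single random rule matches a random binary instance.
    match_one = ((2.0 - specificity) / 2.0) ** input_length
    # Complement of "none of the population_size rules matches".
    return 1.0 - (1.0 - match_one) ** population_size


if __name__ == "__main__":
    p_hash = 0.6  # P# value used in the talk's XCS configuration
    for n in (10, 100, 1000):
        print(n, cover_probability(input_length=20, specificity=1.0 - p_hash, population_size=n))
```

Under this assumed form, a high covering probability means that covering is rarely triggered, even for minority class instances, once the population holds enough general rules.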
Slide 12
Population Initialization
Slide 13
Creation of Representatives of Starved Niches
Assumptions
– Covering has not provided any representative of starved niches
– Simplified model: only mutation is considered
How can we generate representatives of starved niches?
– By correctly specifying all the bits of the schema that represents the starved niche
Slide 15
Creation of Representatives of Starved Niches
Possible cases:
– Sample a minority class instance, with probability $P_s(\text{min}) = \frac{1}{1 + ir}$
  • Activate a niche of the minority class
  • Activate a niche of another class
– Sample a majority class instance, with probability $P_s(\text{maj}) = \frac{ir}{1 + ir}$
  • Activate a niche of the minority class
  • Activate a niche of another class
μ: mutation probability
Km: order of the schema
Slide 16
Creation of Representatives of Starved Niches
Summing up, the time to get the first representative of a starved niche is a function of:
– n: number of classes
– μ: mutation probability
– Km: order of the schema
It increases:
– Linearly with the number of classes
– Exponentially with the order of the schema
It does not depend on the imbalance ratio
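The slide's formula is missing from the transcript. One form consistent with the three properties listed above is t = n · (2 / μ)^Km. It is used here purely as an assumption for illustration and is not a quotation of the talk.

```python
def time_to_first_representative(n_classes: int, mutation_prob: float, schema_order: int) -> float:
    """Assumed model: t = n * (2 / mu) ** Km.

    Linear in the number of classes, exponential in the schema order,
    and independent of the imbalance ratio (which is not an argument).
    """
    if not 0.0 < mutation_prob <= 1.0:
        raise ValueError("mutation_prob must lie in (0, 1]")
    return n_classes * (2.0 / mutation_prob) ** schema_order


if __name__ == "__main__":
    for km in (1, 2, 3):
        print(km, time_to_first_representative(n_classes=2, mutation_prob=0.4, schema_order=km))
```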
Slide 17
Bounding the Population Size
Time to extinction
– Consider random deletion:
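The expression that followed on the slide is not in the transcript. As a rough sketch of what random deletion implies, assume each learning step deletes `deletions_per_step` classifiers chosen uniformly from a population of size N. The default of 2 reflects the two offspring a standard XCS GA inserts and is an assumption, as are all names below. A given classifier is then deleted with probability of about `deletions_per_step / N` per step, so its expected lifetime is the reciprocal.

```python
def expected_time_to_extinction(population_size: int, deletions_per_step: int = 2) -> float:
    """Expected number of steps a classifier survives under random deletion.

    Assumption: per-step deletion probability is deletions_per_step / population_size,
    so the lifetime is geometric with mean population_size / deletions_per_step.
    """
    if population_size <= 0 or deletions_per_step <= 0:
        raise ValueError("both arguments must be positive")
    p_delete = deletions_per_step / population_size
    return 1.0 / p_delete


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, expected_time_to_extinction(n))
```

Under this assumption the survival time grows linearly with the population size, which is why the extinction time can be traded against the time needed to create or reproduce a representative.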
Slide 19
Bounding the Population Size
Population size bound to guarantee that there will be representatives of starved niches
– Require that:
– Bound:
Slide 21
Bounding the Population Size
Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity:
– Consider θGA = 0
– We require that the best representative of a starved niche receives a genetic event before being removed
– Time to receive the first genetic event
Slide 22
Bounding the Population Size
Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity:
The population size needed to guarantee that the best representatives of starved niches will receive at least one genetic opportunity increases linearly with the imbalance ratio
Slide 23
Design of Test Problems
One-bit problem
– Example: 000110 : 0, where the condition has length l and the class is the value of the left-most bit
– Only two schemas of order one: 0***** and 1*****
– Undersampling instances of the class labeled as 1
$P_s(\text{min}) = \frac{1}{1 + ir}$
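The imbalanced one-bit problem can be sketched as a small sampler. The function name and the use of a seeded `random.Random` are illustrative choices, not taken from the talk.

```python
import random
from typing import Tuple


def sample_one_bit(length: int, ir: float, rng: random.Random) -> Tuple[str, int]:
    """Draw one (condition, class) pair from the imbalanced one-bit problem."""
    # Class 1 is the minority class: sampled with probability 1 / (1 + ir).
    label = 1 if rng.random() < 1.0 / (1.0 + ir) else 0
    # The left-most bit determines the class; the remaining bits are irrelevant.
    rest = [rng.randint(0, 1) for _ in range(length - 1)]
    condition = "".join(str(b) for b in [label] + rest)
    return condition, label


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(5):
        print(sample_one_bit(6, ir=4, rng=rng))
```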
Slide 25
Design of Test Problems
Parity problem
– Example: 01001010 : 1, where the condition has length l and the class is the number of 1s among the k relevant bits, mod 2
– The k bits of parity form a single building block
– Undersampling instances of the class labeled as 1
$P_s(\text{min}) = \frac{1}{1 + ir}$
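A minimal sampler for the imbalanced parity problem, assuming for illustration that the k relevant bits are the first k positions of the condition (the slide does not say where they sit). All names are hypothetical.

```python
import random
from typing import Tuple


def sample_parity(length: int, k: int, ir: float, rng: random.Random) -> Tuple[str, int]:
    """Draw one (condition, class) pair from the imbalanced parity problem."""
    if not 0 < k <= length:
        raise ValueError("k must satisfy 0 < k <= length")
    # Class 1 (odd parity) is the minority class: sampled with probability 1 / (1 + ir).
    label = 1 if rng.random() < 1.0 / (1.0 + ir) else 0
    # Choose k - 1 relevant bits freely, then fix the last one to obtain the wanted parity.
    relevant = [rng.randint(0, 1) for _ in range(k - 1)]
    relevant.append((label - sum(relevant)) % 2)
    # The remaining length - k bits do not affect the class.
    irrelevant = [rng.randint(0, 1) for _ in range(length - k)]
    condition = "".join(str(b) for b in relevant + irrelevant)
    return condition, label


if __name__ == "__main__":
    rng = random.Random(7)
    for _ in range(5):
        print(sample_parity(8, k=3, ir=4, rng=rng))
```

Drawing the class first and then building a condition with that parity keeps the class distribution exactly at Ps(min), whatever the value of k.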
Slide 26
XCS on the One-bit Problem
XCS configuration
α=0.1, ν=5, ε0=1, θGA=25, χ=0.8, μ=0.4, θdel=20, θsub=200, δ=0.1, P#=0.6
selection=tournament, mutation=niched, [A]sub=false, N = 10,000 ir
Evaluation of the results:
– Minimum population size to achieve: TP rate * TN rate > 95%
– Results are averages over 25 seeds
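The evaluation criterion multiplies the true positive rate by the true negative rate. A small sketch, assuming the minority class (labeled 1) is treated as the positive class. The function name is illustrative.

```python
from typing import Sequence


def tp_tn_product(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Return TP rate * TN rate for a two-class problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    # Guard against classes that are absent from y_true.
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0
    tn_rate = tn / (tn + fp) if (tn + fp) else 0.0
    return tp_rate * tn_rate


if __name__ == "__main__":
    truth = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    always_majority = [0] * 10
    # Plain accuracy would be 0.9 here, but the product exposes the ignored minority class.
    print(tp_tn_product(truth, always_majority))
```

A learner that always predicts the majority class scores zero on this product however high its accuracy, which makes the measure suitable for imbalanced problems.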
Slide 28
XCS on the One-bit Problem
– N remains constant up to ir = 64
– N increases linearly from ir = 64 to ir = 256
– N increases exponentially from ir = 256 to ir = 1024
– Higher ir could not be solved
Slide 29
Analysis of the Deviations
Niched Mutation vs. Free Mutation
– Classifiers can only be created if minority class instances are sampled
Inheritance Error of Classifiers' Parameters
– New promising representatives of starved niches are created from classifiers that belong to nourished niches
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as=1.
Slide 31
Analysis of the Deviations
Subsumption
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward
– Approach: set θsub > ir
Stabilizing the population before testing
– Overgeneral classifiers are poorly evaluated
– Approach: introduce some extra runs at the end of learning with the GA switched off
We gather all these little tweaks in XCS+PMC
Slide 32
XCS+PCM in the One-bit Problem
– N remains constant up to ir = 128
– For higher ir, N slightly increases
– We only have to guarantee that a representative of the starved niche will be created
Slide 34
XCS+PCM in the Parity Problem
– Building blocks of size 3 need to be processed
– Empirical results agree with the theory
– Population size bound to guarantee that a representative of the niche will receive a genetic event
Slide 35
Conclusions and Further Work
We derived models that analyzed the representatives of starved niches provided by covering and mutation
A population size bound was derived
We saw that the empirical observations met the theory if four aspects were considered:
– Type of mutation
– as initialization
– Subsumption
– Stabilization of the population
Further analysis of the covering operator
Slide 37