Identifying Interesting Association Rules with Genetic Algorithms
Elnaz Delpisheh, York University, Department of Computer Science and Engineering, October 11, 2014
April 20, 2023
Data mining
[Diagram: Data → Data Mining → Association rules; label: "Too much data"]
• I = {i1, i2, ..., in} is a set of items.
• D = {t1, t2, ..., tm} is a transactional database.
• Each transaction ti is a nonempty subset of I.
• An association rule has the form A → B, where A and B are itemsets, A ⊂ I, B ⊂ I, and A ∩ B = ∅.
• The Apriori algorithm is widely used for association rule mining.
• Example: {milk, eggs} → {bread}.
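The conditions on a rule listed above can be checked mechanically. The sketch below uses a hypothetical helper `is_valid_rule` to test whether A → B is well formed over an item set I. It additionally assumes, as is conventional, that A and B are non-empty; the slide does not state this.

```python
from typing import AbstractSet


def is_valid_rule(a: AbstractSet[str], b: AbstractSet[str], items: AbstractSet[str]) -> bool:
    """Return True if A -> B is a well-formed association rule over the item set I."""
    return (
        len(a) > 0 and len(b) > 0       # assumption: both sides are non-empty
        and a <= items and b <= items   # A ⊆ I and B ⊆ I
        and a.isdisjoint(b)             # A ∩ B = ∅
    )


I = {"milk", "eggs", "bread"}
print(is_valid_rule({"milk", "eggs"}, {"bread"}, I))  # True
print(is_valid_rule({"milk"}, {"milk", "bread"}, I))  # False: A and B overlap
```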
Apriori Algorithm
TID    List of item IDs
T100   I1, I2, I3
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
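A compact, level-wise Apriori sketch over the TID table above. Frequent k-itemsets are joined into (k+1)-candidates, and any candidate with an infrequent k-subset is pruned before counting. The minimum support count of 2 is an assumption, since the slides do not state a threshold. The names `apriori` and `count_support` are illustrative.

```python
from itertools import combinations

# Transactions copied from the TID table above.
D = {
    "T100": {"I1", "I2", "I3"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}


def count_support(candidates, transactions):
    """Number of transactions containing each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def apriori(transactions, min_count):
    """Return {frozenset itemset: support count} for every frequent itemset."""
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level:
        counts = count_support(level, transactions)
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent


freq = apriori(list(D.values()), min_count=2)
for itemset, n in sorted(freq.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), n)
```

With a minimum support count of 2, I5 (count 1) is dropped at the first level, and the only frequent 3-itemset is {I1, I2, I3} with count 3; {I1, I2, I4} and {I2, I3, I4} are pruned without counting because {I1, I4} and {I3, I4} are infrequent.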
Apriori Algorithm (Cont.)
Association rule mining
[Diagram: Data → Data Mining → Association rules; labels: "Too much data", "Too many association rules"]
Interestingness criteria
• Comprehensibility
• Conciseness
• Diversity
• Generality
• Novelty
• Utility
• ...
Interestingness measures
• Subjective measures: both the data and the user's prior knowledge are considered.
  Examples: comprehensibility, novelty, surprisingness, utility.
• Objective measures: the structure of an association rule is considered.
  Examples: conciseness, diversity, generality, peculiarity.
  Example: support, which represents the generality of a rule A → B. It counts the number of transactions containing both A and B.
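Support as described above can be computed directly. The sketch below uses a small hypothetical database echoing the {milk, eggs} → {bread} example. It returns both the raw count defined on the slide and the normalised fraction of |D| that is also commonly used; the function names are illustrative.

```python
from typing import AbstractSet, List

# Hypothetical transactions (not from the slides).
D: List[AbstractSet[str]] = [
    {"milk", "eggs", "bread"},
    {"milk", "eggs"},
    {"milk", "bread"},
    {"eggs", "bread"},
    {"milk", "eggs", "bread", "butter"},
]


def support_count(a, b, transactions):
    """Number of transactions that contain every item of A ∪ B."""
    both = set(a) | set(b)
    return sum(1 for t in transactions if both <= t)


def support(a, b, transactions):
    """Support as a fraction of the number of transactions |D|."""
    return support_count(a, b, transactions) / len(transactions)


print(support_count({"milk", "eggs"}, {"bread"}, D))  # 2
print(support({"milk", "eggs"}, {"bread"}, D))        # 0.4
```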
Drawbacks of objective measures
• Database dependence
• Lack of knowledge about the database
• Threshold dependence
  Solution: multiple database reanalysis
  Problem: a large number of disk I/O operations
Database independence
Genetic algorithm-based learning (ARMGA)
1. Initialize the population.
2. Evaluate the individuals in the population.
3. Repeat until a stopping criterion is met:
   A. Select individuals from the current population.
   B. Recombine them to obtain more individuals.
   C. Evaluate the new individuals.
   D. Replace some or all of the individuals of the current population with the offspring.
4. Return the best individual seen so far.
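The four steps above can be sketched as a generic genetic-algorithm loop. Everything problem-specific here is an assumption made only to keep the example runnable: bit-string individuals, a toy "count the ones" fitness, binary tournament selection, one-point crossover with bit-flip mutation, full generational replacement, and a fixed generation budget as the stopping criterion. These are not the ARMGA operators; the names `run_ga`, `tournament` and `recombine` are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Individual = List[int]
Scored = Tuple[int, Individual]


def tournament(scored: List[Scored], rng: random.Random) -> Individual:
    """Step 3A (assumed scheme): the fitter of two randomly drawn individuals."""
    a, b = rng.sample(scored, 2)
    return a[1] if a[0] >= b[0] else b[1]


def recombine(p1: Individual, p2: Individual, rng: random.Random,
              p_mut: float = 0.05) -> Individual:
    """Step 3B (toy operators): one-point crossover, then bit-flip mutation."""
    cut = rng.randrange(1, len(p1))
    child = p1[:cut] + p2[cut:]
    return [1 - g if rng.random() < p_mut else g for g in child]


def run_ga(fitness: Callable[[Individual], int], n_bits: int = 16,
           pop_size: int = 20, generations: int = 50, seed: int = 0) -> Scored:
    rng = random.Random(seed)
    # 1. Initialize the population with random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    # 2. Evaluate the individuals in the population.
    scored = [(fitness(ind), ind) for ind in population]
    best = max(scored, key=lambda s: s[0])
    # 3. Repeat until the stopping criterion (a fixed generation budget) is met.
    for _ in range(generations):
        offspring = [recombine(tournament(scored, rng), tournament(scored, rng), rng)
                     for _ in range(pop_size)]
        # 3C. Evaluate the new individuals.
        # 3D. Replace the whole population with the offspring.
        scored = [(fitness(ind), ind) for ind in offspring]
        best = max([best] + scored, key=lambda s: s[0])
    # 4. Return the best individual seen so far.
    return best


best_fitness, best_individual = run_ga(fitness=sum)
print(best_fitness, best_individual)
```

Tracking `best` outside the population means a good individual is never lost, even though step D here replaces every individual.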
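ARMGA Modeling
Given an association rule X → Y:
Requirement: Conf(X → Y) > Supp(Y)
Since Conf(X → Y) = Supp(X ∪ Y) / Supp(X), for Supp(X) > 0 the requirement is equivalent to Supp(X ∪ Y) > Supp(X) · Supp(Y), i.e., X and Y are positively correlated.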
Aim is to maximise
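The quantity to be maximised did not survive in this transcript, so the sketch below does not reproduce it. It checks the stated requirement Conf(X → Y) > Supp(Y) on the Apriori TID table. As a loudly labelled assumption, it also defines a hypothetical score, (Conf − Supp(Y)) / (1 − Supp(Y)), which is positive exactly when the requirement holds. That score is not necessarily the one ARMGA maximises. All function names are illustrative.

```python
from typing import AbstractSet, List

# Transactions from the Apriori TID table (T100 to T900).
D: List[AbstractSet[str]] = [
    {"I1", "I2", "I3"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"}, {"I1", "I3"},
    {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


def conf(x, y, transactions):
    """Conf(X -> Y) = Supp(X ∪ Y) / Supp(X); defined as 0 if X never occurs."""
    sx = supp(x, transactions)
    return supp(set(x) | set(y), transactions) / sx if sx > 0 else 0.0


def meets_requirement(x, y, transactions):
    """The requirement stated on the slide: Conf(X -> Y) > Supp(Y)."""
    return conf(x, y, transactions) > supp(y, transactions)


def normalised_gap(x, y, transactions):
    """Hypothetical score (Conf - Supp(Y)) / (1 - Supp(Y)); > 0 iff the requirement holds."""
    sy = supp(y, transactions)
    if sy >= 1.0:
        return 0.0  # Supp(Y) = 1: Conf can never exceed it
    return (conf(x, y, transactions) - sy) / (1.0 - sy)


print(meets_requirement({"I2"}, {"I4"}, D))  # True:  2/7 > 2/9
print(meets_requirement({"I1"}, {"I2"}, D))  # False: 4/6 < 7/9
```

Note that {I1} → {I2} has confidence 0.67, which would pass many fixed confidence thresholds, yet it fails the requirement because I2 alone already occurs in 7 of the 9 transactions.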
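ARMGA Encoding
Michigan strategy: each chromosome encodes a single rule.
Given an association k-rule X → Y (a rule containing k items in total), where I = {i1, i2, ..., in} is a set of items, X, Y ⊂ I, and X ∩ Y = ∅.
For example: {A1, ..., Aj} → {Aj+1, ..., Ak}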
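One way to store the example rule {A1, ..., Aj} → {Aj+1, ..., Ak} in a single chromosome is to keep the k items in order together with the split position j. This layout, a pair `(j, [A1, ..., Ak])`, is an assumption for illustration, and `encode` and `decode` are hypothetical helpers.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical layout: (j, [A1, ..., Ak]); the first j items form X, the rest form Y.
Chromosome = Tuple[int, List[str]]


def encode(x: FrozenSet[str], y: FrozenSet[str]) -> Chromosome:
    """Build a chromosome for the k-rule X -> Y."""
    if not x or not y or x & y:
        raise ValueError("X and Y must be non-empty and disjoint")
    return len(x), sorted(x) + sorted(y)


def decode(c: Chromosome) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Split the gene list at position j back into (X, Y)."""
    j, genes = c
    if not 1 <= j < len(genes) or len(set(genes)) != len(genes):
        raise ValueError("invalid chromosome")
    return frozenset(genes[:j]), frozenset(genes[j:])


c = encode(frozenset({"A1", "A2"}), frozenset({"A3"}))
print(c)  # (2, ['A1', 'A2', 'A3'])
print(decode(c))
```

Under this layout the chromosome holds k genes, so rules with different numbers of items produce chromosomes of different lengths.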
ARMGA Encoding (Cont.)
The encoding above depends heavily on the length of the chromosome, which changes with the number of items k in the rule.
We therefore use another type of encoding. Given the set of items {A, B, C, D, E, F}, the association rule ACF → B is encoded as follows:
A: 00, B: 11, C: 00, D: 01, E: 11, F: 00
• 00: the item is in the antecedent
• 11: the item is in the consequent
• 01/10: the item is absent
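The two-bit scheme above assigns each item a fixed position, so every chromosome has length 2n regardless of how many items the rule uses. The sketch below encodes and decodes rules over the item list {A, B, C, D, E, F}. For absent items it writes `01` by default; that choice is an assumption, since both `01` and `10` are allowed, and the decoder accepts either. The helper names are hypothetical.

```python
from typing import AbstractSet, Sequence, Set, Tuple

ANTECEDENT = "00"
CONSEQUENT = "11"
ABSENT = ("01", "10")  # both codes mean "item not in the rule"


def encode_rule(items: Sequence[str], x: AbstractSet[str], y: AbstractSet[str],
                absent_code: str = "01") -> str:
    """Two bits per item, in the fixed order given by `items`."""
    if x & y:
        raise ValueError("antecedent and consequent must be disjoint")
    genes = []
    for item in items:
        if item in x:
            genes.append(ANTECEDENT)
        elif item in y:
            genes.append(CONSEQUENT)
        else:
            genes.append(absent_code)
    return "".join(genes)


def decode_rule(items: Sequence[str], bits: str) -> Tuple[Set[str], Set[str]]:
    """Recover (antecedent, consequent) from a 2n-bit chromosome."""
    if len(bits) != 2 * len(items):
        raise ValueError("need exactly two bits per item")
    x, y = set(), set()
    for k, item in enumerate(items):
        code = bits[2 * k:2 * k + 2]
        if code == ANTECEDENT:
            x.add(item)
        elif code == CONSEQUENT:
            y.add(item)
        elif code not in ABSENT:
            raise ValueError(f"bad gene {code!r}")
    return x, y


items = ["A", "B", "C", "D", "E", "F"]
bits = encode_rule(items, {"A", "C", "F"}, {"B"})
print(bits)  # 001100010100
print(decode_rule(items, bits) == ({"A", "C", "F"}, {"B"}))  # True
```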
ARMGA Operators
• Select
• Crossover
• Mutation
ARMGA Operators: Select
Select(c, ps) acts as a filter on chromosomes, where:
• c: a chromosome
• ps: a pre-specified probability
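The slide does not say how c and ps interact inside Select. The sketch below therefore takes the simplest reading as an assumption: each chromosome passes the filter independently with probability ps. A fitness-aware variant would also inspect c. The functions `select` and `mating_pool` and the example chromosomes are hypothetical.

```python
import random
from typing import List, Sequence


def select(c: str, ps: float, rng: random.Random) -> bool:
    """Hypothetical filter: chromosome c passes with probability ps.

    c is accepted to mirror Select(c, ps); this simple version does not inspect it.
    """
    return rng.random() < ps


def mating_pool(population: Sequence[str], ps: float, seed: int = 0) -> List[str]:
    """Keep the chromosomes that pass the filter, preserving their order."""
    rng = random.Random(seed)
    return [c for c in population if select(c, ps, rng)]


population = ["001100010100", "110001000101", "000011010101"]
print(mating_pool(population, ps=1.0))  # all three chromosomes pass
print(mating_pool(population, ps=0.0))  # []
```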
ARMGA Operators: Crossover
This operation uses a two-point strategy.
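A two-point crossover picks two cut points and swaps the segment between them. The sketch below applies it to two-bit-per-item chromosomes. It assumes, for illustration, that cut points fall on gene boundaries so that no item code is split; the function name and the second parent are hypothetical.

```python
import random
from typing import Tuple


def two_point_crossover(p1: str, p2: str, rng: random.Random) -> Tuple[str, str]:
    """Exchange the segment between two cut points of two equal-length parents.

    Assumption: cut points fall on 2-bit gene boundaries, so no item code is split.
    """
    if len(p1) != len(p2) or len(p1) % 2 != 0:
        raise ValueError("parents must have the same even length")
    n_genes = len(p1) // 2
    i, j = sorted(rng.sample(range(n_genes + 1), 2))
    a, b = 2 * i, 2 * j
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


p1 = "001100010100"  # A C F -> B
p2 = "110001000101"  # hypothetical second parent: B D -> A
child1, child2 = two_point_crossover(p1, p2, random.Random(42))
print(child1, child2)
```

Because genes are only swapped position by position, each child still has exactly one code per item. A child can, however, end up with an empty antecedent or consequent, so such offspring would need to be rejected or repaired before evaluation.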
ARMGA Operators: Mutate
ARMGA Initialization
ARMGA Algorithm
Empirical Studies and Evaluation
• Implement the entire procedure in Visual C++.
• Use WEKA to produce interesting association rules.
• Compare the results.