8/2/2019 Learning Bayesian Classifiers
http://slidepdf.com/reader/full/learning-bayesian-classifiers 1/12
Department of Computer Science and Engineering, National Institute of Technology, Warangal – 506004
Learning Bayesian Classifiers Using Differential Evolution Algorithm for Variable Ordering

Project Guide: Dr. S. G. Sanjeevi (Head of the Department), Associate Professor
12/31/2011
P a g e | 1
Shruti B – 8772 Mouli C R K – 8792 Divya B V – 8773
CONTENTS

1. Introduction
   a. Bayesian Network
   b. K2 Algorithm
   c. Learning Variable Ordering (VO)
2. Previous Experiments
   a. Evolutionary Algorithms (EAs)
   b. VOGA (Variable Ordering Genetic Algorithm)
      i. What is VOGA?
      ii. How is it implemented?
      iii. Experiment
3. Scope
   a. Differential Evolution
      i. Algorithm
4. Conclusion
5. References
INTRODUCTION
Bayesian Network:
A Bayesian Network (G) has a directed acyclic graph (DAG) structure. Each node in the graph
corresponds to a discrete random variable in the domain. An edge, Y → X, on the graph, describes a
parent and child relation in which Y is the parent and X is the child. All parents of X constitute the parent set of X, denoted by π_X. In addition to the graph, each node has a conditional probability
table (CPT) specifying the probability of each possible state of the node given each possible combination
of states of its parents. If a node contains no parent, the table gives the marginal probabilities of the
node.
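As a concrete illustration of these definitions, a minimal sketch of a BN as plain dictionaries; the network (Rain, Sprinkler, GrassWet), its states and all probabilities are invented for the example, not taken from this report:

```python
# Parent sets (the DAG): GrassWet has parents Rain and Sprinkler.
parents = {"Rain": [], "Sprinkler": [], "GrassWet": ["Rain", "Sprinkler"]}

# CPTs: map a tuple of parent states to P(node = True | parent states).
# A node with no parents stores its marginal probability under the key ().
cpt = {
    "Rain":      {(): 0.2},
    "Sprinkler": {(): 0.4},
    "GrassWet":  {(True, True): 0.99, (True, False): 0.8,
                  (False, True): 0.9, (False, False): 0.0},
}

def prob(node, value, assignment):
    """P(node = value | parent values taken from a full assignment)."""
    key = tuple(assignment[p] for p in parents[node])
    p_true = cpt[node][key]
    return p_true if value else 1.0 - p_true

def joint(assignment):
    """Joint probability as the product of each node's CPT entry."""
    result = 1.0
    for node in parents:
        result *= prob(node, assignment[node], assignment)
    return result

p = joint({"Rain": True, "Sprinkler": False, "GrassWet": True})
print(round(p, 4))  # 0.2 * 0.6 * 0.8 = 0.096
```

The joint factorizes as the product of each node's CPT entry given its parents, which is exactly the DAG semantics described above.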
In a process of learning BNs from data, the BN variables represent the dataset attributes (or features).
When using algorithms based on heuristic search, the initial order of the dataset attributes may be an
important issue. Some of these algorithms depend on this ordering to determine the direction of the arcs, such that an earlier attribute (in the ordered list) may be a parent only of later ones.
Instead of encoding a joint probability distribution over a set of random variables, a Bayesian Classifier
(BC) aims at correctly predicting the value of a designated discrete class variable given a vector of
attributes (predictors). Methods for learning Bayesian Networks may be used to induce a BC, as is done in
this work. The BN learning algorithm applied in our experiments is based on the K2 algorithm, which
constructs a BN from data and uses a heuristic search for doing so.
K2 Algorithm:
The K2 algorithm constructs a BN from data using a heuristic search. It receives as input a complete
database and a VO. Considering these assumptions, the K2 algorithm searches for the BN structure that
best represents the database. This algorithm is commonly applied due to its performance in terms of
computational complexity (time) and good results when an adequate VO is supplied.
The attribute-preorder assumption is used to reduce the number of possible structures to be learned. In this sense, K2 uses an ordered list (containing all the attributes, including the class) which asserts that only the attributes positioned before a given attribute A may be parents of A. Hence, the first attribute
in the list has no parent, i.e. it is a root node in the BN.
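The preorder restriction can be illustrated with a short sketch (the attribute names are invented for the example, not taken from the report's datasets):

```python
# Under K2's ordering assumption, the candidate parents of each attribute
# are exactly its predecessors in the variable ordering (VO).
order = ["class", "age", "income", "owns_car"]

candidates = {attr: order[:i] for i, attr in enumerate(order)}
print(candidates["class"])     # [] -> the first attribute is a root node
print(candidates["owns_car"])  # ['class', 'age', 'income']
```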
The algorithm uses a greedy method to search for the best structure. It begins as if every node had no
parent. Then, beginning with the second attribute from the ordered list (the first one is a root node), the
possible parents are tested and those that maximize the whole probability structure are added to the
network. This process is repeated for all attributes in order to obtain the best possible structure. The K2 metric used to test each possible parent set for each variable is defined by the following equation:

g(x_i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!

where each attribute x_i has r_i possible values \{v_{i1}, \dots, v_{ir_i}\}. D is a dataset with m objects. Each attribute x_i has a set of parents \pi_i, and q_i is the number of instantiations of \pi_i. N_{ijk} is the number of objects in D in which x_i has value v_{ik} and \pi_i is instantiated as w_{ij} (w_{ij} represents the j-th instantiation of \pi_i relative to D). Finally, N_{ij} = \sum_{k=1}^{r_i} N_{ijk}.
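In practice the metric is computed in log form to avoid factorial overflow, using log n! = lgamma(n + 1). The sketch below is an illustrative reconstruction from the definitions above, not the report's code; the function name and the toy dataset are assumptions:

```python
from collections import Counter
from math import lgamma

def k2_log_score(data, i, parent_idxs, r_i):
    """Log of the K2 metric g(x_i, pi_i), computed from a complete dataset
    of discrete rows. r_i is the number of values attribute i can take."""
    counts = Counter()   # (parent instantiation j, value k) -> N_ijk
    totals = Counter()   # parent instantiation j -> N_ij
    for row in data:
        j = tuple(row[p] for p in parent_idxs)
        counts[(j, row[i])] += 1
        totals[j] += 1
    score = 0.0
    for n_ij in totals.values():
        # log[(r_i - 1)! / (N_ij + r_i - 1)!] = lgamma(r_i) - lgamma(N_ij + r_i)
        score += lgamma(r_i) - lgamma(n_ij + r_i)
    for n_ijk in counts.values():
        score += lgamma(n_ijk + 1)   # log(N_ijk!)
    return score

# Toy dataset with columns (x0, x1); score x1 with x0 as its only parent.
data = [(0, 0), (0, 1), (1, 1), (1, 1)]
score = k2_log_score(data, i=1, parent_idxs=[0], r_i=2)
```

Only the observed parent instantiations contribute, matching the product over j = 1 .. q_i in the closed-form score.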
With the best structure defined, the network's conditional probabilities are determined. This is done using a Bayesian estimation of the (predefined) network structure probability.
When dataset D has a distinguished class variable, K2 may be used as a BC learning algorithm. This is
exactly our assumption.
Learning Variable Ordering (VO):

Learning a Bayesian Network (BN) from data has become an active research topic in the last decade.
The search space for a BN with n variables has exponential dimension. Therefore, finding the BN structure that best represents the dependences among the variables is not a trivial task. The problem is NP-complete, so it is hard to identify the best solution for all application problems. To reduce the search space, some restrictions are usually imposed, and the algorithms often obtain good results with acceptable computational effort. A very common restriction when learning a BN is the prior definition of a variable ordering (VO). The same situation arises when learning a Bayesian Classifier (BC) from data. We present a genetic algorithm, namely VOGA (Variable Ordering Genetic Algorithm), to optimize the process of learning a BC from data by identifying a suitable VO. In general, genetic algorithms are capable of identifying and exploiting aspects of the problem environment and of converging globally to optimal, or near-optimal, solutions. Therefore, genetic algorithms are considered an efficient search and
optimization tool for many different types of problems. Several works propose hybrid GA/Bayes methods using a GA to define an adequate VO:
• One work presented a genetic algorithm to search for the best variable ordering; each element of the population is a possible ordering and the fitness function is the K2 metric.
• Another implemented a GA for the problem of permutation of variables in BN learning and inference.
• A third considers a subgroup of the set of dependence/independence relations to obtain the variable ordering; this process is guided by genetic algorithms and simulated annealing.
Even though a number of works deal with this issue, most of them are designed to learn unrestricted BNs. Our GA/Bayes hybrid approach (VOGA), on the other hand, is devoted to learning Bayesian Classifiers from data. In this sense, the class variable may play an interesting role in the definition of the variable ordering.
PREVIOUS EXPERIMENTS

Genetic algorithms like VOGA and VOGA+ have been used to optimize the process of learning a BC from data by identifying a suitable VO. In these genetic algorithms, each element of the population is a possible ordering and its fitness is the K2 metric (g value). Evolutionary algorithms with canonical crossover and mutation have also been used to find an appropriate VO.
Evolutionary Algorithms (EAs):
EAs are computational models that solve a given problem by maintaining a changing population of chromosomes, each with its own level of 'fitness'. A fitness function measures the quality of each chromosome. Genetic algorithms are the most popular models of EAs; Differential Evolution algorithms are another class of EAs.
VOGA (Variable Ordering Genetic Algorithm):
What is VOGA?
The main idea of the proposed method is to use a GA and the class-variable information to optimize the variable ordering (VO) that will be used as input when learning a BC from data. In this sense, we fix the class variable as the first one in the VO; the GA is then used to find the best ordering for the remaining variables. Our method uses a GA in which the chromosomes represent possible variable orderings. Each variable's identifier (ID) is encoded as an integer. Therefore, each chromosome has (n − 1) genes, where n is the number of variables (including the class variable), and each gene is instantiated with a variable ID. Thus, each possible ordering may form a chromosome. The fitness function is given by the Bayesian score (g function) defined in the K2 algorithm.
How is it implemented?
VOGA generates a random initial population. Each chromosome is evaluated by the K2 algorithm, whose g function is used as the fitness function. The best chromosomes are selected, and the next generation is generated using crossover and mutation operators. The process is repeated, and for each generation the best ordering is stored. If there is no improvement after 10 generations, the algorithm stops and returns the best ordering found. The flowchart summarizes the whole process.
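The loop described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the `fitness` argument stands in for the K2 g score of a classifier learned with the chromosome's ordering, and the order-crossover and swap-mutation operators are common choices for permutation chromosomes rather than the specific operators used by VOGA:

```python
import random

def order_crossover(p1, p2):
    """Order crossover (OX): copy a slice of p1, fill the rest in p2's order."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    fill = [g for g in p2 if g not in child[a:b]]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def voga_sketch(variable_ids, fitness, pop_size=20, patience=10):
    """GA loop: random initial population, selection, crossover/mutation,
    stopping after `patience` generations without improvement."""
    pop = [random.sample(variable_ids, len(variable_ids))
           for _ in range(pop_size)]
    best, stale = max(pop, key=fitness), 0
    while stale < patience:
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # selection: keep better half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = random.sample(parents, 2)
            children.append(order_crossover(p1, p2))
        for c in children:                         # mutation: rare gene swap
            if random.random() < 0.1:
                i, j = random.sample(range(len(c)), 2)
                c[i], c[j] = c[j], c[i]
        pop = parents + children
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best, stale = gen_best, 0
        else:
            stale += 1
    return best
```

With an expensive fitness such as a K2 run per chromosome, caching scores per ordering would be a natural refinement.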
Flow Chart
In addition to the aforementioned VOGA algorithm, a slightly different version, namely VOGA+, was implemented, in which the initial population is not randomly generated. In VOGA+, more information about the class variable is used to optimize the initial population and, therefore, to obtain better BC structures (mainly in domains with many attributes).
In order to define the VO of the initial population chromosomes, the χ² (chi-squared) statistical test is performed using each variable jointly with the class variable (for this reason, VOGA+ can only be applied in a classification context, where there is a distinguished variable, namely the class variable). Thus, the strength of the dependence relationship between each variable and the class can be measured. Subsequently, the variables are ordered decreasingly according to their χ² scores. The first variable in the ordered list has the highest χ² score, i.e. it is the most dependent upon the class. Obviously, the relation between the χ² statistical test and the best VO may not hold strictly, but previous work shows that good results can be achieved using this heuristic.
Having defined the VO given by the χ² statistical test, all initial population chromosomes are defined using this VO (all chromosomes are identical).
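The VOGA+ initialization described above can be sketched as follows; this is an illustrative reconstruction (function names and the toy data are assumptions), computing the Pearson χ² statistic directly from the two-way contingency table:

```python
from collections import Counter

def chi2_score(xs, ys):
    """Pearson chi-squared statistic between two discrete columns."""
    n = len(xs)
    obs = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    score = 0.0
    for x in row:
        for y in col:
            e = row[x] * col[y] / n      # expected count under independence
            score += (obs[(x, y)] - e) ** 2 / e
    return score

def chi2_ordering(data, class_idx):
    """Attribute indices sorted by decreasing chi-squared with the class,
    as in the VOGA+ initial population (class variable excluded)."""
    cls = [row[class_idx] for row in data]
    attrs = [i for i in range(len(data[0])) if i != class_idx]
    return sorted(attrs, reverse=True,
                  key=lambda i: chi2_score([r[i] for r in data], cls))

# Toy rows (attr0, attr1, class): attr0 copies the class, attr1 is independent.
data = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
print(chi2_ordering(data, class_idx=2))  # [0, 1]
```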
[Flowchart: Start → Read data → Initial population generation → Chromosome evaluation → Selection → Crossover and mutation → Chromosome evaluation → Stop? → (yes) Return the best VO → End]
Experiment:
Seven domains were used in our simulations: two well-known Bayesian Network domains (Engine Fuel System and Asia) and five benchmark problems from the U. C. Irvine repository, namely Balance, Breast-w, Congressional Voting Records (Voting), Vehicle and Iris, were used in the VO and classification tasks. The following table summarizes the dataset features.
Data   Asia    Balance  Breast-w  Engine  Iris  Vehicle  Voting
AT     8       5        10        9       5     19       17
IN     15000   625      683       15000   150   846      232
CL     2       3        2         2       3     4        2

Dataset description: dataset name (Data), number of attributes plus class (AT), number of instances (IN) and number of classes (CL).
The experiments were conducted following the steps below.
1. Initially, the datasets were used as input to the K2 algorithm. The VO was the original one given in the file. The Bayesian score (g) obtained for each dataset was stored.
2. The same datasets used in step 1 were used as input to VOGA and VOGA+. The Bayesian score (g) obtained for each dataset and the number of generations needed to reach the solution were stored.

The results achieved in steps 1 and 2 are presented in the following tables, respectively.
       Asia    Balance  Breast-w  Engine  Iris   Vehicle  Voting
K2     -33610  -4457    -8159     -33809  -2026  -10357   -1749
VOGA   -33610  -4457    -8159     -33755  -2026  -10006   -1727
VOGA+  -33608  -4457    -8159     -33755  -2026  -9956    -1724

Bayesian score (g function) of each achieved Bayesian Network structure. The best result for each dataset is the highest (least negative) score in its column.
Analyzing the results in the score table above, it is possible to infer that, as far as the Bayesian score (g function) is concerned, in all performed experiments VOGA produced results at least as good as the
ones produced by K2 and in 3 out of the 7 datasets VOGA improved the results obtained using K2. In
addition, VOGA+ performed at least as well as VOGA and in 3 out of the 7 datasets VOGA+ improved the
results obtained using VOGA.
Another interesting issue revealed by the score table is that datasets with a higher number of attributes, namely Vehicle (19 attributes) and Voting (17 attributes), favored the proposed method (VOGA), mainly when using the enhanced version, VOGA+.
       Asia  Balance  Breast-w  Engine  Iris  Vehicle  Voting
VOGA   11    11       11        13      11    11       38
VOGA+  19    11       11        12      11    15       6

Number of generations needed until convergence.
As far as the number of generations is concerned, in 4 (Balance, Breast-w, Engine and Iris) out of the 7 datasets VOGA and VOGA+ presented (mostly) the same results. The other 3 datasets (Asia, Vehicle and Voting) revealed that, when the number of generations was not the same for VOGA and VOGA+, the Bayesian score obtained by the latter was always better.
SCOPE

Replacing the Genetic Algorithm with the Differential Evolution algorithm for better convergence and, possibly, a better Variable Ordering.
Differential Evolution:

A basic variant of the DE algorithm works by having a population of candidate solutions (called agents). These agents are moved around in the search space using simple mathematical formulae that combine the positions of existing agents in the population. If the new position of an agent is an improvement, it is accepted and becomes part of the population; otherwise the new position is simply discarded.
Algorithm:

• Let x designate a candidate solution (agent) in the population. The basic DE algorithm can then be described as follows:
• Initialize all agents x with random positions in the search space.
• Until a termination criterion is met (e.g. number of iterations performed, or adequate fitness reached), repeat the following:
  • For each agent x in the population do:
    • Pick three agents a, b and c from the population at random; they must be distinct from each other as well as from agent x.
    • Pick a random index R ∈ {1, …, n} (n being the dimensionality of the problem to be optimized).
    • Compute the agent's potentially new position y = [y_1, …, y_n] as follows:
      • For each index i ∈ {1, …, n}, pick a uniformly distributed number r_i ~ U(0, 1).
      • If r_i < CR or i = R then set y_i = a_i + F·(b_i − c_i), otherwise set y_i = x_i.
    • If f(y) ≤ f(x) then replace the agent in the population with the improved candidate solution, that is, replace x with y in the population.
• Pick the agent from the population that has the highest fitness or lowest cost and return it as the best found candidate solution.

Note that F ∈ [0, 2] is called the differential weight and CR ∈ [0, 1] is called the crossover probability; both parameters are selectable by the practitioner, along with the population size.
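The steps above can be sketched as a minimal DE (the common rand/1/bin variant) in Python. The function name, the default parameter values (F = 0.8, CR = 0.9) and the sphere-function usage example are illustrative choices, not taken from this report:

```python
import random

def differential_evolution(cost, n, bounds, pop_size=30, F=0.8, CR=0.9,
                           iterations=200):
    """Basic DE (rand/1/bin) minimizing `cost` over n real dimensions.
    F is the differential weight, CR the crossover probability."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(iterations):
        for idx, x in enumerate(pop):
            # Three agents distinct from each other and from x.
            a, b, c = random.sample([p for k, p in enumerate(pop) if k != idx], 3)
            R = random.randrange(n)          # index that always crosses over
            y = [a[i] + F * (b[i] - c[i]) if random.random() < CR or i == R
                 else x[i] for i in range(n)]
            if cost(y) <= cost(x):           # greedy one-to-one replacement
                pop[idx] = y
    return min(pop, key=cost)

# Usage: minimize the 2-D sphere function, whose optimum is the origin.
random.seed(1)
best = differential_evolution(lambda v: sum(t * t for t in v), n=2, bounds=(-5, 5))
```

Note that this sketch works on real-valued vectors; applying DE to a variable ordering, as proposed here, would additionally require a mapping from real vectors to permutations (e.g. a random-key encoding).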
CONCLUSION
Experiments are proposed that use differential evolution to find a suitable variable ordering and, possibly, to extend the results to unrestricted Bayesian networks.
REFERENCES
SANTOS, E. B.; HRUSCHKA JR.; EBECKEN. Evolutionary Algorithm using Random Multi-point Crossover Operator for Learning Bayesian Network Structures. In: 9TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, 2010.

SANTOS, E. B.; HRUSCHKA JR., E. R. VOGA: Variable Ordering Genetic Algorithm for Learning Bayesian Classifiers. In: 6TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS 2006), 2006, Auckland.