Learning Maximum Likelihood Bounded Semi-Naïve Bayesian Network Classifier
Kaizhu Huang, Irwin King, Michael R. Lyu
Multimedia Information Processing Laboratory
The Chinese University of Hong Kong, Shatin, NT, Hong Kong
{kzhuang, king, lyu}@cse.cuhk.edu.hk
SMC 2002, October 8, 2002, Hammamet, Tunisia
SMC 2002, October 8, 2002 The Chinese University of Hong Kong Multimedia Information Processing Lab
Outline
Abstract
Background: Naïve Bayesian Classifiers, Semi-Naïve Bayesian Classifiers, Chow-Liu Tree
Bounded Semi-Naïve Bayesian Classifiers
Experimental Results
Discussion
Conclusion
Abstract
We propose a technique for constructing semi-naïve Bayesian classifiers in which the number of variables that can be combined into one node is bounded.
It has a lower computational cost than traditional semi-naïve Bayesian networks.
Experiments show the proposed technique is also more accurate.
A Typical Classification Problem
Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.
Classifiers
Given a pre-classified dataset D = {(x_i, c_i)}, where x_i is the training data in the m-dimensional real space R^m and c_i is the class label, a classifier is defined as a mapping function f: R^m → C that satisfies f(x_i) = c_i.
Background
Probabilistic Classifiers
The classification mapping function is defined as:
f(x) = argmax_c P(C = c | x) = argmax_c P(x | C = c) P(C = c) / P(x),
where P(x) is a constant for a given x.
The joint probability P(x | C) is not easily estimated from the dataset; an assumption about the distribution has to be made, e.g., are the attributes dependent or independent?
Background
Naïve Bayesian Classifiers (NB)
Assumption: given the class label C, the attributes x_1, ..., x_m are independent: P(x_1, ..., x_m | C) = Π_i P(x_i | C).
Classification mapping function: f(x) = argmax_c P(c) Π_i P(x_i | c).
Related Work
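The NB factorization above can be sketched with plain counting over discrete attributes. This is a minimal illustrative sketch, not code from the paper; the function names and the add-one smoothing choice are assumptions:

```python
from collections import Counter, defaultdict

def train_nb(X, y):
    """Estimate P(c) and the per-attribute counts for P(x_i | c)."""
    class_counts = Counter(y)
    # cond[c][i] counts the values of attribute i within class c
    cond = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(X, y):
        for i, v in enumerate(x):
            cond[c][i][v] += 1
    return class_counts, cond

def predict_nb(x, class_counts, cond, values_per_attr=2):
    """argmax_c P(c) * prod_i P(x_i | c), with add-one smoothing."""
    n = sum(class_counts.values())
    best_c, best_p = None, -1.0
    for c, nc in class_counts.items():
        p = nc / n
        for i, v in enumerate(x):
            p *= (cond[c][i][v] + 1) / (nc + values_per_attr)
        if p > best_p:
            best_c, best_p = c, p
    return best_c
```

Because each attribute is counted independently per class, training is a single pass over the data and prediction is a product of m one-dimensional estimates.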
Related Work
Naïve Bayesian Classifiers
NB's performance is comparable with some state-of-the-art classifiers even though its independency assumption does not normally hold in practice.
Question: can the performance be better when the conditional independency assumption of NB is relaxed?
Semi-Naïve Bayesian Classifiers (SNB)
A looser assumption than NB: independency occurs among the joined variables, given the class label C, i.e., P(x | C) = Π_j P(B_j | C), where each B_j is a joined variable (a group of combined attributes).
Related Work
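Once the joined variables are fixed, the SNB factorization can be sketched the same way as NB, treating each group of attribute indices as one compound attribute. The partition is supplied by hand here (learning it is the subject of the paper); the names and the unsmoothed estimates are illustrative assumptions:

```python
from collections import Counter, defaultdict

def train_snb(X, y, groups):
    """groups: a partition of attribute indices, e.g. [(0, 1), (2,)].
    Each group is one compound attribute; P(group values | c) is
    estimated by counting the joint value tuple (no smoothing here)."""
    class_counts = Counter(y)
    cond = defaultdict(Counter)  # (class, group index) -> value-tuple counts
    for x, c in zip(X, y):
        for j, g in enumerate(groups):
            cond[(c, j)][tuple(x[i] for i in g)] += 1
    return class_counts, cond

def predict_snb(x, class_counts, cond, groups):
    """argmax_c P(c) * prod_j P(B_j values | c)."""
    n = sum(class_counts.values())
    best_c, best_p = None, -1.0
    for c, nc in class_counts.items():
        p = nc / n
        for j, g in enumerate(groups):
            p *= cond[(c, j)][tuple(x[i] for i in g)] / nc
        if p > best_p:
            best_c, best_p = c, p
    return best_c
```

On XOR-style data (class = x0 XOR x1), the per-attribute NB factorization is uninformative, while the joined variable (x0, x1) separates the classes; this is the kind of behavior the synthetic "XOR" dataset in the experiments probes.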
Related Work
Chow-Liu Tree (CLT)
Another looser assumption than NB: a dependence tree exists among the variables, given the class variable C.
[Figure: a tree dependence structure]
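Chow and Liu's construction can be sketched as pairwise empirical mutual information plus a maximum-weight spanning tree. This is a generic Python sketch of their 1968 algorithm (Kruskal's method with union-find), not code from this paper:

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete sample columns."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def chow_liu_tree(data):
    """data: list of sample tuples.  Returns the edges of the maximum
    spanning tree under pairwise mutual information (Kruskal + union-find)."""
    m = len(data[0])
    cols = list(zip(*data))
    edges = sorted(((mutual_information(cols[i], cols[j]), i, j)
                    for i, j in combinations(range(m), 2)), reverse=True)
    parent = list(range(m))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

Because the heaviest edges are tried first and cycles are rejected, the result maximizes total mutual information over all trees, which is what makes the CLT construction globally optimal in polynomial time.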
Summary of Related Work
CLT: a conditional tree dependency assumption among variables; Chow & Liu (1968) developed a globally optimal algorithm with polynomial time cost.
SNB: a conditional independency assumption among joined variables; traditional SNBs are not as well developed as CLT.
Problems of Traditional SNBs
Both Kononenko91 and Pazzani96 rely on local heuristics and are neither efficient nor accurate: Kononenko91 is inefficient even when joining 3 variables, and Pazzani96 has exponential time cost.
Our Novel Bounded Semi-Naïve Bayesian Network
Accurate? We use a global combinatorial optimization method.
Efficient? We find the network based on Linear Programming,
which can be solved in polynomial time.
Bounded Semi-Naïve Bayesian Network Model Definition
Joined variables completely cover the variable set without overlapping.
Conditional independency holds among the joined variables, given the class label.
The cardinality of each joined variable is bounded.
Constraining the Search Space
The large search space is reduced by adding the following constraint: the cardinality of each joined variable is exactly equal to K.
Hidden principle: when K is small, a joined variable of cardinality K will be more accurate than separating its attributes into several smaller joined variables. Example: P(a,b)P(c,d) is closer to P(a,b,c,d) than P(a,b)P(c)P(d).
Search space after reduction:
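The hidden principle can be checked numerically: for any joint distribution P over (a, b, c, d), KL(P || P(a,b)P(c,d)) never exceeds KL(P || P(a,b)P(c)P(d)), and the gap equals the mutual information I(c; d). A small sketch with an illustrative hand-made distribution in which c and d are correlated:

```python
from itertools import product
from math import log, prod

# An illustrative joint distribution over four binary variables (a, b, c, d)
# in which c and d are correlated.  The weights are arbitrary.
P = {}
for a, b, c, d in product((0, 1), repeat=4):
    P[(a, b, c, d)] = 3.0 if c == d else 1.0
Z = sum(P.values())
P = {k: v / Z for k, v in P.items()}

def marginal(P, idx):
    """Marginal distribution of the variables at positions idx."""
    m = {}
    for k, p in P.items():
        key = tuple(k[i] for i in idx)
        m[key] = m.get(key, 0.0) + p
    return m

def kl_to_product(P, groups):
    """KL(P || prod_g P_g) for a partition `groups` of the indices."""
    margs = [marginal(P, g) for g in groups]
    return sum(p * log(p / prod(m[tuple(k[i] for i in g)]
                                for m, g in zip(margs, groups)))
               for k, p in P.items())
```

Here kl_to_product(P, [(0, 1), (2, 3)]) is smaller than kl_to_product(P, [(0, 1), (2,), (3,)]): keeping the correlated pair (c, d) inside one joined variable gives the better approximation.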
Searching the K-Bounded-SNB Model
How to search for the appropriate model? Find the m = [n/K] K-cardinality subsets (joined variables) from the variable (feature) set which satisfy the SNB conditions and maximize the log likelihood. ([x] means rounding x to the nearest integer.)
Global Optimization Procedure
Constraints: no overlap among the joined variables, and all the joined variables together form the full variable set.
Relax the previous 0/1 constraints into 0 ≤ x ≤ 1: the integer programming (IP) problem is changed into a linear programming (LP) problem, which can be solved in polynomial time.
Rounding scheme: round the LP solution back into an IP solution.
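The selection objective can be illustrated with a brute-force stand-in: score each K-subset by its empirical log likelihood and keep the partition maximizing the total. The paper solves this via the LP relaxation and rounding scheme above; the exhaustive search below is only a small-n sketch of the same objective, with illustrative names:

```python
from collections import Counter
from itertools import combinations
from math import log

def subset_log_likelihood(data, subset):
    """Log-likelihood contribution of treating `subset` as one joined
    variable: sum over samples of log P_hat(values of subset)."""
    n = len(data)
    counts = Counter(tuple(x[i] for i in subset) for x in data)
    return sum(c * log(c / n) for c in counts.values())

def best_k_partition(data, K):
    """Brute-force stand-in for LP relaxation + rounding: try every
    partition of the attributes into K-subsets and keep the one with
    maximum total log likelihood.  Feasible only for small n."""
    m = len(data[0])
    assert m % K == 0
    best = (float("-inf"), None)
    def rec(remaining, chosen, score):
        nonlocal best
        if not remaining:
            if score > best[0]:
                best = (score, list(chosen))
            return
        first = remaining[0]  # fix the first free attribute to avoid duplicates
        for rest in combinations(remaining[1:], K - 1):
            g = (first,) + rest
            left = tuple(i for i in remaining if i not in g)
            rec(left, chosen + [g], score + subset_log_likelihood(data, g))
    rec(tuple(range(m)), [], 0.0)
    return best[1]
```

On data where attributes 0/1 and 2/3 are pairwise dependent, the search recovers the partition {(0, 1), (2, 3)}, since grouping dependent attributes raises the log likelihood.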
Rounding Scheme
Experimental Setup
Datasets: 6 benchmark datasets from the UCI machine learning repository, plus 1 synthetically generated dataset named "XOR".
Experimental environment: platform Windows 2000; developing tool Matlab 6.1.
Experimental Results
Overall Prediction Rate (%)
• We set the bound parameter K to 2 and 3.
• 2-BSNB denotes the BSNB model with the bound parameter set to 2.
Experimental Results
[Figure: Average Error Rate Chart, comparing the error rates (0 to 0.3) of NB, CLT, 2-BSNB, and 3-BSNB]
Results on Tic-Tac-Toe Dataset
[Figure: the 9 attributes of the Tic-Tac-Toe dataset, laid out as a 3×3 board numbered 1-9]
Observations
B-SNBs with a large K are not good for sparse datasets. Post dataset: 90 samples; with K = 3, the accuracy decreases.
Which value of K is good depends on the properties of the dataset. For example, Tic-Tac-Toe and Vehicle have a 3-variable bias; with K = 3, the accuracy increases.
Discussion
When n cannot be divided by K exactly, i.e., (n mod K) = l with l ≠ 0, the assumption that all joined variables have the same cardinality K is violated. Solution: find an l-cardinality joined variable with minimum entropy, then do the optimization on the other n − l variables, since ((n − l) mod K) = 0.
How to choose K? When the number of samples in the dataset is small, a large K may not give good performance. A good K should be related to the nature of the dataset.
How to relax SNB? SNB is still strongly constrained; upgrade it into a mixture of SNBs.
Conclusion
A novel Bounded Semi-Naïve Bayesian classifier is proposed.
A direct combinatorial optimization method enables B-SNB to achieve a globally optimal structure.
The transformation from an IP problem into an LP problem reduces the computational complexity to polynomial time.
It outperforms NB and CLT in our experiments.