An improved hierarchical partitioning fuzzy approach to pattern classification
A dissertation submitted to The University of Manchester
for the degree of MSc Information Systems Engineering
in the Faculty of Engineering and Physical Sciences
2008
Han Ding
School of Computer Science
LIST OF CONTENTS

LIST OF FIGURES AND TABLES
ABSTRACT
DECLARATION
COPYRIGHT
ACKNOWLEDGEMENT
1. Introduction
   1.1 Pattern classification
   1.2 The problem
   1.3 Overview of dissertation
2. Background
   2.1 Statistical approaches
   2.2 Neural network approaches
   2.3 Structural approaches
   2.4 Fuzzy approaches
   2.5 Comparison
3. Research Method
   3.1 Hierarchical overlapping fuzzy approach
       3.1.1 Initial Input Partitioning
       3.1.2 Fuzzy Rules Generation
       3.1.3 Fuzzy Inference Process
   3.2 Improvements
       3.2.1 Euclidean distance calculation in assigning rules
       3.2.2 Tuning slopes
4. Implementation
   4.1 Implementation background and software environment
   4.2 Programme procedure and operating results
5. Evaluation
   5.1 Training and testing data issue
   5.2 Iris dataset
   5.3 Wisconsin Breast Cancer dataset
   5.4 Comparisons with other methods and analysis
6. Conclusion and Future work
References
Appendix 1: Sample source code
Appendix 2: A sample separation of Iris dataset

Final word count: 12953
LIST OF FIGURES AND TABLES

Figure 1: A typical pattern classification system
Figure 2: A mapping x → y from measurement space X to decision space Y
Figure 3: An artificial neuron
Figure 4: The multi-layer perceptron
Figure 5: An example of structural patterns
Figure 6: An iterative partitioning of the overlapping area
Figure 7: Hierarchy of the generated hyperboxes
Figure 8: Class boundaries and a membership function of a 2-dimensional hyperbox
Figure 9: Class boundaries and membership functions of 2 overlapping hyperboxes
Figure 10: Modified class boundaries and membership functions
Table 1: Classification results for the first configuration
Table 2: Rules generated from the Iris dataset
Table 3: More results for other configurations on the Iris dataset
Table 4: Rules generated from the Wisconsin Breast Cancer dataset
Table 5: Results for all configurations of the Wisconsin Breast Cancer dataset
Table 6: Comparative results of several classification systems on the Iris dataset
Table 7: Comparative results of systems on the Wisconsin Breast Cancer dataset
ABSTRACT
Pattern classification has become an essential element in a wide variety of fields, such as engineering control and medical diagnosis. There are numerous approaches to classification, and each has proved effective in certain cases. Nevertheless, a more general, accurate and efficient method is still desirable, and fuzzy logic approaches have been applied successfully in this area.
In this dissertation, a pattern classification system is realised based on an improved hierarchical partitioning fuzzy approach, initially proposed by I. Gadaras and L. Mikhailov [9]. The approach can extract classification rules directly from numerical data, and it focuses on achieving high accuracy at low computational cost. A meaningful input partitioning technique for overlapping areas and some adjustments to the membership functions are highlighted.
A pattern classification system based on the proposed fuzzy methodology is implemented in Java with a JDBC-accessed database. The system is evaluated on the Fisher Iris dataset and the Wisconsin Breast Cancer dataset, which are widely used benchmarks of classification performance. Comparative results are analysed in detail, with critical conclusions and suggestions for future work.
Keywords: pattern classification, improved hierarchical partitioning, fuzzy approach.
DECLARATION

No portion of the work referred to in this dissertation has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.
COPYRIGHT

The ownership of any intellectual property rights which may be described in this
dissertation is vested in the University of Manchester, subject to any prior agreement to the
contrary, and may not be made available for use by third parties without the written
permission of the University, which will prescribe the terms and conditions of any such
agreement.
ACKNOWLEDGEMENT

I would like to take this opportunity to thank my supervisor, Dr. Ludmil Mikhailov, for his professional and amiable guidance, and Ioannis Gadaras, for his inspiring and constant support. I would also like to thank my parents, Sophie, and everyone who helped me during this year.
1. Introduction
In this chapter, the basic concepts of pattern classification are introduced, followed by a general description of the problem. Finally, the structure of the dissertation is outlined, with a brief introduction to each of the following chapters.
1.1 Pattern classification
Pattern classification is a vital human ability. A person can identify whether an animal is a cat or a horse from its shape, size and behaviour. This process, from receiving visual information to judging the species, is exactly pattern classification.

Since artificial intelligence emerged in the 1950s, people have tried to give computers this ability. Within a decade, pattern classification had become a new discipline and developed rapidly. Nowadays, pattern classification aims to classify real-world data as accurately and efficiently as possible. It covers a wide range of information processing research and is applied in fields such as voice recognition, fingerprint detection and disease diagnosis [1]. Consequently, pattern classification is no longer solely a subject of computer science; it involves diverse fields including cybernetics, linguistics and biology.
[Figure omitted: External Signals → Data Acquisition → Captured Data → Feature Extractor → Feature Vector → Classifier → Class Indices → Output Device.]
Figure 1: A typical pattern classification system
A typical four-operator pattern classification system is illustrated in Figure 1 [2]. Initially, External Signals are captured by the Data Acquisition component and transformed into a form that can be understood by the next operator in the system. Because of the huge amount of data captured, it is both difficult and unnecessary to process all of the information; the Feature Extractor therefore takes the responsibility of condensing the information and discarding unimportant data. It converts the useful information into a multi-dimensional Feature Vector and passes it to the Classifier. The Classifier assigns each input to a specific class on the basis of the received Feature Vector and produces Class Indices for the Output Device. Finally, the classification result is displayed or further processed by the Output Device.
A concrete example demonstrates this process. In a speaker recognition system, the physical sound wave is the External Signal. The Data Acquisition component, a microphone, receives the wave and converts it into digital format for further processing; ambient noise is also filtered out at this stage. These digital data (the Captured Data) are converted by a Feature Extractor into a Feature Vector containing features such as frequency and volume, which provides quantitative information for the Classifier. The Classifier here can be a computer that stores personal records, algorithms and classification rules. It recognises the person speaking and passes commands (Class Indices) to the display (Output Device) so that the name of the person is shown on the screen.
1.2 The problem
Feature extractor and classifier are two major focuses of pattern classification systems.
Either of these two elements could influence capacity of the system: if one of them
performs strongly enough, the other one can be entirely ignored. However, the
performance of feature extraction usually relays on the specific application, and that is
relatively difficult to improve generally. Therefore, more research now is focused on
improving the classifier. This dissertation is also trying to contribute improvements to the
classifier.
A good classifier aims to produce highly accurate pattern classification result with as less
expensiveness as possible. However in the past, it seems very difficult to achieve at the
same time. Some approaches provided an excellent accuracy but generated a great amount
of rules, which was costly and time consuming. On the other hand, some algorithms
achieved simple and fast, at the cost of low precision.
Therefore, the problem is how to create a novel method for the classifier that strikes an optimum balance between accuracy and cost.
1.3 Overview of dissertation
In Chapter 2, the background and the different methods for tackling the pattern classification problem are briefly presented. The chapter explains the principles of several familiar classification approaches, including statistical approaches, structural approaches and neural network approaches. Fuzzy approaches are discussed in more detail, with a comparative analysis of these methodologies.
Chapter 3 illustrates the major methodology of this dissertation, which is inspired by an overlapping fuzzy classification approach [9]. Detailed descriptions of the main processes can be found there: initial input partitioning, fuzzy rule generation and the fuzzy inference process. Novel theoretical improvements are highlighted next, including the use of Euclidean distance calculation and the tuning of membership function slopes to enhance classification performance.
Chapter 4 covers the implementation details, from the development techniques and environment to the programming procedure. It explains step by step how the programmed system implements the proposed approach.
In Chapter 5, evaluations are carried out on two popular pattern classification testing datasets: the Fisher Iris dataset and the Wisconsin Breast Cancer dataset. An important discussion about the selection of training and testing data is presented first. After a brief introduction to each dataset, the evaluation results follow, with comparisons of different classification methods and critical analysis.
Chapter 6 concludes the dissertation and emphasises its significant achievements. Current problems are also investigated, with suggestions for several directions of future research.
2. Background
A great amount of effort has been devoted to pattern classification research. Many creative methods have been suggested, and some have proved quite effective in certain cases. Currently, a classification system usually employs one of the following approaches: statistical (or decision-theoretic) approaches, neural network approaches, structural (or syntactic) approaches and fuzzy approaches. In this chapter, we briefly introduce each of them and explain fuzzy approaches in more detail, since this is the approach applied in this dissertation.
2.1 Statistical approaches
Pattern classification covers a wide range of problems, and it is hard to find a single unified approach. However, statistical decision and estimation are regarded as fundamental to the discipline of pattern classification [3].
[Figure omitted: clusters of points labelled A, B and C in measurement space X, each cluster mapped to a single point Ay, By, Cy, ..., Ny in decision space Y.]
Figure 2: A mapping x → y from measurement space X to decision space Y
Statistics is a mathematical method for summarising a collection of data; it describes the frequency with which variations appear. Statistical approaches use algorithms to analyse the probability of an input belonging to a certain class.

An example is illustrated in Figure 2. Each input corresponds to a point in the multi-dimensional measurement space X. Inputs that belong to the same class lie close together and map to one class in the decision space Y. The mapping process tries to link each point to the correct class y in Y; the best mapping is the one that gives the maximum recognition rate.
The accuracy of this approach actually relies on the natural distribution of the inputs. That is to say, if the inputs belonging to one class are loosely scattered, the accuracy suffers. To solve this problem, additional knowledge or information about the distribution is required.
2.2 Neural network approaches
The inspiration for neural networks came from observations in biology. An artificial neural network, imitating its biological counterpart, consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. Each basic element operates at a very low level, and the only thing a single neuron can output is a Boolean value. The network connects a great number of these neurons through input and output weights, which can change dynamically during the learning process.
[Figure omitted: inputs x1, x2, ..., xm are multiplied by weights w1, w2, ..., wm, summed, and passed through an activation function f(x) to produce the output.]
Figure 3: An artificial neuron
Neural network approaches are widely used and effective for solving pattern classification problems. They provide a new suite of nonlinear algorithms, as well as a vehicle for implementing existing feature extraction and classification algorithms efficiently. In spite of the different underlying principles, nearly all well-known neural network models are actually similar to classical statistical pattern classification methods [4].

The power of a neural network is greatly improved by introducing a hidden layer (shown in Figure 4). The hidden layer receives output from the input layer and produces adjusted output for the final layer, which makes the decisions. This is what allows the network to overcome the difficulty of problems that are not linearly separable.
[Figure omitted: inputs x1, x2, ..., xm enter the input layer, pass through input weights w to the hidden layer h, and through output weights to the output layer, producing outputs y1, ..., ym.]
Figure 4: The multi-layer perceptron
The strong point of neural network approaches is their learning ability. With adaptable weight functions and many available algorithms, they also have great potential for parallelism, because each computing element operates independently. There are, however, problems. Firstly, a neural network is unable to extract rules automatically from its neurons to form an interpretable system. Secondly, the relatively long learning time does affect the performance of pattern classification. Thirdly, a trained network is very difficult to analyse and study for future improvements.
2.3 Structural approaches
Some complex patterns are more suitable to approach from a different perspective, especially when a pattern can be viewed as a combination of simple sub-patterns [5]. Structural approaches assume that the pattern structure is quantifiable and that the problem can be solved by hierarchically decomposing the original structure into several manageable parts. The approach describes how the given pattern is constructed.
Figure 5 gives a good example of a structural approach. There are two patterns, each containing several strings. It can easily be identified that all strings in Pattern A keep the format AB^nC, n ≥ 0, while those in Pattern B have the format ABC^n, n ≥ 0. Having abstracted these rules, it is possible to determine that a string belongs to a specific pattern, although strings like ABC could belong to both patterns.
[Figure omitted: Pattern A contains strings such as AC, ABC, ABBC and ABBBC; Pattern B contains strings such as AB, ABC, ABCC and ABCCC.]
Figure 5: An example of structural patterns
2.4 Fuzzy approaches
Fuzzy approaches originate from fuzzy logic, a concept created by Professor L. A. Zadeh in 1965. Although it faced fierce criticism and argument at the beginning, it has eventually been recognised, and its applications have contributed to various fields. In the past, scientists widely applied probability theory to describe uncertainty; that theory, however, only allows a statement to take one of two distinct truth values, 0 or 1, and sometimes that is not enough. For example, when considering whether a person is old or young, assuming "old" means more than 70 and "young" means less than 50, how can we describe a person whose age is 60? Obviously, it is inappropriate to use probability in this case.
By introducing the membership function, fuzzy theory is able to handle such issues properly. Professor Zadeh extended the description of values from only "0" and "1" to the continuous interval [0, 1] (the fuzzy set). If an element cannot possibly belong to a class, its membership of that class is 0; on the contrary, if an element definitely belongs to the class, its membership is 1. Consequently, a membership between 0 and 1 describes the degree of belonging [6].
Fuzzy rules consist of an antecedent, such as "If x is in A", and a consequence, such as "Then y belongs to B"; for example, "if the person is old, then his acuity is poor". The difference between fuzzy rules and crisp rules is that in crisp rules the antecedent and consequence can only be true or false, whereas in fuzzy rules they can take any value between 0 and 1; for example, the degree of "old" could be 0.8. By acquiring these IF-THEN rules as operators, a direct relationship from input elements to pattern classes can be realised.
There are many kinds of fuzzy classification approaches; two of the most common are Wang and Mendel's method and Abe and Lan's method. Wang and Mendel's method [13], introduced in their 1992 paper, contains five steps: first, it divides the input and output spaces into fuzzy regions; second, it generates rules from the given numerical data; third, a degree of membership is assigned to each generated rule in order to resolve conflicts between intersecting rules; fourth, a Fuzzy-Associative-Memory (FAM) bank is created from the generated rules and from linguistic rules supplied by human experts; finally, a mapping from the input space to the output space is achieved through the FAM bank as the defuzzifying procedure. Wang and Mendel's method has proved effective and accurate at approximation; however, it requires an initial partition of the input space and a priori knowledge, which are hard for a computer to accomplish automatically.
In 1995, Shigeo Abe and Ming-Shong Lan suggested a method that can extract rules directly from numerical data [14]. This method groups the training input data into activation hyperboxes corresponding to their output patterns. If there is an intersection between hyperboxes containing inputs of more than one class, lower-level inhibition hyperboxes are formed recursively until no overlaps remain. This method needs neither an initial partition nor prior knowledge; however, the partitioning process finds it difficult to remove all overlaps entirely, and in certain cases some of the partitioning may be irrelevant and redundant.
Another fuzzy classification method is the hierarchical overlapping fuzzy approach proposed recently by I. Gadaras and L. Mikhailov [9]. The partitioning procedure of this approach is very similar to Abe and Lan's method, but it creatively provides termination criteria for the recursive partitioning, which avoid meaningless clustering. This approach inherits the advantages of Abe and Lan's method while overcoming its problems, considering both the accuracy and the number of generated rules. The fuzzy system based on this approach has also been carefully evaluated on a variety of testing datasets; comparison of its results with those of other fuzzy approaches showed that it achieves fairly good accuracy at a relatively low generation cost.
2.5 Comparison
Each approach has its advantages and disadvantages and may be suitable for certain cases. Moreover, it is also possible to combine the merits of several approaches in order to maximise performance. Fuzzy theory and neural networks have been particularly popular in recent years, as they have shown great performance and potential in solving pattern classification problems.
Statistical approaches are fundamental and simple. Given a limited amount of information, they may solve the problem directly, without needing to provide a more general solution as an intermediate step [7]. However, as mentioned above, their accuracy depends heavily on the natural distribution of the data, and if additional information is not accessible, the classification result can be severely affected.
Structural approaches can deal efficiently with obviously syntactic patterns, but they may cause a combinatorial explosion of possibilities to investigate, requiring huge training sets and large computational efforts [8]. Difficulties are also encountered in the segmentation of noisy patterns and other interference.
The major advantages of neural network approaches are their learning ability and accuracy, as recent research has proven. Their drawbacks are the time-costly learning process and the inconvenience of rule extraction and analysis.
Fuzzy approaches, on the other hand, offer more outstanding benefits. Because they are closer to human logic, the inner operations of a fuzzy system can be clearly understood, which enables further improvement. Meanwhile, problems that are hard to model mathematically can also be handled by fuzzy approaches. Fuzzy approaches are not perfect, however, since it is difficult to formalise the process of rule generation; that is, the rules and benchmarks in the system may require human expertise, something inevitable in any research related to artificial intelligence. Nevertheless, numerical problems give fuzzy approaches a great platform on which to prove their ability. By empowering computers to work in a way very similar to human thinking, fuzzy approaches can achieve accuracy and efficiency simultaneously. A fuzzy approach has therefore been selected for this dissertation, one that focuses on extracting fuzzy rules directly from numerical data.
3. Research Method
This dissertation involves the realisation of a pattern classification system using fuzzy theory. As mentioned above, many different fuzzy approaches already exist; the one we employ for this classification system is the hierarchical overlapping fuzzy approach [9]. It is a novel approach that automatically extracts fuzzy rules from labelled numerical data, with a meaningful input partitioning method and a hierarchical fuzzy rule structure.

This chapter starts by describing the selected fuzzy approach in detail. After that, we suggest theoretical improvements and adjustments, and then discuss their possible effects.
3.1 Hierarchical overlapping fuzzy approach
The hierarchical overlapping fuzzy approach has three stages: initial input partitioning, fuzzy rule generation and the fuzzy inference process. It divides the training input vectors into many regions, each assigned a single class. After a rule is generated for each region, the region boundaries are expanded by fuzzy inference. When classifying, the system automatically assigns each input to a specific region according to its relative location and its membership in each dimension, and then executes the corresponding rule.
3.1.1 Initial Input Partitioning
In the training stage, assume that there is a set of input-output data pairs

[(x_1^(1), x_2^(1), ..., x_m^(1); y^(1)), (x_1^(2), x_2^(2), ..., x_m^(2); y^(2)), ..., (x_1^(N), x_2^(N), ..., x_m^(N); y^(N))],

where x_k^(j) is the k-th attribute of the input vector x and y^(j) is the output class of the j-th data pair, j = 1, 2, 3, ..., N. The data pairs are labelled with the different classes to which they belong. The target is to build single-class regions by recursively partitioning the training input-output data pairs.
A hyperbox A_i for the i-th class, containing all of its paired inputs X_i, can be created from the minimum value v_ik and maximum value V_ik in each k-th dimension. For all x ∈ X_i we have

A_i = { x ∈ X_i | v_ik ≤ x_k ≤ V_ik, k = 1, 2, ..., m }.
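As a minimal illustration of this construction, the bounds of a hyperbox can be computed by a per-dimension minimum/maximum scan over the training inputs of one class. The following Java sketch is illustrative only, not the dissertation's actual code, and its class and method names are hypothetical:

final class Hyperbox {
    final double[] min;  // v_ik for k = 0..m-1
    final double[] max;  // V_ik for k = 0..m-1

    // Build the smallest box containing every input vector of one class.
    Hyperbox(double[][] classInputs) {
        int m = classInputs[0].length;
        min = new double[m];
        max = new double[m];
        java.util.Arrays.fill(min, Double.POSITIVE_INFINITY);
        java.util.Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] x : classInputs) {
            for (int k = 0; k < m; k++) {
                if (x[k] < min[k]) min[k] = x[k];
                if (x[k] > max[k]) max[k] = x[k];
            }
        }
    }

    // True if v_ik <= x_k <= V_ik in every dimension.
    boolean contains(double[] x) {
        for (int k = 0; k < x.length; k++) {
            if (x[k] < min[k] || x[k] > max[k]) return false;
        }
        return true;
    }
}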
Hyperboxes may include data from other classes and consequently create overlapping areas with other hyperboxes. If that is the case, a new hyperbox A_ij = A_i ∩ A_j is created for their intersection. The overlapping area could, of course, also be created by three different classes, A_ijk = A_i ∩ A_j ∩ A_k, but here we discuss only the case of two overlapping hyperboxes, for brevity and clarity.
For these overlapping hyperboxes, a recursive algorithm is suggested for partitioning the input space. At the l-th iteration, the overlapping hyperbox A_ij^l is partitioned further into A_i^(l+1) and A_j^(l+1) if the termination requirements are not met.
[Figure omitted: at each iteration, the overlap A_ij^l of hyperboxes A_i^l and A_j^l is repartitioned into A_i^(l+1), A_j^(l+1) and a smaller overlap A_ij^(l+1).]
Figure 6: An iterative partitioning of the overlapping area
As the iteration shown in Figure 6 proceeds, this process forms the hierarchical hyperboxes A_i^0, A_i^1, A_i^2, ..., A_i^L for class i, where L is the depth of the recursion (Figure 7).
[Figure omitted: a tree rooted at A; at each level l the overlap A_ij^l splits into A_i^(l+1), A_ij^(l+1) and A_j^(l+1), down to level L with A_i^L, A_ij^L and A_j^L.]
Figure 7: Hierarchy of the generated hyperboxes
As mentioned above, the iteration stops when the termination requirements are reached. The first criterion is to stop when the number of input data in the overlapping area A_ij^l is relatively small. This index can be expressed as

R_ij^l = D(A_i^l ∩ A_j^l) / D(A_i^l ∪ A_j^l),

where D(A_i^l ∩ A_j^l) is the number of inputs in the overlapping area and D(A_i^l ∪ A_j^l) is the number of all inputs in A_ij^(l−1). If the value of R_ij^l is close to 0, the number of inputs in the overlapping area must be very small, and further partitioning seems meaningless. The first termination parameter Th1 can be set by the user: when R_ij^l < Th1, the partitioning of A_ij^l is stopped and the region is marked with one of the classes.
If R_ij^l > Th1, the second criterion is activated. It checks whether one class obviously dominates the overlapping area. This index can be calculated as

S_ij^l = (D_i(A_i^l ∩ A_j^l) − D_j(A_i^l ∩ A_j^l)) / D(A_i^l ∩ A_j^l),

where D(A_i^l ∩ A_j^l) is the total number of inputs in the intersection area and D_i(A_i^l ∩ A_j^l) − D_j(A_i^l ∩ A_j^l) is the difference between the numbers of inputs of classes i and j in this area. The value of S_ij^l should be a number between 0 and 1: the closer S_ij^l is to 1, the more one class dominates the area and the less meaningful it is to continue partitioning. The second termination parameter Th2 can also be set by the user: when S_ij^l > Th2, the partitioning of A_ij^l is stopped and the region is marked with one of the classes.
Eventually, the whole input space is divided into regions, each of which belongs to a single class.
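To make the two termination criteria concrete, the sketch below computes R_ij^l and S_ij^l from simple point counts and tests them against Th1 and Th2. It is a hedged illustration under the definitions above, not the original implementation; all names are assumptions:

static boolean shouldStopPartitioning(int countI, int countJ, int countParent,
                                      double th1, double th2) {
    // countI, countJ: inputs of classes i and j inside the overlap A_ij^l;
    // countParent: all inputs in the parent region A_ij^(l-1).
    int countOverlap = countI + countJ;                            // D(A_i^l ∩ A_j^l)
    double r = (double) countOverlap / countParent;                // R_ij^l
    if (r < th1) return true;                                      // overlap too small to matter
    double s = Math.abs(countI - countJ) / (double) countOverlap;  // S_ij^l, in [0, 1]
    return s > th2;                                                // one class clearly dominates
}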
3.1.2 Fuzzy Rules Generation
After input partitioning, fuzzy rules can be generated for each region. For example, if a hyperbox A_i^l only has data from one class Y_i, rule (1) is generated:

IF x is in A_i^l THEN y is in Y_i    (1)

If hyperbox A_i^l contains an overlapping area A_ij^l, rule (2) is generated:

IF x is in A_i^l AND x is not in A_ij^l THEN y is in Y_i    (2)
Two additional rules are generated for an overlapping hyperbox A_ij^l, since such a hyperbox normally contains inputs of two or more classes. As described above, two criteria can stop the partitioning; when the partitioning is terminated, whether by the tiny number of inputs or by the obvious domination of one class in the overlapping area, these areas are trivial for the overall accuracy. Consequently, without significant influence on accuracy, two simple but meaningful rules are introduced:

IF x is in A_ij^l THEN y is in Y_i when w_i > w_j, OR y is in Y_j when w_i < w_j,
where w_i = D_i(A_ij^l) / D(A_ij^l) and w_j = D_j(A_ij^l) / D(A_ij^l).    (3)
However, if w_i is exactly equal to w_j, a distance-based rule is applied. This rule calculates the Euclidean distances d_i and d_j from x to the centroids C_i and C_j of the two classes Y_i and Y_j, where a centroid C is calculated as the arithmetic mean in each dimension. For instance, if the overlapping area contains N points x^(p) = (x_1^(p), x_2^(p), ..., x_m^(p)), p = 1, 2, ..., N, that belong to class Y_i, then the centroid of class i is

C_i = (x_c1^i, x_c2^i, ..., x_cm^i), where x_ck^i = (x_k^(1) + x_k^(2) + ... + x_k^(N)) / N, k = 1, 2, ..., m.
Rule (4) is then:

IF x is in A_ij^l THEN y is in Y_i when d_j > d_i, OR y is in Y_j when d_i > d_j, where

d_i = sqrt((x_1 − x_1^Ci)^2 + (x_2 − x_2^Ci)^2 + ... + (x_m − x_m^Ci)^2),
d_j = sqrt((x_1 − x_1^Cj)^2 + (x_2 − x_2^Cj)^2 + ... + (x_m − x_m^Cj)^2).    (4)
After the rule generation process, every partitioned region is assigned a specific fuzzy rule. To achieve more accuracy and flexibility, fuzzy inference is applied, since it guarantees that the most appropriate rule is executed for a new input.
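The centroid and distance calculations behind rule (4) can be sketched directly. The Java fragment below is an illustration with hypothetical names, not the system's code; it computes the per-dimension means and assigns the input to the class with the nearer centroid:

// Arithmetic mean of the points in each dimension: the centroid C.
static double[] centroid(double[][] points) {
    int m = points[0].length;
    double[] c = new double[m];
    for (double[] p : points)
        for (int k = 0; k < m; k++) c[k] += p[k];
    for (int k = 0; k < m; k++) c[k] /= points.length;
    return c;
}

// Euclidean distance from input x to a centroid c.
static double euclidean(double[] x, double[] c) {
    double sum = 0.0;
    for (int k = 0; k < x.length; k++) {
        double d = x[k] - c[k];
        sum += d * d;
    }
    return Math.sqrt(sum);
}

// Rule (4): pick class i if d_i <= d_j, otherwise class j.
static int distanceRule(double[] x, double[] centroidI, double[] centroidJ,
                        int classI, int classJ) {
    return euclidean(x, centroidI) <= euclidean(x, centroidJ) ? classI : classJ;
}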
3.1.3 Fuzzy Inference Process
For the reasons mentioned before, the membership function of fuzzy theory is introduced. To determine which rule is executed, the degree of membership to each region needs to be investigated. After the partitioning is complete, if an input point lies inside a hyperbox in some dimension, its membership of that hyperbox in that dimension is 1; the membership decreases from 1 to 0 as the point moves away from the boundaries of the region. The fuzzy area around a hyperbox is called the "generalisation area", and if an input falls inside this area its membership of the hyperbox lies between 0 and 1, indicating that it partially belongs to the hyperbox. The membership function can be represented as a trapezoid, describing full membership by its upper base and gradually decreasing membership by its slopes. This expression of membership, employed in [10] and [11], is now widely used.
If a hyperbox A_i^l does not overlap with other hyperboxes, its membership function can be defined by the equation

m_i^l(x_k) = min{ [1 − max(0, min(1, γ_k (x_k − V_ik^l)))], [1 − max(0, min(1, γ_k (v_ik^l − x_k)))] },    (a)

where γ_k is the sensitivity parameter for the k-th dimension (or attribute).
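Equation (a) translates almost directly into code. The sketch below is a hedged transcription, not taken from the implementation:

// Trapezoidal membership: 1 inside [v, V], linear slopes of width 1/gamma outside.
static double membership(double x, double v, double V, double gamma) {
    double left  = 1.0 - Math.max(0.0, Math.min(1.0, gamma * (v - x)));
    double right = 1.0 - Math.max(0.0, Math.min(1.0, gamma * (x - V)));
    return Math.min(left, right);
}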
[Figure omitted: a trapezoidal membership function m_i^l(x_k) over x_k, equal to 1 between v_ik^l and V_ik^l and sloping linearly to 0 at v_ik^l − 1/γ_k and V_ik^l + 1/γ_k; the slopes form the generalisable region.]
Figure 8: Class boundaries and a membership function of a 2-dimensional hyperbox
Figure 8 illustrates the generalisation area and its boundaries, as well as a membership function of a 2-dimensional hyperbox. If the input vector is placed inside hyperbox A_i^l, it has degree of membership m_i^l(x_k) = 1 in each k-th dimension, since x_k satisfies v_ik^l ≤ x_k ≤ V_ik^l, k = 1, ..., m. If the input vector is placed outside A_i^l in some dimensions, but not far away, the degrees of membership in those dimensions are 0 < m_i^l(x_k) < 1, since x_k satisfies v_ik^l − 1/γ_k < x_k < V_ik^l + 1/γ_k and therefore belongs to the generalisation area. If in the k-th dimension x_k ≤ v_ik^l − 1/γ_k or x_k ≥ V_ik^l + 1/γ_k, the degree of membership in that dimension is 0, which means the vector lies outside the generalised region. The sensitivity parameter γ_k is able to enlarge or reduce the area of the generalised region.
This method can also be extended to the situation where two hyperboxes overlap. If a hyperbox A_i^l overlaps with another hyperbox A_j^l, the membership function can be defined by the equation

m_i^l(x_k) = min{ [1 − max(0, min(1, γ_k (v_ik^l − x_k)))], [1 − max(0, min(1, (x_k − V_ik^l) / (1/γ_k + v_jk^l − V_ik^l)))] }.    (b)
[Figure omitted: two overlapping trapezoidal membership functions m_i^l(x_k) and m_j^l(x_k) for hyperboxes A_i^l and A_j^l with overlap A_ij^l; each is 1 between its own bounds v and V and slopes to 0 outside them.]
Figure 9: Class boundaries and membership functions of 2 overlapping hyperboxes
In Figure 9, similarly to Figure 8, generalised regions are produced by both hyperboxes. For the overlapping area, two rules of type (2) are applied, and the min operator is used in each rule. The min operator takes the minimum degree of membership of the fuzzy values for a given input:

d_Ri^l(x) = min_k { m_i^l(x_k) }, k = 1, 2, ..., m,

where d_Ri^l is the degree of membership for executing the rule of the region of class i. The min operator guarantees that the degree of membership for executing a rule equals one only if the input vector lies inside the hyperbox in every dimension. If the input vector is outside the hyperbox or in the overlapping area, the degree of membership for executing the rule must be less than one. If the input vector belongs to two regions that represent different classes, the degrees of membership for executing the rules, d_Ri^l(x) and d_Rj^l(x), can be calculated respectively. If the input vector is placed in the common area of the two hyperboxes, rule (3) or rule (4) is executed. As in the single-hyperbox case, the sensitivity parameter γ_k is able to control the area of the generalised region.
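The min-operator inference can be sketched as follows, reusing the membership function transcribed after equation (a); again, this is an illustration rather than the system's code:

static double firingDegree(double[] x, double[] v, double[] V, double[] gamma) {
    double degree = 1.0;
    for (int k = 0; k < x.length; k++) {
        // The rule fires fully only if x is inside the hyperbox in every dimension.
        degree = Math.min(degree, membership(x[k], v[k], V[k], gamma[k]));
    }
    return degree;
}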
3.2 Improvements
In order to achieve better performance from the method above, adjustments are suggested below on theoretical grounds. Predictions of their possible effects are discussed at the end of this chapter, and all of them are carefully examined and analysed in the evaluation chapter.
3.2.1 Euclidean distance calculation in assigning rules
The hierarchical overlapping fuzzy approach is a very efficient classification method. It has a very effective partitioning procedure and integrates fuzzy theory commendably, considering both the accuracy and the number of generated rules.
This approach deals with the partitioned regions and generates rules in a rational way, primarily based on density judgement. For example, in a region that contains two classes and whose partitioning was stopped by Th1 or Th2, classifying an input vector requires comparing the densities of the two classes in that area: the input is assigned to the class with the greater density. Only if the densities of the two classes are exactly the same is the Euclidean distance-based comparison activated, in which case the input is assigned to the class whose centroid lies at the shorter distance from the input vector.
However, the situations in a partitioned region differ, because the partitioning can be terminated by either criterion, Th1 or Th2. If the partitioning is stopped by Th2, one of the classes must dominate the area, so the density comparison is appropriate. But if the partitioning is stopped by Th1, that only suggests the area may be trivial for the overall accuracy; which class dominates the area is still unknown. Although the overlapping approach does provide a Euclidean distance-based comparison, it is activated only when the densities are perfectly equal, which is a rare case. As a result, the Euclidean distance-based comparison tends to efface itself.
For human beings, the judging process normally combines these two criteria equally. When people come across the problem of two overlapping classes, they seem to use density judgement ("How frequently did this happen here in the past?") and Euclidean-distance judgement ("Is it numerically close to the mathematical mean?") to equal degrees.
Therefore, we suggest enhancing the power of the Euclidean-distance judgement in the overlapping area. It can either replace the density-based method or coexist with it. This can be achieved by setting a benchmark that activates the Euclidean-distance comparison, rather than waiting for the density comparison to end in a draw: when the density difference between the classes is not obvious, for example when it is less than 2, the density judgement is immediately abandoned and the distance judgement is used instead. The benchmark can also be adjusted by the user to control the preference between density comparison and distance comparison for a specific problem (see the sketch below).
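A hedged sketch of this adjustment follows. The benchmark parameter and all names are illustrative assumptions rather than the dissertation's code; it reuses the distanceRule sketch from Section 3.1.2:

static int classifyInOverlap(double[] x, int countI, int countJ,
                             double[] centroidI, double[] centroidJ,
                             int classI, int classJ, int benchmark) {
    // Use the density judgement only when the class counts differ clearly.
    if (Math.abs(countI - countJ) >= benchmark) {
        return countI > countJ ? classI : classJ;
    }
    // Near-draw: fall back to the Euclidean-distance judgement.
    return distanceRule(x, centroidI, centroidJ, classI, classJ);
}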
This adjustment is expected to improve the accuracy slightly, especially in some particular cases. It enhances the role of the Euclidean-distance comparison and is more flexible for different problems. Of course, the final result still needs to be verified; it will be compared with the original approach in the evaluation chapter.
3.2.2 Tuning slopes
When testing the performance of the previous version of this approach, many points were found to be missing from classification, which severely affected the accuracy. Although missing points are sometimes inevitable, points near the boundaries that obviously belong to a certain class were also being ignored, because the slopes end at v_ik^l − 1/γ_k. Theoretically, if we fix the membership at 1 beyond such a boundary (because no other class lies in that direction), some missing points would be classified correctly. This simple change could improve accuracy without creating more misclassified points. The advantage of this adjustment should be particularly obvious when there are only 2 or 3 classes but multiple dimensions (as in the Wisconsin Breast Cancer dataset). This behaviour is also closer to human judgement, which can make an obvious decision about a point even when it is not numerically clear-cut. The modification does not affect the overlapping side of a hyperbox, because in the overlapping area the assignment of such points remains unknown. This also implies that if many classes exist rather than one or two, for example more than 10 classes, the improvement could be very slight.
Another modification concerns the length of the slopes of the membership functions. Previously, the slopes of all classes were determined only by a sensitivity parameter γ. Although for each application the sensitivity parameter γ can be set by the user, changing γ extends all slopes equally by a certain distance, regardless of the size of the hyperboxes. This can be unreasonable when the sizes of the hyperboxes vary dramatically: a small hyperbox may receive a generalisation area (related to the slopes) twice its original size, whilst a big hyperbox may receive a generalisation area only one tenth of its original area. Generally speaking, a bigger hyperbox means its points are distributed more sparsely than others, so it needs a bigger generalisation area.

One of the best solutions is to tune the slopes according to the size of the hyperbox. The extension distance of every slope is adjusted from 1/γ to (V_ik^l − v_ik^l)/γ. That means all slopes and generalisation areas are now related to the size of the hyperbox, giving each hyperbox a proper generalisation area rather than keeping a fixed one for all hyperboxes. Figure 10 illustrates the modified version of the membership functions.
[Figure omitted: the modified membership functions of two overlapping hyperboxes; each slope now extends from v − (V − v)/γ to V + (V − v)/γ, so the generalisation area scales with the hyperbox size.]
Figure 10: Modified class boundaries and membership functions
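The size-dependent slope amounts to replacing the fixed extension 1/γ with (V − v)/γ in the membership function. The sketch below is one illustrative reading of the adjustment, not the original code:

static double membershipScaled(double x, double v, double V, double gamma) {
    double extent = (V - v) / gamma;  // slope length now scales with the box width
    if (extent <= 0.0) return (x >= v && x <= V) ? 1.0 : 0.0;  // degenerate box
    double left  = 1.0 - Math.max(0.0, Math.min(1.0, (v - x) / extent));
    double right = 1.0 - Math.max(0.0, Math.min(1.0, (x - V) / extent));
    return Math.min(left, right);
}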
4. Implementation
This chapter describes the implementation of the proposed method as a programme. It involves two sections: the first introduces the implementation background and software environment; the second describes the specific programme logic and procedures, with concrete examples of execution results.
4.1 Implementation background and software environment
A fuzzy classification system developed by I. Gadaras already existed, which provided an excellent general framework for similar fuzzy systems, as well as a well-developed fuzzy Java library. In this system, all fuzzy concepts were realised as objects; for example, fuzzy sets, fuzzy values and fuzzy variables were designed as separate classes, which made the system very flexible to apply to different applications. Although many modifications were necessary to realise the proposed method, the general framework and basic logic were inherited, for two major reasons. First, the effects of the modifications can easily be verified by comparison with the performance of the original system. Second, it is also possible for future developers to reuse parts of the programme: since new modifications only change the related sections, the format and logic always remain clear, and more attention can be paid to new theoretical improvements.
Beyond realising the proposed method on the previous framework, the current version of the system can test approaches in a more general way. For example, it separates the training and testing datasets in order to evaluate under real-world conditions, and it makes it more convenient to examine different configurations. The framework is now also more flexible with respect to new theoretical modifications. A source code sample of the modified system can be found in Appendix 1.
The programme is based on object-oriented programming concepts. Although it could have been written in various languages, Java (with JBuilder as the development tool), one of the most popular programming languages, was employed for the realisation. The resulting programme is easily understandable and maintainable, and all the theoretical parts of the proposed method can be clearly identified in it.

A JDBC-accessed database was selected to store the different testing datasets and separations. When testing new datasets or configurations, only a simple update of table names is required, and the datasets and separations can be manipulated efficiently with SQL commands. Since many previous versions of similar classification systems used text files to store data, the programme also allows transformation between different dataset formats.
4.2 Programme procedure and operating results
In this section, the programme realisation of each theoretical part is described. The programme was written in the order of the proposed method; although it varies slightly for each dataset, the general logic and procedure are the same.
Step 1: Initialisation

First of all, the programme imports all the libraries it needs. The most important, the fuzzy library "nrc.fuzzy.*", contains all the classes for the fuzzy concepts. The programme then connects to the database with an administrative username and password. After that, it reads all the training data from the database into memory for constructing the initial hyperboxes.
Step 2: Construct hyperboxes

After all training data have been input, the programme identifies the maximum and minimum values of each class in each dimension. It uses a result set, "rs = stmt.executeQuery("SELECT MAX(attr_1) AS maxX, MAX(attr_2) AS maxY…")", to hold these values. When re-partitioning, the programme works in the same way as for the initial construction of the hyperboxes, until all overlapping areas meet the termination criteria. The fuzzy output values can be set by the programmer to any numerical values, as long as the outputs can be recognised in the operating results.
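A fuller version of this query might look as follows. The table and column names (iris_train, attr_1 to attr_4, class_label) are hypothetical stand-ins, since the actual schema is not shown here:

static double[][] loadBounds(java.sql.Connection conn, int classLabel)
        throws java.sql.SQLException {
    String sql = "SELECT MIN(attr_1), MAX(attr_1), MIN(attr_2), MAX(attr_2), "
               + "MIN(attr_3), MAX(attr_3), MIN(attr_4), MAX(attr_4) "
               + "FROM iris_train WHERE class_label = ?";
    try (java.sql.PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, classLabel);
        try (java.sql.ResultSet rs = ps.executeQuery()) {
            rs.next();
            double[] min = new double[4], max = new double[4];
            for (int k = 0; k < 4; k++) {
                min[k] = rs.getDouble(2 * k + 1);  // MIN of attribute k+1
                max[k] = rs.getDouble(2 * k + 2);  // MAX of attribute k+1
            }
            return new double[][] { min, max };    // hyperbox bounds v and V
        }
    }
}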
Step 3: Extract rules from training data

In this stage, the training data that were read into memory generate the fuzzy rules. Each input-output pair is transformed into a rule for future classification, and the fuzzy rule class then stores all rules as double arrays in main memory. For example:

double outxlow[] = {0, 1.5, 3};     // x breakpoints of the 'low' output term
double outylow[] = {0, 1, 0};      // membership values at those breakpoints
double outxmed[] = {3.5, 5, 6.5};  // x breakpoints of the 'med' output term
output.addTerm("low", outxlow, outylow, 3);  // register the 'low' term on the output variable
This means that when the input lies in the range 0-3, the output lies in the range 3.5-6.5. The output value is used for the final classification. These rules can either be set by a human expert or extracted automatically from the training data. The training part of the programme then finishes.
Step 4: Obtain testing data

In order to examine the performance of the classification system, the programme reads the other part of the dataset. Testing data are introduced in the same way as the training data but without the class outputs, which are to be determined by the system. Here, the programme also records which inputs lie in overlapping areas and which do not.
Step5: Produce results of testing data
Rules generated from training data had been stored in the memory, and after inputs of
testing data arrive, the programme starts classifying. For each region, regardless it is
overlapping or non-overlapping area a fixed rule has already waited there. Every input
matches the rule assigned in the region, and is determined to the final output. These
classification results will be compared with original outputs.
Here is a concrete example of an execution result:
Fuzzy Variable -> out [ 0, 10.0 ] unit(s)
Fuzzy Set -> { 0/3.5 0.15/3.73 0.15/6.27 0/6.5 }
34
Fuzzy Variable -> out [ 0, 10.0 ] unit(s)
Fuzzy Set -> { 0/3.5 0.5/4.25 0.5/5.75 0/6.5 }
46
Fuzzy Variable -> out [ 0, 10.0 ] unit(s)
Fuzzy Set -> { 0/0 0.75/1.12 0.75/1.88 0/3 }
1
Fuzzy Variable -> out [ 0, 10.0 ] unit(s)
Fuzzy Set -> { 0/0 0.75/1.12 0.75/1.88 0/3 }
2 …
The classification result for each testing datum is shown in three rows. The first row defines the total range of the fuzzy variable: "out [ 0, 10.0 ] unit(s)" means that the fuzzy value can be a number between 0 and 10. The second row shows the resulting fuzzy set, which can be one of several sets; for instance, "{ 0/3.5 0.15/3.73 0.15/6.27 0/6.5 }" means that the datum belongs to the set spanning 3.5-6.5, i.e. the intermediate class. The last row lists the ID of the testing datum, which is used for marking errors and for statistics.
Step 6: Show classification results and calculate the accuracy

By comparing each classified result with its original output, it is easy to identify whether the data are correctly classified and to assess the general performance of the system. The classified results are stored in memory, and for the comparison the programme imports the original output of each testing datum from the database. It finds all differences between the classified and original outputs and counts how many points are misclassified. The system not only calculates the accuracy from the number of errors, but also shows the execution time and the details of the errors, which are used for evaluating performance and for further analysis.
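The error counting can be sketched as below; this is illustrative only, and in the dissertation's result tables missing (unclassified) points are counted among the errors as well:

static double errorRate(int[] predicted, int[] original,
                        java.util.List<Integer> errorIds) {
    int errors = 0;
    for (int i = 0; i < predicted.length; i++) {
        if (predicted[i] != original[i]) {  // misclassified or missing
            errors++;
            errorIds.add(i + 1);            // 1-based ID, as in the result tables
        }
    }
    return (double) errors / predicted.length;
}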
5. Evaluation
This chapter contains the classification results for the proposed method, investigated in detail with critical analysis. The performance of the system is evaluated against two criteria: the pattern classification accuracy and the cost of rule generation. Accuracy is perhaps the most important criterion for evaluating classification systems, since it is the original reason for developing such systems and the major direction for improvement. Cost is also an essential consideration, especially when dealing with huge amounts of data: if a system generates too many fuzzy rules, it requires a long execution time and consumes considerable computing resources, which may lead to an unrealistic cost. Another issue worth mentioning is the difference between artificial testing environments and real-world cases; that is, in real-world cases the testing data can be very different from the training patterns and consequently yield more errors. Therefore, testing systems in an environment similar to real-world cases is crucial.
Two famous datasets were selected to test the proposed method and the system: the Iris Flower Dataset and the Wisconsin Breast Cancer Dataset, which properly illustrate the accuracy and the cost achieved in real-world situations. Detailed results, discussion and analysis can be found in the following sections.
5.1 Training and testing data issue
Before evaluating the results for each dataset, a general discussion about training and testing data is necessary. Inevitably, the performance of a classification system is affected by the selection of training and testing data.
There are many controversies about how to select training and testing data. The selection can significantly affect the classification result: if the training data are "good" enough, all the testing data will be included in the initial hyperboxes, and even without fuzzy inference the result can be accurate, since only a few points fall outside the generalisation areas. On the other hand, if very little or "poor" data is used for training, there will be a great number of missing points, leading to low accuracy.
Basically, a fuzzy classification system should be general enough to adjust itself to different situations and applications, so using equal amounts of randomly selected data for training and testing is the normal procedure. In practice, however, when users apply the fuzzy system to a specific situation, it is realistic for them to select typical cases for training so that the system works much better. For example, the user could construct a training separation that contains the maximum and minimum values in each dimension (a sketch of one such selection follows). If no testing input then lies outside the initial hyperbox boundaries, further adjustments can concentrate on the overlapping areas and thus achieve better accuracy.
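One possible reading of this careful selection is sketched below, under the assumption that the samples realising the per-dimension minima and maxima of each class are picked for training; this is not the dissertation's actual selection code:

static java.util.Set<Integer> selectExtremes(double[][] data, int[] labels,
                                             int numClasses) {
    java.util.Set<Integer> chosen = new java.util.TreeSet<>();
    int m = data[0].length;
    for (int c = 0; c < numClasses; c++) {
        for (int k = 0; k < m; k++) {
            int argMin = -1, argMax = -1;
            for (int i = 0; i < data.length; i++) {
                if (labels[i] != c) continue;
                if (argMin < 0 || data[i][k] < data[argMin][k]) argMin = i;
                if (argMax < 0 || data[i][k] > data[argMax][k]) argMax = i;
            }
            if (argMin >= 0) chosen.add(argMin);
            if (argMax >= 0) chosen.add(argMax);
        }
    }
    return chosen;  // indices of training candidates covering all extremes
}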
In this dissertation, the proposed method was tested in both ways: random data were used to test the general performance of the system, and different training and testing separations of the datasets were also tried in order to identify the best performance of the system.
5.2 Iris dataset
The Iris Flower Dataset, or Fisher's Iris Dataset [12], is a popular dataset for testing pattern classification. It is a four-dimensional dataset introduced by Sir Ronald Aylmer Fisher, containing 150 samples from three species of Iris flowers: Iris setosa, Iris virginica and Iris versicolor. The features measured are the length and the width of the sepal and the petal; pattern classification systems use the combination of these four features to determine the species.
As mentioned before, two types of configuration were employed to evaluate the performance of the proposed method on the Iris dataset, one examining its average ability and the other its best performance. In the first type of configuration, 75 samples (25 from each class) were chosen randomly for training and the other 75 (25 from each class) for testing. In order to compare with similar systems, a separation of the data applied by P. C. Chen [2] was experimented with first (Table 1); the details of this separation can be found in the appendices as an example.
Table 1. Classification results for the first configuration

Config.   Training size   Testing size   Random?   Correctly classified   Errors   Error rate   Error IDs
Config1   75              75             Yes       68                     7        9.3%         28, 58, 68, 69, 70, 63, 53
In the first configuration, 2 testing points fell into the overlapping area, and they were correctly classified. No. 63 and No. 53, which originally belong to Iris versicolor, were misclassified as Iris virginica. No. 28, No. 58, No. 68, No. 69 and No. 70 were missing, meaning they were not classified at all. Different results can be obtained by tuning the sensitivity parameter γ, since the numbers of misclassified and missing points may vary. In this configuration, however, γ = 1.5 was the optimum, and because of the fixed memberships set near the boundaries, the effect of tuning the slopes (through the sensitivity parameter γ) was reduced. The rules generated can be found in Table 2.
Table 2. Rules generated from the Iris dataset

          Sepal width   Sepal length   Petal width   Petal length   Class
Rule 1    low           high           low           low            setosa
Rule 2    med           low            med           med            versicolor
Rule 3    high          med            high          high           virginica
Rule 4    med-high      low-medium     med-high      med-high       versicolor or virginica
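For reference, the sensitivity parameter γ appears in the Appendix 1 code as the field r: each trapezoidal membership function is extended past the hyperbox edge by a fraction of the box width, so a larger γ gives a shorter extension and a steeper slope. A minimal sketch, assuming a single attribute with hyperbox edge values min and max:

public class SlopeTuning
{
    // Illustration only: build the x-knots of a trapezoidal "low" membership
    // function whose falling slope is controlled by the sensitivity parameter
    // gamma (the field r in the Appendix 1 code). The foot extends past the
    // hyperbox edge by (max - min) / gamma.
    static double[] lowTermKnots(double min, double max, double gamma)
    {
        return new double[] {0, 0, max, max + (max - min) / gamma};
    }

    public static void main(String[] args)
    {
        // Hypothetical hyperbox edge values for one attribute, gamma = 1.5.
        System.out.println(java.util.Arrays.toString(lowTermKnots(1.0, 1.9, 1.5)));
        // -> [0.0, 0.0, 1.9, 2.5]: membership stays 1 up to 1.9, falls to 0 at 2.5
    }
}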
To evaluate the average performance of the proposed method on Iris, more configurations
were tested: one was a purely random selection (configuration 2), and the other followed a
simple numerical rule, taking No. 1-25, No. 51-75 and No. 100-125 for training and the rest
for testing (configuration 3).
Finally, as mentioned above, a careful selection of training data was also tested to find the
best performance of the proposed method (configuration 4). Although there were only 20
training samples, they contained the maximum and minimum values in every dimension.
Therefore, no point fell outside the initial hyperboxes after training and no point was
missing from classification.
Table 3. More results for other configurations.

          Training      Testing       Randomly                 Correctly    Number      Error
          sample size   sample size   selected?                classified   of errors   rate    Error ids
Config 1  75            75            Yes                      68           7           9.3%    28, 58, 68, 69, 70, 63, 53
Config 2  75            75            Yes                      71           4           5.2%    42, 139, 71, 78
Config 3  75            75            No                       69           6           8.0%    42, 44, 134, 135, 84, 78
Config 4  20            130           No (careful selection)   129          1           0.7%    42
Configuration 2 achieved better accuracy than the first, with only four errors: No. 42 and
No. 139 were missing, and No. 71 and No. 78, which originally belong to Iris versicolor,
were misclassified as Iris virginica. The optimum sensitivity parameter γ for these
configurations is 4. In configuration 3, No. 84 and No. 78 were misclassified in the same
way as in configuration 2, but more points were missing. After careful selection of training
data, the accuracy reached an extremely high level: of the 130 testing samples, only No. 42
was missing, and all 6 points in the overlapping area were correctly decided. This again
demonstrates the importance of data selection, so choosing training data carefully in real
applications is highly desirable.
5.3 Wisconsin Breast Cancer dataset
The Wisconsin Breast Cancer Dataset was compiled at the University of Wisconsin
Hospitals and has been widely used for the evaluation of pattern classification systems. It
contains nine attributes describing cell characteristics (Clump Thickness (CT),
Uniformity of Cell Size (UC), Uniformity of Cell Shape (UC), Marginal Adhesion (MA),
Single Epithelial Cell Size (SE), Bare Nuclei (BN), Bland Chromatin (BC), Normal
Nucleoli (NN), Mitoses (Mit)) with two output classes for the nature of the tumour: benign
or malignant. The dataset has 699 observations, of which 16 cases are excluded because
their attribute descriptions are incomplete. Of the remaining 683 cases, 444 belong to the
benign class and the other 239 to the malignant class; 252 of all cases lie in the overlapping
area. It is a fairly high-dimensional dataset in which the two classes overlap extensively.
As with the evaluation on the Iris dataset, three investigations were conducted: two testing
the average performance and one testing the maximum ability of the proposed method on
the Wisconsin Breast Cancer dataset. All configurations used 340 samples for training and
343 for testing, but the training data for the last configuration were carefully selected. In
the database there are nine columns for the attributes and one for the output class: benign
or malignant.
Table 4. Rules generated from the Wisconsin Breast Cancer dataset

        CT         UC         UC         MA         SE         BN         BC         NN         Mit.       Class
R1      low        low        low        low        low        low        low        low        low        benign
R2      high       high       high       high       high       high       high       high       high       malignant
R3.1    low-med    low-med    low-med    low-med    low-med    low-med    low-med    low-med    low-med    benign
R3.2    med-high   med-high   med-high   med-high   med-high   med-high   med-high   med-high   med-high   malignant
Table 4 shows the rules generated from the training data after two iterations. The number
of iterations can be controlled manually by the user through suitable settings of the
stopping parameters Th1 and Th2. In this experiment two iterations were used for the
following reasons: firstly, beyond two iterations there was no explicit improvement in
accuracy, only a higher cost of producing rules; secondly, it made the results easy to
compare with previous versions of the proposed method, so that the advantages of the
current version could be identified explicitly. A sketch of such a stopping criterion is given
below.
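For illustration only, the following sketch shows one way the iteration could be terminated. The names maxIterations, th1 and th2 follow the text above, but their exact semantics in the implemented system are not specified there, so this reading (th1 as a minimum accuracy gain, th2 as a minimum share of unresolved overlap points) is an assumption.

public class StoppingCriterion
{
    // Hypothetical reading of the stopping parameters Th1/Th2: stop refining
    // the overlapping area when the accuracy gain of the last iteration drops
    // below th1, or the fraction of points still unresolved in the overlap
    // falls below th2, or a hard iteration cap is reached.
    static boolean shouldStop(int iteration, int maxIterations,
                              double accuracyGain, double th1,
                              double overlapFraction, double th2)
    {
        return iteration >= maxIterations
            || accuracyGain < th1
            || overlapFraction < th2;
    }

    public static void main(String[] args)
    {
        // Two iterations, as in the experiment: cap = 2.
        System.out.println(shouldStop(2, 2, 0.004, 0.01, 0.05, 0.02)); // true
    }
}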
In the first configuration, the data were split by id: all odd ids were taken as training data
and all even ids as testing data. This is a convenient way to separate the dataset that is
effectively equivalent to a random separation. Configuration 2 used a purely random
separation to show normal performance. In configuration 3, the training data were selected
in the same way as in configuration 4 for the Iris data, containing all maximum and
minimum values so as to optimise the initial hyperboxes.
Table 5. Results for different configurations of the Wisconsin Breast Cancer dataset.

          Training      Testing       Randomly                 Correctly    Number      Error
          sample size   sample size   selected?                classified   of errors   rate
Config 1  340           343           Yes                      330          13          3.79%
Config 2  340           343           Yes                      329          14          4.0%
Config 3  340           343           No (careful selection)   339          4           1.17%
The accuracy of configuration 1 was 96.21%, with 13 points misclassified in the
second-iteration overlapping area. Because there were only two classes in this experiment,
and the membership functions near the boundary had been modified, no point was missing.
The second configuration, made by entirely random selection, achieved a slightly worse
but similar accuracy of 96.0%, suggesting an average accuracy of around 96.1%.
Configuration 3 proved the importance of selection again: only 4 points were misclassified.
After careful choice of training data, the accuracy rose to 98.83%.
All of the tests were run under conditions similar to real-world cases, using completely
disjoint data for training and testing. The sensitivity parameter γ was set to 3, the optimum
for these tests.
5.4 Comparisons with other methods and analysis
First, the results of various methods on the Iris dataset were compared. These comparisons
with classification systems based on other methods or approaches clearly demonstrate the
performance of the proposed method (Table 6).
Table 6. Comparative results of several classification systems on the Iris dataset

Approach                 Training No.    Testing No.      Errors   Error rate
Bayes Classifier         75 (selected)   75 (selected)    2        2.6%
Fuzzy k-NN               36 (random)     36 (random)      4        11.0%
k-nearest neighbour      75 (random)     75 (random)      4        5.2%
Fuzzy Perceptron         whole set       whole set        2        2.6%
Abe and Lan’s method     75 (random)     75 (random)      7        9.3%
Proposed method          75 (random)     75 (random)      4        5.2%
Proposed method*         20 (selected)   130 (selected)   1        0.7%
On Iris, the proposed method achieved a low error rate of 5.2%. This matched the
k-nearest neighbour approach and bettered both the fuzzy k-NN approach [15] and Abe
and Lan’s method [6], which is also based on an overlapping approach. Although the
Fuzzy Perceptron made only 2 errors, it used the whole set for both training and testing,
which means its training data could have been the best possible set. Like the Fuzzy
Perceptron, the Bayes Classifier selected its training and testing data but gave no
information about how the 75 training samples were chosen. The last row of Table 6
shows that after careful selection the proposed system achieved very high accuracy, with
only one sample misclassified. The previous system using a similar approach, developed
by I. Gadaras and L. Mikhailov [9], achieved at best 2 errors on 75 testing samples (a 2.7%
error rate), so the improvement in the current version shows that the modifications
genuinely increased accuracy. Because the original way of generating rules was retained,
there was no increase in computational cost from complex mathematical calculation.
Furthermore, the proposed method needed neither an initial partition of the input space nor
any prior knowledge, unlike the Bayes Classifier, which makes it easy to apply in a more
general environment.
For the Wisconsin Breast Cancer dataset, the proposed method was compared with other
methods in the same way as for Iris. Several results from other research are listed in
Table 7, together with the proposed method’s average ability and best performance. All
the methods listed are eminent research in the field: the evolutionary method VISIT
suggested by Chang and Lilly [16], the neuro-fuzzy approach NEFCLASS by Nauck and
Kruse [17], an alternative technique based on decision-tree initialisation by Abonyi and
Szeifert [18], and Gadaras and Mikhailov’s method [9], the predecessor of the proposed
method.
Table 7. Comparative results of systems on the Wisconsin Breast Cancer dataset

Approach                         Training No.     Testing No.      Accuracy
VISIT (Chang & Lilly)            400 (random)     283 (random)     96.50%
NEFCLASS                         whole set        whole set        95.06%
Abonyi & Szeifert’s method       342 (random)     341 (random)     95.57%
Gadaras & Mikhailov’s method     340 (random)     343 (random)     96.08%
Proposed method                  340 (random)     343 (random)     96.10%
Proposed method*                 340 (selected)   343 (selected)   98.83%
In terms of average ability, the accuracy of the proposed method outperformed
NEFCLASS, Abonyi & Szeifert’s method and the previous Gadaras and Mikhailov
method. The advantage is even more marked given that NEFCLASS required a long
training period and more than ten conditions for rule pruning, while Abonyi & Szeifert’s
method needed parameter initialisation and three to four conditions. VISIT achieved
higher accuracy than the proposed method, but it used 400 samples for training and
required initialisation of the membership functions, which led to more than 100 learning
iterations. The average accuracy of Gadaras and Mikhailov’s method was 96.08%, and its
maximum performance was 97.12% (with careful selection of training data); the proposed
method exceeded it in both situations. Gadaras and Mikhailov’s method trains very
quickly and requires no prior knowledge, advantages in computational cost that the
proposed method shares; the modifications and improvements over that method are
reflected in accuracy.
It is evident that the classification results of the proposed method are comparable to other
modern, eminent approaches in terms of both accuracy and computational cost, especially
in high-dimensional cases. The method requires only a few parameters, which the user can
flexibly adjust to specific situations and needs. When proper training data are obtained,
the system can achieve a very high level of accuracy.
Meanwhile, when testing the proposed method, we also observed that the purely
Euclidean-distance-based algorithm achieved nearly the same accuracy as the
density-based one. However, on the nine-dimensional Wisconsin Breast Cancer dataset,
calculating Euclidean distances consumed a little more time than simply comparing
densities in the overlapping area. The accuracy of the two algorithms also varies with the
separation of the dataset, so algorithm selection probably ought to depend on the specific
case; overall, switching between the two algorithms did not noticeably affect accuracy (a
sketch contrasting the two criteria follows this paragraph). Moreover, iterating in the
overlapping area proved a sound way to obtain accurate results, especially when the
amount of data is large and a fair proportion of it falls into the overlapping area.
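For illustration only, the following sketch contrasts the two assignment criteria for a point in the overlapping area: distance to the hyperbox centres (as in the Appendix 1 code) versus a count of training points of each class inside the overlap. The density rule here is a plausible reading of the description above, not the exact implemented logic, and the centre values in main are hypothetical.

public class OverlapAssignment
{
    // Distance criterion: assign to the class whose hyperbox centre is nearer.
    static int byDistance(double[] p, double[] centreA, double[] centreB)
    {
        double da = 0, db = 0;
        for (int i = 0; i < p.length; i++)
        {
            da += (p[i] - centreA[i]) * (p[i] - centreA[i]);
            db += (p[i] - centreB[i]) * (p[i] - centreB[i]);
        }
        return da <= db ? 0 : 1;   // squared distances suffice: no sqrt needed
    }

    // Density criterion (assumed reading): assign to the class with more
    // training points inside the overlapping region.
    static int byDensity(int countA, int countB)
    {
        return countA >= countB ? 0 : 1;
    }

    public static void main(String[] args)
    {
        double[] p = {6.0, 5.0, 2.8, 1.6};            // a hypothetical test point
        double[] cVersicolor = {5.9, 4.3, 2.8, 1.3};  // hypothetical box centres
        double[] cVirginica  = {6.6, 5.6, 3.0, 2.0};
        System.out.println(byDistance(p, cVersicolor, cVirginica)); // 0 -> class A
        System.out.println(byDensity(12, 30));                      // 1 -> class B
    }
}

Note that the distance rule needs no square root for the comparison, so its extra cost relative to the density count comes only from the per-dimension arithmetic, which grows with the number of dimensions.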
However, two situations may not be well suited to this approach. First, if only a few points
fall into the overlapping areas, iteration seems redundant and constructing new
hyperboxes consumes extra resources. Second, if the distribution is highly separated, this
overlapping approach may not handle non-overlapping situations efficiently. The first
problem can be addressed by careful control of the termination parameters, or even by
adding new criteria to stop the iteration in certain classification cases. The second
problem, of data distribution, is a general one for all classification systems; an interesting
non-overlapping method researched by L. Mikhailov [19] offers an alternative that may
suit certain other situations.
6. Conclusion and Future work
In this dissertation, a fuzzy classification system using a hierarchical overlapping
approach was realised. The theoretical method behind the system is based on an approach
proposed by I. Gadaras and L. Mikhailov, which extracts fuzzy rules directly from
numerical data. The system was developed in the Java programming language with a
JDBC-accessed database. All testing results were evaluated on well-known datasets: the
Fisher Iris dataset and the Wisconsin Breast Cancer dataset. Comparisons with other
methods and similar systems were also provided, followed by comparative analysis.
The major achievements of this dissertation are as follows. Firstly, a
Euclidean-distance-based assignment algorithm was realised as an alternative to the
density-based one. Results showed that in certain cases this approach achieved slightly
better accuracy than the density-based version, although the improvement was small and
not general, and for very large datasets the Euclidean distance calculation could be costly.
Secondly, a modification of the membership functions near the boundaries was also
applied successfully. This considerably rectified the problem of missing points, which
helped to decrease the error rate; it was especially useful when there was a large amount of
data with only a few classes. Thirdly, tuning the slopes according to the size of the
hyperbox proved effective for improving accuracy, a noticeable advance over I. Gadaras
and L. Mikhailov’s method. Moreover, the system inherited the previous advantage of low
computational cost, since no prior knowledge was required for initialisation.
Owing to the complexity of classification problems themselves, it is very difficult to
obtain a system general enough to suit every situation. The pattern classification system
developed in this dissertation has shown comparatively good performance, but its results
do depend on the data distribution to some extent. For example, if the data lie very loosely,
or only a few of them fall into the overlapping area, then other methods such as
non-overlapping approaches may perform better. Moreover, the iterative approach suits
heavily overlapped situations and cannot guarantee an improvement in accuracy in all
cases; the stopping criteria for the iteration partially solve this problem, but the user still
has to adjust them for each specific application.
Future research on this subject remains desirable. Deeper analysis of various situations
could lead to better results. One direction would be to devise an approach or algorithm
that suits a wide range of circumstances, or at least reduces the influence of the data
distribution; another would be to integrate the different approaches in one system that can
select an algorithm, manually or automatically, for each specific application. This could
be achieved by combining the previous classification systems under a proper user
interface.
References
[1] C. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[2] P. C. Chen, Fuzzy Approach for Pattern Classification. Dissertation submitted to the
University of Manchester, 1999.
[3] K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, 1990.
[4] A. Jain, P. Duin and J. Mao, “Statistical Pattern Recognition: A Review,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, 2000.
[5] K. S. Fu, Syntactic Pattern Recognition and Applications. Prentice-Hall, 1982.
[6] S. Abe and M. S. Lan, “Fuzzy Rules Extraction Directly from Numerical Data for
Function Approximation,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 25,
no. 1, January 1995.
[7] V. N. Vapnik, Statistical Learning Theory. New York: John Wiley & Sons, 1998.
[8] L. I. Perlovsky, “Conundrum of Combinatorial Complexity,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 20, no. 6, pp. 666-670, 1998.
[9] I. Gadaras and L. Mikhailov, “Generation of Fuzzy Classification Rules Directly from
Overlapping Input Partitioning,” IEEE International Fuzzy Systems Conference, London,
pp. 1-6, 2007.
[10] J. Abonyi and F. Szeifert, “Supervised fuzzy clustering for the identification of fuzzy
classifiers,” Pattern Recognition Letters, vol. 24, pp. 2195-2207, 2003.
[11] M. Setnes and H. Roubos, “GA-fuzzy modelling and classification: Complexity and
performance,” IEEE Transactions on Fuzzy Systems, vol. 8, pp. 509-522, October 2000.
[12] R. A. Fisher, “The Use of Multiple Measurements in Taxonomic Problems,” Annals
of Eugenics, vol. 7, pp. 179-188, 1936.
[13] L. Wang and J. M. Mendel, “Generating Fuzzy Rules by Learning from Examples,”
IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, pp. 1414-1427, 1992.
[14] S. Abe and M. S. Lan, “Fuzzy Rules Extraction Directly from Numerical Data for
Function Approximation,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 25,
no. 1, pp. 119-129, 1995.
[15] J. M. Keller, M. R. Gray and J. A. Givens, “A fuzzy k-nearest neighbor algorithm,”
IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-15, pp. 580-585, 1985.
[16] X. Chang and J. H. Lilly, “Evolutionary Design of a Fuzzy Classifier from Data,”
IEEE Transactions on Systems, Man, and Cybernetics, vol. 34, pp. 1894-1906, 2004.
[17] D. Nauck and R. Kruse, “Obtaining interpretable fuzzy classification rules from
medical data,” Artificial Intelligence in Medicine, vol. 16, pp. 149-169, 1999.
[18] J. Abonyi and F. Szeifert, “Supervised fuzzy clustering for the identification of fuzzy
classifiers,” Pattern Recognition Letters, vol. 24, pp. 2195-2207, 2003.
[19] L. Mikhailov, “Generation of Fuzzy Classification Rules by Non-Overlapping Input
Partitioning,” IEEE International Symposium on Evolving Fuzzy Systems (EFS’06), Lake
District, UK, pp. 365-369, 2006.
Appendix 1: Sample source code
HyperBoxM.java
///////////////////////////////////////////////////////////////////////////////////
//construct the hyperbox
///////////////////////////////////////////////////////////////////////////////////
public class HyperBoxM
{
private double xmin, xmax, ymin, ymax, zmin, zmax, wmin, wmax;
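// Note: the box is hard-coded for four dimensions (x, y, z, w), matching the
// four Iris attributes; getMinimum/getMaximum/getCentre select a dimension
// by index 1-4.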
public HyperBoxM(double xmin, double xmax, double ymin, double ymax, double zmin, double
zmax, double wmin, double wmax)
{
this.xmin = xmin;
this.xmax = xmax;
this.ymin = ymin;
this.ymax = ymax;
this.zmin = zmin;
this.zmax = zmax;
this.wmin = wmin;
this.wmax = wmax;
}
///////////////////////////////////////////////////////////////////////////////////
//Obtain Minimum value in each dimension
///////////////////////////////////////////////////////////////////////////////////
public double getMinimum(int dimension)
{
double min = 0.0;
switch( dimension )
{
case 1:
min = xmin;
break;
case 2:
min = ymin;
break;
case 3:
min = zmin;
break;
case 4:
min = wmin;
break;
default:
min = 0.0;
}
return min;
}
///////////////////////////////////////////////////////////////////////////////////
//Obtain Maximum value in each dimension
///////////////////////////////////////////////////////////////////////////////////
public double getMaximum(int dimension)
{
double max = 0.0;
switch( dimension )
{
case 1:
max = xmax;
break;
case 2:
max = ymax;
break;
case 3:
max = zmax;
break;
case 4:
max = wmax;
break;
default:
max = 0.0;
}
return max;
}
///////////////////////////////////////////////////////////////////////////////////
//Obtain centre for calculation of Euclidean distance
///////////////////////////////////////////////////////////////////////////////////
public double getCentre(int dimension)
{
double centre = 0.0;
switch (dimension)
{
case 1:
centre = (xmax + xmin) / 2;
break;
case 2:
centre = (ymax + ymin) / 2;
break;
case 3:
centre = (zmax + zmin) / 2;
break;
case 4:
centre = (wmax + wmin) / 2;
break;
default:
centre = 0.0;
}
return centre;
}
}
classificationTestM.java
///////////////////////////////////////////////////////////////////////////////////
//Imports
///////////////////////////////////////////////////////////////////////////////////
import java.sql.*;
import java.util.*;
import java.lang.*;
import nrc.fuzzy.*;
///////////////////////////////////////////////////////////////////////////////////
//select separation and set sensitivity parameter
///////////////////////////////////////////////////////////////////////////////////
public class classificationTestM
{
private final static String tableName = "iris_Train";
private final static String tableName2 = "iris_Test";
private double r = 2;   // sensitivity parameter (gamma in the text)
// Assumed here: the constructor, omitted from this excerpt, opens the JDBC
// connection and initialises stmt; rs holds the current result set.
private Statement stmt;
private ResultSet rs;
///////////////////////////////////////////////////////////////////////////////////
//generate hyperboxes
///////////////////////////////////////////////////////////////////////////////////
public boolean hyperboxGeneration() throws SQLException
{
HyperBoxM hBox1 = null, hBox2 = null, hBox3 = null;
rs = stmt.executeQuery("SELECT MAX(attr_1) AS maxX, MAX(attr_2) AS maxY, MAX(attr_3) AS
maxZ, MAX(attr_4) AS maxW, MIN(attr_1) AS minX, MIN(attr_2) AS minY, MIN(attr_3) AS minZ,
MIN(attr_4) AS minW FROM " + tableName + " WHERE out = 1");
if( rs.next() )
{
hBox1 = new HyperBoxM(rs.getDouble("minX"), rs.getDouble("maxX"), rs.getDouble("minY"),
rs.getDouble("maxY"), rs.getDouble("minZ"), rs.getDouble("maxZ"), rs.getDouble("minW"),
rs.getDouble("maxW"));
rs.close();
}
rs = stmt.executeQuery("SELECT MAX(attr_1) AS maxX, MAX(attr_2) AS maxY, MAX(attr_3) AS
maxZ, MAX(attr_4) AS maxW, MIN(attr_1) AS minX, MIN(attr_2) AS minY, MIN(attr_3) AS minZ,
MIN(attr_4) AS minW FROM " + tableName + " WHERE out = 2");
if( rs.next() )
{
hBox2 = new HyperBoxM(rs.getDouble("minX"), rs.getDouble("maxX"), rs.getDouble("minY"),
rs.getDouble("maxY"), rs.getDouble("minZ"), rs.getDouble("maxZ"), rs.getDouble("minW"),
rs.getDouble("maxW"));
rs.close();
}
rs = stmt.executeQuery("SELECT MAX(attr_1) AS maxX, MAX(attr_2) AS maxY, MAX(attr_3) AS
maxZ, MAX(attr_4) AS maxW, MIN(attr_1) AS minX, MIN(attr_2) AS minY, MIN(attr_3) AS minZ,
MIN(attr_4) AS minW FROM " + tableName + " WHERE out = 3");
if( rs.next() )
{
hBox3 = new HyperBoxM(rs.getDouble("minX"), rs.getDouble("maxX"), rs.getDouble("minY"),
rs.getDouble("maxY"), rs.getDouble("minZ"), rs.getDouble("maxZ"), rs.getDouble("minW"),
rs.getDouble("maxW"));
rs.close();
}
// (stmt must remain open here: it is reused for the testing-data queries below)
///////////////////////////////////////////////////////////////////////////////////
//set fuzzy variables
///////////////////////////////////////////////////////////////////////////////////
FuzzyVariable attribute1 = new FuzzyVariable("attr_1", 0, 10, "unit(s)");
FuzzyVariable attribute2 = new FuzzyVariable("attr_2", 0, 10, "unit(s)");
FuzzyVariable attribute3 = new FuzzyVariable("attr_3", 0, 10, "unit(s)");
FuzzyVariable attribute4 = new FuzzyVariable("attr_4", 0, 10, "unit(s)");
FuzzyVariable output = new FuzzyVariable("out", 0, 10, "unit(s)");
///////////////////////////////////////////////////////////////////////////////////
//generate membership functions
///////////////////////////////////////////////////////////////////////////////////
double d1xlow[] = {0, 0, (hBox1.getMaximum(1)), (hBox1.getMaximum(1) +
(hBox1.getMaximum(1)-hBox1.getMinimum(1)) / r)};
double d1ylow[] = {0, 1, 1, 0};
double d1xmed[] = {(hBox2.getMinimum(1) - ((hBox2.getMaximum(1)-hBox2.getMinimum(1))
/ r)), hBox2.getMinimum(1), hBox3.getMinimum(1), hBox2.getMaximum(1)};
double d1ymed[] = {0, 1, 1, 0};
double d1xhigh[] = {hBox3.getMinimum(1), hBox2.getMaximum(1), 10, 10};
double d1yhigh[] = {0, 1, 1, 0};
attribute1.addTerm("low", d1xlow, d1ylow, 4);
attribute1.addTerm("medium", d1xmed, d1ymed, 4);
attribute1.addTerm("high", d1xhigh, d1yhigh, 4);
double d2xlow[] = {0, 0, hBox3.getMinimum(2), hBox2.getMaximum(2) +
(hBox2.getMaximum(2)-hBox2.getMinimum(2))/r};
double d2ylow[] = {0, 1, 1, 0};
double d2xmed[] = {hBox3.getMinimum(2), hBox2.getMaximum(2), hBox3.getMaximum(2),
hBox3.getMaximum(2)};
double d2ymed[] = {0, 1, 1, 0};
double d2xhigh[] = {hBox1.getMinimum(2) - (hBox1.getMaximum(2)-hBox1.getMinimum(2)) /
r, hBox1.getMaximum(2), 10, 10};
double d2yhigh[] = {0, 1, 1, 0};
attribute2.addTerm("low", d2xlow, d2ylow, 4);
attribute2.addTerm("medium", d2xmed, d2ymed, 4);
attribute2.addTerm("high", d2xhigh, d2yhigh, 4);
double d3xlow[] = {0, 0, (hBox1.getMaximum(3)), hBox1.getMaximum(3)+
(hBox1.getMaximum(3)-hBox1.getMinimum(3))/r };
double d3ylow[] = {0, 1, 1, 0};
double d3xmed[] = {(hBox2.getMinimum(3) -
(hBox2.getMaximum(3)-hBox2.getMinimum(3))/r), hBox2.getMinimum(3), hBox3.getMinimum(3),
hBox2.getMaximum(3)};
double d3ymed[] = {0, 1, 1, 0};
double d3xhigh[] = {hBox3.getMinimum(3), hBox2.getMaximum(3), 10, 10};
double d3yhigh[] = {0, 1, 1, 0};
attribute3.addTerm("low", d3xlow, d3ylow, 4);
attribute3.addTerm("medium", d3xmed, d3ymed, 4);
attribute3.addTerm("high", d3xhigh, d3yhigh, 4);
double d4xlow[] = {0, 0, (hBox1.getMaximum(4)-hBox1.getMinimum(4))/r,
hBox1.getMaximum(4)};
double d4ylow[] = {0, 1, 1, 0};
double d4xmed[] = {(hBox2.getMinimum(4) - (hBox2.getMaximum(4)-hBox2.getMinimum(4))/
r), hBox2.getMinimum(4), hBox3.getMinimum(4), hBox2.getMaximum(4)};
double d4ymed[] = {0, 1, 1, 0};
double d4xhigh[] = {hBox3.getMinimum(4), hBox2.getMaximum(4), 10, 10};
double d4yhigh[] = {0, 1, 1, 0};
attribute4.addTerm("low", d4xlow, d4ylow, 4);
attribute4.addTerm("medium", d4xmed, d4ymed, 4);
attribute4.addTerm("high", d4xhigh, d4yhigh, 4);
///////////////////////////////////////////////////////////////////////////////////
//set fuzzy values
///////////////////////////////////////////////////////////////////////////////////
double outxlow[] = {0, 1.5, 3};
double outylow[] = {0, 1, 0};
double outxmed[] = {3.5, 5, 6.5};
double outymed[] = {0, 1, 0};
double outxhigh[] = {7, 8.5, 10};
double outyhigh[] = {0, 1, 0};
output.addTerm("low", outxlow, outylow, 3);
output.addTerm("medium", outxmed, outymed, 3);
output.addTerm("high", outxhigh, outyhigh, 3);
///////////////////////////////////////////////////////////////////////////////////
//generate fuzzy rules
///////////////////////////////////////////////////////////////////////////////////
FuzzyRule lhll = new FuzzyRule();
FuzzyRule mlmm = new FuzzyRule();
FuzzyRule hmhh = new FuzzyRule();
lhll.addAntecedent(new FuzzyValue(attribute1, "low"));
lhll.addAntecedent(new FuzzyValue(attribute2, "high"));
lhll.addAntecedent(new FuzzyValue(attribute3, "low"));
lhll.addAntecedent(new FuzzyValue(attribute4, "low"));
lhll.addConclusion(new FuzzyValue(output, "low"));
mlmm.addAntecedent(new FuzzyValue(attribute1, "medium"));
mlmm.addAntecedent(new FuzzyValue(attribute2, "low"));
mlmm.addAntecedent(new FuzzyValue(attribute3, "medium"));
mlmm.addAntecedent(new FuzzyValue(attribute4, "medium"));
mlmm.addConclusion(new FuzzyValue(output, "medium"));
hmhh.addAntecedent(new FuzzyValue(attribute1, "high"));
hmhh.addAntecedent(new FuzzyValue(attribute2, "medium"));
hmhh.addAntecedent(new FuzzyValue(attribute3, "high"));
hmhh.addAntecedent(new FuzzyValue(attribute4, "high"));
hmhh.addConclusion(new FuzzyValue(output, "high"));
int overIds = 0, nonOverIds = 0,overIds2 = 0, overIds3 = 0;
Vector<Integer> overlapIds = new Vector<Integer>();
Vector<Integer> nonOverlapIds = new Vector<Integer>();
///////////////////////////////////////////////////////////////////////////////////
//Obtain testing data (overlapping and non-overlapping points)
///////////////////////////////////////////////////////////////////////////////////
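// The overlapping region is taken as [hBox3.min, hBox2.max] in every
// dimension, i.e. where the versicolor and virginica boxes intersect;
// setosa is well separated from the other two classes on this data.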
rs = stmt.executeQuery("SELECT count(id) as cnt FROM " + tableName2 + " WHERE (attr_1
>= " + hBox3.getMinimum(1) + " AND attr_1 <= " + hBox2.getMaximum(1) + ") AND " +
"(attr_2 > " +
hBox3.getMinimum(2) + " AND attr_2 < " + hBox2.getMaximum(2) + ") AND "+
"(attr_3 > " +
hBox3.getMinimum(3) + " AND attr_3 < " + hBox2.getMaximum(3) + ") AND "+
"(attr_4 > " +
hBox3.getMinimum(4) + " AND attr_4 < " + hBox2.getMaximum(4) + ") AND out=2");
if( rs.next() )
overIds2 = rs.getInt("cnt");
rs.close();
rs = stmt.executeQuery("SELECT count(id) as cnt FROM " + tableName2 + " WHERE (attr_1
>= " + hBox3.getMinimum(1) + " AND attr_1 <= " + hBox2.getMaximum(1) + ") AND " +
"(attr_2 > " +
hBox3.getMinimum(2) + " AND attr_2 < " + hBox2.getMaximum(2) + ") AND "+
"(attr_3 > " +
hBox3.getMinimum(3) + " AND attr_3 < " + hBox2.getMaximum(3) + ") AND "+
"(attr_4 > " +
hBox3.getMinimum(4) + " AND attr_4 < " + hBox2.getMaximum(4) + ") AND out=3");
if( rs.next() )
overIds3 = rs.getInt("cnt");
rs.close();
//overlapping points
rs = stmt.executeQuery("SELECT count(id) as cnt FROM " + tableName2 + " WHERE (attr_1
>= " + hBox3.getMinimum(1) + " AND attr_1 <= " + hBox2.getMaximum(1) + ") AND " +
"(attr_2 > " +
hBox3.getMinimum(2) + " AND attr_2 < " + hBox2.getMaximum(2) + ") AND "+
"(attr_3 > " +
hBox3.getMinimum(3) + " AND attr_3 < " + hBox2.getMaximum(3) + ") AND "+
"(attr_4 > " +
hBox3.getMinimum(4) + " AND attr_4 < " + hBox2.getMaximum(4) + ")");
if( rs.next() )
overIds = rs.getInt("cnt");
rs.close();
rs = stmt.executeQuery("SELECT id FROM " + tableName2 + " WHERE (attr_1 >= " +
hBox3.getMinimum(1) + " AND attr_1 <= " + hBox2.getMaximum(1) + ") AND " +
"(attr_2 > " +
hBox3.getMinimum(2) + " AND attr_2 < " + hBox2.getMaximum(2) + ") AND "+
"(attr_3 > " +
hBox3.getMinimum(3) + " AND attr_3 < " + hBox2.getMaximum(3) + ") AND "+
"(attr_4 > " +
hBox3.getMinimum(4) + " AND attr_4 < " + hBox2.getMaximum(4) + ")");
while(rs.next())
{
overlapIds.addElement( rs.getInt("id") );
}
rs.close();
FuzzyValueVector fvv1, fvv2, fvv3;
///////////////////////////////////////////////////////////////////////////////////
//classify each point in overlapping area
///////////////////////////////////////////////////////////////////////////////////
for(int i = 0; i < overIds; i++)
{
rs = stmt.executeQuery("SELECT id, attr_1, attr_2, attr_3, attr_4 FROM " + tableName2
+ " WHERE id = " + overlapIds.elementAt(i));
if( rs.next() )
{
double a = rs.getDouble("attr_1");
double b = rs.getDouble("attr_2");
double c = rs.getDouble("attr_3");
double d = rs.getDouble("attr_4");
FuzzyValue at1FV = new FuzzyValue(attribute1, new SingletonFuzzySet(a));
FuzzyValue at2FV = new FuzzyValue(attribute2, new SingletonFuzzySet(b));
FuzzyValue at3FV = new FuzzyValue(attribute3, new SingletonFuzzySet(c));
FuzzyValue at4FV = new FuzzyValue(attribute4, new SingletonFuzzySet(d));
rs.close();
double c21 = hBox2.getCentre(1);
double c22 = hBox2.getCentre(2);
double c23 = hBox2.getCentre(3);
double c24 = hBox2.getCentre(4);
double c31 = hBox3.getCentre(1);
double c32 = hBox3.getCentre(2);
double c33 = hBox3.getCentre(3);
double c34 = hBox3.getCentre(4);
///////////////////////////////////////////////////////////////////////////////////
//calculate Euclidean distance and match rule
///////////////////////////////////////////////////////////////////////////////////
if (distance(a, b, c, d, c21, c22, c23, c24) <= distance(a, b, c, d, c31, c32, c33,
c34))
{
mlmm.removeAllInputs();
mlmm.addInput(at1FV);
mlmm.addInput(at2FV);
mlmm.addInput(at3FV);
mlmm.addInput(at4FV);
if( mlmm.testRuleMatching() )
{
fvv2 = mlmm.execute();
System.out.println(fvv2.fuzzyValueAt(0));
System.out.println(overlapIds.elementAt(i));
}
}
else
{
hmhh.removeAllInputs();
hmhh.addInput(at1FV);
hmhh.addInput(at2FV);
hmhh.addInput(at3FV);
hmhh.addInput(at4FV);
if( hmhh.testRuleMatching() )
{
fvv3 = hmhh.execute();
System.out.println(fvv3.fuzzyValueAt(0));
System.out.println(overlapIds.elementAt(i));
}
}
}//end if
}//end for overlapping
///////////////////////////////////////////////////////////////////////////////////
//deal with points in non-overlapping area
///////////////////////////////////////////////////////////////////////////////////
rs = stmt.executeQuery("SELECT count(id) as cnt FROM " + tableName2 + " WHERE (attr_1
<= " + hBox3.getMinimum(1) + " or attr_1 >= " + hBox2.getMaximum(1) + ") or " +
"(attr_2 <= " +
hBox3.getMinimum(2) + " or attr_2 >= " + hBox2.getMaximum(2) + ") or "+
"(attr_3 <= " +
hBox3.getMinimum(3) + " or attr_3 >= " + hBox2.getMaximum(3) + ") or "+
"(attr_4 <= " +
hBox3.getMinimum(4) + " or attr_4 >= " + hBox2.getMaximum(4) + ")");
if( rs.next() )
nonOverIds = rs.getInt("cnt");
rs.close();
rs = stmt.executeQuery("SELECT id FROM " + tableName2 + " WHERE (attr_1 <= " +
hBox3.getMinimum(1) + " or attr_1 >= " + hBox2.getMaximum(1) + ") or " +
"(attr_2 <= " +
hBox3.getMinimum(2) + " or attr_2 >= " + hBox2.getMaximum(2) + ") or "+
"(attr_3 <= " +
hBox3.getMinimum(3) + " or attr_3 >= " + hBox2.getMaximum(3) + ") or "+
"(attr_4 <= " +
hBox3.getMinimum(4) + " or attr_4 >= " + hBox2.getMaximum(4) + ")");
while(rs.next())
{
nonOverlapIds.addElement( rs.getInt("id") );
}
rs.close();
for(int j = 0; j < nonOverIds; j++)
{
rs = stmt.executeQuery("SELECT id, attr_1, attr_2, attr_3, attr_4 FROM " + tableName2
+ " WHERE id = " + nonOverlapIds.elementAt(j));
if( rs.next() )
{
FuzzyValue at1FV = new FuzzyValue(attribute1, new
SingletonFuzzySet(rs.getDouble("attr_1")));
FuzzyValue at2FV = new FuzzyValue(attribute2, new
SingletonFuzzySet(rs.getDouble("attr_2")));
FuzzyValue at3FV = new FuzzyValue(attribute3, new
SingletonFuzzySet(rs.getDouble("attr_3")));
FuzzyValue at4FV = new FuzzyValue(attribute4, new
SingletonFuzzySet(rs.getDouble("attr_4")));
rs.close();
lhll.removeAllInputs();
lhll.addInput(at1FV);
lhll.addInput(at2FV);
lhll.addInput(at3FV);
lhll.addInput(at4FV);
if( lhll.testRuleMatching() )
{
fvv1 = lhll.execute();
System.out.println(fvv1.fuzzyValueAt(0));
System.out.println(nonOverlapIds.elementAt(j));
}
mlmm.removeAllInputs();
mlmm.addInput(at1FV);
mlmm.addInput(at2FV);
mlmm.addInput(at3FV);
mlmm.addInput(at4FV);
if( mlmm.testRuleMatching() )
{
fvv2 = mlmm.execute();
System.out.println(fvv2.fuzzyValueAt(0));
System.out.println(nonOverlapIds.elementAt(j));
}
hmhh.removeAllInputs();
hmhh.addInput(at1FV);
hmhh.addInput(at2FV);
hmhh.addInput(at3FV);
hmhh.addInput(at4FV);
if( hmhh.testRuleMatching() )
{
fvv3 = hmhh.execute();
System.out.println(fvv3.fuzzyValueAt(0));
System.out.println(nonOverlapIds.elementAt(j));
}
}
}//end for non overlap
return true;
}//end hyperboxGeneration (closing brace and return restored; truncated in the listing)
///////////////////////////////////////////////////////////////////////////////////
//detect existence of overlapping area
///////////////////////////////////////////////////////////////////////////////////
private boolean existsOverlapping(HyperBoxM box1, HyperBoxM box2)
{
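// Assumption: box1 is the lower box in every dimension, so overlap exists
// when box1's maximum reaches past box2's minimum; a fully general test
// would also check box2.getMaximum(d) >= box1.getMinimum(d).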
if( box1.getMaximum(1) >= box2.getMinimum(1) )
if( box1.getMaximum(2) >= box2.getMinimum(2) )
if( box1.getMaximum(3) >= box2.getMinimum(3) )
if( box1.getMaximum(4) >= box2.getMinimum(4) )
return true;
return false;
}
///////////////////////////////////////////////////////////////////////////////////
//calculate Euclidean distance
///////////////////////////////////////////////////////////////////////////////////
public double distance(double a, double b, double c, double d,
double a1, double b1, double c1, double d1)
{
// Squared differences make Math.abs unnecessary.
double dx = a - a1, dy = b - b1, dz = c - c1, dw = d - d1;
return Math.sqrt(dx * dx + dy * dy + dz * dz + dw * dw);
}
///////////////////////////////////////////////////////////////////////////////////
//main
///////////////////////////////////////////////////////////////////////////////////
public static void main(String[] args) throws SQLException
{
classificationTestM calgo = new classificationTestM("CAlgoDTB");
if( calgo.hyperboxGeneration() )
System.out.println("[+] Finish");
}
}//end class
Appendix 2: A sample separation of Iris dataset
Training data
*species: 1 for setosa, 2 for versicolor, 3 for virginica
id species Petal width Petal length Sepal width Sepal length
1 1 0.2 1.4 3.3 5
2 1 0.2 1.6 3.1 4.8
3 1 0.2 1.3 3.2 4.4
4 1 0.2 1.4 3 4.9
5 1 0.4 1.5 3.4 5.4
6 1 0.2 1.4 4.2 5.5
7 1 0.2 1.4 2.9 4.4
8 1 0.1 1.4 3 4.8
9 1 0.4 1.5 3.7 5.1
10 1 0.2 1.3 3 4.4
11 1 0.2 1.6 3.2 4.7
12 1 0.1 1.1 3 4.3
13 1 0.2 1.4 3.5 5.1
14 1 0.4 1.6 3.4 5
15 1 0.2 1.3 3.2 4.7
16 1 0.2 1.5 3.4 5.1
17 1 0.1 1.5 3.1 4.9
18 1 0.2 1.5 3.7 5.4
19 1 0.3 1.3 2.3 4.5
20 1 0.3 1.5 3.8 5.1
21 1 0.2 1.5 3.5 5.2
22 1 0.6 1.6 3.5 5
23 1 0.2 1.4 3.2 4.6
24 1 0.2 1.5 3.1 4.6
25 1 0.2 1.5 3.7 5.3
26 2 1.3 4.5 2.8 5.7
27 2 1.2 4 2.6 5.8
28 2 1 4.1 2.7 5.8
29 2 1.5 4.5 2.9 6
30 2 1 3.3 2.4 4.9
31 2 1.5 4.2 3 5.9
32 2 1.5 4.9 2.5 6.3
33 2 1.4 4.4 3 6.6
34 2 1.1 3.9 2.5 5.6
35 2 1.5 4.5 3 5.4
36 2 1 3.5 2.6 5.7
37 2 1.3 4.2 2.7 5.6
38 2 1.3 5.4 2.9 6.2
39 2 1.2 4.7 2.8 6.1
40 2 1.3 4.1 2.8 5.7
41 2 1.5 4.9 3.1 6.9
42 2 1.3 4 2.5 5.5
43 2 1.5 4.6 2.8 6.5
44 2 1.8 4.8 3.2 5.9
45 2 1.3 4 2.8 6.1
46 2 1.1 3.8 2.4 5.5
47 2 1.2 4.2 3 5.7
48 2 1.3 5.6 2.9 6.6
49 2 1.5 4.7 3.1 6.7
50 2 1.3 4 2.3 5.5
51 3 2.4 5.6 3.1 6.7
52 3 1.9 5.1 2.7 5.8
53 3 1.9 5 2.5 6.3
54 3 1.8 4.9 2.7 6.3
55 3 1.5 5 2.2 6
56 3 2 4.9 2.8 5.6
57 3 1.8 5.8 2.5 6.7
58 3 2.1 5.4 3.1 6.9
59 3 2.1 5.5 3 6.8
60 3 1.5 5.1 2.8 6.3
61 3 2.3 5.9 3.2 6.8
62 3 2.5 5.7 3.3 6.7
63 3 2.1 5.7 3.3 6.7
64 3 1.8 4.8 3 6
65 3 1.8 5.5 3 6.5
66 3 2.1 6.6 3 7.6
67 3 1.8 6 3.2 7.2
68 3 2 6.7 2.8 7.7
69 3 1.4 5.6 2.6 6.1
70 3 2.4 5.6 3.4 6.3
71 3 1.6 5.8 3 7.2
72 3 2.3 6.9 2.6 7.7
73 3 1.9 6.1 2.8 7.4
74 3 2.2 5.8 3 6.5
75 3 2 5 2.5 5.7
Testing data
*species: 1 for setosa, 2 for versicolor, 3 for virginica
id species Petal width Petal length Sepal width Sepal length
1 1 0.2 1 3.6 4.6
2 1 0.1 1.4 3.6 4.9
3 1 0.2 1.6 3.8 5.1
4 1 0.2 1.6 3 5
5 1 0.4 1.9 3.8 5.1
6 1 0.2 1.4 3.6 5
7 1 0.3 1.7 3.8 5.7
8 1 0.2 1.3 3.5 5.5
9 1 0.2 1.2 3.2 5
10 1 0.1 1.5 4.1 5.2
11 1 0.2 1.5 3.1 4.9
12 1 0.4 1.7 3.9 5.4
13 1 0.4 1.3 3.9 5.4
14 1 0.3 1.4 3.4 4.6
15 1 0.5 1.7 3.3 5.1
16 1 0.2 1.4 3.4 5.2
17 1 0.2 1.2 4 5.8
18 1 0.2 1.5 3.4 5.2
19 1 0.3 1.3 3.5 5
20 1 0.3 1.4 3.5 5.1
21 1 0.2 1.7 3.4 5.4
22 1 0.2 1.6 3.4 4.8
23 1 0.4 1.5 4.4 5.7
24 1 0.3 1.4 3 4.8
25 1 0.2 1.9 3.4 4.8
26 2 1 3.3 2.3 5
27 2 1.6 4.7 3.3 6.3
28 2 1.4 4.7 3.2 7
29 2 1.4 3.9 2.7 5.2
30 2 1.2 3.9 2.7 5.8
31 2 1.3 4.4 2.3 6.3
32 2 1.1 3 2.5 5.1
33 2 1.3 3.6 2.9 5.6
34 2 1.7 5 3 6.7
35 2 1.5 4.5 2.2 6.2
36 2 1.4 4.6 3 6.1
37 2 1.5 4.5 3.2 6.4
38 2 1.4 4.4 3.1 6.7
39 2 1.3 4.2 2.6 5.7
40 2 1.6 4.5 3.4 6
41 2 1 3.5 2 5
42 2 1 5 2.2 6
43 2 1.4 4.8 2.8 6.8
44 2 1.2 4.4 2.6 5.5
45 2 1.5 4.5 3 5.6
46 2 1.6 5.1 2.7 6
47 2 1.4 4.7 2.9 6.1
48 2 1 3.7 2.4 5.5
49 2 1.3 4.1 3 5.6
50 2 1.3 4.3 2.9 6.4
51 3 2.3 5.1 3.1 6.9
52 3 2 5.2 3 6.5
53 3 1.7 4.5 2.5 4.9
54 3 2.1 5.6 2.8 6.4
55 3 1.9 5.1 2.7 5.8
56 3 1.8 5.5 3.1 6.4
57 3 2.3 5.7 3.2 6.9
58 3 2.5 6.1 3.6 7.2
59 3 2.2 5.6 2.8 6.4
60 3 2.3 5.4 3.4 6.2
61 3 1.8 5.1 3 5.9
62 3 2.3 5.3 3.2 6.4
63 3 1.3 5.2 3 6.7
64 3 1.8 4.9 3 6.1
65 3 2.3 6.1 3 7.7
66 3 2 5.1 3.2 6.5
67 3 2.5 6 3.3 6.3
68 3 2.2 6.7 3.8 7.7
69 3 2 6.4 3.8 7.9
70 3 1.8 4.8 2.8 6.2
71 3 2.1 5.9 3 7.1
72 3 1.8 5.6 2.9 6.3
73 3 1.8 6.3 2.9 7.3
74 3 2.4 5.1 2.8 5.8
75 3 1.9 5.3 2.7 6.4