
Ref. code: 25605722300067ONR

SIGNAL PROCESSING FOR ELECTRONIC NOSE

BY

MD. MIZANUR RAHMAN

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF

PHILOSOPHY (ENGINEERING & TECHNOLOGY)

SIRINDHORN INTERNATIONAL INSTITUTE OF TECHNOLOGY

THAMMASAT UNIVERSITY

ACADEMIC YEAR 2017


Acknowledgements

I express my gratefulness to Almighty Allah, who blessed me to perform this work. My sincere gratitude goes to my advisor, Assoc. Prof. Dr. Chalie Charoenlarpnopparut, for the continuous support of my Ph.D. study and related research, and for his patience, motivation, and knowledgeable guidance throughout this research. His guidance helped me throughout the research and the writing of this thesis. I could not have imagined having a better advisor and mentor for my Ph.D. study. My special gratitude goes to Dr. Attaphongse Taparugssanagorn and Dr. Wiroonsak Santipach for their precious and deliberate suggestions, which helped me to improve this work.

Besides my advisor, I am also grateful to my co-advisor, Dr. Prapun Suksompong, for his benevolent and insightful suggestions. I would like to thank the rest of my thesis committee: Prof. Dr. Banlue Srisuchinwong, Assoc. Prof. Pisanu Toochinda, Ph.D., and Asst. Prof. Dr. Attaphongse Taparugssanagorn, for their insightful comments and encouragement.

I am also grateful to all the SIIT faculty members and staff for every kind of support during my study and research at SIIT. I also thank my friends at SIIT for their love and company.

Finally, I thank my parents and family members, especially my son Mohammad Ateeb Mahir, who has sacrificed his father's affection and company since October 25, 2012.


Abstract

SIGNAL PROCESSING FOR ELECTRONIC NOSE

by

MD. MIZANUR RAHMAN

Bachelor of Science (Electronics and Communication Engineering), Khulna

University, 2002

Master of Science (Engineering and Technology), Sirindhorn International Institute of

Technology, Thammasat University, 2014

Doctor of Philosophy (Engineering and Technology), Sirindhorn International

Institute of Technology, Thammasat University, 2017

Ripeness identification of fruits with short lifetimes is important to both cultivators and consumers. Recently, electronic noses (E-Noses) have become popular for fruit quality checking because of their sturdiness and repeated usability without fatigue, unlike human experts. The primary components of an E-Nose are a data acquisition device, a sensor panel, and a classification algorithm. Most sensors used in E-Noses are expensive. In addition, a sensor panel with a large number of sensors increases design complexity. Thus, finding a minimal set of sensors with maximum relevant data classification efficiency is of vital importance. To analyze the classification efficiencies of different classification methods, fruits such as banana, mango, sapodilla, and pineapple are chosen. Two novel methods for finding a minimal set of sensors are proposed in this thesis. One is a principal component loading and mutual information based approach, and the other is a threshold based approach. With these methods, minimal sets of sensors are found which show more than 90% classification accuracy while classifying each of the four fruit types at three ripeness states. Once a sensor panel is designed and a data acquisition device chosen, a simple, fast, and efficient classification method is required for classifying data of


relevant training classes and for rejecting any irrelevant data. At present, to classify E-Nose data, the k-nearest neighbor (k-NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN) classification algorithms are often applied. Because of their open-ended hyperplane-based classification boundaries, these algorithms falsely classify extraneous odor data. To reduce false classification error and thereby improve correct rejection performance, classification algorithms with hyperspheric boundaries, such as the generalized regression neural network (GRNN) and the radial basis function neural network (RBFNN), should be used. Simulation results show that GRNN has a better ability to overcome the false classification problem than RBFNN. However, because it requires a large number of neurons, designing a GRNN is complex and expensive. A simple hyperspheric classification method based on the minimum, maximum, and mean (MMM) values of the training data is also proposed in this thesis. It is observed that the MMM algorithm is simpler and faster, and has higher accuracy in classifying data of training classes and correctly rejecting data of irrelevant classes.

Keywords: Electronic nose, Pattern recognition, Signal processing, False alarm, Classification


Table of Contents

Signature Page

Acknowledgements

Abstract

Table of Contents

List of Figures

List of Tables

1 Introduction
1.1 Problem statement
1.2 Motivation
1.3 Objective
1.4 Contribution of this thesis

2 Background Study and Literature Review
2.1 Human olfactory system
2.2 Human nose versus E-Nose
2.3 Literature review
2.3.1 Sensor array minimization techniques
2.3.2 False alarm reduction techniques

3 System Design and Methodology
3.1 E-Nose design process
3.2 Sample fruit collection
3.3 Sensor array
3.4 Experimental setup
3.5 Data acquisition system
3.6 Pattern recognition and classification algorithms
3.6.1 Principal component analysis (PCA)
3.6.2 k-nearest neighbor (k-NN)
3.6.3 Support vector machine (SVM)
3.6.4 Multilayer perceptron neural network (MLPNN)
3.6.5 Generalized regression neural network (GRNN)
3.6.6 Radial basis function neural network (RBFNN)
3.6.7 Linear discriminant analysis (LDA)
3.7 Sensor panel minimization techniques
3.7.1 Exhaustive search method
3.7.2 PCA loading and mutual information based approach
3.7.3 Threshold based approach
3.8 Hyperplane versus hyperspheric classification
3.8.1 Minimum-maximum-mean based hyperspheric classification

4 Results and Discussions
4.1 Data preprocessing
4.2 Sensor panel minimization methods
4.2.1 Between to within variance method
4.2.2 Exhaustive search method
4.2.3 PCA loading and mutual information based approach
4.2.4 Threshold based method
4.3 False classification reduction: Hyperplane versus hyperspheric
4.3.1 GRNN compared to SVM, LDA, k-NN, and MLPNN
4.3.2 MMM versus k-NN, SVM, GRNN, RBFNN, and MLPNN

5 Conclusions and Recommendations
5.1 Conclusions
5.2 Future works

References

Appendices
Appendix A


List of Figures

1.1 (a) Hyperspheric (ellipse in two dimensions) vs. (b) hyperplane (line in two dimensions) classification boundaries

2.1 Position of a human olfactory system

2.2 A human olfactory system and its parts: 1: olfactory bulb, 2: mitral cells, 3: bone, 4: nasal epithelium, 5: glomerulus, 6: olfactory receptor cells

2.3 Detailed view of a human olfactory system

2.4 Comparison of the human olfaction system to an E-Nose olfaction system

3.1 An E-Nose design process

3.2 Fruit samples at different ripeness states: (a) unripe banana, (b) ripe banana, (c) rotten banana, (d) unripe mango, (e) ripe mango, (f) rotten mango, (g) unripe sapodilla, (h) ripe sapodilla, (i) rotten sapodilla, (j) unripe pineapple, (k) ripe pineapple, and (l) rotten pineapple

3.3 The E-Nose sensor array: (a) sensors mounted on a breadboard, (b) PCB design. Five sensors are of the TGS 26XX series and three sensors are of the TGS 8XX series. Cermet-type variable resistors are used

3.4 The E-Nose experimental setup. Valves 1 and 2 and fans 1 and 2 control air flow; valves 3 and 4 and fans 3 and 4 circulate the odor between the sample chamber and the measurement chamber. An E-Nose power supply powers the sensors, fans, and the myRIO

3.5 LabVIEW schematic diagram for data acquisition

3.6 Block diagram of a multilayer perceptron neural network

3.7 Generalized regression neural network block diagram. d²(l,m) is the squared Euclidean distance, σ is the spreading factor, m is the class index, and l is the data index within class m

3.8 Two-dimensional PC loading of the fruit data to explain the relation of PC loading to mutual information

3.9 Gaussian functions with means µ=0.61, µ=1.30, µ=1.90 and standard deviations σ=0.04, σ=0.22, σ=0.34 for green mango, ripe banana, and ripe sapodilla, respectively

3.10 A PCA of the dataset to explain the false classification problems of k-NN, SVM, LDA, and MLPNN. The dataset is the PCA scores plot of the four fruit types, each at three ripeness states. In the figure, GB is unripe banana, RB is ripe banana, RtB is rotten banana, GM is unripe mango, RM is ripe mango, RtM is rotten mango, GS is unripe sapodilla, RS is ripe sapodilla, RtS is rotten sapodilla, GP is unripe pineapple, RP is ripe pineapple, and RtP is rotten pineapple

3.11 A PCA plot with closed boundaries to explain the effect of hyperspheric closed boundaries in avoiding false classification. In the figure, GB is unripe banana, RB is ripe banana, RtB is rotten banana, GM is unripe mango, RM is ripe mango, RtM is rotten mango, GS is unripe sapodilla, RS is ripe sapodilla, RtS is rotten sapodilla, GP is unripe pineapple, RP is ripe pineapple, and RtP is rotten pineapple

4.1 Sensor responses from eight sensors for ripe sapodilla

4.2 PCA scores plot of training data of four fruit types at three ripeness states

4.3 Between-to-within variance of all the classes for each sensor

4.4 Signature patterns for three types of fruits: (a) ripe mango, (b) ripe sapodilla, and (c) ripe pineapple

4.5 Scores plot of three types of fruits

4.6 Classification of ripe mango samples, and test samples of sapodilla and pineapple, by exact GRNN and approximate GRNN at different spreading factors

4.7 Signature patterns of the means of three types of fruits at three ripeness states: (a) green banana, (b) ripe banana, (c) rotten banana, (d) green sapodilla, (e) ripe sapodilla, (f) rotten sapodilla, (g) green pineapple, (h) ripe pineapple, and (i) rotten pineapple

4.8 Box plots showing minimum and maximum values of the sensor variables for the training classes, i.e., green banana (GB), ripe banana (RB), rotten banana (RtB), green sapodilla (GS), ripe sapodilla (RS), rotten sapodilla (RtS), green pineapple (GP), ripe pineapple (RP), and rotten pineapple (RtP)


List of Tables

1.1 Total fresh fruit import value for Mainland China by country, 2012–2015

3.1 Figaro gas and VOC sensors used in the E-Nose design

3.2 Experimental data collection plan for fruit odor classification

4.1 Pattern recognition error rates of test data for the between-to-within variance method

4.2 Classification error by the MLPNN and RBFNN methods. For each number of sensors, the combinations with minimum pattern recognition errors are recorded

4.3 Classification error by the k-NN and SVM methods. For each number of sensors, the combinations with minimum pattern recognition errors are recorded

4.4 Principal component loadings

4.5 Mutual information between pairs of sensors. The self-information in the diagonal cells is omitted

4.6 Maximum number of pairs of classes classifiable by combinations of three sensors

4.7 Maximum number of pairs of classes classifiable by combinations of four sensors

4.8 Classification of ripe mango, ripe sapodilla, and ripe pineapple samples by different algorithms

4.9 Training and testing times taken by the algorithms to train and test with the data samples of banana, sapodilla, and pineapple, each at three ripeness states

4.10 Misclassification error and correct classification rate of the classification algorithms when testing with the test data set from the training classes, i.e., banana, sapodilla, and pineapple, each at three ripeness states

4.11 False classification performance of the algorithms with irrelevant data (i.e., mango odor data at three ripeness states)


Chapter 1

Introduction

1.1 Problem statement

The sense of smell is fundamental for humans as well as animals. Vertebrates and other organisms have olfactory receptors in their olfactory systems to identify foods, predators, and mates; these receptors provide the sensual pleasure of good smells as well as warnings about food quality, chemical dangers, ripeness of fruits, etc. Humans and animals discern a smell from the sensitivity pattern generated by the olfactory sensory neurons located in their noses, without identifying the individual chemical components within the odor. Traditional chemical or analytical instruments, such as chemical analysis or gas chromatography/mass spectrometry (GC/MS), instead of deciding object type or quality, find the composition of the volatile organic compounds (VOCs) in the odor. Human experts, on the other hand, suffer from fatigue. As a result, the concept of an electronic nose (E-Nose) was initiated in 1982 by Persaud and Dodd to mimic a human or animal nose [1].

The gas sensors used in E-Nose applications are expensive and have distinct sensitivities to wide ranges of gases and VOCs. A specific odor or VOC can be sensed by multiple sensors with different sensitivities. This wide-range sensitivity suggests excluding redundant sensors from the sensor panel to reduce design complexity and cost. However, finding and excluding the redundant sensors without significantly degrading E-Nose performance is a challenge.
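The redundancy idea above can be illustrated with a small sketch: two sensors that respond almost identically share high mutual information, so one of the pair is a candidate for exclusion. This is only an illustrative example on synthetic data (the thesis's actual selection methods are developed in Chapter 3); the histogram-based estimator and the choice of `bins` are assumptions of the sketch.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Estimate mutual information (in bits) between two sensor
    channels by histogram binning of their responses."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                    # joint distribution
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x
    py = pxy.sum(axis=0, keepdims=True)          # marginal of y
    nz = pxy > 0                                 # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
s1 = rng.normal(size=2000)              # synthetic sensor 1 response
s2 = s1 + 0.05 * rng.normal(size=2000)  # near-duplicate of sensor 1
s3 = rng.normal(size=2000)              # independent sensor

# The redundant pair shares far more information than the independent
# pair, so one sensor of the redundant pair could be excluded.
print(mutual_information(s1, s2) > mutual_information(s1, s3))
```

A panel designer would compute this score for every sensor pair and inspect the pairs with the highest values, which is the role mutual information plays in the selection approach named above.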

A proficient, fast, and simple classification algorithm that classifies target odor types with an acceptable error rate is required for E-Nose training. When odor data with which an E-Nose is trained are correctly classified to the expected training class, this is called "correct classification". On the other hand, odor data of an untrained or irrelevant class should be correctly rejected and should not be classified to any training class, and thereby should not produce a false classification error, called a "false alarm". In short, data from training classes are required to be correctly classified, and data from irrelevant odor classes are to be correctly


excluded. Incorrect classification of data that truly belong to the training classes or to irrelevant classes results in classification errors. These classification errors can be categorized as follows:

i) misclassification: odor data of a training class are classified to another training class or to an irrelevant class, and

ii) false classification: odor data of an unrelated class are classified to a known training class. Any false classification causes a false alarm.

Thus, along with good classification accuracy (correct classification rate), low processing time, and simplicity, the false classification rate as well as the correct rejection rate are to be considered essential conditions when choosing a classification method for E-Nose applications.
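The error categories above can be made concrete with a short counting sketch. The class labels and the "reject" convention are hypothetical; the code simply tallies the four outcomes defined in this section.

```python
# Hypothetical training classes; "reject" marks correct rejection of
# data that belongs to no training class.
train_classes = {"ripe_banana", "unripe_banana", "rotten_banana"}

def error_counts(true_labels, predicted):
    """Count the outcomes defined above: misclassification (training-class
    data assigned any wrong label), false classification (irrelevant data
    assigned to a training class, i.e. a false alarm), correct
    classification, and correct rejection."""
    mis = false_cls = correct = rejected = 0
    for t, p in zip(true_labels, predicted):
        if t in train_classes:          # data from a trained class
            correct += int(p == t)
            mis += int(p != t)
        else:                           # irrelevant odor data
            false_cls += int(p in train_classes)
            rejected += int(p not in train_classes)
    return mis, false_cls, correct, rejected

truth = ["ripe_banana", "ripe_banana", "mango"]         # mango is untrained
preds = ["ripe_banana", "unripe_banana", "ripe_banana"]
print(error_counts(truth, preds))  # (1, 1, 1, 0)
```

Dividing each count by the number of relevant or irrelevant samples gives the rates discussed in the evaluation chapters.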

Classification methods such as principal component analysis (PCA), k-nearest neighbor (k-NN), support vector machine (SVM), multilayer perceptron neural network (MLPNN), radial basis function neural network (RBFNN), generalized regression neural network (GRNN), and linear discriminant analysis (LDA) are commonly used for E-Nose data classification. Among these methods, k-NN, SVM, MLPNN, and LDA classify data by open-ended hyperplane-based classification boundaries and are susceptible to false classification; they therefore generate false alarms in the presence of irrelevant odors because of the wide sensitivity range of the sensors. Classification methods such as RBFNN and GRNN, which use Gaussian activation functions in their hidden layers, generate bounded hyperspheric classification boundaries and are able to cope with the false alarm issue. However, because these methods require a large number of neurons, their application is expensive with a large training dataset. Thus, a simple classification method is required for E-Nose applications.
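How a Gaussian hidden layer yields a bounded boundary can be seen in a small sketch. This is not the thesis's GRNN implementation; the rejection threshold `reject_below`, the spreading factor value, and the synthetic data are assumptions of the example.

```python
import numpy as np

def grnn_classify(x, X_train, y_train, sigma=0.3, reject_below=1e-3):
    """Minimal GRNN-style classifier: every training sample contributes
    a Gaussian kernel, and class scores are kernel sums per class.
    Because the Gaussian decays with distance, a sample far from all
    training data receives near-zero total activation and can be
    rejected -- the closed-boundary behavior described above."""
    k = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2 * sigma ** 2))
    if k.sum() < reject_below:        # no class is reasonably close
        return "reject"
    scores = {c: k[y_train == c].sum() for c in np.unique(y_train)}
    return max(scores, key=scores.get)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.2, (40, 2)), rng.normal(2, 0.2, (40, 2))])
y = np.array(["ripe"] * 40 + ["rotten"] * 40)

print(grnn_classify(np.array([0.1, -0.1]), X, y))   # near the "ripe" cloud
print(grnn_classify(np.array([10.0, 10.0]), X, y))  # far from everything
```

Note that the kernel sum involves every training sample, which is exactly why a full GRNN needs one hidden neuron per sample and becomes expensive for large datasets.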

E-Nose data do not extend indefinitely in space; rather, the acquired sensor data of any odor vary within definite limits. The feature variables of an odor class for an E-Nose are distributed around a mean (or mean vector, in the multidimensional case) within distinct limits. Any data which lie in the empty spaces around the different class boundaries and are not reasonably close to any class should not be classified into any training class but should instead be rejected. This objective is achievable by a classification algorithm that uses a hyperspheric


classification boundary. The effect of hyperspheric versus hyperplane classification boundaries is shown in Figure 1.1. Hyperspheric classification boundaries (ellipses in two dimensions) always form closed boundaries around classes (e.g., class boundaries D and E having centers 'd' and 'e' in Figure 1.1(a)) and classify data to a class only if they reside within the class boundary. Hyperplane classification boundaries (lines in two dimensions), by contrast, are unbounded and classify data to a class regardless of the distance from class D or E with centers 'd' or 'e' (Figure 1.1(b)). Consider a data point 'f' which is far from the class centers 'd' and 'e' in both Figure 1.1(a) and Figure 1.1(b). The hyperspheric classification boundaries (Figure 1.1(a)) will not classify the data point 'f' to either class D or E, while the data point 'f' will be classified to class E (Figure 1.1(b)) by a hyperplane-based classification algorithm.

Figure 1.1 (a) Hyperspheric (ellipse in two dimensions) vs. (b) hyperplane (line in two dimensions) classification boundaries.
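The behavior of point 'f' in Figure 1.1 can be reproduced numerically. In this sketch the two-dimensional data, the class positions, and the "maximum training distance" radius are assumptions, not the thesis's exact boundaries: a nearest-centre rule, which behaves like an unbounded hyperplane split, assigns the far point to class E, while a closed hyperspheric rule rejects it.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal([0.0, 0.0], 0.3, size=(100, 2))  # class D around centre 'd'
E = rng.normal([3.0, 0.0], 0.3, size=(100, 2))  # class E around centre 'e'
f = np.array([3.0, 8.0])                        # point far from both centres

# Hyperplane-style rule: assign to the nearest class centre, however far away.
dist = {"D": np.linalg.norm(f - D.mean(axis=0)),
        "E": np.linalg.norm(f - E.mean(axis=0))}
hyperplane_label = min(dist, key=dist.get)      # a false classification

# Hyperspheric rule: assign only if the point falls inside a closed boundary,
# here the maximum distance of any training point from its class centre.
def hyperspheric(point):
    for name, cloud in (("D", D), ("E", E)):
        centre = cloud.mean(axis=0)
        radius = np.linalg.norm(cloud - centre, axis=1).max()
        if np.linalg.norm(point - centre) <= radius:
            return name
    return "reject"

print(hyperplane_label)   # 'E'
print(hyperspheric(f))    # 'reject'
```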

1.2 Motivation

Fruits and vegetables have always been essential to the growth and development of human life. Fruit supports a healthy lifestyle by providing carbohydrates, fiber, and micronutrients, which help our bodies function properly.


Fruits give more energy than sugar or sweets, as they contain natural glucose and fructose.

As food value chains become longer and more complex, the quality and safety of foods and fruits are becoming increasingly important issues for consumers as well as for governments. Fresh foods and fruits need to meet food safety requirements and marketing standards, and national and international buyers also apply their own quality specifications. Fresh foods and fruits are perishable products whose quality deteriorates quickly, causing prices to fall. Table 1.1 shows that Thailand is the top exporter of fruits to China. To keep this position and to find more opportunities in other countries, quality and standards should be maintained, which implies that fruits should be plucked and distributed at the right maturity level. As human experts suffer from fatigue and are also expensive, an E-Nose can be used instead for fruit type and quality identification.

Table 1.1 Total fresh fruit import value for Mainland China by country, 2012–2015, in millions of USD [2].

Country        2012    2013    2014    2015
Thailand       975.6   1106.2  1022.8  1066.3
Chile          571.1   615.6   776.1   971.0
Vietnam        441.5   546.3   682.2   861.4
Philippines    320.3   315.3   607.0   564.8
USA            288.1   253.8   253.1   290.6
Peru           66.8    98.4    202.6   214.3
Ecuador        31.0    19.9    154.7   220.5
South Africa   64.5    88.0    156.3   140.5
New Zealand    117.5   108.5   153.9   274.9
Australia      22.5    46.8    71.2    114.5

Four popular and healthy fruits, namely banana, mango, pineapple, and sapodilla, are chosen for this research. Because of their fast metabolic processes, these fruits rot


faster once they are plucked. Identifying these fruits at the perfect maturity level can benefit farmers, businessmen, and consumers.

Designing an E-Nose requires knowledge of electronic circuit design, i.e., designing a sensor panel with gas and/or VOC sensors, interfacing the sensor panel with a data acquisition device, digital signal processing, and pattern classification. As an electronics and communication engineer, the author is inspired to learn deeply and accumulate this knowledge to design and explore the working process of an E-Nose and to contribute to ongoing research in this area. This helps in understanding and solving the existing problems. Furthermore, an E-Nose is no longer only a laboratory device; it has practical applications in analyzing food quality and fruit ripeness, detecting hazardous chemicals or gases, etc. An E-Nose designed for fruit ripeness detection benefits both the farmer and the consumer by identifying the proper state of a fruit, so that fruits are plucked or consumed at the best time before they rot. For mass availability, designing an E-Nose with a minimum number of sensors, maximum correct classification, and minimum false classification is essential.

1.3 Objective

The objectives of this thesis are:

i. to design an E-Nose with gas sensors to identify the ripeness of tropical fruits,

ii. to develop a redundant-sensor exclusion method to reduce E-Nose design complexity and cost, and

iii. to find a classification algorithm with irrelevant-data rejection capability and thereby reduce false alarms in an E-Nose.

1.4 Contribution of this thesis

In this thesis, a sensor panel has been designed with eight metal oxide gas (MOG) sensors which efficiently classifies four tropical fruits, namely banana, mango, sapodilla, and pineapple, each at three ripeness states: unripe, ripe, and rotten. Two novel methods for finding an optimal set of sensors are then proposed which are less complex and more efficient than existing methods. One is a principal component loading and mutual information based approach, and the other is a


threshold based approach. An optimal set of three sensors is found by the proposed methods; its performance is tested with the RBFNN, MLPNN, k-NN, and SVM classification algorithms and shows minor classification errors.

The term 'false classification' has not been addressed for E-Nose applications in the literature, except for fire alarm detection. It is shown that an RBFNN and a GRNN with Gaussian activation functions at the hidden layer neurons produce bounded hyperspheric classification boundaries around the training classes. The GRNN method is found to be more efficient at reducing false classification, and hence false alarms, by rejecting any irrelevant odor data. In addition, a simple hyperspheric classification method based on the minimum, maximum, and mean (MMM) values of the features of the training data is proposed in this thesis. It is shown that this MMM algorithm is simpler and faster, and has higher accuracy in classifying odor data of training classes and correctly rejecting data of extraneous classes.

In summary, this thesis makes the following contributions:

i. design of a sensor panel which classifies four fruits, each at three ripeness states,

ii. two novel methods for sensor panel minimization,

iii. a demonstration that hyperspheric-boundary-based classification methods are better suited to E-Nose data classification, and

iv. a novel hyperspheric classification method for E-Nose data based on the minimum, maximum, and mean of the features of each class.


Chapter 2

Background Study and Literature Review

2.1 Human olfactory system

Humans and animals sense smell through the built-in sensory system in their noses, called the olfactory system. The olfactory system has two distinct parts:

the main olfactory system, used for detecting volatile, airborne substances, and

the accessory olfactory system, which senses fluid-phase stimuli.

The mechanism of the olfactory system can be divided into

a peripheral part, in which an external stimulus is sensed and encoded as an electric signal in neurons, and

a central part, in which all signals are assimilated in the central nervous system.

Figure 2.1 Position of a human olfactory system. [3]


Figure 2.2 A human olfactory system and its parts: 1: Olfactory bulb, 2: Mitral cells,

3: Bone, 4: Nasal Epithelium, 5: Glomerulus, 6: Olfactory receptor cells. [4]

Figure 2.3 Detailed view of a human olfactory system. [3]

The position of the human olfactory system is shown in Figure 2.1, and detailed views of the human olfactory system are shown in Figures 2.2 and 2.3. The main olfactory system of humans detects odorants, i.e., VOCs that are inhaled through the nose from the atmosphere. Humans and animals can recognize objects emitting VOCs into the air from past odour experiences recorded in their


memory. The VOCs reach the olfactory sensory cells at the olfactory epithelium, located in the innermost part of the nose. The detection is performed within a small area of about 2.5 square centimeters. VOC reception and sensory transduction start at the mucous–cilia layer. The mucous layer is approximately 60 microns thick. Between 8 and 20 whip-like cilia, 30–200 microns long, are connected to each of the main olfactory receptor neurons (also called olfactory receptor cells) in the olfactory epithelium. The olfactory epithelium contains approximately 50 million various sensory receptor cells. The olfactory receptor neurons transduce the sensation of VOCs at the cilia into electrical signals. On the other side, the olfactory receptor cells form axons within the epithelium. Between 10 and 100 axons are bundled in groups, penetrate the cribriform bone, and converge and terminate to form synaptic structures called glomeruli, located at the olfactory bulb. Identical signals from different olfactory receptors combine at the glomeruli to which they are connected and travel along the olfactory nerve, which terminates in the olfactory bulb and also belongs to the central nervous system. A new odor and its concentration are classified by a complex set of olfactory receptor neurons. The central nervous system views the odors as distinct neural activity patterns. The data remain in memory as synaptic weights for future identification of similar VOCs and for identifying an object [1].

2.2 Human nose versus E-Nose

An E-Nose is a measurement unit that attempts to imitate the human olfactory system in identifying odors or flavors. An E-Nose generates complex multi-dimensional data for each measurement, and the data are interpreted and mapped to a target value or class label by a pattern recognition technique. The stages of the recognition process of an E-Nose are similar to those of human olfaction. The olfaction process is used for identifying, comparing, and quantifying the type or quality of odor-emitting objects, among other applications. A comparison of human olfaction to that of an E-Nose is shown in Figure 2.4.

As seen in Figure 2.4, the olfaction process for both an E-Nose and a human nose begins with an odor source. Similar to the human olfactory receptors, an E-Nose uses a sensor array and an acquisition device to generate and collect signals in response to


an odor type. A signal processing algorithm preprocesses the signal into the format required by the classification algorithm. An implemented classification algorithm, such as k-NN, SVM, MLPNN, RBFNN, GRNN, or LDA, to name just a few, compares the presently sensed data to the previously stored data and provides a decision on an odor or its source type.

Figure 2.4 Comparison of the human olfaction system to an E-Nose olfaction system.
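The pipeline of Figure 2.4 (sense, preprocess, compare against stored patterns) can be sketched in a few lines. The sensor values, the stored signatures, and the nearest-pattern rule here are hypothetical placeholders for a trained classifier.

```python
import numpy as np

def preprocess(raw):
    """Normalize an 8-sensor response vector so that classification
    compares signature patterns rather than absolute amplitudes."""
    raw = np.asarray(raw, dtype=float)
    return raw / np.linalg.norm(raw)

# Hypothetical stored signatures from training (one mean pattern per odor),
# playing the role of the "previously stored data" in the text above.
stored = {"ripe_banana": preprocess([1.2, 0.8, 0.4, 0.9, 1.1, 0.3, 0.6, 0.7]),
          "ripe_mango":  preprocess([0.3, 1.1, 1.0, 0.2, 0.5, 1.2, 0.9, 0.4])}

def classify(raw):
    """Compare the freshly sensed, preprocessed pattern against the
    stored patterns -- the memory comparison step of the pipeline."""
    x = preprocess(raw)
    return min(stored, key=lambda k: np.linalg.norm(x - stored[k]))

print(classify([1.3, 0.7, 0.5, 0.8, 1.0, 0.35, 0.55, 0.75]))
```

Any of the classifiers named in the text (k-NN, SVM, GRNN, etc.) would replace the nearest-pattern comparison in `classify` while the rest of the pipeline stays the same.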

2.3 Literature review

In this section, different applications of E-Noses and classification methods in the literature are presented first; then, existing sensor array minimization techniques and false alarm reduction methods are discussed.

Beverages such as orange juice, mango juice, and blackcurrant juice, as well as spoiled, fresh, pasteurized, and ultrahigh-temperature-processed milks, were classified by an E-Nose using PCA and MLPNN in [5]. Biodiesels such as babassu, beef tallow, palm, and chicken grease were classified in [6], also by PCA and MLPNN. MLPNN was used to classify eastern and north-eastern Indian black tea in [7], for fruit (peaches, pears, and apples) ripeness (green, ripe, and overripe) determination in [8], and for identification of ammonia in waste water in [9]. In [10], three separate experiments were performed to classify four types of tea; five types of coffee; and water, ethanol, triethylamine, and methyl salicylate using MLPNN. MLPNN was also applied in [11] and [12] for classification. In [11] it was used to classify fragrant herb species such as lemongrass (Cymbopogon citratus), curry


(murraya koenigii), pandan leaves (pandanus amaryllifolius), kaffir lime/limau purut

(citrus hystrix), and golden lime/limau Kasturi (citrus microcarpa) and in [12] to

classify coffee, tea, and cocoa. In [13] three concentration levels of gasoline, benzene,

xylene, ethyl benzene, and toluene were classified by PCA and GRNN. Twenty diesel

fuels were classified by LDA and PCA in [14]. PCA, probabilistic neural networks

(PNN) were applied to classify and monitor the quality of diesel and gasoline fuel in

[15]. A review on the classification of foods and beverages, such as grains, fish,

alcoholic drinks, fruits, meat, non-alcoholic drinks, milk, and dairy products, fresh

vegetables, eggs, olive oils, and nuts were presented in [16] partial least squares

(PLS), cluster analysis (CA), PCA, and MLPNN were used as classification methods.

In [17] and [18], correlation-based analyses of E-Nose data were performed for beef freshness and for beverage (blackcurrant juice, orange juice, and soy milk) identification, respectively. An E-Nose used a k-NN method to classify lemon,

banana, and litchi fragrances in [19]. Five different tea samples of different qualities

were classified by RBFNN, fuzzy C means, and self-organizing map (SOM) in [20].

Another E-Nose designed with MOG sensors used PCA to analyse the quality of soft

drink, olive oil, and tomato pulp [21]. In [22] ripening shelf-life of peaches,

nectarines, apples, and pears were monitored by an E-Nose designed with MOG

sensors. It was shown by PCA analysis that fruit ripeness assessment by E-Nose is

reliable. A MOG sensor based E-Nose was proposed in [23] where MLPNN was

applied for classification and detection of apple ripeness. Machine vision system,

near-infrared spectrophotometer, and an E-Nose system were combined together in

[24], and a small number of misclassifications was observed. Here the E-Nose was used to identify the rotting stage of apples by MLPNN. In [25] concentration estimation of

formaldehyde was performed based on a chaotic-sequence-optimized MLPNN. Basal Stem

Rot disease of oil palm was identified with application of ANNs such as: MLPNN,

PNN, and RBFNN in [26]. In [27] an E-Nose with five semiconductor thin-film sensors was designed, and the quality of two groups of seven coffees was analysed by training an MLPNN. Before feeding the data to the MLPNN, the dimensionality of the data was

reduced by PCA. MOG sensors were used to develop an E-Nose to recognize

benzene, toluene, ethyl benzene, xylene, and gasoline based on PCA, and ANN [13].

In [28] dimensionality was reduced by PCA and then a neural network was used to


discriminate different types of coffee, aroma oils, perfumes and alcohol. Six clonal

varieties of orthodox finished tea were classified by PCA and MLPNN [29]. PCA was

applied to visualize data in lower dimension and PNN was applied for classification in

[30]. Collected data from MOG sensor based E-Nose were analyzed by MLPNN, and

SVM to classify shelf-life stages of banana [31]. An SVM was also applied in [32] to

recognize ethanol, gasoline and acetone with good accuracy and in [33] to identify

spoiled beef, and fish. Eleven different types of Spanish olive oils were classified by

PCA followed by an LDA projection with 79% accuracy in [34]. A MOG sensor based

E-Nose was designed in [35] with wireless connectivity where k-NN, MLPNN, SVM

with linear and RBF kernel, PCA, and LDA algorithms were used for classification.

Coffee roasting degree was analysed in [36] by applying two-dimensional PCA scores to an MLPNN and a GRNN. In [37] a portable E-Nose was used to collect sensing data

and pattern recognition algorithms such as LDA, MLPNN, PNN, and GRNN were used to

identify adulteration of sesame oil. Applications of an E-Nose in determining fruit

quality and ripening have been studied in the literature. PCA and MLPNN were used

in [38] to study the quality and ripening of peach, pear, and apple. Tomato aroma

profile was studied with the aid of PCA and an E-Nose in [39]. Maturity and storage

shelf life of tomato were studied by a portable E-Nose named 'PEN 2' in [40] and

[41], respectively, where data analysis and classification were done by PCA and

LDA. Analysis of the ripening process of Pink Lady apples during shelf life was done

in [42] by PCA and Fuzzy adaptive resonance theory map. The partial least square

method was applied to predict apple harvesting data in [43]. PCA was applied to

analyse the shelf life of apples from GC-MS and E-Nose data in [44]. The non-destructive quality of "Fuji" apples was analysed by fusing three different sensors, namely a NIR spectrometer, a machine vision system, and an E-Nose, based on MLPNN in [45].

To monitor mandarin picking, a PEN2 E-Nose was used in [46], where data analysis

was done by PCA and LDA. Four peach cultivars were classified and peach ripening

stages were assessed by a commercial E-Nose named PEN2, based on PCA and LDA

in [47]. The quality of post-harvest oranges and apples was analysed by thickness-shear-mode quartz resonators, based on PCA and PLS in [48]. An embedded E-Nose

was designed in [49], which showed good response to onions and oranges. An ANN,


trained with back-propagation, and an artificial bee colony algorithm, was applied to

classify strawberry, lemon, cherry, and melon [50].

2.3.1 Sensor array minimization techniques

Reducing an E-Nose manufacturing cost and computational complexity is

important for mass production which can be achieved by excluding the sensors with

insignificant information from an E-Nose sensor panel. Various methods have been

proposed by the authors in literature.

A sub-array based sensor optimization technique was presented in [51],

where, at first, sensor sub-arrays were formed by combining sensors with similar gas

discriminating capability and later, sensors from the sub-arrays were chosen as per

classification capability. Another method named combination optimization method

was used in [52] to discriminate different grades of Longjing tea. This method first performs hypothesis testing by analysis of variance (ANOVA), and later principal component (PC) loading analysis is utilized to reduce the number of

sensors. The PC loading value method was also used in [53] to exclude redundant

sensors as [52]. It was also shown that ANOVA and Tukey multiple comparison

suggest similar choice of the sensors. But the Tukey multiple-comparison is mainly

based on distance between means of each sensor data considering all the classes in the

same group. Here each sensor‟s capability to classify classes is not considered. In

addition, [52] and [53] did not show how to use this method for the cases where

higher dimensional PCs might contain significant data variance.

A genetic algorithm (GA) was used in [54] to optimize an E-Nose sensor array and determine tea quality. In the GA, each sensor is represented by a gene; genes are combined to construct chromosomes, and a number of such chromosomes (combinations of sensors) form a population. The GA aims to reduce the number of sensors (genes) in a combination, which limits the length of the constructed chromosome. A fixed chromosome length overlooks longer or shorter chromosomes, which can lead to a suboptimal set of sensors. To overcome

this problem, one way is to perform an exhaustive search over all combinations of sensors and apply a classification algorithm to find the classification errors. The

optimal sensor set is the one which meets the acceptable pattern recognition errors



(set by the designer or application) with minimum number of sensors. In addition,
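The exhaustive-search alternative just described can be sketched as follows. This is a minimal illustration, not the thesis's implementation; `classification_error` is a hypothetical stand-in for training and testing a classifier on a given sensor subset.

```python
from itertools import combinations

def smallest_acceptable_subset(n_sensors, classification_error, max_error):
    """Exhaustively search sensor subsets, smallest first, and return the
    first subset whose classification error is acceptable."""
    for size in range(1, n_sensors + 1):
        for subset in combinations(range(n_sensors), size):
            if classification_error(subset) <= max_error:
                return subset
    return None

# Toy evaluator: pretend only sensors 1 and 3 together are informative.
toy_error = lambda s: 0.02 if {1, 3} <= set(s) else 0.40

print(smallest_acceptable_subset(4, toy_error, max_error=0.05))  # (1, 3)
```

The cost is exponential in the number of sensors, which is why [54] resorts to a GA in the first place; for an eight-sensor panel the 255 subsets are still tractable.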

a global optimum cannot always be achieved with GA, especially when the overall solution space has multiple populations; to reach the optimum, an uncertain number of trials may be required

[55]. The ratio of between-class variance to within-class variance [56] of individual sensors over all classes was applied as a method to choose the optimal number of sensors for an E-Nose to identify ethanol, 2-propanol, acetone, and ammonia [57]. With this method, the optimal set of sensors was chosen according to the higher ratio of inter-class to within-class variation of each sensor. The performance of the optimal sensor set was verified by a three-layer MLPNN. This method did not consider the similarity or dissimilarity, i.e., the mutual information, among the selected sensors, which might lead to choosing alike sensors.
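The between-to-within variance criterion of [56, 57] can be sketched per sensor as follows; the data and the separation it shows are illustrative, not the thesis's measurements.

```python
from statistics import mean, pvariance

def variance_ratio(classes):
    """Between-class to within-class variance ratio for one sensor.
    `classes` is a list of lists: this sensor's readings per class."""
    class_means = [mean(c) for c in classes]
    between = pvariance(class_means)               # spread of the class means
    within = mean(pvariance(c) for c in classes)   # average in-class spread
    return between / within

# Sensor A separates the two classes well; sensor B barely does.
sensor_a = [[1.0, 1.1, 0.9], [5.0, 5.2, 4.8]]
sensor_b = [[1.0, 1.1, 0.9], [1.05, 1.15, 0.95]]

assert variance_ratio(sensor_a) > variance_ratio(sensor_b)
```

Ranking sensors by this ratio and keeping the top few reproduces the selection rule of [57]; as noted above, it scores each sensor in isolation, so two highly correlated sensors can both rank highly.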

2.3.2 False alarm reduction techniques

To date, few studies have addressed "false classification" or "false alarm" for E-Nose applications. The issue was addressed for fire detection in [58-60]. A threshold-based approach was used in [61] to detect false alarms in concentration estimation and chemical agent detection. The false alarm reduction performance of an E-Nose in the presence of irrelevant gases or VOCs has not been considered. A false alarm is likely to originate if an E-Nose comes in contact with any irrelevant gas or VOC, because the sensors respond to wide ranges of gases and VOCs.

Classification algorithms such as PCA [39, 41, 42, 44, 46-48, 62-66], k-NN [19, 65-67], SVM [67-72], GRNN [13], RBFNN [73-77], MLPNN [63-66, 75-79], and LDA [34, 35, 37, 41, 46, 47], to name a few, are commonly applied for odor classification by different E-Noses. SVM, LDA, and MLPNN separate the

classes by unbound classification boundaries, i.e., lines in two dimensional space,

planes in three dimensional space, and hyper-planes in higher dimensional space. The

generated classification boundaries by lines, planes or hyperplanes are similar to

Voronoi diagrams, where outer classes have open boundaries for multiclass problems.

For two or three class problems the boundaries are always open. In this thesis it is

shown that the classification algorithms having open ended classification boundaries


are not suitable to reduce false alarm. The classification boundaries produced by

Gaussian activation functions for RBFNN and GRNN are hyperspheric. A Gaussian

activation function produces boundary around the training classes by deemphasizing

far data and emphasizing near data from the mean. Thus the classification boundaries

produced by RBFNN and GRNN are bounded in two, three, or higher dimensions. For

more than three dimensions they produce hyperspheric classification boundary. The

RBFNN and GRNN training is fast, but their design complexity is high as their required number of neurons is on the order of the number of training data samples. A less

complex hyperspheric classification algorithm is proposed in [80]. This method

defines the hypersphere in an n-dimensional space having center c = (c1, c2, …, cn)

and radius Rn as

\sum_{i=1}^{n} (x_i - c_i)^2 = R_n^2, \qquad (2.1)

where an unknown n-dimensional data x = (x1, x2, …, xn) is then classified as follows.

If

\sum_{i=1}^{n} (x_i - c_i)^2 \le R_n^2, \qquad (2.2)

the class 1 is decided, otherwise, class 2. Thus, this method is not suitable to classify

more than two classes. An inversion algorithm for MLPNN shown in [81] generates a closed boundary that bounds the training data and can classify two classes of data, where one class is inside the boundary and the other outside it. A Gaussian kernel

based hyperspheric decision boundary is applied for one class classification by SVM

[82]. We observe that the classification methods in [80-82] are suitable for one-class classification, novelty testing, or outlier detection, and are not suited to classifying more than two classes.
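The hypersphere membership test of Eqs. 2.1-2.2 reduces to comparing a squared Euclidean distance with $R_n^2$. A minimal sketch, with an illustrative center, radius, and test points:

```python
def classify_hypersphere(x, c, radius):
    """Return 1 if x lies inside or on the hypersphere of center c and
    the given radius (Eq. 2.2), else 2."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 1 if sq_dist <= radius ** 2 else 2

center = (0.0, 0.0, 0.0)
print(classify_hypersphere((0.5, 0.5, 0.5), center, radius=1.0))  # 1
print(classify_hypersphere((2.0, 0.0, 0.0), center, radius=1.0))  # 2
```

Everything outside the single sphere falls into "class 2", which is exactly why this rule only supports one-class or two-class decisions.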

In the next chapter, the design process of the E-Nose developed for this thesis is explained. The existing popular pattern recognition techniques and sensor array minimization methods are also presented, in contrast to the proposed sensor array minimization and false alarm reduction techniques.


Chapter 3

System Design and Methodology

In this chapter, the E-Nose design procedure, sample collection, sensor array design, data acquisition system, experimental setup, existing classification methods and sensor array minimization techniques, and the proposed sensor array minimization and hyperspheric classification methods for reducing E-Nose false alarms are presented.

3.1 E-Nose design process

Figure 3.1 An E-Nose design process.

The E-Nose design process is presented in Figure 3.1. Odour samples are prepared and kept in the sample chamber. The odour source can be foods, fruits, chemicals, explosives, etc.; this research is performed on fruit odours to classify fruits and their ripeness states. The odour samples are allowed to pass from the sample chamber into a measurement chamber that contains a sensor array and a data acquisition device powered by an external source. The sensor array comprises sensors with wide and dissimilar selectivities. Odour VOCs are adsorbed at the sensor


surface and cause a physical change in the sensor behaviour (resistivity for gas sensors). In response to an odour, the current flowing through the gas sensors, as well as the voltage developed across the sensor load resistors, changes. The responses are acquired by a data acquisition device that transforms the signals into digital values and produces signature patterns for different kinds of objects. The data are collected and recorded from the acquisition device by a Wi-Fi connected computer. Recorded data are

then pre-processed based on statistical methods. Data pre-processing converts the sensed signature patterns to a suitable format by reducing the dimensionality of the measurement space and extracting information relevant for pattern recognition, by applying PCA, LDA, etc., so that the data can be fed to pattern recognition and classification algorithms. For classification, k-NN, MLPNN, RBFNN, GRNN, SVM, PCA, and LDA are applied in this thesis. In the decision making stage, decisions such as odour class, concentration, or unidentified are mapped to the results from the pattern recognition and classification stage.

3.2 Sample fruit collection

Sapodilla, pineapple, banana, and mango, shown in Figure 3.2, are selected for this research because they ripen fast and rot soon, which may cause losses to businesses or customers. Four sapodillas, one pineapple, two bananas, and a mango are kept in turn in the sample chamber during the experiment with each type of fruit. Throughout the experiment the fruits are preserved at 28 °C. Separate impermeable boxes are used to store the fruits to prevent their odours from mixing and thereby inducing noise in each other's measurements.

3.3 Sensor Array

For this research eight MOG sensors which have a wide range of

sensitivity to a variety of gases and VOCs are purchased. The sensors are TGS2612

(S1), TGS821 (S2), TGS822 (S3), TGS813 (S4), TGS2602 (S5), TGS2603 (S6),

TGS2620 (S7) and TGS2610 (S8) as listed in Table 3.1. S1 to S8 are indices assigned

to the sensors. A sensor panel for the E-Nose is designed on a breadboard as shown in

Figure 3.3(a) using the sensors, and later the sensor panel is implemented on a PCB as


shown in Figure 3.3(b). The sensor panel is sensitive to methyl mercaptan, trimethyl

amine, hexane, acetone, ammonia, benzene, carbon monoxide, hydrogen sulphide,

hydrogen, butane, acetylene, ethylene, and propane.

The MOG sensors used in this research are conductivity sensors. The basic construction material of these sensors is tin dioxide (SnO2). The resistance of these sensors is high in free air or oxygen. In contact with target VOCs or gases, the resistivity of the sensors decreases, which increases the sensor current as well as the load current. As a result, the voltage across the load resistor increases in proportion to the VOC concentration level. These voltage signals across the sensors' load resistors are recorded. The circuit voltage and heater voltage of the sensors are set to 5 V dc throughout the experiment.
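In the standard Figaro measuring circuit, the sensor and its load resistor form a voltage divider, so the load voltage $V_{RL} = V_C R_L / (R_S + R_L)$ rises as the sensor resistance $R_S$ falls. A minimal sketch of this relation; the component values are illustrative, not the thesis's:

```python
def load_voltage(vc, r_sensor, r_load):
    """Voltage across the load resistor of a MOG sensor voltage divider."""
    return vc * r_load / (r_sensor + r_load)

VC, RL = 5.0, 10e3  # 5 V circuit voltage, 10 kOhm load (illustrative)

# Sensor resistance drops as VOC concentration rises -> load voltage rises.
voltages = [load_voltage(VC, rs, RL) for rs in (100e3, 50e3, 20e3, 10e3)]
assert all(v1 < v2 for v1, v2 in zip(voltages, voltages[1:]))
```

This monotone mapping from gas concentration to recorded voltage is what makes the raw signature patterns meaningful.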

Figure 3.2 Fruit samples at different ripeness states: (a) unripe banana, (b) ripe

banana, (c) rotten banana, (d) unripe mango, (e) ripe mango, (f) rotten mango, (g)

unripe sapodilla, (h) ripe sapodilla, (i) rotten sapodilla, (j) unripe pineapple, (k) ripe

pineapple, and (l) rotten pineapple.



Figure 3.3 The E-Nose sensor array: (a) sensors mounted on a breadboard, (b) PCB design. Five sensors are of the TGS 26XX series and three are of the TGS 8XX series. Cermet-type variable resistors are used. [83]


Table 3.1. Figaro gas sensors used in the E-Nose design and the gases or VOCs to which the sensors are sensitive. [83]

Gases and VOCs (table columns): CH4, C2H2, C3H8, C4H10, H2, H2S, CO, C6H6, NH3, (CH3)2CO, C6H14, and trimethylamine/methyl mercaptan.

Sensor model (index): TGS 2612 (S1), TGS 821 (S2), TGS 822 (S3), TGS 813 (S4), TGS 2602 (S5), TGS 2603 (S6), TGS 2620 (S7), TGS 2610 (S8).


3.4 Experimental setup

The block diagram of the experimental setup is shown in Figure 3.4. The experimental setup comprises a sample chamber, a measurement chamber, and a data acquisition system. Accumulating the odour VOC concentration and measuring the corresponding electronic signals at the sensor load resistors are the two phases of the experiment. The sample chamber and the measurement chamber are connected by two 1-inch transparent plastic tubes via control valves 3 and 4. Fans 3 and 4 circulate the odour headspace between the sample chamber and the measurement chamber during the measurement phase. To prevent measurement contamination, both chambers are kept airtight during the experiment. After each experiment, valves 1 and 2 and dc fans 1 and 2 are used to circulate free air through the measurement chamber so that the sensors return to their base-level response. The sensors, fans, and acquisition device are powered by a dc voltage source. The overall system is in a temperature-controlled laboratory at 28 °C.

Figure 3.4 The E-Nose experimental setup. Valves 1 and 2, and fans 1 and 2 are for

air flow control, valves 3 and 4, and fans 3 and 4 are for circulating the odor between

sample chamber and measurement chamber. A dc power supply powers the sensors,

fans, and the myRIO. [83]

3.5 Data acquisition system

Data acquisition is performed by a myRIO data acquisition device, which is operated wirelessly by a computer with LabVIEW installed. The LabVIEW schematic


diagram for data acquisition is shown in Figure 3.5. The myRIO is powered by a 10 V dc power supply. At each ripeness state of each fruit type, the experiment is repeated 20 times to produce 20 data samples per class, as shown in Table 3.2. A two-digit class code is assigned to each class, i.e., each fruit at each ripeness state. The left digit defines the fruit type and the right digit indicates the ripeness level; for example, in the class code 13, 1 stands for banana and 3 for rotten. Before experimenting with a fruit, the sensors are preheated for five

minutes to achieve proper sensing behaviour. The odour headspace of the sample chamber containing the fruit under test is sampled in the following sequence: i) a sample measurement typically takes two minutes to complete; ii) to remove any residual odour and return the sensors to their base level, free air is pumped into the sensor chamber using control valves 1 and 2 on the right side of the measurement chamber for three minutes, during which the headspace accumulates for the next experiment cycle. The signals corresponding to the odour concentration have a transient shape at the beginning. It is observed that after 30 to 60 seconds (different for each fruit at each ripeness state) they start to become steady. The steepness of the transient and the steady-state response values differ between fruits and their ripeness states.
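The two-digit class coding described above can be sketched as a small decoder; the digit-to-name maps follow Table 3.2:

```python
FRUITS = {1: "banana", 2: "mango", 3: "sapodilla", 4: "pineapple"}
STATES = {1: "unripe", 2: "ripe", 3: "rotten"}

def decode_class(code):
    """Split a two-digit class code into (fruit, ripeness state)."""
    fruit_digit, state_digit = divmod(code, 10)
    return FRUITS[fruit_digit], STATES[state_digit]

print(decode_class(13))  # ('banana', 'rotten')
print(decode_class(42))  # ('pineapple', 'ripe')
```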

Table 3.2. Experimental data collection plan for fruit odor classification.

Fruit      Class label   Ripeness state   Number of experiments
Banana     11            Unripe           20
Banana     12            Ripe             20
Banana     13            Rotten           20
Mango      21            Unripe           20
Mango      22            Ripe             20
Mango      23            Rotten           20
Sapodilla  31            Unripe           20
Sapodilla  32            Ripe             20
Sapodilla  33            Rotten           20
Pineapple  41            Unripe           20
Pineapple  42            Ripe             20
Pineapple  43            Rotten           20


Figure 3.5 LabVIEW schematic diagram for data acquisition.


3.6 Pattern Recognition and Classification Algorithms

The SVM, PCA, k-NN, GRNN, MLPNN, RBFNN, LDA, and the

proposed MMM classification methods are presented in this section. X is the data

matrix and t is the corresponding target vector. The elements of t are class labels as

given in Table 3.2. X and t are expressed as in Eq. 3.1 and Eq. 3.2,

X = \begin{bmatrix} X_{1} \\ X_{2} \\ \vdots \\ X_{m} \\ \vdots \\ X_{M} \end{bmatrix} \quad \text{and} \quad \mathbf{t} = \begin{bmatrix} \mathbf{t}_{1} \\ \mathbf{t}_{2} \\ \vdots \\ \mathbf{t}_{m} \\ \vdots \\ \mathbf{t}_{M} \end{bmatrix}, \qquad (3.1)

where

X_{m} = \begin{bmatrix} X_{1,m,1} & \cdots & X_{1,m,n} & \cdots & X_{1,m,N} \\ X_{2,m,1} & \cdots & X_{2,m,n} & \cdots & X_{2,m,N} \\ \vdots & & \vdots & & \vdots \\ X_{l,m,1} & \cdots & X_{l,m,n} & \cdots & X_{l,m,N} \\ \vdots & & \vdots & & \vdots \\ X_{L,m,1} & \cdots & X_{L,m,n} & \cdots & X_{L,m,N} \end{bmatrix} \quad \text{and} \quad \mathbf{t}_{m} = \begin{bmatrix} t_{1,m} \\ t_{2,m} \\ \vdots \\ t_{l,m} \\ \vdots \\ t_{L,m} \end{bmatrix}, \qquad (3.2)

l is the experiment index within class m, where l = 1, 2, …, L; m = 1, 2, …, M; M is the number of classes; L is the number of data samples in each class; n = 1, 2, …, N is the sensor index; and N is the number of sensors. The rows of X are experimental samples and the columns are feature (sensor) variables. The training, validation, and testing data are chosen from X according to the ratio defined by the designer. In this research, 70% of the samples from each class, i.e., 0.7L samples, are taken for training the classification algorithms.
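The per-class 70/15/15 split described above can be sketched as follows, with L = 20 samples per class as in Table 3.2; the random seed and the dummy samples are illustrative.

```python
import random

def split_class(samples, seed=0):
    """Split one class's samples into 70% train, 15% validation, 15% test."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_train = round(0.70 * len(samples))
    n_val = round(0.15 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

samples = [[float(l)] * 8 for l in range(20)]  # 20 samples x 8 sensors
train, val, test = split_class(samples)
print(len(train), len(val), len(test))  # 14 3 3
```

Splitting each class separately keeps the class proportions identical across the three partitions.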

3.6.1 Principal Component Analysis (PCA)

PCA [41, 62-66] is an unsupervised method that preserves data variance while reducing dimensionality. An orthogonal transformation is applied to convert the data to a new feature space. The direction of maximum data variance in the original space is represented by PC1; the second largest data variance, orthogonal to PC1, gives PC2; the third largest data variance corresponds to PC3, which is orthogonal to both PC1 and PC2. This process continues for the higher-dimensional PCs. The PCA algorithm is given as follows:

Step 1. Evaluate the covariance matrix of X.

Step 2. Calculate the eigenvectors and eigenvalues of the covariance matrix.

Step 3. Sort the eigenvectors in descending order of their eigenvalues.
Step 4. PC1 is the eigenvector with the maximum eigenvalue, and so on for the other PCs in descending order of eigenvalue.
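Steps 1-4 can be sketched with a covariance matrix and power iteration for the leading eigenvector; this is a minimal stand-in for a full eigendecomposition, with illustrative 2-D data.

```python
import math

def covariance(X):
    """Sample covariance matrix of data X (rows are samples)."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((row[j] - mu[j]) * (row[k] - mu[k]) for row in X) / (n - 1)
             for k in range(d)] for j in range(d)]

def leading_pc(C, iters=100):
    """Leading eigenvector of symmetric matrix C by power iteration (PC1)."""
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Data varying mainly along the direction (1, 0.1): PC1 aligns with it.
X = [[0.0, 0.0], [1.0, 0.1], [2.0, 0.2], [3.0, 0.3]]
pc1 = leading_pc(covariance(X))
```

Projecting the data onto the first few such eigenvectors is the dimensionality reduction used throughout this thesis.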

3.6.2 k-Nearest Neighbor (k-NN)

The k-NN [19, 65-69] is one of the simplest machine learning techniques. During the training phase, the feature vectors and class labels of the training data are loaded into memory from the data matrix X and the target vector t, respectively, as in Eq. 3.1. The Euclidean distance metric is applied to find the k training samples nearest to the test data vector. A test datum is assigned to the winning class by majority voting among the closest k neighbors. k is chosen to be an odd number that is not a multiple of the number of classes, to avoid possible ties. The order of computational complexity of k-NN is O(LM(N+k)).
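The voting rule above can be sketched as follows; the 2-D points and k = 3 are illustrative (the thesis uses 8-D sensor vectors), and the labels reuse the class codes of Table 3.2.

```python
from collections import Counter
import math

def knn_classify(train, labels, x, k=3):
    """Assign x to the majority class among its k nearest training samples."""
    dist = [(math.dist(p, x), c) for p, c in zip(train, labels)]
    dist.sort(key=lambda t: t[0])                 # nearest first
    votes = Counter(c for _, c in dist[:k])       # majority vote among k
    return votes.most_common(1)[0][0]

train = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = [11, 11, 11, 13, 13, 13]
print(knn_classify(train, labels, (1.05, 1.0)))  # 11
print(knn_classify(train, labels, (5.0, 5.1)))   # 13
```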

3.6.3 Support Vector Machine (SVM)

For SVM [65-72] classification, hyperplanes are found such that they do not include interior data points and provide the maximum margin between pairs of classes. Support vectors are the data points that lie on these hyperplanes. The maximum-margin hyperplane lies between two such hyperplanes, maximizing the distances to the support vectors of the related classes. The SVM algorithm is given as follows:

Step 1. The Lagrangian dual in Eq. 3.3 is maximized subject to \sum_{l,m} \alpha_{l,m} t_{l,m} = 0 and \alpha_{l,m} \ge 0.

Page 38: Signal processing for electronic nose, Signal processing

Ref. code: 25605722300067ONR

26

G(\alpha) = \sum_{l,m} \alpha_{l,m} - \frac{1}{2} \sum_{l,m} \sum_{g,j} \alpha_{l,m}\, \alpha_{g,j}\, t_{l,m}\, t_{g,j}\, \mathbf{x}_{l,m} \cdot \mathbf{x}_{g,j}, \qquad (3.3)

where g and l are experiment indices within a class, j and m are class indices, \alpha_{l,m} are the Lagrange multipliers, t_{l,m} are the class labels (+1 or -1), and \mathbf{x} are data vectors.
Step 2. Calculate the values of \alpha_{l,m} from Eq. 3.3.
Step 3. Find the weight vector \mathbf{w} by using \alpha_{l,m} and Eq. 3.4.

\mathbf{w} = \sum_{l,m} \alpha_{l,m}\, t_{l,m}\, \mathbf{x}_{l,m}, \qquad (3.4)

Step 4. The bias b is calculated from the Karush-Kuhn-Tucker condition [84] shown in Eq. 3.5, using the weights found in Step 3,

t_{l,m}\left( \mathbf{w} \cdot \mathbf{x}_{l,m} + b \right) - 1 = 0, \qquad (3.5)

Step 5. Test data are classified according to the sign of Eq. 3.6,

f(\mathbf{u}) = \operatorname{sign}\left( \sum_{s=1}^{S} \alpha_{s}\, t_{s}\, \mathbf{x}_{s} \cdot \mathbf{u} + b \right), \qquad (3.6)

where s, S, b, and \mathbf{u} are the support vector index, the number of support vectors, the bias, and the data vector under test, respectively.

The basic SVM is a binary classifier. A multiclass SVM classifier is a combination of multiple binary SVM classifiers; for classification by a multiclass SVM, majority voting among the binary SVM classifiers is performed. Depending on the degree of margin violation, SVM complexity varies between O(NL^2 M^2) and O(NL^3 M^3).
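The decision rule of Eq. 3.6 can be sketched for a hand-worked 1-D example: two support vectors x = -1 (t = -1) and x = +1 (t = +1) give the maximum-margin separator w = 1, b = 0 with \alpha = 0.5 each (these values are worked by hand for illustration, not taken from the thesis).

```python
def svm_decide(u, support, alphas, targets, b):
    """Sign of f(u) = sum_s alpha_s t_s (x_s . u) + b  (Eq. 3.6)."""
    f = sum(a * t * sum(xs * us for xs, us in zip(x, u))
            for a, t, x in zip(alphas, targets, support)) + b
    return 1 if f >= 0 else -1

support = [(-1.0,), (1.0,)]   # support vectors
alphas = [0.5, 0.5]           # Lagrange multipliers from the dual
targets = [-1, 1]             # class labels
b = 0.0

print(svm_decide((2.0,), support, alphas, targets, b))   # 1
print(svm_decide((-3.0,), support, alphas, targets, b))  # -1
```

One can verify the KKT condition of Eq. 3.5 for these values: with w = 1 and b = 0, t(w x + b) - 1 = 0 at both support vectors.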


3.6.4 Multilayer Perceptron Neural Network (MLPNN)

The MLPNN [63-66, 75-79] is a supervised feed-forward error back-propagation neural network. Figure 3.6 shows a three-layer MLPNN. Each layer of the MLPNN is fully connected to the next layer as a directed graph. After every epoch, the mean squared errors (MSEs) between the outputs and the targets are calculated, and the synaptic weights and activation thresholds are adjusted to reduce the MSEs in the next cycle. The input, hidden, and output layers have N, J, and D neurons, respectively; N corresponds to the eight sensors. The hidden layer neurons use a sigmoid activation function, and the output layer uses a linear activation function. The MLPNN training algorithm is given below.

Step 1. Weights are initialized to small negative and positive random values.

Step 2. Apply training data to the network to get the network outputs and

calculate errors as shown in step 3 to step 6.

Step 3. Compute the back-propagation error terms for the links from the hidden neurons to the output neurons as in Eq. 3.7 below,

Figure 3.6 Block diagram of a multilayer perceptron neural network.

\delta^{2}_{l,m} = y^{3}_{l,m}\left(1 - y^{3}_{l,m}\right)\left(y^{3}_{l,m} - t_{l,m}\right). \qquad (3.7)


Step 4. The back-propagation error term for each hidden node is calculated by Eq. 3.8 below,

\delta^{1}_{j} = y^{2}_{j}\left(1 - y^{2}_{j}\right) \sum_{d=1}^{D} w^{2}_{jd}\, \delta^{2}_{d}. \qquad (3.8)

Step 5. The synaptic weights from a node in layer 1 to a node (neuron) in layer 2 are updated as \Delta w^{1}_{ij} = -\eta\, \delta^{1}_{j}\, x_{l,m,n}, and then w^{1}_{ij} = w^{1}_{ij} + \Delta w^{1}_{ij}. The synaptic weights from a node in layer 2 to a node in layer 3 are updated as \Delta w^{2}_{jd} = -\eta\, \delta^{2}_{d}\, y^{2}_{j}, and then w^{2}_{jd} = w^{2}_{jd} + \Delta w^{2}_{jd}.

Step 6. The mean square error (MSE) is calculated by Eq. 3.9 as follows:

\mathrm{MSE} = \frac{1}{0.15LM} \sum_{m=1}^{M} \sum_{l=1}^{0.15L} \left( y^{3}_{l,m} - t_{l,m} \right)^{2}. \qquad (3.9)

As 15% of the data of each class in matrix X are utilized for validation, L in the above equation is multiplied by 0.15.

Step 7. The process is repeated from Step 2 until the epoch limit or the error limit is reached.
The network classification performance is tested with the remaining 15% of the data after training and validation.

Assumptions:
w^{1}_{ij} are the weights from node i of layer 1 (input layer) to node j of layer 2 (hidden layer);
w^{2}_{jd} are the weights from node j of layer 2 to node d of layer 3 (output layer);
the hidden layer neurons' activation functions are tan-sigmoid transfer functions;
y^{2}_{j} is the output from node j of layer 2;
y^{3}_{l,m} is the output from the output layer;
t_{l,m} is the corresponding target;
\eta is the learning rate.
Biases to the neurons in layer 2 are set to 1 (not shown in the algorithm and in Figure 3.6 for simplicity). The order of computational cost of the MLPNN is O(I^{2}_{MLPNN}), where I_{MLPNN} is the total number of neurons in the MLPNN.
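A single deterministic update of Steps 3-5 can be sketched as follows, assuming sigmoid activations in both layers so that the y(1-y) factors of Eqs. 3.7-3.8 apply; the weights, input, target, and learning rate are illustrative.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w1, w2):
    """One hidden layer of sigmoid neurons, one sigmoid output neuron."""
    y2 = [sigmoid(sum(wi * xi for wi, xi in zip(col, x))) for col in w1]
    y3 = sigmoid(sum(w * y for w, y in zip(w2, y2)))
    return y2, y3

def train_step(x, t, w1, w2, eta=0.5):
    """One back-propagation update (Eqs. 3.7, 3.8, and the Step 5 rules)."""
    y2, y3 = forward(x, w1, w2)
    d_out = y3 * (1 - y3) * (y3 - t)                           # Eq. 3.7
    d_hid = [y * (1 - y) * w * d_out for y, w in zip(y2, w2)]  # Eq. 3.8
    w2 = [w - eta * d_out * y for w, y in zip(w2, y2)]         # Step 5
    w1 = [[w - eta * dj * xi for w, xi in zip(col, x)]
          for dj, col in zip(d_hid, w1)]
    return w1, w2

x, t = [0.5, -0.2], 1.0
w1 = [[0.1, -0.3], [0.2, 0.4]]   # input -> hidden weights
w2 = [0.3, -0.1]                 # hidden -> output weights

err_before = (forward(x, w1, w2)[1] - t) ** 2
w1, w2 = train_step(x, t, w1, w2)
err_after = (forward(x, w1, w2)[1] - t) ** 2
assert err_after < err_before    # one gradient step reduced the error
```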


3.6.5 Generalized Regression Neural Network (GRNN)

A GRNN [13, 65] has three main layers, namely the input, hidden, and output layers, as shown in Figure 3.7. At the beginning of the training phase, the synaptic weights w^{1}_{ij} are initialized to the training data vectors and the synaptic weights w^{2}_{jd} are initialized to the targets corresponding to the training data. The hidden layer receives inputs from the input layer, calculates the Euclidean distances from the training data to the test data, and produces outputs using a Gaussian activation function (h_j).

Figure 3.7 Generalized regression neural network block diagram. d^{2}_{l,m} is the squared Euclidean distance, \sigma is the spreading factor, m is the class index, and l is the data index within class m.

At the output layer the class label of the test data is obtained by a predefined decision mapping. The GRNN training algorithm is given below.

Step 1. Choose 70% training samples, 15% validation samples, and 15% test samples from matrix X.
Step 2. Initialize the synaptic weights w^{1}_{ij} to the training samples and the synaptic weights w^{2}_{jd} to the corresponding training targets in vector t.


Step 3. Validation inputs \mathbf{x}_{v,m} are applied to find the validation outputs by Eq. 3.10 as

y(\mathbf{x}_{v,m}) = \frac{\sum_{l,m} t_{l,m} \exp\left( -\frac{d^{2}_{l,m}}{2\sigma^{2}} \right)}{\sum_{l,m} \exp\left( -\frac{d^{2}_{l,m}}{2\sigma^{2}} \right)}, \qquad (3.10)

where \sigma is the spreading factor, v = 1, 2, \ldots, 0.15L is the validation data index in class m, and d^{2}_{l,m} = (\mathbf{x}_{v,m} - \mathbf{x}_{l,m})^{T}(\mathbf{x}_{v,m} - \mathbf{x}_{l,m}) is the squared Euclidean distance from a training sample \mathbf{x}_{l,m} to a validation sample \mathbf{x}_{v,m} ((\cdot)^{T} indicates transpose).

Step 4. The mean square error E is calculated as

E = \sum_{v,m} \left( y(\mathbf{x}_{v,m}) - t_{v,m} \right)^{2}. \qquad (3.11)

Step 5. If E > E_{threshold}, adjust \sigma and continue from Step 3; else stop.

Considering that I_{GRNN} is the total number of neurons in the GRNN, the computational cost of the GRNN is O(I_{GRNN}).
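The prediction of Eq. 3.10 is a Gaussian-weighted average of the training targets and can be sketched as follows; the training points, targets, and spreading factor are illustrative.

```python
import math

def grnn_predict(x, train, targets, sigma=0.5):
    """GRNN output of Eq. 3.10: Gaussian-weighted average of targets."""
    weights = [math.exp(-math.dist(x, p) ** 2 / (2 * sigma ** 2))
               for p in train]
    return sum(w * t for w, t in zip(weights, targets)) / sum(weights)

train = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
targets = [1, 1, 2, 2]   # class labels used as regression targets

print(round(grnn_predict((0.05, 0.0), train, targets)))  # 1
print(round(grnn_predict((5.0, 5.1), train, targets)))   # 2
```

With a small \sigma, far training points receive negligible weight, so the output is dominated by the targets of nearby samples.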

3.6.6 Radial Basis Function Neural Network (RBFNN)

The RBFNN [65, 66, 73-77] can be designed in a similar way to the GRNN shown in Figure 3.7, with the exception that, during the training phase, the synaptic weights w^{2}_{jd} are initialized to small random numbers instead of the true targets. The RBFNN algorithm is presented below.

Step 1. \sigma is initially chosen at random.
Step 2. Choose random weights for w^{2}_{jd}.
Step 3. Calculate the entries of the matrix \Phi given in Eq. 3.12.


\Phi = \begin{bmatrix} \exp\left( -\frac{\lVert X_{1,1} - X_{1,1} \rVert^{2}}{2\sigma^{2}} \right) & \cdots & \exp\left( -\frac{\lVert X_{1,1} - X_{L,M} \rVert^{2}}{2\sigma^{2}} \right) \\ \vdots & \ddots & \vdots \\ \exp\left( -\frac{\lVert X_{L,M} - X_{1,1} \rVert^{2}}{2\sigma^{2}} \right) & \cdots & \exp\left( -\frac{\lVert X_{L,M} - X_{L,M} \rVert^{2}}{2\sigma^{2}} \right) \end{bmatrix} \qquad (3.12)

Step 4. The weight vector \mathbf{w}^{2} = \Phi^{-1}\mathbf{t} is calculated.
Step 5. The output \mathbf{y} = \Phi \mathbf{w}^{2} is calculated.
Step 6. The mean squared error E is evaluated by E = \sum_{v,m} \left( y(\mathbf{x}_{v,m}) - t_{v,m} \right)^{2}.
Step 7. If E > E_{threshold}, change \sigma and continue from Step 4; else exit.

If the optimal weights are found in one epoch, the computational complexity of the RBFNN is O(I_{RBFNN}); otherwise, the complexity can be up to O(I^{2}_{RBFNN}). I_{RBFNN} is the total number of neurons in the RBFNN and is equal to I_{GRNN}.
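Steps 3-5 can be sketched for two training samples, where the 2x2 system \mathbf{w}^{2} = \Phi^{-1}\mathbf{t} is solved in closed form; the samples, targets, and \sigma are illustrative.

```python
import math

def phi_matrix(X, sigma):
    """Gaussian design matrix of Eq. 3.12."""
    g = lambda a, b: math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))
    return [[g(a, b) for b in X] for a in X]

def solve_2x2(A, t):
    """w = A^-1 t for a 2x2 system (Step 4), by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * t[0] - A[0][1] * t[1]) / det,
            (A[0][0] * t[1] - A[1][0] * t[0]) / det]

X = [(0.0,), (1.0,)]   # two training samples (illustrative)
t = [1.0, 2.0]         # their targets
sigma = 0.5

Phi = phi_matrix(X, sigma)
w = solve_2x2(Phi, t)                                              # Step 4
outputs = [sum(wj * pj for wj, pj in zip(w, row)) for row in Phi]  # Step 5

# With w = Phi^-1 t the network reproduces the targets at training points.
assert all(abs(y - ti) < 1e-9 for y, ti in zip(outputs, t))
```

This interpolation property is why one hidden neuron is needed per training sample, which is the source of the O(I_{RBFNN}) design complexity noted above.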

3.6.7 Linear Discriminant Analysis (LDA)

LDA [34, 35, 37, 41, 46, 47, 65, 66] is a generalization of Fisher's linear

discriminant. It finds a linear combination of features which is used as a linear

classifier or for dimensionality reduction before later classification. LDA assumes that

the independent variables or features are normally distributed. Unlike PCA, LDA is a

supervised classification method. LDA explicitly attempts to model the difference

between the classes of data.

A set of training data samples (70%) are randomly chosen from matrix X

and then a good predictor is found given a new observation x. The idea of LDA is to

find a projection where class separation is maximized. Given two sets of labeled data,

X_{1} and X_{2}, with class mean vectors \boldsymbol{\mu}_{X_{1}} and \boldsymbol{\mu}_{X_{2}} defined as,


\boldsymbol{\mu}_{X_{m}} = \frac{1}{0.7L} \sum_{l=1}^{0.7L} \mathbf{x}_{l,m} ,   (3.13)

where 0.7L is the number of training examples of class X_{m}. The goal of linear discriminant analysis is to maximize the ratio of between-class to within-class variance, i.e. to maximize

J(\mathbf{w}) = \frac{\mathbf{w}^{T}\mathbf{S}_{between}\,\mathbf{w}}{\mathbf{w}^{T}\mathbf{S}_{within}\,\mathbf{w}} ,   (3.14)

where \mathbf{S}_{between} is the between-class covariance matrix and \mathbf{S}_{within} is the average of the within-class covariance matrices of the corresponding classes (as LDA considers all classes' within-class variances to be equal), defined as

\mathbf{S}_{between} = \left(\boldsymbol{\mu}_{X_{2}} - \boldsymbol{\mu}_{X_{1}}\right)\left(\boldsymbol{\mu}_{X_{2}} - \boldsymbol{\mu}_{X_{1}}\right)^{T} ,
\mathbf{S}_{within} = \frac{1}{0.7LM} \sum_{m=1}^{M} \sum_{l=1}^{0.7L} \left(\mathbf{x}_{l,m} - \boldsymbol{\mu}_{X_{m}}\right)\left(\mathbf{x}_{l,m} - \boldsymbol{\mu}_{X_{m}}\right)^{T} ,   (3.15)

where M = 2 for LDA between two classes. For multiclass LDA, LDAs of all pairs of classes are formed and their classification decisions are combined to assign test data to a class. Differentiating J(\mathbf{w}) with respect to \mathbf{w}, setting the result equal to zero, and simplifying gives

\mathbf{w} = \mathbf{S}_{within}^{-1}\left(\boldsymbol{\mu}_{X_{1}} - \boldsymbol{\mu}_{X_{2}}\right) .   (3.16)

A new point is classified by projecting it onto the maximally separating direction and

classifying it as in class X_{1} among X_{1} and X_{2} if:


\mathbf{w}^{T}\mathbf{x}_{l,m} > \frac{1}{2}\,\mathbf{w}^{T}\left(\boldsymbol{\mu}_{X_{1}} + \boldsymbol{\mu}_{X_{2}}\right) + \log\frac{p(X_{2})}{p(X_{1})} .   (3.17)

In the case where there are more than two classes, the classes are

partitioned, and LDAs are used to classify the partitions. One way of partitioning is

done as “one against the rest” where the dataset from one class are put in one group

and everything else in the other. This will result in M classifiers, whose results are

combined. Another common method is pairwise classification, where a new classifier

is created for each pair of classes (giving M(M - 1)/2 classifiers in total), with the

individual classifiers combined to produce a final classification. In this thesis LDA is

applied to train with two training classes for false classification and correct

classification analysis in subsection 4.3.1.
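The two-class Fisher discriminant of Eqs. (3.13) through (3.17) can be sketched as follows, assuming equal class priors so that the log-prior term vanishes; the function names and toy clusters are illustrative.

```python
import numpy as np

def fisher_lda(X1, X2):
    """Fisher direction w = S_within^{-1}(mu1 - mu2), as in Eq. (3.16),
    and the midpoint threshold c = 0.5 * w.(mu1 + mu2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-class scatter; overall scaling does not affect the decision.
    S_w = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    w = np.linalg.solve(S_w, mu1 - mu2)
    c = 0.5 * w @ (mu1 + mu2)
    return w, c

def lda_classify(x, w, c):
    """Assign class 1 when the projection exceeds the threshold, Eq. (3.17)
    with equal priors."""
    return 1 if w @ x > c else 2

# Two deterministic clusters in 2-D.
X1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X2 = X1 + 5.0
```

Projecting a new point onto w and comparing with c implements the maximally separating direction described above.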

3.7 Sensor panel minimization techniques

The number of sensors in an E-Nose sensor panel needs to be minimized to reduce E-Nose design cost and complexity. This can be achieved by finding and excluding from the panel the sensors that are less responsive to the target odour, without affecting the E-Nose classification performance. In this section the novel sensor array minimization techniques developed in this research for fruit odour detection are discussed.

3.7.1 Exhaustive search method

All possible combinations of sensors are found first. The number of combinations is 2^{N} - 1, where N is the total number of sensors [83]. For this research the classification algorithms (multiple SVM and k-NN) are then trained with 70% randomly chosen samples from all the classes. The remaining 30% of the samples are then used to verify the pattern recognition capability of each combination of sensors. The sensor combinations with the minimum number of sensors and an acceptable pattern recognition error level are the optimal sensor sets.
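The exhaustive search above can be sketched as follows; `exhaustive_search` and the `toy_error` table are hypothetical stand-ins for training and testing a real classifier (e.g. SVM or k-NN) on each sensor subset.

```python
from itertools import combinations

def exhaustive_search(n_sensors, error_fn, max_error):
    """Try all 2^N - 1 non-empty sensor subsets, smallest size first, and
    return the first (subset, error) meeting the acceptable error level.

    error_fn(subset) stands in for the expensive classifier evaluation.
    """
    for size in range(1, n_sensors + 1):
        best = None
        for subset in combinations(range(n_sensors), size):
            err = error_fn(subset)
            if err <= max_error and (best is None or err < best[1]):
                best = (subset, err)
        if best is not None:
            return best       # smallest acceptable subset wins
    return None

# Hypothetical error table: only subsets containing sensors {2, 4, 7} do well.
def toy_error(subset):
    return 2.78 if {2, 4, 7} <= set(subset) else 25.0
```

Because every subset is evaluated, the cost grows exponentially in N, which motivates the cheaper methods in the following subsections.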


3.7.2 PCA loading and mutual information based approach [83]

PCA analysis of the training data over all the classes is performed. The

PC loading information is recorded in matrix A as in Eq. (3.18).

\mathbf{A} =
\begin{bmatrix}
a_{1,1} & \cdots & a_{1,N} \\
\vdots & a_{i,j} & \vdots \\
a_{N,1} & \cdots & a_{N,N}
\end{bmatrix} ,   (3.18)

where a_{i,j} indicates the loading of the i-th sensor variable on PC j, and N is the number of sensors. The columns of the matrix \mathbf{A} define the PCs.

As the experimental data within each class are randomly distributed with

respective means and variances, the sensor data related to a class is considered to be

Gaussian distributed in this thesis. Entropy of a univariate Gaussian random variable

is given as in Eq. (3.19).

H(Z) = \frac{1}{2}\ln\left(2\pi e \sigma_{z}^{2}\right) , [85]   (3.19)

where e is Euler's number and \sigma_{z}^{2} is the variance of the random variable Z. Thus the mutual information between two random variables Y and Z is given by

I(Y, Z) = -\frac{1}{2}\ln\left(1 - \mathrm{corr}(Y, Z)^{2}\right) , [85]   (3.20)

where corr indicates correlation. From Eq. (3.20), when |corr(Y, Z)| approaches 1, the mutual information will be large. This is possible when the loadings of the sensors under consideration, on any PC, are close in magnitude and have the same sign, i.e. coincide with the same side (positive or negative axis) of a particular PC. Figure 3.8 shows a loading plot of the experimental data on the first two PCs. For example, in Figure 3.8, as TGS822 and TGS813 are inclined to the same (positive) side of PC 1, they are expected to have large mutual information, with TGS822 having the larger loading on PC 1. Thus TGS822 should be chosen of the two. On the other hand, TGS822 and TGS2602 should have small mutual information. More detail is presented in the results and discussion section. On inclusion of every new sensor in the optimal sensor group, the group's error performance is verified. If the performance is not satisfactory, a new sensor is


chosen following a similar process, as per descending values of mutual information from higher dimensional PCs. The mutual information between each pair of sensors for the training samples over all classes is calculated as per Eq. (3.20) and stored in the matrix \mathbf{B} as in Eq. (3.21).

\mathbf{B} =
\begin{bmatrix}
s_{1,1} & \cdots & s_{1,N} \\
\vdots & s_{i,j} & \vdots \\
s_{N,1} & \cdots & s_{N,N}
\end{bmatrix} ,   (3.21)

where s_{i,j} is the mutual information between sensors i and j, \mathbf{B} is an N \times N symmetric matrix, and N is the number of sensors. The diagonal elements of matrix \mathbf{B}, for which i equals j, are self-information and are very large. These elements are not required and are omitted for simplicity of the algorithm, which picks the element with the maximum mutual information first and then the subsequent ones in descending order.
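The Gaussian mutual information of Eq. (3.20) is straightforward to compute from sample data; the sketch below is illustrative, assuming jointly Gaussian sensor responses as the text does.

```python
import numpy as np

def gaussian_mutual_information(y, z):
    """I(Y, Z) = -0.5 * ln(1 - corr(Y, Z)^2), Eq. (3.20); it is 0 when the
    variables are uncorrelated and grows without bound as |corr| -> 1."""
    r = np.corrcoef(y, z)[0, 1]
    return -0.5 * np.log(1.0 - r ** 2)

# Uncorrelated pair (corr exactly 0) versus a strongly correlated pair.
y0 = np.array([1.0, -1.0, 1.0, -1.0])
z0 = np.array([1.0, 1.0, -1.0, -1.0])
a = np.arange(8.0)
b = a + np.cos(a)   # a with a bounded deterministic perturbation
```

Filling the B matrix then amounts to evaluating this function for every pair of sensor columns in the training data.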

Figure 3.8 Two dimensional PC loading of the fruits' odor data to explain the PC loading and mutual information based approach. [83]


3.7.3 Threshold based approach

For any Gaussian variable, 99.7% of the data remain within three standard deviations on both sides of the mean. Gaussian curves of the TGS822 sensor response for green mango, ripe banana, and ripe sapodilla are plotted with their respective standard deviations and means. As shown in Figure 3.9, if the ratio of the distance between the means of two Gaussian random variables to the sum of the standard deviations of the corresponding Gaussian random variables is at least three, then the bell shapes of the Gaussian functions will be sufficiently apart or overlap negligibly. In Figure 3.9, the Gaussian curves for green mango and ripe banana do not overlap, while the Gaussian curves for ripe banana and ripe sapodilla overlap significantly, as their standard deviations are high compared to the distance between their means. The sensors for which this overlap occurs are not a good choice for classifying the corresponding classes. Let this ratio be defined as

\gamma_{i,j,n} = \frac{\left|\mu_{i,n} - \mu_{j,n}\right|}{\sigma_{i,n} + \sigma_{j,n}} \geq \gamma_{th} ,   (3.22)

where \mu_{i,n} and \mu_{j,n} are the means of classes i and j, respectively, \sigma_{i,n} and \sigma_{j,n} are the standard deviations of classes i and j, respectively, n is the corresponding sensor, and \gamma_{th} is the threshold. Larger values of \gamma_{i,j,n} indicate that the means of the two classes are sufficiently far apart and/or the variances are small enough, so the possibility of overlap of the corresponding Gaussian functions is low. For each sensor, the ratio \gamma_{i,j,n} of all pairs of classes is calculated according to Eq. (3.22). The class pairs and sensors for which \gamma_{i,j,n} is greater than or equal to the threshold (\gamma_{th}) are recorded in an N_{P} \times N dimensional matrix, where N_{P} is the number of pairs of classes and N is the number of sensors. The sensor that classifies the highest number of pairs is chosen first, and the detection error to classify all classes is calculated by the classification algorithms. If the detection error is not acceptable, another sensor is added such that the first and second sensors acting together classify a larger number of

Page 49: Signal processing for electronic nose, Signal processing

Ref. code: 25605722300067ONR

37

class pairs. If the error performance is still not acceptable, one more sensor is added such that the maximum number of class pairs is classified. This process continues until the error rate is acceptable. The sensor combination that classifies the classes with the minimum number of classification errors is the optimal sensor set.
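The ratio of Eq. (3.22) is simple to evaluate; the sketch below uses the TGS822 values quoted in Figure 3.9, and the function name is an assumption.

```python
def separation_ratio(mu_i, sigma_i, mu_j, sigma_j):
    """gamma_{i,j,n} = |mu_i - mu_j| / (sigma_i + sigma_j), Eq. (3.22);
    values of 3 or more mean the two bells overlap negligibly."""
    return abs(mu_i - mu_j) / (sigma_i + sigma_j)

# Figure 3.9 values for TGS822: green mango (0.61, 0.04),
# ripe banana (1.30, 0.22), ripe sapodilla (1.90, 0.34).
r_mango_banana = separation_ratio(0.61, 0.04, 1.30, 0.22)
r_banana_sapodilla = separation_ratio(1.30, 0.22, 1.90, 0.34)
```

The banana-sapodilla ratio comes out well below the threshold of 3, matching the significant overlap visible in Figure 3.9, while the mango-banana pair is far better separated.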

Figure 3.9 Gaussian functions with means µ = 0.61, µ = 1.30, µ = 1.90 and standard deviations σ = 0.04, σ = 0.22, σ = 0.34 for green mango, ripe banana, and ripe sapodilla, respectively. [83]

3.8 Hyperplane versus Hyperspheric Classification

Sensors convert odour data to electronic signals. These signals are then pre-processed to make them suitable for the pattern recognition algorithms used for training. Meaningful extraction of information from sensed data, with good correct classification and fewer false alarms, necessitates a scrupulous pattern recognition method. Classification methods based on hyperspheric classification boundaries are potential candidates for reducing the false classification rate and thereby improving the "correct rejection" rate in E-Nose applications.


Figure 3.10 A PCA of the dataset to explain the false classification problems of k-NN, SVM, LDA, and MLPNN. The dataset is a PCA scores plot of the four fruit types, each at three ripeness states. In the figure GB is unripe banana, RB is ripe banana, RtB is rotten banana, GM is unripe mango, RM is ripe mango, RtM is rotten mango, GS is unripe sapodilla, RS is ripe sapodilla, RtS is rotten sapodilla, GP is unripe pineapple, RP is ripe pineapple, and RtP is rotten pineapple.

The data point "R" shown in Figure 3.10 does not belong to any class, as it is reasonably far from all the classes although it is situated in the middle of the dataset, and an E-Nose should not classify it to any odour class, since doing so would cause a false classification. However, it will be classified to the nearest class by k-NN. With the SVM and LDA classification methods it will fall on one side of a class boundary and thereby be falsely classified to that class. An MLPNN classification boundary, although it encloses the inner classes, leaves the outer classes open, as in a Voronoi diagram. In addition, the enclosed inner classes contain empty spaces (as seen in Figure 3.10) within the classes, which should not be included. Thus pattern recognition by any method that produces hyperplanes will classify unwanted data lying between any two classes to one of them and cause false classification. To overcome this false classification issue,


algorithms which classify data by hyperspheres shall be considered. In Figure 3.11, closed ellipsoidal boundaries are shown around the classes. This can be achieved with the RBFNN and GRNN, with Gaussian activation functions in the hidden layer. In this way, irrelevant data and the empty spaces around the relevant classes are excluded and false classification does not occur.

As research shows the GRNN to perform better than the RBFNN, the GRNN should be chosen to overcome false classification in E-Nose applications. As the number of neurons required by the GRNN is high, an approximate GRNN [86] is also presented in this thesis. The number of hidden layer neurons needed by the approximate GRNN is equal to the number of classes. In the next subsection a novel hyperspheric classification method, based on the minimum, maximum, and mean of each feature for all the classes, is shown, which is also capable of reducing the false classification error.

Figure 3.11 A PCA plot with closed boundaries to explain the effect of hyperspheric closed boundaries in avoiding false classification. In the figure GB is unripe banana, RB is ripe banana, RtB is rotten banana, GM is unripe mango, RM is ripe mango, RtM is rotten mango, GS is unripe sapodilla, RS is ripe sapodilla, RtS is rotten sapodilla, GP is unripe pineapple, RP is ripe pineapple, and RtP is rotten pineapple.


3.8.1 Minimum-maximum-mean (MMM) hyperspheric classification method

In this thesis a simple classification method named the MMM [87] method is proposed. The method is based on the maximum, minimum, and mean responses of the sensors for the different classes. The maximum vectors (\mathbf{q}_{m}), minimum vectors (\mathbf{v}_{m}), and mean vectors (\mathbf{u}_{m}) are calculated during training for each class from the training data samples and are stored in the matrices \mathbf{Q}, \mathbf{V}, and \mathbf{U}, respectively, as

\mathbf{Q} =
\begin{bmatrix}
\mathbf{q}_{1} \\ \vdots \\ \mathbf{q}_{M}
\end{bmatrix} =
\begin{bmatrix}
q_{1,1} & \cdots & q_{1,N} \\
\vdots & & \vdots \\
q_{M,1} & \cdots & q_{M,N}
\end{bmatrix} ,   (3.23)

\mathbf{V} =
\begin{bmatrix}
\mathbf{v}_{1} \\ \vdots \\ \mathbf{v}_{M}
\end{bmatrix} =
\begin{bmatrix}
v_{1,1} & \cdots & v_{1,N} \\
\vdots & & \vdots \\
v_{M,1} & \cdots & v_{M,N}
\end{bmatrix} , and   (3.24)

\mathbf{U} =
\begin{bmatrix}
\mathbf{u}_{1} \\ \vdots \\ \mathbf{u}_{M}
\end{bmatrix} =
\begin{bmatrix}
u_{1,1} & \cdots & u_{1,N} \\
\vdots & & \vdots \\
u_{M,1} & \cdots & u_{M,N}
\end{bmatrix} .   (3.25)

In Eqs. (3.23)-(3.25), N and M are the number of sensors and the number of classes, respectively. The rows of the matrices \mathbf{Q}, \mathbf{V}, and \mathbf{U} are the maximum, minimum, and mean vectors, respectively, while the columns of \mathbf{Q}, \mathbf{V}, and \mathbf{U} represent sensor variables. The features of a test data vector are compared element by

element to the minimum and maximum vectors of each class. A test data vector is classified to a class for which the following two criteria are satisfied: (i) every feature variable of the test data vector is less than or equal to the corresponding element of the maximum vector of the class, and (ii) every feature of the test data vector is greater than or equal to the corresponding element of the minimum vector of the class. Criteria (i) and (ii) ensure the minimum-maximum (min-max) range assessment for each sensor, i.e. each feature variable of the test data. As shown in Figure 3.11, test data might be assigned to multiple classes due to minor overlapping between classes. If any odor data is classified to multiple classes, the Euclidean distances between the test data and the class mean vectors of the corresponding tied classes in matrix \mathbf{U} are calculated. The test data is assigned to the class whose mean vector is closest to the test data vector. The algorithm is given as follows:

Step 1. Compute the maximum, minimum, and mean matrices Q, V, and U of the

training dataset.

Step 2. Compare each feature of a test data vector to the corresponding feature in

the minimum and maximum vectors of each class.

Step 3. Assign a test data to a class if every feature of the test data is within the

min-max limits of a class. If the test data does not fall within the min-max range of

any of the classes, then label it as “unclassified” or “correctly rejected,” and stop.

Step 4. If any test data are within min-max limits of multiple classes, then the

case of a tie occurs. To break this tie the test data are assigned to the class whose

mean vector (measured by Euclidean distance metric) is the closest to the test data

vector. Once the test data are assigned to a class, exit the program.

Step 5. Run Steps 1 to 4 and calculate the percentage of error by

E_{MMM} = \frac{1}{0.15LM} \sum_{m=1}^{M} \sum_{l=1}^{0.15L} \left( \mathbf{y}_{l,m} - \mathbf{t}_{l,m} \right)^{2} .

If E_{MMM} \leq E_{threshold}, exit; else add \eta\boldsymbol{\sigma}_{m} to the corresponding rows of \mathbf{Q} to expand the hyperspheric classification boundary and continue to Step 1, where \eta is the learning rate and \boldsymbol{\sigma}_{m} is the standard deviation vector of class m.


All sensors in the sensor array are required to function properly. A

malfunctioning sensor should be detected and replaced, and the E-Nose should be

trained again to achieve good classification performance.

Each of the mean, minimum, and maximum matrices has a computational cost of O(M). If a tie occurs, the complexity becomes O(M + I_{ties}(N + k)), where I_{ties} is the number of tied classes. The term I_{ties}(N + k) indicates the complexity of finding the nearest class among the means of the tied classes, and k is set to 1 for tie breaking. Given typical E-Nose data patterns, ties are unlikely to occur, so the computational complexity remains low.
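The MMM decision rule (Steps 2 to 4) can be sketched as follows; the function name, zero-based class indices, and toy matrices are assumptions for illustration.

```python
import numpy as np

def mmm_classify(x, Q, V, U):
    """Minimum-maximum-mean classification (Section 3.8.1).

    Q, V, U: (M, N) per-class maximum, minimum, and mean matrices.
    Returns the matching class index, or None for a correct rejection.
    """
    inside = np.where(np.all((x >= V) & (x <= Q), axis=1))[0]
    if inside.size == 0:
        return None                            # Step 3: correctly rejected
    if inside.size == 1:
        return int(inside[0])                  # unique class match
    d = np.linalg.norm(U[inside] - x, axis=1)  # Step 4: break ties by mean
    return int(inside[np.argmin(d)])

# Two classes on two sensors: class 0 spans [0, 1]^2, class 1 spans [2, 3]^2.
Q = np.array([[1.0, 1.0], [3.0, 3.0]])   # maxima
V = np.array([[0.0, 0.0], [2.0, 2.0]])   # minima
U = np.array([[0.5, 0.5], [2.5, 2.5]])   # means
```

A point inside neither box is rejected rather than forced into the nearest class, which is exactly how the method avoids the false classifications discussed in Section 3.8.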

Simulation and analytical results of the sensor panel minimization

methods, and hyperspheric classification methods presented in this chapter are shown

in the next results and discussions chapter.


Chapter 4

Results and Discussions

4.1 Data Preprocessing

Time versus voltage responses of the sensors of one measurement to the

VOCs from four fruit types at each ripeness state namely green mango, ripe mango,

rotten mango, green banana, ripe banana, rotten banana, green sapodilla, ripe

sapodilla, rotten sapodilla, green pineapple, ripe pineapple, and rotten pineapple are

shown in Figure 4.1. The response curves have two segments, namely, transient and

steady state. Average value of the steady state segment for every sensor is calculated.

The combination of these average values forms a signature pattern for an experiment.

Training data is a collection of signatures from all the experiments. The experiments for each type of fruit at three ripeness states are repeated 20 times, forming a total of 240 samples of experimental data for twelve classes.
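The steady-state averaging above can be sketched as follows; the fraction of the record treated as steady state is an assumed parameter, since the text does not specify the exact window.

```python
import numpy as np

def signature(response, steady_frac=0.5):
    """Average the steady-state tail of each sensor's time response to form
    an N-dimensional signature vector (Section 4.1).

    response: (T, N) array of T time samples for N sensors; the final
    steady_frac portion of the record is treated as steady state.
    """
    start = int(response.shape[0] * (1.0 - steady_frac))
    return response[start:, :].mean(axis=0)

# Toy response: two sensors ramp up, then hold at 1.0 and 2.0 respectively.
resp = np.vstack([
    np.linspace(0.0, 1.0, 5)[:, None] * np.array([1.0, 2.0]),  # transient
    np.tile([1.0, 2.0], (5, 1)),                               # steady state
])
```

Stacking one such signature per measurement, over 20 repetitions of each of the twelve classes, yields the 240-sample training matrix described above.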

Figure 4.2 shows a three dimensional PCA scores plot of the training data

of four types of fruits (pineapple, banana, sapodilla, and mango) at three different

ripeness states. The labels G, R, Rt indicate unripe, ripe, and rotten states of fruits,

respectively, while B, M, S, and P represent pineapple, mango, banana, and sapodilla,

respectively. For example, RtB stands for rotten banana.

It is seen from Figure 4.2 that the scores of ripe banana and ripe pineapple

do not overlap with any other classes and are fully classifiable, while the scores of the

other fruits at different ripeness states have minor overlap. This overlapping indicates that minor misclassification is likely to occur when classifying the corresponding classes.


Figure 4.1 Sensor responses from eight sensors to four kinds of fruits at three

ripeness states (continued to next page). [83]


Figure 4.1 Sensor responses from eight sensors to four kinds of fruits at three

ripeness states (continued from previous page).

Figure 4.2 PCA scores plot of training data of four fruit types at three ripeness states.


4.2 Sensor panel minimization methods

Performance of two existing approaches to sensor panel minimization, the between to within variance based method [56-57] and the exhaustive search method, is presented first. Later, the performance of the two proposed novel sensor panel minimization methods is shown.

4.2.1 Between to within variance based method

In Figure 4.3 the between to within [56-57] class variances for each

sensor are shown. To find the optimal set of sensors, the sensors are sorted as, S3, S4,

S5, S6, S7, S8, S1, and S2, as per their descending values of between to within class

variances. As S3 has the highest ratio of between to within variance, it is picked first.

Next, two sensors S3 and S4 are picked, and then three sensors S3, S4, and S5 are

picked, and so on. The percentage of pattern recognition error caused by the MLPNN, RBFNN, k-NN, and SVM classification algorithms for different combinations of sensors is listed in Table 4.1. The MLPNN and RBFNN show very high classification errors compared to k-NN and SVM. We see that the pattern recognition errors with the k-NN and SVM algorithms are less than 10% for combinations of three or more sensors. Thus the three-sensor combination S3, S4, and S5 is considered the optimal set of sensors, with k-NN or SVM chosen as the classification algorithm. For improved performance, sensor combinations with a larger number of sensors should be preferred, as shown in Table 4.1.

Figure 4.3 Between to within variance of all the classes for each sensor. [83]


Table 4.1 Pattern recognition error rates of test data for between to within variance

method. [83]

4.2.2 Exhaustive search method

Pattern recognition capability of all the (2^{8} - 1 = 255) combinations of sensors is evaluated by MLPNN, RBFNN, k-NN, and SVM. The sensor combinations with minimum pattern recognition errors for different numbers of sensors are recorded in Table 4.2 and Table 4.3. It is observed from Table 4.2 and Table 4.3 that MLPNN and RBFNN show higher errors compared to k-NN and SVM. The sensor combinations with three or more sensors show less than 10% error for k-NN and SVM. For the three-sensor combinations, k-NN and SVM show 9.72% and 2.78% classification errors, respectively. It is observed that the three-sensor combination S3, S5, and S8 is common in both cases. For more than three sensors, the pattern recognition error decreases more for SVM than for k-NN.

In the next subsections the results of the proposed methods to reduce the

number of sensors for an E-Nose are presented.

Sensor combinations with minimum error, and the percentage error by MLPNN, RBFNN, k-NN, and SVM:

(S3) 88.89 87.50 20.83 37.50

(S3, S4) 94.44 91.67 11.11 11.11

(S3, S4, S5) 80.56 69.44 6.94 4.17

(S3, S4, S5, S6) 72.22 80.56 6.94 2.78

(S3, S4, S5, S6, S7) 72.22 54.17 5.56 2.78

(S3, S4, S5, S6, S7, S8) 55.56 63.89 4.17 2.78

(S3, S4, S5, S6, S7, S8, S1) 55.56 33.33 0.00 1.39

(S3, S4, S5, S6, S7, S8, S1, S2) 19.44 27.78 0.00 0.00


Table 4.2 Classification error by MLPNN and RBFNN methods. For different number of sensor cases, the combinations

with minimum pattern recognition errors are recorded. [83]

MLPNN (sensor combinations with minimum pattern recognition error):
Single sensor case: (S7), (S8); error 80.56%
Two sensor cases: (S2, S8), (S5, S8), (S6, S8); error 66.67%
Three sensor cases: (S3, S6, S8), (S5, S6, S8); error 44.44%
Four sensor cases: (S1, S2, S5, S8), (S2, S5, S6, S8), (S3, S5, S6, S8); error 52.78%
Five sensor cases: (S1, S2, S4, S5, S7), (S1, S2, S4, S5, S8), (S2, S3, S5, S7, S8); error 38.89%
Six sensor case: (S1, S2, S3, S5, S6, S7); error 38.89%
Seven sensor case: (S1, S2, S3, S4, S6, S7, S8); error 27.78%
Eight sensor case: (S1, S2, S3, S4, S5, S6, S7, S8); error 19.44%

RBFNN (sensor combinations with minimum pattern recognition error):
Single sensor case: (S8); error 79.17%
Two sensor cases: (S4, S5), (S6, S8); error 60.06%
Three sensor case: (S2, S4, S5); error 62.50%
Four sensor cases: (S1, S2, S5, S7), (S1, S2, S5, S8); error 41.67%
Five sensor cases: (S1, S2, S3, S4, S5), (S1, S2, S4, S5, S8); error 33.33%
Six sensor cases: (S1, S2, S3, S5, S6, S8), (S1, S3, S4, S5, S7, S8), (S2, S3, S5, S6, S7, S8); error 31.94%
Seven sensor cases: (S1, S2, S3, S4, S5, S6, S7), (S2, S3, S4, S5, S6, S7, S8); error 31.94%
Eight sensor case: (S1, S2, S3, S4, S5, S6, S7, S8); error 27.78%


Table 4.3 Classification error by k-NN and SVM methods. For different number of sensor cases, the combinations with

minimum pattern recognition errors are recorded. [83]

k-NN (sensor combinations with minimum pattern recognition error):
Single sensor case: (S8); error 27.78%
Two sensor cases: (S3, S8), (S4, S6), (S5, S7), (S6, S7), (S6, S8); error 15.28%
Three sensor cases: (S3, S5, S8), (S4, S5, S7), (S4, S5, S8); error 9.72%
Four sensor cases: (S3, S4, S5, S6), (S3, S4, S5, S8), (S4, S5, S6, S7), (S4, S5, S7, S8); error 8.33%
Five sensor cases: (S3, S4, S5, S6, S7), (S3, S4, S5, S6, S8), (S3, S4, S5, S7, S8); error 8.33%
Six sensor case: (S3, S4, S5, S6, S7, S8); error 9.72%
Seven sensor cases: (S1, S2, S3, S4, S5, S6, S8), (S1, S2, S3, S4, S5, S7, S8); error 8.33%
Eight sensor case: (S1, S2, S3, S4, S5, S6, S7, S8); error 5.56%

SVM (sensor combinations with minimum pattern recognition error):
Single sensor case: (S5); error 30.56%
Two sensor case: (S5, S8); error 4.17%
Three sensor cases: (S3, S5, S7), (S3, S5, S8), (S5, S6, S7); error 2.78%
Four sensor cases: (S3, S5, S6, S7), (S3, S5, S6, S8); error 1.39%
Five sensor cases: (S3, S4, S5, S6, S7), (S3, S4, S5, S7, S8), (S3, S5, S6, S7, S8); error 2.78%
Six sensor case: (S3, S4, S5, S6, S7, S8); error 2.78%
Seven sensor case: (S2, S3, S4, S5, S6, S7, S8); error 0.00%
Eight sensor case: (S1, S2, S3, S4, S5, S6, S7, S8); error 0.00%


4.2.3 PCA loading and mutual information based approach

Table 4.4 and Table 4.5 show the PC loadings and mutual information

between each pair of sensor data, respectively. The diagonal elements of Table 4.5 are

self-information hence large and are ignored for simplicity of the algorithm to find the

sensor pairs with high mutual information. The sensor pair S3 and S7 has the largest

mutual information (Table 4.5) and S3 has higher loading on PC1 (Table 4.4). Thus,

sensor S3 is chosen. The sensor pair S7 and S8 has the second largest mutual

information (Table 4.5) and S8 has higher loading on the negative PC2 axis (Table

4.4). Thus, S8 is chosen from the pair S7 and S8. The sensor pair S5 and S6 has the

third largest mutual information (Table 4.5) and S5 has higher loading on the positive

PC2 axis (Table 4.4). Thus, S5 is chosen from the pair S5 and S6. In this way, the

minimal set of sensors is S3, S5, and S8, with 9.72% pattern recognition error for k-

NN, and 2.78% for SVM (Table 4.3). For the sensor combination S3, S5, an S8 the

pattern recognition errors with MLPNN and RBFNN are 77.78% and 87.50%,

respectively (Table 4.2). High error with MLPNN and RBFNN indicates that they are

not good choice when number of sensors is reduced.

Table 4.4 Principal component loadings. [83]

Sensors  PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8

S1 0.0954 –0.0401 0.1541 –0.2580 0.3402 0.8735 0.1418 –0.0113

S2 0.0307 0.1511 0.6316 0.4201 0.5535 –0.1618 –0.2200 0.1411

S3 0.5877 –0.2413 –0.3490 0.6504 –0.0319 0.2050 –0.0893 –0.0231

S4 0.4995 –0.2156 0.6289 –0.1967 –0.5099 –0.0467 0.0684 –0.0543

S5 0.3328 0.6816 –0.1111 –0.2193 –0.0844 0.0747 –0.5773 –0.1353

S6 0.3689 0.4790 –0.0792 –0.0222 0.1852 –0.2032 0.7426 0.0259

S7 0.2974 –0.2336 –0.1785 –0.3790 0.2305 –0.1776 –0.1599 0.7558

S8 0.2444 –0.3497 –0.0976 –0.3210 0.4716 –0.2969 –0.0930 –0.6215


Table 4.5 Mutual Information Between Pairs of Sensors. The self-information in

diagonal cells is omitted. [83]

4.2.4 Threshold based method

For any Gaussian variable, 99.7% of the data remain within three standard deviations on both sides of the mean. Thus, according to Eq. (3.22), a choice of threshold \gamma_{th} equal to 3 will not pick any sensor for which class overlapping occurs. This confirms that the algorithm will pick the sensors which cause less overlapping and thereby reduce the number of classification errors. A smaller threshold raises the acceptable error limit, and a larger threshold lowers it, with 3 as the optimal threshold. Four kinds of fruits at three ripeness states make 12 classes, so the total number of class pairs is 66. Each pair is classified sequentially with each sensor and the corresponding errors are recorded. The smallest group of sensors that meets the desired error limit is chosen. The three and four sensor combinations, the number of class pairs they classify, and the corresponding classification errors are listed in Table 4.6 and Table 4.7. It is seen from Table 4.6 that the three-sensor combination S3, S5, and S8 classifies the maximum number of pairs of classes, compared to the other three-sensor combinations. From Table 4.7, the four-sensor combinations (S3, S4, S5, S8), (S3, S5, S6, S8), and (S3, S5, S7, S8) classify more pairs of classes than the other four-sensor combinations. The classification performance of the three-sensor and four-sensor combinations noted above is verified by MLPNN, RBFNN, k-NN, and SVM. For both three and four sensor combinations, MLPNN and RBFNN show large classification errors. It is found that for k-NN and SVM the three-sensor combination S3, S5, and S8 has 9.72% and 2.78% pattern recognition errors,

Sensor index  S1  S2  S3  S4  S5  S6  S7  S8

S1 - 0.0516 0.7881 0.9682 0.3070 0.4602 0.8097 0.6833

S2 0.0516 - 0.0197 0.0555 0.1382 0.1151 0.0061 0.0005

S3 0.7881 0.0197 - 1.4419 0.4015 0.6539 1.6569 1.0766

S4 0.9682 0.0555 1.4419 - 0.3783 0.6104 1.3474 1.0017

S5 0.3070 0.1382 0.4015 0.3783 - 1.5397 0.2889 0.1517

S6 0.4602 0.1151 0.6539 0.6104 1.5397 - 0.4842 0.2879

S7 0.8097 0.0061 1.6569 1.3474 0.2889 0.4842 - 1.5770

S8 0.6833 0.0005 1.0766 1.0017 0.1517 0.2879 1.5770 -


respectively as found by exhaustive search method, and PCA loading and mutual

information based approach. Among the four sensor cases, (S3, S4, S5, S8) shows

8.33% and 4.17% errors for k-NN and SVM, respectively. The other sensor

combinations show higher number of errors (Table 4.6 and Table 4.7). Thus with the

threshold based method it is found that the three sensor combination (S3, S5, S8) is

the minimal sensor set.

The implementation complexities of the PC loading and mutual information based approach and the threshold based approach are similar. With either approach, the classification algorithm needs to be run only a few times to find the minimal sensor set. The total complexity of the PCA loading and mutual information based approach is the sum of the PCA complexity, the complexity of computing mutual information, and the number of trials multiplied by the complexity of the classification algorithm. With the threshold based approach, a few suboptimal sensor combinations are found first, and their pattern recognition performances are then analysed with a classification algorithm. The complexities of the proposed methods are therefore low, as the expensive classification algorithm is run far fewer times than in the exhaustive search or GA methods. Based on complexity and pattern recognition errors, the SVM algorithm, together with either the PCA and mutual information based method or the threshold based method, can be chosen to design an E-Nose sensor panel with the minimum number of sensors at an acceptable error rate. It is found that an E-Nose can be designed with only three sensors (S3, S5, S8) and SVM as the classification algorithm to classify banana, mango, sapodilla, and pineapple at three ripeness states with a possible misclassification error of 2.78%. This result is also better than that of the between-to-within variance ratio method [56-57], where (S3, S4, S5) is the optimal sensor set with 6.94% error for k-NN and 4.17% error for SVM.
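The subset-evaluation loop described above, in which only a shortlist of candidate sensor sets is scored with a classifier rather than all combinations, can be sketched as follows. This is a minimal illustration with synthetic stand-in data and a plain NumPy k-NN, not the thesis code; the candidate index tuples are hypothetical 0-based stand-ins for shortlisted sets such as (S3, S5, S8).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for E-Nose data: 120 samples x 8 sensors, 4 odor classes.
# (Hypothetical values; the real data comes from the MOG sensor panel.)
X = rng.normal(size=(120, 8)) + np.repeat(np.arange(4), 30)[:, None] * 1.5
y = np.repeat(np.arange(4), 30)

# 70/30 random split, as used in the thesis.
idx = rng.permutation(120)
tr, te = idx[:84], idx[84:]

def knn_error(sensor_idx, k=3):
    """k-NN test error (%) using only the chosen sensor columns."""
    Xs = X[:, list(sensor_idx)]
    d = np.linalg.norm(Xs[te][:, None, :] - Xs[tr][None, :, :], axis=2)
    nn = y[tr][np.argsort(d, axis=1)[:, :k]]               # labels of k nearest
    pred = np.array([np.bincount(row).argmax() for row in nn])
    return 100.0 * (pred != y[te]).mean()

# Score only a handful of promising subsets (the threshold-based shortlist),
# instead of all C(8,3) = 56 exhaustive-search combinations.
candidates = [(2, 4, 7), (2, 3, 4), (4, 6, 7)]  # stand-ins for (S3,S5,S8), ...
errors = {c: knn_error(c) for c in candidates}
best = min(errors, key=errors.get)
```

The expensive classifier runs only `len(candidates)` times, which is the complexity saving the text describes.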


Table 4.6 Maximum number of pairs of classes classifiable by combinations of three sensors. [83]

Three sensor    Pairs classifiable    Detection errors (%)
combination     (out of 66)           MLPNN    RBFNN    k-NN     SVM
S3, S4, S5      52                    80.56    69.44    11.11    4.17
S3, S5, S7      52                    80.56    91.67    12.50    2.78
S3, S5, S8      54                    77.78    87.50    9.72     2.78
S5, S6, S8      53                    44.44    98.61    13.89    4.17
S5, S7, S8      53                    72.22    94.44    13.89    5.56

Table 4.7 Maximum number of pairs of classes classifiable by combinations of four sensors. [83]

Four sensor       Pairs classifiable    Detection errors (%)
combination       (out of 66)           MLPNN    RBFNN    k-NN     SVM
S3, S4, S5, S8    55                    72.22    48.61    8.33     4.17
S3, S4, S6, S8    54                    63.89    62.50    12.50    6.94
S3, S5, S6, S8    55                    52.78    61.11    12.50    1.39
S3, S5, S7, S8    55                    72.22    77.78    12.50    5.56
S4, S5, S7, S8    54                    63.89    54.17    8.33     6.94
S5, S6, S7, S8    54                    63.89    88.89    12.50    2.78

4.3 False classification reduction: Hyperplane versus hyperspheric

This section first analyses how well the existing hyperplane and hyperspheric boundary based classification algorithms reject irrelevant data and thereby reduce false classification. The classification performance of the MMM method, in terms of correct classification and false classification rates, is then analysed.

4.3.1 GRNN compared to SVM, LDA, k-NN, and MLPNN

The mean signature patterns of ripe mango, ripe sapodilla and ripe pineapple shown in Figure 4.4 are significantly different from each other. The responses of sensors S1 and S2 are small and do not vary significantly for different fruits. The sensors S3 to S8 show good responses for different fruits and

produce distinct signature patterns. The signatures are eight-dimensional, as eight sensors are chosen for this experiment. The PCA scores plot in Figure 4.5 shows that the three fruit classes are clearly separable in a two-dimensional PC space.

Figure 4.4 Signature patterns for three types of fruits. (a) Ripe mango, (b) ripe

sapodilla and (c) ripe pineapple. [86]

Figure 4.5 Scores plot of three types of fruits. [86]


The odor data from ripe sapodilla (class 1) and ripe pineapple (class 2) are taken as training classes, and the data from ripe mango (class 3) as the untrained class. The E-Nose is trained to classify ripe sapodilla and ripe pineapple into class 1 and class 2, respectively. From each training class, 70% of the data are randomly chosen for training the SVM, LDA and k-NN classification algorithms, and the remaining 30% are used for testing. For GRNN and MLPNN, 70% of the odor data are used for training, 15% for validation and 15% for testing. The ripe mango data are considered irrelevant, i.e. an untrained class, and are not used to train the E-Nose classification algorithms. The classification performance of SVM, LDA and k-NN is tested with the whole ripe mango dataset and the remaining 30% of the data from ripe sapodilla and ripe pineapple. For GRNN and MLPNN, since 15% of the ripe sapodilla and ripe pineapple data are used for validation, the remaining 15% from each, together with the whole ripe mango dataset, are used to test classification performance. Simulation results (Table 4.8) show that ripe sapodilla (class 1) and ripe pineapple (class 2) are correctly classified by every algorithm except MLPNN, which has 11% misclassification errors. The ripe mango samples (the irrelevant class, class 3) are falsely classified as either ripe sapodilla (class 1) or ripe pineapple (class 2) by SVM, LDA and k-NN, whereas MLPNN falsely classifies 1.67% of the ripe mango samples to class 1, 8.33% to class 2, and 90% to unknown classes. GRNN does not falsely classify the ripe mango samples to any training class but correctly rejects them. This is expected, as GRNN creates bounded hyperspheric classification boundaries around the classes on which it is trained. Thus only GRNN avoids false classification and correctly rejects data from irrelevant classes; in other words, it produces no "false alarm" or "false classification".


Table 4.8 Classification of ripe mango, ripe sapodilla and ripe pineapple samples by different algorithms. [86]

Classification                Ripe mango (irrelevant class 3) samples (%)                  Ripe sapodilla (class 1) and
algorithm                     False classified as      Classified to      Correct         ripe pineapple (class 2)
                              Class 1     Class 2      unknown classes    rejection       misclassified (%)
SVM                           40.00       60.00        0.00               0.00            0.00
LDA                           21.25       78.75        0.00               0.00            0.00
k-NN                          11.00       89.00        0.00               0.00            0.00
MLPNN                         1.67        8.33         90.00              0.00            11.00
GRNN (spread 0.09, bias 6)    0.00        0.00         0.00               100.00          0.00

Exact versus approximate GRNN

For the exact and approximate GRNN [86], as shown in Figure 4.6, minimum classification errors occur at spreading factors of 0.06 and 0.15, respectively. The exact GRNN model shows zero classification error at a spreading factor of 0.06, while the approximate GRNN shows a minimum error of 3.85% at a spreading factor of 0.15. When the spreading factor is reduced below 0.06 for exact GRNN or below 0.15 for approximate GRNN, overfitting occurs and the training class classification performance of both GRNN models degrades; it is seen in Figure 4.6 that classification errors start to increase at a spreading factor of 0.05 for exact GRNN and at 0.14 for approximate GRNN. On the other hand, when the spreading factor is increased beyond its minimum-error value, the classification boundaries of the training classes widen for both GRNN models. Due to this widening, portions of the training classes' boundaries overlap, which causes misclassification of test data from the training classes. The widened boundaries also allow data from the untrained class to fall inside the training classes' boundaries, causing false classification. As a result, the total classification error increases: at a spreading factor of 0.1 for exact GRNN and 0.16 for approximate GRNN, the classification error starts to rise from its minimum level. It is also discerned from Figure 4.6 that approximate GRNN can be implemented to reduce implementation complexity and cost with only a small increase in classification error at a suitable spreading factor.
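The spreading-factor sweep that produces curves like Figure 4.6 can be outlined as follows; this uses a simplified Gaussian-activation classifier on synthetic two-class data, not the exact or approximate GRNN of the thesis, and the spread values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two hypothetical, well-separated training classes in a 2-D sensor space.
Xtr = np.vstack([rng.normal(0, 0.05, (20, 2)), rng.normal(1, 0.05, (20, 2))])
ytr = np.array([0] * 20 + [1] * 20)
Xte = np.vstack([rng.normal(0, 0.05, (10, 2)), rng.normal(1, 0.05, (10, 2))])
yte = np.array([0] * 10 + [1] * 10)

def error_at(spread):
    """Test error (%) of a Gaussian-activation classifier at one spreading
    factor; sweeping `spread` traces an error curve like Figure 4.6."""
    pred = []
    for x in Xte:
        act = np.exp(-np.sum((Xtr - x) ** 2, axis=1) / (2 * spread ** 2))
        pred.append(0 if act[ytr == 0].sum() >= act[ytr == 1].sum() else 1)
    return 100.0 * np.mean(np.array(pred) != yte)

spreads = [0.02, 0.06, 0.15, 0.5, 2.0]
errors = [error_at(s) for s in spreads]
best_spread = spreads[int(np.argmin(errors))]
```

Too small a spread shrinks each class's hypersphere around individual training points (overfitting); too large a spread lets the class regions overlap, which is the widening effect described above.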

Figure 4.6 Classification errors to classify ripe mango samples, and test samples of

ripe sapodilla and ripe pineapple by exact GRNN and approximate GRNN at different

spreading factors. [86]

4.3.2 MMM versus k-NN, SVM, GRNN, RBFNN, and MLPNN

The correct classification rate and false classification rate of the proposed MMM [87] are compared with those of k-NN, SVM, GRNN, RBFNN, and MLPNN in this sub-section. 70% of the data samples from sapodilla, pineapple, and banana, each at three ripeness states, are used for training and 30% for validation and testing.

The signature patterns of the nine trained classes, i.e., green sapodilla, ripe sapodilla, rotten sapodilla, green banana, ripe banana, rotten banana, green pineapple, ripe pineapple, and rotten pineapple, are shown in Figure 4.7. These are mean signature patterns, composed of eight features corresponding to the eight sensors. Figure 4.7 shows that the signatures are significantly different and distinguishable from each other. Sensors S1 and S2 show insignificant variation across classes, whereas sensors S3 to S8 show significant variations and produce distinct signature patterns for the different fruit odor classes.


Figure 4.7 Signature patterns of the means of three types of fruits at three ripeness

states: (a) green banana, (b) ripe banana, (c) rotten banana, (d) green sapodilla, (e)

ripe sapodilla, (f) rotten sapodilla, (g) green pineapple, (h) ripe pineapple and (i)

rotten pineapple. [87]

Figure 4.8 Box plots showing the min-max ranges of the sensor variables for the training classes, i.e. green banana (GB), ripe banana (RB), rotten banana (RtB), green sapodilla (GS), ripe sapodilla (RS), rotten sapodilla (RtS), green pineapple (GP), ripe pineapple (RP), and rotten pineapple (RtP).

Observing any particular class across the different sensors in the box plot in Figure 4.8 reveals the classification process of the MMM method. Suppose a test sample belonging to class RS is to be classified. According to the MMM method, the test sample is assigned to a class for which all of its features fall within the min-max ranges of the corresponding features of that class. From the box plot in Figure 4.8, it is observed that the min-max range of sensor S1 for class RS partially overlaps with RtB, RtS, and RtP; for S2 the min-max range of RS partially overlaps with RtB, RtS, RP, and RtP; for S3 it partially overlaps with GS; for S4 with RtB, GS, and RtS; for S5 with RB, RtB, RtS, and RtP; for S6 with RB, RtS, and RtP; for S7 with RB, GS, and RtS; and for S8 with RB, GS, and RtS. Thus there exists no class whose features all partially or fully overlap with the corresponding features of class RS. As a result, data from class RS will be correctly classified to that class and will not produce false classification. The same holds for the other classes, and thus the MMM method is capable of reducing false classification as well as false alarms.
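The min-max assignment rule just described can be sketched as below. The class names and ranges are hypothetical, and ties are broken here by nearest class mean for brevity, whereas the thesis breaks ties with k-NN.

```python
import numpy as np

def mmm_classify(x, mins, maxs, means):
    """Min-Max Method sketch: assign x to the class whose per-sensor
    [min, max] ranges contain every feature of x; reject if none matches;
    break ties by nearest class mean (the thesis uses k-NN for ties)."""
    hits = [c for c in mins if np.all((x >= mins[c]) & (x <= maxs[c]))]
    if not hits:
        return None                               # correct rejection
    if len(hits) == 1:
        return hits[0]
    return min(hits, key=lambda c: float(np.linalg.norm(x - means[c])))

# Hypothetical two-class min/max ranges over three sensors.
mins = {"RS": np.array([0.2, 0.5, 0.1]), "RP": np.array([0.6, 0.1, 0.4])}
maxs = {"RS": np.array([0.4, 0.9, 0.3]), "RP": np.array([0.9, 0.3, 0.8])}
means = {c: (mins[c] + maxs[c]) / 2 for c in mins}

inside = mmm_classify(np.array([0.3, 0.7, 0.2]), mins, maxs, means)   # -> 'RS'
outside = mmm_classify(np.array([5.0, 5.0, 5.0]), mins, maxs, means)  # -> None
```

Because a sample outside every class's ranges is simply rejected, the method has the same correct-rejection property as GRNN while only requiring range comparisons.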

Table 4.9 Time taken by the algorithms to train and test with the data samples of

pineapple, sapodilla, and banana, each at three ripeness states. [87]

Algorithm Train time (sec.) Test time (sec.)

k-NN 0.2175 0.2175

SVM 0.7452 0.0461

GRNN 0.6922 0.0946

RBFNN 0.9922 0.0230

MLPNN 0.8652 0.0153

MMM 0.1874 0.0047
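Timings such as those in Table 4.9 can be collected with a simple wall-clock harness. The sketch below uses a toy MMM-style "model" (per-class min/max vectors) on synthetic data; the helper name and data shapes are assumptions, not the thesis benchmarking script.

```python
import time
import numpy as np

def timed(fn):
    """Return (result, elapsed seconds) for one call, the measurement
    needed to fill a timing table like Table 4.9."""
    t0 = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - t0

rng = np.random.default_rng(1)
X = rng.normal(size=(630, 8))          # stand-in: 9 classes x 70 samples, 8 sensors
y = np.repeat(np.arange(9), 70)

# "Training" an MMM-style model is just computing per-class min/max vectors...
model, train_t = timed(lambda: {c: (X[y == c].min(0), X[y == c].max(0))
                                for c in range(9)})
# ...and "testing" only checks range membership, which is one reason the MMM
# row in Table 4.9 is so fast.
lo, hi = model[0]
_, test_t = timed(lambda: [bool(np.all((x >= lo) & (x <= hi))) for x in X[:100]])
```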

The MLPNN, GRNN, RBFNN, and MMM classification methods are trained, validated, and tested with 70%, 15%, and 15% of the data, randomly chosen from each training class. The SVM and k-NN algorithms are trained and tested with 70% and 30% of the data, as they do not require a validation step.

Table 4.9 shows the time consumed by the different classification algorithms for training and testing. The proposed MMM algorithm consumes the least training time, followed by k-NN, GRNN, SVM, MLPNN and RBFNN, respectively. The testing time of the MMM method is also the lowest, followed by MLPNN, RBFNN, SVM, GRNN, and k-NN, respectively. The big-O complexities of the algorithms, given in Section 2.6, are

O(LM(N+k)) for k-NN, O(I_GRNN) for GRNN, O(I_MLPNN^2) for MLPNN, within O(NL^2M^2) to O(NL^3M^3) for SVM, within O(M) to O(M + I_ties(N+k)) for MMM, and within O(I_RBFNN) to O(I_RBFNN^2) for RBFNN. It is found that SVM and MLPNN require more computations and are thereby more complex. Due to the large


number of hidden layer neurons required, the complexities of GRNN and RBFNN are high, although their order of complexity is low. The MMM method incorporates k-NN to break possible ties; however, ties are rare, so the O(M + I_ties(N+k)) term is usually insignificant and the complexity of the MMM method is approximately O(M). A run-time comparison of MMM, k-NN, MLPNN, GRNN, RBFNN, and SVM is presented in Table 4.9.

Table 4.10 Misclassification error and correct classification rate of the classification algorithms when tested with test data from the training classes, i.e. banana, sapodilla, and pineapple, each at three ripeness states. [87]

Algorithm Misclassification error (%) Correct classification (%)

k-NN 1.8519 98.1481

SVM 1.8519 98.1481

GRNN 1.8519 98.1481

RBFNN 24.0740 75.9260

MLPNN 29.6296 70.3704

MMM 1.8519 98.1481

Table 4.10 summarizes the correct classification and misclassification performance. The misclassification error is 24.0740% for RBFNN and 29.6296% for MLPNN, while the MMM, SVM, GRNN, and k-NN classification methods all show 1.8519% misclassification errors. These results are consistent with previous works: reported classification accuracies range from 82.4% to 100% with k-NN [19, 67-69], 86% to 98.66% with SVM [67-72], 100% with GRNN [13], 88% to 100% with RBFNN [73-77], and 68% to 100% with MLPNN [63, 64, 75-79].


Table 4.11 False classification performance of the algorithms with irrelevant data (i.e. mango odor data at three ripeness states). [87]

Algorithm    False               Misclassification to         Unclassified or
             classification (%)  unknown irrelevant           correctly rejected (%)
                                 classes (%)
k-NN         100                 0                            0
SVM          100                 0                            0
GRNN         0                   0                            100
RBFNN        15                  85                           0
MLPNN        35                  65                           0
MMM          0                   0                            100

The false classification and correct rejection performances of the classification algorithms analyzed in this thesis are summarized in Table 4.11. To the best of our knowledge, these analyses have not previously been evaluated in the literature. Odor data from the irrelevant classes (i.e. mango odor data at three ripeness states) should be correctly rejected by an E-Nose and should therefore produce no false classification errors. The k-NN and SVM algorithms falsely classify all data from the irrelevant classes and produce false alarms. The RBFNN algorithm falsely classifies 15% of the mango data to trained classes and misclassifies 85% to unknown extraneous classes; the MLPNN algorithm falsely classifies 35% of the data samples and misclassifies 65%. Only the MMM and GRNN methods show no false classification error, and they correctly reject all irrelevant data samples.


Chapter 5

Conclusions and Recommendations

5.1 Conclusions

In this thesis an E-Nose is designed comprising a sensor panel built from MOG sensors, a data acquisition device, a sample chamber, a measurement chamber, and a computer for data storage, preprocessing, training, and running the pattern recognition algorithms. Four kinds of fruits, namely banana, mango, sapodilla, and pineapple, each at three ripeness states (unripe, ripe, and rotten), are chosen as samples. The acquired data are preprocessed to make them suitable for the classification algorithms, namely PCA, k-NN, SVM, LDA, MLPNN, RBFNN, and GRNN. The performance of these classification algorithms is compared in terms of speed, correct classification rate, and false classification rate.

The higher the number of sensors, i.e. the dimensionality, the higher the sensor panel design cost, its complexity, and the classification algorithm complexity. To reduce dimensionality by excluding sensors carrying insignificant or redundant information, two new approaches are proposed: a PCA loading and mutual information based approach, and a threshold based approach. Both optimize the number of sensors, and their pattern classification performances are analysed against the between-to-within variance ratio based method and the exhaustive search method. It is seen that the number of sensors in an E-Nose sensor panel can be reduced with the methods proposed in this thesis at low pattern classification error. Reducing the number of sensors decreases data dimensionality as well as design cost and complexity. The classification performance of the minimized sensor panel is evaluated with k-NN, SVM, MLPNN, and RBFNN. The classification errors with MLPNN and RBFNN are found to be high for this application, while the pattern recognition error performance of SVM is better than that of k-NN, MLPNN, and RBFNN. With the proposed sensor reduction techniques, the number of sensors is reduced to three with only 2.78% classification error for SVM and 9.72% for k-NN. It is also noted that both proposed sensor array minimization methods result in the same


combination of optimal sensor set with k-NN and SVM pattern classification

algorithms.

Odor data from an irrelevant class should not be classified to any training class by an E-Nose, and should therefore produce no false classification error or false alarm. It is shown that hyperspheric classification algorithms such as GRNN and RBFNN with a Gaussian activation function are capable of reducing false alarms, with GRNN performing better. However, as GRNN comprises a number of neurons of the order of the number of training samples, its complexity becomes high for large training datasets. A new hyperspheric classification method, the MMM method, is therefore proposed in this thesis; it is less complex and faster, while showing correct classification of training data and false classification performance similar to GRNN. The MMM method can be a prominent method for E-Nose applications, as it is simple to implement and shows small misclassification and false classification rates.

5.2 Future works

In this research, the ripeness states of four types of fruits are explored and classified by the designed E-Nose. The research could be extended to analyze the E-Nose's capability to classify additional fruits. The classification algorithms used in the analyses are those most commonly used in the literature; the performance of other algorithms, namely the Bayesian classifier, PNN, PLS discriminant analysis, and quadratic discriminant analysis, could be explored with the designed E-Nose and compared with the classification methods applied and proposed in this thesis.


References

1. Persaud, K. & George, D. (1982). Analysis of discrimination mechanisms in

the mammalian olfactory system using a model nose. Nature, 299(5881), 352-355.

2. Exporting Fresh Fruit and Vegetables to China: A Market Overview and

Guide for Foreign Suppliers. (2016). China: Produce Marketing Association.

3. Leffingwell, J. C. (2002). Olfaction. Retrieved September 29, 2014, from

http://www.leffingwell.com/olfaction.htm

4. Wikipedia. 2016. Olfaction. Retrieved June 11, 2014, from

https://en.wikipedia.org/wiki/Olfaction

5. Mamat, M., Samad, S. A. & Hannan, M. A. (2011). An electronic nose for

reliable measurement and correct classification of beverages. Sensors, 2011(11),

6435-6453. Retrieved March 16, 2014, from http://www.mdpi.com/1424-

8220/11/6/6435/pdf

6. Giordani, D. S., Castro, H. F., Oliveira, P. C., & Siqueira, A. F. (2007,

September). Biodiesel characterization using electronic nose and artificial neural

network. Paper presented at the Proceedings of the European Congress of Chemical

Engineering, Copenhagen, Denmark. Retrieved August 17, 2014, from

http://folk.ntnu.no/skoge/prost/proceedings/ecce6_sep07/upload/348.pdf

7. Chowdhury, S. S., Tudu, B., Bandyopadhyay, R., & Bhattacharyya, N. (2008,

December). Portable electronic nose system for aroma classification of black tea.

Paper presented at the Proceedings of the IEEE Region 10 and the Third Int. Conf. on

Industrial and Information Systems, Kharagpur, India. Retrieved December 1, 2014,

from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4798403

8. Brezmes, J., Llobet, E., Vilanova, X., Saiz, G. & Correig. X. (2000). Fruit

ripeness monitoring using an Electronic Nose. Sensors and Actuators B: Chemical,

69(3), 223-229.

9. Fang, X., Guo, X., Shi, H., & Cai, Q. (2010, June). Determination of Ammonia

Nitrogen in Wastewater Using Electronic Nose. Paper presented at the Proceedings of

the 4th Int. Conf. on Bioinformatics and Biomedical Engineering (iCBBE), Chengdu,

China. Retrieved September 11, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5515426


10. Omatu, S., Araki, H., Fujinaka, T., & Yano, M. (2012). Intelligent

classification of odor data using neural networks. Paper presented at the Proceedings

of the Sixth International Conference on Advanced Engineering Computing and

Applications in Sciences. Barcelona, Spain. Retrieved August 29, 2012 from

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.677.544&rep=rep1&type=p

df

11. Soh, A. C., Chow, K. K., Yusuf, U. M., Ishak, A. J., Hassan, M. K., &

Khamis, S. (2014). Development of neural network-based electronic nose for herbs

recognition. International Journal on Smart Sensing and intelligent Systems, 7(2),

584–609.

12. Omatu, S. (2013, September). Odor classification by neural networks. Paper

presented at the Proceedings of the 2013 IEEE 7th International Conference on

Intelligent Data Acquisition and Advanced Computing Systems, Berlin, Germany.

Retrieved September 2, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6662695

13. Kurup, P. U. (2008, May). An electronic nose for detecting hazardous

chemicals and explosives. Paper presented at the Proceedings of the IEEE Conference

on Technologies for Homeland Security, Waltham, Massachusetts, USA. Retrieved

September 1, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4534439

14. Feldhoff, R., Bernadet, P., & Saby, C. A. (1999). Discrimination of diesel

fuels with chemical sensors and mass spectrometry based electronic noses. Analyst,

124(8), 1167-1173.

15. Sobański, T., Szczurek, A., Nitsch, K., Licznerski, B. W., & Radwan, W.

(2006). Electronic nose applied to automotive fuel qualification. Sensors and

Actuators B: Chemical, 116(1), 207-212.

16. Berna, A. (2010). Metal oxide sensors for electronic noses and their

application to food analysis. Sensors, 10(4), 3882-3910. Retrieved December 29,

2014, from http://www.mdpi.com/1424-8220/10/4/3882/pdf

17. Mamat, M., & Samad, S. A. (2010, December). The design and testing of an

Electronic Nose prototype for classification problem. Paper presented at the

Proceedings of the 2010 International Conference on Computer Applications and


Industrial Electronics (ICCAIE), Kuala Lumpur, Malaysia. Retrieved November 12,

2014, from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5735108

18. Zhe Zhang, Jin Tong, Dong-hui Chen & Yu-bin Lan. (2008). Electronic nose

with an air sensor matrix for detecting beef freshness [Electronic version]. Journal of

Bionic Engineering, 5(1), 67-73.

19. Tang, K. T., Chiu, S. W., Pan, C. H., Hsieh, H. Y., Liang, Y. S., & Liu, S. C.

(2010). Development of a portable electronic nose system for the detection and

classification of fruity odors. Sensors, 10(10), 9179-9193. Retrieved January 11,

2015, from http://www.mdpi.com/1424-8220/10/10/9179/pdf

20. Dutta, R., Hines, E. L., Gardner, J. W., Kashwan, K. R., & Bhuyan, M. (2003,

July). Determination of tea quality by using a neural network based electronic nose.

Paper presented at the Proceedings of the International Joint Conference on Neural

Networks, Portland, Oregon, USA. Retrieved October 5, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1223380

21. Concina, I., Falasconi, M., & Sberveglieri, V. (2012). Electronic noses as

flexible tools to assess food quality and safety: Should we trust them?. IEEE sensors

journal, 12(11), 3232-3237.

22. Brezmes, J., Fructuoso, M. L., Llobet, E., Vilanova, X., Recasens, I., Orts, J.

et al. (2005). Evaluation of an electronic nose to assess fruit ripeness. IEEE Sensors

Journal, 5(1), 97-108.

23. Hines, E. L., Llobet, E., & Gardner, J. W. (1999). Neural network based

electronic nose for apple ripeness determination. Electronics Letters, 35(10), 821-823.

24. Xiaobo, Z., & Jiewen, Z. (2005, October). Apple quality assessment by fusion

three sensors. Paper presented at the Proceedings of the IEEE Sensors, Irvine,

California, USA. Retrieved November 16, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1597717

25. Zhang, L., & Tian, F. (2012, November). A novel chaotic sequence

optimization neural network for concentration estimation of formaldehyde by an

electronic nose. Paper presented at 2012 Fourth International Conference on

Computational Intelligence and Communication Networks (CICN), Uttar Pradesh,

India. Retrieved September 16, 2015, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6375235


26. Abdullah, A. H., Shakaff, A. M., Zakaria, A., Saad, F. S. A., Shukor, S. A., &

Mat, A. (2014, August). Application Specific Electronic Nose (ASEN) for Ganoderma

boninense detection using artificial neural network. Paper presented at the

Proceedings of the 2nd International Conference on Electronic Design (ICED),

Penang, Malaysia. Retrieved August 19, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7015788

27. Pardo, M., Faglia, G., Sberveglieri, G., & Quercia, L. (2001, May). Electronic

nose for coffee quality control. Paper presented at the Proceedings of the IEEE

Instrumentation and Measurement Technology Conference, Budapest, Hungary.

Retrieved November 29, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=928799

28. Singh, R. (2002, January). An intelligent system for odour discrimination.

Paper presented at the Proceedings of the First IEEE International workshop on

Electronic Design, Test and Applications (DELTA'02), Christchurch, New Zealand.

Retrieved December 18, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=994681

29. Bhattacharyya, N., Tudu, B., Bandyopadhyay, R., Bhuyan, M., & Mudi, R.

(2004, November). Aroma characterization of orthodox black tea with electronic

nose. Paper presented at the Proceedings of the IEEE Region 10 Conference

TENCON 2004, Chiang Mai, Thailand. Retrieved March 2, 2015, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1414623

30. García-Cortés, A., Martí, J., Sayago, I., Santos, J. P., Gutiérrez, J., & Horrillo,

M. C. (2009, February). Detection of stress through sweat analysis with an electronic

nose. Paper presented at the Proceedings of the Spanish conf. on electronic devices,

Santiago de Compostela, Spain. Retrieved December 4, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4800501

31. Sanaeifar, A., Mohtasebi, S. S., Ghasemi-Varnamkhasti, M., & Siadat, M.

(2014, November). Application of an electronic nose system coupled with artificial

neural network for classification of banana samples during shelf-life process. Paper

presented at the Proceedings of the 2014 International Conference on Control,

Decision and Information Technologies (CoDIT), Metz, France. Retrieved November

16, 2015, from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6996991


32. Wang, X. D., Zhang, H. R., & Zhang, C. J. (2005, August). Signals

recognition of electronic nose based on support vector machines. Paper presented at

the Proceedings of the 2005 International Conference on Machine Learning and

Cybernetics, Guangzhou, China. Retrieved January 12, 2016, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1527528

33. ul Hasan, N., Ejaz, N., Ejaz, W., & Kim, H. S. (2012, October). Malicious

odor item identification using an electronic nose based on support vector machine

classification. Paper presented at the Proceedings of the 2012 IEEE 1st Global

Conference on Consumer Electronics (GCCE), Tokyo, Japan. Retrieved November 18

2015, from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6379638

34. Perera, A., Gomez-Baena, A., Sundic, T., Pardo, T., & Marco, S. (2002).

Machine olfaction: pattern recognition for the identification of aromas. 16th

International Conference on Pattern Recognition, 2, 410-413.

35. Hassan, M., & Bermak, A. (2014, November). Threshold detection of

carcinogenic odor of formaldehyde with wireless electronic nose. Paper presented at

the Proceedings of the 2014 IEEE Sensors, Valencia, Spain. Retrieved June 17, 2015,

from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6985266

36. Romani, S., Cevoli, C., Fabbri, A., Alessandrini, L., & Dalla Rosa, M. (2012).

Evaluation of Coffee Roasting Degree by Using Electronic Nose and Artificial Neural

Network for Off‐line Quality Control. Journal of Food Science, 77(9), 960–965.

37. Hai, Z., & Wang, J. (2006). Electronic nose and data analysis for detection of

maize oil adulteration in sesame oil. Sensors and Actuators B: Chemical, 119(2), 449-

455.

38. Brezmes, J., Llobet, E., Vilanova, X., Saiz, G., & Correig, X. (2000). Fruit

ripeness monitoring using an electronic nose. Sensors and Actuators B: Chemical,

69(3), 223-229.

39. Berna, A. Z., Lammertyn, J., Saevels, S., Di Natale, C., & Nicolaï, B. M.

(2004). Electronic nose systems to study shelf life and cultivar effect on tomato aroma

profile. Sensors and Actuators B: Chemical, 97(2), 324-333.

40. Gómez, A. H., Hu, G., Wang, J., & Pereira, A. G. (2006). Evaluation of

tomato maturity by electronic nose. Computers and Electronics in Agriculture, 54(1),

44-52.


41. Gómez, A. H., Wang, J., Hu, G., & Pereira, A. G. (2008). Monitoring storage

shelf life of tomato using electronic nose technique. Journal of Food Engineering,

85(4), 625-631.

42. Brezmes, J., Llobet, E., Vilanova, X., Orts, J., Saiz, G., & Correig, X. (2001).

Correlation between electronic nose signals and fruit quality indicators on shelf-life

measurements with pinklady apples. Sensors and Actuators B: Chemical, 80(1), 41-

50.

43. Saevels, S., Lammertyn, J., Berna, A. Z., Veraverbeke, E. A., Di Natale, C., &

Nicolaï, B. M. (2003). Electronic nose as a non-destructive tool to evaluate the

optimal harvest date of apples. Postharvest Biology and Technology, 30(1), 3-14.

44. Saevels, S., Lammertyn, J., Berna, A. Z., Veraverbeke, E. A., Di Natale, C., &

Nicolaï, B. M. (2004). An electronic nose and a mass spectrometry-based electronic

nose for assessing apple quality during shelf life. Postharvest Biology and

Technology, 31(1), 9-19.

45. Xiaobo, Z., & Jiewen, Z. (2005, October). Apple quality assessment by fusion

three sensors. Paper presented at the Proceedings of the IEEE Sensors, Irvine, USA.

Retrieved October 9, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1597717

46. Gómez, A. H., Wang, J., Hu, G., & Pereira, A. G. (2006). Electronic nose

technique potential monitoring mandarin maturity. Sensors and Actuators B:

Chemical, 113(1), 347-353.

47. Benedetti, S., Buratti, S., Spinardi, A., Mannino, S., & Mignani, I. (2008).

Electronic nose as a non-destructive tool to characterise peach cultivars and to

monitor their ripening stage during shelf-life. Postharvest Biology and Technology,

47(2), 181-188.

48. Di Natale, C., Macagnano, A., Martinelli, E., Paolesse, R., Proietti, E., &

D‟Amico, A. (2001). The evaluation of quality of post-harvest oranges and apples by

means of an electronic nose. Sensors and Actuators B: Chemical, 78(1), 26-31.

49. Kumbhar, A., Gharpure, D. C., Botre, B. A., & Sadistap, S. S. (2012, March).

Embedded e-nose for food inspection. Paper presented at the Proceedings of the 2012

1st International Symposium on Physics and Technology of Sensors (ISPTS-1), Pune,

Page 83: Signal processing for electronic nose, Signal processing

Ref. code: 25605722300067ONR

71

India. Retrieved November 19, 2014, from

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6260955

50. Adak, M. F., & Yumusak, N. (2016). Classification of E-nose aroma data of four fruit types by ABC-based neural network. Sensors, 16(3), 304. Retrieved July 24, 2016, from http://www.mdpi.com/1424-8220/16/3/304/pdf

51. Zhang, S., Xie, C., Zeng, D., Li, H., Liu, Y., & Cai, S. (2009). A sensor array optimization method for electronic noses with sub-arrays. Sensors and Actuators B: Chemical, 142(1), 243-252.

52. Zhao, L., Shi, B. L., Wang, H. Y., & Li, Z. (2009). Combination optimization method for screening sensor array of electronic nose. Journal of Food Science, 20, 087.

53. Hongmei, Z., & Jun, W. (2006). Optimization of sensor array of electronic nose and its application to detection of storage age of wheat grain. Transactions of the Chinese Society of Agricultural Engineering, 22(12), 164-167.

54. Shi, B., Zhao, L., Zhi, R., & Xi, X. (2013). Optimization of electronic nose sensor array by genetic algorithms in Xihu-Longjing tea quality analysis. Mathematical and Computer Modelling, 58(3), 752-758.

55. Tabassum, M., & Mathew, K. (2014). A genetic algorithm analysis towards optimization solutions. International Journal of Digital Information and Wireless Communications (IJDIWC), 4(1), 124-142. Retrieved July 21, 2015, from http://sdiwc.net/digital-library/request.php?article=860cd5681b5f2bd16486ee6f367b2437

56. Duda, R. O., Stork, D. G., & Hart, P. E. (2001). Pattern classification. New York: Wiley.

57. Sysoev, V. V., Musatov, V. Y., Silaev, A. V., & Zalyalov, T. R. (2007, April). The optimization of number of sensors in one-chip electronic nose microarrays with the help of 3-layered neural network. Paper presented at the Proceedings of the Siberian Conference on Control and Communications (SIBCON-2007), Tomsk, Russia. Retrieved June 18, 2015, from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4233301

58. Omatu, S., & Yoshioka, M. (2009). Electronic nose for a fire detection system by neural networks. 2nd IFAC Conference on Intelligent Control Systems and Signal Processing, 42(19), 209-214.

59. Scorsone, E., Pisanelli, A. M., & Persaud, K. C. (2006). Development of an electronic nose for fire detection. Sensors and Actuators B: Chemical, 116(1), 55-61. Retrieved January 14, 2016, from http://ac.els-cdn.com/S0925400506001626/1-s2.0-S0925400506001626-main.pdf?_tid=40c47c82-01c6-11e7-946e-00000aacb362&acdnat=1488733846_f1895146c7332e9219f4aacb0b755afa

60. Reimann, P., & Schütze, A. (2012). Fire detection in coal mines based on semiconductor gas sensors. Sensor Review, 32(1), 47-58. Retrieved August 18, 2015, from http://www.emeraldinsight.com/doi/pdfplus/10.1108/02602281211197143

61. Kwan, C., Schmera, G., Smulko, J. M., Kish, L. B., Heszler, P., & Granqvist, C. G. (2008). Advanced agent identification with fluctuation-enhanced sensing. IEEE Sensors Journal, 8(6), 706-713. Retrieved July 11, 2015, from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4529196

62. Gómez, A. H., Wang, J., Hu, G., & Pereira, A. G. (2007). Discrimination of storage shelf-life for mandarin by electronic nose technique. LWT-Food Science and Technology, 40(4), 681-689.

63. Yu, H., & Wang, J. (2007). Discrimination of LongJing green-tea grade by electronic nose. Sensors and Actuators B: Chemical, 122(1), 134-140.

64. Kiani, S., Minaei, S., & Ghasemi-Varnamkhasti, M. (2016). A portable electronic nose as an expert system for aroma-based classification of saffron. Chemometrics and Intelligent Laboratory Systems, 156, 148-156.

65. Rahman, M. M., Charoenlarpnopparut, C., & Suksompong, P. (2015). Classification and pattern recognition algorithms applied to E-Nose. 2015 2nd International Conference on Electrical Information and Communication Technologies (EICT), pp. 44-48.

66. Rahman, M. M., Charoenlarpnopparut, C., & Suksompong, P. (2015). Signal processing for multi-sensor E-nose system: Acquisition and classification. 2015 10th International Conference on Information, Communications and Signal Processing (ICICS), pp. 1-5.

67. Güney, S., & Atasoy, A. (2012). Multiclass classification of n-butanol concentrations with k-nearest neighbor algorithm and support vector machine in an electronic nose. Sensors and Actuators B: Chemical, 166, 721-725.

68. Shao, X., Li, H., Wang, N., & Zhang, Q. (2015). Comparison of different classification methods for analyzing electronic nose data to characterize sesame oils and blends. Sensors, 15(10), 26726-26742. Retrieved April 21, 2016, from http://www.mdpi.com/1424-8220/15/10/26726/pdf

69. Güney, S., & Atasoy, A. (2011, June). Classification of n-butanol concentrations with k-NN algorithm and ANN in electronic nose. Paper presented at the Proceedings of the International Symposium on Innovations in Intelligent Systems and Applications (INISTA), Istanbul, Turkey. Retrieved November 4, 2014, from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5946057

70. Khalaf, W., Pace, C., & Gaudioso, M. (2008). Gas detection via machine learning. World Academy of Science, Engineering and Technology, 27, 139-143.

71. Amari, A., El Barbri, N., Llobet, E., El Bari, N., Correig, X., & Bouchikhi, B. (2006). Monitoring the freshness of Moroccan sardines with a neural-network based electronic nose. Sensors, 6(10), 1209-1223. Retrieved March 28, 2015, from http://www.mdpi.com/1424-8220/6/10/1209/pdf

72. Sanaeifar, A., Mohtasebi, S., Ghasemi-Varnamkhasti, M., Ahmadi, H., & Lozano Rogado, J. S. (2014). Development and application of a new low cost electronic nose for the ripeness monitoring of banana using computational techniques (PCA, LDA, SIMCA, and SVM). Czech Journal of Food Sciences, 32(6), 538-548. Retrieved January 7, 2016, from http://dehesa.unex.es/bitstream/handle/10662/4367/1805-9317_32_6_538.pdf?sequence=1

73. Xiong, Y., Xiao, X., Yang, X., Yan, D., Zhang, C., Zou, H., et al. (2014). Quality control of Lonicera japonica stored for different months by electronic nose. Journal of Pharmaceutical and Biomedical Analysis, 91, 68-72. Retrieved February 11, 2016, from http://ac.els-cdn.com/S0731708513006018/1-s2.0-S0731708513006018-main.pdf?_tid=7b11690c-0236-11e7-9e83-00000aacb361&acdnat=1488782047_6de8a06b91ea78c0f85e84027c3f0936

74. Evans, P., Persaud, K. C., McNeish, A. S., Sneath, R. W., Hobson, N., & Magan, N. (2000). Evaluation of a radial basis function neural network for the determination of wheat quality from electronic nose data. Sensors and Actuators B: Chemical, 69(3), 348-358.

75. Dutta, R., Hines, E. L., Gardner, J. W., Udrea, D. D., & Boilot, P. (2003). Non-destructive egg freshness determination: an electronic nose based approach. Measurement Science and Technology, 14(2), 190.

76. Dutta, R., Kashwan, K. R., Bhuyan, M., Hines, E. L., & Gardner, J. W. (2003). Electronic nose based tea quality standardization. Neural Networks, 16(5), 847-853.

77. Borah, S., Hines, E. L., Leeson, M. S., Iliescu, D. D., Bhuyan, M., & Gardner, J. W. (2008). Neural network based electronic nose for classification of tea aroma. Sensing and Instrumentation for Food Quality and Safety, 2(1), 7-14.

78. Anjos, O., Iglesias, C., Peres, F., Martínez, J., García, Á., & Taboada, J. (2015). Neural networks applied to discriminate botanical origin of honeys. Food Chemistry, 175, 128-136.

79. Llobet, E., Hines, E. L., Gardner, J. W., Bartlett, P. N., & Mottram, T. T. (1999). Fuzzy ARTMAP based electronic nose data analysis. Sensors and Actuators B: Chemical, 61(1), 183-190.

80. Cooper, P. W. (1962). The hypersphere in pattern recognition. Information and Control, 5(4), 324-346.

81. Hwang, J. N., Choi, J. J., Oh, S., & Marks, R. J. (1990, May). Classification boundaries and gradients of trained multilayer perceptrons. Paper presented at the Proceedings of the IEEE International Symposium on Circuits and Systems, New Orleans, USA. Retrieved January 9, 2015, from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=112706

82. Tax, D. M., & Duin, R. P. (1999). Support vector domain description. Pattern Recognition Letters, 20(11), 1191-1199.

83. Rahman, M. M., Charoenlarpnopparut, C., Suksompong, P., & Toochinda, P. (2017). Sensor array optimization for complexity reduction in electronic nose system. ECTI Transactions on Electrical Engineering, Electronics, and Communications, 1, 49-59.

84. Kuhn, H. W., & Tucker, A. W. (1951). Nonlinear programming. In J. Neyman (Ed.), Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, USA: University of California Press.

85. Krafft, P. (2013). Building intelligent probabilistic systems. Retrieved January 15, 2016, from https://hips.seas.harvard.edu/blog/2013/02/13/correlation-and-mutual-information/

86. Rahman, M. M., Charoenlarpnopparut, C., Suksompong, P., Toochinda, P., & Taparugssanagorn, A. (2017). A false alarm reduction method for a gas sensor based electronic nose. Sensors, 17(9), 2089.

87. Rahman, M. M., Charoenlarpnopparut, C., Suksompong, P., & Taparugssanagorn, A. E-nose false alarm reduction: Hyperplane versus hyperspheric classification boundary. Walailak Journal of Science and Technology (WJST). [In review]

Appendices

Appendix A

MATLAB Codes for Classification Methods

% PCA classification
clear;
tic
% data preparation
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx'); % dataset import
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
% PCA
[coeffS,scoresS,latentS,tsquaredS,explainedS] = pca(S(:,:));
figure(1); FS = 20;
biplotLabels = {'\fontname{Times New Roman} \fontsize{20} TGS 2612',...
    '\fontname{Times New Roman} \fontsize{20} TGS 821',...
    '\fontname{Times New Roman} \fontsize{20} TGS 822',...
    '\fontname{Times New Roman} \fontsize{20} TGS 813',...
    '\fontname{Times New Roman} \fontsize{20} TGS 2602',...
    '\fontname{Times New Roman} \fontsize{20} TGS 2603',...
    '\fontname{Times New Roman} \fontsize{20} TGS 2620',...
    '\fontname{Times New Roman} \fontsize{20} TGS 2610'};
biplot(coeffS(:,1:2),'Linewidth',2,'VarLabels',biplotLabels)
xlabel('\fontname{Times New Roman} \fontsize{25} PC 1');
ylabel('\fontname{Times New Roman} \fontsize{25} PC 2');
set(gca,'FontSize',20)
figure(2);
pareto(explainedS(1:2,:));
xlabel('\fontname{Times New Roman} \fontsize{25} Principal Component');
ylabel('\fontname{Times New Roman} \fontsize{25} Variance Explained (%)')
title('\fontname{Times New Roman} \fontsize{25} Variance explained of PCA');
set(gca,'FontSize',25)
grid on
% 3-D score plot of the twelve odor classes (20 samples per class)
figure(3);
Nd = 20;
plot3(scoresS(1:Nd,1),scoresS(1:Nd,2),scoresS(1:Nd,3),'ko',...
    scoresS((20*1+1):20*1+Nd,1),scoresS((20*1+1):20*1+Nd,2),scoresS((20*1+1):20*1+Nd,3),'kx',...
    scoresS((20*2+1):20*2+Nd,1),scoresS((20*2+1):20*2+Nd,2),scoresS((20*2+1):20*2+Nd,3),'k+',...
    scoresS((20*3+1):20*3+Nd,1),scoresS((20*3+1):20*3+Nd,2),scoresS((20*3+1):20*3+Nd,3),'k*',...
    scoresS((20*4+1):20*4+Nd,1),scoresS((20*4+1):20*4+Nd,2),scoresS((20*4+1):20*4+Nd,3),'ks',...
    scoresS((20*5+1):20*5+Nd,1),scoresS((20*5+1):20*5+Nd,2),scoresS((20*5+1):20*5+Nd,3),'kd',...
    scoresS((20*6+1):20*6+Nd,1),scoresS((20*6+1):20*6+Nd,2),scoresS((20*6+1):20*6+Nd,3),'kv',...
    scoresS((20*7+1):20*7+Nd,1),scoresS((20*7+1):20*7+Nd,2),scoresS((20*7+1):20*7+Nd,3),'k^',...
    scoresS((20*8+1):20*8+Nd,1),scoresS((20*8+1):20*8+Nd,2),scoresS((20*8+1):20*8+Nd,3),'kh',...
    scoresS((20*9+1):20*9+Nd,1),scoresS((20*9+1):20*9+Nd,2),scoresS((20*9+1):20*9+Nd,3),'k>',...
    scoresS((20*10+1):20*10+Nd,1),scoresS((20*10+1):20*10+Nd,2),scoresS((20*10+1):20*10+Nd,3),'kp',...
    scoresS((20*11+1):20*11+Nd,1),scoresS((20*11+1):20*11+Nd,2),scoresS((20*11+1):20*11+Nd,3),'k<',...
    'LineWidth',2,'MarkerSize',15); grid on
xlabel('\fontname{Times New Roman} \fontsize{30} PC 1');
ylabel('\fontname{Times New Roman} \fontsize{30} PC 2');
zlabel('\fontname{Times New Roman} \fontsize{30} PC 3');
title('\fontname{Times New Roman} \fontsize{30} Principal Component Analysis');
legend('\fontname{Times New Roman} \fontsize{25} GB','\fontname{Times New Roman} \fontsize{25} RB',...
    '\fontname{Times New Roman} \fontsize{25} RtB','\fontname{Times New Roman} \fontsize{25} GM',...
    '\fontname{Times New Roman} \fontsize{25} RM','\fontname{Times New Roman} \fontsize{25} RtM',...
    '\fontname{Times New Roman} \fontsize{25} GS','\fontname{Times New Roman} \fontsize{25} RS',...
    '\fontname{Times New Roman} \fontsize{25} RtS','\fontname{Times New Roman} \fontsize{25} GP',...
    '\fontname{Times New Roman} \fontsize{25} RP','\fontname{Times New Roman} \fontsize{25} RtP');
set(gca,'FontSize',30)
% Between- and within-sensor covariance of the dataset
figure(4);
fcov = cov(S);
bar3(fcov);
legend('TGS2612','TGS821','TGS822','TGS813','TGS2602','TGS2603','TGS2620','TGS2610');
xlabel('\fontname{Times New Roman} \fontsize{16} Sensor');
ylabel('\fontname{Times New Roman} \fontsize{16} Sensor');
title('\fontname{Times New Roman} \fontsize{15} Between and within covariance of the sensors')
set(gca,'FontSize',15)
eTimePCA = toc
% PCA plot of three classes only: ripe mango, sapodilla, and pineapple
figure;
Nd = 20;
plot(scoresS((20*4+1):20*4+Nd,1),scoresS((20*4+1):20*4+Nd,2),'k*',...
    scoresS((20*7+1):20*7+Nd,1),scoresS((20*7+1):20*7+Nd,2),'ks',...
    scoresS((20*10+1):20*10+Nd,1),scoresS((20*10+1):20*10+Nd,2),'k^','LineWidth',2,'MarkerSize',15); grid on
xlabel('\fontname{Times New Roman} \fontsize{30} PC 1');
ylabel('\fontname{Times New Roman} \fontsize{30} PC 2');
title('\fontname{Times New Roman} \fontsize{30} Principal Component Analysis');
legend('\fontname{Times New Roman} \fontsize{30} Ripe mango','\fontname{Times New Roman} \fontsize{30} Ripe sapodilla',...
    '\fontname{Times New Roman} \fontsize{30} Ripe pineapple');
set(gca,'FontSize',30)

% k-NN classification
clc
tic % time count begin
format compact; % formatting
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx'); % import dataset
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
% Training and test data preparation
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
rIndex = [randsample(1:20,20)';randsample(21:40,20)';randsample(41:60,20)'; ...
    randsample(61:80,20)';randsample(81:100,20)';randsample(101:120,20)'; ...
    randsample(121:140,20)';randsample(141:160,20)';randsample(161:180,20)'; ...
    randsample(181:200,20)';randsample(201:220,20)';randsample(221:240,20)'];
trainIndex = [rIndex(20*0+1:16+20*0);rIndex(20*1+1:16+20*1,:);rIndex(20*2+1:16+20*2,:);...
    rIndex(20*3+1:16+20*3,:);rIndex(20*4+1:16+20*4,:);rIndex(20*5+1:16+20*5,:);...
    rIndex(20*6+1:16+20*6,:);rIndex(20*7+1:16+20*7,:);rIndex(20*8+1:16+20*8,:);...
    rIndex(20*9+1:16+20*9,:);rIndex(20*10+1:16+20*10,:);rIndex(20*11+1:16+20*11,:)];
testIndex = [rIndex(20*0+17:20*1);rIndex(20*1+17:20*2,:);rIndex(20*2+17:20*3,:);...
    rIndex(20*3+17:20*4,:);rIndex(20*4+17:20*5,:);rIndex(20*5+17:20*6,:);...
    rIndex(20*6+17:20*7,:);rIndex(20*7+17:20*8,:);rIndex(20*8+17:20*9,:);...
    rIndex(20*9+17:20*10,:);rIndex(20*10+17:20*11,:);rIndex(20*11+17:20*12,:)];
trainData = S(trainIndex,:);
target = cfruits(trainIndex,10);
testData = S(testIndex,:);
testTarget = cfruits(testIndex,10);
% kNN model
knnmodel = fitcknn(trainData,target,'NumNeighbors',13,'Standardize',1);
Odors = predict(knnmodel,testData)
% Counting number of errors
error = 0;
for i = 1:length(testData)
    if testTarget(i)~=Odors(i)
        error = error+1;
    end
end
error
eTimeKNN = toc % Time duration

% SVM classification
% multiclass SVM by fitcecoc (ecoc = error-correcting output codes)
clear
tic
format compact;
% Import dataset
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');
% Prepare training and test data
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
rIndex = [randsample(1:20,20)';randsample(21:40,20)';randsample(41:60,20)'; ...
    randsample(61:80,20)';randsample(81:100,20)';randsample(101:120,20)'; ...
    randsample(121:140,20)';randsample(141:160,20)';randsample(161:180,20)'; ...
    randsample(181:200,20)';randsample(201:220,20)';randsample(221:240,20)'];
trainIndex = [rIndex(20*0+1:14+20*0);rIndex(20*1+1:14+20*1,:);rIndex(20*2+1:14+20*2,:);...
    rIndex(20*3+1:14+20*3,:);rIndex(20*4+1:14+20*4,:);rIndex(20*5+1:14+20*5,:);...
    rIndex(20*6+1:14+20*6,:);rIndex(20*7+1:14+20*7,:);rIndex(20*8+1:14+20*8,:);...
    rIndex(20*9+1:14+20*9,:);rIndex(20*10+1:14+20*10,:);rIndex(20*11+1:14+20*11,:)];
testIndex = [rIndex(20*0+15:20*1);rIndex(20*1+15:20*2,:);rIndex(20*2+15:20*3,:);...
    rIndex(20*3+15:20*4,:);rIndex(20*4+15:20*5,:);rIndex(20*5+15:20*6,:);...
    rIndex(20*6+15:20*7,:);rIndex(20*7+15:20*8,:);rIndex(20*8+15:20*9,:);...
    rIndex(20*9+15:20*10,:);rIndex(20*10+15:20*11,:);rIndex(20*11+15:20*12,:)];
trainData = S(trainIndex,:);
trainTarget = cfruits(trainIndex,10);
testData = S(testIndex,:);
testTarget = cfruits(testIndex,10);
% Prepare and train an SVM model
t = templateSVM('Standardize',1,'KernelFunction','Gaussian');
SVMMdl = fitcecoc(trainData,trainTarget,'Learners',t); % ,'ResponseName',responseName,...
MultiSVMtrainTime = toc;
% test classification errors of the SVM model
tic
classTestData = predict(SVMMdl,testData)
SVMerror = 0;
for i = 1:length(testData)
    if testTarget(i)~=classTestData(i)
        SVMerror = SVMerror+1;
    end
end
disp(['SVM classification errors:', num2str(SVMerror)]);
disp(['SVM training time :', num2str(MultiSVMtrainTime)]);
MultiSVMtestTime = toc;
disp(['SVM testing time :', num2str(MultiSVMtestTime)]);

% MLPNN classification
tic
format compact;
% import dataset
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');
% extract sensor variables
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
% prepare training, test, and validation data
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
rIndex = [randsample(1:20,20)';randsample(21:40,20)';randsample(41:60,20)'; ...
    randsample(61:80,20)';randsample(81:100,20)';randsample(101:120,20)'; ...
    randsample(121:140,20)';randsample(141:160,20)';randsample(161:180,20)'; ...
    randsample(181:200,20)';randsample(201:220,20)';randsample(221:240,20)'];
trainIndex = [rIndex(20*0+1:14+20*0);rIndex(20*1+1:14+20*1,:);rIndex(20*2+1:14+20*2,:);...
    rIndex(20*3+1:14+20*3,:);rIndex(20*4+1:14+20*4,:);rIndex(20*5+1:14+20*5,:);...
    rIndex(20*6+1:14+20*6,:);rIndex(20*7+1:14+20*7,:);rIndex(20*8+1:14+20*8,:);...
    rIndex(20*9+1:14+20*9,:);rIndex(20*10+1:14+20*10,:);rIndex(20*11+1:14+20*11,:)];
valIndex = [rIndex(20*0+15:17+20*0);rIndex(20*1+15:17+20*1,:);rIndex(20*2+15:17+20*2,:);...
    rIndex(20*3+15:17+20*3,:);rIndex(20*4+15:17+20*4,:);rIndex(20*5+15:17+20*5,:);...
    rIndex(20*6+15:17+20*6,:);rIndex(20*7+15:17+20*7,:);rIndex(20*8+15:17+20*8,:);...
    rIndex(20*9+15:17+20*9,:);rIndex(20*10+15:17+20*10,:);rIndex(20*11+15:17+20*11,:)];
testIndex = [rIndex(20*0+18:20*1);rIndex(20*1+18:20*2,:);rIndex(20*2+18:20*3,:);...
    rIndex(20*3+18:20*4,:);rIndex(20*4+18:20*5,:);rIndex(20*5+18:20*6,:);...
    rIndex(20*6+18:20*7,:);rIndex(20*7+18:20*8,:);rIndex(20*8+18:20*9,:);...
    rIndex(20*9+18:20*10,:);rIndex(20*10+18:20*11,:);rIndex(20*11+18:20*12,:)];
trainData = S(trainIndex,:);
trainTarget = cfruits(trainIndex,10);
testData = S(testIndex,:);
testTarget = cfruits(testIndex,10);
% designing the MLPNN
net = feedforwardnet(10); % feedforwardnet(hiddenSizes,trainFcn)
net.trainParam.min_grad = 1e-5;
net.trainParam.max_fail = 6; % for validation
% to have the training progress displayed in the command line, set the parameter
net.trainParam.showCommandLine = true;
net.divideParam.trainInd = trainIndex;
net.divideParam.valInd = valIndex;
net.divideParam.testInd = testIndex;
[net,tr] = train(net,S',cfruits(:,10)'); % weight initialization and training are done by the 'train' command
FFBPNNtrainTime = toc;
disp(['MLPNN training time:', num2str(FFBPNNtrainTime)]);
tic
ar = round(net(testData')) % classify test data
% count errors
MLPNNerror = 0;
for i = 1:length(testData)
    if testTarget(i)~=ar(i)
        MLPNNerror = MLPNNerror+1;
    end
end
FFBPNNtestTime = toc;
disp(['MLPNN testing time :', num2str(FFBPNNtestTime)]);
disp(['MLPNN errors :', num2str(MLPNNerror)]);
figure;
plotperform(tr); % plots number of epochs vs. mean squared error
set(gca,'FontSize',14,'FontWeight','bold');
figure;
plottrainstate(tr); % plot training state: validation fail, mu, gradient
figure
e = testTarget' - ar;
ploterrhist(e,'bins',20);
figure
plotregression(testTarget',ar,'regression');

% GRNN classification
clc
tic
format compact;
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');
% extract sensor responses from the complete dataset 'cfruits'
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
% random index matrix to select train and test data randomly
rIndex = [randsample(1:20,20)';randsample(21:40,20)';randsample(41:60,20)'; ...
    randsample(61:80,20)';randsample(81:100,20)';randsample(101:120,20)'; ...
    randsample(121:140,20)';randsample(141:160,20)';randsample(161:180,20)'; ...
    randsample(181:200,20)';randsample(201:220,20)';randsample(221:240,20)'];
% indices of training data
trainIndex = [rIndex(20*0+1:14+20*0);rIndex(20*1+1:14+20*1,:);rIndex(20*2+1:14+20*2,:);...
    rIndex(20*3+1:14+20*3,:);rIndex(20*4+1:14+20*4,:);rIndex(20*5+1:14+20*5,:);...
    rIndex(20*6+1:14+20*6,:);rIndex(20*7+1:14+20*7,:);rIndex(20*8+1:14+20*8,:);...
    rIndex(20*9+1:14+20*9,:);rIndex(20*10+1:14+20*10,:);rIndex(20*11+1:14+20*11,:)];
% indices of validation data
valIndex = [rIndex(20*0+15:(20*1-3));rIndex(20*1+15:(20*2-3),:);rIndex(20*2+15:(20*3-3),:);...
    rIndex(20*3+15:(20*4-3),:);rIndex(20*4+15:(20*5-3),:);rIndex(20*5+15:(20*6-3),:);...
    rIndex(20*6+15:(20*7-3),:);rIndex(20*7+15:(20*8-3),:);rIndex(20*8+15:(20*9-3),:);...
    rIndex(20*9+15:(20*10-3),:);rIndex(20*10+15:(20*11-3),:);rIndex(20*11+15:(20*12-3),:)];
% indices of test data
testIndex = [rIndex(20*0+18:20*1);rIndex(20*1+18:20*2,:);rIndex(20*2+18:20*3,:);...
    rIndex(20*3+18:20*4,:);rIndex(20*4+18:20*5,:);rIndex(20*5+18:20*6,:);...
    rIndex(20*6+18:20*7,:);rIndex(20*7+18:20*8,:);rIndex(20*8+18:20*9,:);...
    rIndex(20*9+18:20*10,:);rIndex(20*10+18:20*11,:);rIndex(20*11+18:20*12,:)];
trainData = S(trainIndex,:); % training dataset of the sensors
trainTarget = cfruits(trainIndex,10); % prepare target matrix
valData = S(valIndex,:); % select data from full sensor data matrix
valTarget = cfruits(valIndex,10);
testData = S(testIndex,:); % select data from full sensor data matrix
testTarget = cfruits(testIndex,10); % targets of test data
valError = Inf; % initialize error to force at least one pass
gSig = 0.01; % initialize sigma
while (valError > 1)
    valError = 0;
    grnnModel = newgrnn(trainData',trainTarget',gSig); % each experiment to a column, each target to a column
    valClass = round(grnnModel(valData')); % output class
    for i = 1:length(valData) % calculate errors
        if valTarget(i)~=valClass(i)
            valError = valError+1;
        end
    end
    gSig = gSig + 0.01;
end
optSig = gSig - 0.01; % sigma of the last model, which met the validation criterion
grnnTraintime = toc; % Training time count end
grnnModel = newgrnn(trainData',trainTarget',optSig); % retrain with the optimal sigma
tic % testing time count begin
oClass = grnnModel(testData'); % output class
detectClass = round(oClass)'; % class value rounded
error = 0;
for i = 1:length(testData) % calculate errors
    if testTarget(i)~=detectClass(i)
        error = error+1;
    end
end
grnnTesttime = toc;
% Display results
disp(['Number of errors = ', num2str(error)]);
disp(['GRNN train time = ', num2str(grnnTraintime)]);
disp(['GRNN test time = ', num2str(grnnTesttime)]);

% RBFNN classification
clc
tic
format compact;
% import dataset
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
% Prepare training, validation, and test datasets
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
rIndex = [randsample(1:20,20)';randsample(21:40,20)';randsample(41:60,20)'; ...
    randsample(61:80,20)';randsample(81:100,20)';randsample(101:120,20)'; ...
    randsample(121:140,20)';randsample(141:160,20)';randsample(161:180,20)'; ...
    randsample(181:200,20)';randsample(201:220,20)';randsample(221:240,20)'];
trainIndex = [rIndex(20*0+1:14+20*0);rIndex(20*1+1:14+20*1,:);rIndex(20*2+1:14+20*2,:);...
    rIndex(20*3+1:14+20*3,:);rIndex(20*4+1:14+20*4,:);rIndex(20*5+1:14+20*5,:);...
    rIndex(20*6+1:14+20*6,:);rIndex(20*7+1:14+20*7,:);rIndex(20*8+1:14+20*8,:);...
    rIndex(20*9+1:14+20*9,:);rIndex(20*10+1:14+20*10,:);rIndex(20*11+1:14+20*11,:)];
valIndex = [rIndex(20*0+15:(20*1-3));rIndex(20*1+15:(20*2-3),:);rIndex(20*2+15:(20*3-3),:);...
    rIndex(20*3+15:(20*4-3),:);rIndex(20*4+15:(20*5-3),:);rIndex(20*5+15:(20*6-3),:);...
    rIndex(20*6+15:(20*7-3),:);rIndex(20*7+15:(20*8-3),:);rIndex(20*8+15:(20*9-3),:);...
    rIndex(20*9+15:(20*10-3),:);rIndex(20*10+15:(20*11-3),:);rIndex(20*11+15:(20*12-3),:)];
testIndex = [rIndex(20*0+18:20*1);rIndex(20*1+18:20*2,:);rIndex(20*2+18:20*3,:);...
    rIndex(20*3+18:20*4,:);rIndex(20*4+18:20*5,:);rIndex(20*5+18:20*6,:);...
    rIndex(20*6+18:20*7,:);rIndex(20*7+18:20*8,:);rIndex(20*8+18:20*9,:);...
    rIndex(20*9+18:20*10,:);rIndex(20*10+18:20*11,:);rIndex(20*11+18:20*12,:)];
trainData = S(trainIndex,:);
trainTarget = cfruits(trainIndex,10);
valData = S(valIndex,:);
valTarget = cfruits(valIndex,10);
testData = S(testIndex,:);
testTarget = cfruits(testIndex,10);
% calculate phi matrix
sqrError = 20;
errorThr = 0;
k = 0;
gma = 0.5; % Spreading factor
phiMatrix = zeros(length(trainTarget),length(trainTarget));
while sqrError > errorThr
    for i = 1:size(trainData,1)
        for j = 1:size(trainData,1)
            phiMatrix(i,j) = exp(-gma*norm(trainData(i,:)-trainData(j,:)));
        end
    end
    W = (phiMatrix\trainTarget);
    % calculate training error from the network output
    hx = phiMatrix*W;
    sqrError = norm(hx-trainTarget);
    if sqrError > errorThr
        gma = gma+0.01;
    else
        break
    end
    k = k+1;
    if k == 90 % number of epochs
        break;
    end
end
RBFtrainTime = toc
% Classify test data
tic
phiMatrixd = zeros(size(testData,1),size(trainData,1)); % generate phi matrix
for i = 1:size(testData,1)
    for j = 1:size(trainData,1)
        phiMatrixd(i,j) = exp(-gma*norm(testData(i,:)-trainData(j,:)));
    end
end
hxd = phiMatrixd*W
hxdr = round(hxd)
RBFtestTime = toc;
disp(['RBF training time: ', num2str(RBFtrainTime)]);
disp(['RBF testing time: ', num2str(RBFtestTime)]);
% Count RBF errors
RBFerror = 0;
for i = 1:length(testData)
    if testTarget(i)~=hxdr(i)
        RBFerror = RBFerror+1;
    end
end
disp(['RBF errors : ', num2str(RBFerror)]);

% LDA classification

clc

tic

format compact;

% import dataset

[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');

% prepare traing and testing data

TGS2612 = cfruits(:,2);

TGS821 = cfruits(:,3);

TGS822 = cfruits(:,4);

TGS813 = cfruits(:,5);

TGS2602 = cfruits(:,6);

TGS2603 = cfruits(:,7);

TGS2620 = cfruits(:,8);

TGS2610 = cfruits(:,9);

S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];

Page 104: Signal processing for electronic nose, Signal processing

Ref. code: 25605722300067ONR

92

rIndex = [randsample(1:20,20)';randsample(21:40,20)';randsample(41:60,20)'; ...

randsample(61:80,20)';randsample(81:100,20)';randsample(101:120,20)'; ...

randsample(121:140,20)';randsample(141:160,20)';randsample(161:180,20)'; ...

randsample(181:200,20)';randsample(201:220,20)';randsample(221:240,20)'];

% training with two classes

trainIndex = [rIndex(20*7+1:20*7+14,:);...

% rIndex(20*4+1:14+20*4,:);...

rIndex(20*10+1:20*10+14,:)];

% testing with three classes to identify false classification performance

testIndex = [rIndex(20*4+1:20*5,:);...

rIndex(20*7+15:20*8,:);...

rIndex(20*10+15:20*11,:)];

trainData = S(trainIndex,:);

trainTarget = [ones(14,1);ones(14,1)*2];%cfruits(trainIndex,10);

testData = S(testIndex,:);

testTarget = [ones(20,1)*3;ones(6,1);ones(6,1)*2];

% training an LDA

LinClassifier = fitcdiscr(trainData,trainTarget);

% First retrieve the coefficients for the linear

% boundary between the second and third classes

wK12 = LinClassifier.Coeffs(1,2).Const;

wL12 = LinClassifier.Coeffs(1,2).Linear;

% Plot the curve K + [x1,x2]*L = 0:

f = @(x1,x2) wK12 + wL12(1)*x1 + wL12(2)*x2;

h3 = ezplot(f,[0 3.5 0.5 3.5]);

h3.Color = 'k';

h3.LineWidth = 2;

hold on

plot(trainData(1:14,3),trainData(1:14,4),'ko')

hold on

plot(trainData(15:28,3),trainData(15:28,4),'ks')

xlabel('TGS 2602')

ylabel('TGS 813')

title('{\bf Fisher Linear Classification}')

% Test data classification

LDAtrainTime = toc;

tic

g = zeros(size(testData,1),2);

falseDetection = 0;

LDAerrors = 0;

for i=1:size(testData,1)

g12(i) = wL12'*(testData(i,:) - (mean(trainData(1:14,:))+mean(trainData(15:28,:)))/2)';

if g12(i)>=0

g(i,1)=g(i,1)+1;

else

g(i,2)=g(i,2)+2;

end

% count false detections and misclassification errors

if (i <= 20) && ((g(i,1) == 1) || (g(i,2) == 2))

falseDetection = falseDetection+1;

elseif (i > 20) && (g(i,1) ~= 1) && (i<=26)

LDAerrors = LDAerrors + 1;

elseif (i > 26) && (g(i,2) ~= 2)

LDAerrors = LDAerrors + 1;

end

end

disp(['LDA training time:', num2str(LDAtrainTime)])

LDAclassificationTime = toc;

disp(['LDA testing time :', num2str(LDAclassificationTime)]);

disp(['LDA false classification errors:',num2str(falseDetection)]);

disp(['LDA misclassification errors :',num2str(LDAerrors)]);
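The test loop above evaluates the two-class discriminant g(x) = w'*(x - (m1+m2)/2) and assigns class 1 when g >= 0, class 2 otherwise. The same decision rule can be sketched in Python; `lda_decision` and the two toy clusters are illustrative, not the fruit dataset, and `w` is taken as the simple between-means direction rather than the pooled-covariance estimate that `fitcdiscr` computes.

```python
import numpy as np

def lda_decision(x, w, mean1, mean2):
    """g(x) = w.(x - (mean1+mean2)/2); g >= 0 -> class 1, else class 2."""
    midpoint = (mean1 + mean2) / 2.0
    return 1 if np.dot(w, x - midpoint) >= 0 else 2

# Two small illustrative clusters (not the thesis sensor data)
c1 = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]])
c2 = np.array([[2.0, 1.9], [1.8, 2.1], [2.2, 2.0], [1.9, 1.8]])
m1, m2 = c1.mean(axis=0), c2.mean(axis=0)

# Simplest separating direction: the line through the class means
# (fitcdiscr instead estimates w from the pooled within-class covariance)
w = m1 - m2

preds1 = [lda_decision(x, w, m1, m2) for x in c1]
preds2 = [lda_decision(x, w, m1, m2) for x in c2]
print(preds1, preds2)
```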

% MMM classification

% clear all

format compact;

tic

% import dataset

[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');

% prepare training and test datasets

TGS2612 = cfruits(:,2);

TGS821 = cfruits(:,3);

TGS822 = cfruits(:,4);

TGS813 = cfruits(:,5);

TGS2602 = cfruits(:,6);

TGS2603 = cfruits(:,7);

TGS2620 = cfruits(:,8);

TGS2610 = cfruits(:,9);

S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610,cfruits(:,10)];

%% Outlier exclusion (activate this section to exclude outliers)

ammend = 0;

nofClass = 12;

% indxAmmend = 0;

skp = [4 5 6]; % skip three classes for false classification testing

for i = 1:nofClass

if ismember(i,skp)

continue

else

X = S(20*(i-1)+1:20*i,1:9);

pDistM = squareform(pdist(X(:,1:8))); % pairwise distance matrix

distM = sum(pDistM,2)/19; % the diagonal elements are zeros

indices = distM<(mean(distM)+3*std(distM)); % keep points within 3 std of the mean distance

remainingPoints = X(indices,:); % exclude outliers if needed

classSizes(i) = size(remainingPoints,1);

newS(ammend+1:ammend+size(remainingPoints,1),:) = remainingPoints;

recordIndices(:,i) = indices;

% indxAmmend = indxAmmend + size(indices,1);

ammend = ammend + size(remainingPoints,1);

end

end
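The loop above keeps a point only when its mean distance to the other points of its class stays below mean + 3*std of those per-point mean distances. A minimal Python sketch of the same rule follows; `exclude_outliers` and the toy data are illustrative (note that NumPy's `std` defaults to the population estimate, while MATLAB's `std` divides by N-1).

```python
import numpy as np

def exclude_outliers(X, k=3.0):
    """Keep rows whose mean distance to the other rows is < mean + k*std."""
    diff = X[:, None, :] - X[None, :, :]
    pdist_m = np.sqrt((diff ** 2).sum(axis=2))   # like squareform(pdist(X))
    n = X.shape[0]
    mean_dist = pdist_m.sum(axis=1) / (n - 1)    # diagonal entries are zero
    keep = mean_dist < mean_dist.mean() + k * mean_dist.std()
    return X[keep], keep

# Toy class of 20 points: a tight cluster plus one far-away reading
X = np.vstack([np.ones((19, 2)), [[50.0, 50.0]]])
kept, mask = exclude_outliers(X)
print(kept.shape, mask[-1])
```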

%% training and test data

rIndex = [randsample(1:19,19)';randsample(20:38,19)';randsample(39:58,20)'; ...

randsample(59:77,19)';randsample(78:97,20)';randsample(98:117,20)'; ...

randsample(118:136,19)';randsample(137:155,19)';randsample(156:175,20)'];

trainIndex = [rIndex(1:13,:);rIndex(20:32,:);rIndex(39:52,:);...

rIndex(59:71,:);rIndex(78:91,:);rIndex(98:111,:);...

rIndex(118:130,:);rIndex(137:149,:);rIndex(156:169,:)];

testIndex = [rIndex(14:19,:);rIndex(33:38,:);rIndex(53:58,:);...

rIndex(72:77,:);rIndex(92:97,:);rIndex(112:117,:);...

rIndex(131:136,:);rIndex(150:155,:);rIndex(170:175,:)];

trainData = newS(trainIndex,1:8);

trainTarget = newS(trainIndex,9);

testData = newS(testIndex,1:8);

testTarget = newS(testIndex,9);

%% find the class centers (mean vectors) and the minimum and maximum matrices

nofClass = 9;

nDimen = size(newS,2)-1;

classMeans = zeros(nofClass,nDimen);

classMins = zeros(nofClass,nDimen);

classMaxs = zeros(nofClass,nDimen);

for m = 1:nofClass

switch m

case 1

clStart = 1; clEnd = 13;

case 2

clStart = 20; clEnd = 32;

case 3

clStart = 39; clEnd = 51;

case 4

clStart = 59; clEnd = 71;

case 5

clStart = 78; clEnd = 90;

case 6

clStart = 98; clEnd = 110;

case 7

clStart = 118; clEnd = 130;

case 8

clStart = 137; clEnd = 149;

case 9

clStart = 156; clEnd = 168;

otherwise

disp('Error in number of classes.');

end

classMeans(m,:) = mean(newS(clStart:clEnd,1:8));%finding mean vectors

classMins(m,:) = min(newS(clStart:clEnd,1:8)); % minimum vectors

% maximum vectors, inflated by the class standard deviation

classMaxs(m,:) = max(newS(clStart:clEnd,1:8)) + std(newS(clStart:clEnd,1:8))*1.1;

end

% find the centers i.e. mean vectors, maximum, and minimum matrices ends %

trainTime = toc;

disp(['MMM train time:',num2str(trainTime)]);

%% Classification of test data by max min boundary begins here %

tic

newData = testData;%S(1:240,:);

newdataClass = zeros(nofClass,size(newData,1));

for i = 1:size(newData,1)

for j = 1:nofClass % class index

maxSum = sum(newData(i,:) <= classMaxs(j,:));

minSum = sum(newData(i,:) >= classMins(j,:));

if maxSum >= 8 && minSum >= 8

newdataClass(j,i) = j; % first index is class, & i is test data index

else

newdataClass(j,i) = 0;

end

end

end

%% Break ties

for i = 1:size(newdataClass,2)

tie = find(newdataClass(:,i)); % classes claiming sample i

if numel(tie) > 1

[row,col,val] = find(newdataClass(:,i)); % find tie classes

distance = zeros(size(val,1),1);

for j = 1:numel(val) % numel(val) is the number of tied classes

distance(j) = norm(classMaxs(val(j),:)-newData(i,:));

end

[minDist,index] = min(distance);

for j = 1:numel(val) % break ties

if (minDist < distance(j))

newdataClass(row(j),i) = 0;

end

end

end

end
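The two loops above first mark every class whose per-sensor [min, max] box contains the sample, then break ties by the distance from the sample to the class maximum vector. A compact Python sketch of that decision rule; `mmm_classify` and the 2-D boxes are illustrative, not the eight-sensor thesis data.

```python
import numpy as np

def mmm_classify(x, class_mins, class_maxs):
    """Return the index of the winning class box, or -1 if no box contains x.

    Ties (several boxes contain x) are broken by the smallest Euclidean
    distance from x to the class maximum vector, as in the MATLAB listing.
    """
    inside = np.where(np.all((x >= class_mins) & (x <= class_maxs), axis=1))[0]
    if inside.size == 0:
        return -1                 # sample claimed by no class
    if inside.size == 1:
        return int(inside[0])
    dists = np.linalg.norm(class_maxs[inside] - x, axis=1)
    return int(inside[np.argmin(dists)])  # nearest class max wins the tie

# Two overlapping 2-D class boxes (illustrative, not the thesis sensor data)
mins = np.array([[0.0, 0.0], [1.0, 1.0]])
maxs = np.array([[2.0, 2.0], [3.0, 3.0]])
print(mmm_classify(np.array([0.5, 0.5]), mins, maxs))  # inside box 0 only
print(mmm_classify(np.array([1.5, 1.5]), mins, maxs))  # tie, nearer to maxs[0]
print(mmm_classify(np.array([5.0, 5.0]), mins, maxs))  # outside every box
```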

%% calculate errors

misClassError = 0;

falseError = 0;

for i = 1:size(newdataClass,1)

for j = 6*(i-1)+1:6*i

if any(newdataClass([1:i-1, i+1:end],j)) % sample claimed by a wrong class

falseError = falseError + 1;

elseif newdataClass(i,j) == 0 % sample claimed by no class

misClassError = misClassError + 1;

end

end

end

testTime = toc;

disp(['MMM test time:',num2str(testTime)]);

disp(['MMM misclassification errors :',num2str(misClassError)]);

disp(['MMM false classification errors:',num2str(falseError)]);