Applied Soft Computing 61 (2017) 959–972

Contents lists available at ScienceDirect

Applied Soft Computing

journal homepage: www.elsevier.com/locate/asoc

Full Length Article

Fault diagnosis and classification framework using multi-scale classification based on kernel Fisher discriminant analysis for chemical process system

Norazwan Md Nor, Mohd Azlan Hussain ∗, Che Rosmani Che Hassan

Department of Chemical Engineering, Faculty of Engineering, University of Malaya, 50603 Kuala Lumpur, Malaysia

∗ Corresponding author. E-mail address: mohd [email protected] (M.A. Hussain).

http://dx.doi.org/10.1016/j.asoc.2017.09.019
1568-4946/© 2017 Elsevier B.V. All rights reserved.

ARTICLE INFO

Article history:
Received 5 August 2016
Received in revised form 27 July 2017
Accepted 5 September 2017
Available online 12 September 2017

Keywords:
Fault classification
Fault diagnosis
Kernel Fisher discriminant analysis
Wavelet analysis
Support vector machine

ABSTRACT

Fault detection and diagnosis (FDD) in chemical process systems is an important tool for effective process monitoring to ensure the safety of a process. Multi-scale classification offers various advantages for monitoring chemical processes generally driven by events in different time and frequency domains. However, there are issues when dealing with highly interrelated, complex, and noisy databases with large dimensionality. Therefore, a new method for the FDD framework is proposed based on wavelet analysis, kernel Fisher discriminant analysis (KFDA), and support vector machine (SVM) classifiers. The main objective of this work was to combine the advantages of these tools to enhance the performance of the diagnosis on a chemical process system. Initially, a discrete wavelet transform (DWT) was applied to extract the dynamics of the process at different scales. The wavelet coefficients obtained during the analysis were reconstructed using the inverse discrete wavelet transform (IDWT) method, and were then fed into the KFDA to produce discriminant vectors. Finally, the discriminant vectors were used as inputs for the SVM classification task. The SVM classifiers were utilized to classify the feature sets extracted by the proposed method. The performance of the proposed multi-scale KFDA-SVM method for fault classification and diagnosis was analysed and compared using a simulated Tennessee Eastman process as a benchmark. The results showed the improvements of the proposed multi-scale KFDA-SVM framework, with an average classification accuracy of 96.79% over the faults in the Tennessee Eastman process, compared with the multi-scale KFDA-GMM (84.94%) and the established independent component analysis-SVM method (95.78%).

© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Effective fault detection and diagnosis (FDD) methods for chemical process systems are important to ensure the consistency of high quality products, as well as the safety of these systems. Any abnormal process operations should be detected during the early stage to reduce the risk of machinery damage and economic losses. The root causes of process faults should be diagnosed early so that corrective actions can be taken to return the plant system to its normal operating conditions. Various FDD methodologies have been developed and proposed in the literature, and these are generally classified into three groups: quantitative model-based methods, qualitative model-based methods, and process history-based methods [1–3].

Generally, quantitative model-based methods, such as the observer-based method, parity space, and parameter estimation method, mainly employ mathematical models that are constructed from first principles for their process monitoring and fault detection task. Based on these mathematical models, a concept of analytical redundancy is introduced to estimate the process behaviour and the process residuals. However, the effectiveness of these approaches depends on the precision of the mathematical models that have been constructed [4]. Furthermore, as chemical process systems become more complicated and complex, first-principle models could also become difficult and even nearly impossible to build [5].

In contrast, qualitative model-based methods, such as the signed digraphs (SDG) and fault tree analysis methods, employ cause-effect reasoning approaches to describe the process system's behaviour. Although the qualitative model-based approach is highly appropriate for diagnosis at the unit level, implementing this monitoring method for a large plant is complicated and inefficient, particularly for complex models in real-time mode [6].

Process history-based or data-driven methods generally consist of pattern recognition, significant statistical computation, or knowledge transformation of a large amount of process historical data. These methods eliminate the use of detailed models and the difficulties in their development, and enable proper process monitoring and fault detection, especially for large-scale chemical process systems. Therefore, data-driven techniques, such as artificial neural network (ANN) [7–10], fuzzy logic [11,12], neuro-fuzzy [13,14], and other machine learning methods, such as support vector machine (SVM) [15,16], have largely been applied in the process monitoring and fault diagnosis research field.

However, the development of process monitoring and fault detection approaches, especially in large-scale process systems, can be challenging due to the complex and non-linear interactions between the faults and their symptoms, high correlation among the measured variables, and a large number of sensors to be observed. In addition, it is beyond the capabilities of an operator to monitor all the variables and still effectively assess the process operations [17]. Furthermore, the effectiveness of a process monitoring and fault detection system can also be impaired by the so-called 'curse of dimensionality' [18]. In general, due to the high dimensionality of the database, the process data would contain redundant and irrelevant information, which will increase the process system's complexity and computational requirement, as well as degrade the performance of the process monitoring system. Another issue of concern during the development of a process monitoring approach is the multi-scale properties of the process data, particularly in chemical process systems [19]. This multi-scaling nature is inherent in processes that are generally driven by events located in different time and frequency domains. Therefore, conventional statistical methods are unsuitable for such data, especially for separating the deterministic components from the original input data.

These issues have prompted the development and application of numerous multivariate statistical techniques for dimensionality reduction approaches and fault detection systems, such as the principal component analysis (PCA) [20,21], independent component analysis (ICA) [22], partial least square (PLS) [23], and Fisher discriminant analysis (FDA) [24,25]. For instance, statistical dimensionality reduction methods have been applied to capture and explain the variability between the variables, while retaining important salient characteristics, and hence, the efficiency of the process monitoring and fault detection system can significantly be improved. In comparison, the PCA approach generally represents high-dimensional process data in a reduced dimension via reconstruction, whereas the FDA approach provides an optimal lower dimensional representation in terms of discrimination among classes.

In many cases, the FDA performs better than the PCA approach for classification problems, although both showed limited performance efficiencies in non-linear systems due to their linearity [26]. Thus, the concept of kernel-based function to the FDA algorithm has been proposed to alleviate these problems, as applied and reported in the literature [27–29]. The use of kernel-based FDA (KFDA) takes the advantage of non-linear kernel function mapping to maximize the Fisher criterion in the high-dimensional space.

Additionally, one possible solution for dealing with data extraction for multi-scale systems is through the use of wavelet transformation analysis. Wavelet analysis is a time-frequency signal analysis method, which has the local characteristics of time domain and frequency domain. For instance, a group of statistical features, such as kurtosis, standard deviation, and maximum value, which are extracted from the wavelet coefficients in the time domain signals, could form a set of input features of these analysed data. Since it is important to have good selections of input features in pattern classification and diagnosis, a multi-scale analysis of the process data would be advantageous in providing useful classification information, as concluded by previous researchers, such as Lau et al., Zamanian and Ohadi, Adewole and Tzoneva, and Sun et al. [30–33].

For the aspect of fault diagnosis, the original SVM was designed by Vapnik in 1998 [34] and is regarded as an efficient method for binary classification. However, several different and modified SVM methods have been proposed in recent years, since fault diagnosis is usually solved as a multi-class classification problem. Among them are combination methods named one-against-one SVM and one-against-all SVM, which were used by Jing and Hou [15], and Yin and Hou [35] for fault diagnosis in a chemical process system. Yin and Hou also stated that the combination of SVM and other approaches usually performs better than SVM alone [35]. It has also been necessary to develop new methods to deal with the various types of complicated industrial systems. A combination of SVM and wavelet transform for fault detection and diagnosis has also been applied in different applications, such as in high-voltage power transmission lines [36], refrigerant flow system [37], gearboxes system [38], and rolling element bearings in machines [39–41]. In another work, the combination of wavelet transform and SVM was further extended, using nonlinear PCA combination for better classification performance in reciprocating compressor [42].

In Lau et al. [30], the fault diagnosis framework was developed by combining multi-scale PCA (MSPCA) for feature extraction and ANFIS for fault detection and diagnosis. Firstly, the MSPCA method is used to extract the relevant features in the form of scores and residuals space. Then, the selected MSPCA subspaces are fed into multiple ANFIS classifiers for learning the fault-symptom correlations in order to diagnose different faults from the Tennessee Eastman process. On the other hand, Nor et al. [24] proposed a process monitoring and fault detection method based on the multi-scale KFDA (MSKFDA) feature extraction scheme. This MSKFDA-based fault detection work utilized the combination of KFDA and wavelet analysis for multi-scale feature extraction. The retained wavelet coefficients are reconstructed before XmR charts are used for the fault detection task in the Tennessee Eastman process.

Hence, from the strong motivation to deal with these various issues previously mentioned in relation to data extraction, classification, and fault diagnosis of a nonlinear chemical process system, an extension of the MSKFDA method previously proposed in Nor et al. [24] is presented in this work. In this study, we used the MSKFDA method for the feature extraction, but we applied the SVM method to improve the performance of the fault classification. Furthermore, in this study we also analysed the classification of all 21 faults in the Tennessee Eastman process, while the previous work only evaluated selected faults of the same process. The findings of this study would contribute significantly to the development of a robust multi-scale feature extraction and fault classification method, and its implementation within the FDD framework is a major contribution of this work.

This paper has been organized as follows: Section 1 describes the literature survey regarding the proposed framework, while Section 2 introduces the background of the discrete wavelet transform (DWT), kernel Fisher discriminant analysis (KFDA), and the proposed fault classifier approaches, namely, support vector machine and Gaussian mixture method. The proposed multi-scale KFDA-SVM framework methodology and a case study involving the application of the Tennessee Eastman process is provided in Section 3, while Section 4 presents the results and the discussion, and finally, Section 5 concludes the paper.

2. Background

This section describes the general background and structure of important elements in the proposed data-driven FDD framework.

2.1. Discrete wavelet transform (DWT)

The discrete wavelet transform (DWT) was chosen for the multi-scale-based feature extraction approach, for multi-scale analysis and decomposition, as described in [43]. Wavelets are basis functions that are localized in both time and frequency. Generally, the discrete dyadic form of wavelets is used, and can be represented as

$$\psi_{mk}(t) = 2^{-m/2}\,\psi\left(2^{-m}t - k\right), \tag{1}$$

where $\psi$ is the mother wavelet, and $m$ and $k$ are the scaling (dilating) and shifting (translation) parameters, respectively. The shifting parameter determines the location of the wavelet in the time domain, while the scaling parameter determines its scale and location in the frequency domain. By projecting a signal on the wavelet basis function, its contributions in different regions of the time-frequency space can be obtained.

A wavelet transform involves the decomposition of a signal vector into simpler blocks at different scales and positions. The scaling coefficient, $a_L$, and wavelet coefficient, $d_L$, represent the original measured data vector, $x$, as

$$a_L = h_L x, \qquad d_L = g_L x \tag{2}$$

where $h_L$ represents $m$ projection on the scaling function, $g_L$ represents $(L-1)$ projection on the scaling function, and $L$ represents the level of decomposition. The sequences $h_L$ and $g_L$ are low-pass and high-pass filters derived from the corresponding basis function, respectively. The scaling coefficients represent the lower frequency approximation of the signal, while the wavelet coefficients represent the higher frequency details of the signal.
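As a minimal sketch of the decomposition in Eq. (2), the following Python snippet applies a low-pass and a high-pass filter followed by down-sampling, level by level. It assumes the Haar wavelet purely for brevity (the filter choice, the function names `haar_step` and `haar_dwt`, and the sample signal are illustrative assumptions and are not taken from the paper).

```python
import math

def haar_step(x):
    """One DWT level with Haar filters: return (approximation, detail)."""
    s = math.sqrt(2.0)
    pairs = range(len(x) // 2)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in pairs]  # low-pass, then keep every 2nd sample
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in pairs]  # high-pass, then keep every 2nd sample
    return approx, detail

def haar_dwt(x, levels):
    """Return [a_L, d_L, d_(L-1), ..., d_1] for a signal whose length is divisible by 2**levels."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)   # only the approximation is decomposed further
        details.append(detail)
    return [approx] + details[::-1]

# Illustrative signal (hypothetical values).
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a2, d2, d1 = haar_dwt(signal, levels=2)
print("a2 (coarse trend):   ", a2)
print("d2 (slower details): ", d2)
print("d1 (fastest details):", d1)
```

With two levels, the signal is represented by one approximation vector and two detail vectors, which is the L+1 set of scales referred to later in the framework.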

The objective of scale determination is to find the combination of scales that contain the most discriminating features. As for chemical process data, the approximation at the coarsest scale generally reflects the dynamic trend of the original variable, while the details reflect, for example, sensors and process oscillations in different scales. Thus, all of them should be involved in the scale selection.

Two main methods for threshold determination in wavelet decomposition are the hard threshold and soft threshold methods. The detail coefficient thresholding was applied in the wavelet domain. The hard threshold method keeps a wavelet coefficient unchanged when its absolute value is greater than or equal to the threshold value, and sets it to zero when its absolute value is less than the threshold value, as shown by Eq. (3).

$$\hat{\omega}_{j,k} = \begin{cases} \omega_{j,k}, & |\omega_{j,k}| \ge T \\ 0, & |\omega_{j,k}| < T \end{cases} \tag{3}$$

In contrast, the soft threshold method sets a wavelet coefficient to zero when its absolute value is less than the threshold value, and shrinks it towards zero by the threshold value when its absolute value is greater than or equal to the threshold value. Signals processed by the soft threshold method, shown in Eq. (4), are better than those from the hard threshold method, particularly for chemical process systems.

$$\hat{\omega}_{j,k} = \begin{cases} \operatorname{sign}(\omega_{j,k})\left(|\omega_{j,k}| - T\right), & |\omega_{j,k}| \ge T \\ 0, & |\omega_{j,k}| < T \end{cases} \tag{4}$$

where $T$ is the threshold, $\omega_{j,k}$ denotes the variables in the transformation domain, and the estimates $\hat{\omega}_{j,k}$ are obtained by keeping, shrinking, or zeroing the individual wavelet coefficients.
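The two rules in Eqs. (3) and (4) can be sketched in a few lines of Python. The coefficient values and the threshold below are illustrative assumptions, and the function names are hypothetical.

```python
import math

def hard_threshold(w, t):
    """Eq. (3): keep the coefficient if |w| >= t, otherwise set it to zero."""
    return w if abs(w) >= t else 0.0

def soft_threshold(w, t):
    """Eq. (4): zero small coefficients and shrink the remaining ones towards zero by t."""
    if abs(w) < t:
        return 0.0
    return math.copysign(abs(w) - t, w)

details = [-1.5, -0.3, 0.0, 0.4, 2.0]  # illustrative detail coefficients
T = 0.5                                # illustrative threshold
hard = [hard_threshold(w, T) for w in details]
soft = [soft_threshold(w, T) for w in details]
print("hard:", hard)
print("soft:", soft)
```

Both rules remove the same small coefficients; the soft rule additionally reduces the magnitude of the retained ones, which avoids the jump at the threshold that the hard rule introduces.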

2.2. Kernel Fisher discriminant analysis (KFDA)

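The idea behind KFDA is to solve the problem of FDA in the kernel feature space, where the between-class-scatter matrix, $S_B$, and within-class-scatter matrix, $S_W$, are expressed through the corresponding kernel matrices, $K_B$ and $K_W$, as shown in Eqs. (5) and (6), respectively.

$$w^T S_B w = \alpha^T K_B \alpha \tag{5}$$

$$w^T S_W w = \alpha^T K_W \alpha \tag{6}$$

Here, the projection direction $w$ is expanded over the mapped training samples, and $\alpha$ is the vector of expansion coefficients. The within-class-scatter matrix and the between-class-scatter matrix contain all the basic information about the relationships within the groups and between them, as formulated in Eqs. (7) and (8), respectively.

$$S_W = \sum_{k=1}^{K} \sum_{n \in C_k} (x_n - m_k)(x_n - m_k)^T \tag{7}$$

$$S_B = \sum_{k=1}^{K} N_k (m_k - m)(m_k - m)^T \tag{8}$$

where the data are arranged as a matrix with one row per observation and one column per variable, $x_n$ is the $n$th observation, $m_k$ and $N_k$ are the mean and the number of observations of class $C_k$, and $m$ is the overall mean. A solution can be achieved by maximizing Eq. (9). This equation is obtained by following the Fisher criterion, with the combination of Eqs. (5)–(8), as shown below:

$$J(\alpha) = \arg\max_{\alpha} \frac{\left|\alpha^T K_B \alpha\right|}{\left|\alpha^T K_W \alpha\right|}. \tag{9}$$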
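The criterion in Eq. (9) can be illustrated with a small NumPy sketch. It is not the authors' implementation: the Gaussian (RBF) kernel, the ridge term `reg` added to $K_W$ for numerical stability, the function names, and the two-ring toy data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kfda_fit(x, labels, gamma=1.0, reg=1e-3, n_components=1):
    """Return the coefficient vectors alpha that maximise the ratio in Eq. (9)."""
    k = rbf_kernel(x, x, gamma)
    n = len(x)
    mean_all = k.mean(axis=1)
    k_b = np.zeros((n, n))
    k_w = np.zeros((n, n))
    for c in np.unique(labels):
        k_c = k[:, labels == c]                        # kernel columns of class c
        n_c = k_c.shape[1]
        diff = (k_c.mean(axis=1) - mean_all)[:, None]
        k_b += n_c * diff @ diff.T                     # between-class kernel matrix
        centering = np.eye(n_c) - np.full((n_c, n_c), 1.0 / n_c)
        k_w += k_c @ centering @ k_c.T                 # within-class kernel matrix
    k_w += reg * np.eye(n)                             # ridge term (assumption) keeps K_W invertible
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(k_w, k_b))
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real

def kfda_transform(x_train, alpha, x_new, gamma=1.0):
    """Project new samples onto the discriminant vectors."""
    return rbf_kernel(x_new, x_train, gamma) @ alpha

# Toy data: two concentric rings, which no linear projection can separate.
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
x = np.vstack([0.5 * ring, 2.0 * ring])
labels = np.array([0] * 20 + [1] * 20)

alpha = kfda_fit(x, labels, gamma=1.0)
scores = kfda_transform(x, alpha, x, gamma=1.0)[:, 0]
print("inner ring scores:", scores[labels == 0].min(), scores[labels == 0].max())
print("outer ring scores:", scores[labels == 1].min(), scores[labels == 1].max())
```

On this toy problem the one-dimensional discriminant scores of the two rings do not overlap, which is the property that a downstream classifier exploits.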

2.3. Support vector machine (SVM)

A support vector machine (SVM) maps input feature vectors into a higher dimensional feature space through nonlinear mappings. Its goal is to find the representative training observations, referred to as 'support vectors', to define two boundary hyperplanes with a maximum margin between them. With regard to linearly separable two-class classification, the training data can be represented by Eq. (10):

$$(x_i, y_i), \quad i = 1, 2, \ldots, m, \quad x_i \in \mathbb{R}^n, \quad y_i \in \{+1, -1\}, \tag{10}$$

whereby the SVM classifier can be expressed by a constrained optimization problem, as shown in Eq. (11):

$$\min \varphi(w) = \frac{\|w\|^2}{2} \quad \text{s.t.} \quad y_i (w \cdot x_i + b) - 1 \ge 0, \quad i = 1, 2, \ldots, m, \tag{11}$$

where $w \cdot x_i + b$ is the linear discriminant function, $w$ is the weight vector, $b$ is the threshold, and $\varphi(w)$ is inversely related to the margin, so that minimizing $\varphi(w)$ maximizes the margin $2/\|w\|$. This constrained optimization problem can be converted into a relatively simple dual problem using the Lagrange multiplier method based on dual theory. The final decision function can be expressed as shown by Eq. (12):

$$f(x) = \operatorname{sgn}\left(w^* \cdot x + b^*\right) = \operatorname{sgn}\left(\sum_{i=1}^{s} \alpha_i^* y_i \, x_i \cdot x + b^*\right) \tag{12}$$

where $w^*$ is the weight vector of the optimal classification plane, $\alpha_i^*$ is the optimal solution, $b^*$ is the classification threshold, and $s$ is the number of support vectors. If there are some inseparable observations, the generalized classification plane can be used to control the classification accuracy, which can be represented by Eq. (13):

$$\sum_{i=1}^{m} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, m, \tag{13}$$

where $C$ is the penalty parameter. For linearly inseparable two-class classification, a nonlinear transform can be used to convert it into a linearly separable problem in a higher dimensional space. The corresponding separation decision function can be expressed by Eq. (14):

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{s} \alpha_i^* y_i K(x_i, x) + b^*\right) \tag{14}$$

where the nonlinear transformation from the lower dimensional space to the higher dimensional space can be expressed as $\varphi(\cdot): \mathbb{R}^d \to \mathbb{R}^k$. This implementation, by the so-called kernel trick, constructs a kernel function, $K$, as $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$, so that the inner product operation in the higher dimensional space can be directly obtained using the values of the original variables without needing an explicit expression of the nonlinear transform, $\varphi(\cdot)$.
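The kernel trick and the decision function of Eq. (14) can be sketched as follows. The degree-2 polynomial kernel is chosen only because its explicit map $\varphi$ is short enough to write out; the support vectors, multipliers, and threshold below are hand-picked hypothetical values, not the result of training.

```python
import math

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def explicit_map(x):
    """phi: R^2 -> R^3 chosen so that phi(x) . phi(z) = (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def decision(x, support, b):
    """Eq. (14): sign of sum_i alpha_i * y_i * K(x_i, x) + b."""
    total = sum(alpha * y * poly_kernel(x_i, x) for x_i, y, alpha in support) + b
    return 1 if total >= 0 else -1

# The kernel value equals the inner product in the mapped space.
x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly_kernel(x, z)
mapped_value = sum(a * b for a, b in zip(explicit_map(x), explicit_map(z)))
print(kernel_value, mapped_value)

# Hypothetical support vectors (x_i, y_i, alpha_i): together they give
# f(x) = sgn(4 * x1 * x2), an XOR-like rule that is not linear in (x1, x2).
support = [((1.0, 1.0), +1, 1.0), ((1.0, -1.0), -1, 1.0)]
predictions = [decision(p, support, b=0.0) for p in [(2.0, 3.0), (-2.0, 3.0), (-1.0, -1.0)]]
print(predictions)
```

The classifier never evaluates `explicit_map`; it only needs kernel values between the support vectors and the query point, which is what makes high-dimensional feature spaces affordable.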

2.4. Gaussian mixture method (GMM)

The Gaussian mixture method (GMM) employs Gaussian probability density function (PDF) models with the assumption of Gaussian distribution. A GMM is a weighted sum of $M$ component densities, each being a multivariate Gaussian with mean, $\mu_i$, and covariance matrix, $\Sigma_i$, as shown by Eq. (15):

$$p(z \mid \theta) = \sum_{i=1}^{M} \alpha_i \, G\left(z; \mu_i, \Sigma_i\right) \tag{15}$$

where the weights satisfy the constraint $\sum_{i=1}^{M} \alpha_i = 1$. A GMM is parameterized by the mixture weights, mean vectors, and covariance matrices: $\theta = \{\alpha_i, \mu_i, \Sigma_i;\ i = 1, \ldots, M\}$. Given a set of training data $\{z_n,\ n = 1, \ldots, N\}$, the parameters can be estimated by maximizing the likelihood function in Eq. (16):

$$L(\theta) = \prod_{n=1}^{N} p(z_n \mid \theta) \tag{16}$$

Since the number of mixtures, $M$, is defined in advance, the modelling time and the approximation capability also need to be considered beforehand. Therefore, a fast and precise method of tuning parameters should be used when applying the GMM model. To estimate the mean and variance of the GMM, the expectation-maximization (EM) algorithm is used iteratively. In each EM iteration, the following updating formulas are used to guarantee a monotonic increase in the likelihood value (maximum likelihood (ML) estimation), as given by Eqs. (17)–(20):

$$\alpha_i = \frac{1}{N} \sum_{n=1}^{N} p(i \mid z_n, \theta) \tag{17}$$

$$\mu_i = \frac{\sum_{n=1}^{N} p(i \mid z_n, \theta)\, z_n}{\sum_{n=1}^{N} p(i \mid z_n, \theta)} \tag{18}$$

$$\Sigma_i = \frac{\sum_{n=1}^{N} p(i \mid z_n, \theta)\,(z_n - \mu_i)(z_n - \mu_i)^T}{\sum_{n=1}^{N} p(i \mid z_n, \theta)} \tag{19}$$

where

$$p(i \mid z_n, \theta) = \frac{\alpha_i \, G(z_n; \mu_i, \Sigma_i)}{\sum_{k=1}^{M} \alpha_k \, G(z_n; \mu_k, \Sigma_k)} \tag{20}$$

Eqs. (17)–(20) constitute the EM algorithm, where Eq. (20) is the expectation step of the EM algorithm. This step exploits all the parameters and the prior probability to obtain the posterior probability. Meanwhile, Eqs. (17)–(19) are involved in finding all parameters based on the posterior probability, which is also known as the maximization step. The classification equation is expressed as Eq. (21):

$$\operatorname{Classify}(z_i) = \arg\max_{i = 1, \ldots, p} P_{i,\mathrm{GMM}} \tag{21}$$

where $p$ represents the classes of the faults, and the result is based on the iteration of the expectation and the maximization steps to find a converged solution of the problem.
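The EM updates of Eqs. (17)–(20) can be sketched for the one-dimensional case, where the covariance matrix reduces to a scalar variance. The synthetic data, the initial guesses, and the fixed number of iterations are assumptions made for illustration only.

```python
import math
import random

def gaussian(z, mean, var):
    """Univariate Gaussian density G(z; mean, var)."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, weights, means, variances, iterations=50):
    """Run EM for a 1-D Gaussian mixture and return (weights, means, variances)."""
    weights, means, variances = list(weights), list(means), list(variances)
    n, m = len(data), len(weights)
    for _ in range(iterations):
        # E-step, Eq. (20): posterior probability of each component for each sample.
        post = []
        for z in data:
            joint = [weights[i] * gaussian(z, means[i], variances[i]) for i in range(m)]
            total = sum(joint)
            post.append([j / total for j in joint])
        # M-step, Eqs. (17)-(19): re-estimate weights, means and variances.
        for i in range(m):
            mass = sum(p[i] for p in post)
            weights[i] = mass / n
            means[i] = sum(p[i] * z for p, z in zip(post, data)) / mass
            variances[i] = sum(p[i] * (z - means[i]) ** 2 for p, z in zip(post, data)) / mass
    return weights, means, variances

# Synthetic two-cluster data (hypothetical): 200 samples around 0 and 200 around 5.
random.seed(0)
data = [random.gauss(0.0, 0.5) for _ in range(200)] + [random.gauss(5.0, 0.5) for _ in range(200)]

weights, means, variances = em_gmm_1d(data, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
print("weights:  ", weights)
print("means:    ", means)
print("variances:", variances)
```

For classification in the sense of Eq. (21), one mixture would typically be fitted per fault class and a new sample assigned to the class with the largest probability; that step is left out of the sketch.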

3. Fault detection and diagnosis system based on multi-scale KFDA framework

This section describes the new FDD framework based on the multi-scale KFDA feature extraction approach, with a support vector machine (SVM) classification strategy. The proposed methodology of this novel multi-scale KFDA-SVM framework is highlighted by the flowchart in Fig. 1.

This strategy consists of two major steps following data acquisition and normalization, as seen in Fig. 1, which are the feature extraction step and the fault diagnosis classification step. First, once data acquisition from the process system was completed, the input database that consisted of the normal and faulty data was pre-processed using the normalization method. Then, the normalized data were fed to the multi-scale KFDA feature extraction step. During this step, the discrete wavelet transform (DWT) method, with its multi-scale feature decomposition, was applied to give the distinguished characteristic features of the input data. Then, for the feature extraction, the threshold limit was applied to both of the DWT decomposed coefficients. Once the input features had been extracted, the DWT reconstruction method, known as the inverse discrete wavelet transform (IDWT), was applied to the selected coefficients. Through these procedures, the dimensions of the input patterns can be reduced and useful information can be extracted. Next, the reduced dimensional database was further distinguished and separated by the kernel FDA method into the discriminative feature space, and was fed to the SVM pattern classifier to perform the fault classification step. The SVM method was applied to the framework to find the patterns in the extracted multi-scale KFDA subspaces. When faults are detected, the classifier would diagnose the faults by assigning the features to the corresponding detected fault classes. If the SVM classifier detected a normal condition, the data used by the SVM classifier would be stored in the database as normal data. The detailed steps of the proposed multi-scale KFDA-SVM framework are further described in the following sections.

3.1. Multi-scale KFDA-based feature extraction approach

This section describes the proposed multi-scale KFDA-based feature extraction step in detail, with reference to Fig. 2.

3.1.1. Data acquisition and normalization

As shown in Fig. 2, the data acquisition step for the chemical process system was applied to obtain samples for the database. The data acquisition step refers to the process of collecting useful data from the process system. The acquired process data were divided into two subsets: the training dataset and the testing dataset. The training dataset was used for developing the models, while the testing dataset was reserved for evaluating the performance of the developed classification models.

Fig. 1. Flowchart of the proposed multi-scale KFDA-SVM fault diagnosis framework.

Fig. 2. The proposed multi-scale KFDA feature extraction steps.

The normalization technique was applied to the acquired input database to linearly scale the collected variables in the [0,1] range using a min-max normalization equation, as expressed by Eq. (22). It is important to scale the data before applying the multivariable methods to prevent variables with a greater numerical range from dominating those with a smaller numerical range.

$$v'_{ij} = \frac{v_{ij} - \min_i}{\max_i - \min_i} \tag{22}$$

In this equation, $\max_i$ is the maximum and $\min_i$ is the minimum of the $i$th attribute values. Meanwhile, $v_{ij}$ is the value of the $i$th attribute of the $j$th object and $v'_{ij}$ is the normalized value. This initial step is crucial for improving data quality as well as for improving the accuracy and efficiency of the statistical and computational process.
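The min-max scaling of Eq. (22) can be sketched as below, assuming the data are stored as a list of samples (rows) with one column per variable. The function name, the sample values, and the handling of constant columns are illustrative assumptions.

```python
def min_max_normalize(rows):
    """Scale every column (variable) of a list of samples into the [0, 1] range, as in Eq. (22)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        # A constant column has max == min; it is mapped to 0.0 to avoid division by zero (assumption).
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Two hypothetical process variables with very different numerical ranges.
samples = [[10.0, 200.0], [20.0, 400.0], [15.0, 300.0]]
scaled = min_max_normalize(samples)
print(scaled)
```

After scaling, both variables span the same range, so neither dominates the subsequent multivariate steps because of its units.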

3.1.2. DWT decomposition

Then, each of the variables in the input data was individually decomposed by applying the discrete wavelet transform (DWT) approach, with the Daubechies wavelet as the mother wavelet. Consider, for example, an input matrix $x$ with $m$ variables and $n$ samples, which can be considered as an $n \times m$ data matrix. Each of the $m$ columns was individually decomposed, with the same level of decomposition, labelled as $L$, applied to each of the $m$ variables. Based on Eq. (2), the scaling and wavelet function values were calculated for each iteration and presented as coefficients $a_i$ and $d_i$, as given by Eqs. (23) and (24), respectively:

$$a_i = h_0 x_{2i} + h_1 x_{2i+1} + h_2 x_{2i+2} + \ldots + h_{L-1} x_{2i+L-1} \tag{23}$$

$$d_i = g_0 x_{2i} + g_1 x_{2i+1} + g_2 x_{2i+2} + \ldots + g_{L-1} x_{2i+L-1} \tag{24}$$

The details of these equations, including the derivation of the sequences $h_L$ and $g_L$, have been given in Eq. (2).

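Then, the wavelet approximation coefficients, $a_i$, and detail coefficients, $d_i$, from each of the variable decompositions were collected and arranged in their respective matrices: the approximation coefficient matrices and the detail coefficient matrices. The matrix size, $m \times n/2^L$, depended on the number of decomposed variables, $m$, the number of observations, $n$, and the level of decomposition, $L$, as shown in Eq. (25).

$$
\begin{bmatrix} \vdots \\ a_i \\ a_{i+1} \\ a_{i+2} \\ \vdots \end{bmatrix}
=
\begin{bmatrix}
\ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & h_0 & h_1 & h_2 & h_3 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & h_0 & h_1 & h_2 & h_3 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & h_0 & h_1 & h_2 & h_3 & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} \vdots \\ x_{2i} \\ x_{2i+1} \\ x_{2i+2} \\ \vdots \end{bmatrix}
$$

$$
\begin{bmatrix} \vdots \\ d_i \\ d_{i+1} \\ d_{i+2} \\ \vdots \end{bmatrix}
=
\begin{bmatrix}
\ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & g_0 & g_1 & g_2 & g_3 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & g_0 & g_1 & g_2 & g_3 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & g_0 & g_1 & g_2 & g_3 & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} \vdots \\ x_{2i} \\ x_{2i+1} \\ x_{2i+2} \\ \vdots \end{bmatrix}
\tag{25}
$$

As a result, a total of $L+1$ matrices were formed, with each representing the approximation coefficients and detail coefficients at different levels of decomposition scale.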
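The banded matrices of Eq. (25), in which each row holds the filter taps shifted by two columns, can be built explicitly for a short signal. The sketch below assumes the four-tap Daubechies filter (the paper states only that a Daubechies wavelet was used, so the number of taps is an assumption) and a periodic wrap-around at the signal boundary, which is also an assumption.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Four-tap Daubechies low-pass filter h and its quadrature-mirror high-pass filter g.
H = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
G = [H[3], -H[2], H[1], -H[0]]

def analysis_matrix(taps, n):
    """Build the (n/2) x n banded matrix of Eq. (25): row i holds the taps starting at column 2i."""
    rows = []
    for i in range(n // 2):
        row = [0.0] * n
        for k, tap in enumerate(taps):
            row[(2 * i + k) % n] += tap  # wrap around at the end: periodic boundary (assumption)
        rows.append(row)
    return rows

def mat_vec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

x = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]        # illustrative signal
approx = mat_vec(analysis_matrix(H, len(x)), x)     # a_i, Eq. (23)
detail = mat_vec(analysis_matrix(G, len(x)), x)     # d_i, Eq. (24)
print("approximation:", approx)
print("detail:       ", detail)
```

Each output vector has half the length of the input, which is the down-sampling implied by the two-column shift between consecutive rows; repeating the step on `approx` gives the next level.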

.1.3. Threshold determinationIn the next step, threshold determination was initiated, with the

etained wavelet coefficient was larger than the threshold param-ter. The wavelet coefficients usually correspond to a significantction in the process, where the aim of the threshold determina-ion, or scale selection would be to find the most discriminatingeatures. Stein’s unbiased likelihood estimate method was appliedn the soft threshold method based on the nonlinear transform, ashown by Eq. (26):

$$
\eta(x) = \begin{cases} \operatorname{sign}(x)\,(|x| - T), & |x| \geq T \\ 0, & |x| < T \end{cases},
\qquad
T = \sqrt{2\sigma^{2}\log(n)/n}, \qquad (26)
$$

where T is a threshold, n is the length of the input vector, and $\sigma^{2}$ is the estimated variance of the input data. In this work, the soft thresholding value for normalized data with a range of [0,1] was 0.4. The reconstruction method was applied after the threshold had been calculated.
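A minimal sketch of the soft-thresholding rule in Eq. (26) is given below. The function names are hypothetical, the logarithm is assumed to be the natural logarithm, and the coefficient values are illustrative only; the threshold of 0.4 is the value quoted above for normalized data.

```python
import math


def soft_threshold(x, T):
    """Soft threshold of Eq. (26): shrink |x| by T, or zero it if |x| < T."""
    if abs(x) >= T:
        return math.copysign(abs(x) - T, x)
    return 0.0


def threshold_from_variance(sigma2, n):
    """T = sqrt(2 * sigma^2 * log(n) / n), assuming the natural logarithm."""
    return math.sqrt(2.0 * sigma2 * math.log(n) / n)


# Illustrative wavelet coefficients (not taken from the TEP data).
coeffs = [0.9, -0.7, 0.3, -0.1, 0.4]
shrunk = [soft_threshold(c, 0.4) for c in coeffs]
```

Coefficients with magnitude below the threshold are removed, while the remaining ones are shrunk towards zero by T, which is what distinguishes soft from hard thresholding.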

3.1.4. IDWT reconstruction

Then, the multi-scale model was produced by restructuring important scales from the previous decomposition and thresholding step. The inverse discrete wavelet transformation (IDWT) was applied for restructuring the deterministic components in the variables from the retained wavelet coefficients following the threshold calculation. For example, all coefficients through the filters $h_0$ and $h_1$ were recombined and reconstructed in the time-domain space, as shown in Fig. 3. The reconstruction process consisted of level 2 up-sampling, as denoted by the ↑2 symbol in the figure.

A reconstruction method was applied by constructing L + 1 single-scale classifiers, including L detail classifications at all levels and one approximation classifier at the coarsest level. Based on the results of both cross-validation and testing data validation, the scales were added one by one to the present scales until the optimal overall classification accuracy was obtained.
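The reconstruction of individual scales can be sketched with a one-level Haar transform, which is an assumption made to keep the example short. The helper names are hypothetical; the point is that up-sampling by 2 and recombining the coefficients recovers the signal, and that zeroing one coefficient set isolates the contribution of a single scale.

```python
import math

S = 1 / math.sqrt(2)


def haar_forward(x):
    """One-level Haar decomposition (assumed wavelet, for illustration)."""
    half = len(x) // 2
    a = [S * (x[2 * i] + x[2 * i + 1]) for i in range(half)]
    d = [S * (x[2 * i] - x[2 * i + 1]) for i in range(half)]
    return a, d


def haar_inverse(a, d):
    """Up-sample by 2 and recombine approximation and detail coefficients."""
    x = []
    for ai, di in zip(a, d):
        x.append(S * (ai + di))
        x.append(S * (ai - di))
    return x


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a, d = haar_forward(signal)

full = haar_inverse(a, d)                        # all scales retained
approx_only = haar_inverse(a, [0.0] * len(d))    # approximation scale only
detail_only = haar_inverse([0.0] * len(a), d)    # detail scale only
```

Because the transform is linear, the single-scale reconstructions sum back to the original signal, which is why scales can be added one at a time when searching for the best-performing combination.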

3.1.5. KFDA discriminant vector

During the final step, the KFDA approach was applied to each of the reconstructed matrices, with the objective of extracting the discriminative attributes across the multi-scale data. The key idea behind the KFDA algorithm was to bring homogeneous data close to each other and to keep heterogeneous data distant from each other by projecting the data to a high-dimensional space using kernel mapping.

The KFDA discriminant vector can be obtained using the within-class-scatter matrix, $S_w$, and the between-class-scatter matrix, $S_B$, based on Eqs. (5) and (6), by maximizing the criterion in Eq. (27):

$$\max_{v_k \neq 0} \; \frac{v_k^{T} S_B v_k}{v_k^{T} S_w v_k}, \qquad (27)$$

where $v_k$ is the KFDA vector, which can be expressed as shown by Eq. (28):

$$v_k = \sum_{i=1}^{m} \alpha_i \, \phi(x_i) \qquad (28)$$

with coefficients $\alpha_i$, i = 1, . . ., m.

Based on the KFDA discriminant score vectors for multiple types of pattern data, the first two dimensions of the score vectors became the inputs for the SVM models for pattern classification.
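Because Eq. (28) writes the discriminant vector as a combination of mapped training samples, the score of a new sample can be evaluated through kernel values alone, as $\sum_i \alpha_i k(x_i, x)$. The sketch below illustrates this with a linear kernel, for which the mapping is the identity and the result can be checked against the explicit vector. The function names and the coefficient values are hypothetical placeholders, not fitted KFDA coefficients.

```python
def linear_kernel(x, z):
    """k(x, z) = x . z, i.e. phi is the identity map (chosen for checkability)."""
    return sum(xi * zi for xi, zi in zip(x, z))


def kfda_score(x_new, train_X, alpha, kernel):
    """Projection of phi(x_new) onto v_k = sum_i alpha_i * phi(x_i)."""
    return sum(a * kernel(xi, x_new) for a, xi in zip(alpha, train_X))


train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
alpha = [0.5, -0.25, 0.75]   # placeholder coefficients, not fitted values
x_new = [2.0, 4.0]

score = kfda_score(x_new, train_X, alpha, linear_kernel)

# With a linear kernel, v_k can be formed explicitly for comparison.
v = [sum(a * xi[j] for a, xi in zip(alpha, train_X)) for j in range(2)]
explicit = sum(vj * xj for vj, xj in zip(v, x_new))
```

Replacing `linear_kernel` with a nonlinear kernel gives the same computation in the implicit feature space, where the vector $v_k$ itself is never formed.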

3.2. SVM fault classification strategy

For the proposed multi-scale KFDA-SVM fault detection and diagnosis framework, the C-SVM model was selected for the fault classification task. The generalized decision function can be expressed as shown by Eq. (29):

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{s} \alpha_i^{*} y_i \,(x_i \cdot x) + b^{*} \right) \qquad (29)$$

where $\alpha_i^{*}$ is the optimal solution and $b^{*}$ is the classification threshold. The generalized classification plane, which controls the classification accuracy, can be represented by Eq. (30):

$$\sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \leq \alpha_i \leq C, \quad i = 1, 2, \ldots, n. \qquad (30)$$

Eq. (30) shows that the penalty parameter, also known as the C parameter, needs to be carefully determined since the model is highly sensitive to the parameter settings and to the kernel function type. Additionally, the radial basis function (RBF) kernel was used with the kernel parameter, $\gamma$, which must also be taken into consideration in this work. Therefore, the selected RBF kernel function for the C-SVM model was based on Eq. (31):

$$k(x_i, x'_j) = \exp\left( -\gamma \, \lVert x_i - x'_j \rVert^{2} \right), \quad \gamma > 0. \qquad (31)$$
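The way Eqs. (29) to (31) fit together can be sketched as a decision function in which the dot product of Eq. (29) is replaced by the RBF kernel of Eq. (31). The support vectors, multipliers, bias, and $\gamma$ below are hand-picked placeholders rather than trained values, and the function names are hypothetical; ties at zero are assigned to the positive class by assumption.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel of Eq. (31): exp(-gamma * ||x - z||^2), gamma > 0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, labels, alphas, b, gamma):
    """Kernelized form of Eq. (29); returns +1 or -1."""
    total = b
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        total += alpha * y * rbf_kernel(sv, x, gamma)
    return 1 if total >= 0 else -1


# Placeholder model: one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]   # satisfies sum(y_i * alpha_i) = 0 from Eq. (30)
b = 0.0
gamma = 0.5

near_negative = svm_decision([0.1, 0.1], support_vectors, labels, alphas, b, gamma)
near_positive = svm_decision([1.9, 1.9], support_vectors, labels, alphas, b, gamma)
```

A larger $\gamma$ makes each support vector's influence more local, which is one reason the classifier is sensitive to the joint choice of C and $\gamma$.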

These algorithms were implemented in MATLAB, where the C-SVM classifier was developed for every combination of faults. However, the most suitable values of C and $\gamma$ were unknown beforehand. Therefore, the C and $\gamma$ parameters were initially fixed at 0.1, before each of these parameters was gradually tuned to reduce the SVM training errors. The correct tuning and adjustment of these parameters significantly influenced the classification outputs. Finally, the combined multi-scale KFDA-SVM classifiers were applied with the best SVM classification performance for every designated fault used in the proposed framework.

Fig. 3. Inverse Discrete Wavelet Transform for signal reconstruction.

3.2.1. Comparison for the classification strategy

To highlight the performance of the proposed multi-scale KFDA-based feature extraction approach, another classifier, known as the Gaussian mixture model (GMM), was implemented for comparison. This method used the expectation-maximization (EM) algorithm to estimate all the parameters in the model, as shown in Eqs. (18)–(20) in subsection 2.4, where some details of this method have been given. Comparison was also made with the ICA-SVM method [44].

3.2.2. Performance evaluation

For the classification performance, the separability and clustering of the classified fault data can be seen through the classification projection diagram. Additionally, the diagnosis performance of the proposed multi-scale KFDA-SVM framework was initially illustrated using the confusion matrix, as suggested by [45]. In the confusion matrix, the total number of samples was considered for each of the designated faults. Correctly classified data points were represented by the diagonal elements and highlighted in green squares, whereas misclassified data points were coloured in red squares. The accuracy per output class was given by the grey squares, while the blue squares presented the overall accuracy of the SVM classifier. The classification efficiency was 100% when there was no misclassification.

The overall performance of the proposed methodology and the other methods can be measured using the accuracy rate, given by Eq. (32):

$$\text{AccuracyRate} = \left( 1 - \frac{FP + FN}{TP + TN + FP + FN} \right) \times 100\%, \qquad (32)$$

where True Positive (TP) is a fault indication on a faulty operating condition, False Positive (FP) is a fault indication on a normal operating condition, True Negative (TN) is a normal indication on a normal operating condition, and False Negative (FN) is a normal indication on a faulty operating condition.
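Eq. (32) can be written as a short function. The counts used in the example are hypothetical and were chosen only so that they sum to 960 samples; they are not results from this study.

```python
def accuracy_rate(tp, tn, fp, fn):
    """Accuracy rate of Eq. (32), returned as a percentage."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (1 - (fp + fn) / total) * 100.0


# Hypothetical confusion-matrix counts for a two-class (normal/fault) case.
rate = accuracy_rate(tp=450, tn=480, fp=10, fn=20)
perfect = accuracy_rate(tp=500, tn=460, fp=0, fn=0)
```

With these counts, 30 of 960 samples are misclassified, so the accuracy rate is 96.875%; a classifier with no misclassification returns 100%.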

3.3. Case study - Tennessee Eastman process (TEP)

The realistic simulation of the Tennessee Eastman process (TEP) was chosen as a case study for the proposed FDD framework. This simulation of the industrial process was introduced by Downs and Vogel [46] for evaluating and benchmarking various process control and fault monitoring methods. The details can be found in [46] and [47].

3.3.1. Process description

The Tennessee Eastman process, as shown in Fig. 4, was developed by the Eastman Chemical Company with the intention of providing a realistic simulation of an industrial process. The control strategy implemented in the process has been described by Lyman and Georgakis in [48].

The process involves four irreversible exothermic gas reactions. These reaction rates depend on the temperature and the concentrations of the reactants. The process has five major units: reactor, product condenser, vapour-liquid separator, recycle compressor, and product stripper. Four reactants are the inputs of the process that produce two products and two by-products, named alphabetically from A to H. The heat of the reactions is removed by the cooling water in the heat exchanger. The products and unconverted reactants leave the reactor as vapours, which are partly converted to liquid in the condenser. The reactions are given by Eqs. (33)–(36).

A (g) + C (g) + D (g) → G (liq) (33)

A (g) + C (g) + E (g) → H (liq) (34)

A (g) + E (g) → F (liq) (35)

3D (g) → 2F (liq) (36)

3.3.2. Process faults

The process in this study contained 52 variables, which included 41 measured and 11 manipulated variables, as summarized in Table 1. The input variables were XMEAS(1) to XMEAS(36), and XMV(1) to XMV(11). XMEAS(1) to XMEAS(36) were the process measurements, and XMV(1) to XMV(11) were the manipulated variables, whereas the output variables were XMEAS(37) to XMEAS(41). There were 21 process faults (Fault 1 to Fault 21) introduced in the TEP, as summarized in Table 2. These faults represented several types of process faults, such as step disturbances, random variations, a slow kinetic drift, valve sticking, and other unknown conditions.

In this paper, all of the faults in the TEP were tested and analysed by the proposed framework. Data acquisition was established for the 21 designated faults, introduced at different ranges of operating conditions. Data acquisition for each fault was carried out to include the 52 variables, based on a three-minute sampling interval. Meanwhile, faults were designated to occur after one hour of running the process for the training data and after eight hours of running the process for the testing data, with the specification for each designated process fault tabulated in Table 2. The proposed framework was then applied to these datasets, with the classification performance evaluated based on the testing datasets. Comparisons were also made with the multi-scale KFDA-GMM and ICA-SVM methods, as detailed in the next section.
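The sampling description above fixes where the fault begins in each dataset. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the test-set length of 960 samples is taken from the confusion-matrix discussion in the results section.

```python
SAMPLING_INTERVAL_MIN = 3  # three-minute sampling interval


def fault_onset_sample(hours_before_fault, interval_min=SAMPLING_INTERVAL_MIN):
    """Number of fault-free samples recorded before the fault is introduced."""
    return (hours_before_fault * 60) // interval_min


def make_labels(n_samples, onset):
    """0 for samples before the fault onset, 1 from the onset onwards."""
    return [0] * onset + [1] * (n_samples - onset)


train_onset = fault_onset_sample(1)   # fault after one hour (training data)
test_onset = fault_onset_sample(8)    # fault after eight hours (testing data)

test_labels = make_labels(960, test_onset)
```

At one sample every three minutes, the fault therefore starts after 20 samples in the training data and after 160 samples in the testing data.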


Fig. 4. Tennessee Eastman process diagram.

Table 1
Measured and Manipulated Variables of the TE Process.

| Identification | Description | Identification | Description |
|---|---|---|---|
| XMEAS(1) | A feed Stream 1 | XMEAS(27) | Reactor feed component E |
| XMEAS(2) | D feed Stream 2 | XMEAS(28) | Reactor feed component F |
| XMEAS(3) | E feed Stream 3 | XMEAS(29) | Purge component A |
| XMEAS(4) | Total feed Stream 4 | XMEAS(30) | Purge component B |
| XMEAS(5) | Recycle flow | XMEAS(31) | Purge component C |
| XMEAS(6) | Reactor feed rate | XMEAS(32) | Purge component D |
| XMEAS(7) | Reactor pressure | XMEAS(33) | Purge component E |
| XMEAS(8) | Reactor level | XMEAS(34) | Purge component F |
| XMEAS(9) | Reactor temperature | XMEAS(35) | Purge component G |
| XMEAS(10) | Purge rate | XMEAS(36) | Purge component H |
| XMEAS(11) | Separator temperature | XMEAS(37) | Product component D |
| XMEAS(12) | Separator level | XMEAS(38) | Product component E |
| XMEAS(13) | Separator pressure | XMEAS(39) | Product component F |
| XMEAS(14) | Separator underflow | XMEAS(40) | Product component G |
| XMEAS(15) | Stripper level | XMEAS(41) | Product component H |
| XMEAS(16) | Stripper pressure | XMV(1) | D feed flow Stream 2 |
| XMEAS(17) | Stripper underflow | XMV(2) | E feed flow Stream 3 |
| XMEAS(18) | Stripper temperature | XMV(3) | A feed flow Stream 1 |
| XMEAS(19) | Stripper steam flow | XMV(4) | Total feed flow Stream 4 |
| XMEAS(20) | Compressor work | XMV(5) | Compressor recycle valve |
| XMEAS(21) | Reactor cooling water outlet temp. | XMV(6) | Purge valve |
| XMEAS(22) | Separator cooling water outlet temp. | XMV(7) | Separator product liquid flow |
| XMEAS(23) | Reactor feed component A | XMV(8) | Stripper product liquid flow |
| XMEAS(24) | Reactor feed component B | XMV(9) | Stripper steam valve |
| XMEAS(25) | Reactor feed component C | XMV(10) | Reactor cooling water flow |
| XMEAS(26) | Reactor feed component D | XMV(11) | Condenser cooling water flow |

4. Results and discussions

The proposed framework was implemented to detect and diagnose the faults in the TEP database, with the data for training used for framework modelling, while the data for testing were used to evaluate the classification results through the accuracy rate, as shown in Eq. (32). For the first part of the discussion, an evaluation of the multi-scale KFDA-based feature extraction work was designed to investigate the efficiency of the wavelet decomposition in the proposed approach, using the normalized data of Fault 4 from the reactor's cooling water flow variable (XMV10), as shown in Fig. 5.

From Fig. 5(b), the approximation coefficients for the Fault 4 data of the transformed signal clearly showed a significant difference in the amplitude of the plot compared to the normal data. This distinguishing feature showed that disturbances or fault events had occurred in the data. The detail coefficients for the level 5 decomposition in Fig. 5(e) also showed some distinctive characteristics that differentiate the normal and fault conditions in the database. As shown by Fig. 5, the discrete wavelet transformation had decomposed the faulty data into significant information related to the process variables.

Fig. 5. Variable XMV10 with normal and Fault 4 for: (a) original data; (b) fifth-level approximate coefficient decomposition, (c) first-level detail coefficients decomposition, (d) third-level detail coefficients decomposition, and (e) fifth-level detail coefficients decomposition. (Panel title: DWT decomposition of XMV(10) of Fault 4; legend: No fault, Fault 4.)

Table 2
Faults defined in the TE process.

| Fault ID | Description | Type |
|---|---|---|
| DV1 | A/C feed ratio, B composition constant (stream 4) | Step |
| DV2 | B composition, A/C ratio constant (stream 4) | Step |
| DV3 | D feed temperature (stream 2) | Step |
| DV4 | Reactor cooling water inlet temperature | Step |
| DV5 | Condenser cooling water inlet temperature | Step |
| DV6 | A feed loss (stream 1) | Step |
| DV7 | C header pressure loss-reduced availability (stream 4) | Step |
| DV8 | A, B, C feed composition (stream 4) | Random variation |
| DV9 | D feed temperature (stream 2) | Random variation |
| DV10 | C feed temperature (stream 4) | Random variation |
| DV11 | Reactor cooling water inlet temperature | Random variation |
| DV12 | Condenser cooling water inlet temperature | Random variation |
| DV13 | Reaction kinetics | Slow drift |
| DV14 | Reactor cooling water valve | Sticking |
| DV15 | Condenser cooling water valve | Sticking |
| DV16-DV20 | Unknown | Unknown |
| DV21 | Valve for stream 4 fixed at the steady-state position | Constant position |

Next, the KFDA was applied on all of the multi-scaled data for fault classification. Figs. 6 and 7 show the classifications for Fault 4, Fault 9, and Fault 11 based on the FDA projection and the proposed multi-scale KFDA, respectively. Fig. 6 clearly shows that the FDA was able to classify the Fault 4 and Fault 9 data, but failed to separate and classify Fault 11. The FDA method was unable to distinguish Fault 11 because the separation between classes was not large enough, whereas the distribution within classes was quite large. This was because the variables in the datasets could not be separated without proper elimination of the insignificant information, which shared common characteristics.

In contrast, the proposed multi-scale KFDA method improved the power of discrimination and classification, especially in multiple time and frequency domains, as shown in Fig. 7, which shows the projection of the data onto the first two multi-scale KFDA vectors. The figure shows a large separation between classes, while the scatter within classes was reduced compared to the FDA, which indicated that the proposed multi-scale KFDA has better discriminative power than the normal FDA method for Faults 4, 9, and 11.

Furthermore, the classification and diagnosis performance of the proposed multi-scale KFDA-SVM framework has been tabulated using the confusion matrix, as shown in Figs. 8 and 9. In this proposed multi-scale KFDA method, the feature extraction method was combined with the SVM classifier to perform the final diagnosis. Figs. 8 and 9 show the final diagnosis of Fault 5 and Fault 16, respectively, based on the performance of the multi-scale KFDA, FDA-SVM, and single SVM fault classification frameworks.

Fig. 6. Fault 4, Fault 9 and Fault 11 classification projection using normal FDA.

From the confusion matrix, 960 samples were considered for each of the designated faults. Data points that were correctly classified were represented by the diagonal elements, whereas the others were misclassified responses. The accuracy per output class was given by the grey squares, while the blue squares presented the overall accuracy of the SVM classifier. Even though the classification efficiency was 100% when there was no misclassification, the classification accuracy was only specific to this particular dataset.

Fig. 8 shows the confusion matrix of the classification accuracy using the multi-scale KFDA-SVM (Fig. 8(a)), FDA-SVM (Fig. 8(b)), and SVM (Fig. 8(c)) frameworks on Fault 5 of the Tennessee Eastman process data. Fault 5 was related to the step change in the condenser cooling water's inlet temperature. When this fault occurred, the outlet stream flow rate from the condenser to the vapour-liquid separator also increased, which in turn increased the separator cooling water's outlet temperature and initiated unwanted changes in the process temperature in the system. Fig. 8 also compares the classification accuracy results for Fault 5 using the proposed multi-scale KFDA-SVM, FDA-SVM, and SVM fault detection frameworks. The multi-scale KFDA-SVM and FDA-SVM methods could classify the fault data with 99.5% and 91.4% accuracy, respectively, while the SVM classifier, without any feature extraction method, had a 76.8% accuracy for Fault 5 classification. The proposed multi-scale framework therefore showed much higher classification accuracy than the FDA-SVM framework. This was because the combination of the wavelet analysis with KFDA has the ability to extract richer faulty information from the measured variables. These classifiers could also maximize the between-class scatter and minimize the within-class scatter of the data for better classification.

The classification results for Fault 16 are shown in Fig. 9, whereby Fault 16 was an unknown fault, in which the root cause was unidentified. This type of fault could produce multi-scale deviations in the measured variables and thus may lead to difficulties in classification. However, Fig. 9(a) shows that the proposed multi-scale KFDA-SVM framework classified the fault with 99.8% accuracy compared to the FDA-SVM and SVM frameworks, at 90.3% and 79.7% accuracy, respectively. The higher accuracy achieved by the proposed framework also showed that the multi-scale feature extraction method had filtered the data in both time and frequency levels for better classification compared to the other methods.

Table 3 presents the classification results for the proposed multi-scale KFDA-SVM, compared with the multi-scale KFDA-GMM, for all 21 types of fault classes. All the steps for the multi-scale KFDA-SVM and multi-scale KFDA-GMM methods were run in MATLAB on an Intel i7-4770 processor at 3.4 GHz with 8 GB of memory. The average detection for each fault took approximately 0.2 milliseconds for the proposed method. The computational time for fault detection could be improved by increasing the speed and memory of the processors.

It is worth noting that the multi-scale KFDA-SVM offered a higher accuracy rate for the TEP classification case compared to the multi-scale KFDA-GMM framework. Comparisons of the percentages of diagnosis accuracy, as presented in Table 3, showed that the diagnostic accuracy percentage for the multi-scale KFDA-SVM was significantly higher for most of the faults. On average, the KFDA-SVM produced higher diagnosis accuracy compared to the multi-scale KFDA-GMM framework, even though some of the unobservable faults, such as Fault 9 and Fault 15, were closely similar in their accuracy.

Fig. 7. Fault 4, Fault 9, and Fault 11 classification projection using multi-scale KFDA.

Table 3
Comparison among diagnosis accuracy using proposed approaches for all faults.

| Fault | Accuracy rate (%): Multi-scale KFDA-SVM | Accuracy rate (%): Multi-scale KFDA-GMM |
|---|---|---|
| Fault 1 | 99.80 | 98.00 |
| Fault 2 | 99.87 | 98.37 |
| Fault 3 | 43.33 | 15.75 |
| Fault 4 | 95.89 | 62.75 |
| Fault 5 | 99.50 | 97.62 |
| Fault 6 | 96.02 | 89.00 |
| Fault 7 | 99.44 | 95.75 |
| Fault 8 | 95.81 | 54.37 |
| Fault 9 | 51.56 | 56.50 |
| Fault 10 | 93.99 | 23.62 |
| Fault 11 | 95.38 | 26.75 |
| Fault 12 | 95.63 | 78.12 |
| Fault 13 | 95.24 | 73.87 |
| Fault 14 | 98.05 | 85.87 |
| Fault 15 | 64.74 | 68.75 |
| Fault 16 | 97.80 | 93.62 |
| Fault 17 | 97.32 | 98.37 |
| Fault 18 | 95.15 | 89.0 |
| Fault 19 | 96.87 | 62.87 |
| Fault 20 | 93.63 | 62.50 |
| Fault 21 | 87.64 | 15.12 |
| Average | 90.13 | 68.88 |

However, as discussed by He et al. [49], the detection of Faults 3, 9, 15, and 21 was very difficult as there were no observable changes in the mean, variance, or peak time, which made them difficult to isolate from normal operation. Thus, these faults were excluded during further analysis. The resulting simulation results for the classification accuracy, without Faults 3, 9, 15, and 21, are given in Table 4.

Table 4
Comparison of diagnosis accuracy using different approaches for selected faults (all but Fault 3, Fault 9, Fault 15 and Fault 21).

| Fault | Accuracy rate (%): Multi-scale KFDA-SVM | Accuracy rate (%): Multi-scale KFDA-GMM | Accuracy rate (%): ICA-SVM |
|---|---|---|---|
| Fault 1 | 99.80 | 99.4 | 99.6 |
| Fault 2 | 99.87 | 98.3 | 98.1 |
| Fault 4 | 95.89 | 96.1 | 99.1 |
| Fault 5 | 99.50 | 95.1 | 99.0 |
| Fault 6 | 96.02 | 99.3 | 100 |
| Fault 7 | 99.44 | 99.3 | 99.1 |
| Fault 8 | 95.81 | 97.8 | 97.8 |
| Fault 10 | 93.99 | 57.7 | 89.4 |
| Fault 11 | 95.38 | 68.6 | 82.7 |
| Fault 12 | 95.63 | 97.6 | 99.5 |
| Fault 13 | 95.24 | 95.5 | 95.5 |
| Fault 14 | 98.05 | 90.7 | 99.0 |
| Fault 16 | 97.80 | 63.8 | 93.0 |
| Fault 17 | 97.32 | 62.9 | 95.7 |
| Fault 18 | 95.15 | 90.4 | 91.5 |
| Fault 19 | 96.87 | 56.0 | 96.6 |
| Fault 20 | 93.63 | 75.4 | 92.7 |
| Average | 96.79 | 84.94 | 95.78 |
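The column averages reported for Table 4 can be reproduced directly from its per-fault entries. The sketch below copies the accuracy rates from Table 4 and computes the mean of each column; the variable and function names are hypothetical.

```python
# Accuracy rates (%) from Table 4, keyed by fault number:
# (multi-scale KFDA-SVM, multi-scale KFDA-GMM, ICA-SVM)
table4 = {
    1: (99.80, 99.4, 99.6),
    2: (99.87, 98.3, 98.1),
    4: (95.89, 96.1, 99.1),
    5: (99.50, 95.1, 99.0),
    6: (96.02, 99.3, 100.0),
    7: (99.44, 99.3, 99.1),
    8: (95.81, 97.8, 97.8),
    10: (93.99, 57.7, 89.4),
    11: (95.38, 68.6, 82.7),
    12: (95.63, 97.6, 99.5),
    13: (95.24, 95.5, 95.5),
    14: (98.05, 90.7, 99.0),
    16: (97.80, 63.8, 93.0),
    17: (97.32, 62.9, 95.7),
    18: (95.15, 90.4, 91.5),
    19: (96.87, 56.0, 96.6),
    20: (93.63, 75.4, 92.7),
}


def column_means(rows):
    """Mean of each column, rounded to two decimals as in the table."""
    n = len(rows)
    return tuple(round(sum(r[c] for r in rows) / n, 2) for c in range(3))


means = column_means(list(table4.values()))
```

The three means agree with the averages listed in the last row of Table 4 (96.79, 84.94, and 95.78).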

Fig. 8. Confusion matrices for classification accuracy of Fault 5 for (a) multi-scale KFDA-SVM, (b) FDA-SVM, and (c) SVM framework.

To further demonstrate the performance of the proposed multi-scale KFDA feature extraction, a comparison with the classification results of the ICA-SVM framework proposed by Hsu et al. [44] was also conducted. That work was developed using the LIBSVM programme, based on the same TEP dataset from [47] that was used in this work. Since the main objective of this work was to classify different types of faults, the difference in programming languages did not affect their performance.

Table 4 clearly shows that the multi-scale KFDA-SVM has the highest average classification accuracy (96.79%) compared with the ICA-SVM (95.78%) and the multi-scale KFDA-GMM (84.94%). These findings also suggested that the proposed multi-scale KFDA-SVM is more capable of diagnosing faults than a traditional combination method, such as the ICA-SVM. Even though ICA can extract rich information from the input data, it is still a linear learning technique, which may not fit this type of data structure in the TEP process.

To summarize, the FDA and KFDA methods, which aim to find the optimal transformation by minimizing the within-class distance and maximizing the between-class distance, both contributed towards improving fault classification. However, the multi-scale KFDA classifications, with the SVM classifiers, produced higher diagnosis accuracy on average compared to the other combination approaches. The results of this study have indicated that the multi-scale feature extraction approach can filter data at both time and frequency levels for better diagnosis compared to a method that considers only single-scale feature extraction. Therefore, the proposed multi-scale KFDA-SVM method gave better results, and its classification rate remained higher on average, compared to the other methods that did not use the multi-scale feature extraction approach.

5. Conclusion

The present study was designed to evaluate the integration of wavelet analysis and discrimination analysis methods for feature extraction. Thus, a novel multi-scale feature extraction method was proposed, known as the multi-scale KFDA, which combined a discrete wavelet transform and the KFDA. Furthermore, this study proposed new FDD frameworks that implemented the multi-scale KFDA feature extraction method and data-driven classifiers, such as the SVM for fault diagnosis, and the GMM classifier for comparison. By comparing the performances and effectiveness of the SVM and GMM classifications when dealing with the Tennessee Eastman process database, this study has shown that the SVM-based classifier was better than the multi-scale KFDA-GMM and ICA-SVM frameworks for fault detection and diagnosis. The proposed multi-scale KFDA-SVM framework can successfully detect nearly all types of faults, with a high degree of accuracy (average accuracy of greater than 90%). These results also showed that the multi-scale KFDA methodology has contributed to enhancing data discrimination, with its consideration of the time and frequency domains of distinctive information. This characteristic enabled better data compression compared to the PCA and FDA, especially for a nonlinear process system, such as the Tennessee Eastman process.

Fig. 9. Confusion matrices for classification accuracy of Fault 16 for (a) multi-scale KFDA-SVM, (b) FDA-SVM, and (c) SVM framework.

Acknowledgement

The authors are grateful to the Universiti Sains Malaysia (USM) for the SLAB-KPT scholarship, and University of Malaya and the Ministry of Higher Education in Malaysia for supporting this work through the FRGS grant FP064-2015A.

References

[1] V. Venkatasubramanian, R. Rengaswamy, K. Yin, A review of process faultdetection and diagnosis Part I: Quantitative model-based methods, Comput.Chem. Eng. 27 (2003) 293–311.

[2] V. Venkatasubramanian, R. Rengaswamy, S.N. Kavuri, A review of processfault detection and diagnosis part II: Qualitative models and search strategies,Comput. Chem. Eng. 27 (2003) 313–326, http://dx.doi.org/10.1016/S0098-1354(02)00161-8.

[3] V. Venkatasubramanian, R. Rengaswamy, S.N. Kavuri, K. Yin, A review ofprocess fault detection and diagnosisPart III: Process history based methods,Comput. Chem. Eng. 27 (2003) 327–346.

[4] O.A.Z. Sotomayor, D. Odloak, Observer-based fault diagnosis in chemicalplants, Chem. Eng. J. 112 (2005) 93–108, http://dx.doi.org/10.1016/j.cej.2005.07.001.

[5] Zhiqiang Ge, Zhihuan Song, Furong Gao, Review of recent research on databased process monitoring, Am. Chem. Soc. (2013) 3543–3562, http://dx.doi.org/10.1021/ie302069q.

[6] E.E. Tarifa, N.J. Scenna, A methodology for fault diagnosis in large chemical

processes and an application to a multistage flash desalination process: partII, Reliab. Eng. Syst. Saf. 60 (1998) 41–51, http://dx.doi.org/10.1016/S0951-8320(97)00126-9.

[7] J.P. Patel, S.H. Upadhyay, Comparison between artificial neural network andsupport vector method for a fault diagnostics in rolling element bearings,

9 ft Com

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[48] P. Lyman, C. Georgakis, Plant-wide control of the tennessee eastman problem,

72 N. Md Nor et al. / Applied So

Procedia Eng. 144 (2016) 390–397, http://dx.doi.org/10.1016/j.proeng.2016.05.148.

[8] J. Mohd Ali, N. Ha Hoang, M.A. Hussain, D. Dochain, Review and classificationof recent observers applied in chemical process systems, Comput. Chem. Eng.76 (2015) 27–41 https://doi.org/10.1016/j.compchemeng.2015.01.019.

[9] M.A. Greaves, I.M. Mujtaba, M. Barolo, A. Trotta, M.A. Hussain,Neural-Network approach to dynamic optimization of batch distillationapplication to a middle-vessel column, Trans 81 (2003).

10] S.A. Hajimolana, S.M. Tonekabonimoghadam, M.A. Hussain, M.H. Chakrabarti,N.S. Jayakumar, M.A. Hashim, Thermal stress management of a solid oxidefuel cell using neural network predictive control, Energy 62 (2013) 320–329https://doi.org/10.1016/j.energy.2013.08.031.

11] B. He, T. Chen, X. Yang, Root cause analysis in multivariate statistical processmonitoring: integrating reconstruction-based multivariate contributionanalysis with fuzzy-signed directed graphs, Comp. Chem. Eng. 64 (2014)167–177, http://dx.doi.org/10.1016/j.compchemeng.2014.02.014.

12] M.M. Rashid, N.A. Rahim, M.A. Hussain, M.A. Rahman, Analysis andexperimental study of magnetorheological-based damper for semiactivesuspension system using fuzzy hybrids, IEEE Trans. Ind. Appl. 47 (2011)1051–1059, http://dx.doi.org/10.1109/TIA.2010.2103292.

13] S. Kar, S. Das, P.K. Ghosh, Applications of neuro fuzzy systems: a brief reviewand future outline, Appl. Soft Comput. 15 (2014) 243–259, http://dx.doi.org/10.1016/j.asoc.2013.10.014.

14] C.K. Lau, Y.S. Heng, M.A. Hussain, M.I. Mohamad Nor, Fault diagnosis of thepolypropylene production process (UNIPOL PP) using ANFIS, ISA Trans. 49(2010) 559–566, http://dx.doi.org/10.1016/j.isatra.2010.06.007.

15] C. Jing, J. Hou, SVM and PCA based fault classification approaches forcomplicated industrial process, Neurocomputing 167 (2015) 636–642, http://dx.doi.org/10.1016/j.neucom.2015.03.082.

16] C.C. Hsu, M.C. Chen, L.S. Chen, Intelligent ICA-SVM fault detector fornon-Gaussian multivariate process monitoring, Expert Syst Appl. 37 (2010)3264–3273, http://dx.doi.org/10.1016/j.eswa.2009.09.053.

17] J. Zhao, Y. Shu, J. Zhu, Y. Dai, An online fault diagnosis strategy for fulloperating cycles of chemical processes, Ind. Eng. Chem. Res. 53 (2014)5015–5027, http://dx.doi.org/10.1021/ie400660e.

18] A. Widodo, B.-S. Yang, T. Han, Combination of independent componentanalysis and support vector machines for intelligent faults diagnosis ofinduction motors, Expert Syst Appl. 32 (2007) 299–312, http://dx.doi.org/10.1016/j.eswa.2005.11.031.

19] D. Wang, J.A. Romagnoli, Robust multi-scale principal components analysiswith applications to process monitoring, J. Process Control. 15 (2005)869–882, http://dx.doi.org/10.1016/j.jprocont.2005.04.001.

20] X. Deng, X. Tian, S. Chen, C.J. Harris, Fault discriminant enhanced kernelprincipal component analysis incorporating prior fault information formonitoring nonlinear processes, Chemom. Intell. Lab. Syst. 162 (2017) 21–34,http://dx.doi.org/10.1016/j.chemolab.2017.01.001.

21] G. Wang, J. Liu, Y. Li, C. Zhang, Fault diagnosis of chemical processes based onpartitioning PCA and variable reasoning strategy, Chinese J. Chem Eng. 24(2016) 869–880, http://dx.doi.org/10.1016/j.cjche.2016.04.015.

22] Y. Xu, X. Deng, Fault detection of multimode non-Gaussian dynamic processusing dynamic Bayesian independent component analysis, Neurocomputing200 (2016) 70–79, http://dx.doi.org/10.1016/j.neucom.2016.03.015.

23] C. Botre, M. Mansouri, M. Nounou, H. Nounou, M.N. Karim, Kernel PLS-basedGLRT method for fault detection of chemical processes, J. Loss Prev. ProcessInd. 43 (2016) 212–224, http://dx.doi.org/10.1016/j.jlp.2016.05.023.

24] N.M. Nor, M.A. Hussain, C.R. Che Hassan, Process monitoring and faultdetection in non-Linear chemical process based on multi-Scale kernel fisherdiscriminant analysis, Comput. Aided Chem. Eng. 37 (2015) 1823–1828,http://dx.doi.org/10.1016/B978-0-444-63577-8.50149-2.

25] Z. Li, G. Tan, Y. Li, Fault diagnosis based on improved kernel Fisherdiscriminant analysis, J. Softw. 7 (2012) 2657–2662, http://dx.doi.org/10.4304/jsw.7.12.

26] H.-W. Cho, Identification of contributing variables using kernel-based

discriminant modeling and reconstruction, Expert Syst. Appl. 33 (2007)274–285, http://dx.doi.org/10.1016/j.eswa.2006.05.010.

27] Z.B. Zhu, Z.H. Song, Fault diagnosis based on imbalance modified kernel Fisherdiscriminant analysis, Chem. Eng. Res. Des. 88 (2010) 936–951, http://dx.doi.org/10.1016/j.cherd.2010.01.005.

[

puting 61 (2017) 959–972

28] Z.B. Zhu, Z.H. Song, A novel fault diagnosis system using pattern classificationon kernel FDA subspace, Expert Syst Appl. 38 (2011) 6895–6905, http://dx.doi.org/10.1016/j.eswa.2010.12.034.

29] X. Bin He, Y.P. Yang, Y.H. Yang, Fault diagnosis based on variable-weightedkernel Fisher discriminant analysis, Chemom. Intell. Lab. Syst. 93 (2008)27–33, http://dx.doi.org/10.1016/j.chemolab.2008.03.006.

30] C.K. Lau, K. Ghosh, M.A. Hussain, C.R. Che Hassan, Fault diagnosis of tennesseeeastman process with multi-scale PCA and ANFIS, Chemom. Intell. Lab. Syst.120 (2013) 1–14, http://dx.doi.org/10.1016/j.chemolab.2012.10.005.

31] A.H. Zamanian, A. Ohadi, Gear fault diagnosis based on Gaussian correlation ofvibrations signals and wavelet coefficients, Appl. Soft Comput. J. 11 (2011)4807–4819, http://dx.doi.org/10.1016/j.asoc.2011.06.020.

[32] A.C. Adewole, R. Tzoneva, Distribution network fault detection and diagnosis using wavelet energy spectrum entropy and neural networks, Int. Rev. Electr. Eng. 9 (2014) 165–173, http://dx.doi.org/10.1016/j.asoc.2016.05.013.

[33] H. Sun, Z. He, Y. Zi, J. Yuan, X. Wang, J. Chen, S. He, Multiwavelet transform and its applications in mechanical fault diagnosis – A review, Mech. Syst. Signal Process. 43 (2014) 1–24, http://dx.doi.org/10.1016/j.ymssp.2013.09.015.

[34] V. Vapnik, Statistical Learning Theory, Springer Science & Business Media, 1998.

[35] Z. Yin, J. Hou, Recent advances on SVM based fault diagnosis and process monitoring in complicated industrial processes, Neurocomputing 174 (2016) 643–650, http://dx.doi.org/10.1016/j.neucom.2015.09.081.

[36] S. Ekici, Support Vector Machines for classification and locating faults on transmission lines, Appl. Soft Comput. J. 12 (2012) 1650–1658, http://dx.doi.org/10.1016/j.asoc.2012.02.011.

[37] K. Sun, G. Li, H. Chen, J. Liu, J. Li, W. Hu, A novel efficient SVM-based fault diagnosis method for multi-split air conditioning system’s refrigerant charge fault amount, Appl. Therm. Eng. 108 (2016) 989–998, http://dx.doi.org/10.1016/j.applthermaleng.2016.07.109.

[38] Ł. Jedliński, J. Jonak, Early fault detection in gearboxes based on support vector machines and multilayer perceptron with a continuous wavelet transform, Appl. Soft Comput. J. 30 (2015) 636–641, http://dx.doi.org/10.1016/j.asoc.2015.02.015.

[39] H. Keskes, A. Braham, Z. Lachiri, Broken rotor bar diagnosis in induction machines through stationary wavelet packet transform and multiclass wavelet SVM, Electr. Power Syst. Res. 97 (2013) 151–157, http://dx.doi.org/10.1016/j.epsr.2012.12.013.

[40] P. Konar, P. Chattopadhyay, Bearing fault detection of induction motor using wavelet and Support Vector Machines (SVMs), Appl. Soft Comput. J. 11 (2011) 4203–4211, http://dx.doi.org/10.1016/j.asoc.2011.03.014.

[41] P.K. Kankar, S.C. Sharma, S.P. Harsha, Fault diagnosis of ball bearings using continuous wavelet transform, Appl. Soft Comput. J. 11 (2011) 2300–2312, http://dx.doi.org/10.1016/j.asoc.2010.08.011.

[42] K. Feng, Z. Jiang, W. He, B. Ma, A recognition and novelty detection approach based on Curvelet transform, nonlinear PCA and SVM with application to indicator diagram diagnosis, Expert Syst. Appl. 38 (2011) 12721–12729, http://dx.doi.org/10.1016/j.eswa.2011.04.060.

[43] J.-D. Wu, C.-C. Hsu, G.-Z. Wu, Fault gear identification and classification using discrete wavelet transform and adaptive neuro-fuzzy inference, Expert Syst. Appl. 36 (2009) 6244–6255, http://dx.doi.org/10.1016/j.eswa.2008.07.023.

[44] C.-C. Hsu, M.-C. Chen, L.-S. Chen, Integrating independent component analysis and support vector machine for multivariate process monitoring, Comput. Ind. Eng. 59 (2010) 145–156, http://dx.doi.org/10.1016/j.cie.2010.03.011.

[45] E. Dogantekin, A. Dogantekin, D. Avci, An expert system based on Generalized Discriminant Analysis and Wavelet Support Vector Machine for diagnosis of thyroid diseases, Expert Syst. Appl. 38 (2011) 146–150, http://dx.doi.org/10.1016/j.eswa.2010.06.029.

[46] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Comput. Chem. Eng. 17 (1993) 245–255.

[47] L.H. Chiang, E.L. Russell, R.D. Braatz, Fault Detection and Diagnosis in Industrial Systems, Springer-Verlag, London, 2001.

Comput. Chem. Eng. 19 (1995) 321–331.

[49] X. Bin He, W. Wang, Y.P. Yang, Y.H. Yang, Variable-weighted Fisher discriminant analysis for process fault diagnosis, J. Process Control. 19 (2009) 923–931, http://dx.doi.org/10.1016/j.jprocont.2008.12.001.