
UNIVERSITY OF CINCINNATI

Date: 10/28/05

I, Ranganath Kothamasu,

hereby submit this work as part of the requirements for the degree of:

Doctor of Philosophy

in:

Industrial Engineering

It is entitled:

Intelligent Condition Based Maintenance - A Soft Computing Approach to System Diagnosis and Prognosis

This work and its defense approved by:

Chair: Samuel H. Huang

Ali Minai

Ernest Hall

Sam Anand



Intelligent Condition Based Maintenance: A Soft Computing Approach

to System Diagnosis and Prognosis

A Dissertation submitted to the

Dept of Mechanical, Industrial and Nuclear Engineering

University of Cincinnati

In Partial fulfillment of the

Requirements for the degree of

Doctor of Philosophy

2005

By

Ranganath Kothamasu

Committee Members:

Dr. Samuel Huang

Dr. Sam Anand

Dr. Ernest Hall

Dr. Ali Minai



Abstract

Maintenance is the set of activities performed on a system to sustain it in operable

condition while Condition Based Maintenance (CBM) refers to the practice of triggering

these activities as necessitated by the condition of the target system. CBM thus entails the

process of diagnosis (of the target system) and timely identification of incipient or

existing failures popularly known as Failure Detection and Identification (FDI). FDI has

been given due research focus; however there is a dearth of autonomous yet interactive

decision making tools that would perform diagnosis and prognosis under the precepts of

CBM in a guided environment.

The development of such an architecture along with the tools necessary for

decision making in the realm of condition based maintenance constitutes the focus of this

research. The architecture and the tools developed in this research encompass the model

based approach to FDI. These tools are built on Neuro-Fuzzy (NF) paradigms as they

offer many advantages in the form of accuracy, adaptability and lucidity compared to

other parametric and non-parametric approaches. Along with the development of an NF

algorithm, suitable evaluation criteria are also explored and developed to gauge the

applicability and efficiency of the developed models. Intelligent Condition Based

Maintenance (ICBM) thus refers to the creation of adaptive and robust FDI models based

on a model based architecture and their subsequent validation using suitable evaluation

criteria. The efficiency and robustness of these ICBM tools are demonstrated by applying

them in several scenarios, simulated as well as real-world.


Acknowledgements

I am highly indebted to Dr. Samuel Huang for his guidance throughout my

dissertation. He has encouraged and helped me in numerous ways to accomplish this

research effort. I am also grateful to Dr. Sam Anand, Dr. Ernest Hall and Dr. Ali Minai

for being a part of my dissertation committee. Dr. Sam Anand has always been a source

of inspiration and I am grateful to him for giving me this opportunity to do doctoral

research. It was through interactions with Dr. Bruce Shultes that I was able to refine my

knowledge in statistical techniques which has been of great help to my research. Dr.

Ernest Hall is an excellent mentor and it was through his critique and recommendations

that I was able to define the scope and content of my dissertation. It was through

discussions with Dr. Ali Minai that I have gained insights into the workings of intelligent

systems which are the primary focus of my dissertation. I would like to specifically

acknowledge my family (parents, my brother and sister-in-law) who were my primary

drivers and source of inspiration and fortitude from the inception to conclusion of this

research effort. Ms. Vinodha Sadasivam stood by me through thick and thin, and has in

many ways guided this effort to its conclusion. I would also like to thank my colleagues

Mr. Kanthi Muthiah, Mr. Nuo Xu and Mr. Saurabh Dwivedi for their constant support. A

special thanks to Dr. Jun Shi, whose efforts into making my dissertation foolproof have

been a tremendous help.


Table of Contents

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

INTRODUCTION

LITERATURE REVIEW
    2.1. SYSTEM MAINTENANCE PARADIGMS
    2.2. SYSTEM MAINTENANCE - TOOLS AND TECHNIQUES
        2.2.1. Reliability Based Maintenance
        2.2.2. Model Based Approach to FDI
        2.2.3. Signal Based FDI
        2.2.4. Statistical FDI / Maintenance

INTELLIGENT CONDITION BASED MAINTENANCE (ICBM) - CONCEPTUAL DEVELOPMENT
    3.1. ICBM ARCHITECTURE DEVELOPMENT
    3.2. ICBM MODELING PARADIGM

INTELLIGENT CONDITION BASED MAINTENANCE - MODEL DEVELOPMENT AND VALIDATION
    4.1. ADAPTIVE MAMDANI FUZZY MODEL (AMFM)
        4.1.1. Architecture Initialization
        4.1.2. Rule Tuning
    4.2. EVALUATION CRITERIA
        4.2.1. Validation of Model Precision
            4.2.1.1. Function Approximation
            4.2.1.2. Classification
        4.2.2. Validation of Model Legibility

INTELLIGENT CONDITION BASED MAINTENANCE - CASE STUDIES
    5.1. SPINDLE BEARING FAILURE DIAGNOSIS
        5.1.1. Model Development & Benchmarking
        5.1.2. Summary
    5.2. HARD TURNING
        5.2.1. Flank Wear and Force Prediction
        5.2.2. Simulated Failures
            5.2.2.1. Bearing Wear
            5.2.2.2. Fixture Misalignment
        5.2.3. Model Development
        5.2.4. Model Evaluation
            5.2.4.1. Tool replacement model
            5.2.4.2. Failure Detection and Diagnosis
        5.2.5. Summary
    5.3. ENGINE DIAGNOSIS
        5.3.1. Feature Extraction
        5.3.2. Data Assimilation
        5.3.3. Model Development

CONCLUSIONS AND FUTURE RESEARCH
    6.1. CONCLUSIONS
    6.2. FUTURE RESEARCH

REFERENCES

APPENDIX I
    A1.1. FEATURE EXTRACTION

APPENDIX II
    A2.1. MODEL EXTENSION


List of Figures

Figure 1.1. Bathtub curve depicting failure rate of equipment (Stamatis, 1995)
Figure 2.1. Taxonomy of maintenance philosophies (Kothamasu et al, 2004)
Figure 2.2. Flow of model based approaches (Simani et al, 2003)
Figure 3.1. ICBM architecture
Figure 4.1. Polygonal approximation to COG
Figure 4.2. Adaptive Mamdani Fuzzy Model (AMFM)
Figure 4.3. Box and whisker plot of MSE values
Figure 4.4. Box and whisker plot of MSE values
Figure 4.7. Final membership functions (a) Input 1 (b) Input 2
Figure 5.1. Response surfaces (a) Regression (b) Neural Network (c) ICBM
Figure 5.2. Distribution of responses (a) Regression (b) Neural Network (c) ICBM
Figure 5.3. Response surfaces using classification scheme (a) Traditional (b) ICBM
Figure 5.4. Decision boundaries (a) Traditional model (b) ICBM model
Figure 5.5. Box plot of pattern distances from linear and ICBM models
Figure 5.6. Effect of bearing wear on cutting force (Fz)
Figure 5.7. Effect of misalignment in Y-direction
Figure 5.8. Force vs time (a) Force at different speeds (b) Force at different feeds
Figure A1.1. Spikes in the data belonging to state X
Figure A1.2. Signals with varying active frequency components


List of Tables

Table 2.1. Maintenance tools and techniques
Table 4.1. Definition of the traditional criteria used in model evaluation
Table 4.2. Evaluation criteria for the function approximation problem
Table 4.3. Confidence intervals from pair-wise Hochberg and Tamhane test
Table 4.4. MSE values of the models at different noise levels
Table 4.5. Classes in the Ecoli dataset
Table 4.6. AIC values of networks developed for classifying the Ecoli dataset
Table 4.7. Confidence intervals for difference in error
Table 4.8. Error proportions when simulated in noisy environment
Table 4.9. KL distance matrix for X1
Table 4.10. KL distance matrix for X2
Table 5.1. NIST data on spindle bearings
Table 5.2. Range of features in the training dataset
Table 5.3. Range of response of various models
Table 5.4. Pattern distances from each model
Table 5.5. Precision of models from various techniques
Table 5.6. Sample false alarm from regression model
Table 5.7. Actual AMFM and regression outputs
Table 5.8. AMFM vs. regression in presence of noise (tool replacement)
Table 5.9. AMFM vs. regression, extrapolating in presence of noise (tool replacement)
Table 5.10. AMFM vs. regression in presence of noise (failure mode determination)
Table 5.11. AMFM vs. regression, extrapolating in presence of noise
Table 5.12. Features in training data set
Table 5.13. Features in testing data set
Table 5.14. Results from model using 4 rules
Table 5.15. Results from the enhanced model


Chapter I

Introduction

The oldest and most common maintenance and repair strategy is "fix it when it

breaks". The appeal of this approach is that no analysis or planning is required. The

problems, however, are reduced availability and high unscheduled downtime

caused by unanticipated breakdowns that affect the overall performance of the system.

Availability in this sense has a serious impact on organizational agility, especially in

organizations that implement efficiency improvement strategies such as Just In Time (JIT) and

Material Resource Planning (MRP) (Stamatis, 1995).

Quality is increasingly seen as a motivation for improved maintenance

management as the link between quality, process/equipment control and productivity

improvement becomes increasingly apparent (Ben-Daya & Duffua, 1995). Another

compelling but less frequently addressed justification for maintenance is safety and environmental

preservation, which assumes a highly significant role as safety and environmental laws become

more stringent. Since operational hazards and accidents lead to enormous legal

expenses, inattention to these issues is no longer affordable (Rao, 1996).

Although the above motivational factors have direct economic impacts, efficient

maintenance on its own has economic objectives (Saranga & Knezevic, 2000). Though

the return on investment is highly dependent on the specific industry and the equipment

involved, a survey (Rao, 1996) states that an investment in monitoring of between

$10,000 and $20,000 results in savings of $500,000 a year. Across many

industries, 15-40% of manufacturing costs are typically attributable to maintenance


activities. In the current competitive marketplace, maintenance management plays an

increasingly important role in combating competition by reducing equipment downtime

and associated costs and unscheduled disruptions (Abdulnour et al, 1995).

These insights motivated the development of various paradigms such as Total

Productive Maintenance (Nakajima, 1998), which aims at maximizing equipment

efficiency, and Terotechnology (Husband, 1978), which offers a much broader perspective

including the supply (to the system), engineering and market modules of a system. These

paradigms prescribe predictive maintenance over reactive or simple time-based

maintenance.

Condition Based Maintenance (CBM) has evolved from the above practices and

it aims at continuous monitoring/assessment of the target system and development of a

maintenance strategy based on the assessed condition. CBM offers many advantages over

a traditional time based strategy, which is typically modeled around the popular bath-tub

curve depicted in Figure 1.1. Time based maintenance tends to be too conservative

resulting in very high maintenance costs. The bath-tub curve fails to acknowledge the

complex interactions between the different components of a system and is especially not

suited to discrete manufacturing systems with frequent changes in work content and

schedule. CBM on the other hand is highly generic and can be used to generate efficient

maintenance strategies.

CBM being a proactive process requires the development of a predictive model

that can trigger the alarm for maintenance. In many instances this model could be loosely

based on analytical criteria developed on the signals collected from the system. In a much

more sophisticated form it would necessitate the development of prognostic and


diagnostic models that can predict the future state of a system besides diagnosing the

current state. Such models can be developed using the process data, its history and

several other factors like future schedule via the modeling techniques belonging to the

parametric or non-parametric literature. These models have to be precise and robust

besides possessing some form of autonomous modeling capabilities.
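
As a minimal sketch of the simpler case described above, an analytical criterion on a collected signal can be as plain as a threshold on a windowed statistic. The example assumes a vibration-like signal, a root-mean-square (RMS) index and a fixed alarm threshold; the function names, window size and threshold value are illustrative assumptions and are not taken from this research.

```python
import math


def rms(window):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def maintenance_alarms(signal, window_size, threshold):
    """Return the start index of every non-overlapping window whose RMS
    exceeds the (assumed, user-chosen) alarm threshold."""
    alarms = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        if rms(signal[start:start + window_size]) > threshold:
            alarms.append(start)
    return alarms


# Hypothetical signal: low-amplitude samples followed by high-amplitude samples.
signal = [1.0, -1.0] * 4 + [3.0, -3.0] * 4
print(maintenance_alarms(signal, window_size=4, threshold=2.0))
```

A prognostic model of the more sophisticated kind would replace the fixed threshold with a prediction of the future state, which a simple criterion like this one cannot provide.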

Figure 1.1. Bathtub curve depicting failure rate of equipment (Stamatis, 1995)

In this research the focus is on developing soft computing techniques with the

above qualities. The objective also included the development of a human-friendly modeling

system that can assume the role of a decision making aid in the CBM arena. The

motivation was that such systems can be subjected to continuous improvement (or plain

modification) by interacting with the users of such systems. Neuro-Fuzzy models were

found to possess these qualities and hence the research aim is to develop robust yet lucid

proactive models under the neuro-fuzzy constructs. The following constitute the specific

objectives and deliverables of this research.


1. Development of a generic architecture or algorithmic approach to Condition

Based Maintenance systems in a manufacturing setup.

2. Development of an accurate, robust and adaptive algorithm that can create

transparent models.

3. Development of suitable evaluation and validation criteria for these models.

4. Development of an easy to use software system that incorporates the above

algorithms and can function as an aid to the decision making process in

maintenance scenarios.

5. Demonstration of the utility of this software system (and algorithms) by applying

it to maintenance scenarios.


Chapter II

Literature Review

System maintenance in a multitude of forms has received due focus ever since

mass manufacturing was adopted. Several principles and several tools ranging from

expensive hardware to smart software systems have been created in order to establish,

automate and execute the different tasks involved in this domain. This section reviews the

various maintenance paradigms and then delves into the different tools and techniques

used to generate maintenance solutions.

2.1. System Maintenance Paradigms

System maintenance over the years has significantly evolved in terms of

governing philosophy, implementation, technology, analytical techniques and objectives.

This evolution has an interesting chronological perspective as elaborated by Kinclaid

(1987). A brief taxonomy of the various philosophies is given in Figure 2.1.

Maintenance philosophies can be broadly classified as reactive and proactive.

Reactive or Unplanned maintenance is a legacy practice where maintenance is done only

after the manifestation of the defect, breakdown or stoppage. It is appropriate in facilities

where the installed machinery is minimal and the plant is not totally dependent on the

reliability of any individual machine (Jones, 1995). It might also be appropriate when the

failure rate is minimal and failure does not result in serious cost setbacks or safety

consequences. Breakdown or Corrective maintenance and Emergency maintenance

belong to this category.


a) Corrective maintenance is defined as the activity carried out after a failure has

occurred and is intended to restore an item to a state in which it can perform its

required function (Knezevic, 1987; Saranga & Knezevic, 2000; Gopalan & Kumar,

1995).

b) Emergency maintenance is defined as the maintenance activity that must be

carried out immediately to avoid serious consequences. Constraints are applied

on the frequency of maintenance with the objective of cost optimization. These

constraints are defined in terms of the immediacy of the required action and the

possible repercussions of non-maintenance.

Figure 2.1. Taxonomy of maintenance philosophies (Kothamasu et al, 2004)

Proactive or Planned maintenance on the other hand executes the necessary tasks

prior to any breakdown and it can be further classified as preventive and predictive

maintenance based on the form of maintenance schedule. In many situations, better

utilization of resources is seen compared to reactive strategies (Mobley, 1990).


Preventive maintenance is the strategy organized to perform maintenance at

predetermined intervals to reduce the probability of failure or performance degradation. It

can be classified into constant interval, age based or imperfect maintenance.

a) Constant interval maintenance: As the name suggests it is done at fixed intervals

(in addition to any maintenance prompted by failure). Intervals are selected to

balance the high risk of failure that comes with long intervals against the high

preventive maintenance costs that come with short intervals (Jardine, 1987).

b) Age based maintenance: In this strategy, preventive maintenance at fixed intervals

is carried out only after the system has reached a specific age, say t. If the

system fails prior to t, maintenance action is taken and the next maintenance is

scheduled t units later. By deferring initiation, this strategy reduces the number

of maintenance intervals compared to constant interval maintenance.

c) Imperfect maintenance: In the above two schemes, the system is assumed to be

restored to its original condition after a preventive maintenance. However it may

be the case that the condition of the system is in between good (original) and bad

(failure). This is the premise of imperfect maintenance strategies which take into

consideration the uncertainty of the current state of the equipment while

scheduling future activities.

The predetermined interval is estimated from the failure rate distribution that is

constructed from historical data extracted from the system or provided by the supplier of

individual components in the system. The estimation of distribution and the interval

determination are extensively researched in (Rao, 1992).
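
To make the link between a failure distribution and a preventive interval concrete, the following is a minimal sketch under stated assumptions: time to failure follows a Weibull distribution, and the interval is chosen by minimizing the textbook age-replacement cost rate (expected cost per cycle divided by expected cycle length). The Weibull parameters, the two cost figures and all function names are hypothetical; this is not the estimation procedure of the cited work.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a unit survives beyond time t under a Weibull model."""
    return math.exp(-((t / scale) ** shape))


def expected_cost_rate(interval, shape, scale, cost_preventive, cost_failure, steps=2000):
    """Long-run cost per unit time when the unit is maintained at `interval`
    or at failure, whichever comes first (assumed age-replacement model)."""
    h = interval / steps
    # Trapezoidal estimate of the expected cycle length: integral of R(t) on [0, interval].
    area = 0.5 * (weibull_reliability(0.0, shape, scale)
                  + weibull_reliability(interval, shape, scale))
    for i in range(1, steps):
        area += weibull_reliability(i * h, shape, scale)
    area *= h
    survive = weibull_reliability(interval, shape, scale)
    expected_cost = cost_preventive * survive + cost_failure * (1.0 - survive)
    return expected_cost / area


def best_interval(candidates, shape, scale, cost_preventive, cost_failure):
    """Pick the candidate interval with the lowest expected cost rate."""
    return min(candidates,
               key=lambda T: expected_cost_rate(T, shape, scale, cost_preventive, cost_failure))


# Hypothetical wear-out behaviour (shape > 1) with failures ten times costlier
# than preventive actions.
candidates = list(range(10, 301, 10))
print(best_interval(candidates, shape=3.0, scale=100.0, cost_preventive=1.0, cost_failure=10.0))
```

With a constant failure rate (shape equal to 1) the same search selects the longest candidate, which reflects the familiar result that scheduled preventive action brings no benefit when the failure rate does not increase with age.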


Predictive and preventive maintenance differ in the scheduling of maintenance. In

the latter it is performed on a fixed schedule whereas in the former it is adaptively

determined. Predictive maintenance can be classified into Condition Based Maintenance

and Reliability Centered Maintenance.

a) Condition Based Maintenance (CBM): This is a decision making strategy where

the decision to perform maintenance is reached by observing the condition of

the system and/or its components. The condition of a system is quantified by

parameters that are continuously monitored and are system or application specific.

For instance, in the case of rotary systems a vibration characteristic or index is an

appropriate choice. The advantage of this approach is immediately apparent as the

decision is made on depictive and corroborative data that actually reflects the state

of the system. It is highly presumptive to assume that the state of a system would

always follow the same operational curve, which is the underlying assumption in

preventive maintenance. In an industrial or production environment, the system is

exposed to random disturbances, which cause deviations in the operational

characteristics. Hence it is highly justified to monitor the condition of the system and

base the maintenance decision on the state of the system.

Some of the advantages of CBM are prior warning of impending failure and

increased precision in failure prediction. It also aids in diagnostic procedures as

it is relatively easy to associate the failure to specific components through the

monitored parameters. It also can be linked to adaptive control thus facilitating

process optimization. The disadvantage, of course, is the necessity to install and


use monitoring equipment and to develop some level of modeling or decision-

making strategy.

b) Reliability Centered Maintenance (RCM): This approach utilizes reliability

estimates of the system to formulate a cost-effective schedule for maintenance.

RCM was originally developed in the aircraft industry. For aircraft and other

safety-related applications, cost-effectiveness is balanced with safety and

availability with the goal of minimizing costs and downtime but eliminating the

chance of a failure (Moss, 1985). RCM is a union of two tasks, one of which is to

analyze and categorize failure modes based on the effects of the failure on the

system and the other is to assess the impact of maintenance schedules on

reliability. The failure analysis starts with the identification of all the failure

modes and proceeds with categorization of these failure modes based on the

consequences of each failure. The results of this study comprise a Failure Modes

and Effects Analysis (FMEA).

Usually the consequences of failure are Operational, Environmental/Safety or

Economic (Rao, 1996). Once the effects have been identified, the decision logic

algorithms prioritize the effects. These algorithms tend to be industry specific as

the constraints and requirements of each industry vary considerably.

Though RCM-based maintenance intervals were determined similarly to planned

or scheduled maintenance, condition monitoring techniques are increasingly being used

to determine the optimum interval (Kumar & Granholm, 1990; Sandtorv, 1991). Hence

though originally a preventive maintenance technique, RCM is graduating into predictive


maintenance. A good introduction to RCM is given in (Moubray, 1997; Wireman, 1998;

Monderres, 1993; Jones, 1995).

2.2. System Maintenance - Tools and Techniques

Maintaining the health of a system is a complex task that requires in-depth

analysis of the target system, principles involved, their applicability and the

implementation strategies. Table 2.1 below lists methods, analysis and modeling tools,

and techniques (for data/condition extraction).

Table 2.1. Maintenance tools and techniques

Methods and their tools:

Reliability based maintenance: Parameter estimation techniques; Numerical analysis techniques; Markov chains

Model Based FDI: State space parameter estimation; Artificial neural networks; Knowledge based systems; Fuzzy inference systems; Neuro-Fuzzy systems

Signal Based FDI: Fourier analysis; Wavelet analysis; Wigner-Ville analysis; Diagnostic parameter analysis

Statistical FDI / Maintenance: Bayesian estimation / reasoning techniques; Markov chains; Hidden Markov models; Proportional Hazards models

Measurement techniques: Vibration analysis; Thermography; Acoustic emission; Wear/debris monitoring; Lubricant analysis; Process measurements

However, it has to be noted that most applications are a combination of the listed

methods and techniques (tools) and the list is far from exhaustive. For instance because

of their generalized applicability, parameter estimation techniques such as regression,

maximum likelihood and expectation maximization can be used in all the listed categories.


There is also a close association between reliability based maintenance and statistical

maintenance techniques. A high level explanation of these methods is given in this

section.

2.2.1. Reliability Based Maintenance

A popular approach to the maintenance of complex systems is through estimating

the reliability of the system. Traditionally, reliability is estimated from the time-to-failure

distributions of the system. The most striking drawback of such an approach is that

multiple failure mechanisms often interact with each other in perhaps unknown ways and

this affects the degradation rate of the system, causing it to deviate considerably from the

predicted failure distribution. An alternative approach much similar to condition based

maintenance has been proposed by Knezevic (1987) known as the Relevant Condition

Parameter (RCP) based approach. This approach is based on identifying RCPs (defined

for a process) that quantify or reflect a particular failure mechanism. Using these RCPs

the reliability of a system is defined as the probability that RCP lies within prescribed

limits as given in (2.1).

$$R(t) = P\left(RCP_{in} < RCP(t) < RCP_{lim}\right) \qquad (2.1)$$

$RCP_{in}$ is the initial state of the system and $RCP_{lim}$ is the limiting value where the system inevitably fails. When the failure mechanisms are dependent, it is possible to model the system using Markov chains as shown in (Saranga & Knezevic, 2000). Once the Markov chain is formulated representing the different states of the system, the probability of the system being in the upstate A(t) can be calculated as a sequence of integrals of the form given in (2.2) (Gopalan & Kumar 1995).

)2.2()u(g)u(A)xt(w)t(A

t

0

These integrals are further solved by using quadrature techniques such as the trapezoidal

approximation.
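As an illustration of the quadrature step, a minimal sketch of the composite trapezoidal rule is given below. The helper name `trapezoid` and the integrand (an exponential density with an assumed rate of 0.5) are illustrative assumptions only; they are not the integrand of (2.2).

```python
import math


def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return h * total


# Illustrative integrand only: an exponential density with an assumed rate.
rate = 0.5
approx = trapezoid(lambda u: rate * math.exp(-rate * u), 0.0, 4.0, 400)
exact = 1.0 - math.exp(-rate * 4.0)  # closed form, used here only as a check
```

The error of the rule shrinks with the square of the step size, so halving the step roughly quarters the error for a smooth integrand.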

2.2.2. Model Based Approach to FDI

The model based approach to FDI is based on analytical redundancy or functional

redundancy, meaning dissimilar signals are compared and evaluated to identify the

existing faults in the system or its components. This comparison is between the measured

signal and the estimated values generated by the mathematical model of the system.

Figure 2.2 gives a general structure of model-based approaches.

Residual generation is the most important element of a model-based approach and

the techniques involved in model based diagnosis differ in the generation and definition

of a residual. For instance in some cases it is the discrepancy of output (from the system)

estimation and in some cases it is the deviation of the system's parameters from their

expected values. It is imperative that the generated residual be dependent only on the

faults in the system and not on its operating state. Several techniques that have been

proposed in the literature for this residual generation are a modification or improvement

of the following three principles.

Observer-Based approaches (Beard, 1971; Ding & Frank, 1990; Patton & Chen,

1997; Wilsky 1976).

Parameter estimation technique (Kiramura, 1980; Isermann, 1993).

Parity space approach (Chow & Wilsky, 1984; Deckert et al 1977).


Figure 2.2. Flow of model based approaches (Simani et al, 2003)

Observer-based approaches rely on estimating the outputs from either

Luenberger observers or Kalman filters (Simani et al, 2003). The approach is centered on

the idea that the state estimation error is zero in a fault free environment and it is not so

otherwise. Dedicated Observer, Fault Detection Filters and Output Observers are the

three important subcategories that fall under this approach.
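The observer idea can be sketched on a scalar discrete-time system. The plant coefficients, the observer gain and the additive sensor bias used as the fault are all assumptions chosen for illustration; the sketch is noise free and is not one of the cited observer designs.

```python
def simulate_residuals(steps=100, fault_start=50, fault_bias=0.5):
    """Residuals of a Luenberger-type observer on an assumed scalar plant."""
    a, b = 0.9, 1.0   # assumed plant: x[k+1] = a*x[k] + b*u[k], y[k] = x[k] + fault
    gain = 0.5        # assumed observer gain, chosen so that |a - gain| < 1
    x = x_hat = 0.0   # plant state and observer estimate start from the same value
    residuals = []
    for k in range(steps):
        u = 1.0                                        # constant input
        fault = fault_bias if k >= fault_start else 0.0
        y = x + fault                                  # measured output
        r = y - x_hat                                  # residual: measurement minus estimate
        residuals.append(r)
        x_hat = a * x_hat + b * u + gain * r           # observer update
        x = a * x + b * u                              # plant update
    return residuals


residuals = simulate_residuals()
```

In this sketch the residual is zero while the sensor is healthy and stays away from zero once the bias appears, which is the property the detection logic relies on.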

Parameter estimation techniques analyze the failure mechanisms based on their

influence on the system parameters (of the model). Hence this approach is centered on

generating online estimates of the parameters and analyzing the changes in the estimates.

In the Equation Error methods, which analyze the parameters directly, least squares estimation is quite often used; in the Output Error methods, which compute the error in the output, numerical optimization techniques are often used.

Parity Space Relations check for parity of the measurements from the process,

generating a residual by comparing the model and the process behavior. This approach


has been shown to be in close correlation with the observer-based techniques (Patton &

Chen, 1994).

As stated before the model-based FDI approaches are based on identifying

(constructing) models that mimic the system. These models have to be extremely robust

to real world nuances such as noise, etc. and to be effective the model based FDI should

learn to differentiate between these uncertainties and the changes due to failures. Another

challenge is to identify not just the existing faults but the incipient faults which may not

(yet) significantly affect the system.

2.2.3. Signal Based FDI

Signal-based FDI approaches focus on detecting the changes or variations in a

signal and subsequently diagnosing (identifying) the change. Change detection in a

system has been extensively explored in the literature and there are quite a few effective

techniques that have integrated various ideas from parametric modeling principles (in

statistics) with signal-based principles such as spectral analysis. A good summary is

given in (Basseville, 1988). Some of the techniques are formulated around model-based

approaches, i.e., generation of residuals (deviation from nominal signals) and diagnosis

of the residuals. Some of the detection algorithms are modeled in the form of hypothesis

testing involving a change (or jump) in the mean (known or unknown) such as the

Generalized Likelihood ratio test and the Page-Hinkley stopping rule. Some real time

algorithms are based on computing distance measures between local and global models

(differentiated based on their time windows) and some popular measures are the

Euclidean distance between AR (Auto Regressive) coefficients, Cepstral distance,

Chernoff distance etc.
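A minimal sketch of a Page-Hinkley style detector for an upward jump in the mean is shown below, in the form in which the rule is commonly stated. The tolerance `delta`, the `threshold` and the synthetic step signal are assumed values for illustration.

```python
def page_hinkley(samples, delta=0.05, threshold=5.0):
    """Return the zero-based index at which an upward mean shift is flagged, or None."""
    mean = 0.0        # running mean of the samples seen so far
    cumulative = 0.0  # cumulative deviation from the running mean, less the tolerance
    minimum = 0.0     # smallest cumulative value seen so far
    for n, x in enumerate(samples, start=1):
        mean += (x - mean) / n
        cumulative += x - mean - delta
        minimum = min(minimum, cumulative)
        if cumulative - minimum > threshold:
            return n - 1
    return None


# Synthetic signal: the mean jumps from 0 to 1 at sample 100.
signal = [0.0] * 100 + [1.0] * 100
alarm = page_hinkley(signal)
```

A larger `threshold` lowers the false alarm rate at the cost of a longer detection delay, which is the usual trade-off in sequential change detection.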


In recent years non-stationary signals have been modeled using wavelets instead of Fourier transforms because wavelets are localized in both scale and time. Two of the important uses of wavelets in FDI are data compression and feature extraction (Staszewski, 1998). Data compression, as the name suggests, refers to encoding the data (such as a vibration signal) in a compressed form; feature extraction, on the other hand, is identifying features within these encoded signals that would help identify the faults in the monitored systems. Once the wavelet transform is applied to the signal output, the coefficients are

analyzed for any variation from the normal signal. The identification of coefficients that

would substantiate a failure is a painstaking procedure though recently some techniques

such as genetic algorithms are being employed. These wavelets are predominantly used

for FDI in gears, as vibration analysis is quite effective for these domains (McFadden,

1994; Staszewski & Tomlinson, 1994).

Time-frequency analysis using Wigner-Ville Distribution (WVD) has proven to

be another effective tool for vibration analysis. It has proven to be quite effective in

situations where neither the time domain nor frequency domains can produce significant

patterns (Staszewski et al, 1997). The contour plots generated by WVD are visually

inspected for the failure features that indicate its progression and existence. Often these

plots are analyzed with the help of classification algorithms ranging from parametric

(statistical) to soft computing (neural networks, fuzzy inference systems).

Detection signal techniques are also used for FDI, where a detection signal is used

as an input to the system for a specific period of time and the diagnosis is based on the

behavior of the system during this period. Some interesting theories in the design and

implementation of detection signals are given in (Nikoukhah et al, 2000; Zhang 1989;


Kerestecioglu 1993; Kerestecioglu & Zarrop, 1994; Uosaki et al, 1984; Nikoukhah,

1998).

2.2.4. Statistical FDI / Maintenance

A vast number of applications also use Bayesian statistics and Bayesian parameter

estimation for FDI (Berec, 1998; Won & Modarres, 1998; Wu et al, 2001; Leung &

Ramanougli, 2000; Ray et al, 2001). Another important aspect is to identify the detection

(inspection) intervals, optimization of cost and replacement decision-making. Markov

chains seem to be increasingly used for optimizing maintenance strategies and some

algorithms are given in (Wang & Shueng, 2003; Hassan et al, 2002; Hassan et al, 2000;

Zhang & Zhao, 1999). Another interesting application is given by Bunks et al (2000)

using hidden Markov models.

Proportional Hazards Modeling (PHM) has also been used for reliability estimation and estimation of effects on failure rate ever since it was used by Feigl and Zelen (1965). Some interesting theories and applications related to PHM are reported

in (Jardine 1987; Kobaccy et al, 1997; Pena & Hollander, 1995).


Chapter III

Intelligent Condition Based Maintenance (ICBM) -

Conceptual Development

It is evident from the current state of the art in system maintenance applications that Condition Based Maintenance (CBM) is a highly general and efficient paradigm for generating maintenance solutions. It is possible to create an architecture that can be used to generate maintenance solutions using this paradigm. It is also possible to achieve a seamless integration of model based FDI algorithms with this architecture in order to create decision tools that aid users in their maintenance applications. This chapter deals with the creation of this architecture and its integration with the model based approach. Intelligent Condition Based Maintenance (ICBM) as such is the application of model based FDI using this generic architecture.

3.1. ICBM Architecture Development

The ICBM architecture specifies a methodical approach to model building for

maintenance applications. It specifies the various elements of model building from data

acquisition to model deployment. This architecture is depicted in figure 3.1.

The following are the various modules incorporated into the architecture.

1. Data Acquisition

2. Data Conditioning and Feature Extraction

3. Model Generation

4. Model Deployment


Figure 3.1. ICBM Architecture

Data Acquisition as the name suggests is the process of acquiring data from the

target system and its environment. This data includes information extracted from the

process, sensors monitoring the process and the environment. These are the typical forms

of data that go into the model building process.

Data Conditioning is an essential element, as raw data is seldom useful in its own form, especially if it comes from a sensor used in a vibration or acoustic emission application. Data conditioning comprises all the pre-processing typical of the model generation process in any domain. Feature extraction, on the other hand, can be thought of

as the extraction of useful information from the conditioned data which reflects the

condition of the target system. It is not essential for the features to directly correlate to

the condition and they could even be a quantification of documented or known effects of


the failures of interest. Several features can be extracted from the domains of signal

processing, time series analysis and diagnostic analysis. Some of these features are

documented in Appendix I.

Model Generation refers to the actual process of building a predictive model from

extracted data. This model would take the extracted features as inputs and give out

relevant outputs based on whether it is a prognostic or diagnostic application. A

diagnostic model would give information on the existence of any failure along with its

type. A prognostic model would give information on the expected state of the system as reflected by the condition parameters (features) or some other specified output typical of the application. For instance, it is typical to use indicators such as RMS (Root Mean Square) and Kurtosis in bearing wear applications. So in such an application, the diagnostic model would detect bearing failures while the prognostic model could estimate the

future RMS or Kurtosis values. The inputs to the model generation procedure can be

categorized as below.

Health Condition Indicator (as depicted by features)

Process Condition Indicator (as depicted by the process data)

Sampling/Prediction Indicator (as specified in the feature extraction process)

Model Robustness Indicator: although not an input in the conventional sense, it can be used to trigger the need to retrain, improve or extend an existing model, which is essential in real time applications
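The RMS and Kurtosis indicators mentioned above for bearing wear can be computed from a conditioned signal as sketched below. The function names and the synthetic sine and impulse signals are assumptions for illustration; kurtosis is taken here as the fourth central moment divided by the squared variance (not the excess kurtosis).

```python
import math


def rms(signal):
    """Root mean square of a sampled signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def kurtosis(signal):
    """Fourth central moment divided by the squared (population) variance."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((v - mean) ** 2 for v in signal) / n
    m4 = sum((v - mean) ** 4 for v in signal) / n
    return m4 / (m2 * m2)


# One full period of a unit sine, standing in for a healthy vibration signal.
sine = [math.sin(2.0 * math.pi * i / 1000) for i in range(1000)]

# The same signal with a single added impulse, standing in for an impact.
spiky = list(sine)
spiky[250] += 5.0
```

Kurtosis reacts strongly to isolated impulses, while RMS tracks the overall energy level, which is one reason the two are often used together.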

As is depicted in the architecture, the domain expert plays a significant role in the process of feature extraction and some parts of the model building process. The expert's knowledge can be efficiently used to identify the right set of features as well as generate the


necessary knowledge to create the models. This approach is consistent with the objective

to create a decision aid with certain amount of autonomy.

Model Deployment refers to the process of integrating the created model into the monitoring system as well as establishing the proper channels of communication with the various business functions like maintenance, production planning & control, quality control etc. The two outputs of the model generation process, namely the model itself and the process knowledge, are to be systematically integrated with these business functions.

Although all the modules pose interesting challenges this research effort is

concentrated on the model generation process and the various algorithms directly related

to the model generation process like its validation and evaluation.

3.2. ICBM Modeling Paradigm

As mentioned in the previous section, this research concentrates on the model

generation process and in this section we identify the modeling paradigm. As can be seen

from the architecture, the ICBM models belong to two classes of learning problems: function approximation and classification. Diagnosis is essentially a classification problem while prognosis is akin to function approximation.

These two classes of learning problems have been extensively studied for a long time in both the parametric and non-parametric arena and there are a multitude of

paradigms that address the issues involved. We start by listing the desirable

characteristics of an ICBM modeling algorithm.

Adaptive: The algorithm should be adaptable as it functions in a highly dynamic and

non-linear environment.


Flexible: The algorithm should be as generic as the architecture and should be a

universal approximator. It should also be flexible enough to incorporate various

forms of knowledge (data, heuristic and analytical) as provided by the domain

expert.

Lucid: The algorithm should be able to create models that are highly transparent.

This is essential as the model has to act as a decision aid and should also generate

useful and intelligible knowledge about the process. This is also essential as the

domain expert is tightly integrated with the model building process.

Robust: The algorithm should be able to create models that are robust enough to handle the demands of a real time application, such as noise and high dimensionality.

As mentioned previously several algorithms exist in the literature of the two

classes of learning problems. However it is necessary to identify the algorithm (or its

class) that possesses the desirable characteristics outlined above.

Parametric estimation methods are in general highly robust and theoretically can

estimate any system to the required accuracy. However, they require assumptions on the distribution of some modeling elements, for instance the data or the error. Several remedies such as transformation techniques do exist, but they are not amenable to automation. As stated in statistical learning theory, it is also not advisable to approach a solution via solving a harder problem such as density estimation (Cherkassky & Mulier, 1998).

Non-Parametric estimation methods such as neural networks do not require any

such assumptions and are capable of approximating any domain of problems. They are


also equipped with autonomous learning algorithms that can automatically retrain or regenerate a maintenance solution if necessitated. However, they tend to be quite opaque (black box) and it is often not possible to generate any qualitative knowledge about the approximated system. This is a major hindrance to the establishment of any form of knowledge transfer between the domain expert and the approximating system.

Soft computing algorithms also include Fuzzy Inference Systems (FIS) that can

be efficiently used as a bridge between the domain expert and the ICBM architecture.

These systems work on knowledge bases that are in an easily comprehensible IF-THEN format. However, this particular class of algorithms does not possess any form of automated learning, and hence requires a considerable amount of manual tuning in the generation of the solution.

Neuro-Fuzzy algorithms are an assimilation of these two forms of approximating

algorithms and are able to annul the disadvantages of the respective parts. These

algorithms are particularly adaptive, lucid and highly flexible. As they are essentially

fuzzy inference systems embedded into a neural network they are also robust. It is also

easy for a domain expert to interact with these algorithms. Since the knowledge is both in

a functional form (network) and generalized form (rule base), it is possible to integrate

with the other business functions mentioned above.

Based on the above discussion it is evident that neuro-fuzzy systems possess the

necessary qualities listed in the beginning of this section. In the following sections we

develop a neuro-fuzzy algorithm that meets the requirements.


Chapter IV

Intelligent Condition Based Maintenance - Model

Development and Validation

As observed in the previous chapter, neuro-fuzzy systems are ideal candidates to fulfill ICBM objectives. Neuro-fuzzy systems have also been used successfully for control system and FDI applications (Simani et al, 2003). However, it has to be noted that these applications aim only at function approximation, which defeats the purpose of using neuro-fuzzy tools.

Adaptive Neuro Fuzzy Inference System (ANFIS) (Jang, 1993) and Hybrid Fuzzy

Inference System (HyFIS) (Kim & Kasabov, 1999) are the two most popular neuro-fuzzy

connectionist systems that simulate a Sugeno and a Mamdani type FIS respectively. Both

the algorithms have been validated on various datasets and were shown to possess good

accuracy. However, they are not without their drawbacks in the ICBM context as

elucidated below.

Consider a domain described by a function $y = f(x_1, x_2)$.

A Mamdani type FIS in this domain would consist of rules of the form given below:

IF x1 is low AND x2 is medium THEN y is high

where low, medium and high are linguistic terms with functional forms such as Gaussian, sigmoid, etc., also known as membership functions.

A Sugeno type FIS in this domain would consist of rules of the form

IF x1 is low AND x2 is medium THEN $y = f_1(x_1, x_2)$


where low and medium are linguistic terms defined by membership functions, as before. The difference between the two FIS is the form of the consequents. In a Mamdani type FIS the output membership function can be defined independently of the premise parameters, while in a Sugeno type FIS each rule output is a function of the inputs.
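The difference between the two consequent forms can be sketched for a single rule as follows. The membership function centers and spreads, the output set "high" and the first-order Sugeno coefficients are hypothetical values chosen for illustration.

```python
import math


def gaussian(x, center, spread):
    """Gaussian membership function."""
    return math.exp(-0.5 * ((x - center) / spread) ** 2)


# Hypothetical membership functions on a normalized [0, 1] input range.
def low(x):
    return gaussian(x, 0.0, 0.2)


def medium(x):
    return gaussian(x, 0.5, 0.2)


def firing_strength(x1, x2):
    """Premise 'x1 is low AND x2 is medium', with AND taken as the minimum."""
    return min(low(x1), medium(x2))


def mamdani_rule(x1, x2):
    """Mamdani consequent: the fuzzy set 'high' on y, clipped at the firing strength.

    Returned as a function of y so that it can be aggregated and defuzzified later.
    """
    w = firing_strength(x1, x2)
    return lambda y: min(w, gaussian(y, 1.0, 0.2))  # 'high' is an assumed output set


def sugeno_rule(x1, x2, coeffs=(1.0, 2.0, 0.5)):
    """Sugeno consequent: a crisp function of the inputs, paired with the firing strength."""
    p, q, r = coeffs  # assumed first-order consequent y = p*x1 + q*x2 + r
    return firing_strength(x1, x2), p * x1 + q * x2 + r
```

The Mamdani rule yields a fuzzy set that remains readable as "y is high", whereas the Sugeno rule yields a number whose meaning is tied to the fitted coefficients.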

ANFIS mimics a sugeno type FIS and from the above it can be seen that it is

efficient for function approximation problems and is not particularly useful in classification

applications. Hence it is not appropriate for diagnosis applications and the knowledge

(rules) it extracts would be abstract for a domain expert as they are not entirely in a

linguistic format.

HyFIS on the other hand simulates a mamdani type FIS which is universally

applicable and hence can be used for prognosis as well as diagnosis applications.

However, it uses a defuzzification (process of generating crisp outputs from fuzzy

outputs) strategy that restricts the output membership functions to assume a gaussian

functional form (with centre and variance parameters). Although this does not hamper its ability to generate maintenance solutions, it is not possible for a domain expert to interact with the model in all situations (for instance, when output membership functions are non-Gaussian).

The aforementioned reasons have provided the motivation to formulate an easily

comprehensible neuro-fuzzy system. The next section elaborates such a neuro-fuzzy

system including its architecture and learning process.

4.1. Adaptive Mamdani Fuzzy Model (AMFM)

Adaptive Mamdani Fuzzy Model (AMFM) is a neuro-fuzzy algorithm that

simulates the mamdani type fuzzy inference system. The development of a diagnosis or


prognosis model using AMFM consists of two main tasks: Architecture Initialization and Rule Tuning.

4.1.1. Architecture Initialization

Typically a neuro-fuzzy system consists of five layers of neurons with each layer

(subsequent to the input layer) performing the following four operations of a fuzzy

inference system.

Fuzzification

Implication

Aggregation

Defuzzification

Most of the neuro-fuzzy systems differ in the final layer which corresponds to the

defuzzification operation. Centre of Gravity (COG) is the most popular defuzzification

operation and since its computation is analytically intractable, the connectionist systems

approximate the COG with an easily computable function. In AMFM, the COG is

approximated with the COG of the maximum polygonal area contained within the fuzzy

output area as shown in figure 4.1.


Figure 4.1. Polygonal approximation to COG

A typical AMFM structure would be as depicted in figure 4.2. The first layer represents the input parameters. Let N be the number of input parameters; then the first layer will have N nodes. Let $M_n$ denote the number of linguistic terms of input parameter n, $n = 1, 2, \dots, N$. Then the total number of nodes in the second layer, I, will be $I = \sum_{n=1}^{N} M_n$. A node n in the first layer is connected to only the $M_n$ nodes in the second layer that represent its corresponding linguistic terms. It merely passes the input value $x_n$ to the connected second layer nodes. A node i in the second layer has a Gaussian activation function

$$y_i^{(2)} = \exp\left(-\frac{1}{2}\left(\frac{x_i^{(2)} - c_i^{(2)}}{\sigma_i^{(2)}}\right)^2\right)$$

where $x_i^{(2)}$ is the input value passed to node i. The center $c_i^{(2)}$ and spread $\sigma_i^{(2)}$ can be initialized using the mean and standard deviation of the input parameter values within the cluster or interval that represents the particular linguistic term.

The number of third layer nodes, J, equals the number of rules. Each node, with its connections from the preceding nodes, represents a rule. Note that different nodes in this layer might represent the same concept (linguistic term of an output parameter). For example, in figure 4.2, say the input parameter $x_1$ has three linguistic terms, small, medium, and large, which correspond to the first three nodes in the second layer. The input parameter $x_n$ has two linguistic terms, small and large, which correspond to the last two nodes in the second layer. The first two nodes in the third layer both represent the concept "output $y_1$ is small". Then the first and second nodes in the third layer and their connections from the second layer nodes represent the rules "IF $x_1$ is small and $x_n$ is small THEN $y_1$ is small" and "IF $x_1$ is medium and $x_n$ is large THEN $y_1$ is small". The activation function of third layer nodes is the minimum operation. For instance in figure 4.2, $y_1^{(3)} = \min\{y_1^{(2)}, y_{I-1}^{(2)}\}$.

The fourth layer nodes represent the output linguistic terms. Unlike the third layer, each node represents a distinct concept. Therefore, the number of nodes, K, equals the total number of output parameter linguistic terms. Each node is connected to the preceding layer nodes that represent the same concept. Its activation function is the maximum operation. For example, in figure 4.2, the first node in the fourth layer represents the concept "output $y_1$ is small". It is connected to the first two nodes in the third layer, which represent the same concept. We have $y_1^{(4)} = \max\{y_1^{(3)}, y_2^{(3)}\}$. Each fourth layer node also maintains a Gaussian membership function with center $c_k^{(4)}$ and spread $\sigma_k^{(4)}$ ($k = 1, 2, \dots, K$), which are initialized in the same way as those of input linguistic terms. Note that this Gaussian function is not an activation function. It is transmitted to a fifth layer node for defuzzification.

The fifth layer represents the output parameters. Let L be the number of output parameters; then the fifth layer will have L nodes. Let $O_l$ denote the number of linguistic terms of output parameter l, $l = 1, 2, \dots, L$. A node l in the fifth layer will have $O_l$ incoming connections from fourth layer nodes that correspond to its linguistic terms. For example, in figure 4.2, the output parameter $y_1$ has two linguistic terms, namely, small and large, which are represented by the first and the second nodes in the fourth layer, respectively. Therefore, these two fourth layer nodes are connected to the first node in the fifth layer, which represents the output parameter $y_1$. A node in the fifth layer performs defuzzification using a modified centroid of area method. Specifically, instead of considering the entire area under the Gaussian curves when calculating the gravity center, we consider only the rectangular part of the area.
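The five layers described above can be sketched as a forward pass for a single output parameter. The dictionary-based layout, the term names and all numeric values are assumptions for illustration. The rectangle weight used in the last layer, spread times activation times the square root of minus two times the log of the activation, is our reading of the rectangular approximation: it is proportional to the area of the rectangle of height equal to the activation inscribed under the Gaussian, whose centroid lies at the Gaussian center.

```python
import math


def gaussian(x, c, s):
    return math.exp(-0.5 * ((x - c) / s) ** 2)


def amfm_forward(x, input_terms, rules, output_terms):
    """Forward pass of a small Mamdani-type network with one output parameter."""
    # Layer 2: fuzzification of each input by its linguistic terms.
    y2 = {name: gaussian(x[var], c, s) for name, (var, c, s) in input_terms.items()}
    # Layer 3: one node per rule, activation is the minimum of its premises.
    y3 = [(min(y2[t] for t in premise), concept) for premise, concept in rules]
    # Layer 4: one node per output term, activation is the maximum over its rules.
    y4 = {}
    for strength, concept in y3:
        y4[concept] = max(y4.get(concept, 0.0), strength)
    # Layer 5: centroid of the rectangles inscribed in the clipped Gaussians.
    num = den = 0.0
    for concept, level in y4.items():
        c, s = output_terms[concept]
        if level <= 0.0:
            continue
        weight = s * level * math.sqrt(-2.0 * math.log(level))
        num += c * weight
        den += weight
    return num / den if den > 0.0 else None


# Hypothetical two-input example; each entry is (input index, center, spread).
input_terms = {
    "x1_small": (0, 0.0, 0.3), "x1_large": (0, 1.0, 0.3),
    "x2_small": (1, 0.0, 0.3), "x2_large": (1, 1.0, 0.3),
}
rules = [
    (("x1_small", "x2_small"), "y_small"),
    (("x1_small", "x2_large"), "y_small"),
    (("x1_large", "x2_large"), "y_large"),
]
output_terms = {"y_small": (0.0, 0.2), "y_large": (1.0, 0.2)}

out_low = amfm_forward((0.1, 0.1), input_terms, rules, output_terms)
out_high = amfm_forward((0.9, 0.9), input_terms, rules, output_terms)
```

Note that a term firing at exactly 1 receives zero weight in this sketch, since the inscribed rectangle degenerates to a line; a practical implementation would need to treat that case separately.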

Figure 4.2. Adaptive Mamdani Fuzzy Model (AMFM)

4.1.2. Rule Tuning

The tunable parameters in AMFM are the centers and spreads of the Gaussian membership functions. These include those for the input parameters, namely $c_i^{(2)}$ and $\sigma_i^{(2)}$ ($i = 1, 2, \dots, I$), in the second layer; and those for the output parameters, namely $c_k^{(4)}$ and $\sigma_k^{(4)}$ ($k = 1, 2, \dots, K$), in the fourth layer. The tuning process is based on error backpropagation and gradient descent search. For a particular input vector $[x_1\ x_2\ \dots\ x_N]^T$, let the desired output vector be $[d_1\ d_2\ \dots\ d_L]^T$, and the AMFM output vector be $[y_1\ y_2\ \dots\ y_L]^T$. Then, the error can be calculated as in (4.1).

$$E = \frac{1}{2}\sum_{l=1}^{L}\left(d_l - y_l\right)^2 \qquad (4.1)$$

The error signal at the fifth layer can be calculated as in (4.2).

$$\delta_l^{(5)} = -\frac{\partial E}{\partial y_l} = d_l - y_l \qquad (4.2)$$

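Using the chain rule, we can calculate the error signal for $c_k^{(4)}$ and $\sigma_k^{(4)}$, when the kth node in the fourth layer is connected to the lth node in the fifth layer, as in (4.3) and (4.4).

$$\frac{\partial E}{\partial c_k^{(4)}} = \frac{\partial E}{\partial y_l}\,\frac{\partial y_l}{\partial c_k^{(4)}} = -\delta_l^{(5)}\,\frac{\sigma_k^{(4)}\, y_k^{(4)}\sqrt{-2\ln y_k^{(4)}}}{\sum_{k'\in\Gamma(l)}\sigma_{k'}^{(4)}\, y_{k'}^{(4)}\sqrt{-2\ln y_{k'}^{(4)}}} \qquad (4.3)$$

$$\frac{\partial E}{\partial \sigma_k^{(4)}} = \frac{\partial E}{\partial y_l}\,\frac{\partial y_l}{\partial \sigma_k^{(4)}} = -\delta_l^{(5)}\,\frac{c_k^{(4)}\,\bar{y}_k^{(4)}\sum_{k'\in\Gamma(l)}\sigma_{k'}^{(4)}\bar{y}_{k'}^{(4)} - \bar{y}_k^{(4)}\sum_{k'\in\Gamma(l)}c_{k'}^{(4)}\sigma_{k'}^{(4)}\bar{y}_{k'}^{(4)}}{\left(\sum_{k'\in\Gamma(l)}\sigma_{k'}^{(4)}\bar{y}_{k'}^{(4)}\right)^2} \qquad (4.4)$$

in which $\bar{y}_k^{(4)} = y_k^{(4)}\sqrt{-2\ln y_k^{(4)}}$, the fifth layer output under the rectangular approximation is $y_l = \sum_{k'\in\Gamma(l)} c_{k'}^{(4)}\sigma_{k'}^{(4)}\bar{y}_{k'}^{(4)} \big/ \sum_{k'\in\Gamma(l)} \sigma_{k'}^{(4)}\bar{y}_{k'}^{(4)}$, and $\Gamma(l)$ denotes the set of fourth layer nodes that are connected to the lth node in the fifth layer.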

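Now, $c_k^{(4)}$ and $\sigma_k^{(4)}$ can be updated as in (4.5) and (4.6), where $\eta$ is a positive constant (the learning rate).

$$c_k^{(4)} \leftarrow c_k^{(4)} - \eta\,\frac{\partial E}{\partial c_k^{(4)}} \qquad (4.5)$$

$$\sigma_k^{(4)} \leftarrow \sigma_k^{(4)} - \eta\,\frac{\partial E}{\partial \sigma_k^{(4)}} \qquad (4.6)$$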

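To adjust $c_i^{(2)}$ and $\sigma_i^{(2)}$, the error signals need to be propagated backward to the second layer. The error signal at the fourth layer is calculated as in (4.7).

$$\delta_k^{(4)} = -\frac{\partial E}{\partial y_k^{(4)}} = -\frac{\partial E}{\partial y_l}\,\frac{\partial y_l}{\partial y_k^{(4)}} = \delta_l^{(5)}\,\sigma_k^{(4)}\left(\rho_k - \frac{1}{\rho_k}\right)\frac{c_k^{(4)}\sum_{k'\in\Gamma(l)}\sigma_{k'}^{(4)} y_{k'}^{(4)}\rho_{k'} - \sum_{k'\in\Gamma(l)}c_{k'}^{(4)}\sigma_{k'}^{(4)} y_{k'}^{(4)}\rho_{k'}}{\left(\sum_{k'\in\Gamma(l)}\sigma_{k'}^{(4)} y_{k'}^{(4)}\rho_{k'}\right)^2} \qquad (4.7)$$

in which $\rho_k = \sqrt{-2\ln y_k^{(4)}}$ and $\Gamma(l)$ denotes the set of fourth layer nodes that are connected to the lth node in the fifth layer.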

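The error signal at the third layer is calculated as in (4.8)

$$\delta_j^{(3)} = \begin{cases}\delta_k^{(4)}, & \text{if } y_j^{(3)} = y_k^{(4)}\\ 0, & \text{otherwise}\end{cases} \qquad j \in \Gamma(k) \qquad (4.8)$$

in which $\Gamma(k)$ denotes the set of third layer nodes that are connected to the kth node in the fourth layer.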

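The error signal at the second layer is calculated as in (4.9).

$$\delta_i^{(2)} = \begin{cases}\delta_j^{(3)}, & \text{if } y_i^{(2)} = y_j^{(3)}\\ 0, & \text{otherwise}\end{cases} \qquad i \in \Gamma(j) \qquad (4.9)$$

in which $\Gamma(j)$ denotes the set of second layer nodes that are connected to the jth node in the third layer.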

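Now we can calculate the error signal for $c_i^{(2)}$ and $\sigma_i^{(2)}$ as in (4.10) and (4.11).

$$\frac{\partial E}{\partial c_i^{(2)}} = \frac{\partial E}{\partial y_i^{(2)}}\,\frac{\partial y_i^{(2)}}{\partial c_i^{(2)}} = -\delta_i^{(2)}\, y_i^{(2)}\,\frac{x_i^{(2)} - c_i^{(2)}}{\left(\sigma_i^{(2)}\right)^2} \qquad (4.10)$$

$$\frac{\partial E}{\partial \sigma_i^{(2)}} = \frac{\partial E}{\partial y_i^{(2)}}\,\frac{\partial y_i^{(2)}}{\partial \sigma_i^{(2)}} = -\delta_i^{(2)}\, y_i^{(2)}\,\frac{\left(x_i^{(2)} - c_i^{(2)}\right)^2}{\left(\sigma_i^{(2)}\right)^3} \qquad (4.11)$$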

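Hence, $c_i^{(2)}$ and $\sigma_i^{(2)}$ can be updated as in (4.12) and (4.13)

$$c_i^{(2)} \leftarrow c_i^{(2)} - \eta\,\frac{\partial E}{\partial c_i^{(2)}} \qquad (4.12)$$

$$\sigma_i^{(2)} \leftarrow \sigma_i^{(2)} - \eta\,\frac{\partial E}{\partial \sigma_i^{(2)}} \qquad (4.13)$$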
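The partial derivatives of the Gaussian activation with respect to its center and spread, which drive the second layer error signals in (4.10) and (4.11), can be checked numerically as sketched below. The sample point and the finite difference step are assumed values; the check covers only the Gaussian factor, not the full backpropagated error.

```python
import math


def gaussian(x, c, s):
    return math.exp(-0.5 * ((x - c) / s) ** 2)


def d_gaussian_dc(x, c, s):
    """Analytic derivative of the Gaussian activation with respect to its center."""
    return gaussian(x, c, s) * (x - c) / s ** 2


def d_gaussian_ds(x, c, s):
    """Analytic derivative of the Gaussian activation with respect to its spread."""
    return gaussian(x, c, s) * (x - c) ** 2 / s ** 3


def central_difference(f, v, h=1e-6):
    return (f(v + h) - f(v - h)) / (2.0 * h)


x, c, s = 0.7, 0.4, 0.25  # assumed sample point
num_dc = central_difference(lambda v: gaussian(x, v, s), c)
num_ds = central_difference(lambda v: gaussian(x, c, v), s)
```

A check of this kind is a cheap way to catch sign or exponent slips before the gradients are used in the update rules.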

4.2. Evaluation Criteria

The previous sections detailed the AMFM algorithm that can be used to associate

the inputs with the outputs. However, just like any other approximation algorithm, it is

quite essential to validate the developed model. The validation is done at two levels:

precision of the model and the legibility of the model. In the following sections these

aspects of model validation are explored.

4.2.1. Validation of Model Precision

As mentioned previously, model evaluation is done with the goal of selecting the

best available model for the given dataset. Traditionally criteria like SSE, MSE, MAP,

R2, R2(adj), PRESS are used to validate a model. The usual approach is to split the

available data into learning and validation sets [58]. The algorithms are supplied with the

learning sets to create the model and are later validated with the above-mentioned criteria

on the validation dataset. The definition of the above parameters is given in table 4.1

(Kothamasu et al, 2004).

The quality of the model is in inverse proportion to the magnitude of the first

three criteria and also to the deviation of the last two criteria from 1. However it has to

be mentioned that these criteria are not appropriate for model selection in all situations.


For instance, $R^2$ should not be used for comparison of modeling algorithms that do not satisfy the criteria that $\sum_i e_i = 0$ and $\sum_i \hat{y}_i e_i = 0$, where $e_i$ are the corresponding residuals. This is especially true in the case of neuro-fuzzy modeling where situations with $R^2$ greater than 1 are often encountered.

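Table 4.1. Definition of the traditional criteria used in model evaluation

Criteria: Definition

SSE (Sum of Squared Error): $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

MSE (Mean Squared Error): $\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

MAP (Mean Absolute Percent error): $\frac{100}{n}\sum_{i=1}^{n}\left|(y_i - \hat{y}_i)/y_i\right|$

$R^2$: $\dfrac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$

$R^2$ (adj): $1 - \dfrac{SSE/(n-k)}{SST/(n-1)}$

where $y_i$ = actual outputs, $\hat{y}_i$ = predicted outputs, $\bar{y}$ = mean of the actual outputs, n = # of patterns, k = # of parameters.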
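The criteria of Table 4.1 can be computed as sketched below. The definitions follow our reading of the table (MSE divided by n - 1, and R2 taken as the ratio of the regression sum of squares to the total sum of squares); the function name and the three-point example are assumptions for illustration.

```python
def evaluation_criteria(actual, predicted, k):
    """Traditional criteria for n patterns and k model parameters."""
    n = len(actual)
    mean_actual = sum(actual) / n
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - mean_actual) ** 2 for y in actual)      # total sum of squares
    ssr = sum((p - mean_actual) ** 2 for p in predicted)   # regression sum of squares
    return {
        "SSE": sse,
        "MSE": sse / (n - 1),
        "MAP": 100.0 / n * sum(abs((y - p) / y) for y, p in zip(actual, predicted)),
        "R2": ssr / sst,
        "R2adj": 1.0 - (sse / (n - k)) / (sst / (n - 1)),
    }


actual = [1.0, 2.0, 3.0]
perfect = evaluation_criteria(actual, actual, k=1)
# Predictions that overshoot the spread of the data push this form of R2 above 1.
overshoot = evaluation_criteria(actual, [0.0, 2.0, 4.0], k=1)
```

With this form of R2 a model whose predictions vary more than the data can score above 1, which illustrates why the criterion is unreliable for models that do not satisfy the least squares residual conditions.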

Apart from the above mentioned restrictions, these criteria do not explicitly take into account the underlying dimensionality of the model (except R2(adj)) or the complexity of the data. Multiple Comparison Procedures (MCP) are another category of model evaluation techniques that are often used to compare a set of possible models for the given data. Such tests include McNemar's test, a test for difference of error proportions, resampled paired t test, k-fold cross validated paired t test and 5x2cv paired t test (Diettrich, 1997). The basic concept of these tests is to check for significant

difference in error (or its proportions) from the various models developed. Since the usual

practice is to check for this difference among the error vectors from the same dataset,

care must be taken to compensate for correlation. Secondly, the multiplicity effect that

arises out of simultaneous pair wise comparisons between the models should also be

taken into consideration (because of increased chances of Type I error).

4.2.1.1. Function Approximation

The Hochberg and Tamhane test, based on the studentized maximum modulus

distribution is appropriate for the function approximation problems. Dunn (1961)

proposed a test based on studentized t distribution that can reveal any significant

differences between error proportions (well suited for classification problems). An

excellent summary of both these tests is given in (Feelers & Verkooijen, 1996).

A third evaluation strategy is to construct a penalization criterion that
augments the empirical risk with a term that disfavors complex models (Domingos,
1999). There are several penalization forms, and AIC (Akaike Information Criterion), as
defined in (4.14), is one of them (Ishikawa, 1996).

$AIC = n\,l\,\log(\hat{\sigma}^2) + 2k$    (4.14)

where $k$ is the number of independent estimated parameters, $l$ is the number of output
units, $n$ is the number of patterns and $\hat{\sigma}^2$ is the maximum likelihood
estimate of the mean square error.
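A minimal sketch of (4.14) follows. The use of the natural logarithm and the parameter count of a fully connected single-hidden-layer network with biases are assumptions made for illustration, so the values need not reproduce those reported in the tables of this section. The function names are hypothetical.

```python
import math


def aic(mse_hat: float, n_patterns: int, n_outputs: int, n_params: int) -> float:
    """AIC = n * l * log(sigma^2) + 2k; natural logarithm assumed."""
    return n_patterns * n_outputs * math.log(mse_hat) + 2 * n_params


def mlp_param_count(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Weights and biases of a single-hidden-layer network (assumed layout)."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


# Two hypothetical networks with the same error but different sizes.
small = aic(0.01, n_patterns=140, n_outputs=1, n_params=mlp_param_count(2, 3, 1))
large = aic(0.01, n_patterns=140, n_outputs=1, n_params=mlp_param_count(2, 10, 1))
```

At equal error the larger network receives the higher (worse) AIC, which is the penalization effect described above.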

In order to check the performance of these criteria, an approach similar to the one
proposed by Lawrence et al. (1997) is used. A randomly initialized teacher network is
used to generate the training and testing data, and networks of varying complexity, called
student networks, are then trained on the learning dataset and validated with the testing


data. In this case study, a neural network with 2 inputs, 3 hidden neurons and one output
is used as the teacher network. The data was split into learning and
validation sets comprising 140 and 60 patterns respectively.
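The data generation step can be sketched as follows. Only the network size (2 inputs, 3 hidden neurons, 1 output) and the 140/60 split come from the text. The tanh activation, the uniform weight and input ranges, and all names are assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

Pattern = Tuple[List[float], float]


def make_teacher(n_inputs: int = 2, n_hidden: int = 3,
                 seed: int = 0) -> Callable[[List[float]], float]:
    """Randomly initialized single-hidden-layer network (bias stored first)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def teacher(x: List[float]) -> float:
        hidden = [math.tanh(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
                  for w in w_hidden]
        return w_out[0] + sum(wo * h for wo, h in zip(w_out[1:], hidden))

    return teacher


def make_datasets(n_learn: int = 140, n_valid: int = 60,
                  seed: int = 0) -> Tuple[List[Pattern], List[Pattern]]:
    """Sample inputs, label them with the teacher, and split the patterns."""
    teacher = make_teacher(seed=seed)
    rng = random.Random(seed + 1)
    inputs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)]
              for _ in range(n_learn + n_valid)]
    patterns = [(x, teacher(x)) for x in inputs]
    return patterns[:n_learn], patterns[n_learn:]


learning_set, validation_set = make_datasets()
```

Student networks of different sizes would then be fitted to `learning_set` and scored on `validation_set`.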

Networks of varying size are trained with the learning data for 500 epochs.

Various evaluation criteria along with the Hochberg and Tamhane confidence intervals

(at 95% level) are given in Tables 4.2 and 4.3 respectively.

Table 4.2. Evaluation criteria for the function approximation problem

Hidden Neurons   MSE           R2       R2(adj)   AIC
2                0.006471215   0.417    0.3256    -285.4319
3                0.000054294   0.7061   0.631     -564.2748
5                0.000040264   0.8028   0.7017    -566.2111
10               0.000220444   0.8472   0.5254    -424.2004
15               0.000046811   0.8769   8.2639    -477.1716
20               0.004963404   0.8229   1.4976    -157.3482

From Table 4.2 we cannot conclusively select a model because the different criteria
give differing indications, although AIC points to the model with 5 hidden
neurons, which is the closest to the original model (3 hidden neurons). From Table 4.3 it is
evident that the Hochberg & Tamhane test concludes that all models are equally good,
since every confidence interval contains 0.

Table 4.3. Confidence intervals from pair-wise Hochberg and Tamhane test

Model   1   2              3              4               5              6
1       -   [-2.77 2.79]   [-2.77 2.79]   [-2.77 2.78]    [-2.77 2.79]   [-2.78 2.78]
2       -   -              [-2.78 2.78]   [-2.78 2.77]    [-2.78 2.78]   [-2.78 2.77]
3       -   -              -              [-2.780 2.77]   [-2.78 2.78]   [-2.78 2.77]
4       -   -              -              -               [-2.78 2.78]   [-2.78 2.778]
5       -   -              -              -               -              [-2.78 2.778]

Though AIC was close to the original model (3 neurons), it cannot be concluded

that it did in fact select a model that best fits the data and it is also not valid to assume

that a 3 or close to 3 hidden neuron network is a good fit for the finite data. (This


validates the theory that, for finite data, the best fit is not necessarily a model identical
to the true parametric form (Cherkassky V & Mulier F, 1998) and in this case it is in fact
of a higher complexity than the true parametric form.)

To confirm that the 5 hidden neuron model is in fact superior, a generalization test
was performed in which noisy inputs were presented to the networks. The (additive) noisy
inputs were generated as $I'(i) = I(i) + \mathrm{wgn}(i, \mathrm{dBW})$, where $I$ is the
original input value, $I'$ is the noisy input and wgn is white Gaussian noise with power
specified by dBW. It can be seen from Table 4.4 and Figure 4.3 that the network with
5 neurons outperforms the rest on average, with the lowest average MSE (0.1575).
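The noise model above can be sketched as follows, assuming that a power of p dBW corresponds to a noise variance of 10^(p/10) (a 1-ohm reference load, as in common signal processing toolboxes). The helper names are hypothetical and the thesis's own noise generator is not reproduced here.

```python
import math
import random
from typing import List


def wgn(n_samples: int, power_dbw: float, rng: random.Random) -> List[float]:
    """Zero-mean white Gaussian noise with variance 10 ** (power_dbw / 10)."""
    sigma = math.sqrt(10 ** (power_dbw / 10))
    return [rng.gauss(0.0, sigma) for _ in range(n_samples)]


def add_noise(inputs: List[float], power_dbw: float, seed: int = 0) -> List[float]:
    """Return I'(i) = I(i) + wgn(i, dBW) for every input value."""
    rng = random.Random(seed)
    noise = wgn(len(inputs), power_dbw, rng)
    return [x + e for x, e in zip(inputs, noise)]


def sample_variance(values: List[float]) -> float:
    """Unbiased sample variance, used only to check the noise power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


# Noise added to an all-zero input leaves the noise itself.
noise_10dbw = add_noise([0.0] * 20000, power_dbw=10)
noise_0dbw = add_noise([0.0] * 20000, power_dbw=0)
```

Under this assumption 10 dBW corresponds to a variance of about 10 and 0 dBW to a variance of about 1, so the higher noise levels are large relative to inputs of order one.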

Table 4.4. MSE values of the models at different noise levels

                 Hidden Neurons
Noise (dBW)      2        3        5        10       15       20
1                0.0112   0.07     0.0456   0.049    0.0611   0.0367
5                0.0365   0.1191   0.0673   0.095    0.1177   0.0429
10               0.0871   0.2113   0.099    0.2325   0.2257   0.1023
15               0.1864   0.2735   0.1332   0.3805   0.3768   0.1684
20               0.2669   0.4267   0.2189   0.4318   0.8173   0.197
25               0.2106   0.3823   0.1502   0.4577   0.7028   0.1944
30               0.3304   0.4181   0.1871   0.4428   0.8066   0.2133
35               0.3803   0.4053   0.1985   0.4856   0.8117   0.2375
40               0.4897   0.4747   0.2371   0.6171   0.8084   0.3408
45               0.613    0.3692   0.184    0.5839   0.8462   0.2364
50               0.549    0.4485   0.2114   0.5048   0.9228   0.2672
AvgMSE           0.2873   0.3272   0.1575   0.3892   0.5906   0.1852


Figure 4.3. Box and whisker plot of MSE values

4.2.1.2. Classification

A real-world problem in the form of the EColi dataset¹ was chosen for analysis of
AIC and the other criteria in the classification arena. Table 4.5 details the composition of
this dataset (8 inputs and 8 classes). The data was normalized to facilitate computation of
AIC, using the technique specified by Mirkin (1996) and shown in (4.15).

[Equation (4.15): definition of $v_{\text{normalized}}$]

The AIC values and the confidence intervals on the error proportions are given in
Tables 4.6 and 4.7 respectively. From Table 4.6, it is evident that the AIC values indicate
an inferior classification capability (positive AIC values) and that, according to AIC, the
network with 2 hidden neurons is the best of the lot. From Table 4.7, it can be seen that no
two classifiers have identical classification capabilities (no interval contains 0) and
that their performance is in the order 3, 6, 5, 4, 2, 1.

1 Available by anonymous ftp from ftp://ftp.ics.uci.edu/pub/machine-learning-databases/


Table 4.5. Classes in the Ecoli dataset

Class ID   Class Name                                     Number of patterns
CP         Cytoplasm                                      143
IM         Inner membrane without signal sequence         77
PP         Periplasm                                      52
IMU        Inner membrane, un-cleavable signal sequence   35
OM         Outer membrane                                 20
OML        Outer membrane lipoprotein                     5
IML        Inner membrane lipoprotein                     2
IMS        Inner membrane, cleavable signal sequence      2

Table 4.6. AIC values of networks developed for classifying the EColi dataset

Hidden Neurons   AIC
2                146.5111
3                170.166
5                199.6422
10               299.781
15               406.0171
20               472.2449


Table 4.7. Confidence Intervals for difference in error

Model   2               3             4               5                6
1       [0.003 0.016]   [0.19 0.20]   [0.12 0.13]     [0.16 0.17]      [0.17 0.18]
2       -               [0.18 0.19]   [0.11 0.12]     [0.15 0.16]      [0.16 0.17]
3       -               -             [-0.07 -0.06]   [-0.035 -0.02]   [-0.02 -0.01]
4       -               -             -               [0.03 0.04]      [0.04 0.05]
5       -               -             -               -                [0.003 0.01]
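The ordering quoted in the text can be recovered from the intervals of Table 4.7 with a simple win count. The sketch assumes that the interval in row i, column j bounds the error of model i minus the error of model j, so that a strictly positive interval favors model j. This sign convention is an assumption, chosen because it is consistent with the ordering stated in the text.

```python
from typing import Dict, List, Tuple

# Intervals copied from Table 4.7, keyed by (row model, column model).
intervals: Dict[Tuple[int, int], Tuple[float, float]] = {
    (1, 2): (0.003, 0.016), (1, 3): (0.19, 0.20), (1, 4): (0.12, 0.13),
    (1, 5): (0.16, 0.17), (1, 6): (0.17, 0.18),
    (2, 3): (0.18, 0.19), (2, 4): (0.11, 0.12), (2, 5): (0.15, 0.16),
    (2, 6): (0.16, 0.17),
    (3, 4): (-0.07, -0.06), (3, 5): (-0.035, -0.02), (3, 6): (-0.02, -0.01),
    (4, 5): (0.03, 0.04), (4, 6): (0.04, 0.05),
    (5, 6): (0.003, 0.01),
}


def rank_models(ci: Dict[Tuple[int, int], Tuple[float, float]]) -> List[int]:
    """Order models from best to worst by the number of significant wins."""
    wins: Dict[int, int] = {}
    for (i, j), (low, high) in ci.items():
        wins.setdefault(i, 0)
        wins.setdefault(j, 0)
        if low > 0:        # model i has the larger error, so j wins
            wins[j] += 1
        elif high < 0:     # model i has the smaller error, so i wins
            wins[i] += 1
    return sorted(wins, key=lambda model: -wins[model])


any_contains_zero = any(low <= 0 <= high for low, high in intervals.values())
ranking = rank_models(intervals)
```

No interval contains zero, and the win count yields the order 3, 6, 5, 4, 2, 1.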

The generalization test, performed by adding white Gaussian noise,
yielded the error proportions shown in Table 4.8. It is evident from the average error
values and the box plot in Figure 4.4 that network 3 is in fact the best and that the performance of
the networks is in the order 3, 4, 6, 5, 2, 1, which is close to the order concluded from the
above test. It also directly contradicts the ranking indicated by AIC.

Figure 4.4. Box and whisker plot of MSE values


Table 4.8. Error proportions when simulated in noisy environment

                 Hidden Neurons
Noise (dBW)      2        3        5        10       15       20
1                0.6634   0.6436   0.5545   0.6238   0.5941   0.5446
5                0.6634   0.703    0.5248   0.6634   0.703    0.7129
10               0.8218   0.7921   0.6634   0.6634   0.7921   0.7723
15               0.8515   0.8218   0.6238   0.6832   0.6931   0.802
20               0.8218   0.802    0.703    0.703    0.7921   0.7822
25               0.8614   0.8416   0.6931   0.7624   0.7822   0.8119
30               0.8515   0.8515   0.6931   0.6535   0.802    0.7228
35               0.8416   0.7525   0.7129   0.6634   0.802    0.7426
40               0.8515   0.7723   0.6634   0.6832   0.7525   0.7624
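The average error proportions behind the stated ordering can be recomputed directly from Table 4.8. The sketch below only averages each column and sorts the networks. The variable names are hypothetical and networks are numbered 1 to 6 in the column order of the table.

```python
from typing import List

hidden_neurons = [2, 3, 5, 10, 15, 20]

# Rows of Table 4.8 for noise levels 1, 5, 10, ..., 40 dBW.
error_proportions: List[List[float]] = [
    [0.6634, 0.6436, 0.5545, 0.6238, 0.5941, 0.5446],
    [0.6634, 0.703, 0.5248, 0.6634, 0.703, 0.7129],
    [0.8218, 0.7921, 0.6634, 0.6634, 0.7921, 0.7723],
    [0.8515, 0.8218, 0.6238, 0.6832, 0.6931, 0.802],
    [0.8218, 0.802, 0.703, 0.703, 0.7921, 0.7822],
    [0.8614, 0.8416, 0.6931, 0.7624, 0.7822, 0.8119],
    [0.8515, 0.8515, 0.6931, 0.6535, 0.802, 0.7228],
    [0.8416, 0.7525, 0.7129, 0.6634, 0.802, 0.7426],
    [0.8515, 0.7723, 0.6634, 0.6832, 0.7525, 0.7624],
]

# Column-wise mean error proportion, one value per network.
averages = [sum(column) / len(column) for column in zip(*error_proportions)]

# Network numbers (1-based) sorted from lowest to highest average error.
noise_ranking = sorted(range(1, len(averages) + 1), key=lambda m: averages[m - 1])
```

The averages reproduce the order 3, 4, 6, 5, 2, 1, with network 3 (5 hidden neurons) at roughly 0.648.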

Multiple comparison procedures such as the Hochberg & Tamhane test for
function approximation and the studentized t test for classification can point to significant
differences in the approximation capabilities of the models. However, these tests do not
take into account the complexity of the models. The Akaike Information Criterion,
designed to take into account the complexity as well as the precision of the model, was
seen to perform extremely well in the function approximation arena, while it falters in the
classification domain. The studentized t test yields a better evaluation strategy than
AIC for classification problems.


4.2.2. Validation of Model Legibility

A neuro-fuzzy architecture is a highly transparent model (or representation)
because the rules used for modeling provide qualitative insights into the domain. However,
a neuro-fuzzy system often does not result in a good model (from the viewpoint of
legibility) because of the unconstrained gradient search algorithm. The rules that result
from this training are often neither identical nor similar to the rules prior to training
in their ability to mirror the domain. Although the training phase results in a gain in
precision, it is often at the expense of the legibility of the rules. Two types of model
deterioration are explained below (Kothamasu et al, 2004).

Linguistic deterioration: The initial rules are created based on membership
functions that can be described using linguistic variables like "small", "medium"
or "large" or another appropriate characterization. However, it is not possible to
achieve such a characterization after these rules have been tuned, because of the
deterioration of the linguistic structure (within each dimension) created prior to
the training or during the discretization phase of rule extraction.

Structural deterioration: The situation is compounded by the fact that the post-training
rules often do not effectively describe the system. It is not uncommon
for the rules to be indistinguishable and to make sense only from the
approximation point of view; such rules cannot explain the created model,
thus causing a deterioration of the transparency of the system.

This has grave repercussions in some situations where the model needs to change

over time because of the dynamic nature of the domain. Since the models are not

transparent enough it is not possible to direct this necessary change. However, it is


possible to continue to update the model using backpropagation, but this makes the
NF models equivalent to a neural network and defeats the original intention of
utilizing them as decision aids.

As demonstrated in the previous section, AIC, which is based on Kullback-Leibler (KL)
mean information (a measure of the distance between two distributions), can be
effectively used for model validation. A similar approach can be used to validate or
evaluate the structure of rules based on the KL distance between the membership
functions in each dimension. The KL distance is computed as given in (4.16).

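$\delta^{d}_{i,j} = \int \mu^{d}_{i}(x)\,\log\frac{\mu^{d}_{i}(x)}{\mu^{d}_{j}(x)}\,dx$    (4.16)

where $\delta^{d}_{i,j}$ represents the KL distance between membership functions $\mu^{d}_{i}$ and
$\mu^{d}_{j}$ in dimension $d$. A distance matrix $\Delta^{d}$ can hence be formulated for each
dimension $d$, which represents the qualitative distance of each membership function from the
rest, as given in (4.17).

$\Delta^{d} = \begin{bmatrix}
\Delta^{d}_{1,1} & \cdots & \cdots & \cdots & \Delta^{d}_{1,N^{d}} \\
\vdots & & & & \vdots \\
\Delta^{d}_{i,1} & \cdots & \Delta^{d}_{i,i} & \cdots & \Delta^{d}_{i,N^{d}} \\
\vdots & & & & \vdots \\
\Delta^{d}_{N^{d},1} & \cdots & \cdots & \cdots & \Delta^{d}_{N^{d},N^{d}}
\end{bmatrix}$    (4.17)

where $N^{d}$ is the number of MFs in dimension $d$ and
$\Delta^{d}_{i,j} = (\delta^{d}_{i,j})^{2} + (\delta^{d}_{j,i})^{2}$. This
matrix is scaled to facilitate merging of the significantly similar membership functions
based on a threshold $\Delta^{d}_{threshold}$. The matrix is scaled as given in (4.18), where
$\Delta^{d}_{\max}$ is the largest element in $\Delta^{d}$.

$\tilde{\Delta}^{d}_{i,j} = \frac{\Delta^{d}_{i,j}}{\Delta^{d}_{\max}}$    (4.18)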
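The computation in (4.16) to (4.18) can be sketched numerically. The example below is only an illustration: it assumes Gaussian membership functions, approximates the integral by a Riemann sum on a uniform grid, clamps very small membership values to avoid taking the logarithm of zero, and symmetrizes the distance as the sum of the two squared directed distances. All names and parameter values are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

MembershipFunction = Callable[[float], float]


def gaussian_mf(center: float, sigma: float) -> MembershipFunction:
    """Gaussian membership function (an assumed parametric form)."""
    return lambda x: math.exp(-0.5 * ((x - center) / sigma) ** 2)


def kl_distance(mf_i: MembershipFunction, mf_j: MembershipFunction,
                grid: Sequence[float], eps: float = 1e-12) -> float:
    """Riemann-sum approximation of the integral of mu_i * log(mu_i / mu_j)."""
    step = grid[1] - grid[0]
    total = 0.0
    for x in grid:
        a = max(mf_i(x), eps)  # clamp to avoid log(0) and division by zero
        b = max(mf_j(x), eps)
        total += a * math.log(a / b) * step
    return total


def scaled_distance_matrix(mfs: Sequence[MembershipFunction],
                           grid: Sequence[float]) -> List[List[float]]:
    """Symmetrized KL distances divided by the largest element."""
    n = len(mfs)
    kl = [[kl_distance(mfs[i], mfs[j], grid) for j in range(n)] for i in range(n)]
    sym = [[kl[i][j] ** 2 + kl[j][i] ** 2 for j in range(n)] for i in range(n)]
    largest = max(max(row) for row in sym)
    return [[value / largest for value in row] for row in sym]


def merge_candidates(matrix: List[List[float]],
                     threshold: float) -> List[Tuple[int, int]]:
    """Index pairs (i < j) whose scaled distance falls below the threshold."""
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if matrix[i][j] < threshold]


grid = [-1.0 + 0.01 * k for k in range(301)]
mfs = [gaussian_mf(0.0, 0.3), gaussian_mf(0.05, 0.3), gaussian_mf(1.0, 0.3)]
scaled = scaled_distance_matrix(mfs, grid)
candidates = merge_candidates(scaled, threshold=0.2)
```

In this toy setting the first two membership functions almost coincide, so they form the only pair flagged for merging, while the third stays distinct.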

The primary advantage of using the KL distance is that it is not restricted by the
parametric form of the membership functions. The similarity between any two
membership functions is inversely related to the corresponding value in the distance
matrix. The advantage of the KL distance matrix can be seen in the following case study,
which involves the approximation of a function popularly known as Rosenbrock's banana
function, as defined in (4.19).

$y = 100\,(x_1 - x_2^{2})^{2} + (1 - x_2)^{2}$    (4.19)

The initial and final membership functions as identified by ANFIS are depicted in

figures 4.5 and 4.6. As can be seen from these figures, there is a clear deterioration
of the rule and linguistic structure within each input dimension. However, from an
approximation point of view the network is very precise, as indicated by the MSE value
of 0.00065141 after 1000 iterations.

The normalized KL distance matrices for the input dimensions (X1 and X2) are computed
using the above formulae and are given in Tables 4.9 and 4.10, where MF stands for
membership function. As can be seen from Tables 4.9 and 4.10, the distance measures of
MF1 (from the rest) are quite similar, indicating a very wide span and hence higher overlap
with all the membership functions. This is indeed the case, as can be seen from Figure 4.6.
It can also be seen that there is a gradual degradation of the structure because of
membership functions with very wide spans and closely spaced centers. This is indicated
in Tables 4.9 and 4.10, where some of the distance measures are low in magnitude.


Figure 4.5. Initial membership functions (a) input 1 (b) input 2

Figure 4.6. Final membership functions (a) input 1 (b) input 2

Table 4.9. KL distance matrix for X1

       MF1       MF2       MF3       MF4       MF5       MF6
MF1    0         0.18253   0.23386   0.23386   0.32494   0.32495
MF2    0.18253   0         0.23104   0.23103   0.40117   0.40118
MF3    0.23386   0.23104   0         0.70052   0.86876   0.10774
MF4    0.23386   0.23103   0.70052   0         0.10774   0.86876
MF5    0.32494   0.40117   0.86876   0.10774   0         1
MF6    0.32495   0.40118   0.10774   0.86876   1         0


Table 4.10. KL distance matrix for X2

       MF1       MF2         MF3       MF4       MF5         MF6
MF1    0         0.48375     0.44889   0.44885   0.46912     0.46913
MF2    0.48375   0           1         0.99996   0.0026265   0.0026214
MF3    0.44889   1           0         1e-005    0.9768      0.97681
MF4    0.44885   0.99996     1e-005    0         0.97675     0.97676
MF5    0.46912   0.0026265   0.9768    0.97675   0           5e-006
MF6    0.46913   0.0026214   0.97681   0.97676   5e-006      0

The inference system is refined by eliminating (merging or deleting) the MFs that
result in structural deterioration, as indicated by the distance measures. A threshold value
of 0.2 was chosen and MFs with lower distance measures were merged accordingly. The
resultant network was trained for 1000 epochs and the MSE value was found to be
0.00031699, which is about 51% lower than the original 0.00065141. The resultant MFs are
given in Figure 4.7.
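Applying the 0.2 threshold to the matrices of Tables 4.9 and 4.10 identifies the pairs of membership functions that are close enough to be merge candidates. The sketch below only lists those pairs. How the flagged functions were actually merged or deleted is not reproduced here, and the function name is hypothetical.

```python
from typing import List, Tuple

# Scaled KL distance matrices copied from Tables 4.9 (X1) and 4.10 (X2).
kl_x1 = [
    [0, 0.18253, 0.23386, 0.23386, 0.32494, 0.32495],
    [0.18253, 0, 0.23104, 0.23103, 0.40117, 0.40118],
    [0.23386, 0.23104, 0, 0.70052, 0.86876, 0.10774],
    [0.23386, 0.23103, 0.70052, 0, 0.10774, 0.86876],
    [0.32494, 0.40117, 0.86876, 0.10774, 0, 1],
    [0.32495, 0.40118, 0.10774, 0.86876, 1, 0],
]
kl_x2 = [
    [0, 0.48375, 0.44889, 0.44885, 0.46912, 0.46913],
    [0.48375, 0, 1, 0.99996, 0.0026265, 0.0026214],
    [0.44889, 1, 0, 1e-005, 0.9768, 0.97681],
    [0.44885, 0.99996, 1e-005, 0, 0.97675, 0.97676],
    [0.46912, 0.0026265, 0.9768, 0.97675, 0, 5e-006],
    [0.46913, 0.0026214, 0.97681, 0.97676, 5e-006, 0],
]


def pairs_below(matrix: List[List[float]],
                threshold: float = 0.2) -> List[Tuple[int, int]]:
    """1-based MF index pairs (i < j) whose distance is below the threshold."""
    n = len(matrix)
    return [(i + 1, j + 1) for i in range(n) for j in range(i + 1, n)
            if matrix[i][j] < threshold]


x1_candidates = pairs_below(kl_x1)
x2_candidates = pairs_below(kl_x2)
```

For X1 the flagged pairs are (MF1, MF2), (MF3, MF6) and (MF4, MF5). For X2 they are (MF2, MF5), (MF2, MF6), (MF3, MF4) and (MF5, MF6).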

Figure 4.7. Final membership functions (a) input 1 (b) input 2

The resultant network, as can be seen, has higher legibility than the
original network, and hence the KL measure can be used for validating the network's
legibility. However, the threshold for merging has to be subjectively decided so that
