Research justification report for the predoctoral grants for the training of research personnel (FI)
The justification report consists of the following two parts: 1.- Basic data and abstracts; 2.- Work report (scientific report). All fields are mandatory.
1.- Basic data and abstracts
Project title (must summarize the scientific subject of your document): Intelligent PCA Contribution Analysis for Quality Estimation in Batch Processes. Application in a Sequencing Batch Reactor for Wastewater Treatment
Researcher's data (grant beneficiary)
First name: Alberto
Surnames: Wong Ramírez
Email: [email protected]
Project director's data
First name: Joan
Surnames: Colomer Llinàs
Email: [email protected]
University / centre of affiliation: Universitat de Girona, Department of Electrical, Electronic and Automatic Engineering, Control Engineering and Intelligent Systems Research Group
File number: 2010FI_B200198
Keywords (five concepts that define the content of your report): Batch Processes, Contribution Plots, Data Mining, Fault Diagnosis, Principal Component Analysis
Submission date of the justification: 28/07/2011
Abstract in the language of the project (maximum 300 words): In this work, a new method is proposed to estimate, in real time, the quality of the final product in batch processes. This method reduces the time needed to obtain the quality results from laboratory analyses. A principal component analysis (PCA) model built with historical data under normal operating conditions is used to determine whether a finished batch is normal or not. A fault signature is calculated for the abnormal batches and passed through a classification model for its estimation. The study proposes a method to use the information in the contribution plots based on fault signatures, where the indicators represent the behaviour of the variables throughout the process in the different stages. A dataset composed of the fault signatures of the historical abnormal batches is built to search for patterns and to train the classification models that estimate the results of future batches. The proposed methodology has been applied to a sequencing batch reactor (SBR). Several classification algorithms are tested to demonstrate the potential of the proposed methodology.
Abstract in English (maximum 300 words): In this work, a new method is proposed to estimate, in real time, the quality of the final product in batch processes. This method reduces the time required to obtain quality results compared with laboratory analysis. A Principal Component Analysis (PCA) model built with historical data under normal operating conditions is used to determine whether a released batch is normal or not. For abnormal batches, a fault signature is calculated and passed through a classification model for the estimation. The study proposes a method to use the information in the contribution plots as a fault signature, where indicators represent the behavior of the process variables in the different stages. A fault signature dataset composed of historical abnormal batches is built to search for patterns and to train classification models that estimate the results of future batches. The proposed methodology has been applied to a Sequencing Batch Reactor (SBR). Several classification algorithms are tested to demonstrate the potential of the proposed methodology.
2.- Work report (scientific report, without word limit). It may include other files of any kind, none larger than 10 MB each.
Intelligent PCA Contribution
Analysis for Quality Estimation
in Batch Processes. Application
in a Sequencing Batch Reactor
for Wastewater Treatment
Alberto Wong Ramírez
Department of Electrical, Electronic and Automatic Engineering
Control Engineering and Intelligent Systems Group
eXiT
2011 July
Abstract
In this work, a new method is proposed to estimate, in real time, the quality of the final product in batch processes. This method reduces the time required to obtain quality results compared with laboratory analysis. A Principal Component Analysis (PCA) model built with historical data under normal operating conditions is used to determine whether a released batch is normal or not. For abnormal batches, a fault signature is calculated and passed through a classification model for the estimation. The study proposes a method to use the information in the contribution plots as a fault signature, where indicators represent the behavior of the process variables in the different stages. A fault signature dataset composed of historical abnormal batches is built to search for patterns and to train classification models that estimate the results of future batches. The proposed methodology has been applied to a Sequencing Batch Reactor (SBR). Several classification algorithms are tested to demonstrate the potential of the proposed methodology.
Acknowledgements
The author wishes to thank the Spanish Government (CTQ2008-06865-C02-02) and acknowledges the support of the CUR, the DIUE, the Generalitat of Catalonia and the European Social Fund. The author also thanks the Control Engineering and Intelligent Systems Group (eXiT), the Laboratory of Chemical and Environmental Engineering (LEQUIA), and their personnel for all their support.
Contents
List of Figures
List of Tables
1 Introduction
1.1 Current Situation
1.2 Objective
1.3 Outline
1.4 Publications and Related
2 Wastewater Treatment Plants
2.1 Continuous Treatment Plant
2.2 Sequencing Batch Reactor
2.3 Conclusions
3 Multivariate Statistical Process Control
3.1 Statistical Process Control
3.2 Outliers
3.3 Multivariate Statistical Process Control
3.3.1 Statistical Charts
3.4 Principal Component Analysis
3.4.1 Principal Components to Retain
3.4.1.1 Percent Variance Explained
3.4.1.2 Kaiser-Guttman Criterion
3.4.1.3 Cattell's Scree Test
3.4.2 Statistical Charts
3.4.2.1 Squared Prediction Error or Q Statistic Chart
3.4.2.2 Hotelling's T² Statistic Chart
3.4.3 Schematic Interpretation of PCA
3.4.4 Contribution Plots
3.5 Unfold-PCA for Batch Processes
3.5.1 Applications
3.6 Conclusions
4 Pilot Plant Description and Statistical Modelling
4.1 Pilot Plant
4.2 Analysis of Historical Data
4.3 PCA Model
4.4 Statistical Chart
4.5 Contribution Plots
4.6 Conclusions
5 New Methodology for Intelligent Contribution Analysis
5.1 Contribution Limit Chart
5.2 Improving the Contribution Limit Chart
5.2.1 Modify Cumulative Sum
5.2.2 Sum of Standard Deviation and Stage Mean
5.2.3 Sum of Standard Deviation with Statistic Range
5.3 Fault Signature
5.3.1 Binary Indicator for Fault Signature
5.3.2 Numeric Indicator for Fault Signature
5.4 Diagnosis with the Fault Signature
5.5 Conclusions
6 Intelligent Contribution Analysis for Fault Diagnosis
6.1 Historical Data
6.2 PCA Model
6.3 Contribution Limit Chart and Binary Fault Signature
6.4 Diagnosis with the Binary Fault Signature
6.5 Conclusions
7 Intelligent Contribution Analysis for Estimation of Quality Variables
7.1 Historical Data
7.2 PCA Model
7.3 Binary Indicator for Fault Signature
7.3.1 Contribution Limit Chart and Binary Fault Signature
7.3.2 Diagnosis with the Binary Fault Signature
7.4 Numeric Indicator for Fault Signature
7.4.1 Contribution Limit Chart and Numeric Fault Signature
7.4.2 Diagnosis with the Numeric Fault Signature
7.5 Conclusions
8 Conclusions and Future Studies
Glossary
References
List of Figures
2.1 Continuous Wastewater Treatment Plant
2.2 Sequencing Batch Reactor Treatment Plant
3.1 Schematic Control Chart
3.2 Outlier
3.3 Bivariate vs Univariate
3.4 Percent Variance Explained
3.5 Eigenvalue vs. Principal Component
3.6 A Simplified Representation of PCA
3.7 PCA Model Schematic
3.8 3D matrix data
3.9 Batch wise unfold
4.1 Pilot Plant
4.2 Pre-process 93 High GQR Batches
4.3 Pre-process 84 High GQR Batches
4.4 PCA Model 84 Batches
4.5 Q Statistic Chart for Medium GQR
4.6 Q Statistic Chart for Low GQR
4.7 Q Statistic Chart for Medium and Low GQR Batches
4.8 Q Contribution Plot
6.1 Pre-process 70 NOC Batches
6.2 PCA Model 70 Batches
6.3 Q Statistic Chart for AOC Batches
6.4 Fault Signature for AOC Batch
6.5 Application Window
7.1 Fault Signature for AOC Batch
7.2 Fault Signature for AOC Batch
7.3 Application Window
List of Tables
4.1 Chemical analysis of BNR and GQR
4.2 Standard Levels for Nutrients Removal
6.1 Chemical analysis of BNR and GQR
6.2 Diagnosis results obtained with the rules set of the CN2 algorithm
6.3 Diagnosis results obtained with the rules set of the PART algorithm
7.1 CN2 diagnosis table for ammonium
7.2 CN2 diagnosis table for nitrates
7.3 CN2 diagnosis table for phosphate
7.4 IB1 diagnosis table for ammonium
7.5 CN2 diagnosis table for nitrates
7.6 CN2 diagnosis table for phosphate
7.7 IB1 diagnosis table for ammonium
7.8 CN2 diagnosis table for nitrates
7.9 CN2 diagnosis table for phosphate
Chapter 1
Introduction
In industrial manufacturing, batch processing is an alternative to continuous processing. In batch processing, the input materials are introduced into a reaction tank in a certain sequence and, after the reaction, a product is released. In some batch processes, product quality is assessed by measuring quality variables, which can be done by performing a chemical laboratory test on the released product. Obtaining the result of this test can sometimes take a long time, which requires the reaction mixture to be held in the tank for the duration of the test and risks the loss of valuable materials if the result reveals a low-quality product. Developing systems capable of diagnosing the quality of the product released by a batch process, so as to achieve the highest efficiency, is therefore a major concern for production management (1).
Sequencing batch reactor (SBR) processes have demonstrated their efficiency and flexibility in treating wastewater with high concentrations of nutrients (nitrogen and phosphorus) and toxic compounds from domestic and industrial sources (2, 3, 4, 5). The SBR process is highly nonlinear and time-varying: changes in the influent concentration can affect the process and change the effluent quality. An estimate of effluent quality that is faster than the conventional off-line analysis would therefore be useful to reconfigure and correct the process.
Principal component analysis (PCA) is a multivariate statistical process control (MSPC) tool for identifying patterns in high-dimensional data, expressing the data in a way that highlights their similarities and differences (6, 7). The primary objectives of PCA are data summarization, classification of variables, outlier detection, early warning of potential malfunctions and fingerprinting for fault identification (8). PCA has been used in a wide range of continuous processes, where it has proved its ability to detect faults. Nomikos and MacGregor developed unfold-PCA (U-PCA) for batch processes, whose data are three-dimensional (9). If a process is detected as faulty, a PCA contribution plot is built: a graphical representation of how much each variable contributed to the detected deviation.
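The three steps named above (unfolding three-dimensional batch data, fitting PCA on normal batches, and computing contributions for a flagged batch) are developed in Chapter 3. As a rough orientation only, here is a minimal NumPy sketch. It assumes batch-wise unfolding with autoscaling, two retained components, and contributions defined as the squared Q (SPE) residual of each unfolded column summed over time per variable; these are common choices, not necessarily the exact definitions used later in this work. The function names and the simulated data are hypothetical, and the Q control limit needed to declare a batch faulty is omitted.

```python
import numpy as np

def unfold_batchwise(X):
    """Batch-wise unfolding: (I batches, J variables, K samples) -> I x (J*K).

    Row i holds batch i; columns run over all K samples of variable 0,
    then all K samples of variable 1, and so on (C-order reshape).
    """
    I, J, K = X.shape
    return X.reshape(I, J * K)

def fit_upca(X_noc, n_components):
    """Fit PCA on unfolded normal-operation batches; return scaling and loadings."""
    Xu = unfold_batchwise(X_noc)
    mean = Xu.mean(axis=0)
    std = Xu.std(axis=0, ddof=1)
    std[std == 0] = 1.0                 # avoid dividing by zero on constant columns
    Z = (Xu - mean) / std               # autoscale each column
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T             # loadings, shape (J*K, n_components)
    return mean, std, P

def q_contributions(batch, mean, std, P):
    """Q (SPE) of one (J, K) batch and the contribution of each unfolded column."""
    z = (batch.reshape(-1) - mean) / std
    residual = z - P @ (P.T @ z)        # part of z outside the PCA model plane
    contrib = residual ** 2             # these terms sum to Q
    return contrib.sum(), contrib

# Hypothetical data: 30 normal batches, 3 variables, 20 samples per batch
rng = np.random.default_rng(0)
X_noc = rng.normal(size=(30, 3, 20))
mean, std, P = fit_upca(X_noc, n_components=2)

faulty = rng.normal(size=(3, 20))
faulty[1] += 5.0                        # simulated fault: variable 1 shifted
Q, c = q_contributions(faulty, mean, std, P)
per_var = c.reshape(3, 20).sum(axis=1)  # contribution of each variable over the batch
```

In this simulated case, `per_var` points to variable 1 as the main contributor, which is the kind of reading a contribution plot is meant to support.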
1.1 Current Situation
An SBR capable of the biological removal of organic matter, nitrogen and phosphorus is used for wastewater treatment. The biological nutrient removal (BNR) is measured by laboratory analysis, because sensors for the quality variables are very expensive. The laboratory results of a finished process are available only several hours later, and the treated wastewater must remain in the reactor while its quality variables (organic matter, nitrogen and phosphorus) are analyzed, which puts compliance with the environmental discharge requirements at risk.
1.2 Objective
The main objective of the project is to develop a system that can predict, in real time, the quality variables of the released product from the quantitative variables measured during a batch process. The advantages of this system are a lower investment in expensive sensors to measure the quality variables of a batch process, a shorter product quality analysis compared with a laboratory analysis that can take several hours, and a real-time diagnosis estimate of the quality variables immediately after the product is released.
1.3 Outline
This work consists of eight chapters, a glossary and the references.
Chapter 1 presents the background of the study, the methods and techniques to be used, the situation in which the study is applied and the objective of the study.
Chapter 2 presents the different types of wastewater treatment plants, the stages of wastewater treatment, and the differences between two types of plant, with the advantages and disadvantages of one with respect to the other.
Chapter 3 covers the history and theory of multivariate statistical process control: its origins in statistical process control and how it is applied, and the statistical charts that MSPC uses to detect faulty products. Next, it presents principal component analysis, a popular MSPC technique for industrial processes, together with its statistical charts for fault detection and the contribution plots used to diagnose faulty products. Finally, it describes unfold-PCA, a technique consistent with PCA but applied to batch processes.
Chapter 4 describes the pilot plant for wastewater treatment and its historical data, including the laboratory analysis of the quality variables of the treated water. It then builds the PCA model for batch processes from the historical data of the plant, detects faulty processes with the statistical chart and diagnoses them with contribution plots.
Chapter 5 presents the new methodology proposed in this study: the contribution limit chart, which performs the diagnosis task better than the contribution plots. It also describes other methods that were proposed to improve the diagnosis of the stages of a batch process with the contribution limit chart and were discarded because of their poor results. Finally, it develops the second part of the proposed method, a fault signature that represents a faulty batch and is used to diagnose newly released batches.
Chapter 6 applies the proposed methodology to the historical data of the wastewater treatment plant. The PCA model and the statistical chart are built to detect the faulty batches. The diagnosis estimates of the global quality removal of the treated wastewater are presented using the contribution limit charts, the fault signature with the binary indicator, and the rule sets obtained by a rule induction algorithm.
Chapter 7 uses the proposed methodology to estimate the diagnosis of each quality variable. In this chapter, the two fault signature indicators are applied to the historical data, and rule induction and classification algorithms are used to obtain the rule sets and the knowledge models that perform the diagnosis estimation of each quality variable.
Chapter 8 presents the conclusions of the study: the results obtained with the unfold-PCA technique, the analysis of the proposed methodology for estimating the different quality variables of the process, and the advantages of the system. Finally, it outlines future work that can be developed with the new methodology.
1.4 Publications and Related
Alberto Wong Ramírez. Multivariate Statistical Process Control (MSPC) Applied to a Sequencing Batch Reactor for Wastewater Treatment. Master of Science Thesis, Universitat de Girona (UdG), 2007.
A. Wong, J. Colomer, M. Coma and J. Colprim. PCA Intelligent Contribution Analysis for Fault Diagnosis in a Sequencing Batch Reactor. In Proceedings of the iEMSs Fifth Biennial Conference, Vol. 3, pages 2230-2237, 2010.
A. Wong, J. Colomer. Soft-Sensor Utilizando Contribuciones ACP para un Reactor Secuencial por Lotes para la Depuración de Aguas Residuales. In Proceedings Memorias de la Conferencia Iberoamericana de Complejidad, Informática y Cibernética (CICIC 2011), pages 34-39, 2011.
A. Wong Ramírez, J. Colomer Llinàs. Fault Diagnosis of Batch Processes Release Using PCA Contribution Plots as Fault Signatures. In Proceedings of the 13th International Conference on Enterprise Information Systems, pages 223-228, 2011.
A. Wong Ramírez, J. Colomer Llinàs, M. Coma, S. Puig, J. Colprim. Intelligent PCA Contribution Analysis for Quality Estimation. Submitted to Industrial & Engineering Chemistry Research, 2011.
Chapter 2
Wastewater Treatment Plants
Wastewater treatment is the process of removing pollutants from wastewater, both runoff and domestic. The process combines physical, chemical and biological techniques to remove physical, chemical and biological contaminants (10). Its principal objective is to produce a treated effluent suitable for discharge or reuse in the environment that meets the standards set by regulation, namely Commission Directive 98/15/EC Amending Council Directive 91/271/EEC Concerning Urban Waste Water Treatment (11).
Wastewater is generated by residences, institutions, and commercial and industrial buildings. It can be treated in a small local treatment plant or collected and transported through a pipe network to a treatment facility.
2.1 Continuous Treatment Plant
Wastewater treatment plants are commonly composed of a series of stages, and different techniques can be applied to the process to achieve the best possible quality of treated water for disposal. The major stages of wastewater treatment are (12):
• Preliminary treatment: removes materials that could damage plant equipment or would occupy treatment capacity without being treated.
• Primary treatment: removes settleable and floatable solids.
• Secondary treatment: removes biochemical oxygen demand (BOD) and dissolved and
colloidal suspended organic matter by biological action. Organics are converted to stable
solids, carbon dioxide and more organisms.
• Advanced waste treatment: uses physical, chemical and biological processes to remove
additional BOD, solids and nutrients.
• Disinfection: removes microorganisms to eliminate or reduce the possibility of disease
when the flow is discharged.
• Sludge treatment: stabilizes the solids removed from wastewater during treatment, inactivates pathogenic organisms and reduces the volume of the sludge by removing water.
Figure 2.1: Continuous Wastewater Treatment Plant - Schematic of a continuous wastewater treatment process. Source (10).
2.2 Sequencing Batch Reactor
The sequencing batch reactor differs from the conventional continuous process mainly in that the SBR performs the whole treatment in a single reaction tank following a structured sequence of stages, whereas the continuous plant performs it across several reaction vessels. Batch processes are commonly used to produce high-quality end products such as food, biochemicals, pharmaceuticals, beverages and many other products of chemical processes. Batch processes treat material in a prescribed manner for a finite duration, and successful operation means reproducing a given product from batch to batch (13). SBR technology has proved its success in treating urban and industrial wastewater (14, 15).
The SBR process is commonly divided into five discrete periods: fill, reaction, settle, draw and idle (16). During the fill period, the influent is introduced into the tank, followed by the reaction period, in which the influent is treated. Once the reaction period is finished, a settle period separates the solids, followed by the draw period, in which the effluent (the treated wastewater) is withdrawn. The idle period is used for wasting sludge.
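The fixed order of the five periods can be made explicit as a small cyclic state machine. This is a minimal Python sketch for illustration; the class and function names are hypothetical, and a real SBR schedule may subdivide the reaction period or assign different durations to each period, which this sketch does not model.

```python
from enum import Enum

class SBRPhase(Enum):
    """The five discrete periods of a conventional SBR cycle, in execution order."""
    FILL = "fill"          # influent is introduced into the tank
    REACTION = "reaction"  # the influent is treated
    SETTLE = "settle"      # solids separate from the treated liquid
    DRAW = "draw"          # treated effluent is withdrawn
    IDLE = "idle"          # sludge is wasted before the next cycle

CYCLE = list(SBRPhase)     # Enum iteration preserves definition order

def next_phase(phase: SBRPhase) -> SBRPhase:
    """Return the period that follows `phase`; IDLE wraps around to FILL."""
    i = CYCLE.index(phase)
    return CYCLE[(i + 1) % len(CYCLE)]
```

Because every cycle repeats the same sequence in one tank, each batch produces data with the same stage structure, which is what makes batch-to-batch comparison possible.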
Figure 2.2: Sequencing Batch Reactor Treatment Plant - Schematic of a sequencing
batch reactor wastewater treatment process. Source (16).
Some advantages of the SBR are:
• Equalization, primary clarification, biological treatment and secondary clarification can
be achieved in a single reactor tank.
• Operating flexibility and control.
• Minimal space required to locate the system.
• Cost savings by eliminating other equipment.
Some disadvantages of the SBR are:
• A higher level of sophistication is required, especially for larger systems.
• Higher level of maintenance associated with more sophisticated controls, automated switches
and automated valves.
• Potential of discharging floating or settled sludge during the draw period.
• Potential requirement for equalization after the SBR process is finished.
2.3 Conclusions
This chapter explained the two major types of process used to treat wastewater. The continuous process plant requires a large area to house the different vessels through which the wastewater passes to be treated. The sequencing batch reactor plant needs only a reaction tank to execute practically all the treatment stages. The space required by an SBR plant is therefore very small compared with that of a continuous plant, but the required level of control is more sophisticated.
Chapter 3
Multivariate Statistical Process Control
Multivariate statistical process control is a technique for studying the large number of variables in a complex process. Applied in an industrial plant, MSPC can help to monitor production, detect faulty processes and reduce costs by decreasing the rate of defective products.
3.1 Statistical Process Control
Statistical process control (SPC) is a statistical technique used to detect variation in sets of process measurements so that the process can be brought into a state of control, thereby improving the quality of the output variable.
SPC is attributed to Dr. Walter Shewhart, who developed the concept during the 1920s while working at Bell Telephone Laboratories on techniques to improve product quality and reduce cost. With SPC he provided a tool to discern whether a process was in control or not (17).
The Shewhart control chart shows whether a sample plotted on the chart lies within the control limits. The chart is based on measurements of a quality characteristic of the product taken over time, against which the same characteristic of new samples is compared. It consists of a center line, around which the samples of an in-control process should lie, and an upper and a lower control limit; if a sample falls outside these limits, the process is out of control (18, 19).
The control limits depend on three parameters: the estimate of the average process level, the process spread expressed as a range or standard deviation, and a constant based on the probability of a Type I error α. The most popular choice is the 3σ control limit. Therefore, the control limits for the Shewhart control chart are:

$$\mathrm{UCL} = \bar{x} + 3\sigma \tag{3.1}$$

$$\mathrm{LCL} = \bar{x} - 3\sigma \tag{3.2}$$

where $\bar{x}$ is the center line (the average level of the measured quality characteristic) and σ is its standard deviation. To increase the sensitivity and detect early shifts in the quality characteristic of the process, warning limits are added:

$$\mathrm{UWL} = \bar{x} + 2\sigma \tag{3.3}$$

$$\mathrm{LWL} = \bar{x} - 2\sigma \tag{3.4}$$
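As a minimal sketch of equations 3.1 to 3.4 (not the procedure of (18, 19)), the Python code below estimates the center line and σ from a small set of in-control measurements and classifies new samples. Using the sample standard deviation of individual measurements as σ is an assumption; range-based estimates are also common. The data and the function names `shewhart_limits` and `chart_status` are hypothetical.

```python
import numpy as np

def shewhart_limits(reference):
    """Center line, 3-sigma control limits and 2-sigma warning limits
    estimated from in-control measurements of one quality characteristic."""
    reference = np.asarray(reference, dtype=float)
    center = reference.mean()
    sigma = reference.std(ddof=1)  # sample standard deviation as the process spread
    return {
        "CL": center,
        "UCL": center + 3 * sigma, "LCL": center - 3 * sigma,  # eqs. (3.1)-(3.2)
        "UWL": center + 2 * sigma, "LWL": center - 2 * sigma,  # eqs. (3.3)-(3.4)
    }

def chart_status(sample, limits):
    """Classify one new measurement against the chart limits."""
    if sample > limits["UCL"] or sample < limits["LCL"]:
        return "out of control"
    if sample > limits["UWL"] or sample < limits["LWL"]:
        return "warning"
    return "in control"

reference = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9]  # illustrative in-control data
limits = shewhart_limits(reference)
print([chart_status(x, limits) for x in (10.0, 10.35, 11.0)])
# ['in control', 'warning', 'out of control']
```

The warning band lets a sample such as 10.35 raise attention before the process actually crosses a control limit.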
Figure 3.1: Schematic Control Chart - Source (19)
The Shewhart control chart was used successfully for years in the process industry, where control was applied to a small number of variables, mostly the product quality variable. However, as new technology increased the complexity of processes, the Shewhart control chart fell short: it is slow to detect changes in the process and cannot practically control a process with hundreds of variables. Since each Shewhart control chart analyzes only one variable, a process with a hundred variables would require a hundred Shewhart control charts.
3.2 Outliers
An outlier is a sample that is very different from the rest of the data set to which it belongs: its measurements differ substantially with respect to a certain variable or combination of variables (20).
Detecting outliers is important when a statistical model is to be built from a data set. Outliers can influence the parameters calculated to build the model, producing thresholds that lead to inaccurate predictions for new samples projected onto the model and, in turn, to wrong corrective actions.
Figure 3.2: Outlier - Sample 35 is an outlier in the data set.
Figure 3.2 presents a plot of the samples versus their standard deviation. Sample 35 has a high standard deviation with respect to the whole data set, meaning that either this sample does not belong to the data set or its measurements contain disturbances.
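One simple way to flag such samples is sketched below, assuming a z-score rule: a sample is flagged when any of its variables lies more than three standard deviations from that variable's mean. This is an illustrative assumption, not necessarily the criterion used in (20); `flag_outliers` and the synthetic data are hypothetical.

```python
import numpy as np

def flag_outliers(X, threshold=3.0):
    """Return the indices of samples (rows) lying more than `threshold`
    standard deviations from the column mean in at least one variable."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score of every measurement
    worst = np.abs(z).max(axis=1)                     # largest deviation per sample
    return np.flatnonzero(worst > threshold), worst

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))  # 40 samples of 3 variables
X[34] += 8.0                  # disturb sample 35 (row index 34)
outliers, worst = flag_outliers(X)
print(outliers + 1)           # 1-based sample numbers
```

A strong outlier also inflates the standard deviation it is measured against, so in practice the rule is usually reapplied after the flagged samples are removed.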
3.3 Multivariate Statistical Process Control
SPC is a tool that provides early warnings of fault conditions in the process. The quality variables of the product are measured in terms of their mean and variation, and if a new sample plotted on the statistical chart falls outside the thresholds or suggests a shift of the mean, corrective actions need to be applied (21).
Today a typical industrial process can contain hundreds or thousands of sensors. With SPC methods each sensor needs its own monitoring chart, which is impractical for large processes. Another problem with SPC is the way it treats the process variables, assuming that each one is independent of the others. Because of these limitations, a new technique was developed to handle the variables of a complex process, which in almost all cases are correlated with one another.
MSPC is a technique for studying the large number of variables found in a complex process, with the aim of finding a logical model that represents the process measurements, so that faulty processes can be detected and the process can be tuned for the best quality result. The term multivariate refers to the large number of variables that need to be analyzed in a process.
3.3.1 Statistical Charts
Control charting is the most common SPC technique used in industry (22). Diagnosis with control charts helps reduce the number of low-quality products in the process (23).
In 1947 Hotelling established multivariate process control when he applied the technique to a bombsight problem (24). Hotelling’s T² control statistic takes into account the correlations between the process variables. Earlier, in 1931, Hotelling had proposed the concept of a generalized distance between a new observation and its sample mean.
Hotelling’s T² statistic examines whether a new sample is out of control when compared with the sample mean. For multivariate data, the T² statistic is plotted in a chart against time or observation number and compared with a limit. With a historical data set, a normal operation chart can be developed and new samples projected onto it to see whether they are out of control.
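The sketch below computes T² for a new observation as (x − x̄)ᵀS⁻¹(x − x̄), where x̄ and S are the mean and covariance of hypothetical in-control data with two strongly correlated variables. It illustrates the idea behind figure 3.3: a point at +1 and −1 standard deviation is unremarkable for each variable on its own but breaks the correlation, so its T² is large. The data and the function name are assumptions for illustration.

```python
import numpy as np

def hotelling_t2(reference, x_new):
    """Hotelling's T^2 distance of a new observation from the mean of
    in-control reference data (rows = observations, columns = variables)."""
    reference = np.asarray(reference, dtype=float)
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)            # sample covariance matrix S
    diff = np.asarray(x_new, dtype=float) - mean
    return float(diff @ np.linalg.solve(cov, diff))  # (x - mean)^T S^-1 (x - mean)

rng = np.random.default_rng(1)
z = rng.normal(size=500)
# two strongly correlated variables, each with variance close to one
reference = np.column_stack([z, z + 0.1 * rng.normal(size=500)])

print(hotelling_t2(reference, [1.0, 1.0]))   # respects the correlation: small T^2
print(hotelling_t2(reference, [1.0, -1.0]))  # breaks the correlation: large T^2
```

Two separate univariate charts would accept both points, since each coordinate lies well inside ±3σ.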
Figure 3.3: Bivariate vs Univariate - Source (25)
3.4 Principal Component Analysis
Principal component analysis (PCA) defines a series of new variables, as linear combinations of the original variables, that explain the maximal variability and at the same time reduce the dimension of the problem (6).
PCA searches for patterns in the data and gives information on how the different variables relate to each other. In many cases a few principal components are enough to explain the variability of the original data with minimal loss of information (6, 7). The objectives of PCA are data compression, classification of variables, detection of outliers, early warning of process malfunctions and fault detection (8).
PCA is based on the eigenvector decomposition of the covariance (or correlation) matrix of the process variables. For a given data matrix X with m rows (samples) and n columns (variables), the covariance matrix of X is defined as (26):

$$\operatorname{cov}(\mathbf{X}) = \frac{\mathbf{X}^T\mathbf{X}}{m-1} \tag{3.5}$$

This assumes that the columns of X have been “mean centered”. If the columns of X have been “autoscaled”, equation 3.5 gives the correlation matrix of X. PCA decomposes the data matrix X as the sum of the outer products of vectors $\mathbf{t}_i$ and $\mathbf{p}_i$ plus a residual matrix E:

$$\mathbf{X} = \mathbf{t}_1\mathbf{p}_1^T + \mathbf{t}_2\mathbf{p}_2^T + \cdots + \mathbf{t}_k\mathbf{p}_k^T + \mathbf{E} \tag{3.6}$$

Here k must be less than or equal to the smaller dimension of X, i.e. $k \le \min\{m-1, n\}$. The $\mathbf{t}_i$ vectors are known as scores and contain information on how the samples relate to each other. The $\mathbf{p}_i$ vectors are eigenvectors of the covariance matrix, i.e. for each $\mathbf{p}_i$:

$$\operatorname{cov}(\mathbf{X})\,\mathbf{p}_i = \lambda_i\mathbf{p}_i \tag{3.7}$$

where $\lambda_i$ is the eigenvalue associated with the eigenvector $\mathbf{p}_i$. In PCA the $\mathbf{p}_i$ are known as loadings and contain information on how the variables relate to each other. The $\mathbf{t}_i$ form an orthogonal set ($\mathbf{t}_i^T\mathbf{t}_j = 0$ for $i \ne j$), while the $\mathbf{p}_i$ are orthonormal ($\mathbf{p}_i^T\mathbf{p}_j = 0$ for $i \ne j$, $\mathbf{p}_i^T\mathbf{p}_j = 1$ for $i = j$). Note that for X and any $\mathbf{t}_i$, $\mathbf{p}_i$ pair:

$$\mathbf{X}\mathbf{p}_i = \mathbf{t}_i \tag{3.8}$$
This is because the score vector $\mathbf{t}_i$ is the linear combination of the original X data defined by $\mathbf{p}_i$. The $\mathbf{t}_i$, $\mathbf{p}_i$ pairs are arranged in descending order of the associated $\lambda_i$, which measure the amount of variance described by each pair. In this context, variance can be thought of as information. Because the pairs are in descending order of $\lambda_i$, the first pair captures the largest amount of information of any pair in the decomposition. In fact, it can be shown that the $\mathbf{t}_1$, $\mathbf{p}_1$ pair captures the greatest amount of variation in the data that can be captured with a linear factor, and each subsequent pair captures the greatest possible variance remaining at that step.
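A minimal numerical check of equations 3.5 to 3.8, assuming NumPy and synthetic correlated data: the loadings are obtained with `numpy.linalg.eigh` on the covariance matrix, sorted by descending eigenvalue, and the scores follow from equation 3.8. This is a sketch, not the toolbox used in this work; `pca_eig` is a hypothetical name.

```python
import numpy as np

def pca_eig(X):
    """PCA of a mean-centered matrix X (m samples x n variables) through the
    eigendecomposition of cov(X) = X^T X / (m - 1), eq. (3.5)."""
    m = X.shape[0]
    cov = X.T @ X / (m - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # descending order of lambda_i
    lambdas, P = eigvals[order], eigvecs[:, order]
    T = X @ P                               # scores, X p_i = t_i, eq. (3.8)
    return T, P, lambdas

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated variables
X = X - X.mean(axis=0)                                   # mean centering

T, P, lambdas = pca_eig(X)
k = 2
E = X - T[:, :k] @ P[:, :k].T                      # residual after k pairs, eq. (3.6)
print(np.round(lambdas, 3))                        # eigenvalues in descending order
print(np.round(100 * lambdas / lambdas.sum(), 1))  # percent of variance per pair
```

Because the loadings are orthonormal, the scores are orthogonal with $\mathbf{t}_i^T\mathbf{t}_i = (m-1)\lambda_i$, which is why $\lambda_i$ measures the variance carried by each pair.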
3.4.1 Principal Components to Retain
Deciding how many principal components to retain is one of the key issues in building a PCA model. If too few principal components are retained, the threshold becomes narrower and the model produces more false alarms. Conversely, if too many principal components are retained, the threshold of the model becomes wider and the detection of process misbehavior is delayed (27). Several methods have been proposed to determine the number of principal components to retain.
3.4.1.1 Percent Variance Explained
In this method, the principal components retained are those needed to explain a chosen percentage of the total variance in the process (figure 3.4). The percentage is calculated from the eigenvalues of the covariance matrix, since each eigenvalue measures the part of the process variance captured by its principal component (27). Because the chosen percentage is arbitrary, this method alone cannot establish the correct number of principal components, and other methods are needed to confirm that the proposed percentage is reasonable.
Figure 3.4: Percent Variance Explained - 20 principal components with their respective cumulative variance percentages.
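A sketch of this rule, assuming the eigenvalues are already available: it returns the smallest number of principal components whose cumulative variance reaches a target fraction. The eigenvalues below are invented for illustration; with a 70% target, as used later in section 4.3, three components are retained.

```python
import numpy as np

def n_components_for_variance(eigenvalues, target=0.70):
    """Smallest number of principal components whose cumulative share of the
    total variance reaches `target` (a fraction, e.g. 0.70 for 70%)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    cumulative = np.cumsum(lam) / lam.sum()
    k = int(np.searchsorted(cumulative, target) + 1)  # first index reaching target
    return k, cumulative

eigenvalues = [4.0, 2.5, 1.5, 1.0, 0.6, 0.4]  # illustrative eigenvalues
k, cumulative = n_components_for_variance(eigenvalues, target=0.70)
print(k, np.round(100 * cumulative, 1))  # 3 components reach 80% > 70%
```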
3.4.1.2 Kaiser-Guttman Criterion
This is perhaps the most widely used method for deciding how many principal components to retain. Principal components whose eigenvalues of the covariance matrix (computed on autoscaled data, i.e. the correlation matrix) are greater than one are retained. The rationale is that a component with an eigenvalue lower than one explains less variance than one of the original standardized variables (28, 29). Many authors argue that the cut-off at one is an arbitrary decision for eigenvalues close to one, and the method is known to retain many principal components. In figure 3.5, this method suggests retaining two principal components, because the eigenvalues of those two components are above one.
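The sketch below applies the Kaiser-Guttman rule, assuming the data are autoscaled so that the covariance matrix is the correlation matrix. The synthetic data set of six variables driven by two hidden factors is hypothetical; the rule should retain two principal components for it.

```python
import numpy as np

def kaiser_guttman(X):
    """Number of principal components kept by the Kaiser-Guttman rule:
    eigenvalues of the correlation matrix (autoscaled data) greater than one."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)              # autoscaling
    eigenvalues = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # descending
    return int(np.sum(eigenvalues > 1.0)), eigenvalues

rng = np.random.default_rng(3)
f1, f2 = rng.normal(size=(2, 300))
# six variables driven by two latent factors plus measurement noise
X = np.column_stack([f1, f1, f1, f2, f2, f2]) + 0.3 * rng.normal(size=(300, 6))
k, eigenvalues = kaiser_guttman(X)
print(k, np.round(eigenvalues, 2))
```

For autoscaled data the eigenvalues sum to the number of variables, so an eigenvalue of one corresponds to the variance of a single standardized variable.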
3.4.1.3 Cattell’s Scree Test
The method proposed by Cattell examines the plot of eigenvalue versus principal component number and looks for a scree shape at the bottom of the graph. The graph shows two sections: in the first, the eigenvalues fall quickly; in the second, they decrease along what looks more like a straight line. The break between the two sections suggests that the first section represents the linear relationships between the process variables and the second the noise and uncertainties of the process (27). The method does not give a clear definition of the break point between the informative principal components and the trivial ones, and the scree test tends to overestimate the number of components (29). In figure 3.5 this method suggests retaining three or four principal components, depending on where the break point is placed.
Figure 3.5: Eigenvalue vs. Principal Component - Graph to determine the number of principal components to retain.
3.4.2 Statistical Charts
A key requirement for developing an MSPC statistical chart is to have samples from periods when the process operated within the product quality specifications. The success of these monitoring charts relies on the fact that many process variables are highly correlated; linear combinations of the correlated variables can therefore be formed, and the dimensionality of the problem is reduced because the new variables explain the process (8, 30).
The PCA statistical charts can detect whether a process is outside its control region, that is, whether it is a faulty process. The T² statistic measures the variation of a new sample within the PCA model, and the Q statistic measures how well the sample fits the projection space of the PCA model.
3.4.2.1 Squared Prediction Error or Q Statistic Chart
The squared prediction error (SPE), or Q statistic, chart measures the distance between a new sample projected onto the PCA model and the projection space of the model. If the sample differs from the cases used to build the PCA model, it moves away from the plane (31).
Q is simply the sum of squares of each row (sample) of E; for the ith sample in X, $\mathbf{x}_i$:

$$Q_i = \mathbf{e}_i\mathbf{e}_i^T = \mathbf{x}_i\left(\mathbf{I} - \mathbf{P}_k\mathbf{P}_k^T\right)\mathbf{x}_i^T \tag{3.9}$$

where $\mathbf{e}_i$ is the ith row of E, $\mathbf{P}_k$ is the matrix of the first k loading vectors retained in the PCA model (each vector being a column of $\mathbf{P}_k$) and I is the identity matrix of appropriate size (n by n).
Confidence limits can be calculated for Q, provided that all of the eigenvalues of the covariance matrix of X, the $\lambda_i$, have been obtained:

$$Q_\alpha = \Theta_1\left[\frac{c_\alpha\sqrt{2\Theta_2 h_0^2}}{\Theta_1} + 1 + \frac{\Theta_2 h_0(h_0 - 1)}{\Theta_1^2}\right]^{1/h_0} \tag{3.10}$$

where

$$\Theta_i = \sum_{j=k+1}^{n}\lambda_j^{\,i} \quad \text{for } i = 1, 2, 3 \tag{3.11}$$

and

$$h_0 = 1 - \frac{2\Theta_1\Theta_3}{3\Theta_2^2} \tag{3.12}$$

In equation 3.10, $c_\alpha$ is the standard normal deviate corresponding to the upper (1 − α) percentile. In equation 3.11, k is the number of principal components retained in the model and n is the total number of principal components. Thus, n is less than or equal to the smaller of the number of variables or samples in X.
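As a sketch of equations 3.9 to 3.12, the code below computes Q for each sample and the confidence limit $Q_\alpha$ from the discarded eigenvalues. It assumes NumPy, uses Python's `statistics.NormalDist` for $c_\alpha$, and runs on hypothetical mean-centered training data; the share of training samples above the limit should be roughly α. The sketch does not guard against degenerate cases such as $h_0 \le 0$.

```python
import numpy as np
from statistics import NormalDist

def q_statistic(X, P_k):
    """Q (SPE) of every row of the preprocessed data X, eq. (3.9):
    the squared norm of the residual x (I - P_k P_k^T)."""
    E = X - X @ P_k @ P_k.T
    return np.sum(E ** 2, axis=1)

def q_limit(eigenvalues, k, alpha=0.05):
    """Confidence limit Q_alpha of eqs. (3.10)-(3.12); `eigenvalues` must hold
    all eigenvalues of cov(X), of which the k largest are retained."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1][k:]  # discarded ones
    theta1, theta2, theta3 = (np.sum(lam ** i) for i in (1, 2, 3))  # eq. (3.11)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)          # eq. (3.12)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper (1 - alpha) normal deviate
    bracket = (c_alpha * np.sqrt(2.0 * theta2 * h0 ** 2) / theta1
               + 1.0 + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * bracket ** (1.0 / h0)                            # eq. (3.10)

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
X = X - X.mean(axis=0)                                # mean-centered training data
eigenvalues, vectors = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, P = eigenvalues[order], vectors[:, order]

k = 3
Q = q_statistic(X, P[:, :k])
limit = q_limit(eigenvalues, k, alpha=0.05)
print(f"Q limit: {limit:.3f}; share of samples above it: {np.mean(Q > limit):.3f}")
```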
3.4.2.2 Hotelling’s T² Statistic Chart
Hotelling’s T² statistic, the sum of normalized squared scores, measures the distance between the mean position of the process and the new sample. It is a measure of the variation of each sample within the PCA model:

$$T_i^2 = \mathbf{t}_i\boldsymbol{\lambda}^{-1}\mathbf{t}_i^T = \mathbf{x}_i\mathbf{P}_k\boldsymbol{\lambda}^{-1}\mathbf{P}_k^T\mathbf{x}_i^T \tag{3.13}$$

where $\mathbf{t}_i$ in this instance refers to the ith row of $\mathbf{T}_k$, the matrix of k score vectors from the PCA model. The matrix $\boldsymbol{\lambda}^{-1}$ is a diagonal matrix containing the inverse eigenvalues associated with the k eigenvectors (principal components) retained in the model.
Statistical confidence limits for T² can be calculated by means of the F-distribution as follows:

$$T^2_{k,m,\alpha} = \frac{k(m-1)}{m-k}\,F_{k,\,m-k,\,\alpha} \tag{3.14}$$

where m is the number of samples used to develop the PCA model and k is the number of principal component vectors retained in the model.
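A corresponding sketch for equations 3.13 and 3.14, assuming NumPy and SciPy: `scipy.stats.f.ppf` gives the F quantile, so $F_{k,m-k,\alpha}$ is taken as the upper α critical value. The data are hypothetical. Note that the T² values of the m training samples always add up to (m − 1)k, because each score column has variance $\lambda_i$.

```python
import numpy as np
from scipy.stats import f

def t2_statistic(X, P_k, lambdas_k):
    """Hotelling's T^2 of every row of X within the PCA model, eq. (3.13)."""
    T = X @ P_k                                # scores of each sample
    return np.sum(T ** 2 / lambdas_k, axis=1)  # t lambda^-1 t^T with diagonal lambda

def t2_limit(k, m, alpha=0.05):
    """T^2 confidence limit from the F-distribution, eq. (3.14)."""
    return k * (m - 1) / (m - k) * f.ppf(1.0 - alpha, k, m - k)

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)
lambdas, vectors = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(lambdas)[::-1]
lambdas, P = lambdas[order], vectors[:, order]

k, m = 2, X.shape[0]
T2 = t2_statistic(X, P[:, :k], lambdas[:k])
limit = t2_limit(k, m, alpha=0.05)
print(f"T2 limit: {limit:.3f}; share of samples above it: {np.mean(T2 > limit):.3f}")
```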
3.4.3 Schematic Interpretation of PCA
Figure 3.6 presents a PCA interpretation of process variables, showing the deviations of five variables from their nominal trajectories. Variables x1, x3 and x4 show approximately the same pattern (x4 has some peaks, probably outliers), and variables x2 and x5 also share a common pattern. Therefore, x1, x3 and x4 are highly correlated, and a new variable, the principal component t1, can be created from them; x2 and x5 are also highly correlated, so a principal component t2 can be created from them. In this example, the first principal component corresponds to the largest group of correlated variables and the second principal component to the next largest group.
Figure 3.6: A Simplified Representation of PCA - Source (30)
Figure 3.7 presents a PCA model in which the first and second principal components are the blue lines, samples with unusual T² and Q values are the red circles, and samples within control are the green circles.
Figure 3.7: PCA Model Schematic - Source (26)
3.4.4 Contribution Plots
The statistical charts do not indicate which process variables caused the process to be faulty. Contribution plots give information on how the variables interact in the process: for a faulty process, the contribution plot is used to identify the variables that caused the low quality of the released product, namely those with the highest contribution magnitude (31, 32, 33). The most common indices used for fault diagnosis with contribution plots are T² and Q.
If the contribution of a particular score variable towards the T² statistic is abnormally large, the individual contribution of the jth process variable to the ith score variable, $c(t_i)_j$, can be determined as follows:

$$c(t_i)_j = \frac{p_{ij}\,x_j\,\mathbf{p}_i^T\mathbf{x}}{\lambda_i} = \frac{p_{ij}\,x_j\,t_i}{\lambda_i} \tag{3.15}$$

where $t_i$ and $\lambda_i$ are the value and the variance of the ith score variable, respectively, $p_{ij}$ is the loading of the jth process variable on the ith principal component (the jth element of $\mathbf{p}_i$), $\mathbf{p}_i$ is the ith column vector of P, x is the current data vector and $x_j$ is the value of the jth process variable.

The contribution of the jth variable to the Q statistic can be obtained as follows:

$$c(Q)_j = \boldsymbol{\Phi}_j^T\mathbf{x} \tag{3.16}$$

where $\boldsymbol{\Phi}_j^T$ is the jth row of the matrix $\mathbf{I} - \mathbf{P}_k\mathbf{P}_k^T$ and I is the n by n identity matrix, as in equation 3.9.
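The sketch below evaluates equations 3.15 and 3.16 for a single hypothetical sample with one disturbed variable, assuming a PCA model obtained as in the previous sketches. Summing $c(t_i)_j$ over the variables gives $t_i^2/\lambda_i$, so the contributions add up to T²; the signed Q contributions are the per-variable residuals, and their squares add up to Q (squares are printed only for that reason). Function names and data are illustrative assumptions.

```python
import numpy as np

def t2_contributions(x, P_k, lambdas_k):
    """c(t_i)_j = p_ij x_j t_i / lambda_i, eq. (3.15), as a (k, n) array:
    row i holds the contributions of the n variables to score i."""
    t = P_k.T @ x                                  # scores t_i = p_i^T x
    return (t / lambdas_k)[:, None] * P_k.T * x[None, :]

def q_contributions(x, P_k):
    """c(Q)_j = Phi_j^T x with Phi = I - P_k P_k^T, eq. (3.16).
    The result is the residual of each variable; its squares sum to Q."""
    Phi = np.eye(P_k.shape[0]) - P_k @ P_k.T
    return Phi @ x

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)
lambdas, vectors = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(lambdas)[::-1]
lambdas, P = lambdas[order], vectors[:, order]
k = 2

x = X[0].copy()
x[2] += 5.0                          # disturb the third variable of one sample
c_t = t2_contributions(x, P[:, :k], lambdas[:k])
c_q = q_contributions(x, P[:, :k])
print(np.round(c_t.sum(axis=0), 3))  # T^2 contribution of each variable
print(np.round(c_q ** 2, 3))         # squared Q contribution of each variable
```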
3.5 Unfold-PCA for Batch Processes
Principal component analysis is an MSPC technique that identifies patterns in process data through the correlation between variables. PCA reduces the large number of process variables by creating new variables that are linear combinations of the correlated variables (8). PCA is applied to continuous processes, where the measured data are arranged in a 2D matrix whose rows represent time and whose columns represent the different variables. Batch processes, in contrast, finish in a finite time, and the data measured from them are arranged in a 3D matrix (figure 3.8).
Figure 3.8: 3D matrix data. - Measured batch process data arranged as a 3D matrix.
Unfold principal component analysis is a technique that converts the 3D matrix of a batch process into a 2D matrix that can be treated with PCA. It was developed by Nomikos and MacGregor as multiway principal component analysis (MPCA) (9) and later became known as U-PCA. Batch-wise unfolding turns the 3D matrix (I×J×K) into a 2D matrix (I×JK), where i = 1, 2, ..., I are the processed batches, j = 1, 2, ..., J are the process variables and k = 1, 2, ..., K are the time instants over the duration of the process. The columns of the resulting matrix are mean centered and scaled to unit variance (figure 3.9).
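A minimal NumPy sketch of batch-wise unfolding and autoscaling, assuming the 3D data are stored as an array of shape (I, J, K). The column order chosen here (all J variables at the first time instant, then all J at the second, and so on) is an assumption; ordering the columns variable by variable is equally valid as long as it is used consistently. The array sizes are hypothetical.

```python
import numpy as np

def unfold_batchwise(X3):
    """Batch-wise unfolding of an I x J x K array (batches x variables x time)
    into an I x (J*K) matrix; column k*J + j holds variable j at time k."""
    I, J, K = X3.shape
    return X3.transpose(0, 2, 1).reshape(I, K * J)

def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - mean) / std, mean, std

rng = np.random.default_rng(6)
X3 = rng.normal(size=(10, 4, 6))  # 10 batches, 4 variables, 6 time instants
X2 = unfold_batchwise(X3)
Xs, mean, std = autoscale(X2)
print(X2.shape)                   # (10, 24)
```

The returned `mean` and `std` would be reused to scale new batches before projecting them onto the model.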
In U-PCA the array X is decomposed as the sum of the products of score vectors (t) and loading matrices (P) plus a residual array E that is minimized in a least squares sense:

$$\mathbf{X} = \sum_{r=1}^{R}\mathbf{t}_r \otimes \mathbf{P}_r + \mathbf{E} \tag{3.17}$$
Figure 3.9: Batch wise unfold. - Unfolding a 3D matrix into a 2D matrix.
Because U-PCA is statistically and algorithmically consistent with PCA, the methods for choosing the number of principal components (section 3.4.1), the statistical charts (section 3.4.2) and the contribution plots (section 3.4.4) are applied to U-PCA using the same theory and methodology.
3.5.1 Applications
In the chemical industry, batch and semi-batch processes are in great demand because they yield high-quality products. They are used in reactors, crystallization, distillation, injection molding, the manufacture of polymers and other chemical-related industries. One characteristic of batch processes is that the materials are processed in a prescribed sequence for a finite duration; the goal is to reproduce the prescribed recipe from batch to batch (13).
MSPC based on U-PCA, with its statistical charts, has been used successfully to analyze industrial data for real-time process monitoring, analysis of completed batches and on-line monitoring (34).
Several companies use this methodology as a real-time monitoring system for released products. When a batch is finished, the data obtained from the process are passed through the PCA model to check whether the batch is within the control limits. If it is beyond the control limits, the product is sent to the laboratory to obtain a diagnosis of the problem. This procedure saves companies both money and time. It saves money because laboratory analysis of the product can take a few hours, and if the process keeps running in the meantime, the following batches could suffer from the same problem; it saves time because the process does not need to wait for the laboratory analysis to know whether the product is of high quality (30).
To monitor a batch process, a set of historical data from batches in normal operation condition (NOC) is needed first. A preliminary PCA model is built with the NOC batches. If the analysis of this preliminary NOC PCA model reveals batches that present disturbances or do not belong with the NOC batches, those batches are considered outliers and removed from the set. After the outliers are removed, a new PCA model is built and new batches are projected onto it to test the consistency of the model.
For testing, a historical abnormal operation condition (AOC) batch and a NOC batch are projected onto the NOC PCA model. After the projection, the Q and T² values of both batches are calculated and compared with the corresponding statistical limits of the model. If the NOC PCA model is correct, the NOC test batch will be below the statistical limits and the AOC test batch above them. To diagnose the AOC test batch, contribution plots can be calculated to find the cause of the abnormal behavior (35, 36).
SBR processes are widely used to treat wastewater with high concentrations of nutrients (nitrogen, phosphorus) and toxic compounds from domestic and industrial sources. Variations in the concentration of the influent wastewater can lead to a low-quality effluent because they directly affect the biological reactions of the process. Early fault detection is therefore needed to correct the biological process, since such processes may take a few days to recover from an abnormal state (37).
Studies on MSPC and PCA applied to SBR wastewater treatment plants can be found in (38, 39, 40, 41).
3.6 Conclusions
This chapter presented the evolution of the techniques used to monitor industrial processes. The Shewhart control chart was the first SPC technique used to control the behavior of a process and reduce the number of low-quality products, and it was widely implemented in industry. More than two decades later, Hotelling proposed multivariate process control, in which the control charts take into account the correlation between the process variables. Principal component analysis, a method introduced by Pearson in 1901, is one of the techniques used today for monitoring, diagnosis and control in industries where a large number of variables need to be controlled. The methods for choosing the number of principal components of a PCA model, the recognition of outliers, the detection of faulty products with the statistical charts and the diagnosis with contribution plots have shown good results in industry. Unfold-PCA, a technique mathematically and algorithmically consistent with PCA, has shown good results when applied to batch processes in the chemical industry, both in detecting and diagnosing faulty batches and in reducing losses of raw materials and low-quality product. Diagnosis with contribution plots, however, is a field of PCA that has not been widely studied. For a small process, an expert could probably diagnose a faulty batch by inspecting by eye the contributions of the variables over the whole process. Batch processes are highly non-linear, however, so it is difficult to find an expert who can read the contributions of a process and relate the contributions of the measured variables to the quality variables. New methods have to be developed to improve the diagnosis.
Chapter 4
Pilot Plant Description and Statistical Modelling
4.1 Pilot Plant
The pilot plant is an SBR for wastewater treatment capable of removing organic matter (C), ammonium (NH₄⁺), nitrite or nitrate (NO₂⁻ or NO₃⁻) and phosphate (PO₄³⁻) (figure 4.1). In continuous systems the reaction and settling take place in different reactors, whereas in the SBR all the processes are carried out in a single reactor following a sequence of stages: fill, reaction, settling and draw. The stages of the batch configuration depend on the wastewater characteristics and the legal requirements (11).
Figure 4.1: Pilot Plant - Sequencing batch reactor for wastewater treatment.
The pilot plant is located in the LEQUIA laboratory at the University of Girona (Catalonia, Spain). The maximum capacity of the SBR is 30 liters. The influent wastewater is synthetic: a blend of a carbon source, an ammonium solution, a phosphate buffer, alkalinity control and a microelements solution. It is stored in a 150-liter storage tank kept at 4 °C to minimize microbial activity. The reactor is located in a thermo-regulated room at 20 °C.
To monitor the essential variables, the SBR process is equipped with pH (EPH-M10), dissolved oxygen (DO) (WTW OXI 340), oxidation reduction potential (ORP) (ORP M10) and temperature (Temp) (PT 100) Endress-Hauser probes.
The SBR cycle is composed of four sections: biological reaction, wastage, settling and drawing. This study focuses on the biological reaction, which is composed of six stages: first fill (F1), anaerobic condition (ANA), first aerobic condition (AE1), second fill (F2), anoxic condition (ANO) and second aerobic condition (AE2).
4.2 Analysis of Historical Data
The historical data from the SBR process consist of 266 batches, each associated with its biological nutrient removal (BNR) and global quality removal (GQR) results for the processed wastewater, as provided by the chemical laboratory (table 4.1). The quality specifications follow the European Community Council Directive (11) (table 4.2). Additional information from off-line analysis is provided in (42).
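| Batches | C | NH₄⁺ | NO₂⁻ or NO₃⁻ | PO₄³⁻ | Global Quality Removal |
|---|---|---|---|---|---|
| 93 | X | X | X | X | High |
| 58 | X | × | X | X | Medium |
| 24 | X | X | X | × | Medium |
| 91 | X | X | • | × | Low |

Table 4.1: Chemical analysis of BNR and GQR. The C, NH₄⁺, NO₂⁻ or NO₃⁻ and PO₄³⁻ columns give the biological nutrient removal level of each nutrient.
X = high quality removal. • = medium quality removal. × = low quality removal.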
The durations of the stages of the biological reaction are: 10 minutes for F1, 150 minutes for ANA, 100 minutes for AE1, 11 minutes for F2, 75 minutes for ANO and 78 minutes for AE2, i.e. 424 minutes in total. The process data are sampled every minute. Since four sensors measure the process, the 3D matrix has 266 batches on the I axis, 4 variables on the J axis and 424 time instants on the K axis (see the 3D data matrix in figure 3.8).
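The dimensions stated above can be checked with a few lines of Python. The stage durations, sensors and batch count are taken from this section; the index ranges show where each stage sits along the K axis (0-based, one sample per minute). The variable names are illustrative.

```python
# Stage durations of the biological reaction in minutes (one sample per minute)
stages = {"F1": 10, "ANA": 150, "AE1": 100, "F2": 11, "ANO": 75, "AE2": 78}
sensors = ["pH", "DO", "ORP", "Temp"]
n_batches = 266

K = sum(stages.values())    # time instants per batch
J = len(sensors)            # measured variables
print((n_batches, J, K))    # (266, 4, 424): shape of the 3D array
print((n_batches, J * K))   # (266, 1696): shape after batch-wise unfolding

# First and last 0-based time index of each stage along the K axis
start = 0
for name, minutes in stages.items():
    print(name, start, start + minutes - 1)
    start += minutes
```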
| Biological Nutrient Removal | C (mgCOD/L) | NH₄⁺ (mgN/L) | NO₂⁻ or NO₃⁻ (mgN/L) | PO₄³⁻ (mgP/L) |
|---|---|---|---|---|
| High | < 84 | < 6.7 | < 3.3 | < 0.9 |
| Medium | 84 - 125 | 6.7 - 10 | 3.3 - 5 | 0.9 - 2 |
| Low | > 125 | > 10 | > 5 | > 2 |

Table 4.2: Standard Levels for Nutrients Removal.
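As an illustrative sketch, the thresholds of table 4.2 can be encoded as a lookup that maps an effluent concentration to a removal level. Treating the Medium bounds as inclusive (84 and 125 mgCOD/L count as Medium) is an assumption about how the table's ranges are read, and the effluent values in the example are hypothetical.

```python
# Bounds from table 4.2: (below this value -> High, up to this value -> Medium)
THRESHOLDS = {
    "C": (84.0, 125.0),       # mgCOD/L
    "NH4+": (6.7, 10.0),      # mgN/L
    "NO2-/NO3-": (3.3, 5.0),  # mgN/L
    "PO4^3-": (0.9, 2.0),     # mgP/L
}

def removal_level(nutrient, concentration):
    """High/Medium/Low biological nutrient removal level of one effluent value."""
    high_below, medium_up_to = THRESHOLDS[nutrient]
    if concentration < high_below:
        return "High"
    if concentration <= medium_up_to:  # assumption: range bounds are inclusive
        return "Medium"
    return "Low"

effluent = {"C": 60.0, "NH4+": 8.2, "NO2-/NO3-": 2.1, "PO4^3-": 2.6}  # hypothetical
print({name: removal_level(name, value) for name, value in effluent.items()})
# {'C': 'High', 'NH4+': 'Medium', 'NO2-/NO3-': 'High', 'PO4^3-': 'Low'}
```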
4.3 PCA Model
The 93 high GQR batches were analyzed to build a PCA model able to detect the medium and low GQR batches. The model is required to retain at least 70% of the cumulative variance, and the raw data are unfolded batch-wise (figure 3.9). The unfolded data were pre-processed with the block/group scaling method (32), and 9 batches were considered outliers (20) (figure 4.2). The resulting PCA model, built with 84 high GQR batches (figure 4.3), retains three principal components that explain 72.81% of the cumulative variance (figure 4.4).
Figure 4.2: Pre-process 93 High GQR Batches - Block/group scaling of the 93 high GQR batches. The black circles show some outliers.
Figure 4.3: Pre-process 84 High GQR Batches - Block/group scaling of the 84 high GQR batches with the outliers removed.
Figure 4.4: PCA Model 84 Batches - Model with 84 high GQR batches and three principal components.
4.4 Statistical Chart
The statistical charts are used to detect faulty batches of the process. To identify whether a released batch is faulty, it is first projected onto the Q statistic chart. If the Q chart detects the batch as faulty, there is no need to project it onto the T² statistic chart. If the batch is below the confidence limit of the Q statistic, it is then projected onto the T² statistic chart. If it is also below the confidence limit of the T² statistic, the batch meets the requirements; otherwise it is faulty.
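The two-step decision for a released batch, Q chart first and then T² chart, can be sketched as a small function. The limit values and statistics in the example are hypothetical placeholders, not values from the 84-batch model.

```python
def release_decision(q, t2, q_limit, t2_limit):
    """Two-step check of a released batch: Q chart first, then T^2 chart.
    Returns (is_faulty, chart_that_flagged_the_batch)."""
    if q > q_limit:
        return True, "Q"    # outside the model plane: no need to check T^2
    if t2 > t2_limit:
        return True, "T2"   # fits the plane but lies far from the NOC mean
    return False, None      # below both confidence limits: batch accepted

# Hypothetical statistic values and limits for three released batches
for q, t2 in [(12.0, 3.0), (4.0, 15.0), (4.0, 3.0)]:
    print(release_decision(q, t2, q_limit=8.5, t2_limit=9.2))
```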
The first task is to check whether the Q statistic chart of the 84 high GQR PCA model can detect the medium GQR and low GQR batches. If it detects both groups, the next task is to look for any clue that could indicate which group a faulty batch belongs to.
In figures 4.5 and 4.6, the gray circles are the 84 high GQR batches used to build the PCA model, and the inverted triangles are the 82 medium GQR batches (figure 4.5) and the 91 low GQR batches (figure 4.6). In both figures all the medium and low GQR batches lie above the confidence limit of the Q statistic, meaning that all of them are detected as faulty.
Figure 4.5: Q Statistic Chart for Medium GQR - 82 medium GQR batches projected in
the Q statistic chart of the 84 high GQR PCA model.
The medium and low GQR batches were then projected together onto the Q statistic chart to look for any pattern or hint that could identify which GQR group a batch belongs to (figure 4.7).
In figure 4.7 all the batches are above the confidence limit of the Q statistic chart, as they were in figures 4.5 and 4.6, but the batches of the two groups are intermixed across virtually the whole range of the Q residual axis. Clearly, further analysis is needed to identify whether a batch has medium or low GQR.
Figure 4.6: Q Statistic Chart for Low GQR - 91 low GQR batches projected in the Q statistic chart of the 84 high GQR PCA model.
Figure 4.7: Q Statistic Chart for Medium and Low GQR Batches - 82 medium GQR and 91 low GQR batches projected in the Q statistic chart of the 84 high GQR PCA model.
4.5 Contribution Plots
To diagnose a batch, its contribution plot is calculated. The contribution plot is used to observe which variable or variables caused the process to be faulty: if some variables have a larger magnitude than the others, those variables are probably causing the failure.
Figure 4.8: Q Contribution Plot - Q contribution plot of a low GQR batch
In figure 4.8, the variables with the highest magnitude relative to the others are the pH in the ANO stage and the DO in the AE1 and AE2 stages, so these variables are investigated as the ones that probably contribute most to the faulty process. However, if the diagnosis of the faulty batch relies only on the variables with the highest magnitude, wrong corrective actions may be taken, pushing the process from an abnormal state into an even worse one.
The issue lies in how the variables are expected to contribute in a normal process. In the Q contribution plot in figure 4.8, the variables with the highest magnitude may simply be contributing in the way they are supposed to for the process to meet the requirements.
4.6 Conclusions
This chapter described the pilot plant, a sequencing batch reactor for wastewater treatment. The batch process removes critical nutrients from the wastewater: organic matter, ammonium, nitrite or nitrate, and phosphate. The 93 historical batches with normal behavior were used to build the PCA model, and in the training phase 9 of them were considered outliers. The Q statistic chart of the PCA model built with the remaining 84 normal batches detected all 173 abnormal historical batches. The contribution plot of a faulty batch suggests that the pH variable in the ANO stage and the DO variable in the AE1 and AE2 stages are probably the parts of the process that make it faulty. Judged by eye this assumption is probably right, but a method or tool is needed to confirm it.
Chapter 5
New Methodology for Intelligent Contribution Analysis
When a process is flawed, it is important to understand its behavior and which factors were responsible for the low-quality product. When many factors are involved in a faulty process, classifying the type of failure can be difficult. Fault diagnosis of batch processes has been widely studied to prevent failures in the released product, with process misbehavior introduced in simulations to obtain prediction results (43).
In recent years, techniques for fault detection and diagnosis in batch processes have been widely used as real-time tools to prevent further releases of low-quality products. Previous studies have proposed analysis techniques to monitor an SBR for wastewater treatment (39, 40, 44), but these works focus mainly on fault detection. Furthermore, systems capable of estimating the quality variables of the process have been developed using artificial neural networks (43, 45), in some cases combined with PCA (46, 47).
The experiments in the studies that estimate the quality variables of the released product use highly controlled influent mixtures and, in a few cases, have sensors for the quality variables. In this study, purchasing expensive sensors to measure the quality variables was ruled out because of the available budget. Therefore, the techniques and methods of those studies could not be reproduced here as presented, because there was not enough well-controlled data. A key reason for not using controlled data was to reproduce the behavior of real wastewater, whose influent always varies with conditions such as the weather, industrial wastewater, urban wastewater and other issues that affect the influent.
Since small project budgets restrict the types of experiments that can be executed, new ideas are needed to obtain the required knowledge of the process. The method proposed in this study creates a fault signature (FS) to predict the diagnosis of the quality variables for the faulty batches. The FS represents the behavior of each variable in each stage, using the information gathered on how the variables contribute to the process. To obtain the behavior of the variables in the stages, a contribution limit chart is developed.
Since the objective of this study is to predict the quality of faulty released batches, a dataset of historical faulty batches is needed to associate their behavior with the behavior of the batches to be diagnosed. Therefore, a fault signature dataset (FSD) must be built, containing the FS of every faulty batch together with the chemical analysis of its quality variables. To extract knowledge from the FSD, several classification and rule induction algorithms are applied to the dataset.
5.1 Contribution Limit Chart
The contribution limit chart is developed to compare the contribution plots against thresholds on the contributions of the variables. Its objective is to better detect the variables that cause the process to be faulty.
For every time instant, the mean and the standard deviation of the contributions of the PCA model batches are calculated. The upper contribution limit (UCL) and the lower contribution limit (LCL) of the contribution limit chart are then the mean plus or minus three times the standard deviation (equations 5.1 and 5.2):

$$\mathrm{UCL}(y) = m_y + 3\,\mathrm{std}_y \tag{5.1}$$

$$\mathrm{LCL}(y) = m_y - 3\,\mathrm{std}_y \tag{5.2}$$

where $y$ is the $T^2$ or $Q$ statistic for which the limit is built, $m_y$ is the mean and $\mathrm{std}_y$ is the standard deviation of the contributions at that time instant.
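A minimal Python sketch of equations 5.1 and 5.2, assuming the contributions of the NOC batches of the PCA model are stored as a NumPy array with one row per batch and one column per unfolded time instant. The function names, the array layout and the use of the sample standard deviation (`ddof=1`) are assumptions of this sketch, not the implementation used in this study.

```python
import numpy as np


def contribution_limits(noc_contributions, n_std=3.0):
    """Return (UCL, LCL) for every time instant (equations 5.1 and 5.2).

    noc_contributions: array of shape (n_batches, n_instants); each row holds
    the contributions of one NOC batch of the PCA model (assumed layout).
    """
    c = np.asarray(noc_contributions, dtype=float)
    m = c.mean(axis=0)          # m_y: mean contribution at each instant
    s = c.std(axis=0, ddof=1)   # std_y: sample standard deviation (assumption)
    return m + n_std * s, m - n_std * s


def outside_limits(batch_contributions, ucl, lcl):
    """True at every instant where a new batch leaves the [LCL, UCL] band."""
    b = np.asarray(batch_contributions, dtype=float)
    return (b > ucl) | (b < lcl)


# Toy example with made-up numbers: three NOC batches, two time instants.
ucl, lcl = contribution_limits([[1.0, 2.0], [3.0, 2.0], [2.0, 2.0]])
print(outside_limits([6.0, 2.0], ucl, lcl))  # [ True False]
```

A contribution of a new batch that lies above the UCL or below the LCL at an instant is what the fault signature later counts as an event.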
5.2 Improving the Contribution Limit Chart
The following methods were proposed to perform a more robust diagnosis. When analyzing faulty batches with the contribution limit chart, there were situations in which a single contribution in a stage had a very high magnitude, probably caused by faulty sensor measurements, electrical disturbances or any other incident that could affect the process measurements. Despite the effort invested in the following methods, the results were not promising enough to pursue them further in this study.
5.2.1 Modified Cumulative Sum
Cumulative sum (CUSUM) charts are used in SPC to detect small shifts in the mean value of a continuous process (48). The deviation of each sample from the mean, expressed in standard deviations, is plotted in the CUSUM chart to show how the samples drift from the mean value, and the deviations of the previous samples are accumulated together with that of the new sample. A set of rules raises warnings when the cumulative deviation goes above or below a threshold around the mean value, indicating that the process is likely to change or release faulty products.
The proposal was to combine the CUSUM with the limits of the contribution limit chart. In this modified CUSUM, the contribution limit acts as the mean value of the process, and the quantity summed is the number of standard deviations by which a contribution exceeds the contribution limit. The summation is computed for each stage of each variable, and the resulting value is used to diagnose the batches.
After the study was completed, the information gathered did not help to perform the diagnosis. The method could work if the process were corrected on-line; in this study, however, the process is corrected after the diagnosis of a released batch, which is made in real time.
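The modified CUSUM described above can be sketched as follows for one variable of one batch. This is one possible reading of the text, not the exact procedure that was tested: it assumes that contributions below the LCL are counted in the same way as those above the UCL, and that the running sum restarts at the beginning of each stage. The function name and its arguments are hypothetical.

```python
import numpy as np


def stage_cusum(contrib, ucl, lcl, std, stage_lengths):
    """Hypothetical modified CUSUM for one variable of one batch.

    Each instant adds how many standard deviations the contribution lies
    beyond the contribution limit (0 inside the band); the sum restarts at
    the beginning of every stage. Returns one running-sum array per stage;
    its last value is the stage total used for the diagnosis.
    """
    c = np.asarray(contrib, dtype=float)
    std = np.asarray(std, dtype=float)
    beyond = np.clip(c - ucl, 0.0, None) + np.clip(lcl - c, 0.0, None)
    # Where std is 0 the excess is left in contribution units (assumption).
    excess = beyond / np.where(std > 0, std, 1.0)
    edges = np.concatenate(([0], np.cumsum(stage_lengths)))
    return [np.cumsum(excess[a:b]) for a, b in zip(edges[:-1], edges[1:])]


curves = stage_cusum([0.0, 5.0, -3.0, 1.0], ucl=2.0, lcl=-2.0,
                     std=[1.0] * 4, stage_lengths=[2, 2])
print([float(curve[-1]) for curve in curves])  # [3.0, 1.0]
```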
5.2.2 Sum of Standard Deviation and Stage Mean
The proposal was to sum, over a stage, the number of standard deviations by which the contributions fall outside the contribution limits, and to compare this value with the mean length of the corresponding stage. If the sum for the contributions outside the thresholds is greater than the mean stage length, the stage is considered faulty. After several tests to verify the quality of the proposal, the results were not encouraging.
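A sketch of the comparison in section 5.2.2 for one variable of one batch, under stated assumptions: the excess beyond either limit is measured in standard deviations, and the mean stage lengths of the NOC batches are taken as given (the numbers below are invented). The function name and arguments are hypothetical.

```python
import numpy as np


def faulty_stages_vs_mean_length(contrib, ucl, lcl, std, stage_lengths,
                                 mean_stage_lengths):
    """Flag a stage as faulty when its summed excess exceeds its mean length.

    The excess at each instant is the number of standard deviations by which
    the contribution lies beyond the UCL or the LCL (0 inside the band).
    """
    c = np.asarray(contrib, dtype=float)
    std = np.asarray(std, dtype=float)
    beyond = np.clip(c - ucl, 0.0, None) + np.clip(lcl - c, 0.0, None)
    excess = beyond / np.where(std > 0, std, 1.0)  # std == 0: raw units (assumption)
    edges = np.concatenate(([0], np.cumsum(stage_lengths)))
    sums = np.array([excess[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])
    return sums > np.asarray(mean_stage_lengths, dtype=float)


flags = faulty_stages_vs_mean_length([0.0, 5.0, -3.0, 1.0], 2.0, -2.0,
                                     [1.0] * 4, [2, 2], [2.5, 2.5])
print(flags.tolist())  # [True, False]
```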
5.2.3 Sum of Standard Deviation with Statistic Range
In this method, the summation of the contributions is computed as in section 5.2.2; the difference is that a statistic range is calculated for the stage, using the highest and lowest values of the contribution limit within the stage. If the sum of standard deviations falls outside this range, the stage is considered faulty.
5.3 Fault Signature
The objective of the FS is to create a vector that represents the behavior of each variable in each stage of a faulty batch, obtained from the analysis of the contribution limit chart. This vector provides information on how the variables contributed to the process and, at the same time, reduces the dimensionality of the information to be analyzed.
In batch processes, $L$ stages ($l = 1, 2, \ldots, L$) need to be completed to obtain the final product. Therefore, the sum of the individual stage durations $\beta_l$ must equal the duration $K$ of the process, as in equation (5.3):

$$\sum_{l=1}^{L} \beta_l = K \tag{5.3}$$
To reduce the dimensionality, one indicator is obtained for each variable in each stage, and the vector containing all these indicators is the FS. If the process has $L$ stages and $J$ variables are analyzed, the FS vector has length $JL$. In this way, the FS represents the faulty process with a vector of $JL$ fields, where $JL \ll JK$.
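As a quick check of equation 5.3 and of the FS length, the following sketch turns a list of stage durations β_l into (start, end) instants and compares JL with JK. The stage durations are invented for illustration; J = 4 and L = 6 correspond to the 24-field FS used in chapter 6. The function name is hypothetical.

```python
import numpy as np


def stage_bounds(stage_durations, K):
    """Return the (start, end) instants of each stage, checking equation 5.3."""
    beta = np.asarray(stage_durations, dtype=int)
    if beta.sum() != K:
        raise ValueError("stage durations must add up to K (equation 5.3)")
    edges = np.concatenate(([0], np.cumsum(beta)))
    return [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:])]


# Hypothetical durations for L = 6 stages; J = 4 variables.
beta = [10, 20, 30, 40, 50, 60]
K = sum(beta)                  # 210 instants per variable
bounds = stage_bounds(beta, K)
J, L = 4, len(beta)
print(J * L, J * K)            # 24 840: FS length vs. unfolded batch length
```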
5.3.1 Binary Indicator for Fault Signature
The FS indicators that represent the behavior of the stages, obtained through the analysis of the contribution limit chart, are binary values.
In the analysis of the contribution plot of a faulty batch projected in the contribution limit chart, each time instant at which the contribution lies above the UCL or below the LCL is counted as an event. If the total number of events in a stage is less than or equal to a given percentage of the length of that stage, the indicator of the variable for that stage is normal (0); otherwise, it is abnormal (1).
One issue with this proposal is that it implies that every stage is equally important to the process, which can cause a loss of information and, consequently, a misdiagnosis of the batch. Moreover, the choice of the percentage is rather arbitrary, since there is no method to choose the correct percentage for the limit.
5.3.2 Numeric Indicator for Fault Signature
Here, the FS indicators that represent the behavior of the stages are the numbers of instances outside the thresholds of the contribution limits.
During the analysis of the contribution plot of a faulty batch projected in the contribution limit chart, each time instant at which the contribution lies above the UCL or below the LCL is counted as an event. At the end of the stage, the number of instances outside the thresholds is the indicator that represents the behavior of the stage.
The advantage of this proposal is that the maximum value of the indicator differs between stages, since it is bounded by the length of each stage. It therefore does not imply that all the stages contribute equally to the process, and a better FS can be obtained to diagnose a faulty batch.
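The numeric indicator replaces the percentage test by the count itself. A sketch under the same assumptions as for a binary indicator (a boolean event array of shape J x K, stage boundaries as (start, end) instants, hypothetical names); the toy example shows that the maximum value of each indicator is the length of its stage.

```python
import numpy as np


def numeric_fault_signature(outside, stage_bounds):
    """Numeric FS: number of out-of-limit instants per variable and stage."""
    out = np.asarray(outside, dtype=bool)
    return np.array([
        int(out[j, start:end].sum())
        for j in range(out.shape[0])
        for start, end in stage_bounds
    ])


outside = np.zeros((2, 40), dtype=bool)
outside[0, 3] = True
outside[0, [25, 30]] = True
outside[1, 0:20] = True         # the whole first stage is out of limits
print(numeric_fault_signature(outside, [(0, 20), (20, 40)]).tolist())  # [1, 2, 20, 0]
```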
5.4 Diagnosis with the Fault Signature
The FS provides information on how the variables of the faulty batch contribute to the different stages of the process. The objective is to build an FSD of historical faulty batches, each associated with its quality variable analysis, to diagnose future batch releases. The integration of statistical methods with expert systems has been proposed to deal with the difficulties of diagnosing faulty processes (49, 50).
Since the FSD can contain a large number of cases and the FS has many fields, knowledge can be extracted from the FSD with rule induction and classification algorithms, machine learning tools that find patterns in databases and classify new events. Given an input dataset (the FSD), the algorithm searches for the descriptions of the instances that best map them to the classes of the output (the quality variable). The algorithm is applied to the FSD and delivers a set of rules or a knowledge model that helps predict the diagnosis of the quality variables in future batch releases.
The rule induction algorithms (RIA) used in this study are the CN2 algorithm (51), an induction algorithm that combines the ID3 and AQ algorithms to generate IF-THEN rules, and the PART algorithm (52), which creates rules from decision trees using the separate-and-conquer rule-learning technique. Applying an RIA to the FSD yields an ordered rule set, which is then used to diagnose new faulty batches.
The classification algorithms used are the IB1 algorithm (53), an instance-based learner that assigns each new case the class of its nearest neighbor, and the KStar algorithm (54), an instance-based learner that uses entropy as a distance measure. Applying these algorithms to the FSD yields a classification model, which is then used to diagnose new faulty batches.
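For illustration, a minimal nearest-neighbor classifier in the spirit of IB1 (53) is sketched below; it is not the reference implementation. It assumes numeric FS vectors, min-max scaling of each field with the training ranges and squared Euclidean distance, with ties going to the first training case. KStar's entropy-based distance is not shown, and the function name is hypothetical.

```python
import numpy as np


def ib1_predict(X_train, y_train, X_new):
    """Assign each new FS the class of its nearest training FS."""
    A = np.asarray(X_train, dtype=float)
    B = np.asarray(X_new, dtype=float)
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant fields: avoid division by 0
    A, B = (A - lo) / span, (B - lo) / span
    # Squared Euclidean distance between every new case and every training case.
    dist = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    return [y_train[i] for i in dist.argmin(axis=1)]


# Made-up numeric fault signatures with two fields each.
X_train = [[0, 0], [10, 10]]
y_train = ["High", "Low"]
print(ib1_predict(X_train, y_train, [[1, 2], [9, 8]]))  # ['High', 'Low']
```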
5.5 Conclusions
A new methodology is proposed to improve the diagnosis of a faulty batch. A contribution limit chart was developed to observe the contribution behavior of a batch. The contribution limits are created from the contributions of the batches that compose the PCA model; the contributions of a doubtful batch are then projected in the chart to check whether they are within the limits. The contribution limit chart shows in which stages of the process the contributions were outside the thresholds, but diagnosing the batch by eye is still difficult. Classification techniques could make the task easier; the issue lies in the number of instances that the classification algorithm would need to analyze. Therefore, a fault signature containing the information on the behavior of the contributions was proposed. The fault signature reduces the number of instances to be analyzed by the classification algorithms, making it possible to estimate the diagnosis of future batches.
Chapter 6
Intelligent Contribution Analysis
for Fault Diagnosis
Chapter 4 introduced the pilot plant for wastewater treatment and the historical data from the process. A PCA model was built to discern between normal and abnormal batches. The Q statistic chart was able to detect the faulty batches with medium and low GQR, and the contribution plot of a faulty batch was calculated. Neither the statistic chart nor the contribution plot gave any hint as to which GQR group the faulty batch belonged. Therefore, chapter 5 proposed a new methodology to predict the quality variables of a batch process.
This chapter presents the GQR diagnosis of the batches listed in table 4.1, using the methodology proposed in chapter 5 with the binary indicator for the fault signature described in section 5.3.1. The fault signatures are diagnosed with the rule sets of the RIA described in section 5.4.
6.1 Historical Data
The historical data from the SBR process consist of 266 batch cases, each associated with the BNR and GQR of the processed wastewater provided by the chemical laboratory. The GQR diagnosis of the historical data in table 4.1 was redefined as NOC for the 93 high GQR batches and AOC for the 82 medium and 91 low GQR batches. The AOC batches are composed as follows: 58 medium GQR batches as AOC1, 24 medium GQR batches as AOC2 and 91 low GQR batches as AOC3.
Batches | C | NH₄⁺ | NO₂⁻ or NO₃⁻ | PO₄³⁻ | Global Quality Removal
93      | X | X    | X            | X     | NOC
58      | X | ×    | X            | X     | AOC1
24      | X | X    | X            | ×     | AOC2
91      | X | X    | •            | ×     | AOC3
Table 6.1: Chemical analysis of BNR and GQR. Columns C to PO₄³⁻ give the Biological Nutrient Removal (BNR) of each nutrient.
X = high quality removal. • = medium quality removal. × = low quality removal.
6.2 PCA Model
In this procedure, 23 batches were considered outliers in order to obtain a better model for detecting AOC batches and to build better contribution limits (figure 6.1). The PCA model is composed of 70 NOC batches, with three principal components retained that explain 75,60% of the cumulative variance (figure 6.2); the Q residuals used for the statistic threshold account for the remaining 24,40% (figure 6.3). In the first model, built in section 4.3, the cumulative variance was 72,81%, and in section 4.4 the Q residuals for the statistic threshold were 27,19%.
Figure 6.1: Pre-processing of the 70 NOC Batches - Block/group scaling of the 70 NOC batches, excluding the 23 outlier batches.
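A minimal NumPy sketch of how the retained and residual variance percentages and the Q statistic relate, assuming the NOC batches are already block-scaled and unfolded batch-wise into one row per batch. With three retained components, the two percentages reported for this model (75,60% and 24,40%) add up to 100% by construction. The function names are hypothetical, and the control limit of the Q chart is not computed here.

```python
import numpy as np


def pca_model(X, n_components):
    """Fit PCA on scaled, unfolded NOC data of shape (n_batches, J*K).

    Returns the mean, the loadings P (J*K x A) and the percentages of
    variance explained by the A retained components and left in the residuals.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return mean, Vt[:n_components].T, 100 * explained, 100 * (1 - explained)


def q_statistic(X_new, mean, P):
    """Q (squared prediction error) of each new batch."""
    Xc = np.asarray(X_new, dtype=float) - mean
    residual = Xc - Xc @ P @ P.T
    return (residual ** 2).sum(axis=1)


X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
mean, P, explained, residual = pca_model(X, n_components=1)
print(explained, residual)
print(q_statistic([[0.0, 3.0]], mean, P))
```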
In figure 6.3, the circles are the 70 NOC batches and the inverted triangles are the 173 AOC batches. As can be seen, all the AOC batches were detected as faulty. The Q statistics of the 70 NOC PCA model batches and of the 84 high GQR PCA model batches can be compared in figures 6.3 and 4.4, respectively: in figure 6.3 the circles representing the batches of the PCA model are below the Q statistic threshold, while in figure 4.4 a few batches are above the Q statistic threshold; those batches were considered outliers for the PCA model with 70 NOC batches.

Figure 6.2: PCA Model 70 Batches - Model with 70 NOC batches and three principal components.

Figure 6.3: Q Statistic Chart for AOC Batches - 173 AOC batches projected in the Q statistic chart of the 70 NOC PCA model.
6.3 Contribution Limit Chart and Binary Fault Signature
The contribution plot of the faulty batch is projected in the contribution limit chart, and each time step is compared against the thresholds. In this procedure, the binary indicator of section 5.3.1 is used: a counter records, for each stage, the number of instances in which a contribution is outside the thresholds of the contribution limit chart. If the counter exceeds 5% of the length of the stage, the indicator of that stage for that variable is abnormal (1); otherwise, it is normal (0). The FS of the faulty batch is shown in figure 6.4.
Figure 6.4: Fault Signature for AOC Batch - AOC Batch projected in the Q statistic chart
of the 70 NOC PCA model.
In figure 6.4, the FS is composed of 24 fields, with a new variable starting every 6 fields (one field per stage of the process). It can be observed that the AE1 and AE2 stages of the pH variable and the ANA, ANO and AE2 stages of the ORP variable were the ones with the most contribution instances outside the thresholds, producing the faulty batch.
6.4 Diagnosis with the Binary Fault Signature
In this process, the FS can take 576 different sequences; therefore, an FSD containing the AOC batches with their respective GQR is built. Rule induction algorithms are used to find patterns in the FSD and deliver a set of rules to diagnose a released batch. Two RIA are used to build the rules to diagnose the released batches: the CN2 and PART algorithms.
To test the proposed method with the RIA, a training set and a validation set are created from the FSD of all AOC batches. Each AOC class is divided randomly in half, giving:
• the training set, composed of 29 AOC1 batches, 12 AOC2 batches and 45 AOC3 batches;
• the validation set, composed of 29 AOC1 batches, 12 AOC2 batches and 46 AOC3 batches.
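A sketch of the per-class random split, using Python's standard library. Halving each AOC class with integer division and giving the odd case to the validation set reproduces the class counts listed above; the seed and the function name are arbitrary choices for illustration, not the procedure actually used.

```python
import random
from collections import Counter, defaultdict


def stratified_half_split(labels, seed=0):
    """Split case indices in half within each class (odd extra case to validation)."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, valid = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        half = len(indices) // 2
        train += indices[:half]
        valid += indices[half:]
    return train, valid


labels = ["AOC1"] * 58 + ["AOC2"] * 24 + ["AOC3"] * 91
train, valid = stratified_half_split(labels)
print(sorted(Counter(labels[i] for i in train).items()))
# [('AOC1', 29), ('AOC2', 12), ('AOC3', 45)]
```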
The CN2 RIA was applied to the training set and provided a set of 15 rules to diagnose the batches of the validation set. The rules obtained are IF-THEN rules. When the system evaluates a batch, it checks the 24 indicators of its FS: IF the indicators match the combination of a rule from the set, THEN the diagnosis of that batch is the one the algorithm induced from the training set. Below are three example rules, one for each GQR class, discovered by the CN2 algorithm:
Rule Ex.1:
IF pH AE1 = 1 AND Temp AE1 = 0 AND Temp ANO = 0
THEN Diagnosis = AOC1
Rule Ex.2:
IF pH F1 = 0 AND O2 F1 = 1 AND Temp ANA = 1
THEN Diagnosis = AOC2
Rule Ex.3:
IF O2 F2 = 0 AND Temp ANA = 1 AND Temp AE1 = 1
THEN Diagnosis = AOC3
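The evaluation of an ordered rule set can be sketched as a decision list: the first rule whose conditions all hold gives the diagnosis, and a batch that matches no rule stays unclassified. The sketch encodes only the three example rules above, not the complete set of 15. The field names ("variable stage") and the lists of variables and stages are assumptions inferred from the rules and from the FS description in section 6.3.

```python
VARIABLES = ["pH", "ORP", "O2", "Temp"]             # assumed from the text
STAGES = ["F1", "ANA", "ANO", "AE1", "AE2", "F2"]   # assumed from the rules

# The three example CN2 rules quoted above (not the full set of 15).
EXAMPLE_RULES = [
    ({"pH AE1": 1, "Temp AE1": 0, "Temp ANO": 0}, "AOC1"),
    ({"pH F1": 0, "O2 F1": 1, "Temp ANA": 1}, "AOC2"),
    ({"O2 F2": 0, "Temp ANA": 1, "Temp AE1": 1}, "AOC3"),
]


def diagnose(fs, rules):
    """Return the diagnosis of the first rule whose conditions all hold."""
    for conditions, diagnosis in rules:
        if all(fs[field] == value for field, value in conditions.items()):
            return diagnosis
    return None  # no rule matches: the batch is unclassified


fs = {f"{v} {s}": 0 for v in VARIABLES for s in STAGES}  # 24 indicators
fs["pH AE1"] = 1
print(diagnose(fs, EXAMPLE_RULES))  # AOC1
fs["pH AE1"] = 0
print(diagnose(fs, EXAMPLE_RULES))  # None
```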
CN2   | Cases | Correctly classified | Correctly classified (%) | Unclassified | Unclassified (%)
AOC1  | 29    | 29                   | 100                      | 0            | 0
AOC2  | 12    | 9                    | 75,00                    | 3            | 25,00
AOC3  | 46    | 36                   | 78,26                    | 10           | 21,74
Total | 87    | 74                   | 85,06                    | 13           | 14,94
Table 6.2: Diagnosis results obtained with the rules set of the CN2 algorithm
Table 6.2 shows the diagnosis results of the CN2 algorithm for the batches of the validation set. The first and second columns give the GQR and the number of cases, respectively; the third and fourth columns give the number of cases correctly classified and the classification percentage; and the last two columns give the number of cases that were not classified and their percentage. The total correct classification rate for the validation set is 85,06%.
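The percentages in table 6.2 follow directly from the case counts. A small Python check, with a hypothetical helper `rate` that rounds to two decimals as the table does:

```python
def rate(part, total):
    """Percentage rounded to two decimals, as reported in the tables."""
    return round(100.0 * part / total, 2)


# (cases, correctly classified) per GQR class, from table 6.2.
cn2 = {"AOC1": (29, 29), "AOC2": (12, 9), "AOC3": (46, 36)}
cases = sum(n for n, _ in cn2.values())     # 87
correct = sum(c for _, c in cn2.values())   # 74
print(rate(correct, cases), rate(cases - correct, cases))  # 85.06 14.94
```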
Example rules for the three GQR classes discovered by the PART algorithm are presented below:
Rule Ex.1:
IF pH AE1 = 1 AND Temp F2 = 0
THEN Diagnosis = AOC1
Rule Ex.2:
IF O2 F2 = 1 AND Temp ANO = 1 AND Temp AE2 = 0
THEN Diagnosis = AOC2
Rule Ex.3:
IF pH ANA = 0 AND O2 ANO = 0 AND Temp AE1 = 1
THEN Diagnosis = AOC3
PART  | Cases | Correctly classified | Correctly classified (%) | Unclassified | Unclassified (%)
AOC1  | 29    | 28                   | 96,55                    | 1            | 3,45
AOC2  | 12    | 10                   | 83,33                    | 2            | 16,67
AOC3  | 46    | 36                   | 78,26                    | 10           | 21,74
Total | 87    | 74                   | 85,06                    | 13           | 14,94
Table 6.3: Diagnosis results obtained with the rules set of the PART algorithm
Table 6.3 shows the diagnosis results of the PART algorithm for the batches of the validation set, with the same columns as table 6.2. The total correct classification rate for the validation set is again 85,06%.
The rule sets obtained by the two algorithms, 15 rules with CN2 and 16 with PART, both provided a classification rate of 85,06%, meaning that 74 of the 87 batches were correctly classified. However, the classification rates differ for two GQR classes: AOC1 has a classification rate of 100% in table 6.2 and 96,55% in table 6.3, whereas AOC2 has a classification rate of 75% in table 6.2 and 83,33% in table 6.3. Depending on which group is more critical, the RIA with the better classification rate for that group is the one to apply.
Figure 6.5: Application Window - An AOC1 batch with the proposed methods for diagnosis.
Figure 6.5 shows the window of the application developed to diagnose the faulty batches. At the top of the window, an AOC1 batch is projected in the contribution limit chart of the 70 NOC PCA model. At the bottom of the window, the left corner shows the FS, with a red circle for an abnormal stage and a blank circle for a normal stage; the middle shows the BNR, with a green circle for high removal, a yellow circle for medium removal and a red circle for low removal; and the right corner shows the GQR diagnosis of the batch.
6.5 Conclusions
A PCA model composed of 70 NOC batches was created, after 23 of the NOC batches were considered outliers. The Q statistic detected all the AOC batches. As shown in this chapter, an AOC batch was projected in the contribution limit chart and a fault signature using binary indicators was obtained. Following the methodology to estimate the diagnosis of the global quality removal, each AOC set was randomly divided in half, one half being used as the training set and the other as the validation set. The rule induction algorithms were applied to the training set to obtain a set of rules. The validation batches were then passed through the PCA model for detection, their fault signatures were created, and these fault signatures were evaluated with the rule sets of the algorithms. The estimated diagnosis of the global quality removal for the validation set gave good results, with a total correct classification rate above 85% for the rule sets of both algorithms.
Chapter 7
Intelligent Contribution Analysis
for Estimation of Quality
Variables
The good results obtained in the diagnosis of the faulty batches in chapter 6 motivated a deeper study of the diagnosis task and the development of new methods to obtain better diagnostic results. In this chapter, the challenge is to estimate each of the quality variables of the process. The first diagnosis is performed with the binary indicator for the FS and the rule sets of the CN2 algorithm. The second and third diagnoses are performed with the numeric indicator for the FS described in section 5.3.2, a method proposed after the good results obtained with the binary indicator; these diagnoses use the classification algorithms IB1 and KStar, described in section 5.4.
7.1 Historical Data
The historical data from the SBR process consist of 266 batch cases, each associated with the BNR and GQR of the processed wastewater provided by the chemical laboratory, divided into 93 NOC batches and 173 AOC batches (table 6.1 of section 6.1). The FSD of the AOC batches is linked with the BNR of the four quality variables of the process (table 6.1). According to the effluent quality, the BNR of the 173 AOC batches is as follows:
• organic matter (C): all the batches have high quality removal,
• ammonium (NH₄⁺): 115 batches with high quality removal and 58 batches with low quality removal,
• nitrates (NO₂⁻ or NO₃⁻): 82 batches with high quality removal and 91 batches with medium quality removal,
• phosphate (PO₄³⁻): 58 batches with high quality removal and 115 batches with low quality removal.
7.2 PCA Model
The PCA model for this procedure is the one presented in section 6.2, which retains three principal components explaining 75,60% of the cumulative variance (figure 6.2), with the Q residuals for the statistic threshold at 24,40% (figure 6.3); all the AOC batches were detected as faulty.
7.3 Binary Indicator for Fault Signature
7.3.1 Contribution Limit Chart and Binary Fault Signature
The contribution plot of the faulty batch is projected in the contribution limit chart, and each time step is compared against the thresholds. In this procedure, the binary indicator of section 5.3.1 is used: a counter records, for each stage, the number of instances in which a contribution is outside the thresholds of the contribution limit chart. If the counter exceeds 5% of the length of the stage, the indicator of that stage for that variable is abnormal (1); otherwise, it is normal (0). The FS of the faulty batch is shown in figure 7.1.
In figure 7.1, the FS is composed of 24 fields, with a new variable starting every 6 fields (one field per stage of the process). It can be observed that the AE1 and AE2 stages of the pH variable and the ANA, ANO and AE2 stages of the ORP variable were the ones with the most contribution instances outside the thresholds, producing the faulty batch.
7.3.2 Diagnosis with the Binary Fault Signature
Since the organic matter has high quality removal in all batches, it is not taken into account. A training set of 87 batches randomly selected from the 173 AOC batches is created, and the remaining 86 batches form the validation set. The CN2 algorithm is applied to the training set to obtain the rule set used to diagnose the validation set. To diagnose the three quality variables that must be measured for a faulty batch, a rule set is built for each quality variable.

Figure 7.1: Fault Signature for AOC Batch - AOC Batch projected in the Q statistic chart of the 70 NOC PCA model.
The diagnosis results for the validation set, evaluated with the rule sets obtained from the CN2 algorithm, are shown in tables 7.1, 7.2 and 7.3. Each table has five groups of columns: the BNR quality; the number of cases; correct classification (the number of cases of that BNR quality correctly classified, followed by the correct classification rate); wrong classification (the number of cases misclassified by the rules, listed under the BNR quality they were assigned, followed by the wrong classification rate); and unclassified (the number of cases that were not classified, followed by the corresponding rate). If the sequence of indicators of an FS does not match any rule, the case is unclassified.
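A sketch of how one of these diagnosis tables can be computed from the true and predicted BNR classes of the validation set, with `None` standing for an unclassified case. The function name and the dictionary layout are hypothetical; the usage example rebuilds the counts of table 7.2 (nitrates) from synthetic label lists.

```python
from collections import Counter


def diagnosis_table(y_true, y_pred, classes=("High", "Medium", "Low")):
    """Rows of a BNR diagnosis table; a prediction of None means unclassified."""

    def row(pairs):
        n = len(pairs)

        def pct(k):
            return round(100.0 * k / n, 2)

        correct = sum(1 for true, pred in pairs if pred == true)
        # Wrong classifications, grouped by the class that was assigned.
        wrong = {c: sum(1 for true, pred in pairs if pred == c and true != c)
                 for c in classes}
        unclassified = Counter(pred for _, pred in pairs)[None]
        return {"cases": n, "correct": correct, "correct_pct": pct(correct),
                "wrong": wrong, "wrong_pct": pct(sum(wrong.values())),
                "unclassified": unclassified,
                "unclassified_pct": pct(unclassified)}

    pairs = list(zip(y_true, y_pred))
    table = {c: row([p for p in pairs if p[0] == c])
             for c in classes if any(p[0] == c for p in pairs)}
    table["Total"] = row(pairs)
    return table


# Synthetic labels reproducing the counts of table 7.2 (nitrates).
y_true = ["High"] * 45 + ["Medium"] * 41
y_pred = (["High"] * 41 + ["Medium"] * 3 + [None]
          + ["Medium"] * 34 + ["High"] * 2 + [None] * 5)
t = diagnosis_table(y_true, y_pred)
print(t["High"]["correct_pct"], t["Medium"]["correct_pct"], t["Total"]["correct_pct"])
# 91.11 82.93 87.21
```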
With the CN2 rule induction algorithm, the correct classification rate was 95,35% for ammonium, 87,21% for nitrates and 95,35% for phosphate (tables 7.1, 7.2 and 7.3).
BNR    | Cases | Correct | Correct (%) | Wrong: High | Wrong: Medium | Wrong: Low | Wrong (%) | Unclassified | Unclassified (%)
High   | 51    | 50      | 98,04       | -           | -             | 1          | 1,96      | -            | -
Medium | -     | -       | -           | -           | -             | -          | -         | -            | -
Low    | 35    | 32      | 91,43       | 3           | -             | -          | 8,57      | -            | -
Total  | 86    | 82      | 95,35       | 3           | -             | 1          | 4,65      | -            | -
Table 7.1: CN2 diagnosis table for ammonium.
BNR    | Cases | Correct | Correct (%) | Wrong: High | Wrong: Medium | Wrong: Low | Wrong (%) | Unclassified | Unclassified (%)
High   | 45    | 41      | 91,11       | -           | 3             | -          | 6,67      | 1            | 2,22
Medium | 41    | 34      | 82,93       | 2           | -             | -          | 4,88      | 5            | 12,20
Low    | -     | -       | -           | -           | -             | -          | -         | -            | -
Total  | 86    | 75      | 87,21       | 2           | 3             | -          | 5,81      | 6            | 6,98
Table 7.2: CN2 diagnosis table for nitrates.
BNR    | Cases | Correct | Correct (%) | Wrong: High | Wrong: Medium | Wrong: Low | Wrong (%) | Unclassified | Unclassified (%)
High   | 35    | 32      | 91,43       | -           | -             | 3          | 8,57      | -            | -
Medium | -     | -       | -           | -           | -             | -          | -         | -            | -
Low    | 51    | 50      | 98,04       | 1           | -             | -          | 1,96      | -            | -
Total  | 86    | 82      | 95,35       | 1           | -             | 3          | 4,65      | -            | -
Table 7.3: CN2 diagnosis table for phosphate.
The total classification rates for estimating the diagnosis of the quality variables are above 87%, while the total classification rate for the global quality in section 6.4 was 85,06%. The classification rates are therefore slightly better, even though each quality variable is now estimated individually. Moreover, apart from the medium BNR cases of the nitrate quality variable, with a classification rate of 82,93%, all the other BNR classes of the different quality variables have classification rates above 91%.
7.4 Numeric Indicator for Fault Signature
7.4.1 Contribution Limit Chart and Numeric Fault Signature
The contribution plot of the faulty batch is projected in the contribution limit chart, and each time step is compared against the thresholds. In this procedure, the numeric indicator of section 5.3.2 is used: a counter records, for each stage, the number of instances in which a contribution is outside the thresholds of the contribution limit chart, and at the end of the stage the indicator of that stage for that variable is the value of the counter. The FS of the faulty batch is shown in figure 7.2.
Figure 7.2: Fault Signature for AOC Batch - AOC Batch projected in the Q statistic chart
of the 70 NOC PCA model.
In figure 7.2, the FS is composed of 24 fields, with a new variable starting every 6 fields (one field per stage of the process). It can be observed that the AE1 and AE2 stages of the pH variable and the ANA, ANO and AE2 stages of the ORP variable were the ones with the most contribution instances outside the thresholds, producing the faulty batch. The FS also shows how many instances were outside the thresholds: 9 for the AE1 and 18 for the AE2 stage of the pH variable, and 84 for the ANA, 69 for the ANO and 67 for the AE2 stage of the ORP variable.
7.4.2 Diagnosis with the Numeric Fault Signature
Since the organic matter has high quality removal in all batches, it is not taken into account. A training set of 87 batches randomly selected from the 173 AOC batches is created, and the remaining 86 batches form the validation set. Two classification algorithms described in section 5.4, IB1 and KStar, are applied to the training set to obtain the knowledge models used to diagnose the validation set. To estimate the diagnosis of the three quality variables that must be measured for a faulty batch, a knowledge model is built for each quality variable.
The diagnosis results for the validation set, evaluated with the knowledge models obtained from the IB1 algorithm, are shown in tables 7.4, 7.5 and 7.6.

Each table has four groups of columns: the BNR quality; the number of cases; correct classification (the number of cases of that BNR quality correctly classified, followed by the correct classification rate); and wrong classification (the number of cases misclassified by the knowledge model, listed under the BNR quality they were assigned, followed by the wrong classification rate).
BNR    | Cases | Correct | Correct (%) | Wrong: High | Wrong: Medium | Wrong: Low | Wrong (%)
High   | 51    | 50      | 98,04       | -           | -             | 1          | 1,96
Medium | -     | -       | -           | -           | -             | -          | -
Low    | 35    | 35      | 100         | -           | -             | -          | -
Total  | 86    | 85      | 98,84       | -           | -             | 1          | 1,16
Table 7.4: IB1 diagnosis table for ammonium.
BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low   %
High    45     45          100        -     -       -     -
Medium  41     39          95,12      2     -       -     4,88
Low     -      -           -          -     -       -     -
Total   86     84          97,67      2     -       -     2,33
Table 7.5: IB1 diagnosis table for nitrates.
BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low   %
High    35     35          100        -     -       -     -
Medium  -      -           -          -     -       -     -
Low     51     50          98,04      1     -       -     1,96
Total   86     85          98,84      1     -       -     1,16
Table 7.6: IB1 diagnosis table for phosphate.
In tables 7.4, 7.5 and 7.6, the total classification rate is 98,84% for the diagnosis of ammonium,
97,67% for nitrates and 98,84% for phosphate. In comparison, the diagnosis of the quality
variables estimated with the numeric indicator for the FS achieved total classification rates
of at least 97,67%, whereas the diagnosis estimated with the binary indicator for the FS
(section 7.3) reached at most 95,35%. Therefore, the numeric indicator for the FS proposed in
section 5.3.2 provided a better estimation of the quality variables than the binary indicator.
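Each row of the diagnosis tables can be recomputed from the true and estimated BNR qualities of the validation batches. A minimal Python sketch, assuming the labels are the strings "High", "Medium" and "Low"; `diagnosis_row` is a hypothetical helper, and Python prints percentages with a decimal point rather than the decimal comma used in the tables.

```python
from typing import Dict, List, Optional

CLASSES = ("High", "Medium", "Low")


def diagnosis_row(true: List[str], pred: List[str], cls: Optional[str] = None) -> Dict[str, object]:
    """Summarise one row of a diagnosis table; cls=None gives the Total row."""
    pairs = [(t, p) for t, p in zip(true, pred) if cls is None or t == cls]
    cases = len(pairs)
    correct = sum(t == p for t, p in pairs)

    def pct(k: int) -> Optional[float]:
        # Percentage of the row's cases, rounded to two decimals (None for empty rows).
        return round(100 * k / cases, 2) if cases else None

    # Wrongly classified cases, broken down by the BNR quality they were assigned.
    wrong = {c: sum(t != p and p == c for t, p in pairs) for c in CLASSES}
    return {"cases": cases, "correct": correct, "correct_pct": pct(correct),
            "wrong": wrong, "wrong_pct": pct(cases - correct)}


# Counts of table 7.4 (ammonium, IB1): one High batch was assigned Low.
true = ["High"] * 51 + ["Low"] * 35
pred = ["High"] * 50 + ["Low"] + ["Low"] * 35
print(diagnosis_row(true, pred, "High"))
print(diagnosis_row(true, pred))
```

The two calls give 98.04 % and 98.84 % correct classification (1.96 % and 1.16 % wrong), matching the High and Total rows of table 7.4.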
The diagnoses of the validation set obtained with the KStar knowledge models are presented in
tables 7.7, 7.8 and 7.9.
BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low   %
High    51     50          98,04      -     -       1     1,96
Medium  -      -           -          -     -       -     -
Low     35     35          100        -     -       -     -
Total   86     85          98,84      -     -       1     1,16
Table 7.7: KStar diagnosis table for ammonium.
BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low   %
High    45     45          100        -     -       -     -
Medium  41     39          95,12      2     -       -     4,88
Low     -      -           -          -     -       -     -
Total   86     84          97,67      2     -       -     2,33
Table 7.8: KStar diagnosis table for nitrates.
BNR     Cases  Correct Classification Wrong Classification
               Classified  %          High  Medium  Low   %
High    35     35          100        -     -       -     -
Medium  -      -           -          -     -       -     -
Low     51     50          98,04      1     -       -     1,96
Total   86     85          98,84      1     -       -     1,16
Table 7.9: KStar diagnosis table for phosphate.
Tables 7.7, 7.8 and 7.9, which contain the estimated diagnosis of the three nutrients, show that
the total classification rates and the classification of the different BNR cases obtained with
the KStar knowledge models are exactly the same as those obtained with the IB1 knowledge models.
A probable reason for this is that both are instance-based algorithms that classify a new fault
signature according to its distance to the nearest stored training instances.
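Why the two instance-based learners can agree is easiest to see side by side. The sketch below is a simplified illustration, not a faithful K* implementation: IB1 is reduced to "class of the nearest instance", while the K*-style rule lets every stored instance vote with a weight exp(-|a - b| / x0) per attribute, assuming the exponential transformation probability K* [54] uses for real-valued attributes and fixing the scale x0 by hand instead of deriving it from K*'s blend parameter. All names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def ib1_predict(X: List[List[float]], y: List[str], x: Sequence[float]) -> str:
    """IB1-style rule: class of the single nearest stored instance (Euclidean)."""
    best = min(range(len(X)), key=lambda i: math.dist(X[i], x))
    return y[best]


def kstar_like_predict(X: List[List[float]], y: List[str], x: Sequence[float],
                       x0: float = 0.5) -> str:
    """Simplified K*-style rule: every stored instance votes for its class.

    The vote is prod_j exp(-|a_j - b_j| / x0) = exp(-L1 distance / x0); the
    constant factor 1/(2*x0) per attribute is dropped because it is the same
    for every instance.
    """
    scores: Dict[str, float] = {}
    for xi, yi in zip(X, y):
        l1 = sum(abs(a - b) for a, b in zip(xi, x))
        scores[yi] = scores.get(yi, 0.0) + math.exp(-l1 / x0)
    return max(scores, key=scores.get)


# Hypothetical, well-separated fault signatures for two BNR qualities.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = ["High", "High", "Low", "Low"]
for query in ([0.5, 0.2], [4.8, 5.2]):
    print(query, ib1_predict(X, y, query), kstar_like_predict(X, y, query))
```

On well-separated signatures both rules return the same class, because the votes of distant instances decay exponentially; they can disagree near class boundaries, where the K*-style rule weighs all instances rather than only the nearest one.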
Figure 7.3: Application Window - An AOC batch with the proposed methods for diagnosis.
Figure 7.3 shows the window of the application used to diagnose the faulty batches. At the top
of the window, an AOC batch is projected onto the contribution limit chart of the PCA model
built from the 70 NOC batches. At the bottom of the window, the left corner shows the FS with
the numeric indicator for each stage of the different variables; the middle shows the BNR
diagnosis, where a green circle indicates high removal, a yellow circle medium removal and a
red circle low removal; and the right corner shows the standard levels for nutrient removal [11].
7.5 Conclusions
The new methodology was used to estimate the diagnosis of the quality variables of the process:
ammonium, nitrates and phosphate. The PCA model was the same as in chapter 6. In this chapter,
the 173 AOC batches were divided randomly into two sets, a training set and a validation set.
The binary indicator was used for the fault signature, and the CN2 rule induction algorithm was
applied to the training set to obtain the rule sets used to estimate the diagnosis of the
quality variables. Since there are three quality variables, the algorithm was applied to the
training set three times, pairing the input data (the fault signature) with the removal quality
of ammonium, nitrates and phosphate, respectively. After obtaining the rule sets for ammonium,
nitrates and phosphate, the validation set was passed through the system. The correct diagnosis
rate of the quality variables for the validation set was at least 87,21%.
To achieve better results with the methodology, the numeric indicator for the fault signature
was proposed. The same training and validation sets were used to test this new proposal. On
this occasion, two classification algorithms, IB1 and KStar, were used to create the knowledge
models. Each algorithm was applied to the training set three times, pairing the input data (the
fault signature) with the removal quality of ammonium, nitrates and phosphate, respectively.
When the knowledge models were applied to the validation set, the correct diagnosis rate of the
quality variables was at least 97,67% for both algorithms.
Chapter 8
Conclusions and Future Studies
In this study, a new methodology was proposed to estimate the quality of a released batch using
the measurements of the process variables. Contribution limit charts provide information on how
the variables contribute in the different stages of a faulty process. A fault signature was
created as a tool to be used with classification algorithms. The fault signature contains the
information on the contribution behavior of a faulty process and, at the same time, reduces the
dimensionality of the instances that need to be analyzed. Two approaches to represent the
behavior of the contributions in the fault signature were developed. The classification
algorithms search for patterns in a fault signature dataset composed of abnormal batches and
provide a trained knowledge model to estimate the faulty behavior of future batches.
To test the methodology, a PCA model based on 70 NOC historical batches was built. The Q
statistic chart of the PCA model detected the 173 AOC historical batches. Since the Q statistic
alone cannot classify or diagnose the batches, the contribution plots of the faulty batches were
calculated to pursue the diagnosis. Diagnosing contribution plots by eye is very difficult
without an expert in the process, and, moreover, an expert able to diagnose a highly non-linear
process is hard to find. Contribution limit charts were therefore proposed to overcome this
deficiency of contribution plots for diagnosis. When the contribution of a faulty batch is
projected onto the contribution limit chart, it can be observed in which stages the variables
contribute abnormally. With the information on the contribution behavior gathered, a
classification task can be performed. However, if the batch duration is long and many variables
are measured, the classification task becomes a problem because of the large number of
contribution instances.
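The detection and contribution steps summarized above can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the PCA model of chapter 6: the batches are assumed to be unfolded batch-wise into one row per batch (as in MPCA), the residual of each unfolded column is taken as its signed contribution to Q, the Q limit is taken as an empirical percentile of the NOC Q values, and the contribution limits (UCL, LCL) as the NOC residual mean plus or minus k standard deviations per column. The function names, k = 3, the 99th percentile and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_monitor(X_noc: np.ndarray, n_comp: int, k: float = 3.0, q_pct: float = 99.0) -> dict:
    """PCA model on batch-wise unfolded NOC data (rows = batches)."""
    mean = X_noc.mean(axis=0)
    std = X_noc.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    Z = (X_noc - mean) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_comp].T                      # loadings of the retained components
    E = Z - Z @ P @ P.T                    # NOC residuals (signed contributions)
    q_noc = (E ** 2).sum(axis=1)           # Q statistic (SPE) of each NOC batch
    spread = E.std(axis=0, ddof=1)
    return {
        "mean": mean, "std": std, "P": P,
        "q_limit": np.percentile(q_noc, q_pct),   # assumed empirical Q limit
        "ucl": E.mean(axis=0) + k * spread,       # assumed contribution limits
        "lcl": E.mean(axis=0) - k * spread,
    }


def monitor(model: dict, x: np.ndarray):
    """Return (Q, is_faulty, mask of contributions outside [LCL, UCL])."""
    z = (x - model["mean"]) / model["std"]
    e = z - model["P"] @ (model["P"].T @ z)
    q = float(e @ e)
    outside = (e > model["ucl"]) | (e < model["lcl"])
    return q, q > model["q_limit"], outside


rng = np.random.default_rng(0)
t = rng.normal(size=(70, 1))                    # one latent driver, 70 NOC batches
X_noc = t + 0.3 * rng.normal(size=(70, 12))     # 12 hypothetical unfolded columns
model = fit_monitor(X_noc, n_comp=1)
x_bad = model["mean"].copy()
x_bad[5] += 5 * model["std"][5]                 # break the correlation in one column
q, faulty, outside = monitor(model, x_bad)
print(round(q, 1), faulty, np.flatnonzero(outside))
```

The modified batch exceeds the Q limit and its column 5 falls outside the contribution limits, whereas a batch equal to the NOC mean has Q = 0 and no contribution outside the limits.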
To overcome the dimensionality of the contribution instances, a fault signature was proposed
that contains the information on the contribution behavior while reducing the dimensionality
for the classification task. The dimensionality of the contributions is reduced by using either
a binary indicator or a numeric indicator in the fault signature. With the reduced dimension of
the fault signature, classification tasks are much easier. An AOC fault signature dataset is
built so that a classification algorithm can search for patterns that describe the mapping
between the faulty behavior and the laboratory analysis of the product quality.
After the classification models are obtained, the system can estimate the quality diagnosis
of a released batch. The measurements of the released batch are projected onto the PCA model,
and the Q statistic is observed to discern whether the batch is faulty. If the batch is faulty,
its contribution plot is calculated and projected onto the contribution limit chart. The fault
signature is obtained from the information on the contribution behavior of the faulty process.
Then, the fault signature is passed through the classification models, which, depending on its
information, give an estimated diagnosis of the product quality.
The proposed methodology achieved high classification rates. In the diagnosis of the global
quality variable with the binary indicator for the fault signature, the total classification
rate was 85,06% for both rule induction algorithms. The diagnosis of the quality variables with
the binary indicator had a minimum classification rate of 87,21% and a maximum of 95,35%, and
the diagnosis of the quality variables with the numeric indicator for the fault signature had a
minimum classification rate of 97,67% and a maximum of 98,84%. The results suggest that the
numeric indicator is the best approach to fill the fault signature with the information on the
contribution behavior, although the binary indicator may perform better in processes that are
more tightly controlled and less non-linear.
For future studies, applying the methodology to different batch processes would help to
improve the proposed method, and improving the approaches for the fault signature would help to
achieve better classification rates. The proposed methodology could be applied to a wide range
of batch processes in which the batch cycle has different stages and no sensors are available
to measure the quality variables; such studies would also improve the robustness of the system.
Glossary
NO₂⁻ Nitrite
NO₃⁻ Nitrate
PO₄³⁻ Phosphate
NH₄⁺ Ammonium
AE1 First aerobic condition for the SBR biological reaction
AE2 Second aerobic condition for the SBR biological reaction
ANA Anaerobic condition for the SBR biological reaction
ANO Anoxic condition for the SBR biological reaction
AOC Abnormal operation condition
BNR Biological nutrient removal
BOD Biochemical oxygen demand
C Organic matter
CUSUM Cumulative Sum
DO Dissolved oxygen
F1 First fill for the SBR biological reaction
F2 Second fill for the SBR biological reaction
FS Fault signature
FSD Fault signature dataset
GQR Global quality removal for the wastewater treated
LCL Lower contribution limit for the contribution limit chart
MPCA Multiway principal component analysis
MSPC Multivariate statistical process control
NOC Normal operation condition
ORP Oxidation reduction potential
PCA Principal component analysis
pH Measure of the acidity or alkalinity of an aqueous solution
RIA Rule induction algorithm
SBR Sequencing batch reactor
SPC Statistical process control
SPE Squared prediction error
Temp Temperature
U-PCA Unfold - Principal component analysis
UCL Upper contribution limit for the contribution limit chart
References
[1] P. Nomikos and J. F. MacGregor. Multivariate SPC Charts for Monitoring Batch Processes. Technometrics, 37(1):41–59, February 1995.
[2] S. Puig, M. T. Vives, L. Corominas, M. D. Balaguer, and J. Colprim. Wastewater Nitrogen Removal in SBRs, Applying a Step-Feed Strategy: from Lab-Scale to Pilot-Plant Operation. Water Sci. Technol., 50(10):89–96, 2004.
[3] S. Puig, M. Coma, M. C. M. van Loosdrecht, J. Colprim, and M. D. Balaguer. Biological Nutrient Removal in a Sequencing Batch Reactor Using Ethanol as Carbon Source. J. Chem. Technol. Biotechnol., 82(10):898–904, 2007.
[4] R. Ganigué, H. López, M. D. Balaguer, and J. Colprim. Partial Ammonium Oxidation to Nitrite of High Ammonium Content Urban Landfill Leachates. Water Res., 41(15):3317–3326, 2007.
[5] H. López, S. Puig, R. Ganigué, M. Ruscalleda, M. D. Balaguer, and J. Colprim. Start-Up and Enrichment of a Granular Anammox SBR to Treat High Nitrogen Load Wastewaters. J. Chem. Technol. Biotechnol., 83(3):233–241, 2008.
[6] S. Wold, K. Esbensen, and P. Geladi. Principal Component Analysis. Chemom. Intell. Lab. Syst., 2(1):37–52, 1987.
[7] J. E. Jackson. A User’s Guide to Principal Components. John Wiley & Sons Canada, Limited, March 1991.
[8] E. B. Martin, A. J. Morris, and J. Zhang. Process Performance Monitoring Using Multivariate Statistical Process Control. IEE P-Contr. Theor. Ap., 143(2):132–144, March 1996.
[9] P. Nomikos and J. F. MacGregor. Monitoring Batch Processes Using Multiway Principal Component Analysis. AIChE J., 40(8):1361–1375, August 1994.
[10] Joanne Drinan. Water and Wastewater Treatment: A Guide for the Nonengineering Professional. Technomic Publishing Co. Inc., illustrated edition, 2001.
[11] European Community. Commission Directive 98/15/EC Amending Council Directive 91/271/EEC Concerning Urban Waste Water Treatment. Official J. European Communities, L 67:29–30, 7 March 1998.
[12] Frank R. Spellman. Handbook of Water and Wastewater Treatment Plant Operations. Lewis Publishers, 2003.
[13] K. A. Kosanovich, M. J. Piovoso, K. S. Dahl, J. F. MacGregor, and P. Nomikos. Multi-Way PCA Applied to an Industrial Batch Process. In Proc. Am. Control Conf., volume 2, pages 1294–1298, 1994.
[14] S. Mace and J. Mata-Alvarez. Utilization of SBR Technology for Wastewater Treatment: An Overview. Ind. Eng. Chem. Res., 41(23):5539–5553, 2002.
[15] J. Keller, K. Subramaniam, J. Gösswein, and P. F. Greenfield. Nutrient Removal from Industrial Wastewater Using Single Tank Sequencing Batch Reactors. Water Sci. Technol., 35(6):137–144, 1997.
[16] Wisamm S. Al-Rekabi, He Qiang, and Wei Wu Qiang. Review on Sequencing Batch Reactors. Pakistan Journal of Nutrition, 6(1):11–19, 2007.
[17] Douglas C. Montgomery. Introduction to Statistical Quality Control. Wiley, 3rd edition, 1996.
[18] Ali Cinar and Cenk Undey. Statistical Process and Controller Performance Monitoring. A Tutorial on Current Methods and Future Directions. Proceedings of the American Control Conference, 4:2625–2639, June 1999.
[19] John S. Oakland. Statistical Process Control. Butterworth-Heinemann, fifth edition, 2003.
[20] C. C. Aggarwal and P. S. Yu. Outlier Detection for High Dimensional Data. In SIGMOD Conference, 2001.
[21] J. F. MacGregor. Using On-Line Process Data to Improve Quality. Is there a Role for Statisticians? Are They Up for the Challenge? Int. Stat. Rev., 16(2):6–13, 1996.
[22] S. Bersimis, J. Panaretos, and S. Psarakis. Multivariate Statistical Process Control Charts and the Problem of Interpretation: A Short Overview and Some Applications in Industry. In Proceedings of the 7th Hellenic European Conference on Computer Mathematics and its Applications, 2005.
[23] Julia Doroshenko and Vale. Multivariate Control Charts for the Analysis of Process. In Modern Problems of Radio Engineering, Telecommunications and Computer Science, 2002, pages 136–137, 2002.
[24] H. Hotelling. Techniques of Statistical Analysis, chapter Multivariate Quality Control, pages 111–184. McGraw-Hill, 1947.
[25] Kuang-Han Chen, Duane S. Boning, and Roy E. Welsch. Multivariate Statistical Process Control and Signature Analysis Using Eigenfactor Detection Methods. In Proceedings of the 33rd Symposium on the Interface: Computing Science and Statistics, number 33, pages 271–291, 2002.
[26] Barry M. Wise, Neal B. Gallagher, Stephanie Watts Butler, Daniel D. White Jr., and Gabriel G. Barna. Development and Benchmarking of Multivariate Statistical Process Control Tool for a Semiconductor ETCH Process: Impact of Measurement Selection and Data Treatment on Sensitivity. In IFAC SafeProcess’97, pages 35–42, 1997.
[27] David M. Himes, Robert H. Storer, and Christos Georgakis. Determination of the Number of Principal Components for Disturbance Detection and Isolation. In Proceedings of the American Control Conference, 1994.
[28] Gilles Raîche, Martin Riopel, and Jean-Guy Blais. Non Graphical Solutions for the Cattell’s Scree Test. In International Meeting of the Psychometric Society, 2006.
[29] Ruben Daniel Ledesma and Pedro Valero-Mora. Determining the Number of Factors to Retain in EFA: an easy-to-use computer program for carrying out Parallel Analysis. Practical Assessment, Research & Evaluation, 12(12), 2007.
[30] T. Kourti. Application of Latent Variable Methods to Process Control and Multivariate Statistical Process Control in Industry. Int. J. Adapt. Control, 19:213–246, 2005.
[31] T. Kourti and J. F. MacGregor. Multivariate SPC Methods for Process and Product Monitoring. J. Qual. Technol., 28(4):409–428, October 1996.
[32] J. A. Westerhuis, S. P. G., and A. K. Smilde. Generalized Contribution Plots in Multivariate Statistical Process Monitoring. Chemom. Intell. Lab. Syst., 51:95–114, 2000.
[33] S. J. Qin. Statistical Process Monitoring: Basics and Beyond. J. Chemom., 17:480–502, 2003.
[34] J. Flores-Cerrillo and J. F. MacGregor. Multivariate Monitoring of Batch Processes Using Batch-to-Batch Information. AIChE J., 50(6):1219–1228, June 2004.
[35] Svante Wold, Nouna Kettaneh, Hakan Friden, and Andrea Holmberg. Modelling and Diagnostics of Batch Processes and Analogous Kinetic Experiments. Chemometrics and Intelligent Laboratory Systems, 44(1-2):331–340, 1998.
[36] H. J. Ramaker, E. N. M. van Sprang, S. P. Gurden, J. A. Westerhuis, and A. K. Smilde. Improved Monitoring of Batch Processes by Incorporating External Information. Journal of Process Control, 12(4):569–576, 2002.
[37] C. K. Yoo, K. Villez, I. Lee, C. Rosén, and P. A. Vanrolleghem. Multi-Model Statistical Process Monitoring and Diagnosis of a Sequencing Batch Reactor. Biotechnol. Bioeng., 96(4):687–701, March 2007.
[38] C. Rosen and G. Olsson. Disturbance Detection in Wastewater Treatment Plants. Water Science and Technology, 37(12):197–205, 1998.
[39] M. Ruiz, J. Colomer, J. Colprim, and J. Meléndez. Multivariate Statistical Process Control for Situation Assessment of a Sequencing Batch Reactor. In Control 2004, University of Bath, UK, page 11, September 2004.
[40] M. Ruiz, J. Colomer, and J. Meléndez. Combination of Statistical Process Control (SPC) Methods and Classification Strategies for Situation Assessment of Batch Process. Inteligencia Artificial, Revista Iberoamericana de IA, 10(29):99–107, 2006.
[41] Kunwar P. Singh, Amrita Malik, Dinesh Mohan, Sarita Sinha, and Vinod K. Singh. Chemometric Data Analysis of Pollutants in Wastewater - A Case Study. Analytica Chimica Acta, 532(1):15–25, 2005.
[42] S. Puig, M. Coma, H. Monclús, M. C. M. van Loosdrecht, J. Colprim, and M. D. Balaguer. Selection Between Alcohols and Volatile Fatty Acids as External Carbon Sources for EBPR. Water Research, 42(3):557–566, 2008.
[43] Young-Hak Lee, Don-Yong Lee, and Chonghun Han. RMBatch: Intelligent real-time monitoring and diagnosis system for batch processes. Computers & Chemical Engineering, 23(Supplement 1):S699–S702, 1999. European Symposium on Computer Aided Process Engineering, Proceedings of the European Symposium.
[44] K. Villez, M. Ruiz, G. Sin, J. Colomer, C. Rosén, and P. A. Vanrolleghem. Combining Multiway Principal Component Analysis (MPCA) and Clustering for Efficient Data Mining of Historical Data Sets of SBR Processes. Water Sci. Technol., 57(10):1659–1666, 2008.
[45] Yejin Kim, Hyeon Bae, Kyungmin Poo, Jongrack Kim, Taesup Moon, Sungshin Kim, and Changwon Kim. Soft Sensor Using PNN Model and Rule Base for Wastewater Treatment Plant. In Jun Wang, Zhang Yi, Jacek Zurada, Bao-Liang Lu, and Hujun Yin, editors, Advances in Neural Networks – ISNN 2006, volume 3973 of Lecture Notes in Computer Science, pages 1261–1269. Springer Berlin / Heidelberg, 2006.
[46] Sung Hun Hong, Min Woo Lee, Dae Sung Lee, and Jong Moon Park. Monitoring of sequencing batch reactor for nitrogen and phosphorus removal using neural networks. Biochemical Engineering Journal, 35(3):365–370, 2007.
[47] Liping Fan and Yang Xu. A PCA-Combined Neural Network Software Sensor for SBR Processes. In Derong Liu, Shumin Fei, Zengguang Hou, Huaguang Zhang, and Changyin Sun, editors, Advances in Neural Networks – ISNN 2007, volume 4492 of Lecture Notes in Computer Science, pages 1042–1047. Springer Berlin / Heidelberg, 2007.
[48] E. S. Page. Continuous Inspection Schemes. Biometrika, 41(1/2):100–115, June 1954.
[49] David Leung and Jose Romagnoli. An integration mechanism for multivariate knowledge-based fault diagnosis. Journal of Process Control, 12(1):15–26, 2002.
[50] Fu Xiao, Shengwei Wang, Xinhua Xu, and Gaoming Ge. An isolation enhanced PCA method with expert-based multivariate decoupling for sensor FDD in air-conditioning systems. Applied Thermal Engineering, 29(4):712–722, 2009.
[51] P. Clark and T. Niblett. Induction in Noisy Domains. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning (Proceedings of the 2nd European Working Session on Learning), pages 11–30, Wilmslow, UK, 1987. Sigma Press.
[52] Eibe Frank and Ian H. Witten. Generating Accurate Rule Sets Without Global Optimization. In Proceedings of the 15th International Conference on Machine Learning, pages 144–151, 1998.
[53] David W. Aha, Dennis Kibler, and Marc K. Albert. Instance-Based Learning Algorithms. Machine Learning, 6(1):37–66, January 1991.
[54] John G. Cleary and Leonard E. Trigg. K*: An Instance-based Learner Using an Entropic Distance Measure. In 12th International Conference on Machine Learning, pages 108–114, 1995.