Automatic Power Quality Monitoring with Recurrent Neural Network
by
Dong Chan Lee
A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science and Engineering
Graduate Department of Electrical & Computer Engineering
University of Toronto
© Copyright 2016 by Dong Chan Lee
Abstract
Automatic Power Quality Monitoring with Recurrent Neural Network
Dong Chan Lee
Master of Applied Science and Engineering
Graduate Department of Electrical & Computer Engineering
University of Toronto
2016
The electric power grid constantly experiences disturbances that hinder the efficiency and
reliability of the grid. This thesis is concerned with the development of an automatic power
quality monitoring system that classifies power quality disturbances based on the voltage waveform.
The classification process involves generating training data, extracting features, and classifying
the data at every time step with a neural network. The feature extraction is implemented
based on the short-time Fourier transform, wavelet transform, and S transform, and we present
comparisons of their performance for this application. The extracted features are used as the
inputs to the neural network, and the outputs are the classes that the waveform belongs to. We
introduce the recurrent neural network as a classifier for the first time in this application. The
recurrent neural network has the ability to memorize information in time sequence data by
passing information through its hidden units. We show that the recurrent neural network achieves
better performance than a conventional feedforward neural network.
Acknowledgements
I would like to thank my adviser, Professor Deepa Kundur, for her encouragement and feedback
throughout my graduate studies. I would also like to thank my family and friends, especially
my parents, for their support and dedication.
Contents
1 Introduction 1
1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Power Quality Disturbance Data Generation 5
1 Overview of Power Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1 Standard Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Classification of power quality disturbances . . . . . . . . . . . . . . . . . 7
2 Characterization of Power Quality Disturbances . . . . . . . . . . . . . . . . . . . 8
2.1 RMS and Peak Measurement . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Monte Carlo simulation for data generation . . . . . . . . . . . . . . . . . . . . . 19
4 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5 Real-time Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Transformation and Feature Extraction 24
1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.1 Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2 Short-Time Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.1 Discrete Short-Time Fourier Transform and its implementation . . . . . . 26
3 Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1 Discrete Wavelet Transform and its implementation . . . . . . . . . . . . 29
4 S Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1 Discrete S transform and its implementation . . . . . . . . . . . . . . . . 32
5 Comparison of Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.1 Common limitations of the feature . . . . . . . . . . . . . . . . . . . . . . 38
6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4 Classification with Long Short Term Memory 40
1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2 Feedforward Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.1 Decision making with Softmax Function . . . . . . . . . . . . . . . . . . . 43
2.2 Training Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.3 Windowed Feedforward Neural Network . . . . . . . . . . . . . . . . . . . 45
3 Recurrent Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.1 Advantages of Recurrent Neural Network . . . . . . . . . . . . . . . . . . 48
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5 Results and Case Studies 50
1 Data Generation and Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.1 Comparisons of the transformations . . . . . . . . . . . . . . . . . . . . . 51
2.2 Effect of the size of window . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.3 Distribution of Misclassification . . . . . . . . . . . . . . . . . . . . . . . . 57
2.4 Effect of Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4 Limitations of the Proposed Power Quality Monitor . . . . . . . . . . . . . . . . 65
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6 Conclusion 66
Appendices 68
A Parameters for Monte Carlo Simulation 69
B Results 72
Bibliography 74
List of Tables
2.1 Classification of power quality disturbances and their characterization [1, 2] . . . 8
3.1 Comparison of computational complexity of feature extraction algorithms . . . . 38
5.1 Comparison of accuracy of FNN and LSTM in percentage . . . . . . . . . . . . . 53
5.2 Accuracy of LSTM with feature from DWT . . . . . . . . . . . . . . . . . . . . . 54
5.3 Comparison of LSTM and FNN in data with noise . . . . . . . . . . . . . . . . . 59
A.1 Parameters for Monte Carlo simulation in power quality data generation . . . . . 70
A.2 Ratio of disturbance classes in the generated data . . . . . . . . . . . . . . . . . . 71
B.1 Confusion matrix for FNN from DWT features . . . . . . . . . . . . . . . . . . . 72
B.2 Confusion matrix for FNN from STFT features . . . . . . . . . . . . . . . . . . . 73
B.3 Confusion matrix for LSTM from STFT features . . . . . . . . . . . . . . . . . . 73
B.4 Confusion matrix for FNN from ST features . . . . . . . . . . . . . . . . . . . . . 73
B.5 Confusion matrix for LSTM from ST features . . . . . . . . . . . . . . . . . . . . 73
B.6 Confusion matrix for LSTM from DWT features with noise . . . . . . . . . . . . 74
B.7 Confusion matrix for FNN from DWT features with noise . . . . . . . . . . . . . 74
List of Figures
2.1 An example waveform of impulsive transient . . . . . . . . . . . . . . . . . . . . . 10
2.2 An example waveform of oscillatory transient . . . . . . . . . . . . . . . . . . . . 11
2.3 An example waveform of interruption . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 An example waveform of voltage sag . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5 An example waveform of voltage swell . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 An example waveform of DC offset . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.7 An example waveform of harmonics . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.8 An example waveform of notching . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.9 An example waveform of noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.10 An example waveform of voltage fluctuation . . . . . . . . . . . . . . . . . . . . . 16
2.11 An example waveform of frequency variation . . . . . . . . . . . . . . . . . . . . 16
2.12 RMS and peak voltage measurement of each class of power quality disturbances . 18
2.13 An example of generated data for the training . . . . . . . . . . . . . . . . . . . . 20
2.14 (a) Classification process of existing techniques [3] (b) Classification process of
the proposed technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1 Overall process of building an automatic power quality monitor . . . . . . . . . . 25
3.2 Contour diagram of STFT coefficients . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Two-band analysis bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Reconstructed signal using discrete wavelet transform at different levels . . . . . 30
3.5 Contour diagram of ST coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.6 Relationships between wavelet transform, S transform and Fourier transform [4] . 36
3.7 An example of waveform and extracted features based on different transforms . . 37
4.1 Feedforward neural network architecture . . . . . . . . . . . . . . . . . . . . . . 41
4.2 (a) A simplified single node recurrent neural network (b) Unrolled version of the
network through time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Conventional approach for power quality disturbance classifier [3] . . . . . . . . . 48
4.4 (a) Feedforward neural network architecture (b) Windowed feedforward neural
network architecture (c) Long short-term memory architecture . . . . . . . . . . 49
5.1 Cross entropy of the training and testing data as the training progresses . . . . . 51
5.2 Output of the automatic power quality monitoring system . . . . . . . . . . . . . 52
5.3 Performance of LSTM with various window sizes of DWT . . . . . . . . . . . . . 54
5.4 Performance of LSTM with various window sizes of STFT . . . . . . . . . . . . . 55
5.5 Performance of wFNN with various window sizes . . . . . . . . . . . . . . . . . . 56
5.6 Performance of LSTM with various sampling frequencies . . . . . . . . . . . . . . 56
5.7 Performance of LSTM with various output frequencies . . . . . . . . . . . . . . . 57
5.8 Overall distribution of misclassification . . . . . . . . . . . . . . . . . . . . . . . . 58
5.9 Distribution of misclassification for individual power quality disturbances . . . . 58
5.10 Case study of interruption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.11 Case study of oscillatory transient . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.12 Case study of voltage sag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.13 Case study of harmonics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Chapter 1
Introduction
1 Motivation
The electric power grid provides a convenient and affordable way to deliver energy to sustain
our society. Technologies ranging from personal electronics to manufacturing plants use electri-
cal energy delivered by the electric grid. While maintaining the reliability of the grid, engineers
began to realize that connecting multiple generators and consumers mitigates the uncertainties
in supply and demand. Fortunately, the invention of the transformer enabled a high voltage
transmission system that significantly reduced the power loss over the long transmission lines.
Interconnections between regions started to expand with the high voltage transmission system,
and the electric power grid today is the largest human-made machine. The conventional power
system structure provides most of the electricity from large power plants such as nuclear or
gas-fired generators that are often distant from the customers for safety reasons. This system
is centrally monitored and controlled by the transmission system operator.
Today there is a strong demand for innovating the conventional design of the electric grid.
Greenhouse gas emissions from fossil fuel power plants are one of the major causes of climate
change, and fossil fuel power plants are being replaced by alternative renewable energy sources.
Wind and solar energy are two promising energy sources that can be widely and safely
deployed. With the rapid advancement of technology in wind turbines and solar cells, their costs
are expected to be competitive with conventional energy sources in the near future. Wind turbines
and solar cells are considered distributed generators because they have a smaller capacity than
conventional synchronous generators. The integration of distributed generators is leading to major
shifts in the structure and operation of the electric power grid. Since these distributed generators
have a small capacity, they are scattered near the consumers in a low voltage system to avoid
long distance delivery. Both generators and consumers need to be managed in a smaller, local
grid, and the low to medium voltage grid, such as the microgrid, is one of the most actively
researched topics today [5].
The distributed generators are often connected to the grid through an electronic interface
with nonlinear control, such as the MPPT algorithm in photovoltaics. The electronic components
introduce disturbances to the grid, which result in power quality issues. In
addition, the intermittency of renewables increases variation and disturbance in the supply and
operating condition. The increased uncertainty and disturbances manifest as impurities in the
sinusoidal waveform of the voltage and current.
The impurities in the waveform are referred to as power quality disturbances. The concerns
with power quality are not new; such issues have always existed in power systems since the system
is constantly subject to disturbances. The increasing penetration of renewables is closely related
to power quality issues, whose importance is continuously growing [6, 7].
The operation of the grid has to accommodate the increasing number of distributed
generators. The measurements from distributed generators introduce a very large amount of data,
and assessment by human operators becomes expensive and unreliable. While
the changes in power system structure and operation create new challenges, recent advances in
smart grid technology give promising solutions to mitigate the rising issues with its metering
and communication technology.
2 Contributions
This thesis proposes a power quality monitor equipped with machine learning techniques
to assist the operator's observability. The automated power quality monitor classifies the type of
phenomena recorded, and the system operator can easily detect and analyze the issues that
the grid is faced with. Traditionally, operators only provided diagnostic monitoring of power
quality: technicians were dispatched only when customers complained continuously or after
the damage. Automated monitoring technology enables preventive monitoring of power
quality since the data can be analyzed before customers report problems. The reliability and
efficiency of the grid can be greatly enhanced with such a proactive system rather than a
reactive one.
Specifically, the goal of this thesis is to increase the accuracy of the power quality monitor,
and we make several modifications from existing techniques. The specific contributions of this
thesis are as follows.
1. The technique presented in this thesis eliminates the need for a pre-segmentation algorithm,
which is required in existing techniques. Current techniques assume the monitor is given
a cleanly segmented, fixed-size window of disturbance in the waveform. We eliminate
this assumption and apply the classification at every time step, giving a more accurate
localization of the disturbance.
2. Feature extraction algorithms based on different transforms are studied under a standard
classification algorithm. This thesis describes the algorithms and shows an experimental
evaluation and comparison of the transforms.
3. The effectiveness of the recurrent neural network is studied and compared with the conventional
feedforward neural network. We demonstrate that we can achieve a lower error rate and
better localization of the event by passing information through the hidden units.
3 Overview
Chapter 2 provides background on the current practice for power quality monitoring as well as
state-of-the-art techniques. We describe the data generation process used to create training data
for the neural network.
In Chapter 3, we go over the existing transforms and feature extraction based on short-time
Fourier transform, wavelet transform, and S transform. In addition, we compare the
transforms and discuss their relationships to each other.
Chapter 4 presents the classification of power quality disturbances with the recurrent neural
network, which is implemented with Long Short-Term Memory. The description of how this
classifier is implemented and how it can be trained is provided in this chapter. While this is a
standard method in machine learning, some of the core techniques, such as the softmax output
layer and the training methods, had not previously been applied to power quality disturbance
classification.
In Chapter 5, we present our results and case studies. The accuracy with Long Short-Term
Memory is compared with that of the feedforward neural network. In addition, we examine
different window sizes for the transforms to find the optimal parameter settings. We present the
limitations of the technique due to its over-fitting towards the generated data.
We conclude the thesis in Chapter 6 with a summary of the contents as well as future
directions for this research.
Chapter 2
Power Quality Disturbance Data
Generation
In this chapter, we provide an overview of power quality disturbances with their mathemat-
ical descriptions. The definition of each power quality disturbance is given with its cause and
the example waveform. This chapter defines the target classes and describes how each class of
disturbance can be generated based on the known magnitude and spectral content. Later we
will see how the generated data can be used to build a classification system that uses machine
learning to automatically recognize patterns in the data.
1 Overview of Power Quality
The term power quality is defined in [1] as "any power problem manifested in voltage, current,
or frequency deviations that results in failure or misoperation of customer equipment."
Power quality encompasses a broad range of concerns, and it is difficult to develop a cohesive
solution in general. The IEEE Recommended Practice for Monitoring Electric Power Quality (IEEE
Std 1159-2009 [2]) defines the terminologies and definitions of phenomena that are adopted in
this thesis. While the importance of power quality has increased throughout the past decades,
power quality analysts struggle with processing massive volumes of measurements [11]. The
current practice in the industry is commonly equipped with Root Mean Square (RMS) and
total harmonic distortion (THD) measurements of the voltage and current waveform.
1.1 Standard Measurements
Root mean square voltage/current
The root mean square of a voltage or current waveform v(t) is defined as

Vrms(t) = √( (1/T) ∫_{τ=t−T}^{t} [v(τ)]² dτ )   (2.1)

where T = 1/ff is the period of the waveform's fundamental frequency ff, which is 60 Hz for
North American power systems.
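As an illustrative sketch (not part of the thesis), Equation 2.1 can be discretized with a trailing window of one fundamental period. The function name and defaults below are assumptions chosen for illustration:

```python
import numpy as np

def sliding_rms(v, fs, ff=60.0):
    """Discrete approximation of Equation 2.1: RMS of v over one
    fundamental period T = 1/ff, ending at each output sample.

    v  : sampled voltage waveform (1-D array, per unit)
    fs : sampling frequency in Hz
    """
    n = int(round(fs / ff))              # samples per fundamental period
    v2 = np.asarray(v, dtype=float) ** 2
    # mean of v^2 over a trailing window of n samples, then square root
    kernel = np.ones(n) / n
    return np.sqrt(np.convolve(v2, kernel, mode="valid"))

fs = 10_000
t = np.arange(0, 0.1, 1 / fs)
v = np.sin(2 * np.pi * 60 * t)
# For a pure sinusoid of unit amplitude, the RMS is 1/sqrt(2) ≈ 0.707 pu.
print(sliding_rms(v, fs)[-1])
```

With an integer window length the 60 Hz period is only approximated (166.67 samples rounded to 167), which introduces a small ripple in the estimate.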
Peak voltage/current
Peak voltage and current identify the maximum and minimum of the waveform over the
period and are defined as

Vmax(t) = max_{τ∈[t−T,t]} v(τ)
Vmin(t) = min_{τ∈[t−T,t]} v(τ)   (2.2)

RMS and peak voltage of the waveform are related: for a pure sinusoidal wave, Vmax(t) =
−Vmin(t) = √2 Vrms(t) for every t. Having both the maximum and minimum of the waveform
can be useful for detecting a dc component in the waveform.
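A minimal NumPy sketch of the discrete counterpart of Equation 2.2, again with illustrative names and defaults:

```python
import numpy as np

def sliding_peaks(v, fs, ff=60.0):
    """Discrete version of Equation 2.2: max and min of v over the
    trailing window [t - T, t] of one fundamental period."""
    n = int(round(fs / ff))                  # samples per period
    windows = np.lib.stride_tricks.sliding_window_view(
        np.asarray(v, dtype=float), n)
    return windows.max(axis=1), windows.min(axis=1)

fs = 10_000
t = np.arange(0, 0.1, 1 / fs)
v = np.sin(2 * np.pi * 60 * t)
vmax, vmin = sliding_peaks(v, fs)
# For a pure sinusoid, Vmax ≈ −Vmin ≈ √2 · Vrms ≈ 1 pu.
```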
Total harmonic distortion
Total harmonic distortion (THD) estimates the overall distortion of the wave from the
fundamental and is defined as

THD = ( Σ_{n=2}^{N} Vn ) / V1   (2.3)

where Vn is the magnitude of the nth harmonic and V1 is the magnitude of the fundamental
frequency. The maximum harmonic order N is typically 7 to 13, depending on the application.
IEEE Recommended Practice and Requirements for Harmonic Control in Electric Power
Systems [12] specifies the current practice on how this measurement is utilized. Although the
total harmonic distortion has been the most widely used measurement for detecting waveform
distortions, there is a clear limitation in characterizing different phenomena. Vn is the short-
time Fourier transform coefficient at the nth harmonic frequency, and the total harmonic distortion sums
up all the coefficients at non-fundamental frequencies. Total harmonic distortion simply
reduces the dimension of the coefficients so that it is easy to detect waveform disturbances, but
it does not have the capacity to sufficiently represent and characterize the signal content. We
can generalize the function in Equation 2.3 and use the raw information such as V1, ..., VN to
extract much more information using machine learning techniques.
Moreover, there is a limitation in the time localization of the phenomena because the Fourier
transform gives only a frequency-domain representation of the signal. This point will be
elaborated in the next chapter.
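To make Equation 2.3 concrete, the magnitudes Vn can be estimated from the DFT of a whole number of fundamental cycles. The thesis does not specify an implementation; the following NumPy sketch is an assumption, with the sampling frequency chosen so that one 60 Hz cycle is an exact number of samples (avoiding spectral leakage):

```python
import numpy as np

def thd(v, fs, ff=60.0, N=13):
    """Equation 2.3: sum of harmonic magnitudes V2..VN divided by V1,
    estimated from the DFT of a whole number of fundamental cycles."""
    n_cycle = int(round(fs / ff))
    n = (len(v) // n_cycle) * n_cycle        # trim to whole cycles
    spectrum = np.abs(np.fft.rfft(v[:n]))
    k1 = n // n_cycle                        # DFT bin of the fundamental
    V = spectrum[[k * k1 for k in range(1, N + 1)]]
    return V[1:].sum() / V[0]

fs = 9_600                                   # 160 samples per 60 Hz cycle
t = np.arange(0, 0.5, 1 / fs)
# fundamental plus a 3rd harmonic (180 Hz) at 0.1 pu: THD should be 0.1
v = np.sin(2 * np.pi * 60 * t) + 0.1 * np.sin(2 * np.pi * 180 * t)
```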
1.2 Classification of power quality disturbances
Classification of power quality disturbances is based on IEEE Std 1159-2009 [2]. Table 2.1
shows the categories of power quality disturbances, which can be found in both [1, 2]. Having
consistent definitions of classes is important in preserving and extending our knowledge of
the phenomena. Classification of the power quality disturbances directs engineers to the
solution of the underlying issue: each class of disturbance often has common causes as well as
common solutions.
The power quality disturbances can be broadly classified into steady state and transient
disturbances. The steady state disturbances include waveform distortions and voltage imbal-
ances. The transient disturbances include impulsive and oscillatory transients as well as short
duration voltage magnitude variations. While our power quality monitor has the capability
to distinguish any type of disturbance, the classes of disturbance can be selected as a subset of
the disturbances defined in the standard [2]. For example, engineers may be interested only
in steady state disturbances so that some control scheme can be implemented. By including
various phenomena such as transient disturbances, the proposed monitor can reduce false-positive
identifications of the classes relevant for control. If standard measurements such as total
harmonic distortion were used, the controller might react to temporary transient disturbances,
reducing the efficiency of the controller.
Each phenomenon has a common characterization in terms of its spectral content and
magnitude, and we currently have the knowledge to recreate them. This gives us the ability
to create example data of power quality disturbances. With a large amount of data, we can
employ state-of-the-art machine learning techniques to build an automatic classifier.
Table 2.1: Classification of power quality disturbances and their characterization [1, 2]

Categories                      Spectral Content     Typical Duration     Typical Magnitude
Transients
  Impulsive                     5 ns - 0.1 ms rise   1 ns - >1 ms         -
  Oscillatory                   5 kHz - 0.5 MHz      5 µs - 50 ms         0 - 8 pu
Short Duration Variations
  Interruptions                 -                    0.5 cycle - 1 min    < 0.1 pu
  Sags                          -                    0.5 cycle - 1 min    0.1 - 0.9 pu
  Swells                        -                    0.5 cycle - 1 min    1.1 - 1.8 pu
Long Duration Variations
  Interruptions                 -                    > 1 min              < 0.1 pu
  Under-Voltages                -                    > 1 min              0.8 - 0.9 pu
  Over-Voltages                 -                    > 1 min              1.1 - 1.8 pu
Voltage Imbalances              -                    steady state         0.5 - 2 %
Waveform Distortions
  DC offset                     -                    steady state         0 - 0.1 %
  Harmonics                     0 - 9 kHz            steady state         0 - 20 %
  Interharmonics                0 - 9 kHz            steady state         0 - 2 %
  Notching                      -                    steady state         -
  Noise                         -                    steady state         0 - 1 %
Voltage Fluctuations            < 25 Hz              intermittent         0.1 - 7 %
Power Frequency Variations      -                    < 10 s               ± 0.10 Hz
2 Characterization of Power Quality Disturbances
In this section, we present the complete list of power quality disturbances that are subject
to classification in this thesis. We explain each phenomenon and present its numerical model and
an example of the disturbance waveform. Although we give brief descriptions of the causes
of the phenomena, readers should consult references such as [1, 13] to gain a deeper
understanding of the subject. References such as [14, 15] also contain information on how
synthetic disturbance data can be generated.
After presenting the examples of power quality disturbances, we show the limitation of the
standard measurements in the classification of the disturbances. Throughout this thesis, we will
focus on the voltage waveform since the voltage is the variable that is more strictly monitored
and regulated. Grid-connected equipment is generally designed for a range of currents but a
fixed value of the voltage. However, the approach can be generalized to current by simply
removing the classes that are only applicable to voltage, such as voltage sag and swell.
Before we present the disturbances, we first define the normal sinusoidal voltage as
vnormal(t) = sin(2πft) + µ(t) (2.4)
where f is the fundamental operating frequency, µ(t) ∼ N(0, σ²), and σ ∈ [0, 0.01]. The
fundamental frequency used throughout the thesis is 60 Hz, which is standard in North
America, and the voltage is in per unit. The noise term µ(t) is added as a regularization
term to avoid over-fitting towards a perfect sinusoidal waveform. Since the power grid may
not have a perfect sinusoidal waveform, adding the noise acts as a generalization towards a
realistic voltage waveform. The example waveforms are generated with the sampling frequency
of 10 kHz.
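The nominal waveform of Equation 2.4 can be sketched in a few lines of NumPy at the 10 kHz sampling frequency used for the example waveforms. The function name and the fixed random seed below are illustrative choices, not part of the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed for reproducibility

def v_normal(t, f=60.0, sigma=0.01):
    """Equation 2.4: nominal waveform in per unit with Gaussian noise
    N(0, sigma^2), sigma in [0, 0.01], added as a regularizer."""
    return np.sin(2 * np.pi * f * t) + rng.normal(0.0, sigma, size=t.shape)

fs = 10_000                              # sampling frequency (Hz)
t = np.arange(0, 0.1, 1 / fs)            # 100 ms, i.e. six fundamental cycles
v = v_normal(t)
```

The disturbance models in the rest of this section modify this baseline waveform over an interval [tstart, tend].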
Impulsive Transients
Impulsive transients are sudden, momentary changes in the steady state without a change
in the fundamental frequency. They are unidirectional (either positive or negative) and can be
characterized by the rise and decay times and the peak value. The most common cause of
impulsive transients is lightning, and an impulsive transient can result in an oscillatory transient
if it excites the natural frequency of the circuit [2]. An impulsive transient can be synthesized
by the following equation,
v(t) = vnormal(t) + β exp(−c (t − tstart)/(tend − tstart)),   t ∈ [tstart, tend]   (2.5)

where β ∈ ±[0.1, 0.8] and c = −log(ε/β) are the peak magnitude and fall time constant,
respectively. ε is the threshold below which the disturbance can be neglected and is set to
0.001 in this thesis. Since the rise time is between 5 ns and 0.1 ms, the rise delay is essentially
negligible for sampling frequencies up to 10 kHz. An example of an impulsive transient is shown
in Figure 2.1, with a 0.2 pu peak and 1 ms duration. The impulsive transient is repeated for
illustration only.
Figure 2.1: An example waveform of impulsive transient
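A NumPy sketch of Equation 2.5, assuming a plain sinusoid as the base waveform; the defaults (0.2 pu peak, 1 ms duration) follow the example above, while the function name is an illustrative assumption:

```python
import numpy as np

def impulsive_transient(t, v, beta=0.2, t_start=0.02, t_end=0.021, eps=1e-3):
    """Equation 2.5: add a decaying unidirectional impulse to the waveform.
    c = -log(eps/|beta|) makes the impulse decay to the negligible
    threshold eps by t_end."""
    c = -np.log(eps / abs(beta))
    out = np.asarray(v, dtype=float).copy()
    mask = (t >= t_start) & (t <= t_end)
    out[mask] += beta * np.exp(-c * (t[mask] - t_start) / (t_end - t_start))
    return out

fs = 10_000
t = np.arange(0, 0.1, 1 / fs)
v = np.sin(2 * np.pi * 60 * t)
vd = impulsive_transient(t, v)           # 0.2 pu peak, 1 ms duration
```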
Oscillatory Transients
An oscillatory transient rapidly changes polarity and can be described by its spectral content,
duration, and magnitude. Back-to-back capacitor energization results in oscillatory current
transients. Power electronic devices can produce voltage transients due to commutation and
RLC snubber circuits. Cable switching can also result in oscillatory voltage transients [2]. This
phenomenon can be synthesized with the following equation,
v(t) = vnormal(t) + β exp(−c (t − tstart)/(tend − tstart)) sin(2πfht),   t ∈ [tstart, tend]   (2.6)

where β ∈ ±[0.1, 0.8], c = −log(ε/β), and fh ∈ [500, 5000] Hz are the peak magnitude, fall
time constant, and frequency of the transient component of the waveform, respectively.
Figure 2.2: An example waveform of oscillatory transient
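Equation 2.6 differs from the impulsive case only by the sinusoidal factor under the decaying envelope. A hedged NumPy sketch (function name and defaults are assumptions):

```python
import numpy as np

def oscillatory_transient(t, v, beta=0.5, fh=1_000.0,
                          t_start=0.02, t_end=0.06, eps=1e-3):
    """Equation 2.6: decaying oscillation at fh in [500, 5000] Hz
    superimposed on the waveform over [t_start, t_end]."""
    c = -np.log(eps / abs(beta))             # decay to eps by t_end
    out = np.asarray(v, dtype=float).copy()
    mask = (t >= t_start) & (t <= t_end)
    envelope = beta * np.exp(-c * (t[mask] - t_start) / (t_end - t_start))
    out[mask] += envelope * np.sin(2 * np.pi * fh * t[mask])
    return out
```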
Interruptions
When an interruption occurs, the supply voltage or load current decreases to less than 0.1
p.u. for a period of time. Common causes of interruptions are power system faults, equipment
failures, and control malfunctions. An interruption due to a fault can be restored by
instantaneous reclosure [2]. If the reclosure fails, the interruption could be permanent. Figure
2.3 shows a momentary interruption, and the waveform can be generated by
v(t) = αvnormal(t) t ∈ [tstart, tend] (2.7)
where α ∈ [0, 0.1] is the magnitude of the waveform. Intuitively, the RMS or peak measurements
in Equations 2.1 and 2.2 would be ideal features for classification.
Figure 2.3: An example waveform of interruption
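Equations 2.7, 2.8, and 2.9 below share one template: scale the waveform by α over the disturbance interval, with α deciding the class. A minimal NumPy sketch (names and interval values are illustrative):

```python
import numpy as np

def scale_magnitude(t, v, alpha, t_start, t_end):
    """Equations 2.7-2.9: interruption (alpha in [0, 0.1]), sag
    (alpha in [0.1, 0.9]), and swell (alpha in [1.1, 1.8]) all scale
    the waveform magnitude over [t_start, t_end]."""
    out = np.asarray(v, dtype=float).copy()
    mask = (t >= t_start) & (t <= t_end)
    out[mask] *= alpha
    return out

fs = 10_000
t = np.arange(0, 0.2, 1 / fs)
v = np.sin(2 * np.pi * 60 * t)
interruption = scale_magnitude(t, v, alpha=0.05, t_start=0.05, t_end=0.15)
sag          = scale_magnitude(t, v, alpha=0.5,  t_start=0.05, t_end=0.15)
swell        = scale_magnitude(t, v, alpha=1.4,  t_start=0.05, t_end=0.15)
```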
Voltage Sag
Voltage sag is a decrease in RMS voltage or current to between 0.1 and 0.9 p.u. with durations
from 0.5 cycle to 1 minute; it is also referred to as a voltage dip. The most common cause of
voltage sags is system faults, but the energization of heavy loads or the starting of large motors
can also cause voltage sags [2]. Similar to the interruption, the RMS voltage is an obvious indicator to
classify voltage sag. Voltage sag waveform can be generated by simply changing the voltage
magnitude,
v(t) = αvnormal(t) t ∈ [tstart, tend] (2.8)
where α ∈ [0.1, 0.9] changes the voltage magnitude.
Figure 2.4: An example waveform of voltage sag
If the duration of the voltage sag lasts more than a minute, it is classified as under-voltage.
A common cause of under-voltage is a load switching on or a capacitor bank switching off [2].
In this thesis, voltage sag and under-voltage will be in one class because they exhibit the same
characteristics. They can be classified further if needed with an additional post-processing step
that measures the duration of the event.
Voltage Swell
Voltage swell is when the voltage magnitude increases to between 1.1 and 1.8 p.u. Similar
to voltage sag, swells are caused by system fault conditions. They can also be caused by switching
off a large load or energizing a large capacitor bank [2]. The voltage swell waveform can be
generated by
v(t) = αvnormal(t) t ∈ [tstart, tend] (2.9)
where α ∈ [1.1, 1.8] changes the magnitude of the waveform. If the duration of voltage swell
lasts longer than a minute, it is classified as an over-voltage. Common causes are load switching,
such as switching off a large load, and incorrect settings of transformers.
Figure 2.5: An example waveform of voltage swell
DC offset
DC offset occurs when there is a dc voltage or current in the system. Causes of DC offset
include geomagnetic disturbances and the effect of half-wave rectification. Direct current may
also cause the electrolytic erosion of grounding electrodes and other connectors [2]. The DC offset
waveform can be generated by simply adding a bias to the normal waveform,
v(t) = vnormal(t) + γ(t) t ∈ [tstart, tend] (2.10)
where γ(t) ∈ ±[0.001, 0.01]. Intuitively, tracking both the minimum and maximum voltage in
Equation 2.2 can be a good feature to identify this disturbance.
Figure 2.6: An example waveform of DC offset
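The intuition above can be checked numerically: with the bias γ of Equation 2.10 added, the peak measurements of Equation 2.2 over one full cycle satisfy Vmax + Vmin ≈ 2γ. A small NumPy check (values illustrative):

```python
import numpy as np

# Equation 2.10 adds a small bias γ to the nominal waveform.  Over one
# full cycle the peak measurements of Equation 2.2 then satisfy
# Vmax + Vmin ≈ 2γ, so tracking both extremes exposes the dc component.
fs = 10_000
t = np.arange(0, 1 / 60, 1 / fs)         # one fundamental cycle
gamma = 0.01
v = np.sin(2 * np.pi * 60 * t) + gamma
print(v.max() + v.min())                 # close to 2 * gamma = 0.02
```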
Harmonics/Interharmonics
Harmonics are sinusoidal voltages or currents with frequencies that are integer multiples
of the fundamental frequency (60 Hz). Interharmonics are voltages or currents with frequencies
that are non-integer multiples of the operating frequency. Harmonic distortion usually originates
from the nonlinear characteristics of devices and loads [2]. It creates waveform distortion from
the fundamental frequency. The harmonics and inter-harmonics can be synthesized by
v(t) = vnormal(t) + β sin(2πfht) t ∈ [tstart, tend] (2.11)
Figure 2.7: An example waveform of harmonics
where β ∈ [0.1, 0.2] and fh ∈ [180, 900] Hz are the magnitude and frequency of the harmonic,
respectively. Since the phenomenon is periodic with a frequency greater than the operating
frequency, it is hard to detect the harmonic with the RMS voltage. The harmonics and inter-
harmonics will be one class in the automatic classification because distinguishing them is difficult
for the classifier. Determining whether the harmonic frequency is integer multiple or not is a
highly non-convex discrete set, and thus the feature required by the classifier will have to
span a large range of frequency. Therefore we combine these classes to one, and the further
classification can be made by an additional layer of classification.
Notching
Notching is a periodic voltage disturbance caused by the operation of power electronic
devices. When current is commutated from one phase to another, there is a momentary short
circuit between two phases, resulting in notching [2]. The notching waveform can be generated
by

v(t) = v_normal(t) + ∑_i β exp(−c (t − t_start,i)/(t_end,i − t_start,i)),   t ∈ [t_start, t_end]   (2.12)

where β ∈ [0.25, 0.5], c = −log(ε/β), and t_start,i+1 − t_start,i is constant for all i, making
the notching periodic.
Figure 2.8: An example waveform of notching
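A minimal sketch of equation 2.12 follows, with c = −log(ε/β) so each notch decays from height β to ε by the end of its interval. The one-notch-per-cycle spacing and the notch length are illustrative assumptions, not values from Table A.1.

```python
import numpy as np

def notching(fs=3840, f0=60, beta=0.3, eps=0.01, notch_len=0.002, total=0.1):
    """Synthesize periodic notching per equation 2.12: a decaying exponential
    of height beta is added at equally spaced start times, falling to eps by
    the end of each notch interval."""
    t = np.arange(0.0, total, 1.0 / fs)
    v = np.sin(2 * np.pi * f0 * t)
    c = -np.log(eps / beta)                   # decay rate so exp(-c) = eps/beta
    period = 1.0 / f0                         # assumed: one notch per cycle
    for ts in np.arange(0.0, total, period):
        mask = (t >= ts) & (t < ts + notch_len)
        v[mask] += beta * np.exp(-c * (t[mask] - ts) / notch_len)
    return t, v

t, v = notching()
d = v - np.sin(2 * np.pi * 60 * t)            # the notch component alone
```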
Noise
Noise is an electrical signal with spectral content less than 200 kHz. Power electronic devices,
control circuits, arcing equipment, and switching power supplies are common causes of noise
[2]. The noise can be added to the waveform by
v(t) = v_normal(t) + µ(t),   t ∈ [t_start, t_end]   (2.13)
where µ(t) ∼ N(0, σ2) and σ ∈ [0.05, 0.1].
Figure 2.9: An example waveform of noise
Voltage Fluctuations
Voltage fluctuation is a variation in the voltage magnitude and is also referred to as
voltage flicker. An arc furnace is one of the most common causes of flicker [2]. Solar panels
can also produce flicker when the irradiation condition changes and the voltage is modified by
Maximum Power Point Tracking (MPPT) algorithms. Flicker can be reproduced by introducing
a low-frequency waveform,

v(t) = v_normal(t) + β sin(2πf_f t),   t ∈ [t_start, t_end]   (2.14)

where β ∈ [0.05, 0.1] and f_f ∈ [10, 25] are the magnitude and frequency of the flicker. Flicker
appears as a fluctuation in the RMS voltage, but in the instantaneous measurement it may
appear as a continuous switching between voltage sag and swell.
Figure 2.10: An example waveform of voltage fluctuation
Power Frequency Variations
Frequency variation occurs when the fundamental frequency of the power system deviates
significantly from its nominal value. It normally occurs due to faults on the bulk power
transmission system or when a large block of load or a generator goes offline [2]. The power
frequency variation can be synthesized by

v(t) = sin(2π(60 + ∆f_f)t),   t ∈ [t_start, t_end]   (2.15)
where ∆f_f ∈ ±[0.05, 0.1].
Figure 2.11: An example waveform of frequency variation
Sag/Swell and Harmonic
The power grid can also experience combinations of the disturbances listed above. The two
phenomena considered in this thesis are voltage sag with harmonics and voltage swell with
harmonics. These disturbances can be synthesized by

v(t) = α v_normal(t) + β sin(2πf_h t),   t ∈ [t_start, t_end]   (2.16)

where α (drawn from the sag range [0.1, 0.9] or the swell range [1.1, 1.8]), f_h ∈ [180, 900],
and β ∈ [0.1, 0.2] are the sag/swell scaling factor, the harmonic frequency, and the harmonic
magnitude, respectively. Combinations of disturbances are rarer than individual disturbances,
and thus only these two were considered. However, if a system regularly experiences certain
combinations, those classes can be added to the list. The machine learning approach to
classification makes this expansion easy because the modification of the monitoring system is
automated.
Voltage imbalance was not considered because the system is designed for single-phase input.
Building a classifier with three-phase input is more complex than having three separate
single-phase classifiers: the features required for a three-phase input are three times those of
a single-phase input, and the neural network would require a much larger capacity to process
the tripled input features. For a three-phase system, the dq0 frame could be effective in
classifying severely unbalanced disturbances.
Although this section gave the complete list of power quality disturbances in the IEEE
standard [2], it may not cover every phenomenon that can occur in the grid. One of the
advantages of establishing an automatic data generation process is that we can easily expand
our definitions by adding a class with its mathematical description. This approach allows us
to preserve definitions and to expand records and understanding of power quality disturbances.
2.1 RMS and Peak Measurement
In this section, we give the RMS and peak voltage measurements that were presented in
Section 1.1. Figure 2.12 shows the RMS and peak measurements of each example waveform.
While the RMS measurement is generally good for classifying events related to the voltage
magnitude, such as voltage sag and swell, it cannot retrieve spectral information. Since only
limited information is retrieved from the waveform, we need an additional layer of feature
extraction to extract more information; this layer is presented in the next chapter.
Figure 2.12: RMS and peak voltage measurement of each class of power quality disturbances
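A one-cycle sliding RMS and peak computation of the kind plotted in Figure 2.12 can be sketched as follows. Equation 2.2 is not reproduced in this chapter, so the trailing-window form used here is an assumption of this sketch.

```python
import numpy as np

def sliding_rms_peak(v, q):
    """Trailing one-cycle (q-sample) RMS and peak of a sampled waveform."""
    n = len(v)
    rms = np.empty(n)
    peak = np.empty(n)
    for m in range(n):
        w = v[max(0, m - q + 1):m + 1]        # trailing window of up to q samples
        rms[m] = np.sqrt(np.mean(w ** 2))
        peak[m] = np.abs(w).max()
    return rms, peak

fs, f0 = 3840, 60
t = np.arange(0.0, 0.1, 1.0 / fs)
v = np.sin(2 * np.pi * f0 * t)
rms, peak = sliding_rms_peak(v, fs // f0)
```

For a steady 1 p.u. sine, the RMS settles near 1/√2 and the peak near 1 once a full cycle is buffered, which is why these measurements respond well to sags and swells.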
3 Monte Carlo simulation for data generation
The characterization developed in the previous sections will be used to generate data for
training the neural network. The data will be generated by Monte Carlo simulation from a
uniform distribution with the range given in Table A.1. A general form of the equation can be
described by the following equation,
v(t) = α sin(2π(60 + ∆f_f)t) + ∑_i β_i exp(−c (t − t_start)/(t_end − t_start)) cos(2πf_h(t − t_start)) + µ(t) + γ(t)   (2.17)

for t ∈ [t_start, t_end], µ(t) ∼ N(0, σ²), and c = −log(ε/β) where ε = 0.01. Since we have a
full characterization of power quality disturbances, it removes the need for a manual labeling
process. The pseudocode for the data generation is given in Algorithm 1, with some notation
adopted from MATLAB. We let N be the length of the data we want to generate. The function
randi(a, b) draws a random integer between a and b. We denote the type of disturbance by i and
insert a normal waveform (i = 1) between disturbances, since the probability of a disturbance
occurring right after another disturbance is very low. An example of data generated by the
proposed method is presented in Figure 2.13.
Algorithm 1 Data generation with Monte Carlo simulation
1: t_start ← 0
2: i ← 1
3: while t_start < N do
4:   if i == 1 then
5:     i ← randi(2, 14)
6:   else
7:     i ← 1
8:   end if
9:   α, ∆f, β, f_h, σ, γ, duration ← sampled with the range given in Table A.1 for event i
10:  t_end ← t_start + duration
11:  v(t_start : t_end) ← equation 2.17
12:  label(t_start : t_end) ← i
13:  t_start ← t_end
14: end while
15: Output v
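A Python sketch of Algorithm 1, under stated assumptions: Table A.1 is not reproduced here, so the per-class synthesizers and ranges below (a 3-class toy with sag and swell standing in for the 14-class version) are placeholders, and durations are drawn uniformly between 1 and 10 cycles.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize(event, n, fs=3840, f0=60):
    """Stand-in for equation 2.17: class 1 is the normal waveform; classes 2
    and 3 scale the magnitude like a sag or a swell (placeholder ranges)."""
    v = np.sin(2 * np.pi * f0 * np.arange(n) / fs)
    if event == 2:
        v *= rng.uniform(0.1, 0.9)       # sag
    elif event == 3:
        v *= rng.uniform(1.1, 1.8)       # swell
    return v

def generate(N, n_classes=3, fs=3840):
    """Algorithm 1 in outline: alternate a normal segment (label 1) with a
    randomly drawn disturbance segment until N samples are produced."""
    segs, labels, event, total = [], [], 1, 0
    while total < N:
        dur = int(rng.integers(fs // 60, 10 * fs // 60))   # 1 to 10 cycles
        segs.append(synthesize(event, dur))
        labels.append(np.full(dur, event))
        total += dur
        event = int(rng.integers(2, n_classes + 1)) if event == 1 else 1
    return np.concatenate(segs)[:N], np.concatenate(labels)[:N]

v, label = generate(20000)
```

Because every sample carries a label from the synthesis itself, no manual labeling pass is needed, which is the point made in the text above.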
Since the duration of each disturbance is different, there is an issue of fairness in the
generated data between the disturbances. For example, if the majority of the data were voltage
swell, the bias of the neural network towards voltage swell would be very high. Therefore, all
of the disturbances were sampled with approximately equal probability in their total duration.
The total duration was chosen to be balanced between classes because the training of the neural
network minimizes an objective function equally distributed over time. The numbers of
occurrences of the impulsive and oscillatory transients were higher than those of the other
classes by about 15 and 5 times, respectively. This results in an almost equal ratio of total
duration for each disturbance except the normal waveform. The normal waveform makes up about
10 times the rest of the data, to weakly represent the realistic grid, which is usually in the
normal condition. Table A.2 shows the ratio of classes in terms of both the total duration and
the number of occurrences. In the next section, we present a literature survey collecting the
history of work that has attempted automatic monitoring of power quality.
Figure 2.13: An example of generated data for the training
4 Literature Survey
The goal of automatic classification of power quality disturbances has a long history, and
some of the work dates back to the mid-to-late 1990s. Automatic classification involves two
major steps: feature extraction and classification. Feature extraction algorithms come from
signal processing techniques such as the short-time Fourier transform, wavelet transform,
and S transform. Classification algorithms come from machine learning techniques such as
neural networks, support vector machines, and so on. In the literature, we will see that many
papers are different combinations of a feature extraction and a classification algorithm.
One of the earliest papers to address this problem with a neural network was by
Ghosh et al. [16]. The wavelet transform was noticed to be effective in detecting disturbances
[17], and the first classification system combining the wavelet transform and a neural network
followed [18]. Gaouda et al. [19] then found that, by Parseval's theorem, the energy in the
discrete wavelet transform coefficients is a much better feature for classification. The
wavelet transform is still one of the dominant feature extraction algorithms in this
application [20, 21, 22, 23]. The variation is in the classification algorithm: multiple neural
networks with a voting-based decision scheme [20], a neural structure [21], a probabilistic
neural network [22], and a self-organizing learning array [23]. While these classification
algorithms often show great results, all of the existing techniques avoid the issue of
windowing the sampled waveform by adding a segmentation algorithm. The segmentation algorithm
divides the data sequence into normal and disturbance parts through an additional layer such
as the triggering method [3]. Figure 2.14 shows the comparison between the existing and
proposed approaches, where we remove the segmentation layer and produce an output as a function
of time. The decision-making layer is also eliminated by the softmax output layer, which is
part of the neural network.
Figure 2.14: (a) Classification process of existing techniques [3] (b) Classification process of the proposed technique
The S transform is another transform used for feature extraction [24, 25, 26, 27, 28, 14].
The classification algorithms considered with it were the feedforward neural network [24, 27],
probabilistic neural network [24, 25], modular neural network [26], and decision tree [14].
Ray et al. [28] studied a system with distributed generation and renewables and compared the
discrete wavelet transform and the S transform.
Other approaches such as the Hilbert transform [29], neural trees [29], and combinations of
multiple algorithms [30] have been studied as well. The existing literature is summarized with
thorough surveys in [31, 3]. There are also recent papers [15, 32, 33] proposing similar ideas
for power quality classification. With recent advancements in neural networks [34, 35], the
automatic power quality monitoring system can be greatly improved and simplified by removing
many layers of the process. We propose a recurrent neural network drawn from deep neural
network architectures to address power quality disturbance classification at every time step.
5 Real-time Classification
Removing the segmentation layer of the existing techniques enables real-time implementation
of power quality disturbance classification. The real-time monitoring system can be implemented
without extensive computing hardware upgrades: the computing requirement of the proposed
monitoring system is not significantly greater than existing capabilities. The implementation
of total harmonic distortion requires computing the magnitudes of integer multiples of the
fundamental frequency with the Fourier transform; in the real-time implementation, those
magnitudes are the coefficients of the short-time Fourier transform. The only computation left
is the forward propagation through time in the Long Short-Term Memory, which takes much less
time than the short-time Fourier transform. The most expensive step in the online
classification is the feature extraction, which uses a signal transform. Since the existing
infrastructure can handle the signal transforms, the additional classification layer can be
implemented without much upgrade. In the future, more primitive signal transforms such as the
dq0 transformation could be considered to significantly reduce the computational requirements
of the monitoring system.
Currently, standard measurements such as the RMS voltage and THD are constantly monitored
to detect abnormalities in the grid. These values are often used to initiate control actions
when the measurements exceed thresholds defined by standards [2, 12]. The proposed power
quality monitoring technique can extend this capability to report multiple classes of
disturbances, and the identified class can be used to initiate different control actions.
These control actions include operating a Static Synchronous Compensator (STATCOM),
reconfiguring the distribution network, and determining the status of the capacitor banks
deployed throughout the grid. There are a number of potential applications in increasing the
fault tolerance of the grid, setting control parameters for STATCOMs, and operating microgrids
[8, 10, 28]. In the next section, we first explore the options for feature extraction and give
an overview and comparison of the short-time Fourier transform, wavelet transform, and
S transform.
Chapter 3
Transformation and Feature Extraction
1 Background
In order to assess power quality issues, engineers have to carefully and often tediously
observe the system states. The most commonly monitored variables are the voltage magnitude,
frequency, and total harmonic distortion, as mentioned in the previous chapter. These can be
considered features extracted from the voltage and current waveforms. While the standard
measurements are usually quite effective in detecting whether the system is in a normal or
abnormal condition, these features are not sufficient to distinguish different disturbances. In
order to extract more information from the waveforms, this chapter investigates the short-time
Fourier transform, wavelet transform, and S transform. The overall process of classification
is presented in Figure 3.1. The first step is to generate labeled data based on Monte Carlo
simulation, after which we apply the transform and feature extraction. The extracted feature
data are used to train the neural network that classifies the input features; the classifier is
discussed in the next chapter. In Figure 3.1, the upper part of the graph is the process done
offline before the monitor is deployed. The training of the classifier is the most
time-consuming step, but it has to be done only once to set the parameters of the classifier.
The bottom part of the graph shows the implemented system, which only requires feature
extraction of the data and classification.
Figure 3.1: Overall process of building an automatic power quality monitor
In this chapter, we will discuss the feature extraction process using the signal transforms. We
first go over the Fourier transform, which extracts the spectral information about the waveform.
1.1 Fourier Transform
The Fourier transform breaks a signal down into sinusoids at different frequencies,
transforming our view of the signal from the time domain to the frequency domain, using the
complex exponential as the basis function. The formal definition of the Fourier transform of a
continuous signal v(t) is

V(f) = ∫_{−∞}^{+∞} v(t) e^{−j2πft} dt.   (3.1)

The discrete Fourier transform of a discrete signal v[n] is defined as

V[k] = ∑_{n=0}^{N−1} v[n] e^{−j(2π/N)kn}   (3.2)

where N is the size of the signal. For implementation, the fast Fourier transform (FFT)
computes it efficiently using a divide-and-conquer method: the algorithmic complexity of the
naive implementation is O(N²), while that of the FFT is O(N log N).
The limitation of the Fourier transform in power quality monitoring is its inability to
localize events. The sense of time is completely lost in the Fourier transform, and therefore
it needs to be modified to be applicable to localizing power quality events.
2 Short-Time Fourier Transform
In order to overcome the limitation of the Fourier transform, a window is applied to the
signal for localization. The short-time Fourier transform (STFT) applies the Fourier transform
to only a fixed section of the signal at a time. By taking the Fourier transform over the
window in the specified period, time becomes part of the representation through the location
of the window. The STFT can be defined formally as

STFT_x(τ, f) = ∫_{−∞}^{+∞} v(t) w(t − τ) e^{−j2πft} dt   (3.3)

where w(t) is a windowing function such as the rectangular, Hann, or Hamming window.
2.1 Discrete Short-Time Fourier Transform and its implementation
Suppose the sampling frequency of the waveform is f_i and the required sampling frequency
of the output is f_o. We define the input and output sampling ratio as g = f_i/f_o, and we
will only consider an integer ratio, g ∈ Z. Since a window function of size Q has the property
w[n] = 0 for every n ∈ (−∞, 0) ∪ [Q, ∞), the discrete STFT can be written as

STFT_x[m, k] = ∑_{n=m}^{m+Q−1} v[n] w[n − m] e^{−j(2π/N)kn},   (3.4)

with the Hanning window

w[n] = 0.5(1 − cos(2πn/(Q − 1))).   (3.5)
In Figure 3.2, we present the short-time Fourier transform of the example signals given
in the previous chapter. A window size of one cycle was used at every time step. The figure
demonstrates the existence of other harmonics, as shown in the oscillatory transient and
harmonic disturbances. However, the fixed window size of the STFT is limited in distinguishing
high-frequency short-term events such as impulsive transients and notches, while a shorter
time window would suffer from higher uncertainty in determining the coefficients. This shows
the limitation of the short-time Fourier transform due to its fixed-size window.
Figure 3.2: Contour diagram of STFT coefficients
The magnitudes of the STFT coefficients are selected as features. We take the spectrogram
definition of the short-time Fourier transform, which is the squared magnitude of the
short-time Fourier transform coefficients. In addition, we concatenate the peak voltages
presented in equation 2.2 to complete the feature vector x,

x = [|STFT_x|², V_min, V_max].   (3.6)
The complete algorithm for implementing feature extraction with the short-time Fourier
transform is given in Algorithm 2. In addition to the concatenation, we normalize the features:
normalization sets the mean and variance of the feature data to 0 and 1, respectively. This
step helps the convergence of the neural network training, and the normalization constants are
obtained during the offline training step. When the classifier is implemented after training
the neural network, the same constants used for training are loaded to ensure the computation
of the feature data is consistent during both training and testing.
Algorithm 2 Feature extraction from short-time Fourier transform
1: for m from 1 to L/g do
2:   v_w ← 0
3:   for n from m to m + Q − 1 do
4:     w[n] ← 0.5(1 − cos((2πn)/(Q − 1)))
5:     v_w[n] ← v[n] · w[n − m]
6:   end for
7:   STFT[m, :] ← FFT(v_w)
8:   V_max[m], V_min[m] ← equation 2.2
9:   x[m, :] ← [|STFT[m, :]|², V_max[m], V_min[m]]
10:  if online then
11:    load(x_mean, x_var)
12:    x[m, :] ← (x[m, :] − x_mean)/√x_var
13:  end if
14: end for
15: if offline then
16:   for k from 1 to end do
17:     x_mean[k] ← mean(x[:, k])
18:     x_var[k] ← var(x[:, k])
19:   end for
20:   x[m, :] ← (x[m, :] − x_mean)/√x_var ∀m
21:   save(x_mean, x_var)
22: end if
23: Output x
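Algorithm 2 translates to Python/NumPy roughly as below. The one-cycle window and the offline normalization path follow the text; the one-sample step size and the use of a one-sided spectrum (`rfft`) are implementation choices of this sketch, not prescriptions from the thesis.

```python
import numpy as np

def stft_features(v, q):
    """Per-step Hanning-windowed spectrogram coefficients plus the window's
    min/max voltage, normalized to zero mean and unit variance (the offline
    branch of Algorithm 2; the constants would be saved for online use)."""
    w = 0.5 * (1 - np.cos(2 * np.pi * np.arange(q) / (q - 1)))   # Hanning window
    rows = []
    for m in range(len(v) - q):
        seg = v[m:m + q]
        spec = np.abs(np.fft.rfft(seg * w)) ** 2      # |STFT|^2 coefficients
        rows.append(np.concatenate([spec, [seg.min(), seg.max()]]))
    x = np.asarray(rows)
    x_mean, x_var = x.mean(axis=0), x.var(axis=0)
    return (x - x_mean) / np.sqrt(x_var + 1e-12), x_mean, x_var

fs, f0 = 3840, 60
t = np.arange(0.0, 0.1, 1.0 / fs)
v = np.sin(2 * np.pi * f0 * t)
x, x_mean, x_var = stft_features(v, fs // f0)
```

The small constant added to the variance guards against constant feature columns, a detail the pseudocode leaves implicit.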
While the short-time Fourier transform gives a spectral analysis of the waveform, it has
limitations due to its fixed-size window: there is a trade-off in choosing the window size
between classifying high-frequency and low-frequency disturbances. For example, a small window
is advantageous for detecting short-duration events such as impulsive transients, but it is at
a disadvantage for detecting flickers.
3 Wavelet Transform
In order to overcome the limitation of the short-time Fourier transform, the wavelet
transform was developed. The wavelet transform introduces a scale s, which changes the size of
the window. The continuous wavelet transform is defined as follows:

CWT_v^ψ(τ, s) = ∫_{−∞}^{+∞} v(t) ψ*_{τ,s}(t) dt = (1/√|s|) ∫_{−∞}^{+∞} v(t) ψ*((t − τ)/s) dt,   (3.7)

where τ and s are the translation and scale parameters. As can be seen,
ψ_{τ,s}(t) = (1/√|s|) ψ((t − τ)/s), where ψ is the mother wavelet.
3.1 Discrete Wavelet Transform and its implementation
A detailed derivation of the discrete wavelet transform will not be given in this thesis,
since it can be found in many standard books such as [36]. The discrete wavelet transform
samples the scale and position on a dyadic grid. The wavelet system satisfies the
multiresolution conditions, whereby wavelets at a higher resolution can span the wavelets at a
lower resolution. Moreover, the lower-resolution coefficients can be efficiently computed from
the higher-resolution coefficients by a filter bank. For the implementation of the discrete
wavelet transform, the following coefficients are the result of a single-level wavelet
analysis:

c_j(k) = ∑_m h(m − 2k) c_{j+1}(m),   (3.8)
d_j(k) = ∑_m g(m − 2k) c_{j+1}(m),   (3.9)

where the c_j are called approximation coefficients and the d_j detail coefficients. Equations
3.8 and 3.9 can be implemented by FIR filtering followed by downsampling by two, as shown in
Figure 3.3. In the filter bank, h(−n) is a lowpass filter outputting the approximation
coefficients c_j, and g(−n) is a highpass filter outputting the detail coefficients d_j.
Figure 3.3: Two-band analysis bank
The decomposed detail coefficients are used to reconstruct the signal, which can be
computed by

f_j(t) = ∑_k d_j(k) 2^{j/2} ψ(2^j t − k)   (3.10)

where ψ is the wavelet and j is the reconstruction level. In this thesis, the Daubechies
wavelet db5 is used to decompose the signal into 5 levels; the Daubechies family is one of the
most widely used wavelet systems. Figure 3.4 shows the reconstructed details of the power
quality disturbances at 5 levels using equation 3.10. Unlike the short-time Fourier transform,
where frequencies are sampled, the signal is decomposed into different levels. In order to make
the features classifiable, the energy of a windowed signal is used as the feature. We present
the algorithm for obtaining features based on the wavelet decomposition in Algorithm 3. In the
algorithm, H is an initial buffer length for obtaining both the peak voltage and the energy of
the decomposed waveform. Similar to Algorithm 2, we normalize the data to make gradient descent
work better when training the classifier.
Figure 3.4: Reconstructed signal using discrete wavelet transform at different levels
Algorithm 3 Discrete Wavelet Transform
1: c_5 ← v
2: for j from 4 to 1 do
3:   c_j, d_j ← equations 3.8, 3.9 with c_{j+1}
4:   f[:, j] ← equation 3.10 with d_j
5: end for
6: H ← max(Q, f_s/60)
7: for m from H + 1 to L do
8:   g[m, :] ← √(∑_{n=m−Q}^{m} f[n, :]²)
9:   V_max[m], V_min[m] ← equation 2.2
10:  x[m, :] ← [g[m, :], V_max[m], V_min[m]]
11:  if online then
12:    load(x_mean, x_var)
13:    x[m, :] ← (x[m, :] − x_mean)/√x_var
14:  end if
15: end for
16: if offline then
17:   for k from 1 to end do
18:     x_mean[k] ← mean(x[:, k])
19:     x_var[k] ← var(x[:, k])
20:   end for
21:   x[m, :] ← (x[m, :] − x_mean)/√x_var ∀m
22:   save(x_mean, x_var)
23: end if
24: Output x
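Equations 3.8 and 3.9 amount to filtering with h(−n) and g(−n) and downsampling by two. A minimal sketch using Haar filters (the thesis uses db5; Haar keeps the arithmetic exact and short) also lets us check Parseval's theorem, which underlies the energy feature of [19]:

```python
import numpy as np

H = np.array([1.0, 1.0]) / np.sqrt(2)    # lowpass h (Haar, for illustration)
G = np.array([1.0, -1.0]) / np.sqrt(2)   # highpass g

def analysis_step(c):
    """One level of the two-band analysis bank of Figure 3.3:
    c_j(k) = sum_m h(m - 2k) c_{j+1}(m), and likewise d_j with g."""
    pairs = c[: len(c) // 2 * 2].reshape(-1, 2)
    return pairs @ H, pairs @ G           # approximation, detail

def wavedec(v, levels):
    """Repeat the analysis bank to get the detail coefficients at each level."""
    c, details = np.asarray(v, dtype=float), []
    for _ in range(levels):
        c, d = analysis_step(c)
        details.append(d)
    return c, details

v = np.sin(2 * np.pi * np.arange(64) / 64)
c5, details = wavedec(v, 5)
# Orthonormal filters preserve energy (Parseval), the basis of the feature:
energy = (c5 ** 2).sum() + sum((d ** 2).sum() for d in details)
```

With the db5 filters substituted for `H` and `G` (and boundary handling added), this becomes the decomposition used by Algorithm 3.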
4 S Transform
The S transform was first proposed by Stockwell in [37]. The S transform has fixed
modulating sinusoids with respect to the time axis, while a Gaussian window is dilated and
translated as in the wavelet transform. It thus maintains a relationship with both the wavelet
transform and the short-time Fourier transform, and the fact that the S transform retains a
direct relationship with the Fourier transform gives a good characterization. The continuous
S transform is defined as follows:

ST_x(τ, f) = ∫_{−∞}^{+∞} x(t) (|f|/√(2π)) e^{−(τ−t)²f²/2} e^{−j2πft} dt.   (3.11)

The continuous S transform can also be written as

ST_x(τ, f) = ∫_{−∞}^{+∞} X(α + f) e^{−2π²α²/f²} e^{j2πατ} dα,   f ≠ 0,   (3.12)

where X(f) is the Fourier transform of x(t). We can show that equation 3.12 is equivalent to
3.11:
ST_x(τ, f) = ∫_{−∞}^{+∞} X(α + f) e^{−2π²α²/f²} e^{j2πατ} dα
 = ∫_{−∞}^{+∞} [∫_{−∞}^{+∞} x(t) e^{−j2π(α+f)t} dt] e^{−2π²α²/f²} e^{j2πατ} dα
 = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x(t) e^{−j2π(α+f)t} e^{−2π²α²/f²} e^{j2πατ} dt dα
 = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x(t) e^{−j2πft} e^{−2π²α²/f²} e^{j2πα(τ−t)} dα dt
 = ∫_{−∞}^{+∞} [∫_{−∞}^{+∞} e^{−2π²α²/f² + j2πα(τ−t)} dα] x(t) e^{−j2πft} dt.   (3.13)
The integral of a Gaussian function in general form is

∫_{−∞}^{+∞} e^{−ax²+bx+c} dx = √(π/a) e^{b²/(4a)+c},   (3.14)

which can be used for the inner integral in equation 3.13. Hence
ST_x(τ, f) = ∫_{−∞}^{+∞} [√(πf²/(2π²)) e^{−4π²(τ−t)²f²/(8π²)}] x(t) e^{−j2πft} dt
 = ∫_{−∞}^{+∞} [√(f²/(2π)) e^{−(τ−t)²f²/2}] x(t) e^{−j2πft} dt
 = ∫_{−∞}^{+∞} x(t) (|f|/√(2π)) e^{−(τ−t)²f²/2} e^{−j2πft} dt,   (3.15)

which is equivalent to equation 3.11.
4.1 Discrete S transform and its implementation
Using equation 3.12, we can utilize the fast Fourier transform to increase the computational
efficiency. The S transform can be written in discrete time as

ST_x[m, k] = ∑_{p=0}^{N−1} X[p + k] e^{−2π²p²/k²} e^{j(2π/N)pm}.   (3.16)
Part of the implementation can utilize the fast Fourier transform and the inverse fast
Fourier transform. However, the multiplication by the Gaussian window e^{−2π²p²/k²} has to be
done for every p and k, and thus the algorithmic complexity is O(N²). Contour plots of the
S transform are shown in Figure 3.5. Compared to the contour plot of the short-time Fourier
transform in Figure 3.2, we see a clearer and more localized representation of the signal. In
particular, high-frequency disturbances such as the impulsive transient and the notch are
accurately captured.
Figure 3.5: Contour diagram of ST coefficients
Algorithm 4 shows the implementation of the S transform and the feature extraction. Since
the discrete sample size is large, we downsample the frequency axis by d. In the
implementation, the window size Q was set to 12 cycles of the fundamental frequency, and the
frequency sampling was done at every 240 Hz.
Algorithm 4 Implementation of feature extraction with the discrete-time S transform
1: for i from 1 to 2N/Q do
2:   v ← v[i·Q/2 + 1 : i·Q/2 + Q]
3:   V ← FFT(v)
4:   for k from 1 to l + 1 do
5:     for p from 1 to Q do
6:       B[p] ← V[p + k] e^{−2π²p²/k²}
7:     end for
8:     D[:, k] ← IFFT(B)
9:   end for
10:  ST_x[i·Q/2 + Q/4 + 1 : i·Q/2 + 3Q/4] ← D[Q/4 + 1 : 3Q/4]
11: end for
12: x ← |ST_x[:, 1 : d : K]|
13: if online then
14:   load(x_mean, x_var)
15:   x[m, :] ← (x[m, :] − x_mean)/√x_var ∀m
16: end if
17: if offline then
18:   for k from 1 to end do
19:     x_mean[k] ← mean(x[:, k])
20:     x_var[k] ← var(x[:, k])
21:   end for
22:   x[m, :] ← (x[m, :] − x_mean)/√x_var ∀m
23:   save(x_mean, x_var)
24: end if
25: Output x
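A direct NumPy sketch of equation 3.16, without the windowing and downsampling of Algorithm 4. The mirrored Gaussian term (to account for spectrum wrap-around) and the treatment of the k = 0 row as the signal mean are implementation conventions assumed here; summing the result over τ should recover the Fourier spectrum, mirroring equation 3.17.

```python
import numpy as np

def s_transform(v):
    """Discrete S transform via equation 3.16: for each frequency index k,
    shift the spectrum by k, weight it by the frequency-domain Gaussian, and
    inverse-FFT back to the time axis."""
    N = len(v)
    X = np.fft.fft(v) / N
    p = np.arange(N)
    ST = np.zeros((N // 2, N), dtype=complex)
    ST[0, :] = v.mean()                       # k = 0 row: eq. 3.12 excludes f = 0
    for k in range(1, N // 2):
        gauss = (np.exp(-2 * np.pi ** 2 * p ** 2 / k ** 2)
                 + np.exp(-2 * np.pi ** 2 * (p - N) ** 2 / k ** 2))  # wrapped window
        ST[k, :] = N * np.fft.ifft(np.roll(X, -k) * gauss)  # X[p + k], shifted spectrum
    return ST

v = np.cos(2 * np.pi * 4 * np.arange(64) / 64)   # tone at frequency bin 4
ST = s_transform(v)
```

The per-row Gaussian multiply is the O(N²) cost noted in the text; the FFT and inverse FFT contribute only O(N log N) per row.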
5 Comparison of Transforms
In this section, we give both theoretical and experimental comparisons of the transforms.
At this stage, the extracted features are too ambiguous to compare directly; their performance
will be evaluated after applying a standard classifier in Chapter 5. The theoretical comparison
was given by Ventosa et al. [4], and this section briefly goes over it. The comparison is done
on the continuous versions of the transforms to establish straightforward mathematical
relationships.
First, we compare the S transform and the Fourier transform. The Fourier transform can be
recovered from the S transform by

∫_{−∞}^{+∞} ST_x(τ, f) dτ = X(f).   (3.17)

This can be seen by direct substitution and using equation 3.14:

∫_{−∞}^{+∞} ST_x(τ, f) dτ = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x(t) (|f|/√(2π)) e^{−(τ−t)²f²/2} e^{−j2πft} dt dτ
 = ∫_{−∞}^{+∞} [∫_{−∞}^{+∞} e^{−(τ−t)²f²/2} dτ] x(t) (|f|/√(2π)) e^{−j2πft} dt
 = ∫_{−∞}^{+∞} [√(2π/f²)] x(t) (|f|/√(2π)) e^{−j2πft} dt
 = ∫_{−∞}^{+∞} x(t) e^{−j2πft} dt = X(f).   (3.18)
Therefore, we can see that summing the S transform over time gives the Fourier
representation of the signal. The short-time Fourier transform is a windowed version of the
Fourier transform, and within the window the above equation holds. Next, the relationship
between the continuous wavelet transform and the S transform is shown:
ST_x(τ, f) = ∫_{−∞}^{+∞} x(t) (|f|/√(2π)) e^{−(τ−t)²f²/2} e^{−j2πft} dt
 = e^{−j2πfτ} ∫_{−∞}^{+∞} x(t) (|f|/√(2π)) e^{−(τ−t)²f²/2} e^{j2πf(τ−t)} dt.   (3.19)
We give the definition of the Morlet wavelet,

ψ(t) = (1/√(2π)) e^{−t²/2} e^{j2πft},   (3.20)

and substitute s = 1/f to replace the frequency with the scale. Then,
ST_x(τ, f) = e^{−j2πτ/s} ∫_{−∞}^{+∞} x(t) (1/(|s|√(2π))) e^{−((t−τ)/s)²/2} e^{−j2π(t−τ)/s} dt
 = (e^{−j2πτ/s}/√|s|) (1/√|s|) ∫_{−∞}^{+∞} x(t) ψ*((t − τ)/s) dt
 = (e^{−j2πτ/s}/√|s|) CWT_x^ψ(τ, s).   (3.21)
We can now relate the S transform to the continuous wavelet transform through the phase
factor e^{−j2πτ/s}/√|s|. Figure 3.6 summarizes the theoretical comparison. Although the
continuous transforms show the direct relationships between the transforms, the discrete
implementation gives an advantage to the wavelet transform over the short-time Fourier
transform and the S transform due to its efficiency of implementation.
Figure 3.6: Relationships between wavelet transform, S transform and Fourier transform [4]
Although the S transform may appear to give good features, the S transform and the
short-time Fourier transform share a disadvantage compared to the discrete wavelet transform:
both introduce a redundant representation if there are too many frequency sampling points [38].
The redundant representation means the feature input size is larger for the classifier, and
more computation is involved; if the number of sampling points is too low, the features may not
convey sufficient information for classification. While the S transform shows more favorable
characteristics in Figure 3.5, we will see in Chapter 5 that the classification accuracy is not
as good as expected. This issue may come from insufficient frequency sampling points.
Currently, there is no consensus on how to effectively select a limited set of frequency
sampling points, and this needs further investigation.
In Figure 3.7, we show example features obtained with Algorithms 2, 3 and 4. These are the
features that will be sent to the classifier for both offline training and online
classification.
Figure 3.7: An example of waveform and extracted features based on different transforms
The algorithmic complexities of the proposed transforms are also important for
implementation, and they are given in Table 3.1. While the short-time Fourier transform
computes the features in O(Q log Q) per window with the fast Fourier transform, the S transform
requires O(Q²) per window of size Q. The discrete wavelet transform is the most efficient, with
the implementation described in the previous section.
While the transforms have different advantages and disadvantages in terms of efficiency of
implementation and characterization, there is a fundamental limitation in extracting features:
the features from the transforms are based on the frequencies of the signal, and they are
subject to the Heisenberg uncertainty principle.
Table 3.1: Comparison of computational complexity of feature extraction algorithms

    Transform                       Algorithmic complexity
    short-time Fourier transform    O(NQ log Q)
    discrete wavelet transform      O(N)
    S transform                     O(NQ^2)
5.1 Common limitations of the features
The Heisenberg uncertainty principle states the following. Given the variables

m_t = \int_{-\infty}^{\infty} t\,|x(t)|^2\,dt, \qquad
\sigma_t = \left[ \int_{-\infty}^{\infty} (t - m_t)^2\,|x(t)|^2\,dt \right]^{1/2},

m_f = \int_{-\infty}^{\infty} f\,|X(f)|^2\,df, \qquad
\sigma_f = \left[ \int_{-\infty}^{\infty} (f - m_f)^2\,|X(f)|^2\,df \right]^{1/2},
(3.22)

where m_t and \sigma_t are the average and uncertainty in time, and m_f and \sigma_f are the average and uncertainty in frequency, the Heisenberg uncertainty principle states that

\sigma_t \sigma_f \geq \frac{1}{4\pi}, (3.23)

or, equivalently, \sigma_t \sigma_\omega \geq 1/2 in terms of the angular frequency \omega = 2\pi f.
This means there is a fundamental trade-off between the localization of a feature and its uncertainty. As we try to extract a more accurate frequency feature, localization is lost; as we obtain more accurate localization, there is more uncertainty in the frequency feature. The wavelet transform and the S transform try to address this issue by using windows of various sizes depending on the frequency to be captured. While we might expect the short-time Fourier transform to perform significantly worse than the other transforms, we will see that if the classifier is powerful enough, the impact is not as significant as expected. The implication of the Heisenberg uncertainty principle is most important in the transition state, the moment at which a disturbance begins or ends. Features extracted during a transition carry high uncertainty. As a result, we expect the majority of misclassifications to occur in the transition state, and we will see this in Chapter 5.
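The trade-off in Equation 3.23 can be checked numerically. The following NumPy sketch (not part of the thesis implementation; the pulse width and grid spacing are arbitrary choices) evaluates the time and frequency spreads of Equation 3.22 for a Gaussian pulse, which is known to achieve the bound \sigma_t \sigma_f = 1/(4\pi) with equality.

```python
import numpy as np

# Unit-energy Gaussian pulse: |x(t)|^2 is Gaussian with standard deviation sigma
dt = 0.01
t = np.arange(-50.0, 50.0, dt)
sigma = 0.3
x = (1.0 / (2.0 * np.pi * sigma**2))**0.25 * np.exp(-t**2 / (4.0 * sigma**2))
assert abs(np.sum(np.abs(x)**2) * dt - 1.0) < 1e-9     # unit energy

# Time spread per Equation 3.22
m_t = np.sum(t * np.abs(x)**2) * dt
sigma_t = np.sqrt(np.sum((t - m_t)**2 * np.abs(x)**2) * dt)

# Frequency spread via the FFT (|X(f)|^2 is insensitive to the time shift)
X = np.fft.fft(x) * dt
f = np.fft.fftfreq(len(t), dt)
df = f[1] - f[0]
m_f = np.sum(f * np.abs(X)**2) * df
sigma_f = np.sqrt(np.sum((f - m_f)**2 * np.abs(X)**2) * df)

product = sigma_t * sigma_f        # close to 1/(4*pi) for a Gaussian pulse
```

For any other pulse shape the computed product would come out strictly larger than 1/(4\pi), which is exactly the localization/uncertainty trade-off discussed above.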
6 Summary
In this chapter, we went over the short-time Fourier transform, the wavelet transform and the S transform, which extract the features that will be used by the classifier in the next chapter. We presented the relationships and comparisons between the transforms and discussed the pros and cons of each based on its characterization and implementation efficiency.
Chapter 4
Classification with Long Short Term
Memory
1 Background
The previous chapter discussed how different transforms can refine the waveform into features appropriate for classification. In this chapter, we finally describe the classifier that outputs the types of power quality disturbance. In general, classifiers are associated with parameters that implicitly or explicitly define thresholds. Explicit parameters are determined by engineers and can be used when the disturbance is directly related to the parameter. For example, we could define voltage swell as a time period where the voltage magnitude is greater than 1.1 p.u.; the voltage magnitude would then be an explicit parameter classifying voltage sag and swell with a threshold. However, classification with explicit parameters usually involves a tree-structured classifier. The tree becomes very complex and difficult to tune as the number of classes increases. In addition, modifying the classification becomes very challenging and may require much engineering time to reconfigure the tree structure.
The alternative is to use implicit parameters, and an example of this approach is the neural network. The neural network is one of the most successful techniques for training a classification system to find patterns in training data. We will use supervised learning to determine the implicit parameters of the classifier, which are the weights and biases of the neural network. The implicit parameters are determined systematically, and there is no need for manual work to modify the classifier and reconfigure the parameters.
The neural network was invented in the 1950s, inspired by how the brain works. It distributes pattern matching across many nodes, which correspond to the neurons in the brain. Recently, neural networks have made great achievements in numerous fields including speech recognition, object recognition and natural language processing [34, 39, 40]. While there are other approaches such as the Support Vector Machine and the neural tree, the deep neural network has shown the most recent success across many fields. In the following sections, different architectures of the neural network will be reviewed.
2 Feedforward Neural Network
The feedforward neural network (FNN) is a basic neural network architecture with multiple hidden layers, where the edges are directed from each layer to the layer above. This structure takes the input at the bottom layer and computes layer by layer from bottom to top, where the top layer is the output. An example of a 3-layer feedforward neural network is shown in Figure 4.1.
Figure 4.1: Feedforward neural network architecture
The neural network is a relation between the input x ∈ R^{M×T} and the output ŷ ∈ R^{N×T}. The true label will be denoted by y ∈ R^{N×T}. M is the number of features obtained from the feature extraction algorithm, N is the number of classes, and T is the number of data points, which is the number of time steps. An individual training input, x^{(t)}, is the t-th column of x. The label y_i^{(t)} ∈ {0, 1} indicates which class the data belongs to, and it satisfies the condition \sum_i y_i^{(t)} = 1.
An FNN with l hidden layers has as parameters the l + 1 weight matrices W = (W_0, ..., W_l) and biases b = (b_1, ..., b_{l+1}). We will denote the parameters by θ = [W, b]. Given that the k-th layer has m_k units (with m_0 = M), W_k ∈ R^{m_{k+1}×m_k} and b_k ∈ R^{m_k}. The output ŷ = [ŷ^{(1)} ... ŷ^{(T)}] can be computed from the input x using forward propagation with the following algorithm.

Algorithm 5 Forward Propagation for FNN
1: set z_0 ← x^{(t)}
2: for i from 1 to l do
3:   g_i ← W_{i−1} z_{i−1} + b_i
4:   z_i ← σ(g_i)
5: end for
6: Output ŷ^{(t)} ← P(y^{(t)} | z_l)
The function σ is an activation function, and P(y^{(t)}|z) is the softmax function, which will be explained later. The activation could be the sigmoid function, the hyperbolic tangent function or the rectified linear function; the sigmoid function is used in this thesis. The last hidden layer is connected to the output by the softmax function, which produces a probability distribution.
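Algorithm 5 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the thesis implementation (which used TensorFlow); the layer sizes and random weights below are placeholders, with N = 14 matching the number of disturbance classes.

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def softmax(g):
    e = np.exp(g - g.max())            # subtract the max for numerical stability
    return e / e.sum()

def fnn_forward(x, weights, biases):
    """Forward propagation of Algorithm 5: sigmoid hidden layers, softmax output."""
    z = x
    for W, b in zip(weights[:-1], biases[:-1]):
        z = sigmoid(W @ z + b)
    return softmax(weights[-1] @ z + biases[-1])

# Illustrative sizes: M = 6 input features, two hidden layers of 8 units, N = 14 classes
rng = np.random.default_rng(0)
sizes = [6, 8, 8, 14]
weights = [0.1 * rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
p = fnn_forward(rng.standard_normal(6), weights, biases)   # a distribution over 14 classes
```

The returned vector is non-negative and sums to one, so it can be read directly as class probabilities.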
The architecture of the neural network also allows an efficient algorithm for the derivative of an objective function with respect to its parameters. Using the chain rule, the computation is done in the reverse order of the forward propagation, so it is called backward propagation. Backward propagation requires the output of each node, z_i, which can be computed by running forward propagation first. The following algorithm shows the implementation of backward propagation.

Algorithm 6 Backward Propagation for FNN
1: dz_l ← dL(ŷ, y)/dz_l
2: for i from l to 1 do
3:   dg_i ← σ′(g_i) · dz_i
4:   dz_{i−1} ← W_{i−1}^T dg_i
5:   db_i ← dg_i
6:   dW_{i−1} ← dg_i z_{i−1}^T
7: end for
8: Output ∇_θ L ← [dW_0, ..., dW_l, db_1, ..., db_{l+1}]
We will define the objective function, or loss function, L(ŷ, y) in a later section. Backward propagation will be used to set the parameters θ.
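A common way to validate a backward-propagation implementation is to compare the analytic gradient against a finite-difference estimate. The sketch below (a NumPy illustration with arbitrary sizes, not the thesis code) implements Algorithm 6 for sigmoid layers with a softmax cross-entropy output, using the standard simplification that the gradient at the output pre-activation is ŷ − y.

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def forward(x, Ws, bs):
    """Algorithm 5: returns every layer output z_i plus the softmax prediction."""
    zs = [x]
    for W, b in zip(Ws[:-1], bs[:-1]):
        zs.append(sigmoid(W @ zs[-1] + b))
    g = Ws[-1] @ zs[-1] + bs[-1]
    e = np.exp(g - g.max())
    return zs, e / e.sum()

def backward(zs, yhat, y, Ws):
    """Algorithm 6: with softmax + cross entropy the output gradient is yhat - y."""
    dWs, dbs = [None] * len(Ws), [None] * len(Ws)
    dg = yhat - y
    for i in range(len(Ws) - 1, -1, -1):
        dWs[i] = np.outer(dg, zs[i])
        dbs[i] = dg
        if i > 0:
            dz = Ws[i].T @ dg
            dg = zs[i] * (1.0 - zs[i]) * dz      # sigmoid'(g_i) written via z_i
    return dWs, dbs

# Finite-difference check of one weight entry against the analytic gradient
rng = np.random.default_rng(1)
sizes = [4, 5, 3]
Ws = [0.5 * rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [0.1 * rng.standard_normal(m) for m in sizes[1:]]
x, y = rng.standard_normal(4), np.array([0.0, 1.0, 0.0])

zs, yhat = forward(x, Ws, bs)
dWs, _ = backward(zs, yhat, y, Ws)

def loss(Ws):
    _, p = forward(x, Ws, bs)
    return -np.sum(y * np.log(p))              # cross entropy, defined in Section 2.2

eps = 1e-6
Wp = [W.copy() for W in Ws]; Wp[0][1, 2] += eps
Wm = [W.copy() for W in Ws]; Wm[0][1, 2] -= eps
numeric = (loss(Wp) - loss(Wm)) / (2.0 * eps)
```

The central difference `numeric` agrees with the analytic entry `dWs[0][1, 2]` to within numerical precision, which is the usual sanity check before trusting the gradient in training.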
2.1 Decision making with Softmax Function
Existing algorithms for power quality monitoring were often equipped with an additional layer of algorithms to interpret the output of the neural network [3]. In this thesis, we use the softmax function, currently the most widely used technique for multi-class classification. It only changes the output layer of the neural network and is much simpler to implement than an additional layer of algorithms. The softmax function is defined as

P(y = j | z) = \frac{\exp(w_j^T z)}{\sum_k \exp(w_k^T z)}. (4.1)
Since P(y = j|z) ∈ (0, 1) and \sum_j P(y = j|z) = 1, it satisfies the conditions for a probability distribution. The compact notation P(y|z) = [P(y = 1|z) ... P(y = 14|z)]^T is used, where N = 14 is the number of disturbance classes in our definition. To determine which class the signal belongs to, the one with the maximum probability is chosen. We can reconstruct the output prediction by

\hat{y}_i = \begin{cases} 1 & \text{if } i = \arg\max_j P(y = j|x) \\ 0 & \text{otherwise} \end{cases} (4.2)
where we assign the class based on the maximum probability. Based on the output prediction, the error rate can be defined as

E = 1 - \frac{1}{T} \sum_{t=1}^{T} \hat{y}^{(t)} \cdot y^{(t)} (4.3)

where ŷ^{(t)} · y^{(t)} = 1 if the prediction matches the actual label, and 0 otherwise. The matches are summed over all the training cases and divided by the total number of training data.
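Equations 4.2 and 4.3 can be sketched directly on a matrix of class probabilities. The toy matrices below are made up purely for illustration (N = 2 classes, T = 2 time steps rather than the thesis's N = 14).

```python
import numpy as np

def predict(P):
    """Equation 4.2: one-hot prediction from an N x T matrix of class probabilities."""
    Yhat = np.zeros_like(P)
    Yhat[P.argmax(axis=0), np.arange(P.shape[1])] = 1.0
    return Yhat

def error_rate(Yhat, Y):
    """Equation 4.3: fraction of time steps whose predicted class differs from the label."""
    T = Y.shape[1]
    return 1.0 - float(np.sum(Yhat * Y)) / T

# Toy example: the true class is 0 at both steps, but the second step's
# probabilities favour class 1, so exactly one of two steps is wrong.
P = np.array([[0.7, 0.2],
              [0.3, 0.8]])
Y = np.array([[1.0, 1.0],
              [0.0, 0.0]])
E = error_rate(predict(P), Y)
```

Here `E` comes out to 0.5: the column-wise dot product ŷ^{(t)} · y^{(t)} is 1 only where the argmax matches the label.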
2.2 Training Neural Network
If we consider the FNN as an input-to-output mapping, then

\hat{y}^{(t)} = f(x^{(t)}, \theta) (4.4)

is our prediction based on the input x with fixed hyper-parameters such as the number of hidden layers and the number of nodes in each layer. Our goal is to get our prediction ŷ as close as possible to its label y by adjusting the parameters θ = [W, b]. We can formulate this as an optimization problem where we first define the negative log probability of the target class,

L(\hat{y}, y) = -\frac{1}{T} \sum_t y^{(t)} \cdot \log(\hat{y}^{(t)}), (4.5)
where only the log probability of the correct class is selected and summed. This is the cross entropy between the actual output and the prediction. Then, θ is the argument that minimizes the cross-entropy function,

\theta = \arg\min_\theta L(\hat{y}, y) = \arg\min_\theta L(f(x, \theta), y). (4.6)
In order to find this θ, we use the gradient descent method,

\theta_k \leftarrow \theta_{k-1} - \alpha_k \nabla_\theta L (4.7)

where ∇_θ L is the gradient of the loss function L with respect to θ, and α_k is the learning rate at step k. The gradient ∇_θ L can be calculated efficiently with the backpropagation presented in Algorithm 6. In addition, we decay the learning rate to yield better performance,

\alpha_k = \alpha_0 \cdot \eta^{k/K} (4.8)

where α_0 is the initial learning rate, η is the decay rate, and K is the decay step. In this thesis, α_0 = 0.01, η = 0.9 and K = 2000 were used. While the gradient descent method is straightforward, there are much better methods for this optimization. Adagrad [41], RMSProp [42] and ADAM [43] are some of the popular algorithms for training neural networks. For recurrent neural networks, there is also the Hessian-free Newton's method [44]. In this thesis, the ADAM optimizer was used, and training was done with mini-batches of size 128.
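Equations 4.7 and 4.8, with the thesis's values α_0 = 0.01, η = 0.9 and K = 2000, can be sketched on a toy quadratic loss. The quadratic target below is invented solely to make the loop self-contained; the real loss is the cross entropy of Equation 4.5.

```python
import numpy as np

def learning_rate(k, alpha0=0.01, eta=0.9, K=2000):
    """Equation 4.8: exponentially decayed learning rate."""
    return alpha0 * eta ** (k / K)

# Minimize the toy loss L(theta) = 0.5 * ||theta - target||^2 with Equation 4.7
target = np.array([1.0, 2.0])
theta = np.array([4.0, -3.0])
for k in range(20000):
    grad = theta - target                      # gradient of the toy loss
    theta = theta - learning_rate(k) * grad    # Equation 4.7 (descent step)
```

After enough steps `theta` settles at the minimizer, while the step size has decayed by a factor of η^{k/K}; ADAM replaces the fixed-direction step here with per-parameter adaptive scaling.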
2.3 Windowed Feedforward Neural Network
While the feedforward neural network is a basic neural network architecture, it has a limitation for time series data such as power quality disturbance classification. In voltage and current waveforms, each data point is part of a sequence. The FNN has access to only the instantaneous data and is unable to retrieve information from neighbouring data in the sequence, while the Heisenberg uncertainty principle imposes a fundamental limit on the certainty of the localization and characterization of each instantaneous feature.
We introduce the windowed feedforward neural network (wFNN) in this thesis as an attempt to address the problem of the FNN having access to only instantaneous features. The feedforward neural network in Equation 4.4 can be reformulated to include past data,

\hat{y}^{(t)} = f(x^{(t-w)}, \ldots, x^{(t)}, \theta) (4.9)

where w + 1 is the size of the window. The improvement is that the neural network has access to the w previous data points in the sequence. However, there are still disadvantages with this approach.
1. The windowed neural network has no way to access data prior to the window.
2. The length of the input data increases in proportion to the size of the window, which may require more capacity and training time for the neural network.
3. If the window is too large, the monitoring system requires more memory and computational power.
The choice of window size must balance the benefit of accessing more information against the complexity and dilution of the features. In Chapter 5, we will see that this approach is not very effective in improving the accuracy of the FNN: the larger window increases the size of the input features, and its benefit is not realized with the fixed capacity of the FNN. In the next section, we introduce the recurrent neural network, which elegantly addresses the problems stated above for the feedforward neural network and the windowed feedforward neural network.
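Building the wFNN input of Equation 4.9 amounts to stacking each feature vector with its w predecessors. The sketch below illustrates this in NumPy; zero-padding the first w time steps is an assumption of this sketch, since the thesis does not specify how steps with fewer than w predecessors are handled.

```python
import numpy as np

def window_features(X, w):
    """Stack each feature vector with its w predecessors (Equation 4.9).

    X is an M x T feature matrix; the result is (w+1)M x T, zero-padded
    (an assumption) for time steps with fewer than w predecessors.
    """
    M, T = X.shape
    out = np.zeros(((w + 1) * M, T))
    for d in range(w + 1):                 # d = 0 is the current step, d = w the oldest
        out[d * M:(d + 1) * M, d:] = X[:, :T - d]
    return out

X = np.arange(6.0).reshape(2, 3)           # M = 2 features, T = 3 steps
Xw = window_features(X, w=1)               # input size doubles to (w+1)M = 4
```

This makes the second disadvantage above concrete: the input dimension grows linearly with w while the information content of the extra rows largely repeats neighbouring columns.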
3 Recurrent Neural Network
A recurrent neural network is a neural network with recurrence in its structure, that is, a directed cycle within the network. Figure 4.2 (a) shows a simple single-node neural network with a directed cycle at the node h. This cycle allows information to flow within the node, enabling the neural network to maintain information within the network.
Figure 4.2: (a) A simplified single node recurrent neural network (b) Unrolled version of thenetwork through time
The recurrent neural network shares similarities with the dynamical systems studied in many engineering applications, including power systems. Unrolling the recurrent neural network as in Figure 4.2 makes the similarity more apparent: Figure 4.2 (b) is exactly the representation of a linear dynamical system if the node activation functions are linear, and non-steady-state power system analysis is already very familiar with this type of model. Both a discrete time-invariant dynamical system and a single-neuron RNN can be written as

h[t] = e(h[t-1], x[t], \theta)
y[t] = g(h[t])
(4.10)
where h, x, and θ are the state, input and system parameters respectively. The function e is the activation function of the RNN, and the function g is the output function. The computation of the recurrent neural network goes forward in time, so it can handle continuously sampled data very elegantly. Forward propagation through time, in Algorithm 7, describes how the computation is carried out in this architecture.
Algorithm 7 Forward Propagation Through Time for RNN
1: for t from 1 to T do
2:   u^{(t)} ← W_{hx} x^{(t)} + W_{hh} h^{(t−1)} + b_h
3:   h^{(t)} ← e(u^{(t)})
4:   o^{(t)} ← W_{oh} h^{(t)} + b_o
5:   z^{(t)} ← g(o^{(t)})
6:   ŷ^{(t)} ← P(y^{(t)} | z^{(t)})
7: end for
8: Output ŷ

The pre-activation value u^{(t)} is a linear combination of the input unit and the hidden state from the previous time step t − 1, plus the bias b_h. The pre-activation value for the output, o^{(t)}, is a linear function of the hidden state h^{(t)}. Similar to the feedforward neural network, finding the derivative of the network with respect to the parameters θ = [W, b] can be implemented efficiently using the chain rule. The backpropagation-through-time algorithm treats the unrolled recurrent neural network as one big neural network and goes backward in time to compute the gradient; implementations can be found in [45, 44]. Training the recurrent neural network can be done in the same way as training a feedforward neural network, with the gradient computed by backpropagation through time. However, training a general recurrent neural network is much more difficult than training an FNN due to the vanishing gradient problem [46]. To overcome this issue, we use gating of the recurrent unit with Long Short-Term Memory.
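Algorithm 7 can be sketched compactly in NumPy. This is an illustrative simple RNN, not the thesis's TensorFlow model: tanh stands in for the activation e, the output function g is taken as the identity, and the sizes are placeholders (N = 14 matches the number of disturbance classes).

```python
import numpy as np

def softmax(g):
    e = np.exp(g - g.max())
    return e / e.sum()

def rnn_forward(X, Whx, Whh, bh, Woh, bo):
    """Algorithm 7: forward propagation through time for a simple RNN."""
    h = np.zeros(Whh.shape[0])             # initial hidden state h^{(0)}
    Yhat = []
    for t in range(X.shape[1]):
        u = Whx @ X[:, t] + Whh @ h + bh   # pre-activation u^{(t)}
        h = np.tanh(u)                     # hidden state h^{(t)} = e(u^{(t)})
        o = Woh @ h + bo                   # output pre-activation o^{(t)}
        Yhat.append(softmax(o))            # class probabilities at step t
    return np.array(Yhat).T                # N x T

# Illustrative sizes: M = 6 features, 8 hidden units, N = 14 classes, T = 5 steps
rng = np.random.default_rng(2)
M, H, N, T = 6, 8, 14, 5
Yhat = rnn_forward(rng.standard_normal((M, T)),
                   0.1 * rng.standard_normal((H, M)),
                   0.1 * rng.standard_normal((H, H)),
                   np.zeros(H),
                   0.1 * rng.standard_normal((N, H)),
                   np.zeros(N))
```

Each column of `Yhat` is a distribution over the classes, and the hidden state carried between iterations is exactly what gives the RNN memory of earlier time steps.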
4 Long Short-Term Memory
Long Short-Term Memory (LSTM) is a special type of RNN architecture that avoids the vanishing gradient problem by utilizing memory units. LSTM was first proposed by Hochreiter and Schmidhuber in 1997 [47]. Gating units control the flow of information through time, which gives the LSTM its ability to memorize important features. LSTM has been applied successfully to speech and handwritten text recognition tasks and to partially observable Markov decision processes [39, 40, 48], which are similar to the power quality classification problem in that they recognize characteristics of a continuous signal. Figure 4.3 shows the architecture of the LSTM unit.
Figure 4.3: Architecture of the LSTM unit
At classification time step t, the LSTM computes the following:

a_t = tanh(W_{hh} h_{t−1} + W_{hx} x_t + b_a) (4.11a)
i_t = sigmoid(W_{ih} h_{t−1} + W_{ix} x_t + b_i) (4.11b)
f_t = sigmoid(W_{fh} h_{t−1} + W_{fx} x_t + b_f) (4.11c)
o_t = sigmoid(W_{oh} h_{t−1} + W_{ox} x_t + b_o) (4.11d)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ a_t (4.11e)
h_t = o_t ⊙ tanh(c_t) (4.11f)
y_t = h_t (4.11g)

where ⊙ denotes elementwise multiplication, i_t, f_t and o_t denote the input, forget and output gates respectively, and c_t is the memory unit.
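One step of Equation 4.11 can be written directly in NumPy. This is a sketch with hypothetical dimensions, not the TensorFlow implementation used in the thesis; the all-zero parameters in the usage example are chosen only to make the gate behaviour easy to verify by hand.

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step per Equation 4.11; p is a dict of weight matrices and biases."""
    a = np.tanh(p["Whh"] @ h_prev + p["Whx"] @ x + p["ba"])    # candidate (4.11a)
    i = sigmoid(p["Wih"] @ h_prev + p["Wix"] @ x + p["bi"])    # input gate (4.11b)
    f = sigmoid(p["Wfh"] @ h_prev + p["Wfx"] @ x + p["bf"])    # forget gate (4.11c)
    o = sigmoid(p["Woh"] @ h_prev + p["Wox"] @ x + p["bo"])    # output gate (4.11d)
    c = f * c_prev + i * a                                     # memory update (4.11e)
    h = o * np.tanh(c)                                         # hidden state (4.11f)
    return h, c

# With all-zero parameters every gate saturates at 0.5, so the memory halves:
H, M = 4, 3
p = {k: np.zeros((H, H)) for k in ("Whh", "Wih", "Wfh", "Woh")}
p.update({k: np.zeros((H, M)) for k in ("Whx", "Wix", "Wfx", "Wox")})
p.update({k: np.zeros(H) for k in ("ba", "bi", "bf", "bo")})
h, c = lstm_step(np.ones(M), np.zeros(H), np.ones(H), p)
```

The additive memory update in Equation 4.11e, as opposed to a repeated matrix multiplication, is precisely what keeps the gradient from vanishing through time.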
4.1 Advantages of Recurrent Neural Network
As mentioned in the limitations of the feedforward neural network, the recurrent neural network addresses the issue of limited information sharing along the sequence. To illustrate this point, Figure 4.4 shows the configurations of the feedforward neural network, the windowed feedforward neural network, and the LSTM. The figure shows that the feedforward neural network has no communication along the sequence and the windowed feedforward neural network has access to only a fixed set of neighbours. The long short-term memory is the most elegant form: the system has access to information from all previous times.
Figure 4.4: (a) Feedforward neural network architecture (b) Windowed feedforward neuralnetwork architecture (c) Long short-term memory architecture
For power quality monitoring, this gives an advantage in classification because many disturbances last for a long time. For example, if harmonics were present in the waveform at the previous step, the chance that they are continuing is high. The LSTM incorporates this information when making the classification decision. The temporal information allows the algorithm to build confidence over time and to hold on to information such as an existing frequency component. It also reduces the false positive rate through the forget gate, which rapidly drains the confidence in a classification once the disturbance is no longer observed.
5 Summary
In this chapter, we proposed the recurrent neural network as a new classifier for power quality disturbance classification. We presented the limitations of the feedforward neural network and discussed how the recurrent neural network passes information through time. In particular, we use Long Short-Term Memory, which employs gating of the input, output and memory cell to prevent the gradient from vanishing or blowing up during training. In the next chapter, we will evaluate its performance compared to the feedforward neural network and its windowed version.
Chapter 5
Results and Case Studies
In this chapter, we synthesize feature extraction algorithms and classifiers to build the
automatic power quality monitoring system. We test the performance of the system with
feature extractions based on different transforms. We also compare the performance of LSTM
and traditional FNN.
1 Data Generation and Training
Power quality disturbance data was generated with a sampling frequency of 3840 Hz, and the length of the data was 10 million time steps, which corresponds to approximately 43 minutes. The effect of changing the sampling frequency is presented in a later section. The data generation and feature extraction processes were implemented in MATLAB, and the classifier was implemented with TensorFlow, developed by Google [49]. TensorFlow is an open-source software library designed for machine learning research. The algorithms and parameters were as described in Chapter 4. Since as much training data as needed could be generated, the neural network is free from over-fitting within the data. However, in the case studies, we will observe that the monitoring system over-fits to the generated data, and there can be cases where the classification does not work very well. This is a fundamental issue with a neural network approach based on generated data for power quality monitoring.
Figure 5.1 shows an example of how the cross entropy drops as the training progresses.
Figure 5.1: Cross entropy of the training and testing data as the training progresses
The cross entropies of the training and testing data were very close to each other, indicating that the data set was large enough to avoid over-fitting within the generated data.
2 Results
Figure 5.2 shows an example of output from the classifier. The LSTM was used with features from the different transforms. The shaded area shows the output of the softmax layer, which can be interpreted as the probability of the colored disturbance. The square boxes are the true label, where the output is either one or zero for each class. While the cross entropy was the objective for the minimization, we only show the accuracy throughout this chapter. The cross entropy includes the confidence of the result, but the accuracy is what matters in the end and gives a more straightforward interpretation. The accuracy of the monitoring system is defined as (1 − E), where E is the error rate defined in Equation 4.3.
2.1 Comparisons of the transformations
We first show comparisons of the accuracy between different combinations of feature extraction and classifier in Table 5.1. The feedforward neural network was built with 3 hidden layers and 8 hidden units in each layer. The result shows that the LSTM increases the performance by 2.95%, 3.40% and 3.86% for STFT, DWT and ST respectively. The discrete wavelet transform works the best with the LSTM in the overall result, but the short-time Fourier transform is quite comparable. The performance of the S transform was not as good as expected since the frequency variation class did not work in the classification. This is likely due to insufficient sampling in frequency, and it will require further investigation.
Figure 5.2: Output of the automatic power quality monitoring system
Table 5.1: Comparison of accuracy of FNN and LSTM in percentage
Feature STFT DWT ST
Classifier FNN LSTM FNN LSTM FNN LSTM
(i) normal 94.44 95.15 91.88 95.01 91.02 92.33
(ii) interruption 88.23 88.86 89.00 90.22 88.24 89.33
(iii) sag 88.31 89.54 91.52 89.78 87.04 87.60
(iv) swell 87.78 85.97 90.20 88.9 84.68 81.28
(v) impulsive 20.66 62.87 82.29 86.54 17.63 50.90
(vi) oscillatory 74.14 79.65 76.59 82.68 74.49 85.55
(vii) dc offset 90.78 90.94 93.54 91.7 83.16 84.38
(viii) harmonics 82.08 81.37 90.67 93.82 89.54 80.65
(ix) notch 75.79 83.43 82.07 83.37 59.49 85.12
(x) flicker 86.61 89.04 74.49 87.06 68.58 70.26
(xi) noise 89.98 93.53 94.04 96.26 88.48 98.09
(xii) frequency variation 82.58 91.33 67.71 84.1 0.00 0.00
(xiii) sag and harmonics 86.89 82.39 86.09 80.63 78.83 86.21
(xiv) swell and harmonics 81.8 82.91 86.10 86.5 85.95 83.70
overall 87.04 89.61 88.05 91.04 79.66 82.74
Since the short-time Fourier transform and the S transform have disadvantages in computational complexity and sampling resolution in frequency, we will focus on the discrete wavelet transform based features when testing the effect of sampling frequency and output frequency. Table 5.2 is the confusion matrix for the discrete wavelet transform with the recurrent neural network; the values are in percentage. The highest confusion was between sag & harmonics and plain harmonics with 12.54 %, followed by swell & harmonics and harmonics with 6.96 %. The impulsive transient was often confused with the notch, with 4.47 %. These are, in fact, cases that can be difficult even for a human to distinguish, especially when the sag or swell is not significant.
Table 5.2: Accuracy of LSTM with feature from DWT(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)
(i) 94.07 0.06 0.05 0.16 1.51 0.5 0.88 0.18 0.39 0.45 0.39 1.23 0.03 0.1
(ii) 5.41 89.9 2.15 0 2.03 0.03 0.01 0.01 0.25 0 0.03 0.07 0.12 0
(iii) 6.59 1.33 90.65 0 0.57 0.08 0.01 0.01 0.49 0 0.03 0.18 0.06 0
(iv) 5.95 0 0 90.54 0.69 0.05 0 0 0.59 2.07 0.01 0.02 0 0.09
(v) 6.13 0 0 0.01 89 0.79 0.01 0 3.76 0.09 0.15 0 0 0.06
(vi) 12.84 0.08 0.22 0.25 0.45 81.45 0.41 1.21 0.54 0.57 1.29 0.17 0.11 0.42
(vii) 7.34 0 0 0 0 0 92.66 0 0 0 0 0 0 0
(viii) 5.45 0 0 0 0 0.12 0 93.57 0.01 0 0 0 0.75 0.09
(ix) 16.03 0 0 0.07 1.36 0.13 0.03 0.01 81.93 0.03 0.3 0.11 0 0
(x) 8.21 0 0 4.26 2.2 0.12 0 0.01 0.29 84.63 0.01 0.02 0 0.25
(xi) 1.09 0 0 0 0.21 1.08 0 0 1.25 0 96.36 0 0 0
(xii) 17.63 0.02 0.29 0 2.11 0.42 0.01 0.25 0.11 0 0.04 79.05 0.08 0
(xiii) 5.59 0.32 0 0 0.86 0 0 12.51 0.28 0 0.01 0 80.42 0
(xiv) 5.11 0 0 0.03 0.77 0.09 0 6.28 0.51 0.14 0.01 0 0 87.07
2.2 Effect of the size of window
In this section, we adjust the window sizes of the short-time Fourier transform and the discrete wavelet transform to find the optimal window size. The default sampling frequency was 3840 Hz, and the default window sizes for STFT and DWT were 1 cycle and 0.5 cycles of the fundamental frequency respectively.
Window in Discrete Wavelet Transform
Figure 5.3 shows the effect of varying the size of the energy window on the features from the discrete wavelet transform. Some notable classes are highlighted in the legend. If the window size is too small or too big, the accuracy of the impulsive transient suffers the most. Frequency variation and notch also suffer from a small window because the uncertainty in determining periodicity increases. The ideal size of the window was determined to be about half a cycle of the fundamental frequency.
Figure 5.3: Performance of LSTM with various window sizes of DWT
Window in Short-Time Fourier Transform
The window size of the short-time Fourier transform was varied in this section, as shown in Figure 5.4. Impulsive transients are difficult to distinguish when the window size is too small: with a small window, the fundamental and other frequencies can be significantly impacted by an impulsive transient. The performance slightly improves with a window size of 3 cycles, but it drops significantly again when the window is too large. A large window loses the localization of the impulsive transient, and the result reflects this intuition. The accuracy for voltage sag and swell also decreased with increasing window size. This clearly illustrates the limitation of the short-time Fourier transform, which has only a fixed window size.
Figure 5.4: Performance of LSTM with various window sizes of STFT
Window in windowed Feedforward Neural Network
In this section, the window size of the wFNN was varied. Although it was expected that a larger window would increase the accuracy, the result did not show this to be true. The neural network used for the wFNN was fixed at 3 layers with 8 hidden units in each layer. Increasing the window size of the wFNN increased the size of the input layer, and this instead degraded the performance of the classifier.
Figure 5.5: Performance of wFNN with various window sizes
Sampling frequency
In this section, the sampling frequency was varied on a logarithmic scale, as shown in Figure 5.6. Due to aliasing, we expect the accuracy to drop as the sampling frequency goes down. As the figure shows, the high-frequency disturbances such as impulsive, oscillatory and notch were significantly affected when the sampling frequency was too low.
Figure 5.6: Performance of LSTM with various sampling frequencies
Output frequency
The required frequency of the output may not be as high as the sampling frequency. The output frequency was varied in this section to see whether it has a similar effect as varying the sampling frequency. The data was obtained by downsampling so that the same data could be used in testing; the input frequency was fixed at 3840 Hz, and the input-output frequency ratio was varied. The result shows that the performance of the recurrent neural network is quite constant across different input-output frequency ratios. The consistent performance in this case study indicates that the training set was large enough to avoid over-fitting, and that the learning rate was decreased slowly enough and for a long enough duration.
Figure 5.7: Performance of LSTM with various output frequencies
2.3 Distribution of Misclassification
In this section, we investigate when the misclassifications occur. We plot the distribution of time steps away from the transition time for every misclassification. The transition time is the time at which a disturbance starts or ends. From the Heisenberg uncertainty principle, we expect most of the misclassifications to occur near the transition time. In Figure 5.8, the distribution of all misclassifications is plotted against the distance from the transition time. The mode of the distribution is at the transition time, and 81.61% of the misclassifications occur within a cycle of the transition time.
Figure 5.8: Overall distribution of misclassification
Figure 5.9: Distribution of misclassification for individual power quality disturbances
In Figure 5.9, we show the distribution for each individual disturbance. DC offset, notching, flicker, and sag/swell with harmonics were the disturbances that had misclassifications away from the transition time. This section shows that the majority of our misclassifications are associated with the fundamental limitation of power quality disturbance classification.
2.4 Effect of Noise
In this section, we present the effect of noise on classification accuracy. We test both the feedforward neural network and the LSTM and show the results in Table 5.3. The signal-to-noise ratio (SNR) of the waveform was 37.1 dB. The feature extraction was based on the discrete wavelet transform with a half-cycle window and a 3840 Hz sampling frequency. The results show that the classification accuracy of dc offset, flicker, and sag and harmonics drops significantly. The SNR with respect to those disturbances is low, and as a result, they become difficult to classify in general. The results also show that the performance of the LSTM is consistently higher than that of the feedforward neural network under noise.
Table 5.3: Comparison of LSTM and FNN on data with noise

Classifier FNN LSTM
Without Noise With Noise Without Noise With Noise
(i) normal 91.88 91.41 95.01 93.31
(ii) interruption 89.00 90.15 90.22 89.28
(iii) sag 91.52 87.24 89.78 90.50
(iv) swell 90.20 91.08 88.90 91.26
(v) impulsive 82.29 82.86 86.54 88.25
(vi) oscillatory 76.59 74.24 82.68 79.75
(vii) dc offset 93.54 19.68 91.70 19.96
(viii) harmonics 90.67 93.32 93.82 92.82
(ix) notch 82.07 82.31 83.37 81.75
(x) flicker 74.49 28.46 87.06 39.49
(xi) noise 94.04 92.60 96.26 95.29
(xii) frequency variation 67.71 56.08 84.10 54.35
(xiii) sag and harmonics 86.09 0.00 80.63 3.43
(xiv) swell and harmonics 86.10 90.19 86.50 84.10
overall 88.05 78.77 91.04 80.08
3 Case Studies
In this section, we show case studies with data from electrical simulations in SimPowerSystems/MATLAB. These data are a more realistic representation of power quality disturbances, and we evaluate the performance and discuss the limitations of the proposed automatic monitoring system.
Interruption
The interruption was generated by opening the breaker of the system at 0.04 seconds and reclosing it at 0.12 seconds. This was one of the more straightforward classification tasks. There is a high misclassification rate in the first cycle after the transition time, and this is due to the definition of the peak voltage in Equation 2.2: to obtain the peak voltage, the system must wait until the new peak is reached.
Figure 5.10: Case study of interruption
Oscillatory Transient
In the second case study, the oscillatory transient was generated by a short disconnection of an RL load. Figure 5.11 shows the waveform as well as the output from the RNN and the FNN. As the oscillation fades away, the FNN loses confidence in the oscillatory transient, whereas the RNN classifier remains confident that it is an oscillatory transient until the end.
Figure 5.11: Case study of oscillatory transient
Voltage Sag
Voltage sag was generated by a three-phase fault on a 230 kV line connected to a synchronous
machine. The fault occurs at 0.3 s and is cleared at 0.5 s. The system goes through a voltage
sag followed by a voltage swell after the fault is cleared. The classification result identifies
the voltage sag as well as the subsequent voltage swell.
Figure 5.12: Case study of voltage sag
Harmonics
Harmonics were generated by a three-phase AC system connected to a rectifier with a
constant DC load. Although the proposed method performed well in the previous studies, this
case study shows a limitation of the proposed power quality monitoring system. The harmonics
occur between 0.4 and 0.6 s with multiple harmonic orders combined. The output of the classifier
reports the state as normal. This indicates over-fitting towards the generated data, and there
may be cases where the classifier fails to generalize to real power quality events.
Figure 5.13: Case study of harmonics
4 Limitations of the Proposed Power Quality Monitor
As we saw in the case study with harmonics, the proposed training approach of generating
data from mathematical equations has limitations in representing real power quality
disturbances. Although we propose adding Gaussian noise to the normal waveform as a
generalization approach, there will be cases where it fails to generalize correctly to realistic
data. The limitations we observed throughout this thesis are the following:
1. The data generation process described in Chapter 2 is prone to over-fitting towards the
synthesized data.
2. The performance of the classifier is significantly reduced by increased noise for features
extracted with the discrete wavelet transform.
3. The short-time Fourier transform and S transform require an efficient computation method
and a way to reduce their redundant representations.
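For reference, the discrete-wavelet-transform energy features referred to in point 2 can be sketched with a hand-rolled Haar decomposition (the thesis's actual mother wavelet, depth, and feature definition may differ):

```python
import numpy as np

def haar_dwt_energies(v, levels):
    """Per-level detail-coefficient energies from a Haar DWT (illustrative)."""
    a = np.asarray(v, dtype=float)
    energies = []
    for _ in range(levels):
        a = a[: len(a) // 2 * 2]               # trim to an even length
        d = (a[0::2] - a[1::2]) / np.sqrt(2)   # detail coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2)   # approximation for next level
        energies.append(np.sum(d ** 2))
    return np.array(energies)

fs = 3840
t = np.arange(fs) / fs
v = np.sin(2 * np.pi * 60 * t)
feat = haar_dwt_energies(v, 5)   # one energy value per decomposition level
```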
The first limitation can be resolved with more data from real examples that could be used for
training the neural network, although this may require significant manual work to label the
data. While the S transform shows more promising results against noise [24], efficient
computation and identifying the relevant frequency samples remain challenging. For this, we may
want to use unsupervised learning methods such as principal component analysis.
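As a sketch of the last suggestion, principal component analysis can compress redundant time-frequency dimensions before classification (the feature sizes below are made up for illustration):

```python
import numpy as np

def pca_reduce(features, k):
    """Project feature vectors onto their top-k principal components,
    discarding the redundant directions of the time-frequency representation."""
    centered = features - features.mean(axis=0)
    # rows of vt are the principal directions, sorted by singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

# e.g. 1000 time steps of 129 spectral magnitude bins -> 16 components
X = np.abs(np.random.default_rng(1).normal(size=(1000, 129)))
Z = pca_reduce(X, 16)
```

The principal directions would be fit on training data only and then applied to the monitoring stream.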
5 Summary
In this chapter, we combined the data generation, feature extraction, and neural network
classifier from the previous chapters to build a working automatic classifier of power quality
disturbances. We tested the system with different transformations and variables. In the case
studies, we found some limitations of the technique and provided future research directions.
Chapter 6
Conclusion
In this thesis, we proposed a recurrent neural network for power quality disturbance
classification. We compared the short-time Fourier transform, wavelet transform, and S transform
for feature extraction. Our technique modifies existing approaches by freeing the process from a
segmentation algorithm. With this approach, the class is labeled at each time step, allowing
finer localization of the event. We proposed classification with Long Short-Term Memory (LSTM)
to improve accuracy by employing hidden units to pass information through time within the
classifier.
We examined the performance of the LSTM in various settings by adjusting variables such
as sampling frequency, output frequency, and parameter settings for the feature extraction
algorithms. At present, the discrete wavelet transform is the most favourable candidate due
to its efficiency of implementation. We tested the system under noise and presented case
studies with data generated by full electrical simulation using SimPowerSystems. We also showed
some of the limitations and discussed potential solutions for them.
In addition to improving the power quality monitor itself, a future direction of this research
is identifying appropriate placements for these monitors. Some works, such as [50], attempt to
solve this problem, but further investigation is needed to identify buses that maximize the
benefit of the monitoring system while reducing the cost of deployment.
In addition, the idea of automatic operation of the power grid and the benefit of having
the automatic monitoring system can be investigated further. Figure 6.1 shows the causal
graph of disturbances, controls, and phenomena. This thesis has focused on inferring the
phenomena from the voltage and current waveforms. With further investigation, the source of
a disturbance can be characterized and added to the automatic classification scheme. With the
source of a disturbance identified, the control action for mitigating the issue can be automated.
Figure 6.1: Future work
The current shifts in power system structure and operation place great demands on operators.
With recent advancements and achievements in artificial intelligence, it may become a viable
tool for operating our future grid reliably and efficiently.
Appendices
Appendix A
Parameters for Monte Carlo
Simulation
In this section, we present the parameters used for the Monte Carlo simulation of power
quality disturbance generation. As presented earlier, the equation for the data generation can
be generalized to Equation 2.17,

v(t) = α sin(2π(60 + ∆f)t) + Σ_i β_i exp(−c (t − t_start)/(t_end − t_start)) cos(2πf_h(t − t_start)) + µ(t) + γ(t)   (A.1)

for t ∈ [t_start, t_end], µ(t) ∼ N(0, σ²), and c = −log(ε/β) where ε = 0.01. The parameters α,
∆f, β, f_h, σ, γ, and the duration (t_end − t_start) are sampled from uniform distributions,
whose ranges are given in Table A.1. Note that the notch was periodically repeated 3 to 6 times
in a cycle. The data was generated continuously, with the normal waveform inserted between
disturbances; it was implemented this way because the probability of one disturbance happening
right after another is very small. The length of the waveform was N = 10^7 samples, which gives
about 43 minutes of waveform data at a sampling frequency of 3840 Hz. Table A.2 shows the ratio
of the classes in the generated data. The duration ratio is the fraction of the total duration
occupied by each disturbance, and the occurrence ratio is the fraction of the number of
occurrences of each disturbance.
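The generation of a single disturbance can be sketched as follows for the oscillatory transient row of Table A.1 (a simplified, single-term reading of Equation A.1; the event placement and helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
fs, f0, eps = 3840, 60, 0.01

def oscillatory_transient(dur=None, t_start=0.06, beta=None, fh=None):
    """Draw one oscillatory transient per Eq. (A.1) and Table A.1 row (vi)."""
    dur = rng.uniform(0.0003, 0.05) if dur is None else dur
    beta = rng.choice([-1, 1]) * rng.uniform(0.1, 0.8) if beta is None else beta
    fh = rng.uniform(500, 5000) if fh is None else fh
    t_end = t_start + dur
    c = -np.log(eps / abs(beta))       # transient decays towards eps by t_end
    t = np.arange(int(0.2 * fs)) / fs  # 0.2 s of waveform
    v = np.sin(2 * np.pi * f0 * t)     # alpha = 1, delta_f = 0 for this class
    ev = (t >= t_start) & (t <= t_end)
    tau = (t[ev] - t_start) / (t_end - t_start)
    v[ev] += beta * np.exp(-c * tau) * np.cos(2 * np.pi * fh * (t[ev] - t_start))
    return t, v
```

The other disturbance classes follow the same pattern with their own rows of Table A.1 and their own active terms of Equation A.1.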
Table A.1: Parameters for Monte Carlo simulation in power quality data generation

Class | α (min, max) | ∆f (min, max) | β (min, max) | f_h (min, max) | σ (min, max) | γ (min, max) | duration (min, max)
(i) normal | 1, 1 | 0, 0 | 0, 0 | 0, 0 | 0, 0.01 | 0, 0 | 0.05, 0.05
(ii) interruption | 0, 0.1 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0.1, 0.2
(iii) sag | 0.1, 0.9 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0.1, 0.2
(iv) swell | 1.1, 1.8 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0.1, 0.2
(v) impulsive | 1, 1 | 0, 0 | ±0.1, ±0.8 | 0, 0 | 0, 0 | 0, 0 | 1/f_s, 0.01
(vi) oscillatory | 1, 1 | 0, 0 | ±0.1, ±0.8 | 500, 5000 | 0, 0 | 0, 0 | 0.0003, 0.05
(vii) dc offset | 1, 1 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | ±0.001, ±0.01 | 0.1, 0.2
(viii) harmonics | 1, 1 | 0, 0 | 0.1, 0.2 | 180, 900 | 0, 0 | 0, 0 | 0.1, 0.2
(ix) notch | 1, 1 | 0, 0 | 0.25, 0.5 | 0, 0 | 0, 0 | 0, 0 | 0.1, 0.2
(x) flicker | 1, 1 | 0, 0 | 0.05, 0.1 | 10, 25 | 0, 0 | 0, 0 | 0.1, 0.2
(xi) noise | 1, 1 | 0, 0 | 0, 0 | 0, 0 | 0.05, 0.1 | 0, 0 | 0.1, 0.2
(xii) frequency variation | 1, 1 | ±1, ±5 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0.1, 0.2
(xiii) sag and harmonics | 0.1, 0.9 | 0, 0 | 0.1, 0.2 | 180, 900 | 0, 0 | 0, 0 | 0.1, 0.2
(xiv) swell and harmonics | 1.1, 1.8 | 0, 0 | 0.1, 0.2 | 180, 900 | 0, 0 | 0, 0 | 0.1, 0.2
Table A.2: Ratio of disturbance classes in the generated data
Class duration ratio (%) occurrence ratio (%)
(i) normal 44.99 50.00
(ii) interruption 3.97 1.49
(iii) sag 4.31 1.61
(iv) swell 4.65 1.72
(v) impulsive 3.24 24.34
(vi) oscillatory 4.31 8.01
(vii) dc offset 4.49 1.67
(viii) harmonics 4.34 1.61
(ix) notch 4.12 1.56
(x) flicker 4.23 1.56
(xi) noise 4.24 1.57
(xii) frequency variation 4.39 1.63
(xiii) sag and harmonics 4.39 1.62
(xiv) swell and harmonics 4.32 1.61
Appendix B
Results
In this appendix, we present the confusion matrices. The Roman numerals indicate the power
quality disturbances as follows: (i) normal, (ii) interruption, (iii) sag, (iv) swell,
(v) impulsive, (vi) oscillatory, (vii) dc offset, (viii) harmonics, (ix) notch, (x) flicker,
(xi) noise, (xii) frequency variation, (xiii) sag and harmonics, (xiv) swell and harmonics.
The entries are percentages indicating what percentage of the disturbance in each row was
classified as the disturbance in the column.
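For reference, this row-percentage convention can be reproduced from raw per-time-step predictions with a small helper (hypothetical, assuming NumPy):

```python
import numpy as np

def confusion_percent(true_labels, pred_labels, n_classes):
    """Row-normalized confusion matrix: entry [r, c] is the percentage of
    time steps of class r that the classifier assigned to class c."""
    cm = np.zeros((n_classes, n_classes))
    for r, c in zip(true_labels, pred_labels):
        cm[r, c] += 1
    rows = cm.sum(axis=1, keepdims=True)
    return 100.0 * cm / np.where(rows == 0, 1, rows)  # guard empty rows
```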
Detailed results for Table 5.1
Table B.1: Confusion matrix for FNN from DWT features
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)
(i) 91.88 0.09 0.5 0.18 1.34 0.41 1.01 0.22 0.85 0.14 0.33 2.81 0.04 0.18
(ii) 5.58 89 2.63 0 1.83 0.05 0 0.01 0.58 0 0.03 0.03 0.26 0
(iii) 5.3 1.29 91.52 0 0.36 0.06 0 0.01 1.19 0 0.01 0.2 0.05 0
(iv) 6.06 0.08 0.01 90.2 0.64 0.02 0 0 1.12 1.57 0 0.03 0 0.26
(v) 7.21 0.01 0 0 82.29 0.27 0 0 10.17 0 0 0 0 0.04
(vi) 14.34 0.02 0.36 0.36 1.01 76.59 0.7 1.54 1.88 0.48 2.04 0.15 0.08 0.46
(vii) 6.4 0 0 0 0 0 93.54 0 0 0 0 0.06 0 0
(viii) 4.87 0 0.06 0 0 0.01 0.01 90.67 0 0.01 0 0 4.29 0.07
(ix) 15.62 0.03 0 0.04 1.99 0 0.02 0 82.07 0 0.17 0.06 0 0
(x) 14.19 0.02 0.01 8.04 1.94 0.11 0 0.03 0.45 74.49 0 0.03 0 0.67
(xi) 1.11 0.02 0.01 0 0 1.8 0 0 3.02 0 94.04 0 0 0
(xii) 29.4 0.14 0.12 0 1.6 0.2 0.01 0.35 0.37 0 0.04 67.71 0.07 0
(xiii) 4.96 0.18 0.2 0 0.58 0.02 0 7.25 0.67 0 0.01 0.03 86.09 0
(xiv) 4.65 0.05 0.03 0 0.83 0.12 0 7.47 0.7 0.04 0 0 0 86.1
Table B.2: Confusion matrix for FNN from STFT features
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)
(i) 94.44 0.1 0.07 0.04 0.24 1.22 0.2 0.05 0.94 0.36 1.38 0.81 0.13 0.03
(ii) 10.28 88.23 1.08 0 0 0.03 0 0 0.02 0 0.22 0.1 0.03 0
(iii) 9.07 1.66 88.31 0 0.1 0.08 0.03 0 0 0 0.13 0.55 0.06 0
(iv) 9.85 0 0 87.78 0.24 0.4 0 0.01 0.06 1.21 0.02 0.43 0 0
(v) 43.41 0 0 0 20.66 1.59 3.37 0 0.27 2.57 0 28.12 0 0
(vi) 10.01 0 0.3 0.16 5.73 74.14 4.03 0.1 0 0.8 0 4.56 0.13 0.04
(vii) 1.28 0 0.01 0 1.5 1.78 90.78 0 0 0 0 4.64 0 0
(viii) 9.1 0 0 0 0.06 0.86 0 82.08 0 0.07 0 0.42 7.41 0
(ix) 17.59 0 0 0 0.47 0.68 0.05 0 75.79 1.1 2.6 1.72 0 0
(x) 10.12 0 0 0.61 0.9 0.84 0 0 0.76 86.61 0.04 0.1 0.02 0
(xi) 4.65 0 0 0 0.06 0.45 0 0 4.27 0.02 89.98 0.56 0 0
(xii) 7.29 0 0 0 1.84 0.04 0.83 0 0.66 0 6.75 82.58 0.02 0
(xiii) 8.97 0 0 0 0.02 0.14 0 3.68 0.03 0 0.02 0.25 86.89 0
(xiv) 10.12 0 0 0.06 0.16 0.29 0 6.61 0.07 0.32 0.1 0.45 0.03 81.8
Table B.3: Confusion matrix for LSTM from STFT features
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)
(i) 95.15 0.12 0.07 0.08 0.4 0.68 0.24 0.4 0.9 0.23 0.97 0.43 0.19 0.14
(ii) 10.91 88.86 0.01 0 0.01 0 0 0 0.01 0 0.01 0.15 0.03 0
(iii) 9.87 0.04 89.54 0 0.18 0.08 0 0 0.06 0 0 0.13 0.1 0
(iv) 10.37 0 0 85.97 0.29 0.36 0 0 0.11 2.78 0 0.08 0.03 0
(v) 23.14 0 0 0.02 62.87 1.99 1.09 0 0.74 1.3 0 8.85 0 0
(vi) 8.59 0 0.05 0.04 6.67 79.65 1.63 0.47 0.03 1.12 0.03 1.71 0.01 0
(vii) 3.91 0 0 0 2.37 1.38 90.94 0.03 0 0 0 1.37 0 0
(viii) 7.67 0 0.02 0.07 0.08 0.27 0 81.37 0.02 0 0.01 0.12 10.39 0
(ix) 13.93 0 0 0 1.77 0.36 0.03 0 83.43 0.33 0.06 0.08 0 0
(x) 9.93 0 0 0.21 0.27 0.08 0 0 0.17 89.04 0.02 0.19 0.04 0.04
(xi) 4.63 0 0 0 0.07 0.4 0.01 0 0.84 0 93.53 0.5 0 0.02
(xii) 3.28 0.01 0 0.01 0.22 0.33 0.09 0 2.84 0.01 1.88 91.33 0.01 0
(xiii) 8.09 0.02 0 0 0.04 0.11 0 9.2 0.04 0 0.02 0.09 82.39 0
(xiv) 9.43 0 0 0.12 0.14 0.12 0 6.8 0.17 0.06 0.03 0.19 0.04 82.91
Table B.4: Confusion matrix for FNN from ST features
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)
(i) 91.02 0.09 0.31 1.6 0.7 0.08 1.55 1.94 1.29 0.39 0.43 0 0.2 0.39
(ii) 8.16 88.24 1.09 0 0.02 0 0.09 0.47 0.3 0 0.07 0 1.56 0
(iii) 9.69 0.59 87.04 0 0 0 0.39 0.14 0.59 0 0.48 0 1.08 0
(iv) 11.15 0 0 84.68 1.17 0 0 0.04 0.92 0.02 0.4 0 0.03 1.59
(v) 70.84 0 0 0 17.63 0.03 0 8.11 0 0.43 0 0 0.34 2.62
(vi) 18.39 0.1 0.62 0.95 1.28 74.49 0.73 0.7 1.91 0 0.81 0 0.01 0.01
(vii) 16.84 0 0 0 0 0 83.16 0 0 0 0 0 0 0
(viii) 6.16 0 0 0 0 0 0 89.54 0.58 0 0.02 0 3.7 0
(ix) 29.67 0 0 0 0 0 0 0 59.49 0 10.84 0 0 0
(x) 11.29 0 0 17 1.26 0 0 0.36 0.42 68.58 0.04 0 0 1.05
(xi) 0.93 0.19 0 0 0 0.52 0 0 9.89 0 88.48 0 0 0
(xii) 96.94 0 0.3 0 0.16 0.01 0.61 0.87 0.93 0 0.08 0 0.11 0
(xiii) 5.93 0.29 0.17 0 0 0 0 13.92 0.77 0 0.09 0 78.83 0
(xiv) 4.36 0 0 0.01 1.4 0 0 7.69 0.48 0.04 0.04 0 0.01 85.95
Table B.5: Confusion matrix for LSTM from ST features
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)
(i) 92.33 0.07 0.36 1.55 2 0.19 1.42 0.27 0.69 0.74 0.15 0 0.09 0.14
(ii) 8.21 89.33 2.02 0 0.27 0.01 0.06 0 0.02 0 0 0 0.07 0
(iii) 10.85 0.48 87.6 0 0.08 0 0.36 0 0.15 0 0.03 0 0.45 0
(iv) 12.8 0 0 81.28 0.6 0 0 0 0.52 4.45 0.05 0 0.03 0.26
(v) 44.12 0 0 0 50.9 1.41 0 1.35 0.02 0.58 0 0 0.63 1
(vi) 11.35 0.01 0.35 0.75 0.49 85.55 0.44 0.75 0.04 0.05 0.11 0 0.08 0.03
(vii) 15.62 0 0 0 0 0 84.38 0 0 0 0 0 0 0
(viii) 7.25 0 0 0 0.34 1.09 0 80.65 0.02 0 0.02 0 10.64 0
(ix) 14.77 0 0 0.11 0 0 0 0 85.12 0 0 0 0 0
(x) 10.14 0 0 17.65 1.15 0.25 0 0.15 0.08 70.26 0 0 0 0.31
(xi) 0.73 0 0.02 0.05 0 0.3 0.03 0.04 0.69 0.03 98.09 0 0.02 0
(xii) 97.29 0 0.11 0 0.93 0.02 0.48 0.11 0.68 0 0.07 0 0.31 0
(xiii) 4.77 0.02 0.72 0 0.57 0.04 0 7.49 0.18 0 0 0 86.21 0
(xiv) 5.5 0 0 0.59 1.36 0.2 0 7.25 0.23 1.05 0 0 0.12 83.7
Detailed results for Table 5.3
SNR: 37.1 dB
Table B.6: Confusion matrix for LSTM from DWT features with noise
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)
(i) 93.31 0.11 0.14 0.2 1.32 0.42 0.91 0.41 0.38 0.48 0.3 1.74 0.02 0.2
(ii) 5.58 89.28 3.15 0 1.37 0.02 0 0.11 0.3 0.02 0.03 0.14 0 0
(iii) 5.81 1.76 90.5 0 0.19 0.06 0 0.01 0.72 0.33 0.04 0.59 0 0
(iv) 7.34 0.03 0.04 91.26 0.16 0.07 0 0 0.6 0.19 0 0.06 0.01 0.23
(v) 6.99 0 0.04 0 88.25 0.08 0 0.08 4.23 0 0.2 0.01 0.01 0.11
(vi) 14.05 0.08 0.27 0.42 0.17 79.75 0.22 1.41 0.6 0.64 1.52 0.33 0.09 0.44
(vii) 78.96 0 0 0 0 0.01 19.96 0 0 1.07 0 0 0 0
(viii) 6.53 0 0 0.01 0 0.23 0 92.82 0 0 0 0.02 0.01 0.38
(ix) 15 0 0.13 0.29 1.16 0.04 0.19 0 81.75 0.02 1.35 0.07 0 0
(x) 54.74 0 0.2 0.71 0 0.04 4.53 0 0 39.49 0 0.29 0 0
(xi) 1.4 0 0.01 0 0.09 0.97 0 0 2.18 0 95.29 0.06 0 0
(xii) 43.32 0.04 0.07 0 1.13 0.07 0.05 0.37 0.31 0.2 0.05 54.35 0 0.02
(xiii) 5.27 0 0.01 0.02 0.21 0.09 0 5.52 0.34 0 0 0.01 3.43 85.11
(xiv) 4.94 0 0.02 0.03 0.11 0.02 0 7.71 0.39 0 0 0.01 2.67 84.1
Table B.7: Confusion matrix for FNN from DWT features with noise
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)
(i) 91.41 0.12 0.11 0.44 1.53 0.42 0.71 0.7 0.62 0.46 0.43 2.72 0 0.33
(ii) 4.63 90.15 1.45 0 1.8 0.01 0 0.14 0.76 0.05 0.04 0.96 0 0
(iii) 5.49 4.58 87.24 0 0.52 0.05 0 0.08 0.7 0.16 0.01 1.18 0 0
(iv) 6.62 0 0 91.08 0.5 0.04 0 0 0.91 0.3 0 0.28 0 0.26
(v) 8.99 0.21 0 0 82.86 0.13 0 0.09 7.64 0 0.02 0.01 0 0.06
(vi) 17.23 0 0.16 0.44 0.53 74.24 0.28 2 0.64 1.23 2.28 0.46 0 0.52
(vii) 79.03 0 0 0 0 0 19.68 0 0 1.28 0 0.02 0 0
(viii) 5.81 0 0 0.02 0.03 0 0 93.32 0 0.04 0 0.25 0 0.53
(ix) 14.43 0 0 0.07 1.85 0.11 0 0 82.31 0.01 0.89 0.32 0 0.01
(x) 63.55 0 2.44 0.26 0 0 4.85 0 0 28.46 0 0.44 0 0
(xi) 1.13 0 0 0 0 2.09 0 0 4.18 0 92.6 0 0 0
(xii) 40.78 0.01 0.02 0 1.84 0.13 0.06 0.49 0.44 0.09 0.05 56.08 0 0
(xiii) 4.26 0 0 0.18 0.87 0.02 0 6.89 0.55 0 0.01 0.05 0 87.17
(xiv) 4.6 0 0 0.05 1.13 0.02 0 3.2 0.74 0 0 0.05 0 90.19
Bibliography
[1] R. C. Dugan, M. F. McGranaghan, and H. W. Beaty, Electrical Power Systems Quality. New
York, NY: McGraw-Hill, vol. 1, 1996.
[2] “IEEE recommended practice for monitoring electric power quality,” IEEE Std 1159-2009
(Revision of IEEE Std 1159-1995), pp. c1–81, June 2009.
[3] O. P. Mahela, A. G. Shaik, and N. Gupta, “A critical review of detection and classification
of power quality events,” Renewable and Sustainable Energy Reviews, vol. 41, pp. 495–505,
2015.
[4] S. Ventosa, C. Simon, M. Schimmel, J. J. Danobeitia, and A. Manuel, “The S -transform
from a wavelet point of view,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp.
2771–2780, July 2008.
[5] F. Katiraei, R. Iravani, N. Hatziargyriou, and A. Dimeas, “Microgrids management,” IEEE
Power and Energy Magazine, vol. 6, no. 3, pp. 54–65, May 2008.
[6] C. Gonzalez, J. Geuns, S. Weckx, T. Wijnhoven, P. Vingerhoets, T. De Rybel, and
J. Driesen, “LV distribution network feeders in belgium and power quality issues due to
increasing PV penetration levels,” in Innovative Smart Grid Technologies (ISGT Europe),
2012 3rd IEEE PES International Conference and Exhibition on. IEEE, 2012, pp. 1–8.
[7] B. Rahmani, W. Li, and G. Liu, “An advanced universal power quality conditioning system
and MPPT method for grid integration of photovoltaic systems,” International Journal of
Electrical Power & Energy Systems, vol. 69, pp. 76 – 84, 2015.
[8] C. G. Hou, S. C. Lin, S. T. Su, and W. H. Chung, “Fault tolerant quickest detection
for power quality events in smart grid ami networks,” in 2015 International Symposium
on Intelligent Signal Processing and Communication Systems (ISPACS), Nov 2015, pp.
159–163.
[9] IEEE, “Electric signatures of power equipment failures,” IEEE, Tech. Rep.
[10] M. Sumner, A. Abusorrah, D. Thomas, and P. Zanchetta, “Real time parameter estima-
tion for power quality control and intelligent protection of grid-connected power electronic
converters,” Smart Grid, IEEE Transactions on, vol. 5, no. 4, pp. 1602–1607, 2014.
[11] R. P. Bingham, “Recent advancements in monitoring the quality of the supply,” in Power
Engineering Society Summer Meeting, 2001, vol. 2, July 2001, pp. 1106–1109 vol.2.
[12] “IEEE recommended practice and requirements for harmonic control in electric power sys-
tems,” IEEE Std 519-2014 (Revision of IEEE Std 519-1992), pp. 1–29, June 2014.
[13] M. H. Bollen, Understanding power quality problems. IEEE press New York, 2000, vol. 3.
[14] R. Kumar, B. Singh, D. T. Shahani, A. Chandra, and K. Al-Haddad, “Recognition of
power-quality disturbances using S-transform-based ANN classifier and rule-based decision
tree,” IEEE Transactions on Industry Applications, vol. 51, no. 2, pp. 1249–1258, March
2015.
[15] S. Alshahrani, M. Abbod, and B. Alamri, “Detection and classification of power quality
events based on wavelet transform and artificial neural networks for smart grids,” in 2015
Saudi Arabia Smart Grid (SASG), 2015, pp. 1–6.
[16] A. K. Ghosh and D. L. Lubkeman, “The classification of power system disturbance wave-
forms using a neural network approach,” Power Delivery, IEEE Transactions on, vol. 10,
no. 1, pp. 109–115, 1995.
[17] S. Santoso, E. J. Powers, W. M. Grady, and P. Hofmann, “Power quality assessment via
wavelet transform analysis,” Power Delivery, IEEE Transactions on, vol. 11, no. 2, pp.
924–930, 1996.
[18] B. Perunicic, M. Mallini, Z. Wang, and Y. Liu, “Power quality disturbance detection and
classification using wavelets and artificial neural networks,” in Harmonics and Quality of
Power Proceedings, 1998. Proceedings. 8th International Conference On, vol. 1, Oct 1998,
pp. 77–82 vol.1.
[19] A. Gaouda, M. Salama, M. Sultan, A. Chikhani et al., “Power quality detection and
classification using wavelet-multiresolution signal decomposition,” IEEE Transactions on
Power Delivery, vol. 14, no. 4, pp. 1469–1476, 1999.
[20] S. Santoso, E. J. Powers, W. M. Grady, and A. C. Parsons, “Power quality disturbance
waveform recognition using wavelet-based neural classifier. I. Theoretical foundation,”
Power Delivery, IEEE Transactions on, vol. 15, no. 1, pp. 222–228, 2000.
[21] D. Borras, M. Castilla, N. Moreno, and J. C. Montano, “Wavelet and neural structure:
a new tool for diagnostic of power system disturbances,” IEEE Transactions on Industry
Applications, vol. 37, no. 1, pp. 184–190, Jan 2001.
[22] Z.-L. Gaing, “Wavelet-based neural network for power disturbance recognition and classi-
fication,” Power Delivery, IEEE Transactions on, vol. 19, no. 4, pp. 1560–1568, 2004.
[23] H. He and J. A. Starzyk, “A self-organizing learning array system for power quality clas-
sification based on wavelet transform,” Power Delivery, IEEE Transactions on, vol. 21,
no. 1, pp. 286–295, 2006.
[24] I. W. Lee and P. K. Dash, “S-transform-based intelligent system for classification of power
quality disturbance signals,” Industrial Electronics, IEEE Transactions on, vol. 50, no. 4,
pp. 800–805, 2003.
[25] S. Mishra, C. Bhende, and B. Panigrahi, “Detection and classification of power quality
disturbances using S-transform and probabilistic neural network,” Power Delivery, IEEE
Transactions on, vol. 23, no. 1, pp. 280–287, 2008.
[26] C. Bhende, S. Mishra, and B. Panigrahi, “Detection and classification of power quality
disturbances using S-transform and modular neural network,” Electric Power Systems
Research, vol. 78, no. 1, pp. 122 – 128, 2008.
[27] M. Uyar, S. Yildirim, and M. T. Gencoglu, “An expert system based on S-transform and
neural network for automatic classification of power quality disturbances,” Expert Systems
with Applications, vol. 36, no. 3, Part 2, pp. 5962 – 5975, 2009.
[28] P. K. Ray, N. Kishor, and S. R. Mohanty, “Islanding and power quality disturbance detec-
tion in grid-connected hybrid power system using wavelet and S-transform,” Smart Grid,
IEEE Transactions on, vol. 3, no. 3, pp. 1082–1094, 2012.
[29] B. Biswal, M. Biswal, S. Mishra, and R. Jalaja, “Automatic classification of power quality
events using balanced neural tree,” Industrial Electronics, IEEE Transactions on, vol. 61,
no. 1, pp. 521–530, 2014.
[30] M. Valtierra-Rodriguez, R. de Jesus Romero-Troncoso, R. A. Osornio-Rios, and A. Garcia-
Perez, “Detection and classification of single and combined power quality disturbances
using neural networks,” Industrial Electronics, IEEE Transactions on, vol. 61, no. 5, pp.
2473–2482, 2014.
[31] M. K. Saini and R. Kapoor, “Classification of power quality events–a review,” International
Journal of Electrical Power & Energy Systems, vol. 43, no. 1, pp. 11–19, 2012.
[32] P. Sebastian and P. A. Da, “A neural network based power quality signal classification
system using wavelet energy distribution,” in Advancements in Power and Energy (TAP
Energy), 2015 International Conference on, June 2015, pp. 199–204.
[33] S. Upadhyaya and S. Mohanty, “Localization and classification of power quality distur-
bances using maximal overlap discrete wavelet transform and data mining based classi-
fiers,” IFAC-PapersOnLine, vol. 49, no. 1, pp. 437–442, 2016, 4th IFAC Conference on
Advances in Control and Optimization of Dynamical Systems (ACODS 2016), Tiruchirappalli,
India, 1–5 February 2016.
[34] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp.
436–444, 2015.
[35] I. Goodfellow, Y. Bengio, and A. Courville, “Deep learning,” 2016, book in preparation for
MIT Press. [Online]. Available: http://goodfeli.github.io/dlbook/
[36] C. S. Burrus, R. A. Gopinath, and H. Guo, Introduction to wavelets and wavelet transforms
: a primer. Upper Saddle River, N.J. : Prentice Hall, 1998.
[37] R. G. Stockwell, L. Mansinha, and R. P. Lowe, “Localization of the complex spectrum:
the S transform,” IEEE Transactions on Signal Processing, vol. 44, no. 4, pp. 998–1001,
Apr 1996.
[38] R. Stockwell, “A basis for efficient representation of the S-transform,” Digital Signal Pro-
cessing, vol. 17, no. 1, pp. 371 – 393, 2007.
[39] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, “A novel
connectionist system for unconstrained handwriting recognition,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 855–868, May 2009.
[40] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural
networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International
Conference on. IEEE, 2013, pp. 6645–6649.
[41] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning
and stochastic optimization,” Journal of Machine Learning Research, vol. 12, no. Jul, pp.
2121–2159, 2011.
[42] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running
average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning,
vol. 4, no. 2, 2012.
[43] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint
arXiv:1412.6980, 2014.
[44] I. Sutskever, “Training recurrent neural networks,” Ph.D. dissertation, University of
Toronto, 2013.
[45] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings
of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
[46] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural
networks,” arXiv preprint arXiv:1211.5063, 2012.
[47] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9,
no. 8, pp. 1735–1780, 1997.
[48] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural
networks,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani,
M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates,
Inc., 2014, pp. 3104–3112. [Online]. Available: http://papers.nips.cc/paper/5346-
sequence-to-sequence-learning-with-neural-networks.pdf
[49] M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous
systems,” 2015, software available from tensorflow.org. [Online]. Available:
http://tensorflow.org/
[50] S. Ali, K. Weston, D. Marinakis, and K. Wu, “Intelligent meter placement for power quality
estimation in smart grid,” in Smart Grid Communications (SmartGridComm), 2013 IEEE
International Conference on. IEEE, 2013, pp. 546–551.