X-OUTLIER DETECTION AND PERIODICITY
DETECTION IN LOAD CURVE DATA IN
POWER SYSTEMS
by
Zhihui Guo
B.Sc. Sun Yat-sen University, China, 2009
a Thesis submitted in partial fulfillment
of the requirements for the degree of
Master of Science
in the School
of
Computing Science
© Zhihui Guo 2011
SIMON FRASER UNIVERSITY
Summer 2011
All rights reserved. However, in accordance with the Copyright Act of
Canada, this work may be reproduced without authorization under the
conditions for Fair Dealing. Therefore, limited reproduction of this
work for the purposes of private study, research, criticism, review and
news reporting is likely to be in accordance with the law, particularly
if cited appropriately.
APPROVAL
Name: Zhihui Guo
Degree: Master of Science
Title of Thesis: X-Outlier Detection and Periodicity Detection in Load Curve
Data in Power Systems
Examining Committee: Dr. Jian Pei
Chair
Dr. Ke Wang
Senior Supervisor
Dr. Martin Ester
Supervisor
Dr. Fred Popowich
SFU Examiner
Date Approved:
Partial Copyright Licence
Abstract
Load curve data is a type of time series data that records electric energy consumption at successive time points and plays an important role in the operation and planning of power systems. Unfortunately, load curves often contain abnormal, noisy, unrepresentative and missing data due to various random factors. It is crucial for power systems to identify and repair corrupted and unrepresentative data before load curve data can be used for planning and modeling. In this thesis we present a new class of X-outliers, which have abnormal power consumption levels relative to a periodicity (the X-axis), and propose a novel solution to detect these outliers. The underlying assumption is that the data set follows a periodicity and that the length (though not the pattern) of the periodicity is known. This is the case for most real load curve data collected at BC Hydro.
In the above, the periodicity is assumed to be known for X-outlier detection. In some other applications, however, the periodicity needs to be discovered. The latter is the case when the periodicity evolves, when a new time series is collected, or when conditions that affect the time series have changed. Periodicity detection for time series has important applications in forecasting, planning, trend detection, and outlier detection. For time series with unknown periodicity, X-outlier detection can still be performed once the periodicity is detected. Thus X-outlier detection and periodicity detection are highly related, and periodicity detection can be considered a pre-processing step of X-outlier detection for time series with unknown periodicity. Therefore, in this thesis, we also propose a trend based periodicity detection algorithm for time series data with unknown periodicity. This approach is trend preserving and noise resilient. Real load curve data from the BC Hydro system is used to demonstrate the effectiveness and accuracy of the proposed methods.
Keywords: Time Series, Load Management, Power Systems, Power Quality, Smoothing
Methods, Periodicity Detection.
Acknowledgments
My special thanks go to my senior supervisor, Dr. Ke Wang. I benefited greatly from his insights and from every discussion with him. This work would not have been possible without his invaluable guidance and great patience. I am grateful for the inspiring discussions with him that led to this thesis. I would like to thank my supervisor, Dr. Martin Ester, and examiner, Dr. Fred Popowich, for their precious time and useful comments on my thesis. I would also like to thank Dr. Jian Pei for taking the time to chair my thesis defense.
I am grateful to BC Hydro’s Principal Engineer Dr. Wenyuan Li, Manager Dr. Tito Inga-Rojas and Senior Engineer Dr. Adriel Lau for every discussion about my research. I thank them for giving me the opportunity to work at BC Hydro for over a year. I am thankful for the precious time they spent training me to be a better presenter and executor. I learned how to carry out a practical project from their on-site supervision, feedback, testing and evaluation of our collaborative project. I would also like to thank BC Hydro for access to their precious data sets.
I would like to thank SFU for providing excellent facilities and a comfortable environment, and to thank NSERC and BC Hydro for the funding that supported my study at SFU.
I would like to take this opportunity to thank my friends Wen Huang, Bo Hu, Hua
Huang, Jiyi Chen, Chao Han, Peng Wang, Judy Yeh and Zhensong Qian for their care and
help.
Finally, I would like to express my deepest gratitude to my family for their continuous
love, support and encouragement.
Contents
Approval ii
Abstract iii
Acknowledgments iv
Contents v
List of Tables viii
List of Figures ix
1 Introduction 1
1.1 Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 X-outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Periodicity Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 For Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.2 For Periodicity Detection . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Background 12
2.1 Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Overview of Existing Techniques . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 Outlier Detection in a Time Series Database . . . . . . . . . . . . . . 14
2.1.3 Outlier Detection in a Single Time Series . . . . . . . . . . . . . . . . 14
2.2 Periodicity Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3 Trend Modelling 16
3.1 Nonparametric Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Kernel Smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Smoothing Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4 X-Outlier Detection 19
4.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2.1 Observations for X-Outlier Detection . . . . . . . . . . . . . . . . . . . 21
4.2.2 Approximating the Smoothing Curve by Peaks and Valleys . . . . . . 22
4.2.3 Identifying Outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2.4 Repairing Outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Practical Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.4.1 Data Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.4.2 Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.4.3 Parameter Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4.4 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5 Trend Based Periodicity Detection 39
5.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.1.1 Periodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.1.2 Dynamic Time Warping . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1.3 The WARP Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 The Trend Based Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2.1 Observations for Periodicity Detection . . . . . . . . . . . . . . . . . . 43
5.2.2 Identifying Periodicities Using The Shape Sequence . . . . . . . . . . 44
5.2.3 Computing the Length of Candidate Periods . . . . . . . . . . . . . . 46
5.2.4 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3.1 Data Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3.2 Parameter Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3.3 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.3.4 Effect of Smoothness Levels . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3.5 Effect of Discretization on WARP . . . . . . . . . . . . . . . . . . . . 51
5.3.6 Multiple Periodicities . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6 Conclusion and Future Work 54
Bibliography 56
List of Tables
4.1 Proposed method and traditional smoothing method for data sets with no
X-outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Running median method for data sets with no X-outliers . . . . . . . . . . . . 33
4.3 Proposed method and traditional smoothing method for data sets with X-
outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.4 Running median method for data sets with X-outliers . . . . . . . . . . . . . 34
5.1 Accuracy comparison on “Noisy” data . . . . . . . . . . . . . . . . . . . . . . 49
5.2 Accuracy comparison on “Normal” data . . . . . . . . . . . . . . . . . . . . . 49
5.3 Trend based algorithm for the “Noisy” data (confidence threshold set as 70%) 50
5.4 Trend based algorithm for the “Normal” data (confidence threshold set as 70%) 50
5.5 WARP on “Normal” data (confidence threshold set as 70%, equi-width binning) 51
5.6 WARP on “Normal” data (confidence threshold set as 70%, equi-depth binning) 52
List of Figures
1.1 Local Y-outliers identified . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Global Y-outliers identified . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Outlier not identified by smoothing techniques (a) Load curve data (b) Model
the trend by a proper smoothing curve (c) Model the trend by an overly flat
smoothing curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Load curve data with labeled X-outliers . . . . . . . . . . . . . . . . . . . . . 5
1.5 X-outliers not identified by smoothing techniques . . . . . . . . . . . . . . 6
1.6 Four days’ data with daily periodicity . . . . . . . . . . . . . . . . . . . . . 8
1.7 Five weeks’ data with weekly periodicity . . . . . . . . . . . . . . . . . . . . . 8
4.1 Example for an X-outlier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Example for an X-outlier within a valley . . . . . . . . . . . . . . . . . . . 22
4.3 Smoothing curve [t1, t12]. The horizontal axis is time. The values on the
curve are the slopes at each point. [t1, t5] is a maximal-decreasing interval;
[t6, t10] is a maximal-increasing interval. [t4, t8] is a ∪ shape. . . . . . . . . . . 23
4.4 Two similar load curves with noise . . . . . . . . . . . . . . . . . . . . . . . . 25
4.5 Two similar load curves with time shifting and stretching . . . . . . . . . . . 26
4.6 The system tool developed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.7 Outlier detection for a six-year test data set. (a) Outlier detection result for
smoothness level 5. (b) Outlier detection result for smoothness level 1. (c)
Outlier detection result for smoothness level 10. . . . . . . . . . . . . . . . . . 36
4.8 Outlier repairing for the six-year test data set. . . . . . . . . . . . . . . . . . 37
4.9 Outlier repairing for a five-week test data set. (a) Test data before outlier
repairing. (b) Test data after outlier repairing. . . . . . . . . . . . . . . . . . 38
5.1 An example for the DTW matrix . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Alignment for the DTW matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.3 Alignment for T(3) and T (3) where T = “abcabcabd” . . . . . . . . . . . . . . . 42
5.4 DTW matrix for sequences T and T where T = e1e2 . . . en . . . . . . . . . . . 42
5.5 Example for a period consisting of a peak and a valley . . . . . . . . . . . . . 44
5.6 Example for a period . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.7 One week’s data with weekly pattern . . . . . . . . . . . . . . . . . . . . . 48
5.8 Five weeks’ data with daily patterns, smoothness level 3 . . . . . . . . . . . 53
5.9 Five weeks’ data with weekly patterns, smoothness level 5 . . . . . . . . . . 53
Chapter 1
Introduction
1.1 Outlier Detection
Load curve data is a type of time series data: the power consumption recorded by meters at regular time intervals. The quality of load curve data is essential to load forecasting [34, 40], system analysis, system operation and visualization, system reliability performance, energy saving, and accuracy in system planning [37]. Two key features in smart grids [42] are
self-healing from power disturbance events and enabling active participation by consumers
in demand response. The collection of valid load curve data is crucial for supporting decision
making in smart metering and smart grid systems.
Collecting all load data accurately in fine granularity is a challenging objective. There is
often missing and corrupted data in the process of information collection and transfer, due
to various causes including meter malfunction, communication failures, equipment outages,
lost data, unexpected shutdown, unscheduled maintenance, and unknown factors. Since such
events cause a significant deviation in load and do not repeat regularly, their presence results
in load data records being unrepresentative of actual usage patterns. Therefore, before load
curve data can be used for load forecasting, modeling and system analysis, an important
task is to identify and repair abnormal data that are unrepresentative of underlying usage
patterns, called “outliers” below.
A load curve is a time series with the Y-axis representing power or energy consumption
and the X-axis representing the time. Outlier detection in time series has been a topic
in data mining [1, 2, 3] and statistics [4]. Most previous work focused on outliers that
have invalid Y-axis values compared to the behaviors in a local neighborhood [5]. We call
such outliers Y-outliers. Smoothing techniques have been used successfully to detect Y-outliers [5]. The idea is to model the data by a smoothing curve whose points are derived from the observations in a local neighborhood; the moving average [41] is one example of such a smoothing technique. Y-outliers are the observations that deviate substantially from the smoothing curve. To quantify this deviation, a “confidence interval” [5] is constructed around the curve, and the data points falling outside it are reported as Y-outliers.
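The smoothing-plus-confidence-interval idea can be sketched in a few lines. The sketch below is a toy illustration, not the method of [5]: it uses a simple centered moving average as the smoothing curve and a k-sigma band on the residuals as the confidence interval; the window size, the band width k, and the sample data are all illustrative assumptions.

```python
def moving_average(y, w=5):
    """Smooth y with a centered moving average of window w."""
    h = w // 2
    return [sum(y[max(0, i - h):i + h + 1]) / len(y[max(0, i - h):i + h + 1])
            for i in range(len(y))]

def y_outliers(y, w=5, k=2.5):
    """Indices of points outside a k-sigma band around the smoothing curve."""
    s = moving_average(y, w)
    resid = [yi - si for yi, si in zip(y, s)]
    mean = sum(resid) / len(resid)
    sigma = (sum((r - mean) ** 2 for r in resid) / len(resid)) ** 0.5
    return [i for i, r in enumerate(resid) if abs(r - mean) > k * sigma]

load = [10, 11, 10, 12, 11, 10, 95, 11, 10, 12, 11, 10]  # one obvious spike
print(y_outliers(load))  # → [6]
```

As discussed next, a scheme of this kind only compares each point with its local neighborhood, which is exactly why it cannot detect the X-outliers introduced below.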
Figure 1.1: Local Y-outliers identified
Figure 1.2: Global Y-outliers identified
An example showing the Y-outliers detected by the smoothing techniques is displayed
in Figure 1.1 with one week’s data and Figure 1.2 with almost one year’s data. The red
curve is the smoothing curve and the two blue curves are the upper bound and lower bound
of the confidence interval. In Figure 1.1, the data points outside the confidence interval are
detected as Y-outliers, denoted in the circles. In Figure 1.2, some of the Y-outliers detected
using the smoothing techniques are denoted in the circles.
Y-outliers have been studied in a number of previous works [5, 41]. In this thesis we do not consider Y-outliers but rather a new type of outlier, called X-outliers, discussed below.
1.1.1 X-outliers
As is well known, load curve data exhibits some loose form of periodicity (e.g., daily, weekly,
monthly, seasonal, yearly). The term “loose” means that the actual data values can be
different but the trend of the data repeats itself regularly at some interval. In statistics,
such periodicity is known as seasonality. Periodicities of load curves are usually known
to power utilities due to regularities of usage patterns. Our investigation indicates that
abnormal data may occur as a deviation from such periodicities. In this thesis the term
X-outliers refers to such deviations. For example, the data in Figure 1.3(a) has a weekly
periodicity, i.e., high weekdays and low weekends, but the data in the rectangle box in the
first week is an X-outlier to this periodicity.
In general, X-outliers are caused by random events such as malfunction of data metering
or transfer systems, outages, unexpected full or partial shutdowns of production lines, unscheduled strikes, temporary weather changes, etc. Such events are unlikely to occur again in other periods and thus are not representative of the regular patterns of the load curve. Therefore,
before the data can be used, such unrepresentative data must be identified and repaired.
Notice that we distinguish between an “important change” (say in temperature), which is
likely to persist in the future, and a “random rise or drop”, which most likely does not occur
again in the future. The latter case is caused by random events such as outages, temporary
weather change, union strike, unscheduled maintenance, etc. Identifying the affected data
in this case is exactly the motivation of this work.
Another purpose of identifying X-outliers is to focus further investigation on the potential
problematic areas of the data and find the cause of unusual data. In the example of Figure
1.3(a), because of the identification of the unusually low consumption during Wednesday-
Friday in the first week, the user could conduct further investigation and find that the low
consumption was caused by a union strike during Wednesday-Friday in that week. Without
identifying X-outliers, finding such problematic areas will require scanning huge load curves
manually, which is a painstaking task.
Figure 1.3: Outlier not identified by smoothing techniques. (a) Load curve data. (b) Model the trend by a proper smoothing curve. (c) Model the trend by an overly flat smoothing curve.
The neighborhood based techniques, such as the moving average [41] and smoothing techniques [5], are effective for detecting Y-outliers. Since X-outliers are deviations from a periodicity, these outliers cannot be detected by such techniques for two reasons: (1) checking deviations from a periodicity requires identifying the periodic pattern and thus examining the data in all periods; (2) an X-outlier could last for a considerable time and form its own trend over a sizable neighborhood, thus misleading all methods based on local information.
To illustrate these points, consider the data in Figure 1.3(a). In Figure 1.3(b), the traditional smoothing curve models the normal weeks correctly by including most observations in the confidence interval, but it also models the corrupted data in the first week.¹ In Figure 1.3(c), with a flatter smoothing curve and a smaller confidence interval, the traditional smoothing technique detects all weekend drops as outliers. Neither result is satisfactory.
Using a larger confidence interval does not help because the outlier in the first week will not
be detected. The problem with the traditional smoothing techniques is that they ignore the
periodicity information. If the knowledge on periodicity is used, the data in the first week
would not be considered as normal because it does not repeat in other weeks.
Figure 1.4: Load curve data with labeled X-outliers
Another example of the limitations of the neighborhood based techniques for detecting X-outliers is shown in Figure 1.4 and Figure 1.5. These two figures show the same six-year load curve data with yearly periodicity. The X-outliers are labeled in the rectangle boxes in
¹In the traditional smoothing techniques, only the data points falling outside the confidence interval are considered as outliers; that is, the data points outside the upper and lower confidence interval curves in Figure 1.3(b) and Figure 1.3(c) in this example.
Figure 1.5: X-outliers not identified by smoothing techniques
Figure 1.4. It can be seen from Figure 1.5 that the traditional smoothing technique does not
detect any X-outlier because the labeled X-outliers are all within the specified confidence
interval. Since the neighborhood based techniques fail to detect X-outliers, a new technique is required; this is the motivation of this work.
1.2 Periodicity Detection
As discussed in Section 1.1.1, the X-outlier detection considered in this thesis depends on
a known periodicity. In some other applications, however, the periodicity is unknown and
needs to be discovered. For example, for a time series with unknown periodicity, if we would
like to see whether there are X-outliers in the time series, we have to detect the periodicity
first. Thus X-outlier detection and periodicity detection are highly related and periodicity
detection could be considered as a pre-processing step of X-outlier detection for time series
with unknown periodicity. Therefore, in this thesis a method for periodicity detection in
time series data will also be presented. Time series often have some form of periodicity
where a certain pattern repeats itself at regular time intervals, and due to the presence
of noise, such periodicities are subject to variances in both time and data value. Many
real life applications depend on knowing the periodicities of time series [34]. For example,
power utilities use periodic usage patterns for load forecasting, system analysis, scheduling
maintenance, energy saving, and filtering missing and corrupted data [5, 37]. Other sources of
time series include weather data, sensor generated data, physical traffic data, stock index
data, network flow data, patient physiological data, etc. Periodic pattern mining is useful
in predicting the stock price movement, computer network fault analysis and detection
of security breach, earthquake prediction, and gene expression analysis [38, 39]. While
periodicities may be known in some applications, in other applications they need to be
discovered. The latter is the case when the periodicity evolves, when a new time series is
collected, or when conditions that affect time series have changed. In this thesis, we consider
periodicity detection from time series data.
Periodicity detection has been an active topic in data mining and statistics [27, 28, 29,
30, 31, 32, 33]. Three types of periodicity have been identified [27, 36]: segment periodicity,
symbol periodicity, and partial periodicity. In this thesis, we consider segment periodicity,
where a time series consists of the repetition of a segment in the series. Most existing
algorithms [27, 28, 36] first discretize a real valued time series into a sequence of discrete
symbols before performing periodicity detection. Common discretization methods include
equi-width binning, where each bin has the same size, or equi-depth binning, where each bin
contains the same number of data points. With this preprocessing step, most algorithms
assume a sequence of discrete symbols as the input [27, 28, 36].
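The two common binning schemes can be sketched as follows. This is a generic illustration (the bin count of four and the toy series are arbitrary choices), not the exact discretization used by any particular algorithm in [27, 28, 36].

```python
def equi_width(y, bins=4):
    """Each bin spans the same value range; labels 'a', 'b', ... are symbols."""
    lo, hi = min(y), max(y)
    width = (hi - lo) / bins or 1.0
    return [chr(ord('a') + min(int((v - lo) / width), bins - 1)) for v in y]

def equi_depth(y, bins=4):
    """Each bin holds (roughly) the same number of data points."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [''] * len(y)
    for rank, i in enumerate(order):
        labels[i] = chr(ord('a') + min(rank * bins // len(y), bins - 1))
    return labels

series = [1, 2, 9, 10, 2, 1, 9, 11, 1, 2, 10, 12]
print(''.join(equi_width(series)))  # → aacdaacdaadd
print(''.join(equi_depth(series)))  # → abccbacdabdd
```

On the example of Figure 1.6 discussed below, equi-width binning of exactly this kind is what breaks each big peak into several symbols while collapsing the small peaks and valleys into a single symbol.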
Unfortunately, the above approach suffers from major drawbacks. First, it is difficult to
specify a proper number of bins. A large number makes similar data different and a small
number makes different data similar, both of which impair the detection of periodicity.
Second, a fixed binning scheme is not suitable for a time series where different parts may
have different characteristics; for example, daytime and nighttime may have different data characteristics, as may weekdays and weekends. Third, the binning method considers each
time point independently and is not sensitive to the preservation of neighborhood based
trends.
To illustrate these drawbacks, consider the four-day hourly time series in Figure 1.6. This data has a strong daily periodicity, as highlighted by the peaks and valleys in the
rectangles. With equi-width binning, the y-values are discretized into four equal-sized bins
{a, b, c, d}. This discretization breaks each big peak into several bins (i.e., a, b, and c) and
collapses the small peaks and valleys into one bin d. After such discretization, the daily
trend that a big peak is followed by two small valleys and two small peaks is lost.
Figure 1.7 further illustrates the last drawback mentioned above. This is a time series of five weeks’ hourly data with a clear periodic pattern and with noise shown in the rectangle boxes. Suppose that the data is discretized into eight bins {a, b, c, d, e, f, g, h} using
Figure 1.6: Four days’ data with daily periodicity
equi-width binning. Since each data point is discretized independently, the noisy data are also discretized into bins instead of being filtered, which increases the chance of misleading periodicity detection. It does not work to filter noise by using a smaller number of bins
because doing so also diminishes the variance that is part of the trends. For example, if
we discretize the above data into four bins, the data points that form the trend will be
represented by the same symbol, making the data less useful for periodicity detection.
Figure 1.7: Five weeks’ data with weekly periodicity
A similar problem with equi-depth binning will be discussed in Section 5.3.5. To summarize, discretization is not sensitive to the preservation of trends; as such, a significant amount of information can be lost.
1.3 Contributions
This thesis focuses on both outlier detection and periodicity detection in time series data in
general, and in load curve data in particular. The contributions of this thesis are as follows.
1.3.1 For Outlier Detection
First, we present the novel notion of X-outliers under the assumption that data follows some
loose form of periodicity and the length (but not the pattern) of the periodicity is known.
Second, a novel method to detect and repair X-outliers is proposed. The proposed
method has four steps: (1) Approximate the load curve data by a smoothing curve. (2)
Represent the smoothing curve by a sequence of valleys and peaks, called ∪ shapes and ∩ shapes. (3) Identify X-outliers as the valleys and peaks that do not repeat according to the
known periodicity length. Our observation is that an X-outlier typically occurs at a time
interval where the smoothing curve either has a valley or has a peak. (4) Repair the outliers.
The novelty of the proposed method lies in considering the periodicity of the data and treating as X-outliers the data in the valleys and peaks that do not repeat according to the periodicity. Therefore this method is able to detect the approximate locations and lengths of X-outliers without making any assumptions about them.
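In a highly simplified form, the detection steps above can be sketched as follows. The sketch assumes the curve has already been smoothed (step 1), stands in plain turning points for the ∪/∩ shape representation of step 2, and flags non-repeating turning points in the spirit of step 3; the matching tolerances `tol_t` and `tol_v` and the toy data are illustrative assumptions, not the criteria used by the proposed method, and the repair step is omitted.

```python
def extrema(smooth):
    """Interior turning points of a smoothed series: (index, kind) pairs."""
    out = []
    for i in range(1, len(smooth) - 1):
        if smooth[i] > smooth[i - 1] and smooth[i] >= smooth[i + 1]:
            out.append((i, 'peak'))
        elif smooth[i] < smooth[i - 1] and smooth[i] <= smooth[i + 1]:
            out.append((i, 'valley'))
    return out

def x_outliers(smooth, period, tol_t=2, tol_v=1.0):
    """Flag turning points with no similar turning point of the same kind
    roughly one period away in either direction."""
    ex = extrema(smooth)

    def match_near(i, kind, j):
        # Is there a turning point of the same kind near position j
        # whose level is close to that of the point at i?
        return any(k == kind and abs(p - j) <= tol_t
                   and abs(smooth[p] - smooth[i]) <= tol_v
                   for p, k in ex)

    flagged = []
    for i, kind in ex:
        repeats = any(0 <= j < len(smooth) and match_near(i, kind, j)
                      for j in (i - period, i + period))
        if not repeats:
            flagged.append((i, kind))
    return flagged

P = [1, 3, 5, 3, 1, 0]       # one clean period of a smoothed load curve
C = [1, 3, -5, 3, 1, 0]      # a corrupted period with an anomalous dip
print(x_outliers(P + P + C + P + P, period=6))
# → [(13, 'peak'), (14, 'valley'), (15, 'peak')], all in the corrupted period
```

A full implementation would match whole ∪/∩ shapes (with their widths and depths) rather than single turning points, and would allow for the time shifting and stretching mentioned in Chapter 4.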
Third, a fully implemented system, which includes a method for detecting Y-outliers [5], the method presented here for detecting X-outliers, and a user-friendly interface, is developed. This system will help power utilities identify and correct corrupted data efficiently in applications of smart metering in particular, and in load forecasting, system analysis, operation modeling and planning studies of power systems in general.
Finally, though motivated by load curve data in power systems, the proposed approach
is rather general and can be applied to other types of data such as road traffic, network flow
traffic, call volume, weather data, etc. In this sense, the proposed method can be applied
to a wide range of data sets.
1.3.2 For Periodicity Detection
Consider the data in Figure 1.7 again. Despite the presence of noise, this data has the
periodicity that it peaks at weekdays and valleys at weekends, illustrated by the smoothing
curve in red. The peak and valley of each week are not exactly the same because actual
data points are different in every week, but the trends represented by them are similar.
Therefore, if we can represent these peaks and valleys, it is possible to detect the periodicity
as re-occurrence of subsequences of peaks and valleys, by taking into account similarity of
such shapes. In this example, the period is approximately the length of a peak plus the
length of a valley.
With the above observation, a novel trend based algorithm is proposed to detect peri-
odicities in real valued time series. The term “trend based” means that the method focuses
on the trends in the data, rather than the actual value of every single data point. This algorithm has four steps: (1) The trends of the data are approximated by a smoothing curve.
(2) The smoothing curve is represented by a sequence of ∪ shapes and ∩ shapes, which
correspond to valleys and peaks and capture the most interesting information in the data.
Each ∪ shape and ∩ shape is represented by a feature vector. (3) The WARP algorithm
[28] is extended to a sequence of ∪ shapes and ∩ shapes to discover periodicities in terms
of subsequences of ∪ shapes and ∩ shapes. (4) We express these periodicities in the length
of time.
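Step (4) can be illustrated in isolation. The sketch below assumes steps (1)–(3) have already produced a shape sequence, each shape a (kind, duration) pair, and found that the sequence repeats every two shapes (one peak plus one valley, as in the example above); the shape durations are invented for illustration and are not taken from the thesis data.

```python
# Toy shape sequence: weekday peaks of about five days and weekend
# valleys of about two days, with the small variations typical of
# real data (durations in hours).
shapes = [('peak', 120), ('valley', 48), ('peak', 119), ('valley', 49),
          ('peak', 121), ('valley', 47)]

def period_length(shapes, k):
    """Average duration of consecutive k-shape windows, i.e. the period
    expressed in time units rather than in a number of shapes."""
    windows = [shapes[i:i + k] for i in range(0, len(shapes) - k + 1, k)]
    spans = [sum(d for _, d in w) for w in windows]
    return sum(spans) / len(spans)

print(period_length(shapes, 2))  # → 168.0, i.e. one week in hours
```

This matches the observation above that the period is approximately the length of a peak plus the length of a valley.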
The novelty of the trend based algorithm is modeling a time series as a sequence of local trends (i.e., ∪ shapes and ∩ shapes) instead of a sequence of symbols obtained by a fixed binning scheme. This approach is sensitive to the distinction between trends and noise, which helps preserve trends and filter noise, and thus helps detect the underlying periodicity. Another feature of the trend based algorithm is the ease of detecting multiple periodicities (e.g., daily, weekly, yearly, etc.). This can be done by adopting a proper choice of the smoothing parameter to model the trends at a desired resolution level. The user is not required to have good knowledge of such choices; the software tool mentioned in Section 1.3.1 can help the user converge to a proper choice with little effort. We have evaluated the
proposed algorithm using real life load curve data obtained from our industrial collaborator.
The evaluation shows that the proposed method detects periodicities more accurately than the discretization based methods, increasing the F-measure by more than 30%.
1.4 Thesis Organization
The thesis is organized as follows. In Chapter 2, related work on outlier detection and periodicity detection in time series data is reviewed. In Chapter 3, a description of the
regression smoothing method to model the trends of time series data is given. In Chapter 4,
a new method for X-outlier detection is proposed and evaluated using the real data from BC
Hydro. In Chapter 5, the novel algorithm for periodicity detection is presented and tested
using the real data from BC Hydro. Finally, we summarize our conclusions in Chapter 6.
Chapter 2
Background
In this chapter, a review of the related techniques for outlier detection and periodicity
detection in time series data in the literature will be presented.
2.1 Outlier Detection
Outlier detection in time series data refers to the problem of finding behaviors in the data that are not expected according to some regular pattern existing in the data. Outlier detection is widely used in various applications, such as detecting credit card fraud or flagging abnormally low or high power consumption relative to a known periodicity.
2.1.1 Overview of Existing Techniques
Outlier detection in time series has been studied in the field of statistics as a general mathematical concept [3]. A simple statistical outlier detection method proposed in [25] uses informal box plots to pinpoint outliers. Many statistical approaches assume an underlying model
that generates data sets (e.g. normal distribution) [6]. Other methods [7, 8, 9, 10, 11] are
based on the ARMA (auto-regressive moving average) model, which impractically assumes
that the time series is stationary, as implied by the various parameters used by the model.
For load curve data, this kind of assumption does not hold. In addition, these methods
cannot handle a relatively large portion of missing data.
Most works on time series outlier detection employ smoothing techniques to identify
unusual data values within a local neighborhood. See [5] for a list of works. The idea is
to model the trends of data using a smoothing curve whose points are some aggregation of
those observed values within a small neighborhood. The moving average [41] is one example
of aggregation. Unfortunately, such techniques do not work for X-outliers considered in this
thesis because they do not consider periodicities. The mean and median method suggested
in [41] considers a given periodicity and replaces missing and corrupted data using the
averages or medians of the corresponding observations at different periods. This method
does not fully account for the data distribution. For example, though the median for {1, 1, 50, 100,
100} and the median for {49, 49, 50, 51, 51} are the same (i.e., 50), the second median is
more representative than the first median. Also, this method does not allow time shifting
and stretching that are commonly observed in load curve data.
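To make the distribution issue concrete, the following sketch (our illustration, not part of [41]) computes the median and the spread of the two example sets. Both have median 50, but the spread shows how much more representative the second median is:

```python
from statistics import median, pstdev

# The two example sets of corresponding observations across periods:
# identical medians, but very different spread around the median.
a = [1, 1, 50, 100, 100]
b = [49, 49, 50, 51, 51]

for obs in (a, b):
    print(f"median={median(obs)}, spread={pstdev(obs):.1f}")
```

For `a` the spread is about 44 while for `b` it is below 1, so the shared median of 50 summarizes `b` far better than `a`, which is exactly the information the plain mean/median method discards.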
The SAS/ETS (Econometric and Time Series) package provides routines for outlier
detection in periodic time series data using the intervention analysis methods [12, 13], which
are based on the ARIMA (auto-regressive integrated moving average) model. The ARIMA
model treats each outlier as a single observation and detects multiple point outliers as a
sequence of observations. If multiple outliers exist in a close proximity, these outliers may
mask each other so that no points are identified as outliers. Moreover, the ARIMA method
requires considerable computing time and memory for a long time series [14], which is the
case for load curve data.
The Multivariate Linear Gaussian state space model [45] provides a more general mod-
eling technique for time series and it also allows for non-stationary models. The state space
model has primarily been used for forecasting, for example, see [46, 47], where observed
data are assumed to be valid and the parameters of the state space model are estimated by
fitting the observed data. Our work has a different focus and objective from forecasting: we
assume that a large portion of observed data (in general) may be corrupted according to a
known periodicity and our goal is to identify and repair corrupted data, instead of fitting
the observed data, so that the repaired data is more representative of the underlying data
pattern. Therefore, our work can be applied in a preprocessing step to fix corrupted data
prior to other applications such as forecasting.
The work [15] defines an object to be a distance-based outlier if at least a certain fraction of
the data set lies more than a certain distance away from the object. Other similar definitions
use the distance of a point to its k-th nearest neighbor [16] or the sum of the distances to its
k-nearest neighbors [17]. Unfortunately, these definitions do not deal with the time series
data that characterizes load curves with loose periodicity.
In general, two forms of time series are considered for outlier detection: a time series
database and a single time series.
2.1.2 Outlier Detection in a Time Series Database
When detecting outliers in a time series database, most of the previous work tries to find a
time series which is abnormal with respect to a normal time series. Both [26] and [49] construct
a normal model from training time series that are known to be normal. If an input time
series does not conform to the model, it is detected as an outlier. These methods are not
suitable for solving our problem because we do not have any pre-determined normal time
series for training.
2.1.3 Outlier Detection in a Single Time Series
Our problem fits into the second form, where outliers are detected within a single time series
and an anomalous subsequence may extend over a long time. It is assumed that
most of the time series is normal.
In this scenario, much of the previous work slides a window across the time series data to
search for anomalous subsequences [1, 18, 19, 20]. This approach has to predefine the window
size for anomalous subsequences. For example, Keogh et al. developed a suite of techniques
[1, 2] for finding discords within a large time series, where a discord is a subsequence that is
maximally different from all the rest of the time series data. To locate discords, they used a
sliding window to scan the whole time series data. However, in the case of load curve data,
the length of abnormal data can vary considerably, which makes it difficult to determine
a proper window size in advance. They find the most unusual subsequences, but in our
context the most unusual subsequences are not necessarily outliers. The problem is that the
notion of discords does not factor in the periodicity of data, which is crucial to load curve
data.
2.2 Periodicity Detection
Periodicity detection in time series data has been studied in the data mining field. Basically
there are three types of periodicity considered in the time series literature [27, 36]. The first
type is called segment or full-cycle periodicity, meaning that the time series consists of the
repetitions of a segment in the series; this is the type of periodicity we detect in this
thesis. The second type is called symbol periodicity, where it is determined
whether the individual symbols repeat periodically or not. The last type is called partial
periodicity, with a pattern (length ≥ 1) repeating periodically.
There have been various approaches for detecting different kinds of periods. [30] devel-
oped a linear distance-based algorithm for detecting the symbol periodicity. [33] presented
a similar method with some pruning techniques. [32] proposed a multi-pass algorithm for
symbol periodicity, one symbol at a time. All the proposed algorithms in [30, 32, 33] dis-
cover the periodicities of some symbols of the time series rather than the periodicity of the
entire time series.
Previous work on segment periodicity detection can be divided into those for real valued
time series and those for sequences of discrete symbols. Representative works from the first
group include the sketching algorithm [29] and the wavelet transform based AWSOM [31].
The latter detects only periods that are powers of two. Representatives from the second
group include the convolution based technique [27], the dynamic time warping distance
based WARP algorithm [28] and the suffix tree based method [36]. The authors of [28]
showed that the WARP algorithm outperforms the algorithms in [27, 29, 31]. Like [28], our
algorithm is based on the dynamic time warping technique. Unlike [28], our algorithm deals
with a real valued time series without a prior discretization step. Dealing with real valued
but not discretized data has the benefit of preserving the trends of the data while avoiding
the limitations of discretization discussed in Section 1.2.
Chapter 3
Trend Modelling
In this chapter, we introduce the smoothing techniques for modelling the trends of the load
curve (time series) data. Trend modelling is the first and essential step of the proposed
methods for X-outlier detection and periodicity detection in this thesis. Modelling the
trends of the data provides a sequence of peaks and valleys which describes the trends on
how the data goes up and down. For X-outlier detection, these peaks and valleys indicate
the possible locations where an X-outlier tends to occur. For periodicity detection, the re-
occurrence of these peaks and valleys represents the periodicity. We begin with some basic
definitions:
Definition 1: A load curve $T = \{(t_i, y_i)\}_{i=1}^{n}$ is an ordered sequence of n real-valued
observations, where $y_i$ is the observed value at time $t_i$. A sub load curve $C = \{(t_i, y_i)\}_{i=j}^{k}$ of
T is a contiguous part of T, where $1 \le j \le k \le n$.
Given a load curve $T = \{(t_i, y_i)\}_{i=1}^{n}$, we can model the regression relationship of the data
by a continuous function [21]

$$y_i = m(t_i) + \varepsilon_i, \quad i = 1, \ldots, n \qquad (3.1)$$

with the regression function m and the observation error $\varepsilon_i$. The error $\varepsilon_i$ is assumed to be
normally and independently distributed with mean zero and constant variance.
3.1 Nonparametric Regression
To smooth the observed data (ti, yi)ni=1, a key is to estimate the function m(t) in Equation
3.1. The approximation can be done in two ways: parametric regression and nonparametric
regression. In parametric regression, m(t) is some known function and the researcher must
determine the appropriate parameters of m(t). In nonparametric regression, m(t) is an un-
known function. We choose nonparametric regression because we have no prior knowledge
about the structures of the load curves except that they have loose periodicity.
In nonparametric regression, the basic idea is local averaging: to estimate the value
at time t, the Y-observations in a neighborhood around t are taken into account, and the
further the Y-observations are away from t, the less they will contribute to the estimation
of the Y-observation at time t. Formally, the estimated value at time t can be modeled as
$$\hat{m}(t) = \frac{1}{n} \sum_{i=1}^{n} W_i(t)\, y_i \qquad (3.2)$$

where $\{W_i(t)\}_{i=1}^{n}$ denotes a sequence of weights which depend on the whole vector $\{t_i\}_{i=1}^{n}$.
Equation 3.2 is also called the smoothing curve.
3.2 Kernel Smoothing
To instantiate the weight function $W_i(t)$ in Equation 3.2, we consider kernel smoothing, one
of the most popular nonparametric smoothing techniques.² In kernel smoothing, $W_i(t)$ is
given by

$$W_i(t) = \frac{Kern_h(t - t_i)}{\hat{f}_h(t)} \qquad (3.3)$$

where

$$Kern_h(t) = \frac{1}{h}\, Kern\!\left(\frac{t}{h}\right) \qquad (3.4)$$

is the kernel with the scale factor h. Using the Rosenblatt-Parzen kernel density estimator
[21] of the density of t

²In general, any smoothing technique could be considered.
$$\hat{f}_h(t) = n^{-1} \sum_{i=1}^{n} Kern_h(t - t_i) \qquad (3.5)$$

we obtain the Nadaraya-Watson estimator [21] for Equation 3.2:

$$\hat{m}_h(t) = \frac{n^{-1} \sum_{i=1}^{n} Kern_h(t - t_i)\, y_i}{n^{-1} \sum_{i=1}^{n} Kern_h(t - t_i)} \qquad (3.6)$$
The shape of the kernel weights is determined by the function Kern whereas the size of
weights is parameterized by h, called bandwidth. Importantly, the bandwidth h controls
the smoothness of the smoothing curve and how wide the probability mass is spread around
a point. In this thesis, we choose Kern as the normal probability density function [21]:

$$Kern(t) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} t^2} \qquad (3.7)$$
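Equations 3.2 through 3.7 combine into a direct implementation. The sketch below is our own illustration (the function name is ours, and a production system would likely vectorize this); it computes the Nadaraya-Watson estimate with the Gaussian kernel of Equation 3.7:

```python
import math

def nadaraya_watson(ts, ys, h, t):
    """Estimate m_hat_h(t) (Equation 3.6) with the Gaussian kernel (3.7)
    and bandwidth h; ts and ys are the observed times and load values."""
    def kern(u):                                       # Equation 3.7
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    weights = [kern((t - ti) / h) / h for ti in ts]    # Kern_h(t - t_i), Eq. 3.4
    denom = sum(weights)                               # n * f_hat_h(t), Eq. 3.5 (n cancels)
    return sum(w * y for w, y in zip(weights, ys)) / denom

# Noisy observations around a slow trend; a larger h averages over a
# wider neighborhood and therefore yields a smoother curve.
ts = [i / 10.0 for i in range(50)]
ys = [math.sin(t) + 0.1 * ((-1) ** i) for i, t in enumerate(ts)]
smooth = [nadaraya_watson(ts, ys, h=0.5, t=t) for t in ts]
```

Note how the factor $n^{-1}$ cancels between numerator and denominator of Equation 3.6, so the code never needs the sample size explicitly.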
3.3 Smoothing Parameter
Below, we refer to the bandwidth h as the smoothing parameter. This parameter con-
trols the smoothness of the smoothing curve by regulating the size of the neighborhood
around time t in the observed data. A large h corresponds to a large neighborhood, thus, a
smoother curve. There has been some work on choosing the optimal smoothing parameter
in the literature. Different methods have been proposed such as cross-validation (CV) [21],
minimizing mean squared error (MSE) [21] and mean integrated squared error (MISE) [48].
In general, there is no golden rule since the choice often depends on the user’s needs. For
example, in Y-outlier detection, the smoothing parameter should be set to make the Y-
outliers outstanding so that they could be detected by the confidence interval. In X-outlier
detection, the smoothing parameter should be properly set so that a “valley” or a “peak”
on the smoothing curve is approximately the position where an X-outlier occurs. In periodicity
detection, the best choice of the smoothing parameter ensures that the smoothing curve
models the periodic pattern of the data properly.
Chapter 4
X-Outlier Detection
In this chapter, we will present the X-outlier detection algorithm. We start with some
notions used in this chapter in Section 4.1. After that the detailed description of X-outlier
detection is provided in Section 4.2. In Section 4.3 we discuss some practical issues. In
Section 4.4 the proposed method is evaluated experimentally.
4.1 Problem Definition
In this section, we present the essential notions used in the chapter and the problem we
study.
As mentioned in Chapter 1, load curve data have a loose form of periodicity. Informally,
a loose periodicity of length l means that the load at time t is similar to the load at the
corresponding times in other periods, subject to variability in time and load caused by
background noise, where the corresponding time of t in the i-th period is t + i × l. In
other words, a similar “trend” is observed in all periods, even if the actual load at the
corresponding times may be different.
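As a concrete illustration (our own sketch; the helper name is hypothetical), the corresponding times of an index t under a periodicity of length l are simply the in-range values of t + i × l:

```python
def corresponding_indices(t, l, n):
    """All indices t + i*l (i != 0) that fall within a series of length n."""
    return [t + i * l for i in range(-(t // l), (n - 1 - t) // l + 1) if i != 0]

# Hourly data with a weekly periodicity (l = 24 * 7) over four weeks:
l, n = 24 * 7, 24 * 7 * 4
print(corresponding_indices(10, l, n))  # -> [178, 346, 514]
```

Under a loose periodicity, the loads at these indices are expected to show a similar trend, though not identical values.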
In this chapter, we assume that a load curve follows a loose periodicity and the length
(but not the pattern) of the periodicity is known. For example, we know that a data set
follows a yearly (or weekly, etc.) periodicity, thus, the length of the periodicity is one year
(or one week, etc.), but we do not know the actual trend or pattern of the periodicity. This
assumption holds for all the load curves we encountered because electricity usage follows
daily, weekly, monthly and seasonal periodicities or working cycles (e.g., for industrial
customers) in real life. We believe that this assumption also holds in many other real
applications beyond power systems, such as road traffic, network flow traffic, call volume,
weather data, etc.
Definition 2: Given a load curve data following a loose periodicity of a known length,
an X-outlier is a maximal sub load curve which deviates from the periodicity.
The exact definition of “deviation” in Definition 2 is unspecified so that the notion
of X-outliers can be adapted to a new instantiation of “deviation”. In Section 4.2.3, we
will consider one instantiation of “deviation” based on the longest common subsequence
similarity [22].
Consider the load curve data with the weekly periodicity in Figure 4.1. This data set has
a weekly periodicity, i.e., high consumption during weekdays and low consumption during
the weekend, except for a deviation in Monday-Wednesday of the third week, denoted by the
rectangle box. Thus the data for Monday-Wednesday of the third week is considered an
X-outlier. It is worth noting that there is no requirement that all X-outliers have a similar
length or a length similar to the length of the periodicity. The length of an X-outlier is
determined by the length of the random event that causes the outlier, whereas the length of
periodicity is determined by the regularity of underlying patterns in the data. For example,
for a data set of a yearly periodicity, the length of the periodicity is 12 months; if a factory
shuts down for 3 months, the X-outlier is 3 months in length; if a union strike lasts for
1.5 weeks, the X-outlier is 1.5 weeks in length.
Figure 4.1: Example for an X-outlier
Definition 3: Given a load curve following a loose periodicity of a known length, the
outlier detection problem is to identify all X-outliers.
4.2 Proposed Method
At first glance, one may think that it is straightforward to detect X-outliers by comparing the
observations in the data against the periodic pattern assumed in the data. Unfortunately,
this approach does not work because we only know the length of the periodicity, but not
the pattern or trend of the periodicity, as is the case in most real applications. To detect
X-outliers, we need to first model the general trend in the data while ignoring background
noise. Since the data follow a periodicity, the trend tends to be periodic, i.e., repeating
itself at intervals equal to the known length of the periodicity, with possible deviations. We
then detect outliers by identifying all deviations from this periodic trend.
Our X-outlier detection approach can be described in four steps:
1. Approximate the load curve data by a smoothing curve, which captures the general
trend of the data;
2. Represent the smoothing curve by a sequence of valleys and peaks, called ∪ shapes
and ∩ shapes;
3. Identify X-outliers as the valleys and peaks that do not repeat according to the known
periodicity length. Our observation is that an X-outlier typically occurs at a time
interval where the smoothing curve either has a valley or has a peak;
4. Repair the X-outliers.
Generating the smoothing curve takes O(n²) time. This is the most time-consuming part
of the detection approach, so the overall time complexity is O(n²).
Step 1 is already explained in Chapter 3. Below, we first make some observations useful
for X-outlier detection in Section 4.2.1. After that we explain step 2, step 3 and step 4 in
Section 4.2.2, Section 4.2.3 and Section 4.2.4, respectively.
4.2.1 Observations for X-Outlier Detection
After the smoothing curve is obtained from the kernel smoothing technique discussed in
Chapter 3 (step 1), in the following we will make some observations on a few properties of
the smoothing curve which are useful for X-outlier detection.
As discussed in Chapter 1, an X-outlier has an unusual trend in the Y-axis over an
extended time period. Therefore, if the trend of a load curve is represented by a smoothing
curve, we have the observation that an X-outlier tends to occur at a “valley” or a “peak”
of the smoothing curve where the smoothing curve has local minimal or maximal values.
An example of this intuition is shown in Figure 4.2 which has a six-year load curve and the
length of the periodicity is one year. The red curve is the smoothing curve modelling the
trends of the load curve. It can be seen that the smoothing curve consists of a sequence of
“valleys” and “peaks”, and an X-outlier lies within the valley inside the rectangle box.
Figure 4.2: Example of an X-outlier within a valley
In the following we present step 2, step 3 and step 4 of the proposed method. Unless
otherwise specified, the term “outlier” refers to an X-outlier.
4.2.2 Approximating the Smoothing Curve by Peaks and Valleys
This section is step 2 of the proposed method. To formally define the location of such valleys
and peaks discussed in Section 4.2.1, we first introduce some terminology. The slope at a
time $t_i$ for a smoothing curve $\{(t_i, \hat{m}_i)\}_{i=1}^{n}$ is defined by $\frac{\Delta m_i}{\Delta t_i}$, where $\Delta m_i = \hat{m}_i - \hat{m}_{i-1}$ and
$\Delta t_i = t_i - t_{i-1}$, for $2 \le i \le n$. A time t in an interval is called a steep time point if the
absolute value of the slope at time t is maximum in the interval. In other words, at a steep
time point the smoothing curve ascends or descends at the maximum rate in the concerned
interval. Note that there could be more than one steep time point in an interval.
In the smoothing curve, an interval [a, b] is maximal-decreasing if the slope at every
point in [a, b] is ≤ 0 and any interval containing [a, b] has at least one point with positive
slope. Let c and c′ be in a maximal-decreasing interval [a, b]. [c, b] is convex-decreasing if
c is the last steep time point in [a, b], and [a, c′] is concave-decreasing if c′ is the first steep
time point in [a, b]. Since c is the last steep time point and c′ is the first steep time point,
the concave-decreasing interval [a, c′] and convex-decreasing interval [c, b] overlap at most
one point.
In Figure 4.3, the values on the curve are the slopes of the smoothing curve. The
horizontal axis is time. [t1, t5] is a maximal-decreasing interval. t3 and t4 are steep time
points in [t1, t5]. Since t4 is the last steep time point in [t1, t5], [t4, t5] is a convex-decreasing
interval but [t3, t5] is not. [t1, t3] is a concave-decreasing interval.
Figure 4.3: Smoothing curve [t1, t12]. The horizontal axis is time. The values on the curve are the slopes at each point. [t1, t5] is a maximal-decreasing interval; [t6, t10] is a maximal-increasing interval. [t4, t8] is a ∪ shape.
Similarly, we can define concave and convex intervals for increasing intervals. In the
smoothing curve, an interval [a, b] is maximal-increasing if the slope at every point in [a, b]
is ≥ 0 and any interval containing [a, b] has at least one point with negative slope. Let c
and c′ be in a maximal-increasing interval [a, b]. [c, b] is concave-increasing if c is the last
steep time point in [a, b], and [a, c′] is convex-increasing if c′ is the first steep time point
in [a, b]. Notice that [a, c′] and [c, b] overlap at most one point.
In Figure 4.3, [t6, t10] is a maximal-increasing interval, and t8 and t9 are steep time points.
[t6, t8] is a convex-increasing interval because t8 is the first steep time point in [t6, t10],
and [t9, t10] is a concave-increasing interval because t9 is the last.
The following definitions 4 and 5 formalize the notions of valleys and peaks.
Definition 4: (∪ shape) For a smoothing curve, a ∪ shape is a sub curve $T_{\cup} =
\{(t_p, \hat{m}_p)\}_{p=i}^{j}$ such that for some k with $i \le k \le j$, [i, k] is a convex-decreasing interval and
[k + 1, j] is a convex-increasing interval.
Definition 5: (∩ shape) For a smoothing curve, a ∩ shape is a sub curve $T_{\cap} =
\{(t_p, \hat{m}_p)\}_{p=i}^{j}$ such that for some k with $i \le k \le j$, [i, k] is a concave-increasing interval and
[k + 1, j] is a concave-decreasing interval.
In Figure 4.3, the curve [t4, t8] is a ∪ shape formed by a convex-decreasing interval
[t4, t5] and a convex-increasing interval [t6, t8]. The curve [t9, t12] is a ∩ shape formed by a
concave-increasing interval [t9, t10] and a concave-decreasing interval [t11, t12]. Notice that
adjacent ∪ shape and ∩ shape overlap at most one point. To see this, consider the adjacent
∪ shape [t4, t8] and ∩ shape [t9, t12] in Figure 4.3. The ∪ shape [t4, t8] must end at the first
steep time point t8 and the ∩ shape [t9, t12] must start at the last steep time point t9.
There might be a gap between two adjacent ∪ shape and ∩ shape. To cover all data
points on the smoothing curve, we extend each shape to cover a half of the gap on each of
its two ends.
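As a simplified sketch of step 2 (our own, not the thesis' exact procedure: Definitions 4 and 5 additionally trim each shape to its steep time points, which this version omits), valleys and peaks can be located where the slope of the smoothing curve changes sign:

```python
def turning_points(ts, ms):
    """Label each index where the smoothing curve (ts, ms) turns:
    'valley' where the slope goes from negative to non-negative,
    'peak' where it goes from positive to non-positive."""
    slopes = [(ms[i] - ms[i - 1]) / (ts[i] - ts[i - 1]) for i in range(1, len(ms))]
    points = []
    for i in range(1, len(slopes)):
        if slopes[i - 1] < 0 <= slopes[i]:
            points.append(("valley", i))
        elif slopes[i - 1] > 0 >= slopes[i]:
            points.append(("peak", i))
    return points

# A curve that dips, rises, then falls again: one valley, then one peak.
print(turning_points([0, 1, 2, 3, 4, 5, 6], [3, 1, 0, 1, 3, 2, 1]))
```

On this example the function reports a valley at index 2 (the local minimum) and a peak at index 4 (the local maximum); a full implementation would expand each turning point into the ∪ or ∩ shape bounded by its steep time points.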
4.2.3 Identifying Outliers
This section is step 3 of the proposed method. Intuitively, ∪ shapes and ∩ shapes capture
the regions on the smoothing curve where the raw load curve has large drops and large rises.
Such regions are the potential places where outliers may occur. Therefore, ∪ shapes and ∩ shapes are candidate regions for outliers.
Definition 6: (Candidate Regions) A candidate region for an outlier is the region for
a ∪ shape or ∩ shape of the smoothing curve.
All candidate regions can be found by computing ∪ shapes and ∩ shapes following
Definition 4 and Definition 5. We remind the reader that the length of a ∪ shape or ∩ shape,
and thus the length of a candidate region, is determined by the width of a drop or rise
in the smoothing curve, which is independent of the length of the periodicity. There is no
correlation between a candidate region and a period of the periodicity.
The ∪ shapes and ∩ shapes provide candidate regions for outliers based on local neighborhood
information (i.e., valleys and peaks). Whether a candidate region contains a real outlier
depends on whether the data in the region deviates from the given periodicity: a candidate
is not a real outlier if similar load data occurs regularly in other periods according to the
periodicity. As is well known, a load curve has daily, weekly, seasonal, or yearly periodicity,
although there may be noise or time shifting in the periodicity, so the similarity measure
should take into account background noise and time shifting.
To identify all outliers, we consider every candidate region r found in the previous step.
Let C∗ denote the portion of the raw load curve data contained in r. Notice that C∗ contains
raw data in the load curve, not the data in the smoothing curve. If the data in C∗ occurs
approximately in the corresponding regions in different periods, C∗ is not a real outlier but
a part of the periodicity. For this purpose, we extract all the sub load curves C1, C2, . . . , Ck,
where each Ci is the portion of the raw load curve in the corresponding region of C∗ in the
i-th period. If C∗ is “similar” to the majority of C1, C2, . . . , Ck, C∗ is considered normal;
otherwise, C∗ is considered a real outlier.
The remaining question is how to measure the similarity between C∗ and Ci of the
same length. There are two considerations in choosing the similarity measure. First, the
similarity measure should be less sensitive to background noise. For example, the two load
curves TA and TB in Figure 4.4 should be considered similar despite some variability at the
first peak due to background noise. Second, the similarity measure should be less sensitive
to time shifting and stretching commonly observed in load curve data. For example, load
curves TA and TB in Figure 4.5 should be considered similar despite a small time shifting
and stretching.
Figure 4.4: Two similar load curves with noise
The Euclidean distance commonly used is not suitable for our purpose because it is
sensitive to time shifting and stretching. If the Euclidean distance was used, the two curves
TA and TB in Figure 4.5 would be recognized as dissimilar. What we need is a similarity
measure that will examine a small neighborhood in search of matching points and skip
noisy points. For this purpose, the Longest Common Sub-Sequence (LCSS) [22] concept
can be adopted. LCSS is a coarse-grained similarity measure in the sense that it measures
similarity in terms of “trends” instead of exact points. Below, we describe how LCSS is
extended to measure the similarity of load curves.
Figure 4.5: Two similar load curves with time shifting and stretching
Given two load curves $A = \langle a_1, a_2, \ldots, a_m \rangle$ and $B = \langle b_1, b_2, \ldots, b_n \rangle$, which correspond
to $C^*$ and $C_i$, we want to find the longest subsequence common to both A and B.
The idea is as follows. To allow time shifting and stretching, $a_i$ and $b_j$ that are within some
time proximity are examined for matching. If these load points are similar, they are considered
a match and are kept. Dissimilar values in one or both load curves are dropped.
Mathematically, given an integer δ and a real value ε, the cumulative similarity $S_{i,j}(A, B)$,
or $S_{i,j}$ for short, is defined as follows:
$$S_{i,j} = \begin{cases} 0, & \text{if } i = 0 \text{ or } j = 0 \\ 1 + S_{i-1,j-1}, & \text{if } |a_i - b_j| \le \varepsilon \text{ and } |i - j| \le \delta \\ \max(S_{i,j-1}, S_{i-1,j}), & \text{otherwise} \end{cases} \qquad (4.1)$$
In Equation 4.1, the first case initializes the base cases for empty prefixes. The
second case builds the similarity recursively: if $|a_i - b_j| \le \varepsilon$ and $a_i$ and $b_j$ are close
enough in time, i.e., $|i - j| \le \delta$, then $a_i$ and $b_j$ are matched and the similarity is incremented.
Note that ε represents a tolerance of noise in load and δ represents a tolerance of time
shifting and stretching.
Let |A| and |B| be the lengths of A and B, respectively. The LCSS similarity of A and
B is given by

$$\gamma(\delta, \varepsilon, A, B) = \frac{S_{|A|,|B|}}{\min(|A|, |B|)} \qquad (4.2)$$

where $S_{|A|,|B|}$ is the length of the longest common subsequence of A and B computed by
Equation 4.1. For a user-specified threshold θ, we say that A and B are similar if

$$\gamma(\delta, \varepsilon, A, B) \ge \theta \qquad (4.3)$$
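Equations 4.1 through 4.3 translate directly into a dynamic program. The sketch below is our illustration (the function name is ours); it fills the table of Equation 4.1, computes the normalized similarity γ of Equation 4.2, and applies the threshold of Equation 4.3:

```python
def lcss_similar(a, b, delta, eps, theta):
    """Return (gamma, gamma >= theta) for load curves a and b.

    S[i][j] is the cumulative similarity of Equation 4.1: points a[i-1]
    and b[j-1] match when their loads differ by at most eps and their
    positions by at most delta (tolerating time shifting/stretching).
    """
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps and abs(i - j) <= delta:
                S[i][j] = 1 + S[i - 1][j - 1]
            else:
                S[i][j] = max(S[i][j - 1], S[i - 1][j])
    gamma = S[m][n] / min(m, n)       # Equation 4.2
    return gamma, gamma >= theta      # Equation 4.3

# Same trend, shifted by one time unit: similar under LCSS.
gamma, similar = lcss_similar([1, 1, 5, 5, 1, 1], [1, 5, 5, 1, 1, 1],
                              delta=1, eps=0.5, theta=0.8)
```

Comparing $C^*$ against each $C_i$ with this routine and taking a majority vote implements the decision rule described above; the shifted example matches 5 of 6 points, so γ = 5/6 and the pair is judged similar at θ = 0.8.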
Before LCSS is applied to A and B, piecewise aggregate approximation (PAA) [24] is
utilized to reduce the dimensionality of A and B. In PAA, a load curve $T = \{(t_i, y_i)\}_{i=1}^{n}$
of length n is represented in a w-dimensional space by a vector $\bar{T} = \langle \bar{t}_1, \bar{t}_2, \ldots, \bar{t}_w \rangle$.
The i-th element of $\bar{T}$ is calculated by the following equation:

$$\bar{t}_i = \frac{w}{n} \sum_{j = \frac{n}{w}(i-1)+1}^{\frac{n}{w} i} y_j \qquad (4.4)$$

That is, the load curve T is divided into w segments of equal size, and each segment is
represented by its mean value.
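A minimal sketch of Equation 4.4 (ours; it assumes n is divisible by w, as the equation's index bounds imply):

```python
def paa(ys, w):
    """Piecewise aggregate approximation: split ys into w equal-size
    segments and represent each segment by its mean (Equation 4.4)."""
    seg = len(ys) // w
    return [sum(ys[i * seg:(i + 1) * seg]) / seg for i in range(w)]

print(paa([1, 3, 2, 4, 10, 12, 11, 13], w=2))  # -> [2.5, 11.5]
```

The reduced vectors preserve the coarse trend of the curves, which is all that the trend-based LCSS comparison needs.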
4.2.4 Repairing Outliers
This section is step 4 of the proposed method. After detection, outliers should be replaced
with valid data in the load curve. The replacement data can be derived from the data
in the corresponding time intervals of other periods, with an adjustment for the
increasing/decreasing long-term trends over time. This is expressed using the following
multiplicative model [23]:
$$Y(t_i) = T(t_i) \cdot S(t_i) \qquad (4.5)$$
where Y (ti) represents the value that will replace the abnormal value at a time ti belonging
to an outlier. T (ti) represents the long-term trends and S(ti) represents the periodic index
(i.e., how much the load curve deviates from the long-term trends at time ti).
T (ti) can be estimated by the smoothing curve defined in Equation 3.2 with an appro-
priate smoothness level. However, in the presence of outliers, the smoothing curve may have
some errors around the outliers to be replaced. To address this problem, the outliers in the
load curve are first replaced by the average of the data at the corresponding times in the
previous and next periods. If the load points at these times are also outliers themselves, the
corresponding data from earlier and later periods is examined until normal data is obtained.
After that, the smoothing curve is re-generated using Equation 3.2 to obtain T (ti).
The periodic index S(ti) at a time ti belonging to an outlier is estimated by the average
of the periodic indexes at the corresponding time of its previous and next period, that is,
$$S(t_i) = \frac{1}{2}\left(S(t_i - l) + S(t_i + l)\right) \qquad (4.6)$$
where l is the length of the periodicity. In the case where the data at previous and next
periods are outliers, earlier and later periods are examined until normal data is obtained.
Note that for a time ti not belonging to an outlier, the periodic index at ti is computed by
its definition:
$$S(t_i) = \frac{y_i}{T(t_i)} \qquad (4.7)$$
where yi is the load data value at time ti. After T (ti) and S(ti) for a time point belonging
to an outlier are obtained, Equation 4.5 is utilized to produce the replacing value Y (ti).
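Putting Equations 4.5 and 4.6 together, the repair of a single outlier point can be sketched as follows (our illustration; the names are ours, and the sketch assumes the corresponding points one period away are themselves normal):

```python
def repair_value(trend, periodic_index, i, l):
    """Replacement value Y(t_i) = T(t_i) * S(t_i) (Equation 4.5) at an
    outlier time i, with S(t_i) averaged from the corresponding times in
    the previous and next period (Equation 4.6); l is the period length."""
    s_i = 0.5 * (periodic_index[i - l] + periodic_index[i + l])
    return trend[i] * s_i

# Flat long-term trend T = 10; periodic index 1.5 one period before the
# outlier at i = 15 and 1.0 one period after it.
trend = [10.0] * 30
s = [1.0] * 30
s[5], s[25] = 1.5, 1.0
print(repair_value(trend, s, i=15, l=10))  # -> 12.5
```

In the full method, `trend` would be the re-generated smoothing curve and `periodic_index` would be computed from Equation 4.7 at normal time points; when the neighboring periods are also outliers, earlier and later periods are searched, which this sketch omits.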
4.3 Practical Issues
The proposed approach uses several parameters: the smoothing parameter h (see Section
3.3), the load stretching threshold ε and the time stretching threshold δ (see Equation
4.1), and the LCSS similarity threshold θ (see Equation 4.3). A question is how to set
the values of these parameters. One approach is to use some statistically “optimal” setting
such as the optimal smoothing parameter [43, 44]. In applications where the
user has background knowledge, however, the user often wants control over a small
number of settings. This is the case for BC Hydro, and we believe that the situation is
similar in other utilities. In particular, the result produced by the “optimal” setting
was often not satisfactory.
A close look reveals that certain background knowledge or certain business rules prefer
certain solutions to others. To address this issue, we take a practical approach to provide
a mechanism that helps the user to identify a proper setting of parameters. Below, we
describe such an approach using the smoothing parameter h as an example, but the same
approach can be applied to other parameters.
Recall that a larger h produces a smoother curve and thus models less detail of
the data. A proper choice of the smoothing parameter h is therefore crucial for detecting the
outliers at a proper resolution. In practice, the user does not have to make such a choice
in advance. A software tool with a user-friendly interface has been developed, which allows
the user to slide a bar for the smoothing parameter and displays the smoothing curve and
the identified outliers to the user interactively. Based on visual inspection of the raw data,
the smoothing curve, and the identified outliers, the user can either accept the results or
slide the bar again for a different choice of h and get a display of the new smoothing curve
and outliers based on the new choice of h. In our experience, after several trials the user
quickly converges to a proper choice of the smoothing parameter. In our case, five users
have used this tool. At the beginning the users needed five to six trials on average to
get a proper smoothing curve for outlier detection with good results. After they became more
familiar with the tool, the number of trials was reduced to three to four.
A screen shot of the software tool is shown in Figure 4.6. The left side displays data
selections and algorithm options. When the user selects a data set and an algorithm, the raw
data will be displayed in the upper window and the data after detecting and/or repairing
outliers will be shown in the lower window. The user can slide the bar for the smoothing
parameter and display the outliers detected based on a different smoothing parameter.
Figure 4.6: The system tool developed
4.4 Experiments
This section evaluates the accuracy of the trend based algorithm proposed for detecting
X-outliers. The rest of this section is structured into four subsections: data selection,
evaluation criteria, parameter settings, and accuracy. All smoothing curves were generated
using the Nadaraya-Watson estimator.
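For concreteness, the estimator can be sketched as follows (illustrative Python, not the software used in this thesis; a Gaussian kernel is assumed):

```python
import numpy as np

def nadaraya_watson(t, y, h, t_eval=None):
    """Nadaraya-Watson kernel regression with a Gaussian kernel.

    t, y : observation times (normalized into [0, 1]) and values.
    h    : smoothing (bandwidth) parameter; a larger h yields a
           smoother curve that models less detail of the data.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_eval = t if t_eval is None else np.asarray(t_eval, dtype=float)
    # Gaussian kernel weight for every (evaluation point, observation)
    # pair; pairing all points makes this O(n^2) in the series length.
    u = (t_eval[:, None] - t[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w @ y) / w.sum(axis=1)
```

Sliding h up or down reproduces the rough-to-smooth behaviour that the interactive tool exposes to the user.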
4.4.1 Data Selection
Ten data sets from the industrial load curves in the BC Hydro system were used for our
experiments. These data sets are hourly electricity consumptions in different areas for the
six years from October 2004 to October 2010, with 24 × 365 × 6 = 52560 observations in
each data set. All data sets have the yearly periodicity and are categorized into two types:
five data sets contain no outliers and the other five data sets contain 21 outliers in total,
with 3 to 5 outliers in each data set. Note that these outliers are not the usual deviations
from a neighborhood; they are deviations from the yearly periodicity. These outliers were
identified and pre-labeled in advance by experienced engineers in the industry. We use
such pre-labeled outliers as the "ground truth" to evaluate the accuracy of the proposed
method. More details will be explained shortly. In addition, the time is normalized into the
interval [0, 1].
4.4.2 Evaluation Criteria
Recall that the proposed method uses each ∪ shape and ∩ shape (on the smoothing curve)
to detect an X-outlier or a non-X-outlier. The outliers pre-labeled by the user are not
necessarily delimited in the same way as such ∪ shapes and ∩ shapes are delimited. For this
reason, we cannot simply count the pre-labeled outliers that are detected as X-outliers. To
address this issue, we consider accuracy at the observation level as follows. Let D denote
the set of observations on the load curve that belong to the outliers detected by an
algorithm, and let L denote the set of observations that belong to the pre-labeled outliers.
Let |S| denote the cardinality of a set S. Precision (P) is the percentage of detected
observations that were pre-labeled. Recall (R) is the percentage of pre-labeled observations
that are detected.
F-measure (F ) is the harmonic mean of precision and recall. Mathematically,
P = |L ∩ D| / |D|    (4.8)

R = |L ∩ D| / |L|    (4.9)

F = (2 × P × R) / (P + R)    (4.10)
A higher F entails both a higher precision and a higher recall, and thus more agreement
between the ground truth and the detection made by an algorithm. P, R and F will be
used as our accuracy criteria.
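As an illustration, the observation-level criteria of Equations 4.8-4.10 can be computed as follows (hypothetical Python sketch; observation indices stand in for observations):

```python
def prf(detected, labeled):
    """Observation-level precision, recall and F-measure (Equations 4.8-4.10).

    detected : set D of observations belonging to the detected outliers.
    labeled  : set L of observations belonging to the pre-labeled outliers.
    """
    d, l = set(detected), set(labeled)
    overlap = len(d & l)                       # |L intersect D|
    p = overlap / len(d) if d else 0.0         # P = |L ∩ D| / |D|
    r = overlap / len(l) if l else 0.0         # R = |L ∩ D| / |L|
    f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of P and R
    return p, r, f
```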
We compare the proposed method with two baseline algorithms. The first is the tra-
ditional smoothing method in [5], which uses a smoothing curve to model the trend of
data and uses a confidence interval around the smoothing curve to detect outliers. We use
the usual 95% confidence level for generating the confidence interval. The second baseline
algorithm is the running median method in [41]. At each observation (ti, yi) of the load
curve, a running median mi of a sub curve Ti centered at ti is computed and a filter band
Bi = mi ± 3 × SD(Ti − mi) is constructed, where Ti − mi is the sub curve obtained by
shifting Ti down by mi and SD() is the standard deviation. Then all observations outside
the filter bands are identified as outliers. For the running median method, we consider five
levels for the length of Ti. For i = 1, . . . , 5, the level i has the length 24× 7× i.
4.4.3 Parameter Settings
As explained in Section 4.2.3, ε controls the closeness of two matching points ai and bj , and
δ controls how far i could be away from j in the matching. The choices of these parameters
are dependent on the noise level of the periodicity in the data set. In the given studies, δ is
empirically set to 14 days, whereas for each outlier candidate C∗, ε is set to half of the
standard deviation of the sub load curves < C1, C2, . . . , Ck > that correspond to C∗ in
different years. The LCSS similarity threshold θ is set to 40%; this is the best setting
among several settings we tried: 30%, 35%, 40%, 45% and 50%. For PAA, we set w = n/24,
where n is the number of data points in Ci, which means every 24 data points (one day of
data) in Ci and C∗ are represented by their mean value.
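The PAA step can be sketched as follows (illustrative Python; trailing points that do not fill a whole segment are dropped, an assumption not stated in the text):

```python
import numpy as np

def paa(values, points_per_segment=24):
    """Piecewise Aggregate Approximation: each block of consecutive
    points (24 hourly points = one day) is replaced by its mean."""
    v = np.asarray(values, dtype=float)
    n_segments = len(v) // points_per_segment
    v = v[: n_segments * points_per_segment]  # drop an incomplete tail
    return v.reshape(n_segments, points_per_segment).mean(axis=1)
```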
The most important parameter is the smoothing parameter for generating the smoothing
curve, which determines the level of details to be modeled. The literature has suggested
some optimal setting that minimizes some notion of modeling error when the smoothing
curve is used to model the data. Among others, the mean integrated squared error (MISE)
[44] is a commonly used error measure. If modeling the data is the ultimate goal, such as
in forecasting, such an optimal setting would be sufficient. However, our goal is to identify
a special type of corrupted data or outliers, where the notion of error is the deviation from
a given periodicity. This notion of error requires checking the re-occurrence of a pattern in
all periods; therefore, it is not sufficient to minimize a standard estimation error such as
MISE, where local data points play a more important role than distant points. In fact, the
optimal smoothing parameter setting suggested in [44] does not depend on the periodicity
used by our outlier detection problem and thus is unlikely to produce a good result
for our problem. This point will be evaluated experimentally in Section 4.4.4.
The theoretical range of the smoothing parameter h is (0,∞). A smaller h produces a
rougher smoothing curve whereas a larger h produces a smoother curve. In our experiments,
the following 10 smoothness levels are considered for generating the smoothing curve in the
trend based algorithm and the traditional smoothing method. The h at level i is given by
h = 1 / (480 − 45 × i),    i = 1, 2, . . . , 10    (4.11)
Level 1 corresponds to the roughest smoothing curve and level 10 corresponds to the
smoothest smoothing curve. These levels are not equally spaced and are chosen so that
they cover a wide range of smoothness. The use of such smoothing levels assumes that the
time has been normalized into the interval [0, 1], where 0 corresponds to the starting time
and 1 corresponds to the ending time.
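Equation 4.11 can be written directly as a small helper (illustrative sketch):

```python
def level_bandwidth(i):
    """Bandwidth h for smoothness level i = 1, ..., 10 (Equation 4.11),
    assuming time is normalized into [0, 1]."""
    return 1.0 / (480 - 45 * i)
```

Here level_bandwidth(1) = 1/435 is the roughest setting and level_bandwidth(10) = 1/30 the smoothest; the levels grow non-uniformly, covering a wide range of smoothness.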
4.4.4 Accuracy
Results on Data Sets without X-Outliers In the first experiment, we study how dif-
ferent algorithms perform on the five data sets with no X-outlier. Since no outlier was
pre-labeled, all detected points are false positives and the percentage of detected points is
an indication of performance. Outlier detection was performed on each data set individually.
Let |D| be the sum of the numbers of detected points in the five data sets and let |T | be the
total number of points in the five data sets. |D|/|T | is the percentage of points that were
incorrectly detected. Table 4.1 and Table 4.2 summarize |D|/|T | for the three methods.
Each row corresponds to a smoothness level defined by Equation 4.11, and the last row
corresponds to the optimal setting of the smoothing parameter suggested in the literature
[44], hopt = 1.06 × σ × n^(−1/5), where σ is the sample standard deviation and n is the
number of samples.
Table 4.1: Proposed method and traditional smoothing method for data sets with no X-outliers

Smoothness Level   Proposed Method |D|/|T|   Traditional Smoothing Method |D|/|T|
1                  1.7%                      4.5%
2                  0.9%                      4.5%
3                  0.7%                      4.6%
4                  0.4%                      4.6%
5                  0.2%                      4.7%
6                  0%                        4.8%
7                  0%                        4.9%
8                  0%                        4.9%
9                  0%                        4.9%
10                 0%                        5.0%
hopt               0%                        4.7%
Table 4.2: Running median method for data sets with no X-outliers

Length Level   |D|/|T|
1              2.2%
2              2.1%
3              2.2%
4              2.2%
5              2.2%
Even when the smoothness level is low and the smoothing curve is rather rough, the
proposed method has a small |D|/|T|. The traditional smoothing method has a much higher
|D|/|T| across all smoothing levels, and the running median method has a lower |D|/|T| than
the traditional smoothing method but a higher |D|/|T| than the proposed method. The
reason for the higher false positives of the traditional smoothing method and the running
median method is that these algorithms do not consider the periodicity of data, therefore,
even if a peak or valley is part of the periodicity, it may still be considered as an outlier.
Such peaks and valleys will not be considered as outliers by the trend based method.
Results on Data Sets with X-Outliers In the second experiment, we consider the
five data sets with pre-labeled X-outliers and examine the precision (P), recall (R) and
F-measure (F). First, outlier detection was
performed on each data set individually. Then, D and L were aggregated over the five data
sets, and P/R/F was computed using the aggregated D and L. The results are summarized
in Table 4.3 and Table 4.4. Recall that precision is the percentage of detected points that
were pre-labeled (thus correctly detected), recall is the percentage of pre-labeled points that
were detected, and F-measure is the harmonic mean of precision and recall.
Table 4.3: Proposed method and traditional smoothing method for data sets with X-outliers

                   Proposed Method        Traditional Smoothing Method
Smoothness Level   P    R    F            P      R     F
1                  83%  98%  90%          0.5%   0.2%  0.3%
2                  85%  98%  91%          0.6%   0.3%  0.4%
3                  86%  98%  92%          0.9%   0.4%  0.6%
4                  87%  97%  92%          1.3%   0.6%  0.8%
5                  90%  95%  92%          3.2%   1.5%  2.1%
6                  91%  95%  93%          5.7%   2.7%  3.7%
7                  92%  92%  92%          6.4%   2.3%  4.0%
8                  91%  84%  87%          7.6%   3.3%  4.6%
9                  94%  66%  77%          13.1%  5.6%  7.9%
10                 93%  48%  63%          16.8%  7.0%  9.9%
hopt               92%  48%  63%          17.2%  6.0%  8.9%
Table 4.4: Running median method for data sets with X-outliers

Length Level   P     R     F
1              2.4%  0.4%  0.7%
2              2.8%  0.5%  0.8%
3              2.7%  0.5%  0.8%
4              2.6%  0.5%  0.8%
5              1.5%  0.2%  0.4%
A clear trend shown in Table 4.3 and Table 4.4 is that the F-measure of the proposed
method is significantly higher than those of the traditional smoothing method and the run-
ning median method. Moreover, this gain was observed over all choices of parameter settings
of the three methods, thus, was not due to a careful choice of parameter settings. Specif-
ically, the traditional smoothing method and the running median method failed miserably
as P and R were extremely low, suggesting that many pre-labeled outliers were missed and
many detected outliers are part of normal data. These methods consider any deviation from
a local neighborhood as an outlier, even though such deviation is part of the periodicity in
the data. In contrast, both P and R of the proposed method are high, with the best results
being 91% and 95%, respectively, at smoothness level 6. This study clearly shows
that the traditional smoothing method and the running median method are not suitable for
detecting X-outliers, and that the proposed method meets the expectation of detecting
X-outliers.
The
study also shows that the optimal setting hopt of the smoothing parameter suggested in the
literature for the standard forecasting problem failed to generate the best result for outlier
detection, and that the proposed user-sliding bar is highly effective for choosing the best
smoothness level.
It is worth noting an interesting trend of the proposed method: as the smoothness level
increases, P increases and R decreases; the best result in terms of the highest F-measure
was attained at a suitable smoothness level. This trend can be explained as follows. At
a low smoothness level, the smoothing curve models more details of the load curve and
many small ∪ shapes and ∩ shapes are generated, some of which are pure noise. As
a result, the number of false positives is large and precision is low. As the smoothness level
increases, fewer details are modeled and larger ∪ shapes or ∩ shapes are identified, which
more likely correspond to real outliers. Therefore, the number of false positives decreases.
When the smoothness level increases further, the smoothing curve becomes flatter and thus
under-fits the data. In this case, a ∪ shape or ∩ shape becomes so large that it contains a
portion of normal data, which leads to the failure to detect some real outliers and thus a
low recall.
To further illustrate the above points, an example for one data set is depicted in Figures
4.7(a), 4.7(b) and 4.7(c) for various smoothness levels. The curve in red is the smoothing
curve and each rectangle marks an X-outlier detected by the proposed method. Figure
Figure 4.7: Outlier detection for a six-year test data set. (a) Outlier detection result for
smoothness level 5. (b) Outlier detection result for smoothness level 1. (c) Outlier detection
result for smoothness level 10.
4.7(a) shows that the smoothing curve at level 5 models the load curve properly and there
is no false positive or false negative. Figure 4.7(b) shows the result at smoothness level 1
where too much local information is modeled by the rather rough smoothing curve, which
leads to small ∪ shapes and ∩ shapes. A few such small ∪ shapes and ∩ shapes, marked
as false positives FP1, FP2, FP3, FP4 and FP5, are pure noise and mislead the algorithm
into considering them outliers. The exact opposite is observed in Figure 4.7(c), where the
smoothing curve at smoothness level 10 is rather flat and ∪ shapes and ∩ shapes are so
wide that a large portion of normal data are contained in them; consequently, no outlier
was detected.
Figure 4.8 shows the load curve data after repairing the outliers identified in Figure 4.7(a)
based on the repairing method in Section 4.2.4. The outliers are replaced by representative
y values.
Figure 4.8: Outlier repairing for the six-year test data set.
The proposed method is applicable to load curve data of any time granularity and length.
An example of outlier repairing for a test data set of five weeks is depicted in Figure 4.9.
Figure 4.9(a) illustrates the data before outlier repairing, with the detected outlier being marked
by the star rectangle. Figure 4.9(b) shows the data after outlier repairing based on the
method in Section 4.2.4.
Figure 4.9: Outlier repairing for a five-week test data set. (a) Test data before outlier
repairing. (b) Test data after outlier repairing.
Chapter 5
Trend Based Periodicity Detection
In this chapter the trend based periodicity detection algorithm will be presented. We start
with an overview of a highly related periodicity detection technique called WARP (the
WArping foR Periodicity algorithm). Then the trend based periodicity detection algorithm will be
introduced in Section 5.2. In Section 5.3, the performance of the trend based algorithm is
evaluated.
5.1 Preliminaries
In this section we review the WARP algorithm in [28], a periodicity detection algorithm for
a sequence of discrete symbols. In the next section we will extend the WARP algorithm to
deal with a real valued time series.
5.1.1 Periodicity
For convenience, in this chapter a time series T = (ti, yi), i = 1, . . . , n, is denoted as
T = e1e2 . . . en, an ordered list of n feature values ei at times i, 1 ≤ i ≤ n. A sequence is
the special case of a time series where each feature value ei is a discrete symbol taken from
a finite alphabet. We adopt the notion of segment periodicity from [27]: A time series T is periodic
with a period p if it can be divided into equal length segments, each of length p, that are
“almost similar”. For example, the sequence T = “abcabcabb” has a period 3 with the noise
“b” at the end.
In this chapter, we consider time series where the feature values ei are real values for
periodicity detection. To deal with such real valued time series, most existing periodicity
detection algorithms assume that the time series is first transformed into a sequence of
discrete symbols using a binning method, usually a uniform binning scheme: equi-width
binning, where each bin has the same size, or equi-depth binning, where each bin contains
approximately the same number of data points. The data points in the same bin are
represented using the same symbol. Thus, most of the existing algorithms deal with a sequence
of discrete symbols. The dynamic time warping based WARP algorithm [28] is such an
algorithm. Below, we review this algorithm.
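A sketch of the two binning schemes (illustrative Python using NumPy; the integer bin labels serve as arbitrary discrete symbols):

```python
import numpy as np

def discretize(values, n_bins=4, scheme="equi-width"):
    """Map a real valued series to discrete symbols 0 .. n_bins-1.

    'equi-width': bins of equal size over [min, max].
    'equi-depth': bins containing approximately the same number of points.
    """
    v = np.asarray(values, dtype=float)
    if scheme == "equi-width":
        edges = np.linspace(v.min(), v.max(), n_bins + 1)
    else:
        edges = np.quantile(v, np.linspace(0.0, 1.0, n_bins + 1))
    # Assign each point to a bin using the interior edges only.
    return np.digitize(v, edges[1:-1])
```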
5.1.2 Dynamic Time Warping
Dynamic time warping (DTW) [35] is a measure of the distance between two sequences A
= a1a2 . . . am and B = b1b2 . . . bn. The DTW distance of A and B, denoted as DTW (A,B)
or DTW(m, n), is computed by dynamic programming, formulated as

DTW(i, j) = d(ai, bj) + min{ DTW(i − 1, j − 1), DTW(i − 1, j), DTW(i, j − 1) }    (5.1)
where the function d(ai, bj) returns the distance between two symbols ai and bj, defined as

d(ai, bj) = 0 if ai = bj, and 1 otherwise.    (5.2)
To compute the DTW distance, an m × n matrix is constructed in which cell (i, j)
contains the value d(ai, bj). A warping path is a contiguous path from cell (1, 1) to cell
(m, n), corresponding to a particular alignment between the two sequences; from cell (i, j),
a path may go to cell (i + 1, j), cell (i, j + 1) or cell (i + 1, j + 1). The DTW distance is
defined as the minimum cost of any warping path from (1, 1) to (m, n). A locality
constraint is added to control how far i can be from j when computing DTW(i, j). A
window size w can be used to specify this constraint: if ai is aligned with bj, then
|i − j| ≤ w.

Figure 5.1 shows an example of the DTW matrix for the two sequences "abcbde" and
"abcefg". The minimum cost warping path is circled. Figure 5.2 shows the actual alignment
represented by this minimum cost warping path. The warping cost of this path is
0 (a ↔ a) + 0 (b ↔ b) + 0 (c ↔ c) + 1 (c ↔ b) + 1 (c ↔ d) + 0 (e ↔ e) + 1 (f ↔ e) +
1 (g ↔ e) = 4, where "↔" means "paired with".

Figure 5.1: An example of the DTW matrix

Figure 5.2: Alignment for the DTW matrix
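Equations 5.1 and 5.2, together with the locality window, can be sketched as follows (illustrative Python; the matrix is padded with an extra row and column of infinity to anchor the recursion):

```python
def dtw_distance(a, b, window=None):
    """DTW distance between two symbol sequences (Equations 5.1 and 5.2).

    d(a_i, b_j) is 0 when the symbols match and 1 otherwise; `window`
    is the optional locality constraint |i - j| <= w.
    """
    m, n = len(a), len(b)
    INF = float("inf")
    # D is padded with a row and column of infinity; D[0][0] anchors the path.
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if window is not None and abs(i - j) > window:
                continue  # cell excluded by the locality constraint
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[m][n]
```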
5.1.3 The WARP Algorithm
To detect the periodicity in a single sequence T = e1e2 . . . en, the WARP algorithm [28]
compares the sequence with a copy of itself shifted by some number of symbols p. If there
is high similarity between the two in terms of the DTW distance, p is considered a
candidate period. Specifically, for a given positive integer p, T_(p) denotes the first n − p
symbols of T and T^(p) denotes the last n − p symbols. If DTW(T_(p), T^(p)) is small
enough, p is considered a candidate period.
For example, with T = e1e2 . . . e9 = "abcabcabd", T_(3) = "abcabc" and T^(3) = "abcabd",
and DTW(T_(3), T^(3)) = 1. If this warping cost is considered small enough, p = 3 is a
candidate period. The reason is as follows. Consider the re-occurrence of e1 in the actual
alignment in Figure 5.3: e4 = e1 and e7 = e4, where the symbols on the LHS of each
equality are from T^(3) and the symbols on the RHS are from T_(3). These equalities imply
e1 = e4 = e7.
For longer sequences T_(3) and T^(3) with a small DTW distance, these equalities imply that
e1 re-occurs at a regular interval of three time units. The same argument applies to e2
and e3. Therefore, T is periodic with a period 3 and the periodic pattern e1e2e3.
Figure 5.3: Alignment for T_(3) and T^(3) where T = "abcabcabd"
To find all candidate periods, DTW(T_(p), T^(p)) is computed for p = 1, . . . , n/2. Note
that the maximum value of DTW(T_(p), T^(p)) is n − p. For each possible value of p, the
confidence of p [28] is

(n − p − DTW(T_(p), T^(p))) / (n − p).    (5.3)

If the confidence of p is larger than or equal to a given threshold τ, p is considered a candidate
period.
Figure 5.4: DTW matrix for the sequences T and T, where T = e1e2 . . . en

Figure 5.4 shows the DTW matrix for the sequences T and T, where T = e1e2 . . . en. It can
be seen that computing DTW(T_(p), T^(p)) amounts to finding the minimum cost warping
path from cell (1, p + 1) to cell (n − p, n) in this matrix. It should be noted that the values
on the diagonal of the DTW matrix are all zeros. These zero values would drag the
minimum cost warping path for T_(p) and T^(p) towards the diagonal, meaning that the
p-position shift would be ignored in the alignment of T_(p) and T^(p). Therefore, to avoid
this situation, the zero values on the diagonal are replaced by infinity (∞) [28].
Let cp be the warping cost of a candidate period value p, and let ca be the warping cost
of any adjacent period value around p; cp and ca satisfy cp ≤ ca. Therefore, to reduce the
number of redundant periods, only the candidate periods with locally minimal warping cost
cp are considered [28]. For a more detailed description of the WARP algorithm, the reader
is referred to [28].
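The candidate-period search can be sketched as follows (illustrative Python; the DTW helper implements Equations 5.1-5.2, and the redundant-period filtering by locally minimal cost is omitted for brevity). Because the shift is applied explicitly by comparing the first n − p symbols with the last n − p symbols, the zero-diagonal issue of the single-matrix formulation does not arise here:

```python
def _dtw(a, b):
    """Minimal DTW with the 0/1 symbol distance of Equations 5.1-5.2."""
    INF = float("inf")
    m, n = len(a), len(b)
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[m][n]

def warp_candidate_periods(seq, tau):
    """WARP-style candidate period search (after [28]).

    For each shift p, the first n - p symbols are compared with the
    last n - p symbols; p is kept when the confidence
    (n - p - DTW) / (n - p) reaches the threshold tau.
    """
    n = len(seq)
    candidates = []
    for p in range(1, n // 2 + 1):
        d = _dtw(seq[: n - p], seq[p:])
        confidence = (n - p - d) / (n - p)
        if confidence >= tau:
            candidates.append((p, confidence))
    return candidates
```

On the running example, warp_candidate_periods("abcabcabd", 0.8) keeps only p = 3, with confidence 5/6.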
5.2 The Trend Based Algorithm
The WARP technique described in Section 5.1.3 cannot be directly applied to real valued
time series. In this section, we present a novel periodicity detection algorithm, called the
trend based algorithm, for real valued time series. The algorithm has four steps:
1. Approximate the time series using a smoothing curve;
2. Model the trends in the smoothing curve by a sequence of ∪ shapes and ∩ shapes that
correspond to the peaks and valleys in the smoothing curve;
3. Identify the periodicity by extending the DTW distance to sequences of ∪ shapes and ∩ shapes, taking into account the similarity between such shapes;
4. Express the periodicity in the length of time.
As mentioned in Section 4.2, generating a smoothing curve takes O(n^2) time. This is
the most time consuming part of the algorithm, so the overall time complexity of the
algorithm is O(n^2).
Step 1 was already explained in Chapter 3 and step 2 in Section 4.2.2. Below, we first
make some observations useful for periodicity detection in Section 5.2.1, and then explain
steps 3 and 4 in detail in Section 5.2.2 and Section 5.2.3, respectively.
5.2.1 Observations for Periodicity Detection
In the following we will make some observations on a few properties of the smoothing curve
which are useful for periodicity detection.
For periodicity detection, as discussed in Chapter 1, the most interesting information
lies at peaks and valleys of the smoothing curve. The sequence of such peaks and valleys
describes the trends on how the data goes up and down while paying less attention to actual
data values. Based on this observation, we shall detect the periodicity in the data from such
peaks and valleys. An example of this intuition is shown in Figure 5.5, which shows eight
weeks of data with a weekly periodicity. In this example, the period is approximately the
length of a peak plus the length of a valley of the smoothing curve.
Figure 5.5: Example for a period consisting of a peak and a valley
5.2.2 Identifying Periodicities Using The Shape Sequence
This section describes step 3 of the trend based algorithm. As discussed in Section 4.2.2, the
smoothing curve is partitioned into a sequence of ∪ shapes and ∩ shapes. In this chapter,
the sequence is called the shape sequence and is denoted by S. Further, each ∪ shape and
∩ shape is represented by a feature vector (sig, len, max, ave, min), where sig indicates
whether it is a ∪ shape or ∩ shape; len is the number of time points of the shape; max is
the highest value of the shape; ave is the average value of the shape; min is the lowest value
of the shape. We can tell if two shapes are similar to each other using their feature vectors.
More details will follow shortly.
Now we will detect the periodicity using the shape sequence S defined above. Let us
consider an example to illustrate the idea. Figure 5.6 shows eight weeks of time series
data and the smoothing curve. The second rectangle box indicates a periodic pattern that
occurs in most of the weeks. This pattern is reflected by a periodic pattern of one ∩ shape
(weekdays) followed by one ∪ shape (weekend). The first and third rectangle boxes indicate
deviations from this pattern where the ∩ shapes and ∪ shapes have quite different time
lengths and y values from those in the other weeks. The periodicity of this data set comes
from the fact that a majority of weeks have a similar sequence of a large ∩ followed by a
small ∪ shape. Below, we describe an algorithm for detecting periodicity based on this idea.
Figure 5.6: Example for a period
The main idea of our algorithm is to extend the WARP framework in Section 5.1.3 to the
shape sequence S. A key component of the WARP algorithm is the DTW distance between
two sequences A = a1a2 . . . am and B = b1b2 . . . bn of symbols. The DTW distance makes
use of the distance function d(ai, bj) in Equation 5.2 for two symbols ai and bj. For two
shapes ai and bj, a direct application of this distance function almost always yields
d(ai, bj) = 1, because it is unlikely that two shapes are exactly identical.
To adapt the distance function d(ai, bj) to two shapes ai and bj, we introduce a difference
threshold ε: ai and bj are considered similar if they have the same type, i.e., either both
are ∪ shapes or both are ∩ shapes, and if their relative differences in length, max value,
ave value and min value are at most ε. Precisely, let ai = (sig_a, len_a, max_a, ave_a, min_a)
and bj = (sig_b, len_b, max_b, ave_b, min_b). Then d(ai, bj) = 0 if all of the following
conditions hold:

1) sig_a = sig_b,
2) |len_a − len_b| / Max(len_a, len_b) ≤ ε,
3) |max_a − max_b| / Max(max_a, max_b) ≤ ε,
4) |ave_a − ave_b| / Max(ave_a, ave_b) ≤ ε, and
5) |min_a − min_b| / Max(min_a, min_b) ≤ ε.
Otherwise, we define d(ai, bj) = 1. With this definition of d(ai, bj) for two shapes, the
WARP framework in Section 5.1.3 can be applied to the shape sequence S, treating each
shape as a symbol.
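The shape-level distance can be sketched as follows (illustrative Python; feature values are assumed positive, and the field names mirror the feature vector):

```python
from dataclasses import dataclass

@dataclass
class Shape:
    sig: str    # 'cup' for a ∪ shape, 'cap' for a ∩ shape
    len: int    # number of time points in the shape
    max: float  # highest value of the shape
    ave: float  # average value of the shape
    min: float  # lowest value of the shape

def shape_distance(a, b, eps):
    """0/1 distance between two shapes: 0 (similar) when the types match
    and every feature differs by at most eps in relative terms."""
    def close(x, y):
        return abs(x - y) / max(x, y) <= eps  # features assumed positive
    similar = (a.sig == b.sig and close(a.len, b.len)
               and close(a.max, b.max) and close(a.ave, b.ave)
               and close(a.min, b.min))
    return 0 if similar else 1
```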
5.2.3 Computing the Length of Candidate Periods
This section describes step 4 of the trend based algorithm. The above algorithm returns a
set of candidate periods, where each candidate period is a sequence of ∪ shapes and ∩
shapes. The final step of our algorithm is to transform each such candidate period into a
candidate period in terms of the length of time. Consider a candidate period p. Suppose
the shape sequence S has n shapes in total. For i = 1, 2, . . . , p, the i-th shape in the period
p is expected to occur at the locations of all the j-th shapes in S, where j = i + k × p,
0 ≤ k ≤ (n − i)/p. The time length of the i-th shape in the period p is defined as the
average length of these j-th shapes, and the time length of the period p is defined as the
sum of the time lengths of the i-th shapes over i = 1, 2, . . . , p. In the presence of corrupted
shapes (an example is shown in Figure 5.6), the median should be used instead of the
average, as the average is more sensitive to the bias introduced by corrupted shapes.
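Step 4 can be sketched as follows (illustrative Python; shape_lengths holds the time length of each shape in S, indexed from 0):

```python
import statistics

def period_time_length(shape_lengths, p, use_median=True):
    """Time length of a candidate period of p shapes (step 4).

    For i = 0 .. p-1, the i-th shape of the period takes the median
    (or mean) of the lengths at positions i, i+p, i+2p, ...; the
    period length is the sum over i.
    """
    agg = statistics.median if use_median else statistics.fmean
    return sum(agg(shape_lengths[i::p]) for i in range(p))
```

With shape lengths [5, 2, 5, 2, 5, 2, 20, 2] and p = 2, the median gives 5 + 2 = 7, while the mean would be inflated to 10.75 by the corrupted length 20.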
5.2.4 The Algorithm
Summarizing the above steps, the trend based algorithm is presented in Algorithm 1.
Algorithm 1 Trend Based Algorithm
Input: A real valued time series T = e1e2 . . . en, and the confidence threshold τ.
Output: All candidate periods for T.
Method:
1. Generate the smoothing curve C for T using the kernel smoothing technique;
2. Extract ∪ shapes and ∩ shapes from C and construct the shape sequence S;
3. For p = 1, 2, . . . , m/2, where m is the number of shapes in S:
   a. compute d = DTW(S_(p), S^(p));
   b. compute the confidence defined in Equation 5.3, conf = (m − p − d)/(m − p);
      if conf ≥ τ and d is a local minimum, add p to Cand;
4. For each p in Cand, output the time length of p.
5.3 Experiments
In this section we study the performance of the trend based algorithm by comparing it
with the WARP algorithm [28], which is known to outperform other algorithms. Section
5.3.1 explains our selection of data sets and Section 5.3.2 explains parameter settings used.
Section 5.3.3 studies the accuracy of periodicity detection. Section 5.3.4 examines the effect
of smoothness level on the trend based algorithm. Section 5.3.5 examines the effect of
discretization on the WARP algorithm. Section 5.3.6 examines the applicability of the
trend based algorithm for detecting multiple periodicities.
5.3.1 Data Selection
Two time series data sets from industrial load curves collected by BC Hydro were used in
our experiments. These data sets are different from those used in the experiments in
Section 4.4 for X-outlier detection. They are hourly electricity consumptions for one year,
one from January 2008 to December 2008 and the other from December 2004 to November
2005. Both data sets have a weekly periodicity; therefore, the periods are known:
24 × 7 × i hours, where i = 1, 2, . . . , 26. An example of one week's data is shown in
Figure 5.7 where the weekend has a lower consumption than weekdays and night time has a
lower consumption than daytime. In addition, these data sets have different levels of noise.
The first data set, the “Normal” data, has preserved almost every weekly pattern, with a
few exceptions where one or two days in the weekdays were corrupted into low values like
those of weekends in some weeks. The second data set, the “Noisy” data, has about 15% of
data corrupted into low values. Both data sets are presented to the trend based algorithm
and the WARP algorithm. For the WARP algorithm, we first discretized the consumption
values into four bins using equi-width binning. The effect of other binning choices will be
examined in Section 5.3.5.
5.3.2 Parameter Settings
The difference threshold ε for two shapes ai and bj (Section 5.2.2) is set to 30%. This is
the best setting from several settings we tried: 20%, 25%, 30% and 35%. The window size
w used by the DTW distance (Section 5.1.2) is set to 24 × 2 (i.e., two days) for the hourly
based WARP algorithm and is set to 3 (i.e., three shapes) for the trend based method. The
confidence threshold τ (Section 5.1.3) ranges from 0.7 to 0.9. We divide the smoothness
Figure 5.7: One week's data with a weekly pattern
level of the smoothing parameter for the Nadaraya-Watson estimator into the following five
levels:
h = 2^(i−1) / 1000,    i = 1, 2, . . . , 5    (5.4)
Level i = 1 corresponds to the roughest level and level i = 5 corresponds to the smoothest
level.
5.3.3 Accuracy
The first set of experiments evaluates the accuracy of the trend based algorithm compared
with the WARP algorithm. We say that a detected period pd is correct if there exists a
real period pi such that |pd − pi|/Max(pd, pi) ≤ η. In our experiments, η is set to 5%.
True positive (TP) is the number of correctly detected periods; false positive (FP) is the
number of wrongly detected periods; false negative (FN) is the number of real periods that
are not detected. The precision (P), recall (R) and F-measure (F) are defined as follows:
P = TP / (TP + FP)    (5.5)

R = TP / (TP + FN)    (5.6)

F = (2 × P × R) / (P + R)    (5.7)
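These criteria can be computed as follows (illustrative Python sketch; periods are given as lengths in hours):

```python
def period_accuracy(detected, real, eta=0.05):
    """Precision, recall and F-measure for periodicity detection
    (Equations 5.5-5.7).  A detected period pd is correct when some
    real period pi satisfies |pd - pi| / max(pd, pi) <= eta."""
    def match(pd, pi):
        return abs(pd - pi) / max(pd, pi) <= eta
    tp = sum(1 for pd in detected if any(match(pd, pi) for pi in real))
    fp = len(detected) - tp
    fn = sum(1 for pi in real if not any(match(pd, pi) for pd in detected))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```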
Table 5.1: Accuracy comparison on "Noisy" data

             Trend based algorithm (level 3)      WARP
Confidence
Threshold    TP  FP  FN   P    R    F             TP  FP  FN   P    R     F
70%          22  4   4    85%  85%  85%           26  51  0    34%  100%  50%
75%          22  4   4    85%  85%  85%           26  51  0    34%  100%  50%
80%          22  4   4    85%  85%  85%           21  49  5    30%  81%   44%
85%          19  4   7    83%  73%  78%           3   46  23   6%   12%   8%
90%          9   2   17   82%  35%  49%           0   0   26   -    0%    -
Table 5.2: Accuracy comparison on “Normal” data
Confidence        Trend based algorithm (level 3)              WARP
Threshold (%)   TP  FP  FN    P      R      F     TP  FP  FN    P      R     F
70              26   0   0  100%   100%   100%    26  39   0   40%   100%   57%
75              26   0   0  100%   100%   100%    26  39   0   40%   100%   57%
80              26   0   0  100%   100%   100%    26  39   0   40%   100%   57%
85              26   0   0  100%   100%   100%    26  39   0   40%   100%   57%
90              26   0   0  100%   100%   100%    15  12  11   56%    58%   57%
The accuracy comparison between the trend based algorithm (with smoothness level 3)
and the WARP algorithm on the “Noisy” and “Normal” data sets is shown in Table 5.1
and Table 5.2. For both data sets, FP of the WARP algorithm is much larger than that of
the trend based algorithm. This is because the WARP algorithm depends on discretization
to map continuous consumption values to a fixed number of bins on a point-by-point basis,
which is not sensitive to the trends in the data. Consequently, many false patterns not
representing the weekly pattern were generated. In contrast, the trend based algorithm
preserves the weekly pattern through the smoothing curve and the ∪ shapes and ∩ shapes
on that curve. For the “Normal” data set, the trend based algorithm achieves perfect
periodicity detection across all confidence thresholds. For the “Noisy” data set, except at
the very high confidence threshold of 90%, the trend based algorithm finds most periods
correctly (19 or 22 out of 26) while returning only a few false positives, yielding F-measures
of 0.78–0.85, significantly higher than those of the WARP algorithm. This study
suggests that the trend based algorithm is able to detect periodicity accurately.
5.3.4 Effect of Smoothness Levels
The smoothness level affects the level of detail modeled by the trend based algorithm.
We study this effect and summarize the results in Table 5.3 and Table 5.4. Consider
Table 5.4 for example. When the smoothness level is low, say level 1, TP and FP are
extremely low because the smoothing curve models many details, which leads to many
small ∪ shapes and ∩ shapes largely contributed by the noise in the raw data. When
DTW is applied to the shape sequence, most of such shapes are considered dissimilar.
When the smoothness increases to level 3, the smoothing curve correctly models a se-
quence of ∪ shapes and ∩ shapes corresponding to high usage during the weekdays and low
usage at the weekend. With such a sequence of ∪ shapes and ∩ shapes, the trend based
algorithm finds all real periods with no false positives.
When the smoothness reaches level 5, the smoothing curve is rather flat and there are
only a few large ∪ shapes and ∩ shapes, each spanning more than one week's data, so the
weekly periods are not detected.
Table 5.3: Trend based algorithm for the “Noisy” data (confidence threshold set as 70%)
Smoothness Level   TP  FP  FN    P     R     F
1                   0   0  26    –     0%    –
2                   4   0  22  100%   15%   27%
3                  22   4   4   85%   85%   85%
4                  14   1  12   93%   54%   68%
5                   2   0  24  100%    8%   14%
Table 5.4: Trend based algorithm for the “Normal” data (confidence threshold set as 70%)
Smoothness Level   TP  FP  FN    P      R      F
1                   1   1  25   50%    4%     7%
2                  23   0   3  100%   88%    94%
3                  26   0   0  100%  100%   100%
4                  19   1   7   95%   73%    83%
5                   0   0  26    –     0%     –
Clearly, a proper choice of the smoothness level is crucial. In practice, the user does not
have to make this choice in advance. As mentioned in Section 4.3, we have developed
a software tool with a user-friendly interface that lets the user slide a bar for the
smoothness level and displays the corresponding smoothing curve interactively. Based on
visual inspection of the fit between the smoothing curve and the time series, the user can
adjust the smoothness level with the slider and immediately see a new smoothing curve at
the adjusted level. Typically, after several trials the user converges to a desired smoothness
level. In our case, five users have worked with the tool. At the beginning they needed four
to five trials on average to obtain a smoothing curve that yields good periodicity detection
results; once they became more familiar with the tool, the number of trials dropped to two
or three.
5.3.5 Effect of Discretization on WARP
One of our observations is that the uniform binning assumed in previous work can distort
the trends in the data. To validate this observation, we vary the number of bins in the
WARP algorithm and examine whether any choice produces a better result. The findings
are reported in Table 5.5 for equi-width binning and in Table 5.6 for equi-depth binning.
The “Normal” data set was used in this experiment.
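The two binning schemes differ only in how bin boundaries are chosen: equi-width splits the value range evenly, while equi-depth places boundaries so that each bin holds roughly the same number of points. A small Python sketch contrasting them; the simple empirical-quantile rule used for equi-depth boundaries here is one common choice and an assumption about the original implementations.

```python
def equi_width_edges(values, k):
    """k+1 boundaries of k equally wide bins over the value range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(k + 1)]

def equi_depth_edges(values, k):
    """k+1 boundaries so each bin holds roughly the same number of
    points: empirical quantiles of the sorted values."""
    s = sorted(values)
    n = len(s)
    return [s[min(i * n // k, n - 1)] for i in range(k + 1)]

# Skewed data: most weekday readings cluster in the narrow range [4, 5].
data = [4.1, 4.2, 4.3, 4.4, 4.6, 4.8, 4.9, 1.0, 1.5, 8.0]
print(equi_width_edges(data, 4))  # [1.0, 2.75, 4.5, 6.25, 8.0]
print(equi_depth_edges(data, 4))  # [1.0, 4.1, 4.4, 4.8, 8.0]
```

On such data, equi-depth binning drops several boundaries inside [4, 5], splitting the stable weekday level across bins, which is precisely the trend-destroying effect discussed below.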
Table 5.5: WARP on “Normal” data (confidence threshold set as 70%, equi-width binning)
Number of Bins   TP  FP  FN    P     R     F
3                26  45   0   37%  100%   54%
4                26  39   0   40%  100%   57%
5                26  33   0   44%  100%   61%
6                22  24   4   48%   85%   61%
7                15  21  11   42%   58%   48%
Even though the “Normal” data set has a strong weekly periodicity, for all numbers of
bins tested and for both binning methods, P (precision) is low due to many false positives.
A larger confidence threshold would not help because it reduces TP of the WARP algorithm,
as shown in Table 5.2. In fact, we did not observe any “proper” number of bins. The
reason is that the periodicity pattern follows neither equi-width binning nor equi-depth
binning. For example, in Figure 5.7 the daytime consumption on weekdays tends
Table 5.6: WARP on “Normal” data (confidence threshold set as 70%, equi-depth binning)
Number of Bins   TP  FP  FN    P     R     F
3                26  47   0   36%  100%   53%
4                26  35   0   43%  100%   60%
5                26  34   0   43%  100%   60%
6                14  29  12   33%   54%   41%
7                 9  16  17   36%   35%   35%
to stay within the narrow range [4, 5], and in this case equi-depth binning divides this
range into several bins, which clearly destroys the underlying trends. The trend based
algorithm does not have this problem because it uses a smoothing curve to model the trends
in the data.
5.3.6 Multiple Periodicities
Time series data often have multiple periodicities (i.e., daily periodicity, weekly periodicity,
etc.) at the same time. Another advantage of the trend based algorithm is its ability to
detect different periodicities. This is done by using different smoothness levels to model
the trends at different detail levels. We explain this point using the five weeks data set in
Figure 5.8 and Figure 5.9.
In Figure 5.8, the smoothing curve was generated with smoothness level 3 and models
the daily trend by a ∪ shape followed by a ∩ shape. This pattern occurs approximately
every day, with the weekend having slightly lower values. With such a sequence of ∪ shapes
and ∩ shapes, a daily periodicity will be found by the trend based algorithm.
In Figure 5.9, the smoothing curve was generated (on the same data) with smoothness
level 5 and models the weekly trend by one ∩ shape for the weekdays and one ∪ shape for
the weekend. Unlike the smoothing curve in Figure 5.8, the detailed change between
daytime and nighttime on each day is not modeled. With such a sequence of shapes, the
trend based algorithm will find the weekly periodicity.
As discussed in Section 4.3, a user-friendly interface and visualization tool will help the
user to identify a proper smoothness level.
Figure 5.8: Five weeks' data with daily patterns, smoothness level 3
Figure 5.9: Five weeks' data with weekly patterns, smoothness level 5
Chapter 6
Conclusion and Future Work
Load curve data cleansing is an essential task in power systems. With high-quality load
curve data, load forecasting, system analysis, operation modeling and planning studies of
power systems become more accurate, and the reliability of power systems can therefore
be improved. In this thesis, a novel class of X-outliers, which are consequences of various
random factors, is presented. We argue that traditional smoothing techniques, which take
into account only local information, are not suitable for detecting X-outliers. A four-step
approach is proposed to detect and repair X-outliers: smoothing the load curve, representing
the smoothing curve as a sequence of ∪ shapes and ∩ shapes, identifying X-outliers, and
repairing them.
Outlier detection in time series data usually involves periodicity detection. For periodic-
ity detection, previous work assumes that real-valued data points can be properly discretized
into a small number of bins and treats a time series as a sequence of discrete symbols. A
major drawback of this approach is that much information is lost, because discretization
does not preserve the trends in the data. Another drawback is that it is difficult to specify
a proper number of bins, and a uniform binning scheme is not suitable for a time series
whose different parts have different characteristics.
The trend based approach proposed in this thesis addresses these problems by modeling
the trends in the data by a sequence of ∪ shapes and ∩ shapes. These shapes represent
the most interesting information in the data and are extracted from a smoothing curve
that approximates the time series data. The periodic patterns are detected by finding the
re-occurrence of subsequences of ∪ shapes and ∩ shapes, taking into account the similarity
of such shapes. The proposed approach is trend preserving, noise resilient, and flexible for
detecting multiple periodicities.
Both the outlier detection and the periodicity detection algorithms proposed in this thesis
face the challenge of determining the best smoothing parameter for time series of different
lengths. In our application, user interaction is involved: with a user-friendly interface, the
user can easily find the best smoothing parameter after several trials.
The time complexity of the proposed outlier detection and periodicity detection algorithms
is O(n²), because generating a smoothing curve takes O(n²) time. A future direction of
research is to develop faster algorithms while maintaining high accuracy.