Post on 24-Jul-2020
TRANSCRIPT
TIME SERIES & MACHINE LEARNING
PhD Luis Miralles
INDEX:
1.- What are Time Series and their relation to Machine Learning?
2.- Time series supervised learning: Human activity recognition
3.- Time series unsupervised learning: Similarity measures
4.- Time series clustering
1.- What are Time Series and their relation to
Machine Learning?
What are time series?
A series of values of a quantity obtained at successive times, often with equal
intervals between them.
Sampling techniques
Sampling is the process of transforming continuous data into discrete data. There
are basically two ways of sampling: one based on time (Riemann) and the other
based on the behaviour of the signal (Lebesgue).
Time series sampling techniques
- Riemann sampling: captures samples at fixed points in time.
- Lebesgue sampling: captures samples depending on the variation of the
output signal.
Riemann sampling
Riemann sampling captures information from the continuous signal at
equidistant time intervals (every second, every minute, ...). It is very simple
to implement, which is why it has been used for many years.
Lebesgue sampling
● It is more effective, but also more difficult to implement.
● Advantages: it increases the battery life of the sensors, reduces network traffic
by decreasing the amount of information transferred, and uses fewer computing
resources.
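The two sampling schemes can be sketched as follows; the sine signal, sampling period, and threshold are invented for illustration:

```python
# Riemann (time-triggered) vs Lebesgue (event-triggered) sampling sketch.
import math

def signal(t):
    # Hypothetical continuous signal for illustration.
    return math.sin(t)

def riemann_sample(f, t_end, period):
    """Sample f at equidistant time intervals (time-triggered)."""
    t, out = 0.0, []
    while t <= t_end:
        out.append((t, f(t)))
        t += period
    return out

def lebesgue_sample(f, t_end, delta, dt=0.001):
    """Emit a sample only when the output has changed by at least
    `delta` since the last emitted sample (event-triggered)."""
    t = 0.0
    out = [(t, f(t))]
    while t <= t_end:
        if abs(f(t) - out[-1][1]) >= delta:
            out.append((t, f(t)))
        t += dt
    return out

riemann = riemann_sample(signal, 2 * math.pi, 0.1)
lebesgue = lebesgue_sample(signal, 2 * math.pi, 0.25)
# Around the sine's flat peaks Lebesgue sampling emits far fewer points,
# which is why it saves battery, network traffic, and compute.
```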
What is the relationship between Machine Learning
and Time Series?
What is Machine Learning?
● Arthur Samuel (1959). Machine Learning: field of study that gives computers
the ability to learn without being explicitly programmed.
● Semi-automated extraction of knowledge from data.
● The computer extracts knowledge and insight from data using algorithms.
Supervised vs Unsupervised learning
Machine Learning techniques:
- Supervised Learning: Random Forest, Support Vector Machine, Neural
Networks, Naïve Bayes, k-Nearest Neighbors.
- Unsupervised Learning: hierarchical clustering, k-means clustering.
Supervised learning
Supervised Learning splits into:
- Classification: algorithms that predict a class/label based on the inputs.
- Regression: algorithms that predict a quantity based on the inputs.
Unsupervised learning
Unsupervised Learning includes:
- k-Means
- Hierarchical clustering
Reinforcement learning
Cold start versus Warm start
Basic Machine Learning Steps
Step I: Extract information
Step II: Preprocessing (clean data, handle missing values, feature selection)
Step III: Build and optimize the model
Step IV: Apply the model to the data
Step V: Plot results
Confusion matrix performance
Confusion matrix for the binary-class and multi-class cases
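A confusion matrix can be computed directly by counting (true, predicted) label pairs; the activity labels and predictions below are invented for illustration:

```python
# Minimal multi-class confusion matrix: rows = true class, cols = predicted.
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

y_true = ["walk", "walk", "run", "sit", "run", "sit"]
y_pred = ["walk", "run",  "run", "sit", "run", "walk"]
labels = ["walk", "run", "sit"]

cm = confusion_matrix(y_true, y_pred, labels)
# The diagonal holds the correct predictions, so accuracy falls out directly.
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
```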
Cross-validation: a technique to optimise models
Most used Machine Learning metrics
2.- Time series classification and Human
activity recognition
Time series Regression
Time series classification
Overview of the HAR process
How to apply a time window to raw data
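Applying a time window to raw data can be sketched as follows; the window size of 128 samples with 50% overlap is a common HAR choice assumed here, not a value from the slides:

```python
# Segment a raw signal into fixed-size, overlapping windows.
def sliding_windows(signal, size, step):
    """Return consecutive windows of `size` samples, advancing by `step`."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

raw = list(range(1000))                            # stand-in for 1000 sensor samples
windows = sliding_windows(raw, size=128, step=64)  # step = size/2 gives 50% overlap
# Each window is later turned into one feature vector with one activity label.
```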
Methodology for HAR systems
Why is Deep Learning so popular?
IMU: Inertial measurement unit
An inertial measurement unit (IMU) is an electronic
device that measures and reports a body's specific
force, angular rate, and sometimes the magnetic field
surrounding the body, using a combination of
accelerometers and gyroscopes, sometimes also
magnetometers.
IMUs are typically used to manoeuvre aircraft,
including unmanned aerial vehicles (UAVs), among
many others, and spacecraft, including satellites and
landers.
Feature selection saves time and improves accuracy
Filter vs Wrapper: selecting features individually vs selecting features by subsets
PCA: Principal component analysis
Feature selection in HAR
Feature extraction from the window
Time-domain and frequency-domain features
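Extracting one feature vector per window can be sketched as follows; the specific features (mean, standard deviation, spectral energy) are common choices assumed for this sketch:

```python
# One time-domain and one frequency-domain feature per window.
import cmath, math

def dft(x):
    """Plain O(n^2) discrete Fourier transform, enough for a sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def window_features(window):
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    spectrum = dft(window)
    # Spectral energy, skipping the DC component (index 0).
    energy = sum(abs(c) ** 2 for c in spectrum[1:]) / n
    return {"mean": mean, "std": std, "energy": energy}

feats = window_features([0.0, 1.0, 0.0, -1.0] * 8)   # 32-sample toy window
```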
Confusion matrix
HAR: Accelerometer, Magnetometer and Gyroscope
▪ The accelerometer measures the total acceleration on the vehicle, including the
static acceleration from gravity that it experiences even when it is not moving.
▪ The magnetometer measures the magnetic field around the robot, including the
static magnetic field pointing approximately north caused by the Earth.
▪ The gyroscope measures the instantaneous angular rate around each
axis: basically, how fast it is rotating.
Human activity recognition steps
Evaluation
Number of samples per activity
Number of samples per user
Values per axis for the walking activity
3.- Time series similarity measures
The Euclidean distance between ts1 and ts3 is smaller than the ED between ts2 and ts3.
Euclidean distance is not always the best similarity measure.
Similarity distance metrics I
● Euclidean distance
● Manhattan distance
● Minkowski distance
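These three lock-step metrics are related: Minkowski with p=1 gives Manhattan and p=2 gives Euclidean. A minimal sketch on invented toy series:

```python
# Minkowski distance generalizes Manhattan (p=1) and Euclidean (p=2).
def minkowski(a, b, p):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

ts1 = [1.0, 2.0, 3.0]
ts2 = [2.0, 4.0, 6.0]
manhattan = minkowski(ts1, ts2, 1)   # 1 + 2 + 3 = 6
euclidean = minkowski(ts1, ts2, 2)   # sqrt(1 + 4 + 9)
```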
Similarity distance metrics II
● Correlation distance
● Cov(X,Y) stands for the covariance of X and Y: the degree to which two
variables vary together.
● Var(X) stands for the variance of X: a measure of how much the samples
differ from their mean.
Similarity distance metrics III
● Variance
● Covariance
● Positive covariance: the two variables vary in the same way.
● Negative covariance: one variable tends to increase when the other decreases.
● Covariance is only suitable for heterogeneous pairs.
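The correlation distance can be built directly from these variance and covariance definitions, as d(X, Y) = 1 - Cov(X, Y) / sqrt(Var(X) Var(Y)); a sketch on toy data:

```python
# Correlation distance from first principles (population variance/covariance).
import math

def mean(x):
    return sum(x) / len(x)

def var(x):
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation_distance(x, y):
    return 1 - cov(x, y) / math.sqrt(var(x) * var(y))

# Perfectly correlated series have distance 0; anti-correlated, distance 2.
d_same = correlation_distance([1, 2, 3], [2, 4, 6])
d_opposite = correlation_distance([1, 2, 3], [3, 2, 1])
```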
TS similarity measures ranking
Some papers have focused on comparing the best dissimilarity
measures for time series. A remarkable one is that of Giusti,
R., & Batista, G. E. (2013), whose results are shown in Figure
1. In that paper, all these measures are tested on a large
set of datasets.
To rank the best similarity measures, 1-NN (Nearest
Neighbor) classification is used. 1-NN is a simple instance-
based classifier that depends heavily on the
similarity/dissimilarity measure employed. It is also
known to be extremely competitive with more robust,
complex classification models.
Figure 1: Best dissimilarity measures for time series
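The 1-NN evaluation scheme can be sketched as follows; the training series and labels are invented, and any similarity measure can be plugged in as `dist`:

```python
# 1-NN: label a test series with the label of its nearest training series.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def one_nn(train, test_series, dist=euclidean):
    """`train` is a list of (series, label) pairs."""
    return min(train, key=lambda sl: dist(sl[0], test_series))[1]

train = [([0, 0, 0], "flat"), ([0, 5, 0], "spike")]
label = one_nn(train, [0, 4, 1])   # closer to the "spike" prototype
```

Swapping `dist` for DTW, correlation distance, etc. is how different measures are ranked against each other.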
K-NN algorithm: training set and testing set
How can we calculate how good a new TSM is?
Time series most used similarity measures
● The most used similarity measures are Euclidean, DTW, Pearson, Spearman, and Cosine.
The Minkowski, Mahalanobis and Manhattan distances are also very well-known measures.
● Other interesting methods are LCSS (Longest Common SubSequence) and the edit-based
techniques ERP (Edit Distance with Real Penalty) and EDR (Edit Distance on Real sequences).
● EDR has been shown to be robust in the presence of noise, time shifts, and data scaling.
Morse, M. D., & Patel, J. M. (2007, June).
● SAX is a novel algorithm which, due to its outstanding performance, can be interesting to
test. Lin, J. et al. (2007).
Time series similarity measures classification
1.- Shape-based distances
1.1.- Lock-step measures: Lp distances, DISSIM, Short Time Series distance (STS),
cross-correlation based, Pearson correlation based, CORT distance.
1.2.- Elastic measures: Frechet distance, Dynamic Time Warping (DTW), Keogh_LB
for DTW, Edit Distance on Real Sequences (EDR), Edit Distance with Real
Penalty (ERP), Longest Common Subsequence (LCSS).
2.- Feature-based distances: (partial) autocorrelation based, Fourier decomposition
based, TQuest, wavelet decomposition based, (integrated) periodogram based,
SAX representation based, spectral density based.
3.- Structure-based distances
3.1.- Model-based: Piccolo distance, Maharaj distance, cepstral-based distances.
3.2.- Compression-based: compression-based distances, complexity invariant
distance, permutation distribution based distance.
4.- Prediction-based: non-parametric forecast based.
Dynamic Time Warping
Sakoe, Hiroaki, and Seibi Chiba. "Dynamic programming algorithm optimization for spoken word recognition." IEEE
transactions on acoustics, speech, and signal processing 26.1 (1978): 43-49. (5000 citations)
Dynamic time warping (DTW) is a well-known technique to find an optimal alignment
between two given (time-dependent) sequences under certain restrictions.
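A minimal sketch of the textbook DTW dynamic-programming recurrence; the absolute-difference local cost and the absence of a warping-window constraint are assumptions of this sketch:

```python
# Dynamic Time Warping via the classic O(n*m) dynamic program.
import math

def dtw(a, b):
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])     # local cost of pairing a[i-1], b[j-1]
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# The same shape shifted in time: DTW aligns the peaks, so the distance is 0,
# whereas a lock-step Euclidean comparison would report a large distance.
d = dtw([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0])
```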
What is DTW versus Euclidean Distance?
Dynamic Time Warping
Advantages
● The DTW distance takes into account (part of) the local temporal
correlations.
● No system identification step is needed.
● Lower bounds on the distance are reasonably efficient.
● This measure allows calculating distances between time series
of different lengths.
Disadvantages
● There is no clear link between this distance measure and the
generating system.
● The DTW distance as such is expensive to calculate.
● This measure does not take the input into account.
Symbolic Aggregate Approximation (SAX)
● SAX is the first symbolic representation for time series that allows both
dimensionality reduction and indexing with a lower-bounding distance measure,
applicable in classic data mining tasks such as clustering, classification, and indexing.
● SAX is as good as well-known representations such as the Discrete Wavelet
Transform (DWT) and the Discrete Fourier Transform (DFT), while requiring less
storage space.
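The SAX pipeline (z-normalization, PAA reduction, Gaussian equal-probability breakpoints) can be sketched as follows; the input series, segment count, and alphabet size are invented for illustration:

```python
# SAX sketch: z-normalize, PAA, then discretize into a symbol string.
import math
from statistics import NormalDist

def sax(series, n_segments, alphabet="abcd"):
    # 1. z-normalize the series
    m = sum(series) / len(series)
    s = math.sqrt(sum((v - m) ** 2 for v in series) / len(series))
    z = [(v - m) / s for v in series]
    # 2. PAA: mean of each equal-length segment
    seg = len(z) // n_segments
    paa = [sum(z[i * seg:(i + 1) * seg]) / seg for i in range(n_segments)]
    # 3. breakpoints splitting N(0,1) into equal-probability bins
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    word = ""
    for v in paa:
        idx = sum(v > bp for bp in breakpoints)   # count breakpoints below v
        word += alphabet[idx]
    return word

word = sax([1, 1, 2, 2, 9, 9, 8, 8], n_segments=4)   # low-low-high-high shape
```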
4.- Clustering approach
What is clustering?
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
When do we have to apply clustering?
• A clustering problem can be viewed as unsupervised classification.
• Clustering is appropriate when there is no a priori knowledge about the data.
• It finds the class labels and the number of classes directly from the data (in contrast to classification).
• More informally, it finds natural groupings among objects.
Intraclass and interclass similarity
Organizing data into classes such that there is:
● High intra-class similarity
● Low inter-class similarity
Types of clustering:
Clustering is subjective
What is a natural grouping among these objects?
(Possible groupings: School Employees, Simpson's Family, Males, Females)
But at the same time... we can detect similarity.
Two Types of Clustering
• Partitional algorithms: construct various partitions and then evaluate them by some criterion (we will see an example called BIRCH).
• Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion.
Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree.
• Can be visualized as a dendrogram, a tree-like diagram that records the sequence of merges or splits.
Hierarchical clustering
• Advantages
• Good visualization
• Gives similarity distances between clusters
• Disadvantages
• Poor performance on large datasets
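A sketch of bottom-up (agglomerative) clustering with single linkage on invented 1-D points; each recorded merge corresponds to one node of the dendrogram:

```python
# Agglomerative clustering, single linkage: repeatedly merge the two
# clusters whose closest members are nearest, recording each merge.
def single_linkage(points):
    clusters = [[p] for p in points]
    merges = []   # (merged cluster, merge distance) pairs, in merge order
    while len(clusters) > 1:
        # pick the pair (i, j) with the smallest point-to-point gap
        d, i, j = min((min(abs(a - b) for a in ci for b in cj), i, j)
                      for i, ci in enumerate(clusters)
                      for j, cj in enumerate(clusters) if i < j)
        merged = sorted(clusters[i] + clusters[j])
        merges.append((merged, d))
        del clusters[j]          # j > i, so delete j first
        clusters[i] = merged
    return merges

merges = single_linkage([1.0, 1.2, 5.0, 5.1])
# The tight pairs merge first; the final merge spans the big gap,
# which is exactly what the dendrogram's heights would show.
```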
Partitional algorithms (K-means):
The objective of K-means is simple: group similar data points together and discover underlying patterns. To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset.
Algorithm k-means
1. Decide on a value for k.
2. Initialize the k cluster centers (randomly, if necessary).
3. Decide the class memberships of the N objects by assigning them to the nearest cluster center.
4. Re-estimate the k cluster centers, assuming the memberships found above are correct.
5. If none of the N objects changed membership in the last iteration, exit. Otherwise go to 3.
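Steps 2 to 5 above can be sketched for 1-D points; the toy data and starting centers are invented for illustration:

```python
# k-means for 1-D points, following the algorithm's steps directly.
def k_means(points, k, centers):
    while True:
        # step 3: assign each point to its nearest center
        members = [[] for _ in range(k)]
        for p in points:
            members[min(range(k), key=lambda c: abs(p - centers[c]))].append(p)
        # step 4: re-estimate each center as the mean of its members
        new_centers = [sum(m) / len(m) if m else centers[c]
                       for c, m in enumerate(members)]
        # step 5: stop once the centers (and thus memberships) are stable
        if new_centers == centers:
            return centers, members
        centers = new_centers

centers, members = k_means([1.0, 2.0, 8.0, 9.0], k=2, centers=[0.0, 10.0])
```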
Comments on the K-Means Method
Pros:
• Relatively efficient
• Often terminates at a local optimum
Cons:
• Not applicable to categorical data
• Need to specify the number of clusters
• Unable to handle noisy data and outliers
Time series k-means
Time series dendrogram