multistage neural networks for pattern recognition829352/fulltext01.pdfmultistage neural networks...
TRANSCRIPT
Master ThesisComputer ScienceThesis no: MSE-2009:34May 2009
Multistage neural networks for
pattern recognition
Maciej Zieba
School of EngineeringBlekinge Institute of TechnologyBox 520SE – 372 25 RonnebySweden
This thesis is submitted to the School of Engineering at Blekinge Institute of Tech-
nology in partial fulfillment of the requirements for the degree of Master of Science in
Computer Science. The thesis is equivalent to 24 weeks of full time studies.
Contact Information:
Author:
Maciej Zieba
E-mail: [email protected]
University advisors:
Jerzy Swiatek, Professor
Dept. of Computer Science and Management
Wroc law University of Technology, Poland
Ludwik Kuzniarz, Doctor
Dept. of Software Engineering and Computer Science
Blekinge Institute of Technology, Sweden
School of Engineering Internet : www.bth.se/tek
Blekinge Institute of Technology Phone : +46 457 38 50 00
Box 520 Fax : + 46 457 271 25
SE – 372 25 Ronneby
Sweden
Abstract
In this work the concept of multistage neural net-
works is going to be presented. The possibility of
using this type of structure for pattern recogni-
tion would be discussed and examined with cho-
sen problem from field area. The results of ex-
periment would be confront with other possible
methods used for the problem.
Keywords: two-stage neural network, multi-
stage neural network, multistage pattern recogni-
tion, two-stage pattern recognition, writer identi-
fication ,iconic gesture recognition, online recog-
nition
Contents
Abstract iii
Symbols and abbreviations viii
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation and goal . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Expected outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.6 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Structure of MPR model 6
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Two-stage pattern recognition model . . . . . . . . . . . . . . . . . 7
2.3 MPR model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Introduction to NN 9
3.1 Model of neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1.1 Activation function - survey . . . . . . . . . . . . . . . . . . 10
3.1.2 Model of perceptron . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Multilayer Feedforward Network (MFN) . . . . . . . . . . . . . . . 12
3.2.1 Taxonomy of Neural Networks . . . . . . . . . . . . . . . . . 12
3.2.2 Model of MFN . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2.3 Neural networks for pattern recognition . . . . . . . . . . . . 13
4 MNN model 15
4.1 Two-stage neural network . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.1 Two-stage identification concept . . . . . . . . . . . . . . . . 15
iv
CONTENTS v
4.1.2 Two-stage neuron . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.3 Binding functions . . . . . . . . . . . . . . . . . . . . . . . . 18
4.1.4 Model of two-stage MFN . . . . . . . . . . . . . . . . . . . . 20
4.1.5 Two-stage MFN for pattern recognition . . . . . . . . . . . . 22
4.2 Multistage generalization for two-stage neural networks . . . . . . . 24
5 Introduction to estimation of MNN parameters 26
5.1 Estimation parameters for two-stage identification . . . . . . . . . . 26
5.2 Parameter estimation for neural networks - learning methods . . . . 27
5.2.1 Learning taxonomy . . . . . . . . . . . . . . . . . . . . . . . 27
5.2.2 Backpropagation algorithm . . . . . . . . . . . . . . . . . . . 28
6 MNN learning method 32
6.1 Two-stage MFN learning process . . . . . . . . . . . . . . . . . . . . 32
6.2 Multistage MFN learning process . . . . . . . . . . . . . . . . . . . 33
6.3 Parameter estimation for MFNs used for two-stage PR problem . . 34
7 Experiment 36
7.1 Experiment description . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.1.1 Description of considered pattern recognition problem . . . . 36
7.1.2 Dataset description . . . . . . . . . . . . . . . . . . . . . . . 36
7.2 Features selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.2.1 g-48 features . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.2.2 Cosine Representation . . . . . . . . . . . . . . . . . . . . . 39
7.3 Methods used for the experiment . . . . . . . . . . . . . . . . . . . 40
7.3.1 Method 1 - MFN network . . . . . . . . . . . . . . . . . . . 40
7.3.2 Method 2 - Two-stage pattern recognition system with two
MLF networks on each stage . . . . . . . . . . . . . . . . . . 41
7.3.3 Method 3 - Two-stage MFN for PR . . . . . . . . . . . . . 42
7.4 Testing and validation details . . . . . . . . . . . . . . . . . . . . . 43
7.5 Results of experiment . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.6 Method 4 - Two-stage smart switching . . . . . . . . . . . . . . . 45
8 Conclusions 46
9 Future Works 49
Bibliography 50
CONTENTS vi
A Features description 53
List of Figures
2.1 Pattern recognition model . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Two-stage pattern recognition model . . . . . . . . . . . . . . . . . . 7
3.1 Simple neuron model . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Bipolar sigmoid function . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Unipolar sigmoid function . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Simple perceptron model . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.5 Model of Multilayer Feedforward Network . . . . . . . . . . . . . . . 13
3.6 Pattern recognition model with neural networks . . . . . . . . . . . . 14
4.1 Two-stage identification model . . . . . . . . . . . . . . . . . . . . . . 15
4.2 Two-stage neuron model . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.3 Two-stage neuron model with direct dependency between stages . . . 19
4.4 Two-stage Multilayer Feedforward Network model . . . . . . . . . . . 22
4.5 Two-stage pattern recognition procedure using neural networks . . . . 23
4.6 Two-stage neural network used for pattern recognition . . . . . . . . 23
7.1 The set of plotted icons . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.2 Example of usage Cosine Representation . . . . . . . . . . . . . . . . 39
7.3 Results of first experiment . . . . . . . . . . . . . . . . . . . . . . . . 43
7.4 Wrongly recognized icons . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.5 Results of second experiment . . . . . . . . . . . . . . . . . . . . . . 45
vii
Symbols and abbreviations
¯x - vector of features
Ψ - algorithm of recognition
Ψ - algorithm of identification
j - result of recognition
n−m− k − p - configuration of NN
W = [wi,j] - matrix of weights
x - input vector
y - output vector
f , g, h - activation functions from first, second and third layer of the
NN
γ - activation function parameter
α - stepsize
u - input vector for activation functions
o - output vector on hidden layers
F - function describing object
F - binding function
reshape(·) - procedure of reshaping matrix to vector
NN - Neural Network
MNN - Multistage Neural Network
PR - Pattern Recognition
MPR - Multistage Pattern Recognition
MFN - Multilayer Feedforward Network
The notation used in Chapter 7: Experiment is taken from items [29] and [8].
viii
Chapter 1
Introduction
1.1 Background
Pattern Recognition (PR) is one of the most important disciplines in machine
learning. The aim of PR is to classify object to one of the K possible classes. The
problem of PR is some time very sophisticated and the choice of classification method
depends on knowledge about the object. In same cases instant recognition of the
object is difficult or even impossible. For example the results of preliminary examina-
tion give no final recognition of the disease, but indicate further examination of the
patient and after couple of sub-recognizing stages final diagnose can be given. The
result of recognition of each stage indicates in which way the object is going to be
recognized on the next stage. The algorithm of recognition process depends not only
on the current input, but also on the results of recognition on directly related stages.
This problem can be found as Multistage Pattern Recognition (MPR) [14] .
Neural networks, one of the most common tools in Artificial Intelligence (AI),
are widely used for PR problems [16, 21, 31]. The idea of multistage neural networks
(MNN) is based on multistage identification concept. The parameter values of neurons
on the n stage depend on the output values of the neurons which creates the n + 1
stage. This kind of structure can be used for MPR problem. Each stage of recognition
is combined with proper stage of the MNN. The final result of recognition process is
provided by the output of the first stage of MNN.
1
CHAPTER 1. INTRODUCTION 2
1.2 Motivation and goal
Many PR problems can be easily decomposed to related subproblems. For in-
stance, The style of writing can be recognized first instead of direct handwritten
characters recognition. On the other hand, it can be much easier to recognize the
character first, than identify the writer in way determined by the previous recognition
result. There are plenty of direct methods data mining methods which can be com-
bined to be used for MPR problems, but there are only few methods directly purposed
for it. The model of MNN can be used for this case.
The main goal of the thesis is to define, design and implement the MNN and use
it for chosen pattern recognition problem. The results of the experiment should be
compared to results of other methods used for the problem.
1.3 Related works
There are no visible hints of MNN in literature. Books describing various models
of NN like [13, 15, 16, 21] were analysed to find some information about MNN, but
without success. The concept of two-stage NN were presented in [3, 18] but it was
different to thesis idea.
The thesis model of MNN was inspired by the multistage identification concept.
Item [25] is a monograph which describes in detail the problem of multistage identi-
fication. The author of the book concentrates on basic aspects of the problems like
parameter estimation or choice of the best model. There are also couple of practical
examples. Despite the fact, that the item was published in 1987 the concept is very
actual, what proves related with the topic PhD thesis [9], which was defended in 2005.
The introduction to NN is presented in [21]. The book is used as a course book for
BTH course - Neural Networks. It is very interesting compendium which describes the
NN from biological background to sophisticated and complex network models. Very
useful for the topic of the thesis are also lecture notes related with the mentioned
course [7]. The notation used in this item is also used in the thesis. Very interesting
for the topic are also [15, 16]. The books describes models NN used for solving prac-
tical problems.
CHAPTER 1. INTRODUCTION 3
The problem of writer identification were chosen as PR problem. Related prob-
lems are widely described in articles: [3, 19, 23, 24, 26]. The mentioned items are
strongly related with writer identification based on analysis of text samples. The
multistage concept is presented in the thesis, so the literature for handwritten ges-
tures and characters recognition to create two-stage writer identification system were
studied [1, 8, 11, 17, 20, 29, 30]. There are couple of articles related to the problem of
iconic gesture recognition for crisis management [20, 29, 30]. The item [29] is tech-
nical report, in which the way of calculating features from trajectory based data of
handwritten signs is presented. The other two publications presents the results of
recognition using methods like NN or Support Vector Machines. The experiments
described in mentioned publications were made using fully accessible dataset which
can be found on [27]. Very interesting way of presenting the features of gesture or
character with Cosine Representation is presented in [8].
Very useful for the thesis experiment is [31]. It is a course book in which interesting
chapter about validation and testing methods can be found.
1.4 Expected outcomes
Suggestion for definition of MNN is expected as an outcome of the thesis. Effective
method for MNN parameter estimation should be presented. The results of exper-
iment with using chosen model of MNN for the MPR problem would be presented,
analysed and compared with other neural network based methods which can be used
for the problem.
1.5 Research questions
The thesis addresses the following research questions:
Research Question 1: How the MNN can be defined ?
Research Question 2: How to estimate parameters of MNN ?
Research Question 3: How to design the MNN for PR ?
CHAPTER 1. INTRODUCTION 4
Research Question 4: How MNN perform for chosen problem of PR comparing
to other methods ?
Research Question 5: How to improve MNN method for pattern recognition
to increase correctness of recognition ?
1.6 Thesis outline
Chapter 2: Structure of MPR model. In this chapter MPR model is going
to be presented and described.
Chapter 3: Introduction to NN. The basic definitions and problems related
with the neural networks are going to be presented in this section. The mechanism of
using NN for PR is going to be described. The notation used in this chapter is very
important for better understanding the thesis content.
Chapter 4: MNN model . The definition of two-stage neuron is going to be
presented in this part. The two-stage multilayer feedforward network would be also
defined in the chapter. The relation between classical two-stage recognition system
and the two-stage NN for PR is going to be described.
Chapter 5: Introduction to estimation of MNN parameters In this chap-
ter the mechanism of parameter estimation for two-stage identification model would
be described. The idea of backpropagation algorithm is going to be presented also in
this part.
Chapter 6: MNN learning method. In this section mechanism of learning
two-stage NN is going to be presented.
Chapter 7: Experiment . In this chapter experiment of using two-stage neural
network for real PR problem is going to be presented.
Chapter 8: Conclusions . In this part an effort to answer the research ques-
tion is going to be made. Additional conclusions after empirical verification of MNN
should be added.
CHAPTER 1. INTRODUCTION 5
Chapter 9: Future Work . In this section possible next steps related with the
topic of thesis are going to be described.
Chapter 2
Structure of MPR model
2.1 Introduction
The goal of the PR problems is to classify the considered object to one of K pos-
sible classes (Figure 2.1). The object is described by the previously selected features
¯x from X, the universe of possible feature values. The object is recognised using
algorithm Ψ. The vector ¯x is an input of the algorithm and the output is j value,
which represents the result of recognition [14] .
Figure 2.1: Pattern recognition process.
For some cases it is ineffective, difficult or even impossible to recognize the ob-
jects directly. The way of recognizing the object may depend on special conditions.
This conditions could be the results of recognition problem related to the object and
different to considered one. For this case the used algorithm depends not only on the
vector of features, but also on results of sub-recognition problems which takes as an
input other vectors of features related to the object. The problem is known as MPR
problem.
6
CHAPTER 2. STRUCTURE OF MPR MODEL 7
2.2 Two-stage pattern recognition model
The special case of MPR is two-stage PR model presented on the Figure 2.2. Ob-
jects considered for PR problem are represented by the features from X universe. On
each stage of recognition process the vectors of features are selected. The process of
selection on the second stage is independent and as a result the vector of the features
¯x(2) is gained. The result of the recognition on the second stage (j2) is achieved using
algorithm Ψ2. The process of selection the features on the first stage depends on the
result of classification on the second stage, so having j2 the vector ¯x(1)(j2) could be
gained. To achieve the desired result of recognition j1 algorithm Ψ1 is used. The
result of the algorithm also depends on recognition process on the second stage.
Figure 2.2: Two-stage pattern recognition model (based on [14]).
The two-stage model can be described using equations:
j1 = Ψ1(¯x(1)(j2), j2) (2.1)
j2 = Ψ2(¯x(2)) (2.2)
For further deliberation in the thesis simpler model is going to be considered:
1. The selection process on all stages is independent. It means that ¯x(1)(j2) = ¯x(1).
2. Only the result of final recognition on the first stage is considered for the prob-
lem. It means that j2 need not to be obtained at the end of recognition process.
CHAPTER 2. STRUCTURE OF MPR MODEL 8
2.3 MPR model
For MPR model each recognition process depends on results of recognition on
higher stages. The MPR model with N stages is going to be considered. On each stage
of the model the vectors of features ¯x(1),. . . ,¯x(N) are given as an input. Independent
features selection is assumed as it was mentioned for two-stage model. The MPR
model can be described using equations:
jn = Ψn(¯x(n), jn+1, . . . , jN) (2.3)
jN = ΨN(¯x(N)) (2.4)
Where n = 1,. . . ,N − 1. Recognition algorithms on each stage are denoted as Ψ1,
Ψ2, . . . , ΨN .
The special case of MPR model is cascade model where the result of recognition
of n-th stage besides ¯xn value depends on only on jn+1. For this model equations 2.3
are replaced by:
jn = Ψn(¯x(n), jn+1) (2.5)
There are plenty of algorithms used for PR. The most common are methods,
which uses supervised kind of learning. They can be the rules or decision trees [31]
for nominal and real sort of features. If the features are only real, algorithms like
k-NN, support vector machines (SVM) [31], or neural networks (NN) could be used.
For the thesis only algorithms based on NN are considered.
Chapter 3
Introduction to NN
3.1 Model of neuron
The model of simple neuron is well defined in literature [7,10,13,15,16,21] and is
shown in Figure 3.1. Model of the neuron could be described by the equation:
y = f(u) = f(n∑j=1
wjxj) (3.1)
The input of network is denoted as vector x = [x1, . . . , xn]. Weights w1,. . . ,wn
are parameters of the neuron which are being estimated while training. Function
f(u) is call activation function. In some cases activation function depends also on
parameter value: f(u, γ). The function parameter in most cases is chosen arbitrary
but literature provides solutions for some types of activation function to find optimal
values of parameters while training [10,12,32].
Figure 3.1: Simple neuron without bias term.
9
CHAPTER 3. INTRODUCTION TO NN 10
3.1.1 Activation function - survey
There are usually three types of activation function used in neurons:
1. Linear function.
2. Bipolar sigmoid function.
3. Unipolar sigmoid function.
Linear function as a activation function does not contain any parameters. It could
be easily proved for single neuron with N inputs and linear activation function (
f(u) = au ):
f(N∑j=1
wjxj) = a
N∑j=1
wjxj =N∑j=1
awjxj =N∑j=1
w′
jxj (3.2)
In bipolar sigmoid function the parameter γ called slope is used. The function,
called also tangent hyperbolic is described by the equation:
f(u) = tangh(γu) =1− e−γu
1 + e−γu(3.3)
Figure 3.2: Plot of the bipolar sigmoid function for different γ values.
The plot of the bipolar sigmoid function was presented on the Figure 3.2. The
possible values for the function are between -1 to 1. When values between 0 and 1
are suspected on the output then the unipolar sigmoid function could be used (Figure
3.3). The equation for this function is presented below:
f(u) = tangh(γu) =1
1 + e−γu(3.4)
CHAPTER 3. INTRODUCTION TO NN 11
Figure 3.3: Plot of the unipolar sigmoid function for different γ values.
In some cases the parameter γ = 1 is chosen arbitrary. However, in some cases
improperly chosen γ value could destroy results of well projected neural network.
3.1.2 Model of perceptron
There is also possibility that the output of the neuron can take boolean values.
For this case the activation function is signum function. This model is called the
perceptron (Figure 3.4).
Figure 3.4: Simple perceptron model.
The θ parameter is called shifting or threshold value. The output y takes the
values from {0, 1}, but vectors x, w ∈ <n.
Perceptrons are widely used for separation linearly separable classes.
CHAPTER 3. INTRODUCTION TO NN 12
3.2 Multilayer Feedforward Network (MFN)
3.2.1 Taxonomy of Neural Networks
The following taxonomy can presented according to the literature [2] :
1. Feedforward networks
(a) Single-layer perceptron.
(b) Multilayer perceptron (MLP).
(c) Radial Basis Functions (RBF) nets.
2. Recurrent (feedback) networks.
(a) Competitive networks.
(b) Kohonen’s self-organizing nets.
(c) Hopfield networks.
(d) Adaptive Resonance Theory (ART) models.
Feedforward networks are the models of the networks without loops and recurrent
networks contain feedback connection between output and input of network.
There is also ambiguity in taxonomy related to MLP model. In literature [8, 10,
20, 30] the MLP model is defined as the network consisted of neurons with default
activation functions, not only of perceptrons. For thesis instead of MLP the Multilayer
Feedforward Network (MFN) name is going to be used [7, 21]. The (RBS) nets are
going to be categorized as Modular Neural Networks. In the paper only MFN models
are going to be considered as the components of MNN, however it is interesting to
analyze the other types of networks for this case in the future.
3.2.2 Model of MFN
The model of MFN is presented in Figure 3.5. There is scientifically proved that
3 layer networks are sufficient for solving most desired problems under condition,
that sufficient number of neurons is used [10, 22]. The configuration of the network
is n − m − k − p, what means that first, second and third layer consists of m, k,
p neurons respectively. Functions f , g, h are activation functions of neurons in the
layers. Weights for each layer are described by matrices W [1] = [w[1]i,j ], W
[2] = [w[2]i,j ],
CHAPTER 3. INTRODUCTION TO NN 13
W [3] = [w[3]i,j ]. The input vector is denoted as n+1 size vector x = [x1, x2, . . . , xn, xn+1]
which includes the bias term xn+1 = 1. The i-th output value of the first layer (o[1]i )
can be calculated in following way:
o[1]i = f(
n+1∑j=1
w[1]i,jxj) (3.5)
The input vector for the next layer o[1] is constructed using output values o[1]i and
extended with bias term o[1]m+1 = 1. The output values of the second layer o
[2]i can be
gained in similar way:
o[2]i = f(
m+1∑j=1
w[2]i,jo
[1]j ) (3.6)
The process is repeated on the last layer and the output values of the network
yi = o[3]i are equal:
yi = f(k+1∑j=1
w[3]i,jo
[2]j ) (3.7)
Figure 3.5: Model of MFN
The MFN model can be described using matrix equation:
y = h(W [3][g(W [2][f(W [1]xT ), 1]T ), 1]T ) (3.8)
3.2.3 Neural networks for pattern recognition
Neural networks are widely used to solve pattern recognition problems. The recog-
nition process using neural network is presented in Figure 3.6. It is possible to educe
three steps of Ψ algorithm in this case: pre-processing, calculating the neural network
CHAPTER 3. INTRODUCTION TO NN 14
output values, interpreting the network output and mapping to the result of recogni-
tion [5] .
Figure 3.6: Pattern recognition procedure using neural networks.
The pre-processing is the process in which the input vector of features ¯x is trans-
formed to the network input vector x. For instance, ¯x may contain nominal features
so using ¯x as an output directly is impossible. The other example is handwritten
character recognition problem in which each character is represented by the sequence
of coordinates with different length value. All sequences during the pre-processing
must be transformed to vectors of the same length.
In mapping procedure the network output vector y must be interpreted and trans-
formed to j value, which is the result of recognition. For example yj, the member of
the y vector can represent the the confidence that the object is recognized as a member
of j-th class. The class with highest confidence rate is going to be chosen for this case.
The pre-processing procedure strongly depends on vector of features. The mapping
process is more independent because there are universal methods of transferring the
network output to classification result.
Chapter 4
MNN model
4.1 Two-stage neural network
4.1.1 Two-stage identification concept
The idea of two-stage neural network is based on two-stage identification problem
[9,25]. The model of two-stage identification system can be observed in Figure 4.1 .
System can be described using two equations:
y = F1(x1, a1) (4.1)
a1 = F2(x2, a2) (4.2)
Figure 4.1: Two-stage identification model [25].
The object O1 (the object on the first stage) is described by F1 function. The
output y for object on the first stage depends on the input x1 and the parameter a1,
which is the output of the object on the second stage (O2). The object O2 is described
by F2 function which depends on x2 input and parameter a2.
15
CHAPTER 4. MNN MODEL 16
To construct the two-stage identification system, which will be able to solve iden-
tification problem it is necessary to define the model of the function on each stage.
Then, the parameters of the function should be estimated to give the the system op-
portunity to give satisfactory identification results.
The complex network system presented in the thesis combines the two-stage (mul-
tistage) identification concept with MFN model. For this case considered objects are
neural networks. Influence on parameters of one network made by another is observed
in this case. Two types of parameters are going to be considered: weight values and
activation function parameters.
4.1.2 Two-stage neuron
The model of two-stage neuron is going to be presented first. Consider the struc-
ture presented in Figure 4.2. It consists of l + 1 neurons. One neuron, which can be
identify using parameters: vector of weights w(1) = [w(1)1, . . . , w(1)n(1)] , activation
function parameter γ, is the first stage neuron. The constant model of the activation
function f with single, scalar parameter is assumed in this case. All parameters of the
neuron are dependent on the second stage neurons. There are l neurons on the sec-
ond stage which can be identified by weight vectors w(2, i) = [w(2, i)1, . . . , w(2, i)n(2)i],
where i = 1, . . . , l. The activation function parameters were omitted on the second
stage, because the constant values of them were assumed to simplify the notation.
The binding items between stages are functions: F1, . . . , Fn(1)+1. They take as an
input the output values of second stage neurons and give the parameter values of first
stage neuron as an output.
The two-stage neuron can be defined as a structure of simple neurons. One neuron
described by equation:
y(1) = f(γ,
n(1)∑j=1
w(1)jx(1)j) (4.3)
and l neurons described by equations:
y(2)i = g(
n(2)i∑j=1
w(2, i)jx(2, i)j) (4.4)
CHAPTER 4. MNN MODEL 17
Figure 4.2: Two-stage neuron model.
CHAPTER 4. MNN MODEL 18
where i = 1, . . . , l. The relationships between neurons are described by binding
functions:
w(1)j = Fj(y(2)1, . . . , y(2)l) (4.5)
γ = Fn(1)+1(y(2)1, . . . , y(2)l) (4.6)
where j = 1, . . . , n(1).
In the presented two-stage model the activation function parameter γ dependency
were included. There are some trials of tuning some models of activation functions
during network training [10,12,32]. The optimal value of γ parameter can be obtained
during this process, what is necessary to estimate network parameters. However, the
constant, arbitrary chosen by the network designer γ value is going be consider in the
thesis.
4.1.3 Binding functions
The set of binding functions is created by functions Fi : Y (2)1 × · · · × Y (2)l −→V (1)i. Y (2)j is the universe of possible values for y(2)j and Vi is the universe of
possible values for w(1)i.
The most common example of binding function model is the linear dependency
between stages. In this case, the set of binding functions consists of following func-
tions:
Fi(y(2)1, . . . , y(2)l) = a(1,i)y(2)1 + · · ·+ a(l,i)y(2)l (4.7)
where i = 1, . . . , n(1). In matrix notation the set of equations is equivalent to:
w(1) = Ay(2)T (4.8)
where w(1) = [w(1)1, . . . w(1)n(1)], y(1) = [y(2)1, . . . y(2)n(1)] and A = [a(i,j)]. It
is easy to notice that in this case the set of binding functions creates one layer of
linear network. The binding function can be interpreted as a activation function
of the neuron created in this way. For instance second stage neurons can create 1
layer linear neural network. It was proved in [7] that N− layer linear neural network
CHAPTER 4. MNN MODEL 19
Figure 4.3: An example of two-stage neuron model with direct dependency between
stages.
CHAPTER 4. MNN MODEL 20
can be reduced to one layer linear neural network, so creating additional linear layer
using binding functions is pointless. It was mentioned in the thesis that three layered
MFN is sufficient for all considered problems, so again next layer created by binding
functions is useless. The model of two-stage neuron (Figure 4.2) presented in the
thesis is only theoretical and make sense only, if the second stage neurons do not create
the network structure. In practice, there is no sense to create binding functions other
than in way presented in Figure 4.3. It means that:
Fi(y(2)1, . . . , y(2)l) = Fi(y(2)i) = y(2)i (4.9)
The dependency between stages is direct. This type of binding between stages is
going to be considered in this thesis.
4.1.4 Model of two-stage MFN
To define two-stage MFN the following MFNs must be considered first. They can
be described using matrix equations:
y(1) = h(1)(W (1)[3][g(1)(W (1)[2][f(1)(W (1)[1]x(1)T ), 1]T ), 1]T ) (4.10)
y(2) = h(2)(W (2)[3][g(2)(W (2)[2][f(2)(W (2)[1]x(2)T ), 1]T ), 1]T ) (4.11)
where:
x(jstage) - input vector of the jstage stage network.
y(jstage) - output vector of the jstage stage network.
f(jstage), g(jstage), h(jstage) - activation functions from first, second and third layer of NN
situated on jstage stage.
W (jstage)[jlayer] = [w(jstage)
[jlayer]i,j ] is matrix of weights of jlayer layer for NN situated
on jstage stage.
As it could be noticed the constant value of activation function parameter were
assumed. The configuration of networks are n(1) − m(1) − k(1) − p(1) and n(2) −m(2) − k(2) − p(2) respectively. The two-stage MFN is a neural network structure
which consists of simple and two-stage neurons. First stage is composed of first stage
neurons which creates the MFN described by equation 4.10. Second stage neurons
create the last layer of second stage MFN network described by equation 4.11. The
other layers in second stage network consist of simple neurons. For instance, the
CHAPTER 4. MNN MODEL 21
i(1)-th two-stage neuron creating the first layer of first stage network and last layer
of second stage can be described using equation:
o(1)[1]i(1) = f(1)(
n(1)+1∑j(1)=1
w(1)[1]i(1),j(1)x(1)j(1)) (4.12)
and set of equations describing the neurons, which output values are directly the
values of weight on first stage of considered i(1)-th two-stage neuron:
w(1)[1]i(1),1 = h(2)(
∑k(2)+1j(2)=1 w(2)
[3]m(1)(i(1)−1)+1,j(2)o(2)
[2]j(2))
...
w(1)[1]i(1),j(1) = h(2)(
∑k(2)+1j(2)=1 w(2)
[3]m(1)(i(1)−1)+j(1),j(2)o(2)
[2]j(2))
...
w(1)[1]i(1),n(1)+1 = h(2)(
∑k(2)+1j(2)=1 w(2)
[3]m(1)(i(1)−1)+n(1)+1,j(2)o(2)
[2]j(2))
(4.13)
where:
o(jstage)[jlayer]i - output for i-th neuron of jlayer layer of network situated on jstage stage.
x(jstage)j - j − th coordinate of vector x(jstage) .
Defining the two-stage network structure using two-stage neurons is complex and
can be unclear. It is much easier to define the two-stage MFN as a two MFN networks
described by equations 4.10 and 4.11 binded by vector equation:
y(2) = [reshape(W (1)[1]), reshape(W (1)[2]), reshape(W (1)[3])] (4.14)
Function reshape changes m× n size matrix W = [wi,j] to vector of m ∗ n length,
which consists of all members of the matrix:
reshape(W ) = [w1,1w2,1, . . . , wm,1, . . . , w1,jw2,j, . . . , wm,j, . . . , w1,nw2,n, . . . , wm,n]
(4.15)
It is also important to mention, that vector of vectors is interpreted as a simple
vector with all coordinates of included vectors saving indicated order. For instance
vector v = [v1v2], where v1 = [v1,1, . . . , v1,n1 ] and v2 = [v2,1, . . . , v2,n2 ] is simply a vector
v = [v1,1, . . . , v1,n1 , v2,1, . . . , v2,n2 ].
Number of MFN parameters can be very large. In this case it is better to consid-
ered more than one network on the second stage to avoid huge number of outputs of
CHAPTER 4. MNN MODEL 22
Figure 4.4: Two-stage MFN model.
one network. It is also important to discuss the possibility of only partial dependency
between stages. In this case, the simple neurons can occur in first stage network.
However, problem of estimation independent weights is visible here. The independent
parameters should fit to the problem globally, while the dependent parameters lo-
cally, in way determined by the second stage input. The model of two stage network
with partial dependency can be transformed to complete dependency model. The
knowledge related with independent parameters would be accumulated in bias term.
4.1.5 Two-stage MFN for pattern recognition
If the schema of two-stage pattern recognition from Figure 2.2 is taken and it is
extended by taking neural networks as recognition algorithms the structure presented
in Figure 4.5 will be gained. As it was presented in Chapter 2, the algorithm on the
first stage takes as an input the result of recognition on the second stage. The result j2
can indicate what kind of neural network, type of activation function or which values
of parameters are going to be used on the first stage. The main goal of recognition
on the second stage is to ’choose’ the best model of network for the first stage. If the
three-layer feedforward network model with arbitrary chosen activation functions is
CHAPTER 4. MNN MODEL 23
assumed the behaviour of the network on first stage can be changed by manipulating
values contained in weight matrices, which are denoted as W . The values of weights
are being switched by results of recognition on the second stage.
Figure 4.5: Two-stage pattern recognition procedure using neural networks
Typical model of two-stage neural network for pattern recognition problem is pre-
sented in Figure 4.6. In this case there are no two separate algorithms on each stage.
There is only one algorithm Ψ(¯x(1), ¯x(2)) which includes complex, two-stage mecha-
nism similar to two-stage recognition model.
Figure 4.6: Two-stage neural network used for pattern recognition
Influence of second stage structure is bounded by set of possible j2 values for
two-stage pattern recognition model. The model of two-stage neural network enables
variety of possible influence determined by y. For example, vector ¯x(1) can repre-
sent the features characteristic for handwritten character. Vector ¯x(2) can contain
features describing style of writing. Using two-stage pattern recognition model writer
(or group of writers with similar handwriting style) can be identified first using Ψ2
algorithm. Then, the result of recognition j2 determines the choice of weights for the
CHAPTER 4. MNN MODEL 24
first stage network which is the recognition method used in algorithm Ψ1. Finally
the desired j2 recognition value is obtained. That could be the situation in which
the style of writing considered character would be similar to two of possible styles
almost equally. In this case the algorithm Ψ2 would choose one of two similar styles
of writing. The knowledge of similarity to other style will be lost. If two-stage neural
network is used the directly taken network output y could determinate the style of
writing composed of two similar styles. Chosen weight values in this case will enable
to recognize the character written in combined style what was unable in two-stage
pattern recognition process.
Two-stage neural networks do not consider the recognition process on the second
stage. For instance, the sub-universe X(2) ⊂ X can be taken and the clustering
algorithm can be used to gain the possible classes. The first stage network taking
the vector of features ¯x(2) would return the cluster number which would indicate the
proper weights values W .
4.2 Multistage generalization for two-stage neural
networks
The model of two-stage MFN can be extended to multistage MFN model. Multi-
stage MFN consists of N MFNs described with equations:
y(j) = h(j)(W (j)[3][g(j)(W (j)[2][f(j)(W (j)[1]x(j)T ), 1]T ), 1]T ) (4.16)
where j = 1, . . . , N . The binding between stages can be described using N − 1
equations :
y(j) = [reshape(W (j − 1)[1]), reshape(W (j − 1)[2]), reshape(W (j − 1)[3])] (4.17)
where j = 2, . . . , N .
The multilayer MFN can be used for MPR in analogical way as it was presented
for two-stage PR problem. Instead of using algorithm Ψj on each stage of recognition
process, MNN is used, which takes the vector of features related to the stage.
CHAPTER 4. MNN MODEL 25
MFN are objects, which must be described by plenty of parameters. It differs the
MNN from multistage identification, in which the number of parameters describing
the object depends on model of function. Parameter estimation can be unacceptably
time consuming even for a few stages of MNN.
Chapter 5
Introduction to estimation of MNN
parameters
5.1 Estimation parameters for two-stage identifi-
cation
It is necessary to present the learning algorithm procedure of parameters estima-
tion for two-stage identification problem as a introduction to MNN parameters estima-
tion. The model of two-stage identification is presented in Figure 4.1 and is described
by equations 4.1 and 4.2. We assume that models of functions F1, F2 of objects O1
and O2 respectively are known. The goal of identification in this case is to estimate
value of parameter a2, which is parameter of function F2 and only independent for con-
sidered system, having following training set: (x2,1, x1,1,1, y1,1),. . . ,(x2,1, x1,1,N1 , y1,N1)
, . . . , (x2,j, x1,j,1, yj,1),. . . ,(x2,j, x1,j,Nj, yj,Nj
), . . . , (x2,M , x1,M,1, yM,1),. . . ,
(x2,M , x1,M,NM, yM,NM
)
In the two-stage identification system the a1 parameter should be estimated for
each constant input value of x2. It would provide the set of input values x2,j and
corresponding output values a1,j, which are estimated values of parameter on first
stage. The estimation of parameter a1 is made M times for each constant x2,j value
(j = 1, . . . ,M ), using sequence of observed input values x1,j,1, . . . , x1,j,i, . . . , x1,j,Nj
and output values yj,1, . . . , yj,i, . . . , yj,Njon the first stage with supervised algorithm
of identification:
a1,j = Ψ1((x1,j,1, yj,1), . . . , (x1,j,i, yj,i), . . . , (x1,j,Nj, yj,Nj
)) (5.1)
26
CHAPTER 5. INTRODUCTION TO ESTIMATION OF MNN PARAMETERS 27
Each tact provides the new member of training set, which can be used for esti-
mating the parameter on the second stage. The complete sequence of input values
x2,1, . . . , x2,j, . . . , x2,M and output values a1,1, . . . , a1,j, . . . , a1,M enable estimation of
parameter a2 using supervised algorithm of estimation on second stage:
a∗2 = Ψ2((x2,1, a1,1), . . . , (x2,j, a1,j), . . . , (x2,M , a1,M)) (5.2)
The parameter a1 is estimated M times, for each constant value of input x2 on the
second stage. Each time the training set (x1,j,1, yj,1), . . . (x1,j,i, yj,i), . . . (x1,j,i, yj,Nj)
were used for the estimation. Then, the second stage parameter were gained using
training set composed of pairs: (x2,j, a1,j).
Estimation of parameters for multistage identification is analogical as those pre-
sented for two-stage example. The process of parameter estimation, also for multistage
identification model, is in detail described in [25].
Once can noticed, that this kind of estimation method can be easily used for
two-stage networks. The weights on the first stage can be estimated during training
for each input value on the second stage. The training set for second stage network
could be gained and parameter values for this network estimated. Backpropagation
algorithm for MFN is going to be presented next complete survey for MNN learning.
5.2 Parameter estimation for neural networks -
learning methods
5.2.1 Learning taxonomy
Following learning methods taxonomy can be presented for Neural networks [7] :
1. supervised learning.
2. unsupervised learning.
(a) corrective learning.
(b) reinforced learning.
CHAPTER 5. INTRODUCTION TO ESTIMATION OF MNN PARAMETERS 28
For supervised learning the training set includes input and desired output values.
The members of training set are used for parameter estimation. Backpropagation
algorithm for MFN is most common example of supervised learning. There are also
unsupervised learning methods for which only input values are known for estimation.
In this case corrective and reinforced learning division can be mentioned. First kind
of learning methods are related with real valued desired signal outputs and the second
one with boolean values of the output.
5.2.2 Backpropagation algorithm
Backpropagation algorithm is the most typical method for learning MFN. This
method is based on steepest decent method in which the parameter of estimation is
updated iteratively by subtracting the gradient of error multiplied by α parameter.
The MFN presented in Figure 3.5 would be considered for backpropagation learn-
ing. Weights values for each layer - w[jlayer]i,j are going to be iteratively estimated using
the training set (x1, t1), . . . , (xitrain, titrain
), . . . , (xN , tN) with procedure:
w[jlayer](tn+1)i,j = w
[jlayer](tn)i,j − α ∂Eitrain
∂w[jlayer](tn)i,j
= w[jlayer](tn)i,j + ∆w
[jlayer]i,j (5.3)
where:
α - stepsize
w[jlayer](tn)i,j - value of weight w
[jlayer]i,j in tn moment
Eitrainis the error value for training member (xitrain
, titrain):
Eitrain=
1
2
p∑i=1
(ti,itrain− yi,itrain
)2 =1
2
p∑i=1
e2i,itrain
(5.4)
where yitrainis the vector of output values for the input vector xitrain
. The initial
w[jlayer](t0)i,j values are randomly chosen and updating process is stopped, when the er-
ror value achieve stepping criteria.
At the beginning the estimation process is going to be considered on the last layer.
The weights would be updated in following way:
CHAPTER 5. INTRODUCTION TO ESTIMATION OF MNN PARAMETERS 29
∆w[3]i,j = −α∂Eitrain
∂w[3]i,j
= −α∂Eitrain
∂u[3]i
∂u[3]i
∂w[3]i,j
(5.5)
u(1)[3]i can be described as:
u[3]i =
k+1∑j=1
w[3]i,jo
[2]j (5.6)
It is important to mention that o[2] is in this case extended vector, what means
that o(1)[2]k+1 = 1. So it can written:
∂u[3]i
∂w[3]i,j
= o[2]j (5.7)
Next the local error on the third layer would be defined as:
δ[3]i = −∂Eitrain
∂u[3]i
= − ∂Eitrain
∂ei,itrain
∂ei,itrain
∂u[3]i
= −ei,itrain
∂ei,itrain
∂u[3]i
(5.8)
δ[3]i = −ei,itrain
∂ei,itrain
∂yi
∂yi
∂u[3]i
= ei,itrain
∂h(u[3]i )
∂u[3]i
(5.9)
So the weights on the third layer can be updated in following way:
∆w[3]i,j = αδ
[3]i o
[2]j (5.10)
Next the updating process for wages from second layer would be presented:
∆w[2]i,j = −α∂Eitrain
∂w[2]i,j
= −α∂Eitrain
∂u[2]i
∂u[2]i
∂w[2]i,j
= −α∂Eitrain
∂u[2]i
o[1]j (5.11)
as it was in previous layer, the local error δ[2]i would be defined:
δ[2]i = −∂Eitrain
∂u[2]i
= −∂Eitrain
∂o[2]i
∂o[2]i
∂u[2]i
= −∂Eitrain
∂o[2]i
∂g(u[2]i )
∂u[2]i
(5.12)
CHAPTER 5. INTRODUCTION TO ESTIMATION OF MNN PARAMETERS 30
then, the − ∂E
∂o(1)[2]i
must be calculated:
− ∂Eitrain
∂o[2]i
= −p∑j=1
∂Eitrain
∂u[3]j
∂u[3]j
∂o[2]i
= −p∑j=1
δ[3]j wj,i[3] (5.13)
So the wages are updated in following way:
∆w[2]i,j = αδ
[2]i o
[1]j (5.14)
In the same way the weights on first layer are modified:
∆w[1]i,j = αδ
[1]i xj,itrain
(5.15)
The current pair (xitrain, yitrain
) is chosen randomly during each iteration. Authors
of [7] present the backpropagation algorithm in following way:
Given
• model of the three-layer feedforward network
• training set (x1, t1), . . . , (xitrain, titrain
), . . . , (xN , tN)
• stepsize α value
• stopping criteria value ε
Find
• estimated values of weights: W [1], W [2], W 3].
Step 1
Initiate the weight matrices: W [1], W [2], W 3]. The following rule of initialization
is proposed in [7]:” Pick values randomly in the interval < −0.5 0.5 > and divide with
fan-in, which is the number of units feeding the layer.”
CHAPTER 5. INTRODUCTION TO ESTIMATION OF MNN PARAMETERS 31
Step 2
Pick a member (xitrain, titrain
) of training set. Calculate the output yitrainfor input
value xitrainusing equation 3.8. If the 1
p
∑pi=1(yi,itrain
− ti,itrain) ≤ ε then stop (ε is
maximal acceptable error value, which set before training).
Step 3
Find the weight corrections for each layer: ∆W [3] = [∆w[3]i,j ], ∆W [2] = [∆w
[2]i,j ],
∆W [1] = [∆w[1]i,j ] using equations: 5.10, 5.14, 5.15.
Step 4
Update the weight values: W [3] = W [3] + ∆W [3], W [2] = W [2] + ∆W [2] and
W [1] = W [1] + ∆W [1]. go to Step 2.
There are possible additional features improving convergence speed and gener-
alization. The most common is momentum term for which the process of updating
weight values includes the average of previous gradients. Momentum term is described
in detail in [7] and [10] .
Chapter 6
MNN learning method
The concept two-stage MFN with multistage generalization was presented in Chap-
ter 4. The process of parameters estimation for two-stage identification system was
presented in Chapter 5. In this part of the thesis the process of estimation pa-
rameters for MNN is going to be described. Author in the paper concentrate mostly
on two-stage case of MNN, but brief description about learning procedure for bigger
number of stages would be provided. It is also important to add few words about
process of estimation weights of MFNs for classical problem of two-stage PR using
NN.
6.1 Two-stage MFN learning process
The process of weights estimation for two-stage MFN is very similar to method
for parameter estimation used in two-stage identification. The model of two-stage
MFN described in Chapter 4 is known. The aim of estimation in this case is to find
weight values on second stage (values of W (2)[1],W (2)[2], W (2)[3]) with training set
(x(2)1, x(1)1,1, y(1)1,1),. . . ,(x(2)1, x(1)1,N1 , y(1)1,N1) , . . . , (x(2)j, x(1)j,1, y(1)j,1),. . . ,
(x(2)j, x(1)j,Nj, y(1)j,Nj
), . . . , (x(2)M , x(1)M,1, y(1)M,1),. . . , (x(2)M , x(1)M,NM, y(1)M,NM
).
For each constant j−th vector of values x(2)j of vector x(2) the network on the first
stage is trained with training set: (x(1)j,1, y(1)j,1), . . . , (x(1)j,i, y(1)j,i), . . . (x(1)j,Nj,
y(1)j,N(1)j) using backpropagation algorithm. In this process of estimation the values
W (1)[1]j , W (1)
[2]j ,W (1)
[3]j of weight matrices are achieved. After reshaping (see equa-
tion 4.14) the pair (x(2)j, y(2)j) is gained. The values of W (2)[1],W (2)[2], W (2)[3]
are calculated with training set (x(2)1, y(2)1), . . . , (x(2)j, y(2)j), . . . , (x(2)M , y(2)M)
using backpropagation algorithm. This values are desired result of estimation, be-
32
CHAPTER 6. MNN LEARNING METHOD 33
cause weights cumulated in matrices W (2)[1],W (2)[2], W (2)[3] are the only parameters
in two-stage MFN.
In practice, the possibility of existing the training set with sufficient number of
members for each constant value of x(2) is very low. Especially for two - stage neural
network used PR it is better to train the network not for one constant value x(2)j,
but for set of similar, related in some way set of values x(2)j,1,. . . ,x(2)j,N(2)j. In one
tact the members (x(2)j,1, y(2)j),. . . ,(x(2)j,N(2)j, y(2)j) of training set used on second
stage would be gained instead of only one pair. For two-stage MFN used for two-stage
PR the number of tacts during estimation can be equal to number of possible results
of recognition on the second stage of the two-stage recognition process.
6.2 Multistage MFN learning process
It is easy to extend the procedure of estimation for multistage MFN. The algo-
rithm of estimation the parameters for N-stage network can be presented in following
way:
Given
• model of the N-stage MFN
• training set composed of members (x(N)jN , x(N−1)jN ,jN−1, . . . , x(2)jN ,jN−1,...,j2 ,
x(1)jN ,jN−1,...,j2,j1 , y(1)jN ,jN−1,...,j2,j1), where jN = 1, . . . ,M , jN−1 = 1, . . . , NjN ,
. . . , j1 = 1, . . . , NjN ,...,j2 .
• parameters of backpropagation algorithm for MFN
Find
• estimated values of weights: W (N)[1],W (N)[2], W (N)[3].
Step 1
Set jstage = 1.
CHAPTER 6. MNN LEARNING METHOD 34
Step 2
For each constant vector of values x(jstage+1)jN ,jN−1,...,jjstage+1estimate weight ma-
trices values W (jstage)[1]jN ,jN−1,...,jjstage
, W (jstage)[2]jN ,jN−1,...,jjstage
, W (jstage)[3]jN ,jN−1,...,jjstage
using backpropagation algorithm with training set:
(x(jstage)jN ,jN−1,...,jstage+1,1, y(jstage)jN ,jN−1,...,jstage+1,1),
(x(jstage)jN ,jN−1,...,jstage+1,2, y(jstage)jN ,jN−1,...,jstage+1,2), . . . ,
(x(jstage)jN ,jN−1,...,jstage+1,NM,...,jstage+1, y(jstage)jN ,jN−1,...,jstage+1,NM,...,jstage+1
)
Step 3
Transform each of calculated in Step 2 three weights matricesW (jstage)[1]jN ,jN−1,...,jjstage
,
W (jstage)[2]jN ,jN−1,...,jjstage
, W (jstage)[3]jN ,jN−1,...,jjstage
to vector of values y(jstage+1)jN ,jN−1,...,jjstage
using reshape method . Increase jstage - jstage = jstage + 1.
If jstage < N go to Step 2.
Step 4
Estimate values of weights: W (N)[1],W (N)[2], W (N)[3] using backpropagation al-
gorithm with training sequence:
(x(N)1, y(N)1), . . . , (x(N)M , y(N)M)
This values are desired result of estimation.
For pattern recognition purpose of MNN usage it is necessary to modify the method
of estimation in way presented for two-stage NN.
6.3 Parameter estimation for MFNs used for two-
stage PR problem
Two-stage PR process with NN used as algorithms for recognition is presented in
Figure 4.5 and was described in Chapter 4. To estimate the parameters of networks
CHAPTER 6. MNN LEARNING METHOD 35
situated on first and second stage of the structure it is necessary to have sample of two
pairs of feature vectors and corresponding class values for each stage. It means, that
the following training set is known: (¯x(1)1, ¯x(2)1, j1,1, j2,1), . . . , (¯x(1)M , ¯x(2)M , j1,M , j2,M)
. It is important to highlight, that presented training set is for PR algorithms, not
for networks directly.
Second stage recognition process is totally independent and can be used separately.
To train the NN used for algorithm Ψ2 it is necessary to obtain the training set suit-
able for network estimation. The vectors of features ¯x(2)i should be transformed to
input vector x(2)i in pre-processing. Having the rules of mapping procedure it is easy
to retrieve the output vectors y(2)i from known class values j2,i. Having the train-
ing sequence for the network on the second stage consisted of pairs (x(2)i, y(2)i) the
weight values can be gained using backpropagation algorithm.
The process of estimation on the first stage is more complex. The algorithm
Ψ1 depends on result of recognition on the second stage. The values of weights are
changed each time, when different result occurs on the second stage. It is easy to
observe, that weight values should be estimated each time for all possible values of
j2. For each possible j2 = 1, . . . , K2 only the members (¯x(1)i, ¯x(2)i, j1,i, j2) are taken,
and the weight values corresponding to the current value of j2 are estimated with
training set (x(1)j2,i, y(1)j2,i) using backpropagation algorithm. As a result K2 sets
of weight values are gained and process of estimation is completed. In two-stage
pattern recognition system the second stage weight values and values of weights for
possible scenarios on the first stage must be remembered. Two-stage MFN is described
only with weight parameters creating the second stage, because values of weights are
directly dependent on second stage NN output.
Chapter 7
Experiment
7.1 Experiment description
7.1.1 Description of considered pattern recognition problem
The problem of writer identification based on online handwritten iconic gesture
is going to be considered as a chosen pattern recognition problem in the experiment.
The are plenty of literature items related to the problem of handwritten characters
recognition [1,6,8,11,17,20,26,30] , but only few in which writer recognition is consid-
ered [3,4,19,23]. In some areas correct and instant writer recognition is as important
as recognizing the written character. The domain, in which time plays extremely
important role is crisis management. For this case, the communication system must
work extremely fast. There is a set of publications [20, 29, 30] in which authors con-
sider the set of 14 icons (Figure 7.1 ) which represents emergency situations. As they
concluded, this kind of communication tool is much faster than handwriting, easy to
learn and remember and and visual meaningful shape.
All considered papers [20, 29, 30] are related to iconic gesture recognition. In the
thesis the problem of writer recognition is going to be considered. The direct solution
using MFN is going to be used in the experiment. The two-stage recognition systems
and two-stage MFN are going to be tested as alternative methods.
7.1.2 Dataset description
The dataset used for the experiment is fully available on the web page [27]. It
contains the set of icons drawn by 32 volunteers. Every participant of the experiment
36
CHAPTER 7. EXPERIMENT 37
Figure 7.1: The set of icons plotted by one of the participants (taken from [30]).
were supposed to fill out 22 paper forms. Each form contained 35 boxes arranged in 7
rows and 5 columns, two calibration crosses and an identification area. As the result,
the set consisted of 24,441 samples were gained. The set was reduced to 8 writers and
3256 samples due to time consuming data transformation process. Each sample can
be described by writer ID, icon ID and trajectory:
σi = {(xi, yi), fi, ti} (7.1)
where (xi, yi) is current position during drawing, fi is current pressure and ti is
current time of drawing.
7.2 Features selection
The writer identification techniques are in most cases text [4, 19, 23] or signature
[3, 28] based methods. For this case, writer is going to be recognized having only
one icon produced by him. It is important to highlight, that features for typical
sign recognition problem are often used for identification of the producer [3, 26]. It
seems to be natural that features like: number of straight lines or convex hull area
are the features which differs the different types of icons or characters but also the
styles of writing. In this section two types of features, which could be used for writer
and gesture recognition are going to be presented: g-48 features [29] and Cosine
Representation [8] based on Discrete Cosine Transform.
7.2.1 g-48 features
The set of g-48 features was presented in technical report related to the dataset
[29]. This features can be used not only for iconic gesture recognition, but the writer
can be also identify using them. The set consists of space-based, dynamic and force-
based features. Most of features definition are presented in Appendix A, others can
CHAPTER 7. EXPERIMENT 38
be found in mentioned technical report. Categories of used features are going to be
briefly described next.
Length and Area Features
This kind of features are related with shape of the icon. The features like length
of trajectory or the distance first and last sample are considered here. The convex
hull Area is very important in the group. It can be easily calculated using Graham
Algorithm. There are also principal components based features like orientation of the
principal axis or the centroid offset.
Direction, Curvature and Perpendicularity
This kind of subset of features is related with angles between vectors which de-
scribes current writing direction. The most important features in this case are: Aver-
age, absolute and squared curvature, perpendicularity and maximum angular differ-
ence.
Octans
Octans are eight features which describes the the distribution of sample points on
the surface. The circle covering whole icon with center in centroid point is taken and
is divided into eight, equal parts. For each part the number of in-points is calculated
as one feature.
Trajectory-based features
As mentioned in the technical report, trajectory-based features use the relations between different parts of the trajectory as features. The number of crossings seems to be the most important feature in this group for both iconic gesture and writer recognition.
Straight Lines
This kind of feature is related to the set of straight lines in the plotted icon. The technical report presents an efficient algorithm for finding the straight lines. The basic idea of the algorithm is to find lines not shorter than an arbitrarily chosen minimal length, for which the distance of a subsequent run of trajectory sample points to the line stays below a predefined threshold.
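The report's algorithm is more elaborate; the following naive greedy sketch only illustrates the idea of growing runs of points under a distance threshold (function name and default parameters are ours):

    import numpy as np

    def count_straight_lines(xs, ys, min_pts=5, tol=1.0):
        """Naive greedy sketch: extend a run of sample points while every
        interior point stays within `tol` of the chord joining the run's
        endpoints; count runs of at least `min_pts` points."""
        pts = np.column_stack([xs, ys]).astype(float)
        n, count, start = len(pts), 0, 0
        while start < n - 1:
            end = start + 1
            while end + 1 < n:
                a, b = pts[start], pts[end + 1]
                ab = b - a
                norm = np.hypot(ab[0], ab[1])
                if norm > 0.0:
                    q = pts[start + 1:end + 1]      # interior points of the run
                    d = np.abs(ab[0] * (q[:, 1] - a[1])
                               - ab[1] * (q[:, 0] - a[0])) / norm
                    if d.max() > tol:
                        break
                end += 1
            if end - start + 1 >= min_pts:
                count += 1
            start = end
        return count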
Cups
The notion of cups is very common in handwritten character analysis. Cups are U-shaped parts of the trajectory. The technical report presents three algorithms for finding cup-related features: the number of cups present in a trajectory, the offset (relative start position) of the first cup and the offset (relative end position) of the last cup.
Dynamic Features
Dynamic features are time-dependent features. This group contains quantities like the average velocity or the acceleration of the writing.
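A sketch of such features, using the central-difference definitions of velocity and acceleration from Table A.3 (function and key names are ours; assumes at least 5 samples):

    import numpy as np

    def dynamic_features(xs, ys, ts):
        """Dynamic (time-dependent) features via the central differences of
        Table A.3: V_i = (s_{i+1} - s_{i-1}) / (t_{i+1} - t_{i-1}) and the
        analogous a_i over V. Illustrative sketch."""
        s = np.column_stack([xs, ys]).astype(float)
        t = np.asarray(ts, dtype=float)
        v = np.linalg.norm(s[2:] - s[:-2], axis=1) / (t[2:] - t[:-2])
        a = (v[2:] - v[:-2]) / (t[3:-1] - t[1:-3])
        return {
            "duration": t[-1] - t[0],
            "avg_velocity": v.mean(), "max_velocity": v.max(),
            "avg_acceleration": a.mean(), "max_acceleration": a.max(),
            "max_deceleration": a.min(),
        }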
Force-based Features
This set of features considers the force (pen pressure) values along the trajectory. The average pressure is the obvious feature in this case.
7.2.2 Cosine Representation
Figure 7.2: Panel (a) contains the original plot of the icon and panel (b) presents the same icon retrieved from the cosine transform.
The sequences of both coordinates of the sample points $s_i = (x_i, y_i)$ can be transformed to the Cosine Representation using the Discrete Cosine Transformation. For a sequence of coordinates $(x_i, y_i)$, where $i = 1, \ldots, N$, the transformed coefficients $(v_k, z_k)$ are obtained in the following way:
\[ v_0 = \frac{1}{N} \sum_{n=0}^{N-1} x_n \tag{7.2} \]
\[ v_k = \frac{2}{N} \sum_{n=0}^{N-1} x_n \cos(k t_n) \tag{7.3} \]
\[ z_0 = \frac{1}{N} \sum_{n=0}^{N-1} y_n \tag{7.4} \]
\[ z_k = \frac{2}{N} \sum_{n=0}^{N-1} y_n \cos(k t_n) \tag{7.5} \]

where $k = 0, \ldots, K-1$ and $K$ is the length of the desired Cosine Representation sequence. The value $t_n$ is defined as:

\[ t_n = \frac{\pi}{N}\left(n + \frac{1}{2}\right) \tag{7.6} \]
The symbol can easily be retrieved with only a small data loss using the inverse transformation [8]. The main advantage of the Cosine Representation is that it stores the features of each symbol in sequences of the same length, so it can easily be used as an input to a neural network.
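A minimal sketch of the transform of Eqs. (7.2)-(7.6) (the function name is ours; an equivalent routine is available as the type-II DCT in common numerical libraries):

    import numpy as np

    def cosine_representation(coords, K=16):
        """First K Cosine Representation coefficients of one coordinate
        sequence, per Eqs. (7.2)-(7.6). Illustrative sketch."""
        x = np.asarray(coords, dtype=float)
        N = len(x)
        t = np.pi / N * (np.arange(N) + 0.5)          # Eq. (7.6)
        c = np.array([(2.0 / N) * np.sum(x * np.cos(k * t)) for k in range(K)])
        c[0] = x.mean()                                # Eqs. (7.2)/(7.4): DC term
        return c

    # e.g. v = cosine_representation(xs, 16); z = cosine_representation(ys, 16)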
7.3 Methods used for the experiment
The following methods and systems are used for writer recognition:

1. Method 1 - MFN for pattern recognition.

2. Method 2 - Two-stage MFN system with iconic gesture recognition on the second stage.

3. Method 3 - Two-stage MFN for PR.
7.3.1 Method 1 - MFN network
The three-layer feedforward network is used as the first method. This kind of network was presented in Figure 3.5. In each layer bipolar sigmoid functions (see Figure 3.2) are used with a constant activation parameter value γ = 1. The configuration of the network is 89−150−100−8, which means that there are 89 input values and 8 output values of a network consisting of 3 layers with 150, 100 and 8 neurons respectively. The input vector consists of 57 normalized feature values (the features are presented in Appendix A; one feature value was omitted) and 32 values representing the Cosine Representation: 16 representing the $x_i$ values and 16 representing the $y_i$ values. The output vector takes values from $I^8$, where $I = [-1, 1]$. The mapping process (see Figure 3.6) is the following: the maximal value of the output vector is taken, and the position of this value determines the recognition result. The output vector (if the output were normalized, or a unipolar function were used instead) can be interpreted as a vector of confidence rates of writer recognition; the i-th value of the output vector then relates to the i-th writer.
As the learning procedure the backpropagation algorithm was used. The value of the learning-rate parameter α was found experimentally. Each training pair consisted of two vectors: the vector of feature values of the considered object, and a target vector which has the value 1 on the i-th position (where i is the current writer's ID) and −1 on the other positions.
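A minimal sketch of the target encoding and output mapping described above (names are ours):

    import numpy as np

    def target_vector(writer_id, n_writers=8):
        """Bipolar target: +1 at the writer's position, -1 elsewhere."""
        t = -np.ones(n_writers)
        t[writer_id] = 1.0
        return t

    def decode_output(y):
        """Mapping process: the position of the maximal output value
        determines the recognized writer."""
        return int(np.argmax(y))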
7.3.2 Method 2 - Two-stage pattern recognition system with an MFN on each stage
The two-stage recognition system with two three-layer neural networks is used as the next method. The schema of two-stage recognition using NNs was presented in Figure 4.5. On the first stage the writer is recognized; the recognition process on this level depends on the recognition result of the second stage, on which the iconic gesture is recognized. The network structure is similar to the MFN used for writer recognition, and the set of features considered for writer recognition is the same as for recognizing the icons. The configuration of the network on the second stage is 89−150−100−14. The mapping process in this case is analogous to that of the direct writer recognition solution.
The process of iconic gesture recognition influences both the feature selection and the network parameter selection. Each time a different icon is plotted, the set of features is changed and the network parameter values are switched. In the experiment two sets of features are considered: the whole set, which contains the g-48 and the Cosine Representation features, and a second one, which contains only the g-48 features. As the empirical research shows, for some icons the extended set of features with the Cosine Representation gives better recognition results, but for other icons the results are worse than for the g-48 feature set alone. It is easy to observe that for different results on the second stage the input vector of the first-stage network will have different lengths. Besides the dependent feature selection, the weight values must be changed in a way determined by the second-stage recognition result. As a result there are 14 sets of possible weight values on the first stage. The number of neurons making up the first-stage network is constant; however, due to the different input vector lengths, the number of parameters in the sets of weights also differs. It is important to notice that all network parameters depend on the second-stage recognition result, so the process of switching the weights is in fact the process of switching the whole network.
The weights on the second stage were estimated using the backpropagation algorithm. This network is fully independent and can be used separately for iconic gesture recognition. For each possible output of the second stage, two sets of parameters describing the first-stage network are estimated using the backpropagation algorithm. In the first case the network is trained using the whole vector of features; the second kind of training does not include the Cosine Representation. In both cases the training set contains only samples of the considered gesture. As a result, 28 sets of weight values are obtained. From each pair of sets related to a chosen icon, the set which gave the better performance on the testing set is chosen; in practice this indicates which set of features should be used to achieve the better accuracy of writer recognition. Concluding, if the recognition result on the second stage is $j_{2,i}$, the set of features which gave the better performance on the testing set is selected. Then the weight values corresponding to the result $j_{2,i}$ are selected and used to obtain $j_{1,k}$, the recognition result on the first stage (a sketch of this switching follows).
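A sketch of the switching rule; all arguments are hypothetical placeholders (the networks are modelled as plain callables returning an output vector, and feature_subsets[j] picks the feature values used for icon class j):

    import numpy as np

    def two_stage_recognition(all_features, gesture_net, writer_nets,
                              feature_subsets):
        """Sketch of Method 2: recognize the icon on the second stage
        first, then switch to the first-stage network and feature subset
        that correspond to that icon. All arguments are hypothetical."""
        j2 = int(np.argmax(gesture_net(all_features)))    # second stage: icon
        x1 = feature_subsets[j2](all_features)            # icon-dependent input
        j1 = int(np.argmax(writer_nets[j2](x1)))          # first stage: writer
        return j1, j2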
7.3.3 Method 3 - Two-stage MFN for PR
The third method considered in the experiment is the two-stage MFN. The model of this kind of network was described in Chapter 4. The second stage of the model consists of 586 networks with an output vector of size 50. As presented in Chapter 4, each output value of the second stage is directly connected with the corresponding weight value. The input vector on the first stage consists of the g-48 features, while the vector on the second stage also includes the Cosine Representation.

The two-stage MFN was estimated using the modified method described in Chapter 6. Each tact of the estimation was related to one type of iconic gesture.
7.4 Testing and validation details
As the criterion for evaluating the performance of a recognition system, the accuracy of recognition is taken. Matlab v. 7.5 was used as the experimental tool. The dataset used for the experiment was split into three sets using stratified sampling [31]:
1. Training set 36%.
2. Testing set 24%.
3. Evaluation set 40%.
The training set is used for parameter estimation. During training, the current condition of the learning process is examined using the testing set. To avoid overfitting, those parameters are chosen for which the average of the training and testing recognition accuracies is the highest. The real recognition accuracy is checked after training using the evaluation set.
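A minimal sketch of such a stratified split, assuming only a list of class labels is given (the function name, seed and rounding policy are ours; the experiment itself used Matlab):

    import numpy as np
    from collections import defaultdict

    def stratified_split(labels, fractions=(0.36, 0.24, 0.40), seed=0):
        """Stratified train/test/evaluation split: within every class the
        indices are shuffled and divided by the given fractions, so each
        subset keeps the class proportions. Illustrative sketch."""
        rng = np.random.default_rng(seed)
        by_class = defaultdict(list)
        for i, y in enumerate(labels):
            by_class[y].append(i)
        splits = ([], [], [])
        for idx in by_class.values():
            idx = rng.permutation(idx)
            n = len(idx)
            a = int(round(fractions[0] * n))
            b = a + int(round(fractions[1] * n))
            for part, chunk in zip(splits, (idx[:a], idx[a:b], idx[b:])):
                part.extend(chunk.tolist())
        return splits  # (training, testing, evaluation) index lists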
7.5 Results of experiment
Figure 7.3: Values of the accuracy rate for the presented methods.
                     Method 1   Method 2   Method 3
Time of estimation   ∼115 s     ∼8 min     ∼19 h

Table 7.1: Times of estimation for the methods used in the experiment.
The results of the experiment are presented in Figure 7.3. The direct solution (Method 1) gave significantly better results than Method 2 and Method 3. The reason for the bad performance of Method 2 is the poor recognition rate on the second stage: the icons were recognized correctly in only about 95% of cases. Figure 7.4 presents the icons which unexpectedly occurred as a second-stage recognition result more than 2 times during testing. The process of testing during training enables the detection of unexpected outcomes on the second stage. If repeatedly bad recognition can be observed on the second stage while testing, there is a high probability that some part of the PR process on this stage does not work well. To correct this problem, Method 4 is going to be presented.
Figure 7.4: Icons which wrongly occurred as a result of icon recognition on the second stage more than 2 times while testing.
Method 3, which uses the two-stage NN, gave the worst results. The structure of the network was very complex, which was caused by the large number of parameters on the second stage: it was necessary to train 586 networks there. If the number of networks is low, it is possible to train the network a couple of times for different α values and choose the best weight values; this was impossible for Method 3. In such a complex system, a couple of badly estimated weight values on the second stage can have a big influence on the performance of the whole network.
It is also important to analyse the parameter estimation times for the considered methods. The main disadvantage of using NNs is the estimation time for complex problems. As shown in Table 7.1, the parameter estimation times for the systems were extremely large. For the single MFN it was almost two minutes, which was caused by the extensive and frequent testing during training and the slow simulation environment. For Method 2 the estimation took about 8 minutes. For these two methods a time reduction is possible, but the estimation time for Method 3 was totally unacceptable.
7.6 Method 4 - Two-stage smart switching
As mentioned before, the main reason for the bad performance of the two-stage recognition system presented as Method 2 was the insufficient recognition rate on the second stage. To eliminate this problem, the simple Method 4 is presented. Figure 7.4 presents the gestures with a high probability of being badly classified, as examined on the testing set. It can be assumed that this type of recognition mistake will also occur in practical usage of the network. This kind of incorrectness can be easily eliminated: the two-stage system presented in this section switches to the network used in Method 1 if one of the icons from Figure 7.4 occurs as the recognition result on the output of the second stage. Otherwise, the system works as the typical two-stage recognition system presented as Method 2 (a sketch of this rule follows Figure 7.5). This simple modification, which combines the two previously used methods, gave significantly better results for writer recognition (Figure 7.5).
Figure 7.5: Values of the accuracy rate for Method 4 compared to the previously used methods.
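A sketch of the smart switching rule of Method 4; as in the Method 2 sketch, all arguments are hypothetical placeholders (troubled_icons stands for the set of icon IDs from Figure 7.4):

    import numpy as np

    def smart_switching(all_features, gesture_net, writer_nets,
                        feature_subsets, direct_net, troubled_icons):
        """Sketch of Method 4: if the second stage outputs one of the
        icons from Figure 7.4, fall back to the direct Method 1 network;
        otherwise behave exactly as Method 2."""
        j2 = int(np.argmax(gesture_net(all_features)))
        if j2 in troubled_icons:            # unreliable second-stage result
            return int(np.argmax(direct_net(all_features)))
        x1 = feature_subsets[j2](all_features)
        return int(np.argmax(writer_nets[j2](x1)))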
Chapter 8
Conclusions
Research Question 1: How can the MNN be defined?
The topic of this thesis is related to a concept not yet defined in the literature. As mentioned in the introduction, there are only a couple of items related to the notion of the MNN. The idea of the MNN presented in the thesis is strongly rooted in the multistage identification concept. Before presenting the definition of the MNN it was necessary to give a brief review of NNs. As there are plenty of network models, only the MFN was chosen to be considered in this work. The definition of the multistage MFN was introduced by presenting the concept of the two-stage neuron model. It was important to consider the influence of a couple of neurons on the weight values, so some kind of binding function was needed. In practice, if two-stage neurons make up a NN, any character of binding between the stages other than fully direct makes little sense: there is no need to add a new, complex binding function, because the influence of the input vector on the second stage is accumulated in the network. Some attempts at defining the two-stage MFN in terms of the two-stage neuron were made, but it was observed that it is much easier to define it as two directly bound MFNs, characterized by the weight values on the second stage. This seems natural, because even a single MFN trained with the backpropagation algorithm is just a kind of equation with parameters estimated by the steepest descent algorithm; seeking a relationship with neurobiology seems artificial and commercial. Concluding, the two-stage MFN model is a special case of two-stage identification with an arbitrarily chosen model of the function and estimation method. The multistage MFN was defined at the end of Chapter 4.
Research Question 2: How can the parameters of the MNN be estimated?
As the relationship between multistage identification and the MNN can be observed, the process of estimating the multistage MFN is analogous to that for identification. The estimation algorithm used during each tact was simply the backpropagation algorithm. The formally described estimation algorithm was presented in Chapter 6.
Research Question 3: How can the MNN be designed for PR?
The concept of the MNN seemed to fit identification problems ideally, but the model of the MNN can also be used instead of the classical MPR. The MNN should consist of as many stages as the considered recognition problem. The network on the first stage should be designed as for a typical PR problem. The process of constructing an MNN for PR is described in detail in Chapter 4.
It is also important to highlight the need to modify the parameter estimation if the MNN is used for a PR problem. The process of tacting for each constant value of the lower-stage input can be difficult to achieve. Instead, there is the possibility of tacting for each possible recognition result related to the stage where the constant value should be assumed. This modification is described in Chapter 6.
Research Question 4: How does the MNN perform for the chosen PR problem compared to other methods?
The problem of writer identification was taken as the chosen PR problem. Three methods were considered for the problem: direct recognition with a one-stage MFN, a two-stage recognition model with iconic gesture recognition on the second stage, and a two-stage NN which took the vector of features describing the icon on the second stage and the vector of features describing the writer on the first stage. The results were best for the direct solution. The two-stage PR model performed poorly due to the low gesture recognition rate; this was corrected by the fourth method, which combined the direct and the two-stage solutions. The usage of the two-stage network for the problem was very ineffective and gave the worst results. This was caused by the overly complex structure, which made the training process on the second stage very difficult to control. The MNN should be used for problems which require a low number of parameters on the first stage.
Research Question 5: How can the MNN method for pattern recognition be improved to increase the correctness of recognition?
A good solution for improving the MNN method is to use the two-stage PR model extended with the smart switching process. This method performed best in the experiment described in Chapter 7. This kind of improvement minimizes the incorrectness on the first stage of the recognition process and in parallel maximizes the correctness of recognition on the second stage.
Chapter 9
Future Works
As mentioned in the previous chapter, the usage of the MNN for PR problems is ineffective and does not improve the accuracy of recognition. It is, however, interesting to consider the MNN for multistage identification problems. The networks should be designed in an effective way to avoid overly complex structures.
As to the experiment provided in the thesis, it would be interesting to analyse the two-stage model with various other methods used as the recognition algorithms; SVMs can be a good solution in this case. It is also worth analysing more deeply the feature selection indicated by the result of the gesture recognition.
It may also be worth studying hybrid NN systems. It could be interesting to examine the possibility of steering the Hebbian learning of a Hopfield network by a three-layer neural network.
Bibliography
[1] AKHTAR JAMEEL, Experiments with various Recurrent Neural Network Architectures for Handwritten Character Recognition, IEEE Xplore, 1994

[2] ANIL K. JAIN, JIANCHANG MAO, K. M. MOHIUDDIN, Artificial Neural Networks - A Tutorial, IEEE Xplore, 1996

[3] BALTZAKIS H., PAPAMARKOS N., A new signature verification technique based on a two-stage neural network classifier, Elsevier, 2001

[4] BENSEFIA A., NOSARY A., PAQUET T., HEUTTE L., Writer Identification By Writers Invariants, IEEE Xplore, 2002

[5] BISHOP C. M., Neural Networks for Pattern Recognition, Oxford University Press, 2005

[6] CONNELL S. D., JAIN A. K., Writer Adaptation for Online Handwriting Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 3, March 2002

[7] CORNELIUS P., GRBIC N., Neural Networks - Lecture notes for course Neural Networks (ETD007), BTH, Karlskrona, 1999

[8] DUY BUI, Classifying Online Handwriting Characters under Cosine Representation, IEEE Xplore, 2007

[9] GIERACHA J., Recursive two-stage estimation algorithms, Wroclaw University of Technology - PhD thesis, Wroclaw, 2005 (in Polish)

[10] GRBIC N., Development of a General Purpose On-Line Update Multiple Layer Feedforward Backpropagation Neural Network, Master Thesis MEE 97-04, BTH, Karlskrona/Ronneby, 1997

[11] JAEGER S., MANKEL S., REICHERT J., WAIBEL A., Online handwriting recognition: the NPen++ recognizer, Springer, IJDAR (2001) 3: 169-180

[12] KAMRUZZAMAN J., AZIZ S. M., A Note on Activation Function in Multilayer Feedforward Network, IEEE Xplore, 2002

[13] KOSINSKI R. A., Artificial Neural networks: non-linear dynamics and chaos, WNT, Warszawa, 2007 (in Polish)

[14] KURZYNSKI M., Pattern recognition - statistic methods, Oficyna Wydawnicza Politechniki Wroclawskiej, Wroclaw, 1997 (in Polish)

[15] KWASNICKA H., Evolutionary designing of neural networks, Oficyna Wydawnicza Politechniki Wroclawskiej, Wroclaw, 2007 (in Polish)

[16] KWASNICKA H., MARKOWSKA-KACZMAR U., Neural networks in practise, Oficyna Wydawnicza Politechniki Wroclawskiej, Wroclaw, 2005 (in Polish)

[17] MARUKATAT S., SICARD R., ARTIERES T., GALLINARI P., A Flexible Recognition Engine for Complex On-line Handwritten Character Recognition, IEEE Xplore, 2003

[18] MOHAMED N. AHMED, ALY A. FARAG, Two-stage neural network for volume segmentation of medical images, Elsevier, 1996

[19] NIELS R., VUURPIJL L., Generating copybooks from consistent handwriting styles, Nijmegen Institute for Cognition and Information, Radboud University Nijmegen, Nijmegen, The Netherlands, 2008

[20] NIELS R., WILLEMS D., VUURPIJL L., The NicIcon Database of Handwritten Icons for Crisis Management, Nijmegen Institute for Cognition and Information, Radboud University Nijmegen, Nijmegen, The Netherlands, 2008, link: http://unipen.nici.ru.nl/NicIcon/

[21] ROJAS R., Neural Networks - A Systematic Introduction, Springer, Berlin, 1996

[22] RUMELHART D. E., MCLELLAND J. L., Parallel Distributed Processing, Cambridge, MA: The M.I.T. Press, 1986

[23] SANTANA O., TRAVIESO C. M., ALONSO J. B., FERRER M. A., Writer Identification Based on Graphology Techniques, IEEE Xplore, 2008

[24] SCHLAPBACH A., LIWICKI M., BUNKE H., A writer identification system for on-line whiteboard data, Elsevier, Pattern Recognition 41 (2008) 2381-2397

[25] SWIATEK J., Two-stage identification and its technical and biomedical applications, Wydawnictwo Politechniki Wroclawskiej, Wroclaw, 1987 (in Polish)

[26] TENG LONG, LIAN-WEN JIN, LI-XIN ZHEN, JIAN-CHENG HUANG, One Stroke Cursive Character Recognition Using Combination of Directional and Positional Features, IEEE Xplore, 2005

[27] UNIPEN FOUNDATION, link: http://www.unipen.org/

[28] VIELHAUER C., STEINMETZ R., MAYERHOFER, Biometric Hash based on Statistical Features of Online Signatures, IEEE Xplore, 2002

[29] WILLEMS D., NIELS R., Definitions for Features used in Online Pen Gesture Recognition, Nijmegen Institute for Cognition and Information, Radboud University Nijmegen, Nijmegen, The Netherlands, 2008, link: http://unipen.nici.ru.nl/NicIcon/

[30] WILLEMS D., NIELS R., VAN GERVEN M., VUURPIJL L., Iconic and multi-stroke gesture recognition, Elsevier, Pattern Recognition (2009), doi: 10.1016/j.patcog.2009.01.030

[31] WITTEN I. H., FRANK E., Data Mining. Practical Machine Learning Tools and Techniques, Elsevier, San Francisco, 2005

[32] YAMADA TAKAYUKI, YABUTA TETSURO, Remarks on Neural Network Controller Using Different Sigmoid Functions, IEEE Xplore, 1994

[33] YANG FENG, YANG FAN, Character Recognition Using Parallel BP Neural Network, IEEE Xplore, 2008
Appendix A
Features description
Lp  Name                    Definition
1   Length                  $\Phi_1 = \sum_{n=1}^{N-1} \| s_{n+1} - s_n \|$
2   Area                    $\Phi_2 = A$
3   Compactness             $\Phi_3 = \Phi_1^2 / A$
4   Eccentricity            $\Phi_4 = \sqrt{1 - b^2/a^2}$
5   Ratio coord. axes       $\Phi_5 = b'/a'$
6   Closure                 $\Phi_6 = \sum_{n=1}^{N-1} \| s_{n+1} - s_n \| / \| s_N - s_1 \|$
7   Circular variance       $\Phi_7 = \sum_{n=1}^{N} (\| s_n - \mu \| - \Phi_{73})^2 / (N \Phi_{73}^2)$
8   Curvature               $\Phi_8 = \sum_{n=2}^{N-1} \Psi_{s_n}$
9   Avg. curvature          $\Phi_9 = \frac{1}{N-2} \sum_{n=2}^{N-1} \Psi_{s_n}$
10  Abs. curvature          $\Phi_{62} = \sum_{n=2}^{N-1} | \Psi_{s_n} |$
11  Squared curvature       $\Phi_{63} = \sum_{n=2}^{N-1} \Psi_{s_n}^2$
12  Avg. direction          $\Phi_{12} = \frac{1}{N-1} \sum_{n=1}^{N-1} \arctan \frac{y_{n+1} - y_n}{x_{n+1} - x_n}$
13  Perpendicularity        $\Phi_{13} = \sum_{n=2}^{N-1} \sin^2 \Psi_{s_n}$
14  Avg. perpendicularity   $\Phi_{14} = \frac{1}{N-2} \sum_{n=2}^{N-1} \sin^2 \Psi_{s_n}$
15  Centroid offset         $\Phi_{16} = \| p(\mu - c) \|$
16  Length princ. axis      $\Phi_{17} = \alpha$

Table A.1: Table of features g-48 (1) (taken from [30]).
17  Orient. princ. axis     $\Phi_{18} = \sin \Psi$, $\Phi_{19} = \cos \Psi$
18  Ratio of princ. axes    $\Phi_{67} = \beta / \alpha$
19  Length b. box diag.     $\Phi_{57} = \sqrt{a^2 + b^2}$
20  Angle b. box diag.      $\Phi_{58} = \arctan \frac{b}{a}$
21  Rectangularity          $\Phi_{20} = A / (\alpha \beta)$
22  Max. ang. difference    $\Phi_{21} = \max_{1+k \le n \le N-k} \Psi^k_{s_n}$
23  Cup count               see [29]
24  Last cup offset         see [29]
25  First cup offset        see [29]
26  Initial hor. offset     $\Phi_{35} = (x_1 - x_{min}) / a$
27  Final hor. offset       $\Phi_{36} = (x_N - x_{min}) / a$
28  Initial ver. offset     $\Phi_{37} = (y_1 - y_{min}) / b$
29  Final ver. offset       $\Phi_{38} = (y_N - y_{min}) / b$
30  N straight lines        see [29]
31  Straight line ratio     see [29]
32  Largest str. line ratio see [29]
33  Sample ratio octants    see [29]
34  N connected comp.       see [29]
35  N crossings             see [29]
36  Initial angle           $\Phi_{55} = \frac{x_3 - x_1}{\| s_3 - s_1 \|}$, $\Phi_{56} = \frac{y_3 - y_1}{\| s_3 - s_1 \|}$
37  Dist. first-last        $\Phi_{59} = \| s_N - s_1 \|$
38  Angle first-last        $\Phi_{60} = \frac{x_N - x_1}{\| s_N - s_1 \|}$, $\Phi_{61} = \frac{y_N - y_1}{\| s_N - s_1 \|}$
39  Avg. centr. radius      $\Phi_{59} = \frac{1}{N} \sum_{n=1}^{N} \| s_n - \mu \|$
40  Duration                $\Phi_{24} = t_N - t_1$
41  Avg. velocity           $\Phi_{25} = \frac{1}{N-2} \sum_{n=2}^{N-1} V_n$
42  Max. velocity           $\Phi_{25} = \max_{2 \le n \le N-1} V_n$
43  Avg. acceleration       $\Phi_{28} = \frac{1}{N-4} \sum_{n=3}^{N-2} a_n$
44  Max. acceleration       $\Phi_{29} = \max_{3 \le n \le N-2} a_n$
45  Max. deceleration       $\Phi_{30} = \min_{3 \le n \le N-2} a_n$
46  N pen down              see [29]
47  Avg. pressure           $\Phi_{20} = \frac{1}{N} \sum_{n=1}^{N} f_n$
48  Pen up/down ratio       see [29]

Table A.2: Table of features g-48 (2) (taken from [30]).
Lp  Description                                 Notation
1   Unit vectors (x- and y-axes) spanning R²    $e_1 = (1, 0)$, $e_2 = (0, 1)$
2   Pen trajectory with N sample points         $\{\sigma_1, \ldots, \sigma_N\}$
3   Sample                                      $\sigma_i = \{s_i, f_i, t_i\}$
4   Position                                    $s_i = (x_i, y_i)$
5   Area of the convex hull                     $A$
6   Angle between subsequent segments           $\Psi_{s_n} = \arctan \frac{(s_n - s_{n-1})(s_{n+1} - s_n)}{\| s_n - s_{n-1} \| \| s_{n+1} - s_n \|}$
7   Length along the x-axis                     $a = \max_{1 \le i < j \le N} | x_i - x_j |$
8   Length along the y-axis                     $b = \max_{1 \le i < j \le N} | y_i - y_j |$
9   Center of the bounding box                  $c = (x_{min} + \frac{1}{2}(x_{max} - x_{min}),\; y_{min} + \frac{1}{2}(y_{max} - y_{min}))$
10  Longest edge-length of the bounding box     $a' = a$ if $a > b$, else $a' = b$
11  Shortest edge-length of the bounding box    $b' = b$ if $b < a$, else $b' = a$
12  Principal components                        $p_i$
13  Angle of first principal axis               $\Psi = \arctan \frac{p_1 e_2}{p_1 e_1}$
14  Length of first principal axis              $\alpha = 2 \max_{1 \le i \le N} | p_2(c - s_i) |$
15  Length of second principal axis             $\beta = 2 \max_{1 \le i \le N} | p_1(c - s_i) |$
16  Centroid                                    $\mu = \frac{1}{N} \sum_{n=1}^{N} s_n$
17  Velocity                                    $V_i = \frac{s_{i+1} - s_{i-1}}{t_{i+1} - t_{i-1}}$
18  Acceleration                                $a_i = \frac{V_{i+1} - V_{i-1}}{t_{i+1} - t_{i-1}}$

Table A.3: The notation and definitions used in Appendix A (taken from [30]).