Master's Thesis
Technische Universität Wien
Institute of Telecommunications
Data Driven Prediction of Crowd Mobility in Small Cell Environments
Miriam Leopoldseder 1225520
June 2019
Supervision:
Univ. Prof. Dipl.-Ing. Dr.techn. Markus Rupp
Senior Scientist Dipl.-Ing. Dr.techn. Philipp Svoboda
Abstract
With the growing number of users and high standards regarding data rate, latency
and coverage in mobile networks, new technologies need to be developed to meet
the demand. The realization of these new concepts requires more planning and
intelligent utilization of information on all network levels. One proposal is to
apply user location prediction methods and to incorporate information about how
many users will be where in future time steps into the planning.
This work benchmarks several user location prediction methods. We differentiate
between predictors on an individual user level, where we investigate an order-L
Markov and a Hidden Markov Model (HMM) based predictor, and access point (AP)
aggregated prediction methods that directly predict the number of users per AP.
Representing the second category, we analyse a Kalman filter based method and a
model-less machine learning (ML) approach using neural networks (NNs). The
algorithms show varying robustness in their mean square error (MSE) performance
across different numbers of prediction steps. The Kalman approach has the smallest
increase in MSE between one and five prediction steps compared to the other model
based methods. To analyse the consequences of incorrect parameters, theoretical
data were generated on the basis of a state space model, a Hidden Markov Model
(HMM) and an agent based simulation with the software Anylogic. The results
show that with parameter errors the MSE increases most for the HMM compared to
the order-L Markov and Kalman predictors.
Lastly, data collected from the Wireless Local Area Network (WLAN)
of Technische Universität Wien (TU Wien) offers a realistic review of prediction
performance in a small cell environment. Its analysis shows that the purely
data-based ML method results in the lowest MSE.
Zusammenfassung (Summary)
With the growing number of users in mobile networks, who have high demands
regarding data rate, latency and availability, new technologies must be
developed to meet the demand.
The realization of these new concepts requires more planning and intelligent
information processing on all network levels. One proposal is to use prediction
methods that make it possible to plan the number of users per cell for
several time steps into the future. This work investigates several of these
methods. We distinguish between predictors that are based on the location
trajectory of individual users, where we investigate an order-L Markov approach
and a predictor based on Hidden Markov Models (HMMs), and methods that directly
predict the number of users per access point (AP). As representatives of this
category we analyse a Kalman filter based predictor and a machine learning (ML)
method based on neural networks (NNs).
The algorithms mentioned differ in the robustness of their mean square error
(MSE) with respect to the number of prediction steps. For the Kalman approach
the MSE increases the least compared to the other model based methods. The
investigation of erroneous parameters using theoretical data generated with
different models shows that the HMM yields a low MSE for few time steps. In
contrast to the other methods, its performance deteriorates strongly for a
longer prediction time.
To investigate a realistic scenario for a mobile network, data collected from
the Wireless Local Area Network (WLAN) of TU Wien are used. Here the purely
data-based ML method achieves the lowest MSE.
Contents
1 Introduction
1.1 Outline
2 State of the Art
3 Test Scenarios and Experimental Setup
3.1 Simulated Scenario
3.2 Experimental Wi-Fi Scenario
4 Models and Prediction Methods
4.1 Markov Model
4.2 Order-L Markov Predictor
4.2.1 Implementation
4.3 Hidden Markov Model (HMM)
4.3.1 Inference in Hidden Markov Models (HMMs)
4.3.2 Prediction in Hidden Markov Models (HMMs)
4.3.3 Training the Parameters of HMMs
4.4 State Space Model
4.5 Kalman Filter
4.5.1 Kalman Filter Implementation
4.6 Machine Learning and Neural Networks
4.6.1 A short Machine Learning Overview
4.6.2 Definition of Neural Network
4.6.3 Practical Aspects of the Neural Network Prediction Method
5 Performance Analysis
5.1 Theoretical Data
5.1.1 Data generated by State Space Model
5.1.2 Data generated by Hidden Markov Model (HMM)
5.1.3 Data generated by Agent based Simulation
5.2 Wi-Fi Data
5.2.1 Order-L Markov Predictor
5.2.2 Benchmarking of different Prediction Methods
6 Discussion
7 Conclusion
8 References
A Appendix
A.1 Floor Plans of Campus Gußhaus
A.2 Kalman Filter Implementation in Matlab
A.3 Parameters for Anylogic Simulation
A.4 Anylogic Simulation Results
A.5 HMM Example Simulation Results
List of Figures
1 Comparison of current network architecture and proposed Centralized Radio Access Network (C-RAN) architecture.
2 Flowchart showing the logic behind the Anylogic simulation. The enter and exit location is chosen randomly with pre-defined probabilities. Wait time is distributed uniformly between 0.9 and 1 hour.
3 Snapshot of a simulation run in Anylogic. The dots in different colours on the map are agents currently moving to their destination EI 1.
4 Stacked number of agents measured in the Anylogic simulation at each router over time.
5 Third floor of Campus Gußhaus, displaying access points (APs) [28].
6 Flow diagram of how APs are connected in Campus Gußhaus.
7 Number of users over time per AP during Monday, 22nd of January 2019, measured every 30 seconds at Campus Gußhaus.
8 Distribution of sequence lengths and amount of AP visited per user.
9 Variation of the probability of where users enter the system for our experimental setup.
10 Example path through Campus Gußhaus [28].
11 Example of state transition diagram for an order-1 Markov chain [32, p. 590].
12 Development in time for a Markov chain and a HMM [33].
13 Trellis diagram to motivate a linear state space model to model the movement between cells.
14 Block diagram showing the relationship between state estimator and system.
15 Exemplary graph for a neural network.
16 State diagram for the theoretical state space model example.
17 Number of users at each AP over time generated by the state space model example described in section 5.1.1.
18 Variation of MSE for increasing prediction horizon and additive uniform noise on A with variance $10^{-2}$.
19 Results for mean square error (MSE) over noise variance for different prediction horizons and noisy parameters A and B for the state space model example.
20 Number of users over time stacked for each AP for the theoretical HMM data.
21 Results for Markov predictors with different order L for the HMM example data.
22 MSE results for HMM theoretical example data applying the HMM based prediction.
23 Results for the HMM theoretical data applying the Kalman filter predictor.
24 Comparison of prediction performance for different prediction methods and parameter noise variances for the HMM theoretical example data.
25 Results for a one step prediction for HMM, Order-3 Markov and Kalman filter prediction.
26 Plot of normalized data from the Anylogic simulation and its Kalman prediction for 5 prediction steps.
27 Development of MSE with the Kalman filter prediction for increasing prediction horizon applied on the Anylogic data.
28 MSE results for different order Markov predictors applied on the measured Wi-Fi data.
29 MSE for HMM, Kalman, order-1 and ULS prediction for increasing prediction horizon using the Wi-Fi data from Campus Gußhaus.
30 Normalized number of users over time for AP-CF02-7 for a one step prediction for the Wi-Fi data from TU Wien for the 26th of November 2018.
31 Normalized number of users over time for AP-CF02-7 for a five step prediction for the Wi-Fi data from Technische Universität Wien (TU Wien) for the 26th of November 2018.
32 Floor plan of second floor of Campus Gußhaus with APs marked in green [28].
33 Floor plan of third floor of Campus Gußhaus with APs marked in green [28].
34 Floor plan of fourth floor of Campus Gußhaus with APs marked in green [28].
35 MSE for Kalman prediction of the Anylogic simulation for AP 1.
36 MSE for Kalman prediction of the Anylogic simulation for AP 2.
37 MSE for Kalman prediction of the Anylogic simulation for AP 3.
38 MSE for Kalman prediction of the Anylogic simulation for AP 4.
39 MSE for Kalman prediction of the Anylogic simulation for AP 5.
40 MSE for Kalman prediction of the Anylogic simulation for AP 6.
41 Sum of users for the Anylogic example and estimate using the Kalman filter prediction with prediction horizon one.
42 Sum of users for the Anylogic example and estimate using the Kalman filter prediction with prediction horizon five.
43 Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 2.
44 Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 3.
45 Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 4.
46 Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 5.
47 Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 6.
List of Tables
1 Schedule of EI 1 for the Anylogic simulation.
2 Schedule of EI 4 for the Anylogic simulation.
3 Example entry of data measured in Campus Gußhaus on the 23rd of April.
4 Mean percentage per AP of where users are first registered.
5 MSE for different prediction methods for prediction horizon 1, 5 and 20. The results for Kalman and HMM based predictions are displayed for noise variance 0.1.
6 Comparison of MSE for 1 and 5 prediction steps for the recorded Wi-Fi data.
List of Abbreviations
AP access point
BBU Baseband Unit
C-RAN Centralized Radio Access Network
EM expectation maximisation
HMM Hidden Markov Model
ID identification
IEEE Institute of Electrical and Electronics Engineers
IP Internet Protocol
LeZi Lempel-Ziv
LMMSE linear minimum mean square error
LOS line of sight
LZ Lempel-Ziv
MAC Media-Access-Control
mIoT massive Internet of Things
ML machine learning
MMP Mobile Motion Prediction
MSE mean square error
NN neural network
ODM origin destination matrix
pdf probability density function
PPM Prediction by partial matching
QoS Quality of Service
RRH Remote Radio Head
TU Wien Technische Universität Wien
VoIP Voice over IP
WLAN Wireless Local Area Network
1 INTRODUCTION
1 Introduction
In the last few years reliable mobile internet access has gained massive importance
in our society. A report by Cisco predicts that global mobile traffic will increase
by over 600% between 2017 and 2022 [1]. It estimates that for each person in the
world there will be on average 1.5 mobile devices. Additionally, mobile connection
speed will triple, while the average smartphone will generate 11 GB of mobile data
traffic per month by 2022. In the 4th quarter of 2018 alone, 43 million new mobile
subscriptions were added worldwide, and it is forecast that by 2022 almost one
billion more devices than in 2016 will be in use [1, 2].
With more people than ever connected over their mobile devices and ever increasing
demands for bandwidth, low latency and connection reliability, there are numerous
challenges that need to be addressed in future network generations like 5G and
beyond. Some requested features to meet demand are flexible function capabilities,
cost efficient throughput coverage and context aware networks [3]. To tackle these
challenges, new technologies have to be introduced. One new concept proposed for
5G is network slicing. The 3rd Generation Partnership Project (3GPP) [4] defines a
network slice as "A logical network that provides specific network capabilities and
network characteristics". It offers the possibility to run different services on the
same hardware while maintaining a certain Quality of Service (QoS) for each user.
Network slicing also makes the system more flexible and easier to scale because
it separates core network development, deployment and service maintenance [5].
Such features also create new challenges.
The virtualization of services generates the need for efficiently shared network
resources for services which at times have competing interests. On the one hand
there are ultra-reliable low latency applications, while on the other there are
massive Internet of Things (mIoT) networks, where latency is an insignificant
factor. Consequently, these slices need to be isolated from each other to
guarantee a specific QoS for each of them [5].
In order to manage resources and access demand, an orchestrator is introduced.
It provides communication between the core network and is under the constraint
of the access layer of the network [5]. This orchestrator has many different tasks,
but always has to make sure that the promised QoS of each slice is maintained.
To reliably match resources to demand, new models and procedures need to be
developed. One possible change to make allocating resources to certain services
feasible is a flexible system architecture that dynamically adapts network structure
to demand.
In order to aid the orchestrator in managing the changing demands for services, we
need to exploit all available information to forecast where and when a service
is expected to be utilized. This is especially helpful if a node,
like a car, is moving through the network. In that case, predicting where
users will be in the next time period improves how the orchestrator
can plan resource allocation in advance. A suggested technology to achieve this is
discussed below.
Centralized Radio Access Network (C-RAN)
To accommodate the new flexibility demands, network architecture has to evolve.
Current mobile networks work on a cellular basis, where each cell has its own
Remote Radio Head (RRH), consisting of the main hardware needed to transmit
a signal, and its own Baseband Unit (BBU), responsible for resource allocation
and computation. Currently the resources of the RRHs are not used to their full
capacity, which means that BBU computational power is not utilized efficiently.
The suggested C-RAN architecture offers an improvement to the current system [6].
As seen in fig. 1, in a C-RAN one BBU-pool works with several RRHs. The
computational power of each BBU gets shared between different RRHs. This improves
the efficiency of the network, because the load in each cell varies over time [7].
Additionally, with smaller cell types being introduced in 5G, a more efficient
planning of computational resources will be necessary [3].
One challenge of this new kind of mobile network is allocating enough radio
resources to where they are needed. Demand is assumed to be proportional to the
number of users at a given location. Methods that model how people move
through a network and predict how many people will be at a certain location at a
certain time, relying only on historical data, may help match resources
to demand.
Figure 1: Comparison of current network architecture and proposed C-RAN architecture.
Other applications of location prediction
Predicting how people are moving through buildings or networks is utilized in
multiple applications other than improving flexible mobile communication systems.
In general we can differentiate between two classes of applications that benefit
from predicting the user's location:
1. User-focused applications: Location prediction helping the user to
prepare for or adapt to a situation. Examples are road traffic optimization or
stolen vehicle localization. Here, an exact position may be more important
than in system-focused applications.
2. System-focused applications: Improving system performance, availability
or other metrics by means of location information. An example is handoff
optimization. In these applications the precise location is less important and
symbolic coordinates like cell-id or access point are sufficient [8].
As an example, tracking users and predicting their location helps to direct crowds
through a network while minimizing congestion. In 2016 Transport for London
conducted a project where they tracked users of the Virgin Media Wi-Fi network
in London's subway stations and their subway gateway data. Among other things,
their objective was to use the insights gained to improve journey planners,
rerouting during disruptions in the service and staff distribution [9].
This shows how versatile data tracked from mobile or Wi-Fi networks can be, but
also raises the issue of privacy of the users and the ownership of a data profile.
Tracking the movement of users through the public transport system may reveal
information about their home, place of work, habits and schedule. On a bigger
scale, like the mobile network, even more information like travelling destination,
relationship status, favourite restaurants and so on could be gathered.
Good practice is to disguise unique identifications (IDs) like the Media-Access-Control
(MAC) or Internet Protocol (IP) address. To further protect private information
while working with location data, several measures are possible. Noise can be
added to the location or, depending on the application, one could bin users into
groups and only look at the group behaviour [10]. Killijian, Gambs, and Cortez [10]
suggest sanitization algorithms to secure privacy; however, they conclude that every
privacy measure requires a trade-off between accuracy and privacy. Protecting the
privacy of the users will not be covered in this thesis, but nevertheless should be
addressed in every real-life application.
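As an illustration of disguising unique IDs, the following minimal sketch replaces a MAC or IP address by a stable pseudonym using a keyed hash. The thesis does not state how IDs are disguised; the choice of HMAC-SHA256, the function name pseudonymise_id and the truncation to 16 hexadecimal characters are assumptions made for this example only.

```python
import hashlib
import hmac


def pseudonymise_id(identifier, secret_key):
    """Map a MAC or IP address to a stable pseudonym with a keyed hash.

    identifier: the raw ID as a string (hypothetical input format).
    secret_key: bytes known only to the data collector (assumption).
    """
    # Normalise so that "AA:BB:..." and "aa:bb:..." map to the same pseudonym.
    normalised = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()
    # Truncation keeps the pseudonym short; 16 hex characters is an arbitrary choice.
    return digest[:16]
```

The same device always maps to the same pseudonym, so trajectories can still be assembled, while the raw address cannot be read off the stored data without the key.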
For our suggested application, which is applying location prediction to help a
network orchestrator make more informed decisions and plan ahead, there are several
dimensions that need to be considered. First we assume that it is possible
to measure the user location history on a user and AP level in the network. Because
every measurement takes up resources in the system, we would like to minimize
the number of measurements we need to make, while still knowing with a certain
accuracy how many users are at what location in the network and where they
will be in the future. This requires maximising the timespan that is accurately
predicted. To evaluate the proposed methods we consider how much memory is
required during the prediction, what data is needed for the prediction, the amount
of data necessary for the training and how the prediction complexity scales with
the size of the network.
To efficiently utilize available data in mobile or Wi-Fi networks, good modelling
and prediction methods are required. This work investigates several approaches,
some stemming from methods already proposed for other mobile networking
applications, others that are applied in a variety of different tasks.
1.1 Outline
We will first discuss state-of-the-art methods and applications for location
prediction and for leveraging mobile network data for location-based analysis in
section 2. Subsequently, we describe different test scenarios for evaluating
prediction performance in section 3. Section 4 addresses models and methods that are
the basis for the prediction algorithms. We discuss two individual user based
prediction methods, HMM and order-L Markov predictor, that are widespread in the
literature. Because we are interested in the crowd movement, we sum over all users
to get the number of users per AP over time. Additionally we investigate two AP
based methods, Kalman and ML predictor, that work with the aggregated number of
users at each AP to directly predict crowd movement.
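The aggregation from individual users to a crowd count can be sketched as follows. The sketch assumes that each user trajectory is stored as a list with one AP label per time step and None while the user is not connected; this data layout and the function name users_per_ap are illustrative assumptions, not the thesis implementation.

```python
from collections import Counter


def users_per_ap(trajectories, num_steps):
    """Sum individual trajectories into the number of users per AP and time step.

    trajectories: dict mapping a user ID to a list of AP labels, one per
                  time step, with None when the user is absent (assumed layout).
    Returns a list with one Counter (AP label -> user count) per time step.
    """
    counts = [Counter() for _ in range(num_steps)]
    for path in trajectories.values():
        for t, ap in enumerate(path[:num_steps]):
            if ap is not None:
                counts[t][ap] += 1
    return counts
```

For two users who both start at the same AP, the first Counter holds a count of two for that AP; APs without users simply report zero.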
Finally we benchmark the performance of the different methods for our test
scenarios in section 5, discuss the results in section 6 and offer concluding remarks in
section 7.
2 STATE OF THE ART
2 State of the Art
Matching demand with resources is a fundamental challenge in a modern society
in motion. Predicting traffic flows allows for mapping demand for mobility to the
required resources. It is therefore no surprise that the prediction of where people
will be and how they will use network infrastructure was investigated early on,
especially with regard to traffic planning.
Location tracking in traffic planning
Traffic flow has traditionally been modelled with origin destination matrices (ODMs).
Each ODM entry describes how many people travel from a specific origin to a
destination during a certain period of time. Data aggregation to fill these matrices was
historically done either by surveys, traffic count stations, traffic count cameras or
by collecting data from bus or taxi systems [11, 12]. Those methods were usually
very laborious, which meant updating ODMs was very costly. Therefore ODMs
were rarely up-to-date and sophisticated adaption methods were needed. This also
meant that most models were static models. With new ways of gathering data,
implementation of dynamic models became possible [12].
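To make the ODM definition concrete, the sketch below counts trips between symbolic locations for one observation period. The trip format (a list of origin and destination pairs) and the function name build_odm are assumptions for illustration.

```python
def build_odm(trips, locations):
    """Build an origin destination matrix for one observation period.

    trips:     iterable of (origin, destination) pairs (assumed input format).
    locations: ordered list of all location labels; it fixes the row and
               column order of the matrix.
    Entry odm[r][c] is the number of trips from locations[r] to locations[c].
    """
    index = {loc: k for k, loc in enumerate(locations)}
    size = len(locations)
    odm = [[0] * size for _ in range(size)]
    for origin, destination in trips:
        odm[index[origin]][index[destination]] += 1
    return odm
```

A dynamic model in the sense described above would rebuild or update such a matrix for each new time period instead of relying on a single static survey.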
Dynamic ODM models allow for route planning or determining the ideal service
frequency for public transport. Pinelli et al. [13] introduce a method which uses
mobility traces from a mobile network and an ODM based model to efficiently
plan routes in a public transport network. The authors could reduce the overall
journey times by 27%. They specifically highlight the advantages of utilizing
mobile network data for emerging markets, because the data collection does not
need additional infrastructure like road measurement points or counting sensors
in public transport, but is available at all times. The downside of network data
is that typically only event data are available. If for parts of the journey no calls
are made or text messages received, the trajectory might not be representative of
the ground truth and therefore not reliable. Consequently, it is necessary to track
data and to ensure its validity. Additionally, data from different providers might
be necessary to capture characteristics of movement, which makes it more difficult
to obtain complete datasets [13]. Those downsides are still outweighed by the ease
with which an almost complete trajectory can be captured, which was not possible
with earlier methods.
The ODM models work well with vehicular or internet traffic, which are mostly
focused on the flow between locations. They are less ideal if we are not interested
in looking at flow, but rather at the number of users at a certain location [14]. We
will be considering methods used in mobile communication next.
Location prediction in mobile communication
Historically, a common goal in mobile communication is to predict the hand-over
of user connections between cells. Consequently, the prediction of the location of
a single user has received more attention than tracking movement of a group of
users. To reach the goal of predicting the next user location, many algorithms
define the movement history for a single user
$H_m = \langle x_1 = a_i, \ldots, x_m = a_j \rangle$, where $i, j \in \{1, \ldots, N\}$,
as a base for the prediction. Here $x_k$ represents an abstract
location like a cell or a precise location like GPS coordinates at time or event $k$.
We assume that the locations $x_k$ can be modelled as random variables from a finite
alphabet $A = \{a_1, \ldots, a_N\}$, which describes all possible locations. If, for example,
we describe the movement of a user through five different rooms over a period of
24 hours, $A$ will include five possible locations $a_1$ to $a_5$. If we update the location
every hour, $m$ will equal 24.
The update of $H_m$ can be movement based, with an entry being added event driven,
like at cell crossings, or the updates can happen in fixed time intervals. A
combination of both approaches is possible, where $H_m$ is updated after some time units
and at every cell crossing [8]. $H_m$ is also called user sequence, trajectory or trace.
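The combined update rule can be sketched as a small class that appends an entry to the history either at a cell crossing or once a fixed interval has elapsed since the last entry. The class name MovementHistory, the parameter interval and the observe method are hypothetical names chosen for this example.

```python
class MovementHistory:
    """Movement history H_m with combined event- and time-driven updates."""

    def __init__(self, interval):
        self.interval = interval    # fixed update interval in time units (assumed parameter)
        self.entries = []           # the sequence x_1, ..., x_m of symbolic locations
        self._last_update = None    # time of the most recent entry

    def observe(self, time, location):
        """Feed one observation; return True if an entry was appended."""
        # Event-driven update: the location differs from the last entry.
        crossed = not self.entries or self.entries[-1] != location
        # Time-driven update: the fixed interval has elapsed since the last entry.
        elapsed = self._last_update is None or time - self._last_update >= self.interval
        if crossed or elapsed:
            self.entries.append(location)
            self._last_update = time
            return True
        return False
```

With an interval of 60 time units, a user who stays in one cell contributes one entry per interval, while a user who moves contributes an additional entry at every crossing.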
Cheng, Jain, and Berg [8] differentiate between two types of location prediction
methods for hand-over prediction: "domain-independent" and "domain-specific"
algorithms. Examples for domain-independent algorithms are order-L Markov
predictors, detailed in section 4.2, which predict the next step based on the last L
steps of the movement history, and Lempel-Ziv (LZ or LeZi) predictors based on
Ziv and Lempel [15]. LZ based algorithms depend on two main assumptions:
1. The user's mobility patterns are repetitive making the movement history a
stationary process.
2. The user's movement follows a probabilistic model and therefore the move-
ment history H is a stochastic process too.
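The order-L Markov predictor mentioned above can be sketched with simple frequency counts: for every context of L consecutive locations, count which location followed it, and predict the most frequent follower of the current context. This is a minimal sketch under that assumption; the class name and the choice to return None for an unseen context are illustrative and may differ from the implementation in section 4.2.

```python
from collections import Counter, defaultdict


class OrderLMarkovPredictor:
    """Frequency-count sketch of an order-L Markov next-location predictor."""

    def __init__(self, order):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        # Maps a context (tuple of L locations) to counts of the next location.
        self.counts = defaultdict(Counter)

    def fit(self, history):
        """Count, for every length-L context in the history, what followed it."""
        for k in range(self.order, len(history)):
            context = tuple(history[k - self.order:k])
            self.counts[context][history[k]] += 1
        return self

    def predict(self, history):
        """Return the most frequent follower of the last L locations, or None."""
        context = tuple(history[-self.order:])
        followers = self.counts.get(context)
        if not followers:
            return None  # context never seen in training (hypothetical fallback)
        return followers.most_common(1)[0][0]
```

A larger L makes contexts more specific but also rarer, so more contexts remain unseen for a given amount of training data.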
Examples of domain-specific approaches are Mobile Motion Prediction (MMP)
for improving mobility management in a cellular network introduced by Liu and
Maguire [16], and segment matching introduced by Chan, Zhou, and Seneviratne
[17]. We will focus on domain-independent methods, because they are less
restrictive on the kind of networks or applications they work with.
Both Lempel-Ziv (LZ) and Markov methods mentioned above have several
limitations. Mainly, they are independent of time, which means known time of day
effects or other additional knowledge cannot be incorporated into the prediction
[18]. Building on the basic ideas of Markov model based or LZ predictions, many
variations have been investigated in literature.
Rodriguez-Carrion et al. [19] compare three different versions of the LZ algorithm:
classical LZ, LeZi Update and Active LeZi. The classical LZ divides H into
substrings and builds a tree that represents which location appeared after a certain
substring and how frequently. Prediction is done by evaluating the branch of the
current context: the number of times a location appeared after the current
context is divided by how often the current context appears in general. LeZi
Update additionally takes patterns within substrings into account. It uses the
prediction by partial matching method, which combines contexts of different order
for prediction. The Active LeZi algorithm considers all possible substrings with a
variable window length. It needs the most memory resources of the three. All three
variants train the needed parameters online, and therefore an
initial training phase is not necessary. This online training property also brings
the possibility to automatically adapt to changes in user behaviour, because the
parameters change with the data. The authors in [19] work with GSM-based location
data to evaluate the performance of the three LZ variants. In terms of average
correct prediction, Active LeZi outperformed the other methods but also had the
highest resource consumption.
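The classical LZ idea described above can be sketched as follows, assuming LZ78-style parsing, in which each new phrase is the shortest substring that has not yet occurred as a phrase. Instead of an explicit tree, the sketch stores one counter per tree node, keyed by the path from the root. All function names are illustrative, and details such as the handling of an incomplete final phrase are assumptions that may differ from [19].

```python
from collections import Counter


def lz78_parse(history):
    """Split a history into phrases; each is the shortest substring not seen before."""
    phrases, seen, current = [], set(), ()
    for symbol in history:
        current += (symbol,)
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ()
    return phrases  # an incomplete trailing phrase is dropped (assumption)


def build_counts(phrases):
    """Count how often each tree node (phrase prefix) is visited; () is the root."""
    counts = Counter()
    for phrase in phrases:
        for k in range(len(phrase) + 1):
            counts[phrase[:k]] += 1
    return counts


def conditional_probability(counts, context, symbol):
    """Visits of context followed by symbol, divided by visits of context."""
    context = tuple(context)
    if counts[context] == 0:
        return 0.0
    return counts[context + (symbol,)] / counts[context]
```

For the history "ababcbababaa" the parsing yields the phrases a, b, ab, c, ba, bab and aa, so the context b was visited three times and was followed by a in two of them.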
It is difficult to compare the various methods by different authors, because
depending on the exact application the prediction performance is measured differently.
Rodriguez-Carrion et al. [19] measure performance in terms of hitrate¹. With their
methods they achieve a 60% hitrate for all users and a hitrate over 80% for fewer
than ten percent of all users. Theoretically LZ methods achieve minimum
uncertainty with lowest order [19]; however, with practical data Song [20] shows that
Markov based methods outperform LZ based methods.
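The hitrate, understood as the number of correctly predicted next cells divided by the total number of cell changes, can be computed as in the sketch below. It assumes that one prediction is available for every cell change; the function name hit_rate is illustrative.

```python
def hit_rate(predicted, actual):
    """Fraction of cell changes for which the predicted next cell was correct.

    predicted, actual: equally long sequences with one entry per cell change
    (assumed pairing of predictions and observations).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one cell change is required")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

Three correct predictions out of four cell changes give a hitrate of 0.75.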
Hadachi et al. [21] evaluate mobility prediction based on different enhanced Markov
model schemes. They combine algorithms utilizing a global and local context and
different order Markov methods. To minimize errors due to common problems in
mobile network trajectories, they filter the data to minimize ping-pong effects²
and to split traces if there is a significant time gap between locations. They
measure the prediction performance in terms of percentage of correct predictions³,
percentage of wrong predictions⁴ and percentage of failed predictions⁵. For users
whose data have already been present in the training set, their best combination
method reaches 95.67% of correct predictions. For new users their best method
only achieves 53.87% of correct predictions, while 22.33% were failed predictions.
Several of their other combined methods have a higher number of wrong predictions
than correct ones and especially for new users, the percentage of failures is very
high with up to 97%. These methods have problems with new users, which might
indicate that they do not fit the underlying system.
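The two preprocessing steps can be sketched as below. The exact filters used in [21] are not described here, so both rules are assumptions: a trace is split wherever two consecutive samples are more than max_gap time units apart, and a brief excursion A, B, A that completes within window time units is treated as a ping-pong effect and removed.

```python
def split_on_gaps(trace, max_gap):
    """Split a time-sorted trace of (timestamp, cell) pairs at large time gaps."""
    segments, current = [], []
    for timestamp, cell in trace:
        if current and timestamp - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append((timestamp, cell))
    if current:
        segments.append(current)
    return segments


def suppress_ping_pong(trace, window):
    """Remove A -> B -> A excursions that complete within `window` time units."""
    cleaned = []
    for timestamp, cell in trace:
        is_return = (
            len(cleaned) >= 2
            and cleaned[-2][1] == cell
            and cleaned[-1][1] != cell
            and timestamp - cleaned[-2][0] <= window
        )
        if is_return:
            cleaned.pop()  # drop the brief visit to the neighbouring cell
            continue       # the user is treated as having stayed in the first cell
        cleaned.append((timestamp, cell))
    return cleaned
```

Suitable values for max_gap and window depend on the sampling of the network data and would have to be chosen per dataset.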
Comparing both Markov and LZ based schemes, Cheng, Qiao, and Yang [18]
utilize traffic data from a mobile network gathered over 27 days and resulting in
over 4000 user trajectories. They introduce an improved Markov based prediction
and compare it, among others, to a time based Markov approach. In their suggested
method they alter probabilities with weights depending on the time between samples
and combine different order Markov predictors. Their new method almost keeps
up with the performance of Active Lempel-Ziv (LeZi) and LeZi Update but uses
fewer resources and is faster. They measure prediction performance by a "fraction
of users for which the algorithms correctly identified the next location" and then
calculate a mean over all timesteps. In summary they achieve a prediction accuracy of
• 36.1% with Active LeZi,
• 35.8% with LeZi Update,
• 35.4% with improved Markov,
• 29.9% with LZ, and
• 29.4% with Markov prediction.
¹ Equalling number of correctly predicted next cells divided by total number of cell changes.
² Ping-pong effects are a fast switching between cells if the user is located in the border region between cells.
³ Ratio of number of correct predictions to total number of users.
⁴ Ratio of number of wrong predictions to total number of users.
⁵ Ratio of number of failed predictions to total number of users.
They comment that the prediction accuracy is better for sequences with fewer
locations and less change, which they call high regularity. The gap between time
based Markov and improved Markov decreases when trajectory regularity is low.
This is emphasized by their finding that the prediction works better at night time,
because there the trajectories are less diverse. This shows the possible tendency
for Markov predictors to prefer staying in a location and not being able to actually
predict location changes.
In Libo Song et al. [22] different domain independent prediction methods are compared
in terms of prediction performance for Wi-Fi data. They conclude that entropy,
as an indicator for movement randomness, correlates with the performance of
Markov based predictors. They find that with higher entropy the accuracy decreases,
which is also supported by the results from [18]. This raises the question
of what a typical Wi-Fi data trajectory looks like in general. We discuss this finding
further in section 3.2.
For the authors of [22], the best prediction performance can be achieved by different
order Markov and LZ predictors. Surprisingly, the lower order Markov predictors
showed the best performance overall when compared to more complex compression-based
methods; the authors find that methods that rely on online learning need a long
trace length until they deliver reliable results. Libo Song et al. [22] achieve an
accuracy of 65%-75% for the median user, with the performance depending on
the length of the trace. They benchmark their methods on Wi-Fi data recorded
on the Dartmouth campus over several years. Usually people
connected to a Wi-Fi network indoors or on a college campus will move for a
short amount of time through the network and then stay for a longer time in
one place, for example in a lecture room. In this case already a high percentage
of predictions would be accurate if either the last position or the most common
position is used as the prediction. This might explain the correlation the authors
find between entropy and prediction accuracy, and we will further investigate this
theory in section 5.2.2.
Another way to increase the degrees of freedom of a Markov based model is to use
HMMs. These models are used for a wide array of tasks such as speech recognition
[23] and gene finding [24]. Employing HMMs for movement prediction in buildings
has already been proposed by [25]; however, they apply their methods to data from
smart doorplates in an office. Si et al. [26] apply the same principles to an example
in a cellular mobile network. They compare predictions in a cellular mobile
system based on an order-1 Markov chain, an order-2 Markov chain and a HMM.
The authors find that for all tested sequence lengths the HMM prediction was
the most accurate. A downside of HMM based prediction compared to a simpler
Markov prediction is nevertheless the computational complexity of the calculation
[26].
Contribution of this work
We focus on domain independent algorithms, because the real nature of future
small cell environments is not yet known; this keeps the results as general as possible.
From the literature summarized previously we conclude that for small cell scenarios
like Wi-Fi networks, simple Markov based schemes and HMM methods promise
the biggest potential. They are flexible models with low complexity and show good
results in similar research like [22] and [26].
We are generally not interested in predicting a single user's trajectory, but want
to predict where users move as a crowd in future timesteps. Regardless of whether we can
predict the path of a single user accurately, we can sum over all users and get the
information in terms of APs. We contribute a benchmark for the different
state-of-the-art methods and investigate location prediction on a crowd level,
which may be more robust against random decisions of users.
As aggregated AP methods, we propose a prediction based on a Kalman filter
and also compare the results to an ML approach. We hope to draw conclusions on
how much information is hidden in the data and can therefore be learned from the
data. Additionally we investigate results for several prediction steps, studying how
the prediction performance changes when predicting a longer timespan ahead,
something that is not done in the sources discussed. This covers situations
where measuring the users in the network is possible, but should not
be done too often so as not to interfere with normal network functions. We investigate
how these methods would perform if they had measurements only every m steps. This
approach allows a trade-off between the accuracy of the prediction and the load
on the network due to measuring.
3 Test Scenarios and Experimental Setup
To evaluate the performance of different prediction methods, a suitable test
scenario is necessary. Mobile networks are not yet equipped with the capabilities of a
C-RAN and data from mobile networks on a large scale is not freely available. We
hypothesize that for our research interest Wireless Local Area Networks (WLANs)
or Wi-Fi networks offer characteristics similar to small cells in a mobile network.
Another advantage is that many buildings already have routers or APs installed
and a lot of people use the technology when available.
We therefore consider two test scenarios to evaluate prediction algorithms: first, a
simulated scenario mimicking user movement through a building and second, data
from the WLAN network at Campus Gußhaus, TU Wien.
3.1 Simulated Scenario
To benchmark whether or not the implemented algorithms work in a controlled
environment, we modelled a part of the building geometry in the agent based
simulation tool Anylogic. This tool is used for a wide array of applications in industry
and research. It offers the possibility to simulate agent based or event based
systems. An advantage of this software is its many predefined libraries, ranging from
fluid to rail traffic models [27]. It also offers a pedestrian library, which allows
straightforward modelling of pedestrians moving through environments. Anylogic
combines a visual editor, used to model buildings, supply chains
or ports, with an event and agent based programming interface.
[Figure 2 flowchart: Users → Where to enter? (Stg. 8, Stg. 9, Stg. 10) → What destination? (EI 1, EI 4) → Wait → Where to exit? (Stg. 8, Stg. 9, Stg. 10) → Exit]
Figure 2: Flowchart showing the logic behind the Anylogic simulation. The enter and exit location is chosen randomly with pre-defined probabilities. Wait time is distributed uniformly between 0.9 and 1 hour.
Figure 3: Snapshot of a simulation run in Anylogic. The dots in different colours on the map are agents currently moving to their destination EI 1.
As seen in fig. 3, the floor plan of Campus Gußhaus was imported
into Anylogic to ensure that the simulation is to scale with the real building.
The agents can walk between the walls, which are painted in orange. Three different
entry points close to the stairs and destinations in the lecture rooms were defined,
visible in green in fig. 3. Each agent is represented as a coloured dot on the map.
The entry points are located in front of staircases and are named after the German
word "Stiege". According to the flowchart in fig. 2 the users enter the system at
the defined entry points, walk to their designated goal, wait for a certain time and
then exit at one of the exit points. Agents enter at Stg. 8 42% of the time and
at Stg. 9 and Stg. 10 each 29% of the time. They then walk to their designated
location according to the schedules in tables 1 and 2.
After a uniformly distributed time between 0.9 h and 1 h the agents leave the lecture
room. With a probability of 50% they exit the simulation at Stg. 8. Alternatively,
with a probability of 25% each, the agents disappear at Stg. 9 or Stg. 10. This is supposed
to simulate the behaviour of students during a weekday. In this setup we neglect
simulating people freely moving through the building.

Start time   End time   Nr. of users entering
8:45         9:00       15
10:45        11:00      25
12:45        13:00      50
15:45        16:00      56
Table 1: Schedule of EI 1 for the Anylogic simulation.

Start time   End time   Nr. of users entering
12:45        13:00      90
17:45        18:15      45
Table 2: Schedule of EI 4 for the Anylogic simulation.
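The agent logic described above can be illustrated with a minimal Python sketch. It is not the Anylogic model itself: it only samples the random choices per agent, using the entry and exit probabilities, the wait time and the schedules stated in the text. The function name `sample_agents` and the assumption that arrivals are spread uniformly within each schedule slot are ours.

```python
import random

# Probabilities and schedules as stated in the text and in tables 1 and 2.
ENTRY_PROBS = {"Stg. 8": 0.42, "Stg. 9": 0.29, "Stg. 10": 0.29}
EXIT_PROBS = {"Stg. 8": 0.50, "Stg. 9": 0.25, "Stg. 10": 0.25}
# (start hour, end hour, number of agents entering)
SCHEDULE = {
    "EI 1": [(8.75, 9.0, 15), (10.75, 11.0, 25), (12.75, 13.0, 50), (15.75, 16.0, 56)],
    "EI 4": [(12.75, 13.0, 90), (17.75, 18.25, 45)],
}


def pick(rng, probs):
    """Draw one key of `probs` according to its probability."""
    return rng.choices(list(probs), weights=list(probs.values()))[0]


def sample_agents(rng):
    """Sample entry point, destination, arrival, wait time and exit per agent."""
    agents = []
    for room, slots in SCHEDULE.items():
        for start, end, count in slots:
            for _ in range(count):
                agents.append({
                    "entry": pick(rng, ENTRY_PROBS),
                    "destination": room,
                    # Assumption: arrivals are uniform within the slot.
                    "arrival_h": rng.uniform(start, end),
                    "wait_h": rng.uniform(0.9, 1.0),
                    "exit": pick(rng, EXIT_PROBS),
                })
    return agents


agents = sample_agents(random.Random(0))
print(len(agents), "agents sampled")
```

Walking times between entry point, lecture room and exit are not modelled here; in the simulation they follow from the building geometry.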
The number of users measured over the simulation time is displayed in fig. 4. There
is a clear difference between routers in lecture rooms and hallways. As expected,
the routers covering the lecture rooms show a constant number of people over a
longer period of time, and the routers that only capture agents in hallways appear
as small variations on top of that. We will compare the results from the simulation
to the collected Wi-Fi data in section 3.2.
[Figure 4 plot: x-axis time from 08:00 to 18:00 on Mar 29, 2019; y-axis number of agents from 0 to 150; stacked series Router 1 to Router 6]
Figure 4: Stacked number of agents measured in the Anylogic simulation at each router over time.
3.2 Experimental Wi-Fi Scenario
In our test setup we monitored the WLAN network of Campus Gußhaus on three
separate floors, focusing on an area with many lecture rooms. This ensures
a high frequency of students and a certain weekly repeating schedule because
of the lectures in the rooms. The most frequented floor can be seen in fig. 5, where
lecture rooms are marked in yellow and every green circle represents an AP.
Every student and employee of the university can connect to the WLAN network.
We measured data from the Wi-Fi network between April 2018 and February 2019.
Figure 5: Third floor of Campus Gußhaus, displaying APs [28].
In total we collected 4 883 603 data points, which include 11 weeks measured
completely and some weeks where only a few days have been measured. The system
was polled every 30 seconds. For each entry we get a timestamp, anonymized
client ID, AP MAC, AP name and AP ID. The client ID is anonymized before
we get access to the data, ensuring we cannot trace back the original ID of the
user. Example entries can be seen in table 3. Figure 6 shows how the APs are
connected in the building. A clear time-of-day effect in the number of users
can be seen in the data. As an illustrative example, 24 h of measured data on
Monday, 22nd of January 2019, are plotted in fig. 7. Each colour represents a different
AP and all APs are stacked on top of each other. This shows the total number
of people measured in the system over time and the variation of the number of
users per AP during the day. There is almost no activity until 8am; then, after
a relatively steep rise, the number of users decreases slightly at around 1pm and
again at 4pm. These times mark the end of lectures. After 6pm most users leave
the building. Compared with the data from the Anylogic simulation in fig. 4, the
Wi-Fi data in fig. 7 does not show distinct rectangles or other shapes that would
indicate a bigger group of users arriving at the same time. The real data seems
noisier, with a lot less regularity. There are many more
options for where users could go and therefore there is more variation.

Table 3: Example entries of data measured in Campus Gußhaus on the 23rd of April.
Time               Client ID     AP MAC              AP Name     AP ID
23.04.2018 12:38   rZOgnSxIQI.   0c:27:24:6d:4c:30   AP-CG04-1   18
23.04.2018 12:38   VJRLEEt7RaY   0c:27:24:6d:4c:30   AP-CG04-1   18
23.04.2018 12:38   zAoMt2IzbWo   b4:a4:e3:b5:12:f0   AP-CF01-3   15
23.04.2018 12:38   EsNv4vkbkak   b4:a4:e3:b5:12:f0   AP-CF01-3   15
23.04.2018 12:38   kf6xAgz.ZbQ   d0:c7:89:0f:b5:a0   AP-CG02-1   12

Figure 6: Flow diagram of how APs are connected in Campus Gußhaus.
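To make the step from raw log entries to the stacked curves concrete, the following Python sketch counts distinct clients per AP for each polling timestamp. It is only an illustration under our own assumptions: records are given as `(timestamp, client_id, ap_name)` tuples, and the function name `users_per_ap` is hypothetical.

```python
from collections import defaultdict


def users_per_ap(records):
    """Count distinct clients per (timestamp, AP).

    records: iterable of (timestamp, client_id, ap_name) tuples.
    Returns {timestamp: {ap_name: number_of_users}}.
    """
    clients = defaultdict(set)
    for timestamp, client_id, ap_name in records:
        clients[(timestamp, ap_name)].add(client_id)

    counts = defaultdict(dict)
    for (timestamp, ap_name), ids in clients.items():
        counts[timestamp][ap_name] = len(ids)
    return dict(counts)


# The example entries from table 3.
example = [
    ("23.04.2018 12:38", "rZOgnSxIQI.", "AP-CG04-1"),
    ("23.04.2018 12:38", "VJRLEEt7RaY", "AP-CG04-1"),
    ("23.04.2018 12:38", "zAoMt2IzbWo", "AP-CF01-3"),
    ("23.04.2018 12:38", "EsNv4vkbkak", "AP-CF01-3"),
    ("23.04.2018 12:38", "kf6xAgz.ZbQ", "AP-CG02-1"),
]
print(users_per_ap(example))
```

Using a set per (timestamp, AP) pair means that a client reported twice in the same poll is only counted once.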
[Figure 7 plot: x-axis time from 00:00 to 00:00 on Jan 22, 2019; y-axis number of users from 0 to 90; stacked series AP 1 to AP 17]
Figure 7: Number of users over time per AP during Monday, 22nd of January 2019, measured every 30 seconds at Campus Gußhaus.

Libo Song et al. [22] find that their prediction method works worse if the user
sequence is short. We therefore look at the distribution of the length of user sequences
in our data. The length of a user sequence is equal to the length of the location
history H. In our setup one location is added to the sequence every 30 seconds if
a device is connected to the Wi-Fi. The length distribution of the user sequences
can be seen in fig. 8a. Around 20% of all users have a sequence length smaller
than 40. Additionally, in fig. 8b we can see that 40% of all users visit fewer than 5
APs and more than 70% visit fewer than 10 APs. Hence, the trajectory of a user is
often not very diverse, which may influence the prediction. On the one hand, high
regularity makes it theoretically easier to estimate the next step, because there
is less uncertainty. On the other hand, it also makes predictors that are trained on real
data prone to only predicting that something stays the same, and unable to deal with
a location change. This is also something we found with regard to the results
presented in [22]. They show that for long traces they get a median accuracy of 72%.
However, in our experiments the accuracy rapidly decreased if we only looked at
the timesteps where the user changes location. We therefore conclude that their
method is mainly good at predicting that a user stays at the AP, which still leads
to good results with data that fits the model assumptions, but might give the
wrong impression of the performance on semi-static data.

(a) Length distribution (b) APs Nr distribution
Figure 8: Distribution of sequence lengths and number of APs visited per user.
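The difference between overall accuracy and accuracy on location changes can be illustrated with a short Python sketch. It is not the evaluation code used in this work: the helper names `evaluate` and `predict_stay` are hypothetical, and the predictor simply repeats the last known AP.

```python
def evaluate(sequence, predict):
    """Return (overall accuracy, accuracy on steps where the location changes).

    sequence: list of AP names, one per timestep.
    predict:  function mapping the history so far to a predicted next AP.
    """
    total = correct = change_total = change_correct = 0
    for t in range(1, len(sequence)):
        hit = predict(sequence[:t]) == sequence[t]
        total += 1
        correct += hit
        if sequence[t] != sequence[t - 1]:
            change_total += 1
            change_correct += hit
    overall = correct / total if total else float("nan")
    on_changes = change_correct / change_total if change_total else float("nan")
    return overall, on_changes


def predict_stay(history):
    """Baseline: the user stays at the AP of the last timestep."""
    return history[-1]


# A semi-static trace: eight samples at one AP, then two at another.
trace = ["AP-CF02-11"] * 8 + ["AP-CF02-10"] * 2
overall, on_changes = evaluate(trace, predict_stay)
print(overall, on_changes)
```

For this trace the baseline is right in 8 of 9 steps overall, but it misses the single location change, which is the effect described above.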
Compared to a mobile network, where most people are connected constantly, it
can be assumed that only a certain share of the population has Wi-Fi turned on on
their devices at all times. To illustrate this, table 4 shows the percentage of users
who first log into the network at each AP. Even at APs that are not located at
entrances, the percentage of people appearing for the first time in the system is not
zero. This confirms that many users log into the system after they have entered
the building. A total of 6.69% of people are first registered in the system at the
AP CF02-3, located in lecture room EI4 on the second floor of the building. To
get there, users first have to walk through hallways and staircases where they pass
several other APs. However, it is likely that some users only turn on the Wi-Fi
on their mobile phone while they are sitting in the lecture room instead of having
it turned on at all times. There is also some variation over time in where users
are first registered in the system. Figure 9 shows the variation of the entry
probability for each AP for different weeks. For APs like CG03-1 the variation over
several weeks is quite small compared to AP CF01-4, where the variation is more
significant. This can be explained by the different schedules of each day or even
week. The spread of the entry probabilities decreases if only data from the same
day of the week is considered.
Table 4: Mean percentage per AP of where users are first registered.
AP-CF02-4     4.42 %     AP-CF02-7     6.37 %
AP-CF02-3     6.69 %     AP-CG02-1     2.06 %
AP-CF02-10    5.44 %     AP-CF01-4     8.36 %
AP-CF02-5     5.15 %     AP-CF01-1     2.96 %
AP-CF02-11    7.83 %     AP-CF01-3    14.88 %
AP-CF02-6    10.92 %     AP-CF01-2     1.69 %
AP-CG03-1     1.02 %     AP-CF01-5     7.42 %
AP-CF02-2     6.18 %     AP-CG04-1     1.26 %
AP-CF02-1     6.05 %
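The entry percentages of table 4 can be thought of as follows. The sketch below is a hedged illustration, not the code used for the table: it assumes records of the form `(timestamp, client_id, ap_name)` with sortable timestamps, and `entry_percentages` is a hypothetical helper name.

```python
from collections import Counter


def entry_percentages(records):
    """Percentage of users whose first registration happened at each AP.

    records: iterable of (timestamp, client_id, ap_name) tuples; timestamps
    are assumed to be sortable (for example datetime objects or numbers).
    """
    first_ap = {}
    for _, client_id, ap_name in sorted(records):
        # setdefault keeps the AP of the earliest record of each client.
        first_ap.setdefault(client_id, ap_name)

    counts = Counter(first_ap.values())
    total = sum(counts.values())
    return {ap: 100.0 * n / total for ap, n in counts.items()}


# Toy data: four users, three of them first seen at AP-A.
toy_records = [
    (0, "u1", "AP-A"), (1, "u1", "AP-B"),
    (0, "u2", "AP-B"),
    (2, "u3", "AP-A"),
    (3, "u4", "AP-A"),
]
print(entry_percentages(toy_records))
```

Restricting `records` to a single week or weekday before calling the function gives the per-week variation discussed in the text.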
Wi-Fi can have a range of 45-100 m and even though the indoor range
might be a lot smaller, it is still considerably large compared to our test area [29].
In contrast to mobile networks, there are no regulations for WLAN regarding the
overlap of different APs in the network, while in cellular networks there are strict
limits on how much cells can overlap and interfere outside of their border. The
cells in the investigated WLAN setup strongly overlap. In Wi-Fi there is also no
binding standard regarding handover between cells as is usual in mobile networks.
There are newer standards like IEEE 802.11r-2008 that cater to the needs of
applications like Voice over IP (VoIP) and therefore offer some controlled handover
policies [30].
Whether these new standards are used depends on the device and not the network itself.
With new devices and WLAN generations the adoption of these protocols will
become more common. There is also a difference between indoor and outdoor
networks. WLAN is usually used indoors, where a clear line of sight (LOS) is
less frequent compared to mobile networks outside, where there is usually LOS
in addition to multipath propagation.
The problems resulting from the potentially overlapping coverage and the lack of a handover
protocol can be shown for our application with the following example: During a
walk along the red path shown in fig. 10, assuming no overlapping of AP ranges, a user
device would connect to CF02-11, CF02-5, CF02-3 and then CF02-10. In reality,
however, at normal walking speed the phone only connects to CF02-11
and then CF02-10. This happens because the connection to AP CF02-11 in the
hallway is still very good, even if other APs are physically closer.
[Figure 9 plot: x-axis lists the APs AP-CF02-4, AP-CF02-3, AP-CF02-10, AP-CF02-5, AP-CF02-11, AP-CF02-6, AP-CG03-1, AP-CF02-2, AP-CF02-1, AP-CF02-7, AP-CG02-1, AP-CF01-4, AP-CF01-1, AP-CF01-3, AP-CF01-2, AP-CF01-5, AP-CG04-1; y-axis entry probability [%] from 0 to 45]
Figure 9: Variation of the probability of where users enter the system for our experimental setup.
Figure 10: Example path through Campus Gußhaus [28].
These challenges would not appear in a mobile cell network, where consideration
goes into cell size and overlap. In our experimental setup these issues cause
considerable noise. This potentially leads to a bigger problem if we want to utilize
the data to predict an exact location. If we are only interested in higher level
localization like the cell ID, it might need to be taken into account in the modelling stage,
but will not automatically have a negative impact on the prediction performance.
4 Models and Prediction Methods
Given our experimental setup we are now presented with a threefold problem.
First, we need to identify a model to describe our system. Second, we look
at parameter training where necessary. Lastly, we have to decide on prediction
methods that suit the data model.
If, for a dynamic system, parameters $x_1, \dots, x_n$ exist with the property that
the outputs $y_1, \dots, y_q$ at any given time $t$ are uniquely defined through the inputs
$u_1(\tau), \dots, u_p(\tau)$ on the interval $t_0 \le \tau \le t$ and the initial values $x_1(t_0), \dots, x_n(t_0)$,
for any fixed $t_0$, then $x_1, \dots, x_n$ are called states of the system [31]. In our case
$x(t)$ describes how many users are at an AP at time $t$.
We differentiate between two prediction approaches:
1. Based on individual user trajectories: The next location is predicted
   for each user individually and $x$ is determined by summing over all users
   for each AP. We consider order-L Markov and HMM predictors for this
   approach.
2. Based on aggregated users per AP: The aggregated number of users
   per AP $x$ is directly predicted for each time step. We evaluate the Kalman
   and an ML based prediction method for this approach.
We also need to introduce an error measure to compare and analyse the different
modelling methods. Because of its simplicity and favourable properties regarding
optimization we choose the MSE

$$\mathrm{MSE} = E\left\{ \lVert x - \hat{x} \rVert_2^2 \right\} \approx \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2 \tag{1}$$

as the cost function that is minimized in order to fit the model to the data and later
to compare the prediction performance of different algorithms. As outlined in section 2,
various performance metrics are used in the literature, which makes it hard to
compare results from different authors. Because we are interested in the performance
on a crowd level, we choose to measure the performance in terms of all
users at an AP instead of other performance metrics that are single user based.
This also allows us to compare methods that predict the location for individual
users and crowd based models that predict the number of people at each AP.
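As a small illustration of how individual predictions are turned into a crowd-level error, the following Python sketch sums per-user locations into counts per AP and then evaluates eq. (1) over the APs. The helper names `counts_per_ap` and `mse` and the toy data are our own assumptions.

```python
from collections import Counter


def counts_per_ap(locations, aps):
    """Number of users at each AP, in the order given by `aps`."""
    counter = Counter(locations)
    return [counter[ap] for ap in aps]


def mse(actual, predicted):
    """Mean square error between two equally long count vectors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


aps = ["AP1", "AP2", "AP3"]
true_locations = ["AP1", "AP1", "AP2", "AP3"]        # one entry per user
predicted_locations = ["AP1", "AP2", "AP2", "AP3"]   # one prediction per user

actual_counts = counts_per_ap(true_locations, aps)          # [2, 1, 1]
predicted_counts = counts_per_ap(predicted_locations, aps)  # [1, 2, 1]
print(mse(actual_counts, predicted_counts))
```

One wrong individual prediction moves a single user from AP1 to AP2, so two of the three counts are off by one. A method that predicts the counts directly can be scored with the same `mse` function.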
To model our system we will mainly work with probabilistic models because, even
though there are patterns for each day or week, there is a strong random character
to the movement of users through a network. Even if a student arrives for a
lecture every Monday morning, they might not arrive at the exact same time each
Monday and might choose different paths to their destination. Probabilistic models offer
robustness against these kinds of random variations.
In the following we will look into Markov models, which are the basis for the
order-L Markov predictor and HMMs. We then present a model to
motivate a Kalman filter based prediction method and lastly introduce a common
machine learning technique as a benchmark.
4.1 Markov Model
Markov models or chains are a probabilistic way of modelling a sequence of
observations, often called a timeseries. They have been used in a wide array of areas
such as speech coding and forecasting [32]. We first look at the general theoretical
background of Markov models.
With a probabilistic model the joint distribution $p(v_{1:T})$ is used to describe
characteristics of the data $v_1, \dots, v_T$. Given that our timeseries is causal we get the
model

$$p(v_{1:T}) = \prod_{t=1}^{T} p(v_t \mid v_{1:t-1}). \tag{2}$$

The characteristic of a Markov chain is that only a certain number of past
elements influence the present. Therefore the distribution fulfils the conditional
independence assumption

$$p(v_t \mid v_{1:t-1}) = p(v_t \mid v_{t-L:t-1}) \tag{3}$$

which is often called the Markov assumption, where $L$ is called the order of the Markov
chain [32, 33]. A higher order $L$ results in a more complex model.
For an order-1 Markov chain the probability density function (pdf) reduces to

$$p(v_{1:T}) = p(v_1)\, p(v_2 \mid v_1)\, p(v_3 \mid v_2) \cdots = p(v_1) \prod_{t=2}^{T} p(v_t \mid v_{t-1}). \tag{4}$$
In the case of the transitions $p(v_t = s_i \mid v_{t-1} = s_j) = f(s_i, s_j)$ being time-independent
the Markov chain is called stationary [33]. This is a major restriction in our case,
because, as described in section 3.2, the nature of the movement changes over time.
Every stationary finite-state Markov chain can be visualized by a directed graph
where each node represents a state. The arrows show whether a transition between the
states is possible and, if it is, how likely it is. A simple example is presented in
fig. 11.

[Figure 11 diagram: two states, 1 and 2, with self-transition probabilities 1 − α and 1 − β and transition probabilities α and β between them]
Figure 11: Example of state transition diagram for an order-1 Markov chain [32, p. 590].

If $v_t \in \{1, \dots, V\}$ is discrete we can introduce a state transition matrix $A$
where each entry $A_{ij} = p(v_t = j \mid v_{t-1} = i)$ describes the probability to move from
one location to the other [32]. The transition matrix for the example in fig. 11 is

$$A = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}. \tag{5}$$

The parameters $\alpha$ and $\beta$ describe the probability of changing states. Because the
total probability of all possible transitions from each state must sum
to one, the probabilities of staying in a state have to be $1 - \alpha$ and $1 - \beta$. This is
equivalent to the requirement that each row of $A$ sums to one. Markov
based predictors have the advantage that they work well for a small number of
users in the system, because they predict the next location based on one user
and not the whole system. They also only deliver non-negative integer results,
which cannot be guaranteed with other methods, as described in
section 4.5.
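To illustrate the row-sum property, the following Python sketch estimates a transition matrix with $A_{ij} = p(v_t = j \mid v_{t-1} = i)$ by counting transitions in an observed sequence. The function name `estimate_transition_matrix` and the uniform fallback for states that never occur are assumptions made for this example.

```python
def estimate_transition_matrix(sequence, num_states):
    """Estimate A[i][j] = p(v_t = j | v_{t-1} = i) from a state sequence.

    sequence:   list of integer states in range(num_states).
    num_states: number of distinct states V.
    """
    counts = [[0] * num_states for _ in range(num_states)]
    for prev, nxt in zip(sequence[:-1], sequence[1:]):
        counts[prev][nxt] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: a state without outgoing transitions in the data
            # gets a uniform row so that every row still sums to one.
            matrix.append([1.0 / num_states] * num_states)
        else:
            matrix.append([c / total for c in row])
    return matrix


states = [0, 0, 1, 0, 1, 1, 0, 0]
A = estimate_transition_matrix(states, 2)
print(A)
```

In terms of eq. (5), the estimate for this short sequence corresponds to α = 1/2 and β = 2/3.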
4.2 Order-L Markov Predictor
In Libo Song et al. [22] the authors present one method to utilize the Markov
assumption in eq. (3) for an individual user based next step prediction. It uses
low complexity calculations for the prediction and supports online training of the
parameters.
At each point in time the user location is represented by $x \in \mathcal{A} = (a_1, \dots, a_N)$. We
assume that we know the location history $H_m = \langle x_1 = a_i, \dots, x_m = a_j \rangle$, where
$i, j \in \{1, \dots, N\}$, of each user up to the point of the estimation. Assuming stationarity,
the maximum of the probability of the next step $x_{m+1}$, considering the history
$H_m$, gives us the maximum likelihood estimate for the prediction

$$\hat{x}_{m+1} = \operatorname*{argmax}_{a \in \mathcal{A}} P(x_{m+1} = a \mid H_m). \tag{6}$$

Because of the Markov assumption, only the last $L$ entries are important for our
estimation. We call the last $L$ entries of $H_m$ the context
$c_m = \langle x_{m-L+1} = a_i, \dots, x_m = a_j \rangle$ of $H_m$. The prediction from eq. (6) then simplifies to

$$\hat{x}_{m+1} = \operatorname*{argmax}_{a \in \mathcal{A}} P(x_{m+1} = a \mid c_m). \tag{7}$$

To estimate the probability $P(x_{m+1} = a \mid c_m)$ we calculate the ratio of how often
$x_{m+1} = a$ followed $c$ and how often $c$ occurred in total. We define $\mathcal{N}(a \mid c; H_m)$
as the number of times the location $a$ followed $c$ in the location history $H_m$ and
$\mathcal{N}(c; H_m)$ as the number of times the context $c$ appeared in $H_m$. We then get the
estimate for the probability

$$P(x_{m+1} = a \mid c_m) \approx \hat{P}(x_{m+1} = a \mid c_m) = \frac{\mathcal{N}(a \mid c_m; H_m)}{\mathcal{N}(c_m; H_m)} \tag{8}$$

which we insert in eq. (7) to predict the next location

$$\hat{x}_{m+1} = \operatorname*{argmax}_{a \in \mathcal{A}} \frac{\mathcal{N}(a \mid c_m; H_m)}{\mathcal{N}(c_m; H_m)}. \tag{9}$$
4.2.1 Implementation
Algorithm 1, described by [22], predicts the location in the next
timestep and updates the transition matrix $M$ and the list of previous contexts $c_{prev}$
online. At the beginning a prediction is often not possible, because the current context
$c_m$ has not occurred before. The number of samples needed until a prediction is
possible for every potential context also depends on the context length $L$. For $L = 1$
with $N$ different locations, at least $N$ samples are required until the predictor
always delivers a prediction. For $L > 1$ we need at least $N^L$ entries to never miss
a result. Fortunately, in practical applications this number might be significantly
lower, because not every context will actually appear.

Algorithm 1: Order-L Markov Prediction
Input:
- Current symbol x_m
- Transition matrix M
- List of previous contexts c_prev
- Context c_m
- Context length L
Output:
- Prediction x̂_{m+1}
- Updated transition matrix M_new
- Updated context c_{m+1}

if length(c_m) < L then
    c_{m+1} ← add x_m to c_m
    x̂_{m+1} ← NaN                          ▷ Prediction is not possible
else
    if c_m is member of c_prev then
        M_new(c_m, x_m) ← M(c_m, x_m) + 1
    else
        c_prev ← add c_m to c_prev
        M_new(c_m, x_m) = 1
    end
    c_{m+1} ← add x_m to c_m
    c_{m+1} ← delete earliest entry of c_{m+1}
    if c_{m+1} is member of c_prev then
        x̂_{m+1} ← location with the maximum count in M_new(c_{m+1}, :)
    else
        x̂_{m+1} ← NaN                      ▷ Prediction is not possible
    end
end
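The counting scheme of eq. (9) with online updates can be sketched in Python as follows. This is a simplified illustration rather than the implementation used in this work: the class name `OrderLMarkovPredictor` is hypothetical, a dictionary of counters replaces the transition matrix and the list of previous contexts, and `None` is returned where the algorithm returns NaN. The order is assumed to be at least one.

```python
from collections import Counter, defaultdict


class OrderLMarkovPredictor:
    """Online order-L Markov predictor based on context counts."""

    def __init__(self, order):
        self.order = order                   # context length L >= 1
        self.context = ()                    # last L observed locations
        self.counts = defaultdict(Counter)   # context -> counts of followers

    def update(self, location):
        """Consume one observed location and predict the next one.

        Returns None while no prediction is possible, that is while the
        context is shorter than L or the new context was never seen before.
        """
        if len(self.context) == self.order:
            # The full context was followed by `location`: count it.
            self.counts[self.context][location] += 1
        # Append the new location and keep only the last L entries.
        self.context = (self.context + (location,))[-self.order:]
        if len(self.context) < self.order:
            return None
        followers = self.counts.get(self.context)
        if not followers:
            return None
        # The most frequent follower maximizes the ratio in eq. (9).
        return followers.most_common(1)[0][0]


predictor = OrderLMarkovPredictor(order=1)
predictions = [predictor.update(ap) for ap in ["A", "B", "A", "B", "A"]]
print(predictions)
```

Since the denominator in eq. (9) does not depend on the candidate location, taking the location with the highest count is sufficient.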
4.3 Hidden Markov Model (HMM)
Another individual user dependent method, based on a more complex model with
more degrees of freedom than the Markov chain described in section 4.1, is
the HMM. The HMM consists of hidden states $h_t \in \{1, \dots, H\}$ and observed
variables $v_t \in \{1, \dots, V\}$, which are connected according to an observation model
$p(v_t \mid h_t)$ [32]. The difference between a normal Markov chain from section 4.1 and a
HMM can be seen in fig. 12. In a Markov chain we can observe every state transition,
while in the HMM there is a hidden layer that influences the observations. This
leads to the joint distribution of hidden and visible states

$$p(h_{1:T}, v_{1:T}) = p(v_1 \mid h_1)\, p(h_1) \prod_{t=2}^{T} p(v_t \mid h_t)\, p(h_t \mid h_{t-1}). \tag{10}$$

Assuming the HMM is stationary, the transition probability can be represented
by an $H \times H$ transition matrix $A$ with $A(i, j) = A_{ij} = p(h_{t+1} = i \mid h_t = j)$ and the
emission probability by a $V \times H$ emission matrix $B$ with
$B(i, j) = B_{ij} = p(v_t = i \mid h_t = j)$ [33, 32].
4.3.1 Inference in Hidden Markov Models (HMMs)
There are several classic inference tasks possible with a HMM. Usually the goal
is to infer the hidden state sequence, which cannot be observed directly, assuming
all other parameters are known [32]. We can distinguish between several different
types of inference, for example
• Filtering: Calculating $p(h_t \mid v_{1:t})$ online, which reduces noise.
• Smoothing: Calculating $p(h_t \mid v_{1:u})$ where $t < u$.
• Prediction: Calculating $p(h_{t+k} \mid v_{1:t})$ or $p(v_{t+k} \mid v_{1:t})$ where $k > 0$ defines the prediction horizon.
• MAP estimation
For our application we will work with the prediction methods for HMMs.
[Figure 12 diagram: (a) Order-1 Markov chain with visible states v1, v2, v3; (b) Order-1 hidden Markov chain with hidden states h1, h2, h3 and visible states v1, v2, v3]
Figure 12: Development in time for both a Markov chain and a HMM [33]. For both models only $v_i$ are observable.
4.3.2 Prediction in Hidden Markov Models (HMMs)
For the prediction we first want to know the probability of a hidden state $h_t$ given
the data $v_{1:t-1}$. We calculate this considering the joint distribution

\begin{align}
p(h_t, v_{1:t-1}) &= \sum_{h_{t-1}} p(h_t, h_{t-1}, v_{1:t-1}) \tag{11a} \\
&= \sum_{h_{t-1}} p(h_t \mid h_{t-1}, v_{1:t-1})\, p(h_{t-1}, v_{1:t-1}) \tag{11b} \\
&= \sum_{h_{t-1}} p(h_t \mid h_{t-1})\, p(h_{t-1}, v_{1:t-1}) \tag{11c}
\end{align}

and therefore we get

$$p(h_t \mid v_{1:t-1}) = \sum_{h_{t-1}} p(h_t \mid h_{t-1})\, p(h_{t-1} \mid v_{1:t-1}). \tag{12}$$

To decide which hidden state is the most likely in the next timestep, we need to
maximize the posterior probability over all possible states $h_t$

$$h_t^* = \operatorname*{argmax}_{h_t} p(h_t \mid v_{1:t-1}). \tag{13}$$

The conditional probability for a visible state at time $t$, given a sequence of visible
states $v_1, \dots, v_{t-1}$, is determined by summing over all hidden states

$$p(v_t \mid v_{1:t-1}) = \sum_{h_t} p(v_t \mid h_t)\, p(h_t \mid v_{1:t-1}) \tag{14}$$

and we therefore get the estimate for $v_t$ as

$$v_t^* = \operatorname*{argmax}_{v_t} p(v_t \mid v_{1:t-1}) = \operatorname*{argmax}_{v_t} \left\{ \sum_{h_t} p(v_t \mid h_t)\, p(h_t \mid v_{1:t-1}) \right\}. \tag{15}$$
Forward algorithm
To efficiently calculate eq. (11) we select the forward algorithm or α-recursion from
the literature [33]. We define

$$\alpha(h_t) := p(h_t, v_{1:t}) \tag{16}$$

and rewrite the joint probability in eq. (11c) as

$$p(h_t, v_{1:t-1}) = \sum_{h_{t-1}} p(h_t \mid h_{t-1})\, \alpha(h_{t-1}). \tag{17}$$

We calculate $\alpha(h_t)$ with the recursion

$$\alpha(h_t) = p(v_t \mid h_t) \sum_{h_{t-1}} p(h_t \mid h_{t-1})\, \alpha(h_{t-1}) \tag{18}$$

with $\alpha(h_1) = p(h_1, v_1) = p(v_1 \mid h_1)\, p(h_1)$.
In the recursion to calculate $\alpha(h_t)$ we often multiply values that are smaller than
one with each other. This makes the values even smaller and causes numerical
stability problems. In the implementation we therefore apply the logarithm to
all values. This requires changing the calculation of the sum and product. We
introduce numerically more stable functions to do so in algorithm 2 and algorithm 3
[34].
Algorithm 2: Logarithm multiplication functions [34]
Function x ∗ y:
    if x or y is equal to NaN then
        z = NaN
    else
        z = x + y
    end
    return z
end

Algorithm 3: Logarithm addition functions [34]
Function x ⊕ y:
    if isnan(x) or isnan(y) then
        if isnan(x) then
            z = y
        else
            z = x
        end
    else
        if x > y then
            z = x + eln(1 + exp(y − x))
        else
            z = y + eln(1 + exp(x − y))
        end
    end
    return z
end
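Algorithms 2 and 3 can be sketched in a few lines of Python. The names `eln`, `log_mul` and `log_add` are chosen for this illustration; NaN is used to represent ln(0), as in the pseudocode above, and `math.log1p` is assumed to be an acceptable replacement for ln(1 + ·).

```python
import math

LOGZERO = float("nan")  # NaN stands for ln(0), as in algorithms 2 and 3


def eln(x):
    """Extended logarithm: ln(x) for x > 0, NaN (log-zero) otherwise."""
    return math.log(x) if x > 0 else LOGZERO


def log_mul(x, y):
    """Algorithm 2: product of two probabilities given as logarithms."""
    if math.isnan(x) or math.isnan(y):
        return LOGZERO
    return x + y


def log_add(x, y):
    """Algorithm 3: sum of two probabilities given as logarithms."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    # factor out the larger term so that exp() never overflows
    if x > y:
        return x + math.log1p(math.exp(y - x))
    return y + math.log1p(math.exp(x - y))


# ln(0.2) "plus" ln(0.3) gives ln(0.5)
half = math.exp(log_add(eln(0.2), eln(0.3)))
```

Factoring out the larger argument keeps the exponent non-positive, so the sum of two very small probabilities can be formed without leaving the log domain.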
Implementation of Hidden Markov Model (HMM) prediction
The steps for the HMM prediction are outlined in algorithm 4. We first calculate
the α-values according to algorithm 5 and initialize $\mathbf{x}$ as
\[ \mathbf{x}(1) = p(h_T) = \alpha(h_T). \tag{19} \]
We assume that the number of hidden and visible states equals $N$. For each step
$k$ that we want to predict, we determine the conditional probability for the hidden
states $h_t$
\[ \mathbf{P}(:, k) = p(h_t \mid v_{1:t-1}) = \mathbf{A}\mathbf{x}. \tag{20} \]
To utilize the logarithm multiplication and addition from algorithms 2 and 3 we
rewrite eq. (20) element-wise as
\[ \mathbf{P}(j, k) = \sum_{i=1}^{N} \lambda_{ij} \tag{21} \]
with
\[ \lambda_{ij} = \mathbf{A}(i, j)\, \mathbf{x}(i) \tag{22} \]
for $i, j \in \{1, \ldots, N\}$. This determines the new initial value for the next prediction
step
\[ \mathbf{x}(k) = \begin{pmatrix} \mathbf{P}(1, k) \\ \vdots \\ \mathbf{P}(N, k) \end{pmatrix} \tag{23} \]
and the prediction for the $k$th step
\[ x^*(k) = \max_{v_t} \big( p(v_t \mid v_{1:t-1}) \big) = \max \big( \mathbf{B}\mathbf{x}(k) \big). \tag{24} \]
Because probabilities are never larger than one, the multiplication of several
probabilities can cause numerical problems. As for the calculation of α before, we
again apply the logarithm to each value and work with the numerically more stable
functions stated in algorithms 2 and 3 [34].
Algorithm 4: HMM k-step prediction
Input:
- Transition matrix A
- Emission matrix B
- Training data d, T = length(d)
- Initial state π
- Number of steps to predict K
- Number of states N
Output:
- Most probable state x∗ as a K-length vector
- Conditional probabilities P as an N × K matrix

→ Calculate α_log = fwd(d, π)    ▷ defined in algorithm 5
x = α_log(T, :)
for k = 1:K do
    for j = 1:N do
        for i = 1:N do
            λ_ij = x(i) ∗ ln(A_ij)    ▷ with ∗ defined in algorithm 2
            P(j, k) = P(j, k) ⊕ λ_ij    ▷ with ⊕ defined in algorithm 3
        end
    end
    x = P(:, k)
    x∗(k) = max(Bx)
end
Algorithm 5: Alpha recursion
Input:
- Transition matrix A
- Emission matrix B
- Training data d, T = length(d)
- Initial states π
- Number of states N
Output:
- Logarithm of α values α_log as a T × N matrix

for i = 1:N do
    α_log(1, i) = ln(π(i)) ∗ ln(B(i, d_1))    ▷ with ∗ defined in algorithm 2
end
for t = 2:T do
    for j = 1:N do
        α_temp = NaN
        for i = 1:N do
            α_temp = α_temp ⊕ α_log(t − 1, i) ∗ ln(A_ij)    ▷ with ∗ and ⊕ defined in algorithms 2 and 3
        end
        α_log(t, j) = α_temp ∗ ln(B(j, d_t))    ▷ with ∗ defined in algorithm 2
    end
end
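The α-recursion in the log domain can be sketched as follows. Instead of the NaN convention of algorithms 2 and 3, this sketch represents ln(0) by negative infinity and uses a small log-sum-exp helper; both choices, as well as the names `forward_log`, `logsumexp` and `safe_log` and the 0-based indexing, are assumptions made for this illustration.

```python
import math

NEG_INF = float("-inf")  # represents ln(0) in this sketch


def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(values):
    """Stable ln(sum(exp(v) for v in values))."""
    peak = max(values)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def forward_log(A, B, pi, obs):
    """Return alpha_log[t][j] = ln p(h_t = j, v_{1:t}), see eqs. (16) and (18).

    A[i][j] = p(h_t = j | h_{t-1} = i), B[j][v] = p(v_t = v | h_t = j).
    """
    n = len(A)
    alpha = [[safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        row = []
        for j in range(n):
            s = logsumexp([prev[i] + safe_log(A[i][j]) for i in range(n)])
            row.append(s + safe_log(B[j][obs[t]]))
        alpha.append(row)
    return alpha


# Hypothetical two-state example
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
pi = [0.5, 0.5]
alpha_log = forward_log(A, B, pi, [0, 1])
long_run = forward_log(A, B, pi, [0] * 2000)  # stays finite in the log domain
```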
Backward prediction algorithm
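To later train the parameters of the HMM we also need the backward prediction
algorithm or β-recursion [33]. We define
\[ \beta(h_t) := p(v_{t+1:T} \mid h_t) \tag{25} \]
with which we can calculate
\begin{align}
p(v_{t:T} \mid h_{t-1}) &= \sum_{h_t} p(v_t, v_{t+1:T}, h_t \mid h_{t-1}) \tag{26} \\
&= \sum_{h_t} p(v_t \mid v_{t+1:T}, h_t, h_{t-1})\, p(v_{t+1:T}, h_t \mid h_{t-1}) \tag{27} \\
&= \sum_{h_t} p(v_t \mid h_t)\, p(v_{t+1:T}, h_t \mid h_{t-1}) \tag{28} \\
&= \sum_{h_t} p(v_t \mid h_t)\, p(v_{t+1:T} \mid h_t, h_{t-1})\, p(h_t \mid h_{t-1}) \tag{29} \\
&= \sum_{h_t} p(v_t \mid h_t)\, p(v_{t+1:T} \mid h_t)\, p(h_t \mid h_{t-1}). \tag{30}
\end{align}
We then get
\[ \beta(h_{t-1}) = \sum_{h_t} p(v_t \mid h_t)\, \beta(h_t)\, p(h_t \mid h_{t-1}). \tag{31} \]
Combining both α- and β-recursion gives us the estimation for the posterior
probability for a hidden state given a sequence of $T$ visible states
\[ p(h_t \mid v_{1:T}) := \gamma(h_t) = \frac{\alpha(h_t)\, \beta(h_t)}{\sum_{h_t} \alpha(h_t)\, \beta(h_t)}. \tag{32} \]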
Algorithm 6: Beta recursion
Input:
- Transition matrix A
- Emission matrix B
- Training data d
- Number of states N
- T = length(d)
Output:
- β_log as a T × N matrix

→ Initialize β_log with 0
for t = (T − 1) : −1 : 1 do
    for i = 1:N do
        β_temp = NaN
        for j = 1:N do
            β_temp = β_temp ⊕ [ln(A_ij) ∗ ln(B(j, d_{t+1})) ∗ β_log(t + 1, j)]    ▷ with ∗ and ⊕ defined in algorithms 2 and 3
        end
        β_log(t, i) = β_temp
    end
end
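To make the interplay of eq. (31) and eq. (32) concrete, the sketch below computes α, β and γ for a short sequence. It works with plain probabilities rather than logarithms, which is only adequate for short sequences; the function names, the 0-based indexing and the example values are assumptions made for this illustration. The initialization β = 1 at the last timestep corresponds to initializing β_log with 0 in algorithm 6.

```python
def forward(A, B, pi, obs):
    """alpha[t][j] = p(h_t = j, v_{1:t}), eq. (18), plain probabilities."""
    n = len(A)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([B[j][obs[t]] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = p(v_{t+1:T} | h_t = i), eq. (31), with beta = 1 at t = T."""
    n = len(A)
    beta = [[1.0] * n]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]  # beta values of timestep t + 1
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def smooth(alpha, beta):
    """gamma[t][i] = p(h_t = i | v_{1:T}), eq. (32)."""
    gamma = []
    for a_t, b_t in zip(alpha, beta):
        joint = [a * b for a, b in zip(a_t, b_t)]
        total = sum(joint)
        gamma.append([x / total for x in joint])
    return gamma


# Hypothetical two-state example
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
pi = [0.5, 0.5]
obs = [0, 1, 1]
alpha = forward(A, B, pi, obs)
beta = backward(A, B, obs)
gamma = smooth(alpha, beta)
```

A useful consistency check is that the sum over α(h_t)β(h_t) yields the same sequence likelihood for every t.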
4.3.3 Training the Parameters of HMMs
Since there is no analytical way of calculating the parameters of a HMM we have to
employ a statistical method [35]. To estimate the parameters we apply the Baum-Welch
algorithm, which is a variant of the expectation maximisation (EM) method
[33]. The algorithm locally maximizes the likelihood $p(v_{1:T} \mid \mathbf{A}, \mathbf{B}, \boldsymbol{\pi})$, relying
on the observation data $v_{1:T}$. Here $v_{1:T}$ are the visible states, $\mathbf{A}$ is the
transition matrix, $\mathbf{B}$ is the emission matrix and $\boldsymbol{\pi} = \gamma(h_1)$ [26].
The method itself is not numerically stable and therefore some measures to deal
with very small numbers need to be taken. We choose an approach illustrated by
several authors, like Barber [33] and Murphy [32], which is to apply logarithms
to all parameters. Additionally the functions for multiplication and addition defined
in algorithm 2 and algorithm 3 are used.
For the parameter training we first need to determine α and β from algorithm 5
and algorithm 6 for each user. We then calculate γ from eq. (32). This also gives
us $\pi_i = \gamma(h_1 = i)$, which is the estimated probability to start in state $i$ [34]. Utilizing
the previous $\mathbf{A}^{(0)}$ and $\mathbf{B}^{(0)}$, ξ can be determined as
\begin{align}
\xi_t(i, j) &= p(h_t = i, h_{t+1} = j \mid v_{1:T}) \tag{33a} \\
&= \frac{\alpha_t(i)\, A^{(0)}_{ij}\, B^{(0)}_j(v_{t+1})\, \beta_{t+1}(j)}{\sum_{k=1}^{N} \sum_{l=1}^{N} \alpha_t(k)\, A^{(0)}_{kl}\, B^{(0)}_l(v_{t+1})\, \beta_{t+1}(l)}. \tag{33b}
\end{align}
We now get the entries for the matrix $\mathbf{A}$ as
\[ A_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \tag{34} \]
and matrix $\mathbf{B}$ as
\[ B_j(k) = \frac{\sum_{t:\, v_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}. \tag{35} \]
The training for the parameters of the HMM based method is outlined in algorithm 7.
Algorithm 7: Training of HMM Parameters [34, 35]
Input:
- Training data V
- Initial transition matrix A(0)
- Initial emission matrix B(0)
- Initial starting state π(0)
- Number of states N
Output:
- Transition matrix A
- Emission matrix B
- π

while ‖A − A(0)‖ > ε do
    foreach user in V do
        v_{1:T} ← V_user
        α = fwd(v_{1:T}, A, B, π)    ▷ defined in algorithm 5
        β = bwd(v_{1:T}, A, B, π)    ▷ defined in algorithm 6
        → Calculate γ_t(i) ∀t and ∀i ∈ {1, …, N} as defined in eq. (32)
        → Calculate ξ_t(i, j) ∀t and ∀i, j ∈ {1, …, N} as defined in eq. (33)
        → Calculate π_i
        for i = 1:N do
            π(i) = γ_1(i)
        end
        → Calculate each entry A_ij of A as defined in eq. (34)
        → Calculate each entry B_j(k) of B as defined in eq. (35)
    end
    A(0) = A
    B(0) = B
    π(0) = π
end
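The re-estimation step of eqs. (34) and (35) can be sketched for a single sequence as follows. The function name `reestimate`, the nested-list layout of γ and ξ and the hand-picked example values are assumptions made for this illustration; handling several users and the log-domain arithmetic of algorithm 7 are left out.

```python
def reestimate(gamma, xi, obs, n_symbols):
    """Baum-Welch re-estimation for one sequence (illustrative sketch).

    gamma[t][i]  = p(h_t = i | v_{1:T})
    xi[t][i][j]  = p(h_t = i, h_{t+1} = j | v_{1:T}) for t = 0, ..., T - 2
    obs[t]       = index of the visible state at timestep t
    Returns (pi, A, B) with pi_i = gamma_1(i), eq. (34) and eq. (35).
    """
    T, n = len(gamma), len(gamma[0])
    pi = list(gamma[0])
    # eq. (34): expected transitions i -> j over expected visits of i
    A = [[sum(xi[t][i][j] for t in range(T - 1)) /
          sum(gamma[t][i] for t in range(T - 1))
          for j in range(n)] for i in range(n)]
    # eq. (35): expected visits of j while observing k over all visits of j
    B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
          sum(gamma[t][j] for t in range(T))
          for k in range(n_symbols)] for j in range(n)]
    return pi, A, B


# Hypothetical, mutually consistent posteriors for T = 3 and N = 2
gamma = [[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]]
xi = [[[0.4, 0.2], [0.1, 0.3]],
      [[0.2, 0.3], [0.1, 0.4]]]
pi_new, A_new, B_new = reestimate(gamma, xi, obs=[0, 1, 1], n_symbols=2)
```

Because ξ summed over j equals γ, every row of the re-estimated A and B sums to one without an additional normalization step.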
4.4 State Space Model
Switching from predicting an individual user location to predicting crowd movement,
we now describe the state of the system with the aggregated sum of users per
AP. This leads us to a linear state space model as an alternative modelling approach
to a simple Markov chain or HMM. We again postulate a Markov assumption,
but now assume that the state only depends on the timestep before. We illustrate
this idea with a simple example with three states and an "output/input state" o
as seen in fig. 13. This could represent a three cell network where users can move
freely between cells and leave or enter the system via state o.
Figure 13: Trellis diagram to motivate a linear state space model to model the movement between cells. (Nodes x, y, z and o at timesteps 1 and 2; the edges leaving x_1 are labelled f_x(x_1 → x_2), f_y(x_1 → y_2), f_z(x_1 → z_2) and f_o(x_1 → o_2).)
For the proposed example we can write the equations for timestep 1 to timestep 2
as
\begin{align}
x_2 &= f_x(x_1 \to x_2) + g_x(y_1 \to x_2) + h_x(z_1 \to x_2) + k_x(o_1 \to x_2) \tag{36a} \\
y_2 &= f_y(x_1 \to y_2) + g_y(y_1 \to y_2) + h_y(z_1 \to y_2) + k_y(o_1 \to y_2) \tag{36b} \\
z_2 &= f_z(x_1 \to z_2) + g_z(y_1 \to z_2) + h_z(z_1 \to z_2) + k_z(o_1 \to z_2). \tag{36c}
\end{align}
We now assume that the transition function is linear, $f_x(x_1 \to x_2, t) = p(x_1 \to x_2)\, x_1$,
resulting in a transition model
\[
\begin{pmatrix} x_{k+1} \\ y_{k+1} \\ z_{k+1} \end{pmatrix}
=
\begin{pmatrix}
p(x_k \to x_{k+1}) & p(y_k \to x_{k+1}) & p(z_k \to x_{k+1}) \\
p(x_k \to y_{k+1}) & p(y_k \to y_{k+1}) & p(z_k \to y_{k+1}) \\
p(x_k \to z_{k+1}) & p(y_k \to z_{k+1}) & p(z_k \to z_{k+1})
\end{pmatrix}
\begin{pmatrix} x_k \\ y_k \\ z_k \end{pmatrix}
+
\begin{pmatrix} p(o_k \to x_{k+1}) \\ p(o_k \to y_{k+1}) \\ p(o_k \to z_{k+1}) \end{pmatrix}
u_k. \tag{37}
\]
Entering and exiting the system is now modelled with the input $u_k$. This leads us
to the linear state space model
\[ \mathbf{x}_{k+1} = \mathbf{A}_k \mathbf{x}_k + \mathbf{B}_k u_k \tag{38} \]
where the state $\mathbf{x}_k$ describes how many users are at the APs at a certain point in
time. Here $\mathbf{A}_k$ is the transition probability matrix and $\mathbf{B}_k$ defines the weights for
the input $u_k$.
A linear model is the basis for many control methods. Our particular task is to
predict the next state $\mathbf{x}_{k+1}$ while minimizing the MSE. A well known estimator for such
a prediction task is the Kalman filter, which we introduce in the next section.
4.5 Kalman Filter
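Assuming the system we want to characterize can be described by a linear state
space model,
\begin{align}
\mathbf{x}_{k+1} &= \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k \tag{39} \\
\mathbf{y}_k &= \mathbf{C}\mathbf{x}_k + \mathbf{D}\mathbf{u}_k, \tag{40}
\end{align}
estimating the future state in the next timestep is equal to minimizing the error
between the estimated state $\hat{\mathbf{x}}_{k+1}$ and the real state $\mathbf{x}_{k+1}$. The simplest state
estimator is a simulator,
\begin{align}
\hat{\mathbf{x}}_{k+1} &= \mathbf{A}\hat{\mathbf{x}}_k + \mathbf{B}\mathbf{u}_k \tag{41} \\
\hat{\mathbf{y}}_k &= \mathbf{C}\hat{\mathbf{x}}_k + \mathbf{D}\mathbf{u}_k \tag{42}
\end{align}
which comes down to copying the system and then calculating the next step based
on the linear model. This trivial state estimator does not utilize available
measurements that might improve the prediction. It therefore only works if the system
is described well by the linear model and no noise is present [31].
A better state estimator is the Luenberger observer
\begin{align}
\hat{\mathbf{x}}_{k+1} &= \mathbf{A}\hat{\mathbf{x}}_k + \mathbf{B}\mathbf{u}_k + \mathbf{K}\,(\mathbf{y}_k - \hat{\mathbf{y}}_k) \tag{43} \\
\hat{\mathbf{y}}_k &= \mathbf{C}\hat{\mathbf{x}}_k + \mathbf{D}\mathbf{u}_k. \tag{44}
\end{align}
It adds the observation error $e_{obs} = \mathbf{y}_k - \hat{\mathbf{y}}_k$, weighted by a constant $\mathbf{K}$, to the
prediction of the simulator [31]. If $\mathbf{K}$ is chosen correctly the new estimate $\hat{\mathbf{x}}_{k+1}$ is
improved by counteracting the estimation error $e_{est} = \|\mathbf{x}_{k+1} - \hat{\mathbf{x}}_{k+1}\|$.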
Figure 14: Block diagram showing the relationship between state estimator and system. (Blocks: system and state estimator; signals: input u, disturbance, output y and state estimate.)
The open question remains how to choose $\mathbf{K}$ optimally. Utilizing the performance
metric introduced in eq. (1) and including the assumptions
\begin{align}
E\{\mathbf{v}_k\} &= 0, & E\{\mathbf{v}_k \mathbf{v}_j^T\} &= \mathbf{R}\,\delta_{kj} \tag{45a} \\
E\{\mathbf{w}_k\} &= 0, & E\{\mathbf{w}_k \mathbf{w}_j^T\} &= \mathbf{Q}\,\delta_{kj} \tag{45b} \\
& & E\{\mathbf{w}_k \mathbf{v}_j^T\} &= 0 \tag{45c}
\end{align}
about measurement noise $\mathbf{v}_k$ and driving noise $\mathbf{w}_k$, the optimal solution for $\mathbf{K}$
is given by the Kalman filter [36]. It combines sequential linear minimum mean
square error (LMMSE) estimation and a linear state space model of the signal [37].
Modelling our network scenario
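For our purposes we assume a linear, possibly time-varying model
\begin{align}
\mathbf{x}_{k+1} &= \mathbf{A}_k \mathbf{x}_k + \mathbf{B}_k \mathbf{u}_k + \mathbf{G}_k \mathbf{w}_k, \qquad \mathbf{x}(0) = \mathbf{x}_0 \tag{46a} \\
\mathbf{y}_k &= \mathbf{C}_k \mathbf{x}_k + \mathbf{D}_k \mathbf{u}_k + \mathbf{H}_k \mathbf{w}_k + \mathbf{v}_k \tag{46b}
\end{align}
where $\mathbf{x} \in \mathbb{R}^n$ is the state, $\mathbf{u} \in \mathbb{R}^p$ is the input, $\mathbf{y} \in \mathbb{R}^q$ is the output, $\mathbf{w} \in \mathbb{R}^r$ is
the driving noise and $\mathbf{v} \in \mathbb{R}^q$ is the measurement noise.
For the Kalman filter to be optimal in terms of MSE we need several assumptions
to be fulfilled. First we assume $\mathbf{v}$ and $\mathbf{w}$ are zero mean, white and stationary as
defined in eq. (45) with $\mathbf{Q} \geq 0$, $\mathbf{R} \geq 0$ and
\[ \delta_{kj} = \begin{cases} 1, & k = j \\ 0, & k \neq j. \end{cases} \tag{47} \]
We also assume that the expectation of the initial value $\mathbf{m}_0 = E\{\mathbf{x}_0\}$ and the initial
error covariance $\mathbf{P}_0 = E\{(\mathbf{x}_0 - \hat{\mathbf{x}}_0)(\mathbf{x}_0 - \hat{\mathbf{x}}_0)^T\} \geq 0$ are known. Additionally the noise and
the initial state are considered to be uncorrelated, $E\{\mathbf{w}_k \mathbf{x}_0^T\} = 0$ and $E\{\mathbf{v}_l \mathbf{x}_0^T\} = 0$.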
The Kalman filter
The Kalman filter is a recursive formulation of a sequential LMMSE estimation
[37]. It consists of three steps: calculating the Kalman gain matrix, updating the
estimate and then determining the new error covariance [36].
First, the Kalman gain matrix
\[ \mathbf{K}_k = \mathbf{A}_k \mathbf{P}(k|k-1)\, \mathbf{C}_k^T \left( \mathbf{C}_k \mathbf{P}(k|k-1)\, \mathbf{C}_k^T + \mathbf{H}_k \mathbf{Q} \mathbf{H}_k^T + \mathbf{R} \right)^{-1} \tag{48} \]
is calculated. Then the state is updated by combining the state space model and
the difference between the measured output $\mathbf{y}_k$ and the output from the model
\[ \hat{\mathbf{x}}(k+1|k) = \mathbf{A}_k \hat{\mathbf{x}}(k|k-1) + \mathbf{B}_k \mathbf{u}_k + \mathbf{K}_k \left( \mathbf{y}_k - \mathbf{C}_k \hat{\mathbf{x}}(k|k-1) - \mathbf{D}_k \mathbf{u}_k \right). \tag{49} \]
Lastly the new error covariance is determined
\[ \mathbf{P}(k+1|k) = \mathbf{A}_k \mathbf{P}(k|k-1)\, \mathbf{A}_k^T + \mathbf{G}_k \mathbf{Q} \mathbf{G}_k^T - \mathbf{K}_k \mathbf{C}_k \mathbf{P}(k|k-1)\, \mathbf{A}_k^T. \tag{50} \]
If the system matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, $\mathbf{H}$ and the noise covariance matrices $\mathbf{Q}$ and $\mathbf{R}$
are not time dependent, the Kalman gain $\mathbf{K}$ and the error covariance $\mathbf{P}$ can be
precomputed before knowing any measurements $\mathbf{y}_k$. If they vary over time, they
have to be calculated at each step, including an inversion of a $q \times q$ matrix, where $q$
is the dimension of the output $\mathbf{y}$. Depending on the size of the system, this might
increase computational time significantly because the complexity of inverting an
$n \times n$ matrix grows with $O(n^3)$.
The recursion is either initialized with prior knowledge of $\hat{\mathbf{x}}_0 = \mathbf{m}_0$ and $\mathbf{P}_0$ or, if
there is no information about the initial state or error covariance, we set $\hat{\mathbf{x}}_0 = 0$
and $\mathbf{P}_0 = \alpha \mathbf{E}$ with $\alpha \gg 1$ [36].
The error behaviour of the filter can be influenced by choosing the covariance
matrices $\mathbf{Q}$ and $\mathbf{R}$. $\mathbf{R}$ reflects our confidence in the sensors: as the covariance
of the measurement noise, its main diagonal entries are chosen larger the less we
trust the corresponding measurement. For $\mathbf{Q}$ no such interpretation can be found and
therefore it is commonly set by trial and error [36].
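To illustrate the three steps of the recursion, the following minimal sketch evaluates eqs. (48) to (50) for a scalar system, where every matrix reduces to a number and the matrix inversion to a division. The function name `kalman_predict` and the constant-level example are assumptions made for this illustration; the Matlab implementation in appendix A.2 works with matrices.

```python
def kalman_predict(x_hat, p, y, u, a, b, c, d, g, h, q, r):
    """One recursion of eqs. (48)-(50) for a scalar system (sketch).

    x_hat, p : prediction x(k|k-1) and its error variance P(k|k-1)
    y, u     : measurement and input at step k
    Returns x(k+1|k) and P(k+1|k).
    """
    gain = a * p * c / (c * p * c + h * q * h + r)               # eq. (48)
    x_next = a * x_hat + b * u + gain * (y - c * x_hat - d * u)  # eq. (49)
    p_next = a * p * a + g * q * g - gain * c * p * a            # eq. (50)
    return x_next, p_next


# Hypothetical constant-level example: x_{k+1} = x_k, y_k = x_k + v_k
x_hat, p = 0.0, 1e6  # no prior knowledge: x_0 = 0 and a large P_0
for y in [4.8, 5.3, 5.1, 4.9]:
    x_hat, p = kalman_predict(x_hat, p, y, u=0.0, a=1.0, b=0.0, c=1.0,
                              d=0.0, g=1.0, h=0.0, q=0.0, r=1.0)
```

Without driving noise and with a very large initial variance, the estimate approaches the sample mean of the measurements and the error variance shrinks roughly with one over the number of measurements.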
4.5.1 Kalman Filter Implementation
We implemented the Kalman Filter prediction method in Matlab, see appendix A.2
for the code. In the following we will outline the most important design decisions.
Prediction implementation
We know that the number of people at each AP cannot be smaller than zero and
therefore add an additional step to ensure that our estimate is never negative.
There are two methods suggested by Gupta and Hauser [38] to account for
inequality constraints when working with a Kalman filter:
1. Project the unconstrained estimate $\hat{\mathbf{x}}_{k+1} = \hat{\mathbf{x}}(k+1|k)$ into a space that fulfils
the constraints. This can be written as the optimization problem
\begin{align}
\hat{\mathbf{x}}^{(p)}_{k+1} = \arg\min_{\mathbf{x}} &\left\{ (\mathbf{x} - \hat{\mathbf{x}}_{k+1})^T (\mathbf{x} - \hat{\mathbf{x}}_{k+1}) \right\} \tag{51a} \\
\text{s.t. } &\mathbf{C}\mathbf{x} \leq \mathbf{d}. \tag{51b}
\end{align}
2. Restrict the optimal Kalman gain to only allow solutions that fit the
constraints. This results in the optimization problem
\begin{align}
\mathbf{K}^{(p)}_k = \arg\min_{\mathbf{K} \in \mathbb{R}^{n \times m}} &\operatorname{trace} \left\{ (\mathbf{I} - \mathbf{K}\mathbf{C}_k)\, \mathbf{P}(k|k-1)\, (\mathbf{I} - \mathbf{K}\mathbf{C}_k)^T + \mathbf{K}\mathbf{R}\mathbf{K}^T \right\} \tag{52a} \\
\text{s.t. } &\mathbf{C}\, (\hat{\mathbf{x}}_{k+1} + \mathbf{K}_k \boldsymbol{\nu}_k) \leq \mathbf{d} \tag{52b}
\end{align}
where $\boldsymbol{\nu}_k$ is the measurement residual $\boldsymbol{\nu}_k = \mathbf{y}_k - \mathbf{C}_k \hat{\mathbf{x}}(k|k-1) - \mathbf{D}_k \mathbf{u}_k$.
In the implementation summarized in algorithm 8 we first calculate the standard
Kalman estimate and then modify the result to fulfil the constraint $\mathbf{x} \geq 0$ by
solving the minimization problem outlined in eq. (51). For the optimization we
apply the Matlab function fmincon.
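For the special case used here, C = −I and d = 0, the objective in eq. (51) separates into one squared term per AP, so the projection has a closed form: every negative entry is set to zero. The sketch below illustrates this observation. It is not the fmincon based implementation of this work, and the names `project_nonnegative` and `squared_distance` are assumptions made for this illustration.

```python
def project_nonnegative(x_hat):
    """Closed-form solution of eq. (51) for the constraint x >= 0 (sketch)."""
    return [max(v, 0.0) for v in x_hat]


def squared_distance(x, x_hat):
    """Objective of eq. (51a)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Hypothetical unconstrained estimate with one negative user count
x_hat = [3.2, -0.4, 0.0, 1.5]
x_proj = project_nonnegative(x_hat)
```

Any other feasible point, for example `[3.2, 0.1, 0.0, 1.5]`, lies further away from the unconstrained estimate than the clipped vector.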
Algorithm 8: Kalman filter prediction
Input:
- New measurement y
- Estimate x̂_k
- Error covariance P_k
- System parameters A, B, C, D, G, H, R, Q
- Input u_{k:k+h}
- Number of steps to predict h
Output:
- Estimates x̂_{k+1:k+h}
- Error covariance P_{k+h}

for i = 1:h do
    K = A P_{k+i−1} C^T (C P_{k+i−1} C^T + H Q H^T + R)^{−1}
    x̂_{k+i} = A x̂_{k+i−1} + B u_k + K (y − C x̂_{k+i−1} − D u_k)
    P_{k+i} = A P_{k+i−1} A^T + G Q G^T − K C P_{k+i−1} A^T
    if x̂_{k+i} < 0 then
        x̂_{k+i} ← calculate x̂^{(p)}_{k+i} according to eq. (51) with C = −I and d = 0
    end
end
Training implementation
For the Kalman filter prediction method we primarily need to train the parameters
A and B. For A we estimate the probability of a transition between states with
the sample mean of each transition from the AP changes in the individual user
sequences.
To estimate B we also utilize the individual user sequences. We sum over all first
and last locations of each user and then normalize the resulting vector. This gives
us the probabilities of where users appear, B_in, and where users disappear, B_out, in
the system. Depending on the sign of u we later work with B_in or B_out as an input
for the Kalman filter described in algorithm 8.
In order to minimize the number of iterations, we calculate all parameters A, B_in
and B_out in one function as outlined in algorithm 9.
Q and R were determined by optimizing for a minimum MSE using the fmincon
function in Matlab.
Algorithm 9: Calculation of transition matrix A for Kalman filter prediction
Input:
- Training data V_users
Output:
- Transition matrix A
- Input transition vector B_in
- Output transition vector B_out

foreach v_{1:T} ∈ V_users do
    B_in(v_1) += 1
    B_out(v_T) += 1
    for k = 1 : length(v_{1:T}) − 1 do
        TransRow = v_k
        TransColumn = v_{k+1}
        A(TransRow, TransColumn) += 1
    end
end
Normalize B_in and B_out
Normalize columns of A
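The counting scheme of algorithm 9 can be sketched in Python as follows. Two assumptions are made for this illustration and should be checked against the actual implementation: APs are numbered from 0, and the counts are stored as A[to][from], so that normalizing the columns yields a matrix that can be used directly in x_{k+1} = A x_k. The function name `train_kalman_parameters` and the example sequences are hypothetical.

```python
def train_kalman_parameters(user_sequences, n_states):
    """Estimate A, B_in and B_out from per-user AP sequences (sketch)."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    b_in = [0.0] * n_states
    b_out = [0.0] * n_states
    for seq in user_sequences:
        b_in[seq[0]] += 1      # first location: the user appears
        b_out[seq[-1]] += 1    # last location: the user disappears
        for src, dst in zip(seq, seq[1:]):
            counts[dst][src] += 1  # assumption: indexed as A[to][from]

    def normalise(vec):
        total = sum(vec)
        return [v / total if total else 0.0 for v in vec]

    b_in, b_out = normalise(b_in), normalise(b_out)
    # normalise every column so that it holds the probabilities p(from -> to)
    cols = [normalise([counts[r][c] for r in range(n_states)])
            for c in range(n_states)]
    A = [[cols[c][r] for c in range(n_states)] for r in range(n_states)]
    return A, b_in, b_out


# Hypothetical sequences of visited APs for three users
sequences = [[0, 0, 1, 2], [0, 1, 1], [2, 2, 1]]
A, b_in, b_out = train_kalman_parameters(sequences, n_states=3)
```

All three parameters are obtained in a single pass over the user sequences, which mirrors the motivation given for algorithm 9.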
4.6 Machine Learning and Neural Networks
To compare the algorithms described in previous sections to a purely data-driven
method, we now introduce the basics of machine learning (ML) and its popular
variant, the neural network (NN). It also offers an AP based prediction approach,
directly inferring the number of users for every AP for each prediction step.
We give an overview of ML, describe the structure of a NN and summarize the
main points of our NN based prediction implementation.
4.6.1 A short Machine Learning Overview
ML is a general term for prediction, estimation or classification methods that
do not rely on an underlying data model. Murphy [32] defines ML as "a
set of methods that can automatically detect patterns in data, and then use the
uncovered patterns to predict future data, or to perform other kinds of decision
making under uncertainty [...]."
As summarized by [32], we can generally differentiate between two types of ML.
1. Predictive learning or supervised learning: The goal is to learn a
mapping of inputs x to outputs y given a labelled set of input-output pairs,
which is called the training set. Here x can be a scalar, a vector or more complex
data like an image, sentence or graph. The types of problem where y is from
a finite set (e.g. y ∈ {red, blue, green}) are called classification or pattern
recognition tasks, in contrast to regression tasks where y is real valued.
2. Descriptive or unsupervised learning: With this type of application the
goal is to find important patterns in data.
Generally we can define ML problems as function fitting problems [39]. We assume
that y can be estimated as the sum of selected basis functions $\phi_j(\mathbf{x})$
\[ y(\mathbf{w}, \mathbf{x}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) \tag{53} \]
or with $\phi_1(\mathbf{x}) = 1$ as
\[ y(\mathbf{w}, \mathbf{x}) = \sum_{j=1}^{M} w_j \phi_j(\mathbf{x}). \tag{54} \]
If the $\phi_j(\mathbf{x})$ are nonlinear functions the model for y itself is nonlinear in $\mathbf{x}$, but the
estimation remains linear with regard to the M parameters $w_j$.
4.6.2 Definition of Neural Network
There are different approaches on how to choose the basis functions $\phi_j(\mathbf{x})$. The
neural network (NN) method is the most common ML method, which uses a fixed
number of basis functions and only adapts the weights w during training. Figure 15
shows an example of a NN with three layers. Each layer is connected with weights.
In NNs the $\phi_j(\mathbf{x})$ are nonlinear activation functions h(·) applied to linear functions of
x [39]. Often used activation functions are sigmoidal functions such as the logistic
sigmoid or tanh [39].
Figure 15: Exemplary graph for a neural network. (Input layer with inputs x_1, x_2, x_3 and nodes N_1^(1) to N_3^(1), hidden layer with nodes N_1^(2) to N_4^(2), output layer with nodes N_1^(3), N_2^(3) and outputs y_1, y_2; the edges carry the weights w_ji^(1) and w_kj^(2).)
In each layer the weights and the input of the layer are used to calculate the
activations $a_j$ for $j = 1, \ldots, M$. The coefficients $a_j$ can be described as M linear
transformations of the input variables $\mathbf{x} = (x_1, \ldots, x_D)^T$
\begin{align}
a_j &= \sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0} \tag{55} \\
&= \sum_{i=0}^{D} w^{(1)}_{ji} x_i \quad \text{with } x_0 = 1. \tag{56}
\end{align}
The index (1) indicates that it is the first layer of the network. The variables $w^{(1)}_{ji}$
are weights and $w^{(1)}_{j0}$ are often called biases [39].
In a neural network the corresponding outputs $z_j$ of the basis functions $\phi_j(\mathbf{x})$ are often
written as a function of $a_j$, evaluating the activation function h(·)
\[ z_j = h(a_j). \tag{57} \]
For the second, and in our case last layer, the $z_j$ are linearly combined to give the
output unit activation
\[ a_k = \sum_{j=1}^{M} w^{(2)}_{kj} z_j + w^{(2)}_{k0}. \tag{58} \]
The outputs are calculated as $y_k = \sigma(a_k)$ with an appropriate activation function σ(·).
For regression no last activation function is required and therefore $y_k = a_k$ [39].
Combining eq. (55) and eq. (58), the calculation of a neural network for regression
results in
\[ y_k(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M} w^{(2)}_{kj}\, h\!\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right). \tag{59} \]
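Eq. (59) can be evaluated in a few lines. The sketch below stores the bias as the first entry of every weight row, uses tanh as activation function and sets z_0 = 1 so that the output bias is included in the sum; the function name `forward_pass` and the weight values are assumptions made for this illustration and are unrelated to the trained network of section 4.6.3.

```python
import math


def forward_pass(x, W1, W2, h=math.tanh):
    """Evaluate eq. (59) for a two-layer regression network (sketch).

    W1[j] = [w_j0, w_j1, ..., w_jD]  hidden layer, bias first
    W2[k] = [w_k0, w_k1, ..., w_kM]  output layer, bias first
    """
    x_ext = [1.0] + list(x)                                       # x_0 = 1
    a = [sum(w * xi for w, xi in zip(row, x_ext)) for row in W1]  # eq. (56)
    z = [1.0] + [h(a_j) for a_j in a]                             # eq. (57), z_0 = 1
    return [sum(w * zj for w, zj in zip(row, z)) for row in W2]   # eq. (58)


# Hypothetical network with D = 1 input, M = 1 hidden node and K = 1 output
y = forward_pass([2.0], W1=[[0.0, 1.0]], W2=[[1.0, 3.0]])
```

With all hidden weights set to zero the hidden outputs vanish for tanh, so the network output reduces to the output biases, which is a convenient sanity check.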
NNs can be displayed by a directed graph as seen in fig. 15. This example shows
a NN with three input variables, one hidden layer with four nodes and an output
layer with two nodes (the dimensions are D = 3, M = 4 and K = 2).
The number of hidden layers and number of nodes in each hidden layer can be
chosen to enhance the performance of the NN. The goal of training a NN is equal
to finding the optimal weights w in terms of an error function E(w). If the weights
are determined successfully, eq. (59) gives us the results of the prediction. NNs
are always trained on a training set and their performance is evaluated with a test
set. The test set consists of new data that was not known during the training
phase. During training the NN is evaluated and then the weights are adjusted in a
way that minimizes the chosen error function. If the characteristic of the training
data can be mapped perfectly by the NN, the system is called overfitted. In
this case the performance on the test set decreases, because the NN does not
represent the essential properties of the data in general, but only the characteristic
of the training set. To prevent overfitting, methods like dropout, which temporarily
removes a certain percentage of neurons randomly in each training iteration, are
used [40].
There are numerous training methods for neural networks and specific optimization
algorithms are therefore outside the scope of this work. We selected the later
implemented methods according to the state of the art [39].
4.6.3 Practical Aspects of the Neural Network Prediction Method
The NN was implemented with the well known deep learning framework Keras
in Python [41]. It offers an easy to use, flexible and efficient implementation of
the most common ML methods. We work with a standard configuration of a neural
network with two hidden layers of 256 neurons each. The optimal number of neurons is
determined by an optimization using the Python package Hyperopt [42]. For the
activation function we choose relu for the two hidden layers, because it is a very
successful and widely used activation function [43]. It is defined as the element-wise
maximum of x and zero
\[ \mathrm{relu}(x) = \max(x, 0). \tag{60} \]
During training we apply dropout in each layer to prevent overfitting [40]. The
number of nodes in the output layer is proportional to the number of states and
prediction steps. For optimization we choose the algorithm Adam, with the MSE
as a loss function, because it is a computationally efficient method for problems
with many parameters or large datasets [44].
5 Performance Analysis
For the Wi-Fi data from our test scenario we cannot know the ground truth. Therefore
we first investigate several different theoretical data models to benchmark the
prediction methods.
The scenarios we look at are setups where we assume that we can measure the
current number of users at each AP. Because every measurement creates load on the
system, we want to minimize the number of measurements that we have to take while
still knowing how many users are at the APs at each point in time. To do so we first
investigate theoretical data, which offers the possibility to test how sensitive the
prediction performance is to noise on the parameters. Here we test the algorithms
for different prediction horizons or prediction steps. A prediction horizon of p, or p
prediction steps, implies that we know every pth state of the system, $\mathbf{x}_{1:p:k}$, and
predict $\mathbf{x}_{k+1:k+p}$ for $k = 1, \ldots, T - p$.
5.1 Theoretical Data
To test our prediction methods we set up two different theoretical examples. The
first scenario generates data on an AP basis with a state space model, giving us the
opportunity to investigate how increasing prediction horizon and parameter error
influence the performance of the Kalman predictor. The second set of data consists
of individual user sequences generated by a HMM. The underlying statistical model
reproduces the randomness that is also visible in our collected Wi-Fi data.
It offers a first benchmark of order-L Markov, HMM and the Kalman predictor.
5.1.1 Data generated by State Space Model
We generate a time series with six states, where each state represents an AP. The
movement diagram, including the possibility for users to enter and exit, represented
by state o, is displayed in fig. 16. Users may only enter the system in state 1 and
leave the system in state 5.
Figure 16: State diagram for the theoretical state space model example. (States 1 to 6 and the input/output state o.)
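We model this system with the state space equations
\begin{align}
\mathbf{x}_{k+1} &= \mathbf{A}\mathbf{x}_k + \mathbf{B}(u_k)\, u_k \tag{61a} \\
\mathbf{y}_k &= \mathbf{C}\mathbf{x}_k + \mathbf{D} u_k. \tag{61b}
\end{align}
In this model we are free to define any transition matrix. The transition matrix
we choose is
\[
\mathbf{A} =
\begin{pmatrix}
0.5 & 0 & 0 & 0 & 0 & 0 \\
0.5 & 0.25 & 0.5 & 0.5 & 0 & 0 \\
0 & 0.25 & 0.5 & 0 & 0 & 0 \\
0 & 0.25 & 0 & 0.5 & 0 & 0 \\
0 & 0.25 & 0 & 0 & 0.5 & 0.5 \\
0 & 0 & 0 & 0 & 0.5 & 0.5
\end{pmatrix}. \tag{62}
\]
Apart from the defined movement between the states, according to the definition of
A in eq. (62), it is also possible that users remain in a state for several timesteps,
because the entries in the main diagonal are not zero. The diagram in fig. 16
implies that B depends on the sign of $u_k$, because users can only leave and enter
the system in certain states. This results in
\[
\mathbf{B} =
\begin{cases}
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}^T, & u_k > 0 \\
\begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}^T, & u_k \leq 0.
\end{cases} \tag{63}
\]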
We select the matrices C, H, G, Q, R to be identity matrices, set D to zero and
therefore reduce the model from eqs. (61a) and (61b) to
\begin{align}
\mathbf{x}_{k+1} &= \mathbf{A}\mathbf{x}_k + \mathbf{B}(u_k)\, u_k \tag{64a} \\
\mathbf{y}_k &= \mathbf{x}_k, \tag{64b}
\end{align}
and set the initial state as $\mathbf{x}_0 = 0$.
As an input function we choose
\[ u_k = \sin\left( \frac{k - 60}{100} \right) + \sin\left( \frac{k}{500} \right) \tag{65} \]
to simulate a rise and fall of users arriving over a day. The resulting number of users
over timesteps can be seen in fig. 17. In the plot it is visible that occasionally the
number of users is below zero. This type of model does not guarantee a positive
result, which makes adaptations to prediction methods that are based on a state
space model necessary. Our chosen modifications are outlined in section 4.5.1.
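The data generation described by eqs. (62) to (65) can be sketched as follows. The number of timesteps (3500, read from the time axis of fig. 17), the start index k = 0 and the function names are assumptions made for this illustration, so the resulting curves need not match fig. 17 exactly.

```python
import math

A = [[0.5, 0.0,  0.0, 0.0, 0.0, 0.0],
     [0.5, 0.25, 0.5, 0.5, 0.0, 0.0],
     [0.0, 0.25, 0.5, 0.0, 0.0, 0.0],
     [0.0, 0.25, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0, 0.0, 0.5, 0.5],
     [0.0, 0.0,  0.0, 0.0, 0.5, 0.5]]  # eq. (62)


def input_weights(u):
    """B(u_k) from eq. (63): users enter in state 1 and leave in state 5."""
    return [1, 0, 0, 0, 0, 0] if u > 0 else [0, 0, 0, 0, 1, 0]


def u_of(k):
    """Input signal from eq. (65)."""
    return math.sin((k - 60) / 100) + math.sin(k / 500)


def simulate(n_steps):
    """Iterate eq. (64a) starting from x_0 = 0 and return all states."""
    x = [0.0] * 6
    trace = [x]
    for k in range(n_steps):
        u = u_of(k)
        b = input_weights(u)
        x = [sum(A[r][c] * x[c] for c in range(6)) + b[r] * u
             for r in range(6)]
        trace.append(x)
    return trace


trace = simulate(3500)  # assumed length, see lead-in
```

Since every column of A sums to one, the total number of users changes only through the input u_k, and nothing in the model prevents individual entries from becoming negative.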
Figure 17: Number of users at each AP over time generated by the state space model example described in section 5.1.1. (Axes: timesteps from 0 to 3500 against number of users from about −0.2 to 1; one curve per router 1 to 6.)
Kalman Filter Prediction
We employ the data generated by the state space model to analyse the stability
of the Kalman filter prediction in terms of parameter noise and prediction horizon.
With precise knowledge of the parameters, the prediction is perfect no
matter how long the prediction horizon gets. However, if we introduce noise to the
parameter A, as in fig. 18 (see footnote 7), the MSE increases with larger prediction
horizons. Noisy parameters occur due to measurement errors or wrong models. Figure 18
also shows that the minimal MSE grows more slowly than the maximal MSE.
Figure 18: Variation of MSE for increasing prediction horizon and additive uniform noise on A with variance $10^{-2}$. (Boxplot; axes: prediction horizon from 1 to 40 against MSE in units of $10^{-4}$.)
Another factor that affects the performance of real world prediction is the sensitivity
of the predictor regarding parameter estimation. For our theoretical example
we simulated noise on either parameter A or B with a variance between 0.01 and
1 for different prediction horizons. As seen in fig. 19, noise on different parameters
influences the performance of the prediction in different ways. For both A and
B, noise has the least influence for the lowest prediction horizon. There is almost
no impact on the MSE for noise on B with a prediction horizon of one, while an
error in A increases the MSE significantly.
7 The boxplot of 100 realizations shows the median MSE with the red line. The lower limit of
the blue box is defined by the median of all values under the overall median and the upper limit
is defined by the median of all values above the median MSE. The whiskers are maximally 1.5
times the interquartile range long and therefore represent ±2.7σ or 99.3% of data if the values
were normally distributed. The red crosses display outliers outside of that whisker length.
Figure 19: Results for MSE over noise variance for different prediction horizons and noisy parameters A and B for the state space model example. (Axes: noise variance from 0 to 1 against MSE in units of $10^{-3}$; legend "Parameter with Noise | Prediction Horizon": B | 1, B | 20, B | 40, A | 1, A | 20, A | 40.)
The steepness of the increase in MSE also differs between the two parameters. For
A the error increases more steeply in the beginning, while for B the increase in MSE
starts slowly but then follows an almost exponential growth.
In conclusion, the Kalman filter prediction can be very sensitive to errors in the
chosen parameters. The results also show that more resources should be put into
determining A than into estimating B.
5.1.2 Data generated by Hidden Markov Model (HMM)
For an initial benchmark of the order-L Markov, HMM and Kalman predictor, we
generate individual user sequences with an HMM. The generated data, as seen
in fig. 20, shows similar fast fluctuations as the Wi-Fi data in fig. 7, giving us
insight into the behaviour of the prediction methods when dealing with potential
randomness in user movement. The HMM example has six states and the transition
matrix
\[
\mathbf{A} =
\begin{pmatrix}
0.882 & 0.029 & 0 & 0 & 0 & 0 \\
0.027 & 0.838 & 0.027 & 0.027 & 0 & 0 \\
0 & 0.029 & 0.886 & 0 & 0 & 0 \\
0 & 0.029 & 0 & 0.857 & 0.029 & 0 \\
0 & 0.116 & 0 & 0 & 0.698 & 0.116 \\
0 & 0 & 0 & 0 & 0.029 & 0.882
\end{pmatrix} \tag{66}
\]
and the emission matrix
\[
\mathbf{B} =
\begin{pmatrix}
0.75 & 0.25 & 0 & 0 & 0 & 0 \\
0.2 & 0.6 & 0.2 & 0 & 0 & 0 \\
0 & 0.2 & 0.6 & 0.2 & 0 & 0 \\
0 & 0 & 0.2 & 0.6 & 0.2 & 0 \\
0 & 0 & 0 & 0.2 & 0.6 & 0.2 \\
0 & 0 & 0 & 0 & 0.25 & 0.75
\end{pmatrix}. \tag{67}
\]
Figure 20: Number of users over time stacked for each AP for the theoretical HMM data. (Axes: timesteps from 0 to 500 against number of users from 0 to 2.5; one curve per router 1 to 6.)
Because in a real world scenario not all users would have the same sequence length
(see footnote 8), we vary the trace length for each user with a uniform distribution
between 0 and 500. We then choose a start timestep so that the sequence ends at
timestep 500 at the latest.
8 Sequence length is defined as the length of the user's movement history.
Prediction with order-L Markov predictor without parameter noise
The order-L Markov predictor does not need any pre-trained parameters for its
prediction. Therefore there is no concern with noise in the parameters. However, if
the data does not fulfil the assumptions presented in section 4.1, there is no way
to tailor variables to the problem. It also creates the problem of a convergence
phase in the beginning of the prediction, where the algorithm has not seen enough
data to make a prediction.
Figure 21: Results for Markov predictors with different order L for the HMM example data. (Axes: prediction horizon from 0 to 20 against MSE from 0 to 0.15; one curve per order 1 to 6.)
For the Markov predictor the order L, representing the length of context that
influences the next step, could be important in terms of prediction accuracy. For the
HMM example we compare different orders of Markov predictors in fig. 21. For one
or two prediction steps the order-2 predictor has a lower MSE, however for higher
prediction horizons the order-3 predictor shows better results. Interestingly, the
order-1 Markov predictor has the worst performance for higher prediction horizons
together with the order-6 predictor. This shows that increasing the context length
increases the MSE if the additionally considered data is not relevant to the prediction.
Consequently, the right order of the Markov assumption needs to be determined
for this approach to work well.
Prediction with HMM based predictor without parameter noise
With a certain probability a user either moves to one AP or the other, therefore the
prediction of the theoretical HMM data with maximum likelihood is not perfect,
even just for one prediction step.
For a higher prediction horizon this error propagates and increases each time an
error has been made in the previous prediction. This behaviour is displayed in
fig. 22a, where without any noise added, the MSE increases with larger prediction
horizons.
Prediction with HMM based predictor with parameter noise
If even without parameter noise the larger prediction horizon leads to an MSE
increase, we now want to answer the question of what happens if the parameters
are not known exactly. To do so, we look at the HMM example data again and
add uniformly distributed noise to the transition matrix A with varying variance.
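A minimal sketch of such a perturbation is given below. It uses the fact that zero-mean uniform noise on the interval (-a, a) has variance a²/3. How the perturbed matrix is kept row-stochastic is not stated here, so the clipping and renormalization in the sketch are assumptions, as are the function name and the example matrix.

```python
import math
import random


def add_uniform_noise(transition, variance, rng=None):
    """Perturb a row-stochastic matrix with zero-mean uniform noise.

    Noise is drawn from U(-a, a) with a = sqrt(3 * variance), so that its
    variance equals `variance`. Assumption (not from the text): negative
    entries are clipped to zero and every row is renormalized to sum to one.
    """
    rng = rng or random.Random()
    half_width = math.sqrt(3.0 * variance)
    noisy = []
    for row in transition:
        perturbed = [max(p + rng.uniform(-half_width, half_width), 0.0) for p in row]
        total = sum(perturbed)
        if total == 0.0:
            # Degenerate case: all entries were clipped, keep the original row.
            perturbed, total = list(row), sum(row)
        noisy.append([p / total for p in perturbed])
    return noisy


# Hypothetical 3-state transition matrix A.
A = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]
A_noisy = add_uniform_noise(A, variance=0.05, rng=random.Random(0))
```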
[Figure, two panels, each showing MSE over prediction horizon (0 to 20): (a) without noise; (b) uniform noise added with different variances (0.00 to 0.25) to transition matrix A.]
Figure 22: MSE results for HMM theoretical example data applying the HMM based prediction.
The results in fig. 22b confirm that the MSE increases with higher noise variance. The
plot also shows that, for all noise variances, the MSE for a prediction horizon lower
than 5 does not increase significantly when compared to the rapid MSE change
for over 5 steps of prediction. This suggests that even with high parameter noise
up to 5 prediction steps are possible without a notable decrease of prediction
performance. Based on this observation we assume that there is a limit for how many
prediction steps with the HMM method are possible before the MSE increases abruptly.
Prediction with Kalman filter
Compared to the prediction with an HMM based method, the Kalman filter
estimate stays more constant over the prediction horizon. In fig. 22b the MSE increases
rapidly after a certain prediction horizon. Compared to the MSE increase in
fig. 19, the error in fig. 23 does not significantly increase, even for higher prediction
horizons. This suggests that even though we found in section 5.1.1 that noise on
A creates a steep increase of the MSE, the error does not seem to propagate as
fast as for the HMM prediction if the data is generated by an HMM.
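The multi-step prediction can be pictured as repeatedly applying the state equation to the current estimate. The sketch below assumes a linear state space model of the form x[k+1] = A x[k] + B u[k], with A as transition matrix and B as input matrix. The function names, the two-AP example values and the assumption that future inputs are known are hypothetical and only serve as an illustration.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def predict_ahead(x_est, A, B, inputs):
    """Propagate the current state estimate over len(inputs) steps.

    Assumes x[k+1] = A x[k] + B u[k]. No measurement update is applied,
    because future observations are not available during prediction.
    """
    x = list(x_est)
    trajectory = []
    for u in inputs:
        ax = mat_vec(A, x)
        bu = mat_vec(B, u)
        x = [a + b for a, b in zip(ax, bu)]
        trajectory.append(x)
    return trajectory


# Hypothetical two-AP example: users move between the APs, new users enter at AP 1.
A = [[0.9, 0.2],
     [0.1, 0.8]]
B = [[1.0],
     [0.0]]
x_now = [10.0, 5.0]
future_inputs = [[1.0]] * 5  # assumed known input for the next five steps
five_step = predict_ahead(x_now, A, B, future_inputs)[-1]
```

In this picture an error in A is multiplied into the estimate once per prediction step, while an error in B only enters additively through the input.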
[Figure: MSE over prediction horizon (0 to 20) for noise variances 0.00 to 0.25.]
Figure 23: Results for the HMM theoretical data applying the Kalman filter predictor.
Comparison of prediction methods for data generated by HMM
To compare the different prediction methods we look at the MSE of all three methods
considering increasing prediction horizon and noise variance. The results in fig. 24
show that HMM and Kalman filter display very similar behaviour for no noise
over increasing prediction horizons. However, as soon as we introduce parameter
noise, the Kalman filter prediction generally outperforms the HMM estimate. We
also compare the results to the order-3 Markov prediction, because it showed
the best results in our previous analysis from fig. 21. For prediction lengths over
10, the Markov based predictor delivers better results than the HMM method,
but the Kalman based approach always results in a lower MSE. This leads us to
the conclusion that the Kalman filter approach is very promising in providing a
low MSE prediction, especially for a long prediction horizon. It delivers relatively
constant results for increasing prediction horizon and is stable when parameter
noise is added.
[Figure: MSE over prediction horizon (0 to 20) for HMM and Kalman with noise variance 0.00, 0.10 and 0.25, and for order-3 Markov.]
Figure 24: Comparison of prediction performance for different prediction methods and parameter noise variances for the HMM theoretical example data.
Table 5: MSE for different prediction methods for prediction horizon 1, 5 and 20. The results for Kalman and HMM based predictions are displayed for noise variance 0.1.
                        Order-3 Markov   Kalman          HMM
Prediction Horizon 1    2.14 · 10^-2     1.08 · 10^-2    1.48 · 10^-2
Prediction Horizon 5    3.75 · 10^-2     1.77 · 10^-2    1.93 · 10^-2
Prediction Horizon 20   4.75 · 10^-2     2.23 · 10^-2    14.20 · 10^-2
As summarized in table 5, the MSE for a one step Kalman filter prediction increases
by 2.18 · 10^-3 and for a five step prediction by 1.50 · 10^-2. Figure 25 depicts
the sum of users of AP 1 and the prediction results for a one step prediction
from all three methods⁹. The original data has many fast changes that are not
represented by the Kalman estimate, however the smoothed curve that the Kalman
filter approach delivers seems to be the best estimate overall. This property might
be a disadvantage if the number of users fluctuates and the increase or decrease
needs to be predicted.
[Figure: number of users at AP Nr. 1 over timesteps 0 to 500, showing original data, HMM, order-3 Markov and Kalman.]
Figure 25: Results for a one step prediction for HMM, order-3 Markov and Kalman filter prediction.
⁹ Examples of other APs can be seen in appendix A.5.
5.1.3 Data generated by Agent based Simulation
The data generated by the Anylogic simulation offers the possibility to benchmark
the Kalman filter approach with regard to fast user fluctuations missing in the
previous HMM model data. The number of users fluctuates very abruptly and the
prediction needs to follow with a small error. We will first describe the Anylogic
simulation and how the parameters for the Kalman predictor were chosen. We
then present results for the prediction performance for different prediction
horizons.
During the simulation Anylogic records how many people are in the defined router
areas in every simulated timestep. Because we do not have traces for single users
from the Anylogic simulation, we first need to set the parameters for the predictor.
We initialize the parameters according to the floor plan resulting in the values for
A and B as stated in eq. (68) and eq. (69) in appendix A.3. We then use the
Matlab function fmincon iteratively, adopting the new optimal value if it delivers
better results than the past values in order to find the optimized parameters. We
do this until the norm of the old MSE minus the new MSE is smaller than the
threshold 10^-5. The results for the matrices from this optimization are stated in
eqs. (70) to (72) in appendix A.3.
[Figure, two panels over timesteps 0 to 700, each showing estimate and data: (a) AP Nr. 1; (b) AP Nr. 2.]
Figure 26: Plot of normalized data from the Anylogic simulation and its Kalman prediction for 5 prediction steps.
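The iterative parameter search can be sketched as a loop that restarts a local optimizer from the best parameters found so far and stops once the MSE changes by less than the threshold. The sketch is written in Python rather than Matlab and does not call fmincon. The function `local_optimizer` is a simple gradient descent stand-in, and `toy_mse` with its data is a hypothetical objective chosen only so that the example runs.

```python
def local_optimizer(objective, start, iterations=20, rate=0.1, eps=1e-6):
    """Stand-in for a solver such as fmincon: a few numerical gradient descent steps."""
    x = list(start)
    for _ in range(iterations):
        base = objective(x)
        grad = []
        for i in range(len(x)):
            shifted = list(x)
            shifted[i] += eps
            grad.append((objective(shifted) - base) / eps)
        x = [xi - rate * gi for xi, gi in zip(x, grad)]
    return x, objective(x)


def refine(objective, initial, tol=1e-5, max_rounds=100):
    """Re-run the local optimizer from the best parameters found so far.

    A new result replaces the old one only if it lowers the MSE. The loop
    stops once |old MSE - new MSE| falls below `tol`.
    """
    params, mse = list(initial), objective(initial)
    for _ in range(max_rounds):
        new_params, new_mse = local_optimizer(objective, params)
        improvement = abs(mse - new_mse)
        if new_mse < mse:
            params, mse = new_params, new_mse
        if improvement < tol:
            break
    return params, mse


# Hypothetical objective: MSE of the scalar model y = a * x on toy data.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]


def toy_mse(params):
    return sum((params[0] * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


best_params, best_mse = refine(toy_mse, [0.0])
```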
For five prediction steps we get an MSE of 2.203 · 10^-3 and the results for two
APs are displayed in fig. 26. The Kalman filter does not deal ideally with sudden
steep rises. As seen in fig. 26a, sometimes the estimate cuts off peaks in the
data. This appears to happen especially if the peaks are small compared to data
at other APs. At AP Nr. 2 the total number of users is higher and the predictor
overestimates the values of the peaks. This difference in dynamics is influenced by
the parameters of the predictor. While choosing the parameters there is a trade-off
between dynamically reacting to fast fluctuations and also being able to predict
when the number of users at an AP is not changing. This shows that the Kalman
method can react to sudden changes in the number of users, but if the parameters
are not set in an optimal way it will over- or underestimate the fluctuations. This
points to a practical problem in choosing the parameters.
The Anylogic example also gives us the possibility to investigate how the MSE
for the Kalman filter approach changes with increasing prediction horizon. As
fig. 27 shows, this increase in MSE has a linear trend.
[Figure: MSE (scale 10^-3) over prediction horizon (0 to 40).]
Figure 27: Development of MSE with the Kalman filter prediction for increasing prediction horizon applied on the Anylogic data.
5.2 Wi-Fi Data
For the performance comparison with our collected Wi-Fi data we define one week
of data for training and one week of test data to measure mean prediction error.
We will first determine the best order for the Markov predictor and then benchmark
it against the HMM, Kalman and ML prediction methods. We also determine
results for a predictor that always repeats the last step for the length of the
prediction horizon, to analyse whether the other predictors can outperform a trivial
predictor. We call this predictor use last step (ULS).
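The ULS baseline is simple enough to state in a few lines. The sketch below repeats the last observed value over the prediction horizon and scores it with the MSE. The function names and the example values are hypothetical.

```python
def uls_predict(series, horizon):
    """Use last step: repeat the last observed value for every future step."""
    return [series[-1]] * horizon


def mse(predicted, actual):
    """Mean square error between two equally long sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical normalized user counts at one AP.
observed = [0.20, 0.22, 0.25]
future = [0.25, 0.27, 0.30]
baseline_error = mse(uls_predict(observed, horizon=3), future)
```

Any other predictor is only worth its additional complexity for a given horizon if its error on the same test data is below this baseline.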
Training parameters of different models with Wi-Fi data
We calculate the transition and input matrix for the Kalman filter method and define
the input function by fitting a spline curve to the summed data of the training
week. We also estimate the transition and emission matrix with the training data
for our HMM prediction method. The order-L prediction method does not need
any training because it learns all parameters online. Utilizing training data to help
initialize the method did not improve the result.
5.2.1 Order-L Markov Predictor
Because we have seen in section 5.1.2 that the order of the Markov predictor
can play a significant role in the performance of the method, we first test several
variations of L in order to determine which delivers the best results in terms of
MSE. Figure 28 shows the results for the Markov predictor for the Wi-Fi data
with different context lengths. For one prediction step the results for all orders are
very close and for higher prediction steps the lower order predictors outperform the
higher order methods. Because order-1 delivers slightly better results for prediction
horizons 2, 3 and 5 and is the least complex, we choose order-1 for later evaluations.
[Figure: MSE (scale 10^-3) over prediction horizon (1 to 5) for Markov predictors of order 1 to 5.]
Figure 28: MSE results for different order Markov predictors applied on the measured Wi-Fi data.
5.2.2 Benchmarking of different Prediction Methods
We now benchmark the results for the prediction of the number of users per AP for
the order-1 Markov, Kalman, HMM, ML and ULS predictor.
In a first step we analyse the MSE for different prediction horizons. Figure 29 shows
that for a one step prediction the order-1 Markov, ULS and HMM method result
in an almost identical MSE which is lower than the Kalman filter result. For longer
prediction horizons the HMM and Kalman approach outperform the order-1
and ULS method. This indicates that most users in the Wi-Fi network stay in one
location for at least a certain period of time, making the last location a very good
predictor for one step. Only for longer prediction horizons can we see the benefit
of working with a model predicting movement.
[Figure: MSE (scale 10^-3) over prediction horizon (1 to 10) for HMM, Kalman filter, order-1 Markov and ULS.]
Figure 29: MSE for HMM, Kalman, order-1 and ULS prediction for increasing prediction horizon using the Wi-Fi data from Campus Gußhaus.
A difference between the trends of order-1 Markov, HMM, Kalman, ML and ULS
can be seen in fig. 30b. The ML, order-1 Markov, ULS and HMM curves
follow the data almost perfectly. The Kalman filter estimate seems to follow the
data a step behind and sometimes resembles a moving average result. Figure 31
shows the results for the same dataset, but with the prediction results for a five
step prediction, which equals a time of 2.5 minutes. Figure 31a reveals that for 5
steps the order-1 Markov and ULS method do not predict the trend of the data
any more and the performance drops drastically. In fig. 31b we can see that
the ML, Kalman and HMM approach follow the general trend of the data, but
deliver worse results than for the one step prediction.
Table 6: Comparison of MSE for 1 and 5 prediction steps for the recorded Wi-Fi data.
                  Prediction Horizon 1   Prediction Horizon 5
ULS               0.64 · 10^-4           26.92 · 10^-4
Order-1 Markov    0.67 · 10^-4           19.73 · 10^-4
HMM               0.76 · 10^-4           1.95 · 10^-4
Kalman            4.35 · 10^-4           4.81 · 10^-4
ML                0.19 · 10^-4           1.1 · 10^-4
Finally, we summarize our results in table 6 for prediction horizon one and five.
The ML approach results in the lowest MSE for both horizons. The ULS method delivers
the second best result for prediction horizon one, however the MSE increases
significantly for the 5 step prediction. The order-1 predictor displays a similar
behaviour. The Kalman filter approach results in the smallest MSE increase when
comparing results from one and 5 step predictions, while the HMM approach delivers
the second best MSE for 5 steps, being only about 1.7 times higher than the result for ML.
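The relative statements above can be re-derived from the values in table 6. The short sketch below copies the table entries in units of 10^-4 and computes, for each method, the MSE relative to ML and the absolute increase from one to five prediction steps. The dictionary and variable names are chosen for this example only.

```python
# MSE values from table 6 in units of 1e-4: (horizon 1, horizon 5).
table6 = {
    "ULS": (0.64, 26.92),
    "Order-1 Markov": (0.67, 19.73),
    "HMM": (0.76, 1.95),
    "Kalman": (4.35, 4.81),
    "ML": (0.19, 1.1),
}

ml_h1, ml_h5 = table6["ML"]
for method, (h1, h5) in table6.items():
    ratio_h1 = h1 / ml_h1   # MSE relative to ML, one step
    ratio_h5 = h5 / ml_h5   # MSE relative to ML, five steps
    increase = h5 - h1      # absolute increase from one to five steps
    print(f"{method:15s} x{ratio_h1:5.1f}  x{ratio_h5:5.1f}  +{increase:5.2f}e-4")

# Method with the smallest absolute MSE increase between horizon 1 and 5.
smallest_increase = min(table6, key=lambda m: table6[m][1] - table6[m][0])
```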
[Figure, two panels: (a) and (b).]
Figure 30: Normalized number of users over time for AP-CF02-7 for a one step prediction for the Wi-Fi data from TU Wien for the 26th of November 2018.
[Figure, two panels showing the normalized number of users over time of day: (a) 07:41 to 11:29 with data, order-1 Markov, HMM, ULS, Kalman and ML; (b) 08:19 to 08:42 with data, order-1 Markov, HMM, Kalman and ML.]
Figure 31: Normalized number of users over time for AP-CF02-7 for a five step prediction for the Wi-Fi data from TU Wien for the 26th of November 2018.
6 Discussion
Our objective was to benchmark different prediction techniques estimating how
many users will be where in future timesteps. This could be useful for future
networks that employ a small cell network architecture with spatially distributed
antenna and processing units. We analyse the prediction methods for different
prediction horizons, evaluating the limits in terms of predictable timespan. This
offers us the possibility to adopt the number of prediction steps within a margin
of error as a feasibility design parameter in future applications. We select MSE on
an AP level as performance metric, because it enables us to benchmark individual
user and AP based methods in terms of crowd prediction accuracy.
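To make the comparison between user based and AP based methods concrete, the following sketch aggregates per-user location predictions into user counts per AP and then computes an MSE over all APs. The function names, the AP identifiers and the normalization by the total number of users are assumptions made for this illustration.

```python
from collections import Counter


def users_per_ap(locations, ap_ids):
    """Turn per-user locations (one AP id per user) into a count per AP."""
    counts = Counter(locations)
    return [counts.get(ap, 0) for ap in ap_ids]


def ap_level_mse(predicted_counts, true_counts, total_users):
    """MSE over all APs of the user counts, normalized by the number of users.

    Assumption: normalizing by `total_users` is chosen for illustration only.
    """
    errors = [((p - t) / total_users) ** 2
              for p, t in zip(predicted_counts, true_counts)]
    return sum(errors) / len(errors)


# Hypothetical example with three APs and four users.
aps = ["AP1", "AP2", "AP3"]
predicted_locations = ["AP1", "AP1", "AP2", "AP3"]  # output of a per-user predictor
true_locations = ["AP1", "AP2", "AP2", "AP3"]
error = ap_level_mse(users_per_ap(predicted_locations, aps),
                     users_per_ap(true_locations, aps),
                     total_users=len(true_locations))
```

An AP based predictor produces the per-AP counts directly, so both families of methods can be scored with the same function.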
Benchmark results for theoretical data
A first review of prediction performance with data generated by a state space model
allows us to understand how a parameter estimation error increases the MSE for
the Kalman filter predictor. Figure 18 shows that, even though it might be obvious
that the error increases with more prediction steps, it is important to notice
that an error in the parameters with the same magnitude can have a varied effect
by prediction horizon 40. While the variation of MSE for one prediction step is
small, by prediction horizon 40 the MSE can vary between 10 and almost 16
times the MSE from one step. Each parameter is analysed, as it affects the result
differently. This gives us insight into where we should spend effort during
parameter training and helps us understand real world performance issues. For
the particular Kalman method, the prediction is more sensitive to an error in
parameter A than to an error in parameter B.
Data generated from a more complex statistical model, based on an HMM, offers
the opportunity to model random fluctuations that we encounter in the Wi-Fi
trace collected from the WLAN at TU Wien. The HMM data still cannot
reproduce the temporal evolution found in the Wi-Fi data, but shows how the
prediction methods perform when applied on data with an underlying random
model. For the created test vectors from the HMM, the Kalman filter predictor
outperforms the HMM and order-L Markov based methods. The investigation of
different orders of Markov predictors identifies that for this HMM data the order-3
predictor results in the lowest MSE. The HMM predictor is less robust against
parameter noise and the propagation of error with increasing prediction horizons
than the order-L Markov and the Kalman method. Without parameter noise the
HMM predictor delivers only slightly higher MSE values than the Kalman predictor
for prediction steps 1-20, outperforming the order-3 Markov predictor. With added
parameter noise, the MSE for the HMM predictor is higher than the MSE of the
order-3 Markov predictor for a prediction horizon over 8.
To model the time of day effects of a network we generate agent based
simulation data with the software Anylogic. It simulates a scaled down scenario
with fewer lecture rooms, no Wi-Fi noise as described in section 3.2 and only users
that walk from pre-defined points of entry to designated lecture rooms.
This simulation enables us to examine how the Kalman method performs on data
that is created with a more realistic movement pattern than the HMM data.
Due to the linear nature of the Kalman predictor it over- or underestimates fast
fluctuations. This points to a trade-off in selecting the parameters of the Kalman
predictor in terms of dynamic or static MSE performance.
Benchmark results for Wi-Fi data
As a last step we compare the prediction methods with the Wi-Fi data that we
collected at TU Wien. This scenario is closely linked to a small cell deployment
as introduced in section 1. Here we also cross evaluate the introduced methods
with a purely data driven prediction, a neural network based machine learning
predictor, and a trivial predictor, named ULS, that always uses the last location
as the predicted location. We find that the machine learning approach delivers the
best results with an MSE of 0.19 · 10^-4 for a one step prediction and 1.1 · 10^-4 for a
five step prediction. The second best predictors, with almost identical results, for
a one step prediction are the order-1 Markov, ULS and HMM predictor with an
approximately 3.5 times higher MSE. For the five step prediction the MSE for the
order-1 predictor increases to 19.73 · 10^-4, almost 18 times higher than the result
for the ML method for the same number of prediction steps. The ULS predictor
has an around 25 times higher MSE than the ML approach for 5 prediction steps,
while the Kalman filter results in a roughly 4 times and the HMM predictor in a
1.7 times higher MSE than ML.
The HMM predictor has the second best performance for the five step prediction.
However, due to the adaptations to the algorithm required to solve the numerical
issues during training and prediction, this method also takes the longest time for
each prediction, making it unsuited for larger scale applications unless a more
efficient implementation is found. This indicates that the model based methods
considered can capture some dynamics of the system, however the machine learning
approach delivered the best results overall.
Implementation considerations
Prediction accuracy alone is not sufficient to determine the best prediction method
for a use case in a mobile network. We also need to consider the time required for a
prediction, computational complexity and storage scalability.
Individual user based methods, like HMM and order-L predictors, have the
disadvantage that they need trajectories for each user during training and prediction.
The network is required to store these trajectories, while AP based methods, like
the Kalman and ML predictor, rely on the number of active users per AP as an input
for the prediction, reducing the memory and signalling resources needed to acquire
the data. The AP methods, Kalman filter and ML, are also faster than the user
based methods, because they predict the total number of users for each AP in one
iteration.
The effort required for each prediction is another important factor. It depends on
the number of APs and the number of users in the network.
The most time consuming step for the order-L Markov predictor is searching
through all previous contexts for each user. For the HMM prediction several
matrix multiplications with the dimension of the number of APs N have to be
calculated for each user. Each prediction with a Kalman predictor takes an
inversion of an N-dimensional matrix and some matrix multiplications with the same
dimensions. The complexity for each prediction for the neural network scales with
the number of layers and, depending on the activation function, is proportional to
the number of neurons in each layer.
Assuming the number of users is significantly higher than the number of APs, this
makes the Kalman and ML predictor better at scaling with an increasing network
size.
The order-L Markov predictor is the only method not requiring any data for
training, however this compromises the prediction accuracy in the initial phase.
Especially in a network with a high number of APs this initialization phase
becomes long, because the amount of data the order-L Markov predictor requires until
every context was part of the training data is proportional to the number of APs
and the context length.
The HMM and Kalman predictors both require a similar amount of individual user
sequences for the training. In our experiments the training for the
Kalman predictor was significantly faster. The ML approach needs only training
data on AP level, but as it purely relies on finding the characteristic patterns
in the data, it might require more data than a model based approach, like the
Kalman predictor.
In our scenario a purely data driven ML prediction has the lowest MSE. In contrast
to the ML method, adding or removing an AP does not make it necessary
to completely retrain the parameters of the Kalman predictor. Depending on
the application, for longer prediction horizons, the Kalman filter may offer a low
complexity solution to the prediction problem, where the parameters characterise
known parts of the system. Here, further research into time dependent variations
or online updating of parameters may offer further insight into performance for
different applications. Depending on the data the HMM predictor offers good
performance for systems with a low number of users. As the comparison with ULS
shows, more complex model based predictors only deliver better results for longer
prediction times. In future work adaptations of the order-L Markov predictor for
more than one step could be investigated.
7 Conclusion
This research aimed to benchmark different location prediction methods for
potential application in small cell environments and investigate the limits of crowd
mobility prediction.
The order-L Markov, HMM, Kalman and ML predictor all have their theoretical
advantages and we illustrate that each method performs very differently
depending on the data that is used. Generally the simple Markov chain based prediction
yields impressive results for its simplicity. However, on the Wi-Fi data this method
performed no better than a trivial estimator, resulting in the same performance as a
predictor just repeating the previous step. This raises the question of the predictability
of the data itself. If users mostly stay in one location, every trained predictor
will get biased towards this location. This also might be a considerable difference
between indoor and outdoor data, which needs to be further analysed.
Based on practical Wi-Fi data we conclude that the purely data driven ML method
results in the lowest MSE overall. The HMM based method delivered the second
lowest MSE, however with its numerical instability during training and estimation,
a more efficient way for the prediction is necessary to make it feasible for real world
applications.
Generally, techniques based on per AP aggregated user data offer a more memory
efficient and faster way to predict the number of users at each AP in future time steps
than individual user based methods. Model based approaches, like the Kalman
predictor, are easier to adapt to a new network architecture, as they do not require
complete retraining of parameters if e.g. an AP is added or removed. The Kalman
predictor therefore presents a low complexity predictor while being easily scalable
with the number of users. Additionally, results from both the theoretical and practical
analysis show that this predictor works well for increasing prediction horizon. Here future
research could address time dependent parameters and efficient ways to train
them to further improve results. In terms of training the ML method requires
more complex training techniques than the approach we applied for the Kalman
filter, possibly making it less usable for increasing network sizes.
Generally the suggested model based approaches were apparently not able to fully
capture the characteristics of the data, as they were outperformed by the
model-less ML approach. This either calls for a more complex model, or shows that a
general model like the one of a NN is the best at capturing the underlying complex
movement pattern in a small cell, indoor environment.
We also conclude that the type of performance measure is important. Accuracy
metrics based on individual users suggested in the literature are not applicable for
crowd based applications. MSE offers a good indicator of how well the predictions
match the data, it might however fail as a performance indicator if an application
depends on avoiding under- or overestimation.
Improvements could be made for all methods, however we think that the crowd
based methods are more useful in a network with small cells. Prediction
methods for aggregated user data scale better, offer higher accuracy on a global
scale and have the additional benefit of protecting the privacy of the individual users,
while making it unnecessary to remember trajectories after the training phase.
In the future different ML techniques that incorporate more external knowledge
like time of day, weekday/weekend or even weather for outdoor networks could
be considered. As training is the most computationally complex part of an ML
predictor, efficient training methods need to be found to handle large amounts of
data from long sequences.
8 REFERENCES
8 References
[1] Cisco. Cisco visual networking index (VNI) global mobile data tra�c forecast
update, 2017-2022. Tech. rep. 2019. url: http://www.gsma.com/spectrum/
wp-content/uploads/2013/03/Cisco%7B%5C_%7DVNI-global-mobile-
data-traffic-forecast-update.pdf.
[2] Ericsson. Ericsson Mobility Report Q4 Update February 2019. Tech. rep.
2018. url: www.ericsson.com/mobility-report.
[3] NGMN Alliance. 5G White Paper. Tech. rep. 2015, pp. 1�125.
[4] 3rd Generation Partnership Project (3GPP). TS 123 501 - V15.3.0 - 5G;
System Architecture for the 5G System (3GPP TS 23.501 version 15.3.0 Re-
lease 15). 2018. url: https://portal.etsi.org/TB/ETSIDeliverableStatus.
aspx.
[5] Gianfranco Nencioni et al. �Orchestration and Control in Software-De�ned
5G Networks: Research Challenges�. In: Wireless Communications and Mo-
bile Computing 2018 (2018), pp. 1�18. issn: 1530-8669. doi: 10.1155/2018/
6923867.
[6] Chih-Lin I et al. �Recent Progress on C-RAN Centralization and Cloudi-
�cation�. In: IEEE Access 2 (2014), pp. 1030�1039. issn: 2169-3536. doi:
10.1109/ACCESS.2014.2351411. url: http://ieeexplore.ieee.org/
document/6882182/.
[7] Aleksandra Checko et al. �Cloud RAN for Mobile Networks�A Technol-
ogy Overview�. In: IEEE Communications Surveys & Tutorials 17.1 (2015),
pp. 405�426. issn: 1553-877X. doi: 10.1109/COMST.2014.2355255. url:
http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=
6897914.
[8] Christine Cheng, Ravi Jain, and Eric van der Berg. �Location Prediction
Algorithms for Mobile Wireless Systems�. In: Handbook of Wireless Internet.
CRC Press, 2003. Chap. 11, pp. 245�261. isbn: 0849315026.
74
8 REFERENCES
[9] James O Malley. Here's What TfL Learned From Tracking Your Phone On
the Tube. 2017. url: http://www.gizmodo.co.uk/2017/02/heres-what-
tfl-learned-from-tracking-your-phone-on-the-tube/.
[10] Marc-olivier Killijian, Sebastien Gambs, and Miguel Nunez del Prado Cortez.
�Show Me How You Move and I Will Tell You Who You Are�. In: Transac-
tions on Data Privacy 4 (2011), pp. 103�126.
[11] Dietmar Bauer et al. �Quasi-Dynamic Estimation of OD Flows From Tra�c
Counts Without Prior OD Matrix�. In: IEEE Transactions on Intelligent
Transportation Systems 19.6 (June 2018), pp. 2025�2034. issn: 1524-9050.
doi: 10.1109/TITS.2017.2741528. url: https://ieeexplore.ieee.org/
document/8032480/.
[12] Sharminda Bera and K. V.Krishna Rao. �Estimation of origin-destination
matrix from tra�c counts: The state of the art�. In: European Transport -
Trasporti Europei 49 (2011), pp. 3�23.
[13] Fabio Pinelli et al. �Data-Driven Transit Network Design From Mobile Phone
Trajectories�. In: IEEE Transactions on Intelligent Transportation Systems
17.6 (June 2016), pp. 1724�1733. issn: 1524-9050. doi: 10.1109/TITS.
2015.2496783. url: http://ieeexplore.ieee.org/document/7471487/.
[14] Anselmo Ramalho Pitombeira Neto, Francisco Moraes Oliveira Neto, and
Carlos Felipe Grangeiro Loureiro. �Statistical models for the estimation of
the origin-destination matrix from tra�c counts�. In: Transportes 25 (Dec.
2017), pp. 1�13. issn: 2237-1346. doi: 10.14295/transportes.v25i4.1344.
url: https://www.revistatransportes.org.br/anpet/article/view/
1344.
[15] Jacob Ziv and Abraham Lempel. �Compression of lndividual Sequences via
Variable-Rate Coding�. In: IEEE transactions on Information Theory 24.5
(1978), pp. 530�536.
[16] George Liu and Gerald Maguire. �A class of mobile motion prediction al-
gorithms for wireless mobile computing and communications�. In: Mobile
Networks and Applications 1.2 (June 1996), pp. 113�121. issn: 1383-469X.
75
8 REFERENCES
doi: 10.1007/BF01193332. url: http://link.springer.com/10.1007/
BF01193332.
[17] J Chan, S Zhou, and A Seneviratne. �A QoS adaptive mobility predic-
tion scheme for wireless networks�. In: IEEE GLOBECOM 1998 (Cat. NO.
98CH36250). Vol. 3. IEEE, 1998, pp. 1414�1419. isbn: 0-7803-4984-9. doi:
10.1109/GLOCOM.1998.776573. url: http://ieeexplore.ieee.org/
document/776573/.
[18] Yihang Cheng, Yuanyuan Qiao, and Jie Yang. �An improved Markov method
for prediction of user mobility�. In: 2016 12th International Conference on
Network and Service Management (CNSM). IEEE, Oct. 2016, pp. 394�399.
isbn: 9783901882852. doi: 10 . 1109 / CNSM . 2016 . 7818454. url: http :
//ieeexplore.ieee.org/document/7818454/.
[19] Alicia Rodriguez-Carrion et al. �Study of LZ-Based Location Prediction and
Its Application to Transportation Recommender Systems�. In: Sensors 12.6
(June 2012), pp. 7496�7517. issn: 1424-8220. doi: 10.3390/s120607496.
url: http://www.mdpi.com/1424-8220/12/6/7496.
[20] Libo Song. �Evaluating Mobility Predictors in Wireless Networks for Improv-
ing Hando� and Opportunistic Routing�. PhD thesis. Dartmouth College,
2008, pp. 1�200. url: http://www.cs.dartmouth.edu/reports/TR2008-
611.pdf.
[21] Amnir Hadachi et al. �Cell phone subscribers mobility prediction using en-
hanced Markov Chain algorithm�. In: IEEE Intelligent Vehicles Symposium,
Proceedings June 2016 (2014), pp. 1049�1054. doi: 10.1109/IVS.2014.
6856442.
[22] Libo Song et al. �Evaluating Next-Cell Predictors with Extensive Wi-Fi Mo-
bility Data�. In: IEEE Transactions on Mobile Computing 5.12 (Dec. 2006),
pp. 1633�1649. issn: 1536-1233. doi: 10.1109/TMC.2006.185. url: http:
//ieeexplore.ieee.org/document/1717434/.
[23] Patrick Kenny, Matthew Lennig, and Paul Mermelstein. �A Linear Predictive
HMM for Vector-Valued Observations with Applications to Speech Recog-
76
8 REFERENCES
nition�. In: IEEE Transactions on Acoustics, Speech, and Signal Processing
38.2 (1990), pp. 220�225. issn: 00963518. doi: 10.1109/29.103057.
[24] Simon L. Cawley and Lior Pachter. �HMM sampling and applications to gene
�nding and alternative splicing�. In: Bioinformatics 19.Suppl. 2 (Sept. 2003),
pp. ii36�ii41. issn: 1367-4803. doi: 10.1093/bioinformatics/btg1057.
url: https://academic.oup.com/bioinformatics/article-lookup/
doi/10.1093/bioinformatics/btg1057.
[25] Arpad Gellert and Lucian Vintan. �Person Movement Prediction Using Hid-
den Markov Models�. In: Studies in Informatics and Control 15.1 (2006),
pp. 17�30. url: http://webspace.ulbsibiu.ro/arpad.gellert/html/
SIC%7B%5C_%7DHMM.pdf.
[26] Hongbo Si et al. �Mobility Prediction in Cellular Network Using Hidden
Markov Model�. In: 2010 7th IEEE Consumer Communications and Net-
working Conference (2010), pp. 1�5. doi: 10.1109/CCNC.2010.5421684.
url: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?
arnumber=5421684.
[27] The Anylogic Company. Anylogic. 2019. url: https://www.anylogic.de/.
[28] Gebäude und Technik TU Wien. GUT Grundrisspläne. 2019.
url: https://www.gut.tuwien.ac.at/wir_fuer_sie/immobilienmanagement/grundrisse_objekte/.
[29] Sanjeev Dhawan. “Analogy of Promising Wireless Technologies on Different
Frequencies: Bluetooth, WiFi, and WiMAX”. In: The 2nd International
Conference on Wireless Broadband and Ultra Wideband Communications
(AusWireless 2007). AusWireless. IEEE, Aug. 2007, pp. 14–14.
isbn: 0-7695-2842-2. doi: 10.1109/AUSWIRELESS.2007.27.
url: http://ieeexplore.ieee.org/document/4299663/.
[30] C/LM - LAN/MAN Standards Committee. “IEEE Standard for Information
technology – Local and metropolitan area networks – Specific requirements –
Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer
(PHY) Specifications Amendment 2: Fast Basic Service Set (BSS) Transition”.
In: IEEE Std 802.11r-2008 (Amendment to IEEE Std 802.11-2007
as amended by IEEE Std 802.11k-2008) (2008), pp. 1–126.
doi: 10.1109/IEEESTD.2008.4573292.
url: https://standards.ieee.org/standard/802_11r-2008.html#Standard.
[31] Andreas Kugi. Automatisierung. 2019.
url: https://www.acin.tuwien.ac.at/file/teaching/bachelor/automatisierung/Gesamtskriptum.pdf.
[32] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. 1991,
pp. 73–78, 216–244. isbn: 9780262018029. doi: 10.1007/SpringerReference_35834.
arXiv: 0-387-31073-8.
url: http://link.springer.com/chapter/10.1007/978-94-011-3532-0_2.
[33] David Barber. Bayesian Reasoning and Machine Learning. Cambridge:
Cambridge University Press, 2011, p. 646. isbn: 9780511804779.
doi: 10.1017/CBO9780511804779. arXiv: arXiv:1011.1669v3.
url: http://ebooks.cambridge.org/ref/id/CBO9780511804779.
[34] Tobias P Mann. “Numerically Stable Hidden Markov Model Implementation”.
In: An HMM scaling tutorial (2006), pp. 1–8.
url: http://bozeman.genome.washington.edu/compbio/mbt599_2006/hmm_scaling_revised.pdf.
[35] L.R. Rabiner. “A tutorial on hidden Markov models and selected applications
in speech recognition”. In: Proceedings of the IEEE 77.2 (1989), pp. 257–286.
issn: 00189219. doi: 10.1109/5.18626. arXiv: arXiv:1011.1669v3.
url: http://ieeexplore.ieee.org/document/18626/.
[36] Wolfgang Kemmetmüller and Andreas Kugi. Regelungssysteme 1. 2018. url:
https://www.acin.tuwien.ac.at/master/regelungssysteme-2/.
[37] Franz Hlawatsch. Parameter estimation Methods. 2016, pp. 1–116.
[38] Nachi Gupta and Raphael Hauser. Kalman Filtering with Equality and
Inequality State Constraints. Tech. rep. Oxford University Computing
Laboratory, Sept. 2007. arXiv: 0709.2791. url: http://arxiv.org/abs/0709.2791.
[39] Christopher M. Bishop. Pattern Recognition and Machine Learning
(Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag, 2006.
isbn: 0-387-31073-8.
[40] Nitish Srivastava et al. “Dropout: a simple way to prevent neural networks
from overfitting”. In: The Journal of Machine Learning Research 15.1 (2014),
pp. 1929–1958. url: http://jmlr.org/papers/v15/srivastava14a.htm.
[41] François Chollet et al. Keras. 2015. url: https://keras.io.
[42] James Bergstra, Daniel L K Yamins, and D Cox. “Making a science of model
search: Hyperparameter optimization in hundreds of dimensions for vision
architectures”. In: Proceedings of the 30th International Conference on
Machine Learning (2013), pp. 115–123.
url: http://jmlr.org/proceedings/papers/v28/bergstra13.html.
[43] Prajit Ramachandran, Barret Zoph, and Quoc V Le. “Searching for
Activation Functions”. In: CoRR. Oct. 2017, pp. 1–13. arXiv: 1710.05941.
url: http://arxiv.org/abs/1710.05941.
[44] Diederik P. Kingma and Jimmy Ba. “Adam: A Method for Stochastic
Optimization”. In: International Conference on Learning Representations (Dec.
2014). arXiv: 1412.6980. url: http://arxiv.org/abs/1412.6980.
A Appendix
A.1 Floor Plans of Campus Gußhaus
Figure 32: Floor plan of the second floor of Campus Gußhaus with APs marked in green [28].
Figure 33: Floor plan of the third floor of Campus Gußhaus with APs marked in green [28].
Figure 34: Floor plan of the fourth floor of Campus Gußhaus with APs marked in green [28].
A.2 Kalman Filter Implementation in Matlab
function [estimate, errorCovariance] = KalmanFilterEstimation(y, ...
    previousEstimate, previousErrorCovariance, A, B, C, D, G, H, R, Q, u)
% One-step Kalman prediction with projection onto non-negative states.
% System:
%   x(k+1) = A*x(k)+B*u(k)+G*w(k)
%   y(k)   = C*x(k)+D*u(k)+H*w(k)+v(k)
%   E{v_k*v_j} = R*delta_kj
%   E{w_k*w_j} = Q*delta_kj
NrOfStates = length(A);
% Kalman gain; right division avoids forming the explicit inverse
K = A*previousErrorCovariance*C' / ...
    (C*previousErrorCovariance*C' + H*Q*H' + R);
% Predicted state and error covariance for the next time step
estimate = A*previousEstimate + B*u + K*(y - C*previousEstimate - D*u);
errorCovariance = A*previousErrorCovariance*A' + G*Q*G' ...
    - K*C*previousErrorCovariance*A';
% any() is needed here: an if on a vector is true only when all
% elements are true, so a single negative component would be missed
if any(estimate < 0)
    % Euclidean projection onto x >= 0 (fmincon: Optimization Toolbox)
    projEstimate = @(x) (x - estimate)'*(x - estimate);
    estimate = fmincon(projEstimate, estimate, ...
        -eye(NrOfStates), zeros(NrOfStates,1));
end
end
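Read as equations, the listing above appears to implement the one-step predictor form of the Kalman filter. The following is a sketch in the listing's own notation; the symbols \hat{x}_k (for previousEstimate) and P_k (for previousErrorCovariance) are introduced here for illustration only and are not taken from the thesis text.

\[
\begin{aligned}
K_k &= A P_k C^{\mathsf T}\left(C P_k C^{\mathsf T} + H Q H^{\mathsf T} + R\right)^{-1},\\
\hat{x}_{k+1} &= A\hat{x}_k + B u_k + K_k\left(y_k - C\hat{x}_k - D u_k\right),\\
P_{k+1} &= A P_k A^{\mathsf T} + G Q G^{\mathsf T} - K_k C P_k A^{\mathsf T}.
\end{aligned}
\]

When the negativity check in the listing fires, the estimate is replaced by its Euclidean projection onto the non-negative orthant, which is what the fmincon call with the constraint -I x \le 0 computes:

\[
\hat{x}_{k+1} \leftarrow \arg\min_{x \ge 0}\; (x - \hat{x}_{k+1})^{\mathsf T}(x - \hat{x}_{k+1}).
\]

One observation, offered with caution: because w(k) enters both the state and the output equation, the general form for correlated noise would carry an additional cross term G Q H^{\mathsf T} in the gain and in the covariance update. The listing contains no such term, so the recursion as written matches the general form exactly only when G Q H^{\mathsf T} = 0.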
A.3 Parameters for Anylogic Simulation
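\[
A_0 =
\begin{bmatrix}
\tfrac{1}{3} & \tfrac{2}{3} & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.2 & 0 & 0.2 & 0 \\
0 & \tfrac{1}{3} & \tfrac{1}{3} & 0 & \tfrac{1}{3} & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0.2 & 0.2 & 0 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & \tfrac{2}{3} & \tfrac{1}{3}
\end{bmatrix}
\tag{68}
\]

\[
B_{\mathrm{in},0} = B_{\mathrm{out},0} =
\begin{bmatrix}
\tfrac{1}{3} & 0 & \tfrac{1}{3} & 0 & 0 & \tfrac{1}{3}
\end{bmatrix}^{\mathsf T}
\tag{69}
\]

\[
A_{\mathrm{opt}} =
\begin{bmatrix}
0.278728 & 0 & 0 & 2.47\cdot 10^{-8} & 0 & 5.34\cdot 10^{-8} \\
2.970083 & 0.992932 & 4.43\cdot 10^{-7} & 0 & 6.90\cdot 10^{-8} & 0 \\
0.209038 & 0.002664 & 0.220401 & 0 & 0 & 0.576441 \\
0 & 2.65\cdot 10^{-7} & 1.15\cdot 10^{-6} & 0.702691 & 0.04961 & 0.557901 \\
0.280836 & 0 & 0.231901 & 0.845527 & 0.787416 & 2.713404 \\
5.27\cdot 10^{-8} & 7.79\cdot 10^{-5} & 0 & 0 & 0.002155 & 0.747378
\end{bmatrix}
\tag{70}
\]

\[
B_{\mathrm{out,opt}} =
\begin{bmatrix}
0.000 & 0.149 & 0.000 & 0.142 & 0.709 & 0.000
\end{bmatrix}^{\mathsf T}
\tag{71}
\]

\[
B_{\mathrm{in,opt}} =
\begin{bmatrix}
0.090 & 0.243 & 0.288 & 0.189 & 0.125 & 0.064
\end{bmatrix}^{\mathsf T}
\tag{72}
\]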
C = eye(NrOfStates);
D = zeros(NrOfStates,1);
H = eye(NrOfStates);
G = eye(NrOfStates);
P0 = eye(NrOfStates)*100;
x0 = mean(Data,2);
A.4 Anylogic Simulation Results
[Plot: AP Nr. 1. x-axis: Timesteps, 0 to 700; y-axis: 0 to 0.09; curves: Estimate, Data.]
Figure 35: MSE for Kalman prediction of the Anylogic simulation for AP 1.
[Plot: AP Nr. 2. x-axis: Timesteps, 0 to 700; y-axis: 0 to 1.2; curves: Estimate, Data.]
Figure 36: MSE for Kalman prediction of the Anylogic simulation for AP 2.
[Plot: AP Nr. 3. x-axis: Timesteps, 0 to 700; y-axis: 0 to 0.5; curves: Estimate, Data.]
Figure 37: MSE for Kalman prediction of the Anylogic simulation for AP 3.
[Plot: AP Nr. 4. x-axis: Timesteps, 0 to 700; y-axis: 0 to 0.5; curves: Estimate, Data.]
Figure 38: MSE for Kalman prediction of the Anylogic simulation for AP 4.
[Plot: AP Nr. 5. x-axis: Timesteps, 0 to 700; y-axis: 0 to 1.8; curves: Estimate, Data.]
Figure 39: MSE for Kalman prediction of the Anylogic simulation for AP 5.
[Plot: AP Nr. 6. x-axis: Timesteps, 0 to 700; y-axis: 0 to 0.06; curves: Estimate, Data.]
Figure 40: MSE for Kalman prediction of the Anylogic simulation for AP 6.
[Plot: x-axis: 0 to 700; y-axis: 0 to 3; curves: Data, estimate.]
Figure 41: Sum of users for the Anylogic example and estimate using the Kalman filter prediction with prediction horizon one.
[Plot: x-axis: 0 to 700; y-axis: 0 to 3; curves: Data, estimate.]
Figure 42: Sum of users for the Anylogic example and estimate using the Kalman filter prediction with prediction horizon five.
A.5 HMM Example Simulation Results
[Plot: AP Nr. 2. x-axis: 0 to 500; y-axis: 0 to 30; curves: Original Data, HMM, Order-3 Markov, Kalman.]
Figure 43: Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 2.
[Plot: AP Nr. 3. x-axis: 0 to 500; y-axis: 0 to 20; curves: Original Data, HMM, Order-3 Markov, Kalman.]
Figure 44: Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 3.
[Plot: AP Nr. 4. x-axis: 0 to 500; y-axis: 0 to 20; curves: Original Data, HMM, Order-3 Markov, Kalman.]
Figure 45: Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 4.
[Plot: AP Nr. 5. x-axis: 0 to 500; y-axis: 0 to 10; curves: Original Data, HMM, Order-3 Markov, Kalman.]
Figure 46: Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 5.
[Plot: AP Nr. 6. x-axis: 0 to 500; y-axis: 0 to 15; curves: Original Data, HMM, Order-3 Markov, Kalman.]
Figure 47: Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 6.
I hereby declare that this thesis was written in accordance with the Code of
Conduct (rules for ensuring good scientific practice, in the current version of
the respective official gazette of TU Wien), in particular without impermissible
help from third parties and without the use of aids other than those stated.
Data and concepts taken directly or indirectly from other sources are marked
with a reference to the source.
This thesis has not previously been submitted, in the same or a similar form,
in any other examination procedure, either in Austria or abroad.
Vienna, June 2019
Miriam Leopoldseder