
Master's Thesis

Technische Universität Wien

Institute of Telecommunications

Data Driven Prediction of Crowd Mobility in Small Cell Environments

Miriam Leopoldseder 1225520

June 2019

Supervision:

Univ. Prof. Dipl.-Ing. Dr.techn. Markus Rupp

Senior Scientist Dipl.-Ing. Dr.techn. Philipp Svoboda

Abstract

With the growing number of users and high standards regarding data rate, latency and coverage in mobile networks, new technologies need to be developed to meet the demand. The realization of these new concepts requires more planning and intelligent utilization of information on all network levels. One proposal is to apply user location prediction methods and to incorporate into the planning information about how many users will be at which location in future time steps.

This work benchmarks several user location prediction methods. We differentiate between predictors on an individual user level, where we investigate an order-L Markov and a Hidden Markov Model (HMM) based predictor, and access point (AP) aggregated prediction methods that directly predict the number of users per AP. Representing the second category, we analyse a Kalman filter based method and a machine learning (ML), model-less approach using neural networks (NNs). The algorithms show varying robustness in their mean square error (MSE) performance for different numbers of prediction steps. The Kalman approach has the smallest increase in MSE between one and five prediction steps compared to the other model based methods. To analyse the consequences of incorrect parameters, theoretical data were generated on the basis of a state space model, an HMM and an agent based simulation with the software Anylogic. The results show that with parameter errors the MSE increases most for the HMM compared to the order-L Markov and Kalman predictors.

Lastly, data collected from the Wireless Local Area Network (WLAN) of Technische Universität Wien (TU Wien) offer a realistic assessment of prediction performance in a small cell environment. Their analysis shows that the purely data-based ML method results in the lowest MSE.

Summary

With the growing number of users in mobile networks, who have high expectations regarding data rate, latency and availability, new technologies must be developed to meet the demand.

The realization of these new concepts requires more planning and intelligent information processing on all network levels. One proposal is to use prediction methods that make it possible to plan the number of users per cell for several time steps into the future. This work investigates several of these methods. We distinguish between predictors based on the location trajectory of individual users, where we investigate an order-L Markov approach and a predictor based on Hidden Markov Models (HMMs), and methods that directly predict the number of users per access point (AP). As representatives of this category we analyse a Kalman filter based predictor and a machine learning (ML) method based on neural networks (NNs).

The algorithms mentioned differ in the robustness of their mean square error (MSE) with respect to the number of prediction steps. For the Kalman approach the MSE increases the least compared to the other model based methods. The investigation of erroneous parameters, using theoretical data generated with different models, shows that the HMM yields a low MSE for few time steps. In contrast to the other methods, its performance deteriorates strongly for a longer prediction time.

To investigate a realistic mobile network scenario, data collected from the Wireless Local Area Network (WLAN) of TU Wien are used. Here the purely data-based ML method achieves the lowest MSE.

Contents

1 Introduction

1.1 Outline

2 State of the Art

3 Test Scenarios and Experimental Setup

3.1 Simulated Scenario

3.2 Experimental Wi-Fi Scenario

4 Models and Prediction Methods

4.1 Markov Model

4.2 Order-L Markov Predictor

4.2.1 Implementation

4.3 Hidden Markov Model (HMM)

4.3.1 Inference in Hidden Markov Models (HMMs)

4.3.2 Prediction in Hidden Markov Models (HMMs)

4.3.3 Training the Parameters of HMMs

4.4 State Space Model

4.5 Kalman Filter

4.5.1 Kalman Filter Implementation

4.6 Machine Learning and Neural Networks

4.6.1 A short Machine Learning Overview

4.6.2 Definition of Neural Network

4.6.3 Practical Aspects of the Neural Network Prediction Method

5 Performance Analysis

5.1 Theoretical Data

5.1.1 Data generated by State Space Model

5.1.2 Data generated by Hidden Markov Model (HMM)

5.1.3 Data generated by Agent based Simulation

5.2 Wi-Fi Data

5.2.1 Order-L Markov Predictor

5.2.2 Benchmarking of different Prediction Methods

6 Discussion

7 Conclusion

8 References

A Appendix

A.1 Floor Plans of Campus Gußhaus

A.2 Kalman Filter Implementation in Matlab

A.3 Parameters for Anylogic Simulation

A.4 Anylogic Simulation Results

A.5 HMM Example Simulation Results

List of Figures

1 Comparison of current network architecture and proposed Centralized Radio Access Network (C-RAN) architecture.

2 Flowchart showing the logic behind the Anylogic simulation. The enter and exit location is chosen randomly with pre-defined probabilities. Wait time is distributed uniformly between 0.9 and 1 hour.

3 Snapshot of a simulation run in Anylogic. The dots in different colours on the map are agents currently moving to their destination EI 1.

4 Stacked number of agents measured in the Anylogic simulation at each router over time.

5 Third floor of Campus Gußhaus, displaying access points (APs) [28].

6 Flow diagram of how APs are connected in Campus Gußhaus.

7 Number of users over time per AP during Monday, 22nd of January 2019, measured every 30 seconds at Campus Gußhaus.

8 Distribution of sequence lengths and amount of AP visited per user.

9 Variation of the probability of where users enter the system for our experimental setup.

10 Example path through Campus Gußhaus [28].

11 Example of state transition diagram for an order-1 Markov chain [32, p. 590].

12 Development in time for a Markov chain and a HMM [33].

13 Trellis diagram to motivate a linear state space model to model the movement between cells.

14 Block diagram showing the relationship between state estimator and system.

15 Exemplary graph for a neural network.

16 State diagram for the theoretical state space model example.

17 Number of users at each AP over time generated by the state space model example described in section 5.1.1.

18 Variation of MSE for increasing prediction horizon and additive uniform noise on A with variance $10^{-2}$.

19 Results for mean square error (MSE) over noise variance for different prediction horizons and noisy parameters A and B for the state space model example.

20 Number of users over time stacked for each AP for the theoretical HMM data.

21 Results for Markov predictors with different order L for the HMM example data.

22 MSE results for HMM theoretical example data applying the HMM based prediction.

23 Results for the HMM theoretical data applying the Kalman filter predictor.

24 Comparison of prediction performance for different prediction methods and parameter noise variances for the HMM theoretical example data.

25 Results for a one step prediction for HMM, Order-3 Markov and Kalman filter prediction.

26 Plot of normalized data from the Anylogic simulation and its Kalman prediction for 5 prediction steps.

27 Development of MSE with the Kalman filter prediction for increasing prediction horizon applied on the Anylogic data.

28 MSE results for different order Markov predictors applied on the measured Wi-Fi data.

29 MSE for HMM, Kalman, order-1 and ULS prediction for increasing prediction horizon using the Wi-Fi data from Campus Gußhaus.

30 Normalized number of users over time for AP-CF02-7 for a one step prediction for the Wi-Fi data from TU Wien for the 26th of November 2018.

31 Normalized number of users over time for AP-CF02-7 for a five step prediction for the Wi-Fi data from Technische Universität Wien (TU Wien) for the 26th of November 2018.

32 Floor plan of second floor of Campus Gußhaus with APs marked in green [28].

33 Floor plan of third floor of Campus Gußhaus with APs marked in green [28].

34 Floor plan of fourth floor of Campus Gußhaus with APs marked in green [28].

35 MSE for Kalman prediction of the Anylogic simulation for AP 1.






41 Sum of users for the Anylogic example and estimate using the Kalman filter prediction with prediction horizon one.

42 Sum of users for the Anylogic example and estimate using the Kalman filter prediction with prediction horizon five.

43 Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 2.









List of Tables

1 Schedule of EI 1 for the Anylogic simulation.

2 Schedule of EI 4 for the Anylogic simulation.

3 Example entry of data measured in Campus Gußhaus on the 23rd of April.

4 Mean percentage per AP of where users are first registered.

5 MSE for different prediction methods for prediction horizon 1, 5 and 20. The results for Kalman and HMM based predictions are displayed for noise variance 0.1.

6 Comparison of MSE for 1 and 5 prediction steps for the recorded Wi-Fi data.

List of Abbreviations

AP access point

BBU Baseband Unit

C-RAN Centralized Radio Access Network

EM expectation maximisation

HMM Hidden Markov Model

ID identification

IEEE Institute of Electrical and Electronics Engineers

IP Internet Protocol

LeZi Lempel-Ziv

LMMSE linear minimum mean square error

LOS line of sight

LZ Lempel-Ziv

MAC Media-Access-Control

mIoT massive Internet of Things

ML machine learning

MMP Mobile Motion Prediction

MSE mean square error

NN neural network

ODM origin destination matrix

pdf probability density function

PPM Prediction by partial matching

QoS Quality of Service

RRH Remote Radio Head

TU Wien Technische Universität Wien

VoIP Voice over IP

WLAN Wireless Local Area Network

1 INTRODUCTION

1 Introduction

In the last few years reliable mobile internet access has gained massive importance in our society. A report by Cisco predicts that global mobile traffic will increase by over 600% between 2017 and 2022 [1]. It estimates that for each person in the world there will be on average 1.5 mobile devices. Additionally, mobile connection speed will triple, while the average smartphone will generate 11 GB of mobile data traffic per month by 2022. In the 4th quarter of 2018 alone, 43 million new mobile subscriptions were added worldwide, and it is forecast that by 2022 almost one billion more devices than in 2016 will be in use [1, 2].

With more people than ever connected over their mobile devices and ever increasing demands for bandwidth, low latency and connection reliability, there are numerous challenges that need to be addressed in future network generations like 5G and beyond. Some requested features to meet demand are flexible function capabilities, cost efficient throughput coverage and context aware networks [3]. To tackle these challenges, new technologies have to be introduced. One new concept proposed for 5G is network slicing. The 3rd Generation Partnership Project (3GPP) [4] defines a network slice as "A logical network that provides specific network capabilities and network characteristics". It offers the possibility to run different services on the same hardware while maintaining a certain Quality of Service (QoS) for each user. Network slicing also makes the system more flexible and easier to scale because it separates core network development, deployment and service maintenance [5]. Such features also create new challenges.

The virtualization of services generates the need for efficiently shared network resources for services which at times have competing interests. On the one hand there are ultra-reliable low latency applications, while on the other there are massive Internet of Things (mIoT) networks, where latency is an insignificant factor. Consequently, these slices need to be isolated from each other to guarantee a specific QoS for each of them [5].

In order to manage resources and access demand, an orchestrator is introduced. It provides communication between the core network and is under the constraint of the access layer of the network [5]. This orchestrator has many different tasks, but always has to make sure that the promised QoS of each slice is maintained. To reliably match resources to demand, new models and procedures need to be developed. One possible change to make allocating resources to certain services feasible is a flexible system architecture that dynamically adapts network structure to demand.

In order to aid the orchestrator in managing the changing demands for services, we need to exploit all available information to forecast where and when a service is expected to be utilized. This is especially helpful if a node, like a car, is moving through the network. In that case, predicting where users will be in the next time period improves how the orchestrator can plan resource allocation in advance. A suggested technology to achieve this is discussed below.

Centralized Radio Access Network (C-RAN)

To accommodate the new flexibility demands, network architecture has to evolve. Current mobile networks work on a cellular basis, where each cell has its own Remote Radio Head (RRH), consisting of the main hardware needed to transmit a signal, and its own Baseband Unit (BBU), responsible for resource allocation and computation. Currently not all resources of RRHs exploit their full capacity, which means that BBU computational power is not utilized efficiently. The suggested C-RAN architecture offers an improvement to the current system [6]. As seen in fig. 1, in a C-RAN one BBU-pool works with several RRHs. The computational power of each BBU is shared between different RRHs. This improves the efficiency of the network, because the load in each cell varies over time [7]. Additionally, with smaller cell types being introduced in 5G, more efficient planning of computational resources will be necessary [3].
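The pooling argument can be illustrated with a small numerical sketch. The load values and the function names `dedicated_capacity` and `pooled_capacity` are hypothetical and are not taken from [6] or [7]; the sketch only assumes that capacity has to be dimensioned for peak load. Because the cells peak at different times, the peak of the summed load is lower than the sum of the individual peaks.

```python
# Hypothetical per-cell load (arbitrary compute units) over four time slots.
cell_load = {
    "cell_a": [8, 2, 1, 3],
    "cell_b": [1, 7, 2, 2],
    "cell_c": [2, 1, 9, 3],
}


def dedicated_capacity(load):
    """Capacity needed when every cell owns a BBU sized for its own peak."""
    return sum(max(series) for series in load.values())


def pooled_capacity(load):
    """Capacity needed when one BBU pool serves the summed load of all cells."""
    totals = [sum(slot) for slot in zip(*load.values())]
    return max(totals)


dedicated = dedicated_capacity(cell_load)
pooled = pooled_capacity(cell_load)
print(f"dedicated BBUs: {dedicated}, pooled BBUs: {pooled}")
```

For this made-up data the pooled capacity is half of the dedicated capacity; the real gain depends on how strongly the cell loads are correlated.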

One challenge of this new kind of mobile network is allocating enough radio resources to where they are needed. Demand is assumed to be proportional to the number of users at a given location. Methods that model how people move through a network and predict how many people will be at a certain location at a certain time, relying only on historical data, may help match resources to demand.

Figure 1: Comparison of current network architecture and proposed C-RAN architecture.

Other applications of location prediction

Predicting how people are moving through buildings or networks is utilized in multiple applications other than improving flexible mobile communication systems. In general we can differentiate between two classes of applications that benefit from predicting the user's location:

1. User-focused applications: Location prediction helping the user to prepare for or adapt to a situation. Examples are road traffic optimization or stolen vehicle localization. Here, an exact position may be more important than in system-focused applications.

2. System-focused applications: Improving system performance, availability or other metrics due to location information. An example is handoff optimization. In these applications the precise location is less important and symbolic coordinates like cell-id or access point are sufficient [8].

As an example, tracking users and predicting their location helps direct crowds through a network while minimizing congestion. In 2016 Transport for London conducted a project where they tracked users of the Virgin Media Wi-Fi network in London's subway stations and their subway gateway data. Among other things, their objective was to use the insights gained to improve journey planners, rerouting during disruptions in the service and staff distribution [9].

This shows how versatile data tracked from mobile or Wi-Fi networks can be, but also raises the issue of privacy of the users and the ownership of a data profile. Tracking the movement of users through the public transport system may reveal information about their home, place of work, habits and schedule. On a bigger scale, like the mobile network, even more information like travelling destination, relationship status, favourite restaurants and so on could be gathered.

Good practice is to disguise unique identifications (IDs) like Media-Access-Control (MAC) or Internet Protocol (IP) addresses. To further protect private information while working with location data, several measures are possible. Noise can be added to the location or, depending on the application, one could bin users into groups and only look at the group behaviour [10]. Killijian, Gambs, and Cortez [10] suggest sanitization algorithms to secure privacy; however, they conclude that every privacy measure requires a trade-off between accuracy and privacy. Protecting the privacy of the users will not be covered in this thesis, but should nevertheless be addressed in every real-life application.
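As a minimal sketch of the first two measures, the following example disguises MAC addresses and then keeps only group-level counts per AP. The use of a keyed hash (HMAC-SHA256), the key `SECRET_KEY`, the 16-character truncation and the record layout are assumptions made for illustration; they are not the sanitization algorithms of [10] and not the procedure used for the data in this thesis.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key; it must be stored separately from the data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(mac: str, key: bytes = SECRET_KEY) -> str:
    """Replace a MAC address by a keyed hash so the raw ID is never stored."""
    normalized = mac.lower().replace("-", ":")
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def users_per_ap(records):
    """Reduce (mac, ap) records to group level: only counts per AP remain."""
    seen = {(pseudonymize(mac), ap) for mac, ap in records}
    return Counter(ap for _, ap in seen)


records = [
    ("AA:BB:CC:00:00:01", "AP-1"),
    ("aa-bb-cc-00-00-01", "AP-1"),  # same device, different notation
    ("AA:BB:CC:00:00:02", "AP-1"),
    ("AA:BB:CC:00:00:03", "AP-2"),
]
counts = users_per_ap(records)
print(dict(counts))
```

Pseudonymization alone does not prevent re-identification from the movement pattern itself, which is why the trade-off between accuracy and privacy mentioned above remains.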

For our suggested application, which is applying location prediction to help a network orchestrator make more informed decisions and plan ahead, there are several dimensions that need to be considered. First we assume that there is the possibility to measure the user location history on a user and AP level in the network. Because every measurement takes up resources in the system, we would like to minimize the number of measurements we need to make, while still knowing with a certain accuracy how many users are at what location in the network and where they will be in the future. This requires maximising the timespan that is accurately predicted. To evaluate the proposed methods we consider how much memory is required during the prediction, what data is needed for the prediction, the amount of data necessary for the training and how the prediction complexity scales with the size of the network.

To efficiently utilize available data in mobile or Wi-Fi networks, good modelling and prediction methods are required. This work investigates several approaches, some stemming from methods already proposed for other mobile networking applications, others that are applied in a variety of different tasks.


1.1 Outline

We will first discuss state-of-the-art methods and applications for location prediction and for leveraging mobile network data for location-based analysis in section 2. Subsequently, we describe the different test scenarios used to evaluate prediction performance in section 3. Section 4 addresses models and methods that are the basis for the prediction algorithms. We discuss two individual user based prediction methods, the HMM and the order-L Markov predictor, that are widespread in the literature. Because we are interested in the crowd movement, we sum over all users to get the number of users per AP over time. Additionally, we investigate two AP based methods, the Kalman and the ML predictor, that work with the aggregated number of users at each AP to directly predict crowd movement.
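The aggregation step from individual trajectories to crowd data can be sketched as follows. The trajectories, the AP names and the function `users_per_ap_over_time` are hypothetical; the sketch assumes that all trajectories are sampled on the same time grid and that `None` marks a time step in which a user is not connected.

```python
from collections import Counter

# Hypothetical trajectories: one location per time step, None = not connected.
trajectories = {
    "user_1": ["AP-1", "AP-1", "AP-2", None],
    "user_2": ["AP-2", "AP-1", "AP-2", "AP-2"],
    "user_3": [None, "AP-3", "AP-3", "AP-2"],
}


def users_per_ap_over_time(trajs):
    """Return one Counter per time step that maps AP -> number of users."""
    n_steps = max(len(trace) for trace in trajs.values())
    counts = [Counter() for _ in range(n_steps)]
    for trace in trajs.values():
        for step, ap in enumerate(trace):
            if ap is not None:
                counts[step][ap] += 1
    return counts


crowd = users_per_ap_over_time(trajectories)
for step, per_ap in enumerate(crowd):
    print(step, dict(per_ap))
```

The same summation applies to predicted trajectories: an individual user based predictor yields one predicted location per user, and the counts per AP follow from summing over users.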

Finally we benchmark the performance of the different methods for our test scenarios in section 5, discuss the results in section 6 and offer concluding remarks in section 7.


2 STATE OF THE ART

2 State of the Art

Matching demand with resources is a fundamental challenge in a modern society in motion. Predicting traffic flows allows for mapping demand for mobility to the required resources. It is therefore no surprise that the prediction of where people will be and how they will use network infrastructure was investigated early on, especially with regard to traffic planning.

Location tracking in traffic planning

Traffic flow has traditionally been modelled with origin destination matrices (ODMs). Each ODM entry describes how many people travel from a specific origin to a destination during a certain period of time. Data aggregation to fill these matrices was historically done either by surveys, traffic count stations, traffic count cameras or by collecting data from bus or taxi systems [11, 12]. Those methods were usually very laborious, which meant updating ODMs was very costly. Therefore ODMs were rarely up-to-date and sophisticated adaption methods were needed. This also meant that most models were static models. With new ways of gathering data, implementation of dynamic models became possible [12].
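A minimal sketch of how an ODM for one time period could be filled from observed trips is given below. The zone names, the trip list and the function `build_odm` are hypothetical; the methods in [11, 12] additionally have to estimate entries from incomplete counts, which is not covered here.

```python
def build_odm(trips, zones):
    """Count trips per (origin, destination) pair observed in one time period."""
    index = {zone: k for k, zone in enumerate(zones)}
    odm = [[0] * len(zones) for _ in zones]
    for origin, destination in trips:
        odm[index[origin]][index[destination]] += 1
    return odm


zones = ["A", "B", "C"]
# Hypothetical trips observed during one period, as (origin, destination).
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]

odm = build_odm(trips, zones)
for zone, row in zip(zones, odm):
    print(zone, row)
```

Row `k` of the matrix lists where trips starting in zone `k` end. A dynamic model in the sense above would keep one such matrix per time period.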

Dynamic ODM models allow for route planning or determining the ideal service frequency for public transport. Pinelli et al. [13] introduce a method which uses mobility traces from a mobile network and an ODM based model to efficiently plan routes in a public transport network. The authors could reduce the overall journey times by 27 %. They specifically highlight the advantages of utilizing mobile network data for emerging markets, because the data collection does not need additional infrastructure like road measurement points or counting sensors in public transport, but is available at all times. The downside of network data is that typically only event data are available. If for parts of the journey no calls are made or text messages received, the trajectory might not be representative of the ground truth and is therefore not reliable. Consequently, it is necessary to track data and to ensure its validity. Additionally, data from different providers might be necessary to capture characteristics of movement, which makes it more difficult to obtain complete datasets [13]. Those downsides are still outweighed by the ease with which an almost complete trajectory can be captured, which was not possible with earlier methods.

The ODM models work well with vehicular or internet traffic, which are mostly focused on the flow between locations. They are less ideal if we are not interested in the flow, but rather in the number of users at a certain location [14]. We will consider methods used in mobile communication next.

Location prediction in Mobile communication

Historically, a common goal in mobile communication is to predict the hand-over of user connections between cells. Consequently, the prediction of the location of a single user has received more attention than tracking movement of a group of users. To reach the goal of predicting the next user location, many algorithms define the movement history for a single user, $H_m = \langle x_1 = a_i, \ldots, x_m = a_j \rangle$ where $i, j \in \{1, \ldots, N\}$, as a base for the prediction. Here each $x_k$, $k = 1, \ldots, m$, represents an abstract location like a cell or a precise location like GPS coordinates at time or event $k$. We assume that the locations $x_k$ can be modelled as random variables from a finite alphabet $A = \{a_1, \ldots, a_N\}$, which describes all possible locations. If, for example, we describe the movement of a user through five different rooms over a period of 24 hours, $A$ will include five possible locations $a_1$ to $a_5$. If we update the location every hour, $m$ will equal 24.

The update of $H_m$ can be movement based, with an entry being added event driven, like at cell crossings, or the updates can happen in fixed time intervals. A combination of both approaches is possible, where $H_m$ is updated after some time units and at every cell crossing [8]. $H_m$ is also called user sequence, trajectory or trace.
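The two update schemes can be contrasted with a short sketch. The observation list and the functions `fixed_interval_history` and `event_driven_history` are hypothetical illustrations and are not taken from [8]; locations are written as strings `"a1"`, `"a2"`, ... to mirror the alphabet notation.

```python
def fixed_interval_history(observations, interval):
    """Time driven update: record the location every `interval` time units."""
    return [loc for t, loc in observations if t % interval == 0]


def event_driven_history(observations):
    """Movement based update: add an entry only when the location changes."""
    history = []
    for _, loc in observations:
        if not history or history[-1] != loc:
            history.append(loc)
    return history


# Hypothetical (time, location) observations of one user.
observations = [
    (0, "a1"), (1, "a1"), (2, "a2"), (3, "a2"),
    (4, "a2"), (5, "a3"), (6, "a1"),
]

h_time = fixed_interval_history(observations, interval=2)
h_event = event_driven_history(observations)
print(h_time)
print(h_event)
```

In this example the time driven history keeps the dwell time in `a2` but misses the short visit to `a3`, while the event driven history records every crossing but loses how long the user stayed in each location.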

Cheng, Jain, and Berg [8] differentiate between two types of location prediction methods for hand-over prediction: "domain-independent" and "domain-specific" algorithms. Examples of domain-independent algorithms are order-L Markov predictors, detailed in section 4.2, which predict the next step based on the last L steps of the movement history, and Lempel-Ziv (LZ or LeZi) predictors based on Ziv and Lempel [15]. LZ based algorithms depend on two main assumptions:

1. The user's mobility patterns are repetitive, making the movement history a stationary process.

2. The user's movement follows a probabilistic model and therefore the movement history $H$ is a stochastic process too.
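To make the idea of an order-L Markov predictor concrete, the following is a minimal frequency-count sketch. The class `OrderLMarkovPredictor` and the example history are assumptions made for illustration; the predictor actually used in this thesis is described in section 4.2 and may differ in its details, for example in how unseen contexts are handled.

```python
from collections import Counter, defaultdict


class OrderLMarkovPredictor:
    """Predict the next location from the last L locations by counting."""

    def __init__(self, order):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        # Maps each context (tuple of L locations) to counts of its successors.
        self.counts = defaultdict(Counter)

    def fit(self, history):
        for k in range(self.order, len(history)):
            context = tuple(history[k - self.order:k])
            self.counts[context][history[k]] += 1
        return self

    def predict(self, recent):
        """Return the most frequent successor, or None for an unseen context."""
        context = tuple(recent[-self.order:])
        followers = self.counts.get(context)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


history = list("abcabcabd")  # hypothetical movement history over {a, b, c, d}
predictor = OrderLMarkovPredictor(order=2).fit(history)
print(predictor.predict(["a", "b"]))  # context seen three times
print(predictor.predict(["d", "a"]))  # context never seen
```

Returning `None` for an unseen context corresponds to a failed prediction; a higher order makes such failures more likely because fewer contexts have been observed.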

Examples of domain-specific approaches are Mobile Motion Prediction (MMP) for improving mobility management in a cellular network, introduced by Liu and Maguire [16], and segment matching, introduced by Chan, Zhou, and Seneviratne [17]. We will focus on domain-independent methods, because they are less restrictive on the kind of networks or applications they work with.

Both the Lempel-Ziv (LZ) and Markov methods mentioned above have several limitations. Mainly, they are independent of time, which means known time-of-day effects or other additional knowledge cannot be incorporated into the prediction [18]. Building on the basic ideas of Markov model based or LZ predictions, many variations have been investigated in the literature.

Rodriguez-Carrion et al. [19] compare three different versions of the LZ algorithm: classical LZ, LeZi Update and Active LeZi. The classical LZ divides $H$ into substrings and builds a tree that represents which location appeared after a certain substring and how frequently. Prediction is done by evaluating the branch of the current context: the number of times a location appeared after the current context is divided by how often the current context appears in general. LeZi Update additionally takes patterns within substrings into account. It uses the prediction by partial matching (PPM) method, which combines contexts of different order for prediction. The Active LeZi algorithm considers all possible substrings with a variable window length. It needs the most memory resources of the three. All three variants train the needed parameters online, so an initial training phase is not necessary. This online training property also makes it possible to adapt automatically to changes in user behaviour, because the parameters change with the data. The authors in [19] work with GSM-based location data to evaluate the performance of the three LZ variants. In terms of average correct prediction, Active LeZi outperformed the other methods but also had the highest resource consumption.
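The parsing and counting steps of the classical LZ predictor can be sketched as follows. This is a simplified illustration under stated assumptions and not the implementation of [19]: the functions `lz_parse`, `build_tree` and `next_location_probabilities` are hypothetical, the tree is stored as a dictionary from contexts to successor counts, and probabilities are normalized over the observed successors of a context only.

```python
from collections import Counter, defaultdict


def lz_parse(history):
    """Split the history into LZ78 phrases. Each phrase is the shortest
    prefix of the remaining sequence that has not been a phrase before."""
    phrases, seen, current = [], set(), ()
    for symbol in history:
        current += (symbol,)
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ()
    return phrases, current  # `current` is the unfinished context at the end


def build_tree(phrases):
    """For every phrase prefix (context), count which symbol followed it."""
    followers = defaultdict(Counter)
    for phrase in phrases:
        for k in range(len(phrase)):
            followers[phrase[:k]][phrase[k]] += 1
    return followers


def next_location_probabilities(followers, context):
    """Relative frequency of each location observed after `context`."""
    counts = followers.get(tuple(context), Counter())
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()} if total else {}


history = list("ababcabab")  # hypothetical movement history
phrases, context = lz_parse(history)
tree = build_tree(phrases)
print(phrases)
print(next_location_probabilities(tree, ("a",)))
```

An empty result means that the context has no recorded successor, in which case a practical predictor would need a fallback such as a shorter context.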

It is difficult to compare the various methods by different authors, because the prediction performance is measured differently depending on the exact application. Rodriguez-Carrion et al. [19] measure performance in terms of hitrate¹. With their methods they achieve a 60% hitrate for all users and a hitrate over 80% for fewer than ten percent of all users. Theoretically, LZ methods achieve minimum uncertainty with lowest order [19]; however, with practical data Song [20] shows that Markov based methods outperform LZ based methods.
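The hitrate, the number of correctly predicted next cells divided by the total number of cell changes, can be computed as in the sketch below. The function `hitrate` and the example lists are hypothetical; each list entry is assumed to correspond to one cell change.

```python
def hitrate(predicted, actual):
    """Correctly predicted next cells divided by the number of cell changes."""
    if len(predicted) != len(actual):
        raise ValueError("one prediction per cell change is required")
    if not actual:
        return 0.0
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical next cells of one user and the corresponding predictions.
actual = ["c2", "c3", "c1", "c2", "c4"]
predicted = ["c2", "c1", "c1", "c2", "c2"]
print(hitrate(predicted, actual))
```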

Hadachi et al. [21] evaluate mobility prediction based on different enhanced Markov model schemes. They combine algorithms utilizing a global and local context and different order Markov methods. To minimize errors due to common problems in mobile network trajectories, they filter the data to minimize ping-pong effects² and to split traces if there is a significant time gap between locations. They measure the prediction performance in terms of percentage of correct predictions³, percentage of wrong predictions⁴ and percentage of failed predictions⁵. For users whose data were already present in the training set, their best combination method reaches 95.67% of correct predictions. For new users their best method only achieves 53.87% of correct predictions, while 22.33% were failed predictions. Several of their other combined methods have a higher number of wrong predictions than correct ones and, especially for new users, the percentage of failures is very high with up to 97%. These methods have problems with new users, which might indicate that they do not fit the underlying system.
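The two preprocessing steps can be sketched as follows. The functions `remove_ping_pong` and `split_on_gaps`, the thresholds and the trace are hypothetical and do not reproduce the filters of [21]; the sketch assumes a trace given as a list of (timestamp, cell) entries, one per cell change, and treats a visit to another cell as a ping-pong effect if the user returns to the previous cell within `max_dwell` time units.

```python
def remove_ping_pong(trace, max_dwell):
    """Drop short excursions of the form A -> B -> A from a trace."""
    cleaned = []
    for t, cell in trace:
        returns_to_previous = len(cleaned) >= 2 and cleaned[-2][1] == cell
        if returns_to_previous and t - cleaned[-1][0] <= max_dwell:
            cleaned.pop()  # remove the short visit to the other cell
            continue       # the user is treated as never having left `cell`
        cleaned.append((t, cell))
    return cleaned


def split_on_gaps(trace, max_gap):
    """Split a trace wherever consecutive entries are more than max_gap apart."""
    segments, current = [], []
    for sample in trace:
        if current and sample[0] - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append(sample)
    if current:
        segments.append(current)
    return segments


# Hypothetical trace: the visit to "B" at t=60 lasts only two time units.
trace = [(0, "A"), (60, "B"), (62, "A"), (65, "B"), (200, "C")]
cleaned = remove_ping_pong(trace, max_dwell=5)
segments = split_on_gaps(cleaned, max_gap=100)
print(cleaned)
print(segments)
```

Both thresholds are tuning parameters: a large `max_dwell` removes genuine short visits, and a small `max_gap` fragments traces into segments too short to train on.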

Comparing both Markov and LZ based schemes, Cheng, Qiao, and Yang [18] utilize traffic data from a mobile network gathered over 27 days and resulting in over 4000 user trajectories. They introduce an improved Markov based prediction and compare it, among others, to a time based Markov approach. In their suggested method they alter probabilities with weights depending on time between samples and combine different order Markov predictors. Their new method almost keeps up with the performance of Active Lempel-Ziv (LeZi) and LeZi Update but uses fewer resources and is faster. They measure prediction performance by a "fraction of users for which the algorithms correctly identified the next location" and then

¹ Equalling number of correctly predicted next cells divided by total number of cell changes
² Ping-pong effects are a fast switching between cells if the user is located in the border region between cells.
³ Ratio of number of correct predictions to total number of users
⁴ Ratio of number of wrong predictions to total number of users
⁵ Ratio of number of failed predictions to total number of users


2 STATE OF THE ART

calculate a mean over all timesteps. In summary they achieve a prediction accuracy of

• 36.1% with Active LeZi,
• 35.8% with LeZi Update,
• 35.4% with improved Markov,
• 29.9% with LZ, and
• 29.4% with Markov prediction.

They comment that the prediction accuracy is better for sequences with fewer locations and less change, which they call high regularity. The gap between time based Markov and improved Markov decreases when trajectory regularity is low. This is emphasized by their finding that the prediction works better at night time, because the trajectories are less diverse then. This shows the possible tendency for Markov predictors to prefer staying in a location and not being able to actually predict location changes.

Libo Song et al. [22] compare different domain independent prediction methods in terms of prediction performance for Wi-Fi data. They conclude that entropy, as an indicator for movement randomness, correlates with the performance of Markov based predictors. They find that the accuracy decreases with higher entropy, which is also supported by the results from [18]. This raises the question of what a typical Wi-Fi data trajectory looks like in general. We discuss this finding further in section 3.2.

For the authors of [22], the best prediction performance can be achieved by different order Markov and LZ predictors. Surprisingly, the lower order Markov predictors showed the best performance overall when compared to more complex compression-based methods; the authors find that methods that rely on online learning need a long trace length until they deliver reliable results. Libo Song et al. [22] achieve an accuracy of 65%-75% for the median user, with the performance depending on the length of the trace. They benchmark their methods on Wi-Fi data recorded on the Dartmouth campus over several years. Usually people connected to a Wi-Fi network indoors or on a college campus will move through the network for a short amount of time and then stay in one place for a longer time, for example in a lecture room. In this case a high percentage of predictions would already be accurate if either the last position or the most common position is used as the prediction. This might explain the correlation the authors find between entropy and prediction accuracy, and we will further investigate this theory in section 5.2.2.

Another method to increase the degrees of freedom of a Markov based model are HMMs. These models are used for a wide array of tasks such as speech recognition [23] and gene finding [24]. Employing HMMs for movement prediction in buildings has already been proposed by [25]; however, they apply their methods to data from smart doorplates in an office. Si et al. [26] apply the same principles to an example in a cellular mobile network. They compare predictions in a cellular mobile system based on an order-1 Markov chain, an order-2 Markov chain and a HMM. The authors find that the HMM prediction was the most accurate for all tested sequence lengths. A downside of HMM based prediction compared to a simpler Markov prediction is nevertheless the computational complexity of the calculation [26].

Contribution of this work

We focus on domain independent algorithms, because the real nature of future

small cell environments is not yet known, keeping the results as general as possible.

From the literature summarized previously we conclude that for small cell scenarios like Wi-Fi networks, simple Markov based schemes and HMM methods promise the biggest potential. They are flexible models with low complexity and show good results in similar research like [22] and [26].

We are generally not interested in predicting a single user's trajectory, but want to predict where users move as a crowd in future timesteps. Regardless of whether we can predict the path of a single user accurately, we can sum over all users and get the information in terms of APs. We contribute a benchmark for the different state-of-the-art methods and investigate location prediction on a crowd level, which may be more robust against random decisions of users.

As aggregated AP methods, we propose a prediction based on a Kalman filter and also compare the results to a ML approach. We hope to draw conclusions on how much information is hidden in the data and can therefore be learned from the data. Additionally we investigate results for several prediction steps, studying how the prediction performance changes when predicting a longer timespan ahead, something that is not done in the sources discussed. This covers a situation where measurement of users in the network is possible, but may not be done too often in order not to interfere with normal network functions. We investigate how these methods would perform if they had measurements every m steps. This approach offers a trade-off between accuracy of prediction and load on the network due to measuring.


3 Test Scenarios and Experimental Setup

To evaluate the performance of different prediction methods, a suitable test scenario is necessary. Mobile networks are not yet equipped with the capabilities of a C-RAN and data from mobile networks on a large scale is not freely available. We hypothesize that for our research interest Wireless Local Area Networks (WLANs) or Wi-Fi networks offer similar characteristics to small cells in a mobile network. Another advantage is that many buildings already have routers or APs installed and a lot of people use the technology when available.

We therefore consider two test scenarios to evaluate prediction algorithms: first, a simulated scenario mimicking user movement through a building and second, data from the WLAN network at Campus Gußhaus, TU Wien.

3.1 Simulated Scenario

To benchmark whether or not the implemented algorithms work in a controlled environment, we modelled a part of the building geometry in the agent based simulation tool Anylogic. This tool is used for a wide array of applications in industry and research. It offers the possibility to simulate agent based or event based systems. An advantage of this software is the many predefined libraries, ranging from fluid to rail traffic models [27]. It also offers a pedestrian library, which allows straightforward modelling of pedestrians moving through environments. Anylogic combines a visual editor, to model buildings, supply chains or ports, with an event and agent based programming interface.

Figure 2: Flowchart showing the logic behind the Anylogic simulation. The enter and exit location is chosen randomly with pre-defined probabilities. Wait time is distributed uniformly between 0.9 and 1 hour.
(Flowchart: Users → Where to enter? (Stg. 8, Stg. 9, Stg. 10) → What destination? (EI 1, EI 4) → Wait → Where to exit? (Stg. 8, Stg. 9, Stg. 10) → Exit)


Figure 3: Snapshot of a simulation run in Anylogic. The dots in different colours on the map are agents currently moving to their destination EI 1.

As seen in fig. 3, the floor plan of Campus Gußhaus was imported into Anylogic to ensure that the simulation is to scale with the real building. The agents can walk between walls that are painted in orange. Three different entry points close to the stairs and destinations in the lecture rooms were defined, visible in green in fig. 3. Each agent is represented as a coloured dot on the map. The entry points are located in front of staircases and are named after the German word "Stiege". According to the flowchart in fig. 2, the users enter the system at the defined entry points, walk to their designated goal, wait for a certain time and then exit at one of the exit points. Agents enter at Stg. 8 42% of the time and at Stg. 9 and Stg. 10 each 29% of the time. They then walk to their designated location according to the schedules in tables 1 and 2.

After a uniformly distributed time between 0.9h and 1h the agents leave the lecture room. With a probability of 50% they exit the simulation at Stg. 8. Alternatively, with a probability of 25% each, the agents disappear at Stg. 9 or Stg. 10. This is supposed to simulate the behaviour of students during a weekday. In this setup we neglect simulating people freely moving through the building.

Start time   End time   Nr. of users entering
8:45         9:00       15
10:45        11:00      25
12:45        13:00      50
15:45        16:00      56

Table 1: Schedule of EI 1 for the Anylogic simulation.

Start time   End time   Nr. of users entering
12:45        13:00      90
17:45        18:15      45

Table 2: Schedule of EI 4 for the Anylogic simulation.
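The random choices made by each agent can be sketched outside of Anylogic in a few lines. The following is only a minimal illustration of the stated probabilities and the waiting time, not the Anylogic model itself; the names `ENTRY_PROBS`, `EXIT_PROBS` and `sample_agent` are hypothetical, and walking times and the schedules of tables 1 and 2 are not modelled.

```python
import random

# Probabilities as stated in the text; the dictionary names are illustrative.
ENTRY_PROBS = {"Stg. 8": 0.42, "Stg. 9": 0.29, "Stg. 10": 0.29}
EXIT_PROBS = {"Stg. 8": 0.50, "Stg. 9": 0.25, "Stg. 10": 0.25}


def sample_agent(destination, rng):
    """Draw entry point, waiting time (hours) and exit point for one agent."""
    entry = rng.choices(list(ENTRY_PROBS), weights=list(ENTRY_PROBS.values()))[0]
    wait_h = rng.uniform(0.9, 1.0)  # uniformly distributed waiting time
    exit_point = rng.choices(list(EXIT_PROBS), weights=list(EXIT_PROBS.values()))[0]
    return {"entry": entry, "destination": destination,
            "wait_h": wait_h, "exit": exit_point}


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
agents = [sample_agent("EI 1", rng) for _ in range(10_000)]
share_stg8 = sum(a["entry"] == "Stg. 8" for a in agents) / len(agents)
```

With a large number of sampled agents, `share_stg8` should lie close to the configured 42%.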

The number of users measured over the simulation time is displayed in fig. 4. There is a clear difference between routers in lecture rooms and hallways. As expected, the routers covering the lecture rooms show a constant number of people over a longer period of time, and the routers that only capture agents in hallways appear as small variations on top of that. We will compare the results from the simulation to the collected Wi-Fi data in section 3.2.

Figure 4: Stacked number of agents measured in the Anylogic simulation at each router over time.
(Plot: x-axis time on Mar 29, 2019, 08:00 to 18:00; y-axis number of agents, 0 to 150; series Router 1 to Router 6.)


3.2 Experimental Wi-Fi Scenario

In our test setup we monitored the WLAN network of Campus Gußhaus on three separate floors, focusing on an area with many lecture rooms. This ensures a high number of students and a certain weekly repeating schedule because of the lectures in the rooms. The most frequented floor can be seen in fig. 5, where lecture rooms are marked in yellow and every green circle represents an AP. Every student and employee of the university can connect to the WLAN network. We measured data from the Wi-Fi network between April 2018 and February 2019.

Figure 5: Third floor of Campus Gußhaus, displaying APs [28].

In total we collected 4 883 603 data points, which include 11 completely measured weeks and some weeks where only a few days were measured. The system was polled every 30 seconds. For each entry we get a timestamp, anonymized client ID, AP MAC, AP name and AP ID. The client ID is anonymized before we get access to the data, ensuring we cannot trace back the original ID of the user. Example entries can be seen in table 3. Figure 6 shows how the APs are connected in the building.

Table 3: Example entries of data measured in Campus Gußhaus on the 23rd of April.

Time              Client ID    AP MAC             AP Name    AP ID
23.04.2018 12:38  rZOgnSxIQI.  0c:27:24:6d:4c:30  AP-CG04-1  18
23.04.2018 12:38  VJRLEEt7RaY  0c:27:24:6d:4c:30  AP-CG04-1  18
23.04.2018 12:38  zAoMt2IzbWo  b4:a4:e3:b5:12:f0  AP-CF01-3  15
23.04.2018 12:38  EsNv4vkbkak  b4:a4:e3:b5:12:f0  AP-CF01-3  15
23.04.2018 12:38  kf6xAgz.ZbQ  d0:c7:89:0f:b5:a0  AP-CG02-1  12

Figure 6: Flow diagram of how APs are connected in Campus Gußhaus.

A clear time-of-day effect in the number of users can be seen in the data. As an illustrative example, 24h of measured data on Monday 22nd of January 2019 are plotted in fig. 7. Each colour represents a different AP and all APs are stacked on top of each other. This shows the total number of people measured in the system over time and the variation of the number of users per AP during the day. There is almost no activity until 8am; then, after a relatively steep rise, the number of users decreases slightly at around 1pm and again at 4pm. These times mark the end of lectures. After 6pm most users leave the building. Compared to the data from the Anylogic simulation in fig. 4, the Wi-Fi data in fig. 7 does not show distinct rectangles or other shapes that would indicate a bigger group of users arriving at the same time. The real data seems noisier with a lot less regularity. There are many more options for where users could go and therefore there is more variation.
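The step from raw log entries to the number of users per AP and polling instant can be illustrated as follows. This is a minimal sketch using the example rows of table 3; the function name `users_per_ap` and the reduced tuple layout (timestamp, client ID, AP name) are assumptions made for illustration and do not describe the processing pipeline actually used.

```python
from collections import defaultdict

# (timestamp, anonymized client ID, AP name), taken from the rows of table 3.
entries = [
    ("23.04.2018 12:38", "rZOgnSxIQI.", "AP-CG04-1"),
    ("23.04.2018 12:38", "VJRLEEt7RaY", "AP-CG04-1"),
    ("23.04.2018 12:38", "zAoMt2IzbWo", "AP-CF01-3"),
    ("23.04.2018 12:38", "EsNv4vkbkak", "AP-CF01-3"),
    ("23.04.2018 12:38", "kf6xAgz.ZbQ", "AP-CG02-1"),
]


def users_per_ap(log_entries):
    """Count distinct clients per (timestamp, AP) pair."""
    clients = defaultdict(set)
    for timestamp, client_id, ap_name in log_entries:
        clients[(timestamp, ap_name)].add(client_id)
    return {key: len(ids) for key, ids in clients.items()}


counts = users_per_ap(entries)
```

Using a set per (timestamp, AP) pair means that a client appearing twice in the same polling interval is only counted once.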

Figure 7: Number of users over time per AP during Monday, 22nd of January 2019, measured every 30 seconds at Campus Gußhaus.
(Stacked plot: x-axis time of day on Jan 22, 2019; y-axis number of users, 0 to 90; series AP 1 to AP 17.)

Libo Song et al. [22] find that their prediction method works worse if the user sequence is short. We therefore look at the distribution of the length of user sequences in our data. The length of a user sequence is equal to the length of the location history H. In our setup one location is added to the sequence every 30 seconds if a device is connected to the Wi-Fi. The length distribution of the user sequences can be seen in fig. 8a. Around 20% of all users have a sequence length smaller than 40. Additionally, in fig. 8b we can see that 40% of all users visit fewer than 5 APs and more than 70% visit fewer than 10 APs. Hence, the trajectory of a user is often not very diverse, which may influence the prediction.

Figure 8: Distribution of sequence lengths and number of APs visited per user. (a) Length distribution. (b) APs Nr distribution.

On the one hand, high regularity makes it theoretically easier to estimate the next step, because there is less uncertainty. On the other hand, it also makes predictors that are trained on real data prone to only predict that something stays the same, and unable to deal with a location change. This is also something we found with regard to the results presented in [22]. They show that for long traces they get a median accuracy of 72%. However, in our experiments the accuracy rapidly decreased if we only looked at the timesteps where the user changes location. We therefore conclude that their method is mainly good at predicting that a user stays at the AP, which still leads to good results with data that fits the model assumptions, but might give the wrong impression of the performance on semi static data.
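The difference between the overall accuracy and the accuracy restricted to location changes can be made concrete with a small sketch. The trace below is hypothetical, and the "stay" predictor (always repeating the last location) is used only as the simplest possible baseline, not as one of the evaluated methods.

```python
def accuracy_overall_and_on_changes(history, predictions):
    """Return (overall accuracy, accuracy on timesteps with a location change).

    history[t] is the true location at step t and predictions[t] is the
    location predicted for step t (predictions[0] is unused).
    """
    steps = [(history[t - 1], history[t], predictions[t])
             for t in range(1, len(history))]
    overall = sum(pred == cur for _, cur, pred in steps) / len(steps)
    moved = [(cur, pred) for prev, cur, pred in steps if cur != prev]
    on_changes = (sum(pred == cur for cur, pred in moved) / len(moved)
                  if moved else None)
    return overall, on_changes


# Hypothetical trace: a user stays at AP1 and then moves once to AP2.
trace = ["AP1"] * 8 + ["AP2"] * 4
stay_predictions = [None] + trace[:-1]  # always predict the previous location
overall, on_changes = accuracy_overall_and_on_changes(trace, stay_predictions)
sequence_length = len(trace)
distinct_aps = len(set(trace))
```

For this trace the baseline is right in 10 of 11 steps overall, but wrong at the only step where the user actually changes location.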

Compared to a mobile network, where most people are connected constantly, it can be assumed that only a certain share of the population has Wi-Fi turned on on their devices at all times. To illustrate this, table 4 shows the percentage of users who first log into the network at each AP. Even at APs that are not located at entrances, the percentage of people appearing in the system for the first time is not zero. This confirms that many users log into the system after they have entered the building. A total of 6.69% of people are first registered in the system at the AP CF02-3, located in lecture room EI4 on the second floor of the building. To get there, users first have to walk through hallways and staircases where they pass several other APs. However, it is likely that some users only turn on the Wi-Fi on their mobile phone while they are sitting in the lecture room instead of having it turned on at all times. There is also some variation over time in where users are first registered in the system. Figure 9 shows the variation of the entry probability for each AP for different weeks. For APs like CG03-1 the variation over several weeks is quite small compared to AP CF01-4, where the variation is more significant. This can be explained by the different schedules of each day or even week. The spread of the entry probabilities decreases if we only look at data from one day of the week.

Table 4: Mean percentage per AP of where users are first registered.

AP-CF02-4     4.42 %     AP-CF02-7     6.37 %
AP-CF02-3     6.69 %     AP-CG02-1     2.06 %
AP-CF02-10    5.44 %     AP-CF01-4     8.36 %
AP-CF02-5     5.15 %     AP-CF01-1     2.96 %
AP-CF02-11    7.83 %     AP-CF01-3    14.88 %
AP-CF02-6    10.92 %     AP-CF01-2     1.69 %
AP-CG03-1     1.02 %     AP-CF01-5     7.42 %
AP-CF02-2     6.18 %     AP-CG04-1     1.26 %
AP-CF02-1     6.05 %

Wi-Fi can have a range of 45 to 100 m and even though the indoor range might be a lot smaller, it is still considerably large compared to our test area [29]. In contrast to mobile networks, there are no regulations for WLAN with regard to the overlap of different APs in the network, while in cellular networks there are strict limits on how much cells can overlap and interfere outside of their border. The cells in the investigated WLAN setup strongly overlap. In Wi-Fi there is also no binding standard regarding handover between cells as is usual in mobile networks. There are newer standards like IEEE 802.11r-2008 that cater to the needs of applications like Voice over IP (VoIP) and therefore offer some controlled handover policies [30].

Whether these new standards are used depends on the device and not the network itself. With new devices and WLAN generations the adoption of these protocols will become more common. There is also a difference between indoor and outdoor networks. WLAN is usually used indoors, where a clear line of sight (LOS) is less frequent compared to mobile networks outside, where there is usually LOS in addition to multipath propagation.

The problems resulting from the potentially overlapping coverage and the lack of a handover protocol can be shown for our application with the following example: During a walk on the red path shown in fig. 10, assuming no overlap of AP ranges, a user device would connect to CF02-11, CF02-5, CF02-3 and then CF02-10. In reality, however, at normal walking speed the phone is only connected to CF02-11 and then CF02-10. This happens because the connection to AP CF02-11 in the hallway is still very good, even if other APs are physically closer.

Figure 9: Variation of the probability of where users enter the system for our experimental setup.
(Plot: x-axis APs in the order AP-CF02-4, AP-CF02-3, AP-CF02-10, AP-CF02-5, AP-CF02-11, AP-CF02-6, AP-CG03-1, AP-CF02-2, AP-CF02-1, AP-CF02-7, AP-CG02-1, AP-CF01-4, AP-CF01-1, AP-CF01-3, AP-CF01-2, AP-CF01-5, AP-CG04-1; y-axis entry probability [%], 0 to 45.)

Figure 10: Example path through Campus Gußhaus [28].

These challenges would not appear in a mobile cell network, where cell size and overlap are carefully considered. In our experimental setup these issues cause considerable noise. This potentially leads to a bigger problem if we want to utilize the data to predict an exact location. If we are only interested in higher level localization like cell ID, it might need to be taken into account in the modelling stage, but will not automatically have a negative impact on the prediction performance.


4 Models and Prediction Methods

Given our experimental setup we now are presented with a threefold problem.

First we need to identify a model to describe our system. Secondly we then look

at parameter training when necessary. Lastly we have to decide on prediction

methods that suit the data model.

If, for a dynamic system, parameters $x_1, \dots, x_n$ exist with the property that the outputs $y_1, \dots, y_q$ at any given time $t$ are uniquely defined through the inputs $u_1(\tau), \dots, u_p(\tau)$ on the interval $t_0 \le \tau \le t$ and the initial values $x_1(t_0), \dots, x_n(t_0)$, for any fixed $t_0$, then $x_1, \dots, x_n$ are called states of the system [31]. In our case $x(t)$ describes how many users are at an AP at time $t$.

We differentiate between two prediction approaches:

1. Based on individual user trajectories: The next location is predicted

for each user individually and x is determined by summing over all users

for each AP. We consider order-L Markov and HMM predictors for this

approach.

2. Based on aggregated users per AP: The aggregated number of users

per AP x is directly predicted for each time step. We evaluate the Kalman

and an ML based prediction method for this approach.

We also need to introduce an error measure to compare and analyse the different modelling methods. Because of its simplicity and favourable properties regarding optimization we choose the MSE

$$\mathrm{MSE} = E\{\|x - \hat{x}\|_2^2\} \approx \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2 \tag{1}$$

as the cost function that is minimized in order to fit the model to the data and later to compare the prediction performance of different algorithms. As outlined in section 2, various performance metrics are used in the literature, which makes it hard to compare results from different authors. Because we are interested in the performance on a crowd level, we choose to measure the performance in terms of all users at an AP instead of other performance metrics that are single user based. This also allows us to compare methods that predict the location for individual users and crowd based models that predict the number of people at each AP.
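As a small numerical illustration of the sample approximation in eq. (1), the following sketch computes the MSE between measured and predicted user counts. The counts are hypothetical and the function name `mse` is chosen for illustration only.

```python
def mse(actual, predicted):
    """Sample mean square error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical number of users at three APs and the corresponding predictions.
actual_counts = [12, 30, 7]
predicted_counts = [10, 33, 7]
error = mse(actual_counts, predicted_counts)  # (4 + 9 + 0) / 3
```

Because the error is computed on counts per AP, the same function applies to predictors that sum up individual user predictions and to predictors that output the aggregated counts directly.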

To model our system we will mainly work with probabilistic models because, even though there are patterns for each day or week, there is a strong random character to the movement of users through a network. Even if a student arrives for a lecture every Monday morning, they might not arrive at the exact same time each Monday and may choose different paths to their destination. Probabilistic models offer robustness against this kind of random variation.

In the following we will look into Markov models, which are the basis for the order-L Markov predictor and HMMs. We then present a model to motivate a Kalman filter based prediction method and lastly introduce a common machine learning technique as a benchmark.

4.1 Markov Model

Markov models or chains are a probabilistic way of modelling a sequence of observations, often called a timeseries. They have been used in a wide array of areas such as speech coding and forecasting [32]. We first look at the general theoretical background of Markov models.

With a probabilistic model the joint distribution $p(v_{1:T})$ is used to describe characteristics of the data $v_1, \dots, v_T$. Given that our timeseries is causal we get the model

$$p(v_{1:T}) = \prod_{t=1}^{T} p(v_t \mid v_{1:t-1}). \tag{2}$$

The characteristic of a Markov chain is that only a certain number of past elements influence the present. Therefore the distribution fulfils the conditional independence assumption

$$p(v_t \mid v_{1:t-1}) = p(v_t \mid v_{t-L:t-1}) \tag{3}$$

which is often called the Markov assumption, where $L$ is called the order of the Markov chain [32, 33]. A higher order $L$ results in a more complex model.


For an order-1 Markov chain the probability density function (pdf) reduces to

$$p(v_{1:T}) = p(v_1)\, p(v_2 \mid v_1)\, p(v_3 \mid v_2) \cdots = p(v_1) \prod_{t=2}^{T} p(v_t \mid v_{t-1}). \tag{4}$$

If the transitions $p(v_t = s_i \mid v_{t-1} = s_j) = f(s_i, s_j)$ are time-independent, the Markov chain is called stationary [33]. This is a major restriction in our case, because, as described in section 3.2, the nature of the movement changes over time.

Every stationary finite-state Markov chain can be visualized by a directed graph where each node represents a state. The arrows show if a transition between the states is possible and, if it is, how likely it is. A simple example is presented in fig. 11. If $v_t \in \{1, \dots, V\}$ is discrete we can introduce a state transition matrix $A$ where each entry $A_{ij} = p(v_t = j \mid v_{t-1} = i)$ describes the probability to move from one location to the other [32]. The transition matrix for the example in fig. 11 is

$$A = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}. \tag{5}$$

Figure 11: Example of state transition diagram for an order-1 Markov chain [32, p. 590].
(Diagram: states 1 and 2 with self-transition probabilities $1-\alpha$ and $1-\beta$, transition probability $\alpha$ from state 1 to state 2 and $\beta$ from state 2 to state 1.)

The parameters $\alpha$ and $\beta$ describe the probability of changing states. Because the total probability of all possible transition options from each state must sum up to one, the probability to stay in the state has to be $1-\alpha$ and $1-\beta$. This is equivalent to the requirement that the sum of each row of $A$ equals one. Markov based predictors have the advantage that they work well for a small number of users in the system because they predict the next location based on one user and not the whole system. They also only deliver integer and positive results, which is something that cannot be guaranteed with other methods as described in section 4.5.
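The two-state example of eq. (5) can be checked numerically. The sketch below uses hypothetical values for α and β, verifies that each row of A sums to one, and propagates a state distribution over three steps using the convention $A_{ij} = p(v_t = j \mid v_{t-1} = i)$ from above; the function name `step` is illustrative.

```python
alpha, beta = 0.3, 0.1  # hypothetical transition probabilities

# Row i holds the probabilities of moving from state i to every state j.
A = [[1 - alpha, alpha],
     [beta, 1 - beta]]


def step(dist, transition):
    """One step of the chain: dist_next[j] = sum_i dist[i] * transition[i][j]."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


row_sums = [sum(row) for row in A]

dist = [1.0, 0.0]  # the user starts in state 1 with certainty
for _ in range(3):
    dist = step(dist, A)
```

After three steps the distribution is approximately (0.412, 0.588) for these values, and it remains a valid probability distribution because every row of A sums to one.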

4.2 Order-L Markov Predictor

In Libo Song et al. [22] the authors present one method to utilize the Markov

assumption in eq. (3) for an individual user based next step prediction. It uses

low complexity calculations for the prediction and supports online training of the

parameters.

At each point in time the user location is represented by $x \in \mathcal{A} = (a_1, \dots, a_N)$. We assume that we know the location history $H_m = \langle x_1 = a_i, \dots, x_m = a_j \rangle$, where $i, j \in \{1, \dots, N\}$, of each user up to the point of the estimation. Assuming stationarity, the maximum of the probability of the next step $x_{m+1}$, considering the history $H_m$, gives us the maximum likelihood estimate for the prediction

$$\hat{x}_{m+1} = \arg\max_{a \in \mathcal{A}} P(x_{m+1} = a \mid H_m). \tag{6}$$

Because of the Markov assumption, only the last $L$ entries are important for our estimation. We call the last $L$ entries of $H_m$ the context $c_m = \langle x_{m-L+1} = a_i, \dots, x_m = a_j \rangle$ of $H_m$. The prediction from eq. (6) then simplifies to

$$\hat{x}_{m+1} = \arg\max_{a \in \mathcal{A}} P(x_{m+1} = a \mid c_m). \tag{7}$$

To estimate the probability $P(x_{m+1} = a \mid c_m)$ we calculate the ratio of how often $x_{m+1} = a$ followed $c$ and how often $c$ occurred in total. We define $N(a \mid c; H_m)$ as the number of times the location $a$ followed $c$ in the location history $H_m$ and $N(c; H_m)$ as the number of times the context $c$ appeared in $H_m$. We then get the estimate for the probability

$$P(x_{m+1} = a \mid c_m) \approx \hat{P}(x_{m+1} = a \mid c_m) = \frac{N(a \mid c_m; H_m)}{N(c_m; H_m)} \tag{8}$$

which we insert in eq. (7) to predict the next location

$$\hat{x}_{m+1} = \arg\max_{a \in \mathcal{A}} \frac{N(a \mid c_m; H_m)}{N(c_m; H_m)}. \tag{9}$$


4.2.1 Implementation

Algorithm 1 described by [22] is a method to predict the location in the next timestep and updates the transition matrix M and list of previous contexts c_prev online.

Algorithm 1: Order-L Markov Prediction

Input:
- Current symbol x_m
- Transition matrix M
- List of previous contexts c_prev
- Context c_m
- Context length L
Output:
- Prediction x̂_{m+1}
- Updated transition matrix M_new
- Updated context c_{m+1}

if length(c_m) < L then
    c_{m+1} ← add x_m to c_m
    x̂_{m+1} ← NaN                          ▷ Prediction is not possible
else
    if c_m is member of c_prev then
        M_new(c_m, x_m) ← M(c_m, x_m) + 1
    else
        c_prev ← add c_m to c_prev
        M_new(c_m, x_m) = 1
    end
    c_{m+1} ← add x_m to c_m
    c_{m+1} ← delete earliest entry of c_{m+1}
    if c_{m+1} is member of c_prev then
        x̂_{m+1} ← argmax of M_new(c_{m+1}, :)
    else
        x̂_{m+1} ← NaN                      ▷ Prediction is not possible
    end
end

At the beginning a prediction is often not possible, because the current context c_m has not occurred before. The number of samples needed until a prediction is possible for every potential context also depends on the context length L. For L = 1 with N different locations, at least N samples are required until the predictor always delivers a prediction. For L > 1 we need at least N^L entries to never miss a result. Fortunately, in practical applications this number might be significantly lower, because not every context will actually appear.
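The online procedure of algorithm 1 can be sketched compactly in Python. This is a minimal illustration under assumptions and not the implementation evaluated in this work: the class name `OrderLMarkovPredictor` is hypothetical, a dictionary of counters stands in for the transition matrix M and the list of previous contexts, and `None` replaces NaN when no prediction is possible.

```python
from collections import Counter, defaultdict


class OrderLMarkovPredictor:
    """Online order-L Markov predictor for one user (sketch)."""

    def __init__(self, order):
        self.order = order
        # context (tuple of the last L locations) -> counts of following locations
        self.counts = defaultdict(Counter)
        self.context = ()

    def update(self, symbol):
        """Consume the current location and return the prediction for the next one."""
        if len(self.context) == self.order:
            # Count that `symbol` followed the current context (M(c_m, x_m) + 1).
            self.counts[self.context][symbol] += 1
        # Append the symbol and keep only the last L entries as the new context.
        self.context = (self.context + (symbol,))[-self.order:]
        if len(self.context) < self.order or self.context not in self.counts:
            return None  # prediction is not possible
        # The denominator of eq. (9) is constant, so the argmax of the counts suffices.
        return self.counts[self.context].most_common(1)[0][0]


predictor = OrderLMarkovPredictor(order=1)
predictions = [predictor.update(loc) for loc in ["A", "B", "A", "B", "A"]]
```

For this short alternating trace the first two calls cannot deliver a prediction, since no context has been followed by a location yet; afterwards the predictor returns the location that most often followed the current context.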

4.3 Hidden Markov Model (HMM)

Another individual user dependent method, based on a more complex model with more degrees of freedom than the Markov chain described in section 4.1, is the HMM. The HMM consists of hidden states $h_t \in \{1, \dots, H\}$ and observed variables $v_t \in \{1, \dots, V\}$, which are connected according to an observation model $p(v_t \mid h_t)$ [32]. The difference between a normal Markov chain from section 4.1 and a HMM can be seen in fig. 12. In a Markov chain we can observe every state transition, while in the HMM there is a hidden layer that influences the observations. This leads to the joint distribution of hidden and visible states

$$p(h_{1:T}, v_{1:T}) = p(v_1 \mid h_1)\, p(h_1) \prod_{t=2}^{T} p(v_t \mid h_t)\, p(h_t \mid h_{t-1}). \tag{10}$$

Assuming the HMM is stationary, the transition probability can be represented by a $H \times H$ transition matrix $A$ with $A(i,j) = A_{ij} = p(h_{t+1} = i \mid h_t = j)$ and the emission probability as a $V \times H$ emission matrix $B$ with $B(i,j) = B_{ij} = p(v_t = i \mid h_t = j)$ [33, 32].

4.3.1 Inference in Hidden Markov Models (HMMs)

There are several classic inference tasks possible with a HMM. Usually the goal is to infer the hidden state sequence, which cannot be observed directly, assuming all other parameters are known [32]. We can distinguish between several different types of inference, for example

• Filtering: Calculating $p(h_t \mid v_{1:t})$ online, which reduces noise.
• Smoothing: Calculating $p(h_t \mid v_{1:u})$ where $t < u$.
• Prediction: Calculating $p(h_{t+k} \mid v_{1:t})$ or $p(v_{t+k} \mid v_{1:t})$ where $k > 0$ defines the prediction horizon.
• MAP estimation

For our application we will work with the prediction methods for HMMs.

Figure 12: Development in time for both a Markov chain and a HMM [33]. For both models only $v_i$ are observable. (a) Order-1 Markov chain. (b) Order-1 hidden Markov chain.

4.3.2 Prediction in Hidden Markov Models (HMMs)

For the prediction we first want to know the probability of a hidden state $h_t$ given the data $v_{1:t-1}$. We calculate this considering the joint distribution

$$p(h_t, v_{1:t-1}) = \sum_{h_{t-1}} p(h_t, h_{t-1}, v_{1:t-1}) \tag{11a}$$

$$= \sum_{h_{t-1}} p(h_t \mid h_{t-1}, v_{1:t-1})\, p(h_{t-1}, v_{1:t-1}) \tag{11b}$$

$$= \sum_{h_{t-1}} p(h_t \mid h_{t-1})\, p(h_{t-1}, v_{1:t-1}) \tag{11c}$$

and therefore we get

$$p(h_t \mid v_{1:t-1}) = \sum_{h_{t-1}} p(h_t \mid h_{t-1})\, p(h_{t-1} \mid v_{1:t-1}). \tag{12}$$

To decide which hidden state is the most likely in the next timestep, we need to maximize the posterior probability over all possible states $h_t$

$$h_t^* = \arg\max_{h_t} p(h_t \mid v_{1:t-1}). \tag{13}$$

The conditional probability for a visible state at time $t$, given a sequence of visible states $v_1, \dots, v_{t-1}$, is determined by summing over all hidden states

$$p(v_t \mid v_{1:t-1}) = \sum_{h_t} p(v_t \mid h_t)\, p(h_t \mid v_{1:t-1}) \tag{14}$$

and we therefore get the estimate for $v_t$ as

$$v_t^* = \arg\max_{v_t} p(v_t \mid v_{1:t-1}) = \arg\max_{v_t} \left\{ \sum_{h_t} p(v_t \mid h_t)\, p(h_t \mid v_{1:t-1}) \right\}. \tag{15}$$
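Equations (12) to (15) amount to two matrix-vector products followed by a maximization. The sketch below illustrates one prediction step in the probability domain, assuming the column convention $A(i,j) = p(h_{t+1} = i \mid h_t = j)$ and $B(i,j) = p(v_t = i \mid h_t = j)$ stated in section 4.3, zero-based state indices, and a given filtered distribution $p(h_{t-1} \mid v_{1:t-1})$; all numerical values and the function names are hypothetical.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product for lists of lists."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def predict_next(A, B, filtered):
    """One-step prediction from filtered[j] = p(h_{t-1} = j | v_{1:t-1})."""
    hidden = matvec(A, filtered)   # eq. (12): p(h_t | v_{1:t-1})
    visible = matvec(B, hidden)    # eq. (14): p(v_t | v_{1:t-1})
    h_star = max(range(len(hidden)), key=hidden.__getitem__)    # eq. (13)
    v_star = max(range(len(visible)), key=visible.__getitem__)  # eq. (15)
    return hidden, visible, h_star, v_star


# Hypothetical two-state model; every column of A and B sums to one.
A = [[0.9, 0.2],
     [0.1, 0.8]]
B = [[0.7, 0.1],
     [0.3, 0.9]]
filtered = [0.5, 0.5]

hidden, visible, h_star, v_star = predict_next(A, B, filtered)
```

In this example the most likely hidden state (index 0) and the most likely observation (index 1) do not coincide, which shows why eq. (15) sums over all hidden states instead of only using the maximizer of eq. (13).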

Forward algorithm

To efficiently calculate eq. (11) we select the forward algorithm or α-recursion from the literature [33]. We define

$$\alpha(h_t) := p(h_t, v_{1:t}) \tag{16}$$

and rewrite the joint probability in eq. (11c) as

$$p(h_t, v_{1:t-1}) = \sum_{h_{t-1}} p(h_t \mid h_{t-1})\, \alpha(h_{t-1}). \tag{17}$$

We calculate $\alpha(h_t)$ with the recursion

$$\alpha(h_t) = p(v_t \mid h_t) \sum_{h_{t-1}} p(h_t \mid h_{t-1})\, \alpha(h_{t-1}) \tag{18}$$

with $\alpha(h_1) = p(h_1, v_1) = p(v_1 \mid h_1)\, p(h_1)$.

In the recursion to calculate $\alpha(h_t)$ we often multiply values that are smaller than one with each other. This makes the values even smaller and causes numerical stability problems. In the implementation we therefore apply the logarithm to all values. This requires changing the calculation of the sum and product. We introduce numerically more stable functions to do so in algorithm 2 and algorithm 3 [34].
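The numerical problem can be made visible with a direct implementation of the recursion in eq. (18) in the probability domain. The following sketch assumes the column convention of section 4.3 for A and B, zero-based indices and a hypothetical two-state model; the function name `forward` is illustrative. For a short sequence the α-values are well behaved, while for a long sequence they shrink below the smallest representable floating point number.

```python
def forward(A, B, pi, observations):
    """Alpha recursion of eq. (18): alpha[t][i] = p(h_t = i, v_1, ..., v_t).

    Assumed conventions: A[i][j] = p(h_t = i | h_{t-1} = j),
    B[v][i] = p(v_t = v | h_t = i), pi[i] = p(h_1 = i).
    """
    n = len(pi)
    alpha = [[B[observations[0]][i] * pi[i] for i in range(n)]]
    for v in observations[1:]:
        prev = alpha[-1]
        alpha.append([B[v][i] * sum(A[i][j] * prev[j] for j in range(n))
                      for i in range(n)])
    return alpha


# Hypothetical two-state model; every column of A and B sums to one.
A = [[0.9, 0.2],
     [0.1, 0.8]]
B = [[0.7, 0.1],
     [0.3, 0.9]]
pi = [0.5, 0.5]

alpha_short = forward(A, B, pi, [0, 1])
alpha_long = forward(A, B, pi, [0, 1] * 5000)  # 10 000 observations
```

For the two observations, `alpha_short[1]` is approximately (0.0975, 0.0675). For the long sequence the last α-values are numerically indistinguishable from zero, so any subsequent normalization or maximization becomes meaningless; this is the effect the logarithmic formulation avoids.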

Algorithm 2: Logarithm multiplication functions [34]

Function x ∗ y:
    if x or y is equal to NaN then
        z = NaN
    else
        z = x + y
    end
    return z
end

Algorithm 3: Logarithm addition functions [34]

Function x ⊕ y:
    if isnan(x) or isnan(y) then
        if isnan(x) then
            z = y
        else
            z = x
        end
    else
        if x > y then
            z = x + eln(1 + exp(y − x))
        else
            z = y + eln(1 + exp(x − y))
        end
    end
    return z
end
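Algorithms 2 and 3 translate almost line by line into Python. The sketch below follows the NaN convention of the algorithms, where NaN represents the logarithm of zero; the function names `eln`, `log_mul` and `log_add` are chosen for illustration, and `math.log1p(z)` is used as an equivalent, slightly more accurate way of computing ln(1 + z).

```python
import math

LOGZERO = float("nan")  # NaN marks ln(0), as in algorithms 2 and 3


def eln(x):
    """Extended logarithm: ln(x) for x > 0 and NaN for x = 0."""
    if x < 0:
        raise ValueError("eln is only defined for x >= 0")
    return math.log(x) if x > 0 else LOGZERO


def log_mul(x, y):
    """Product in the log domain: ln(a * b) from x = ln(a) and y = ln(b)."""
    if math.isnan(x) or math.isnan(y):
        return LOGZERO
    return x + y


def log_add(x, y):
    """Sum in the log domain: ln(a + b) from x = ln(a) and y = ln(b)."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    if x > y:
        return x + math.log1p(math.exp(y - x))
    return y + math.log1p(math.exp(x - y))


p, q = 0.2, 0.3
log_sum = log_add(eln(p), eln(q))    # ln(0.5)
log_prod = log_mul(eln(p), eln(q))   # ln(0.06)
tiny_sum = log_add(-1000.0, -1000.0) # ln(2 * exp(-1000)), not representable directly
```

Factoring out the larger argument before exponentiating keeps the argument of the exponential non-positive, so `tiny_sum` is computed correctly even though exp(-1000) itself underflows to zero.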

Implementation of Hidden Markov Model (HMM) prediction

The steps for the HMM prediction are outlined in algorithm 4. We first calculate the α-values according to algorithm 5 and initialize x as

$$\mathbf{x}(1) = p(h_T) = \alpha(h_T). \tag{19}$$

We assume that the number of hidden and visible states equals N. For each step k that we want to predict, we determine the conditional probability for the hidden states $h_t$

$$\mathbf{P}(:, k) = p(h_t \mid v_{1:t-1}) = \mathbf{A}\mathbf{x}. \tag{20}$$


To utilize the logarithm multiplication and addition from algorithms 2 and 3 we rewrite eq. (20) element-wise as

$$\mathbf{P}(j, k) = \sum_{i=1}^{N} \lambda_{ij} \tag{21}$$

with

$$\lambda_{ij} = \mathbf{A}(i, j)\, \mathbf{x}(i) \tag{22}$$

for $i, j \in \{1, \dots, N\}$. This determines the new initial value for the next prediction step

$$\mathbf{x}(k) = \begin{pmatrix} \mathbf{P}(1, k) \\ \vdots \\ \mathbf{P}(N, k) \end{pmatrix} \tag{23}$$

and the prediction for the kth step

$$x^*(k) = \max_{v_t} \big( p(v_t \mid v_{1:t-1}) \big) = \max \big( \mathbf{B}\mathbf{x}(k) \big). \tag{24}$$

Because probabilities are never larger than one, the multiplication of several probabilities can cause numerical problems. As in the calculation of α, we again apply the logarithm to each value and work with the numerically more stable functions stated in algorithms 2 and 3 [34].


Algorithm 4: HMM k-step prediction

Input:
  - Transition matrix A
  - Emission matrix B
  - Training data d, T = length(d)
  - Initial state π
  - Number of steps to predict K
  - Number of states N
Output:
  - Most probable state x∗ as a K-length vector
  - Conditional probabilities P as an N × K matrix

→ Calculate αlog = fwd(d, π)    ▷ defined in algorithm 5
x = αlog(T, :)
for k = 1:K do
    for j = 1:N do
        for i = 1:N do
            λij = x(i) ∗ ln(Aij)    ▷ with ∗ defined in algorithm 2
            P(j, k) = P(j, k) ⊕ λij    ▷ with ⊕ defined in algorithm 3
        end
    end
    x = P(:, k)
    x∗(k) = max(Bx)
end


Algorithm 5: Alpha recursion

Input:
  - Transition matrix A
  - Emission matrix B
  - Training data d, T = length(d)
  - Initial states π
  - Number of states N
Output:
  - Logarithm of α values αlog as a T × N matrix

for i = 1:N do
    αlog(1, i) = ln(π(i)) ∗ ln(B(i, d1))    ▷ with ∗ defined in algorithm 2
end
for t = 2:T do
    for j = 1:N do
        αtemp = NaN
        for i = 1:N do
            αtemp = αtemp ⊕ αlog(t − 1, i) ∗ ln(Aij)    ▷ with ∗ and ⊕ defined in algorithms 2 and 3
        end
        αlog(t, j) = αtemp ∗ ln(B(j, dt))    ▷ with ∗ defined in algorithm 2
    end
end
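The alpha recursion of algorithm 5 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code of this work: the helper names are hypothetical, observations are assumed to be zero-based indices, `A[i][j]` is assumed to hold $p(h_t = j \mid h_{t-1} = i)$, and NaN represents ln(0).

```python
import math

NAN = float("nan")  # represents ln(0)


def eln(p):
    return math.log(p) if p > 0 else NAN


def log_mul(x, y):
    return NAN if math.isnan(x) or math.isnan(y) else x + y


def log_add(x, y):
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    hi, lo = max(x, y), min(x, y)
    return hi + math.log1p(math.exp(lo - hi))


def forward_log(A, B, pi, d):
    """Log-domain alpha recursion; returns a T x N list of ln(alpha)."""
    n = len(pi)
    # initialisation: alpha(h_1) = p(v_1 | h_1) p(h_1)
    alpha = [[log_mul(eln(pi[i]), eln(B[i][d[0]])) for i in range(n)]]
    for t in range(1, len(d)):
        row = []
        for j in range(n):
            acc = NAN
            for i in range(n):
                acc = log_add(acc, log_mul(alpha[t - 1][i], eln(A[i][j])))
            row.append(log_mul(acc, eln(B[j][d[t]])))
        alpha.append(row)
    return alpha


# Illustrative two-state example with the observation sequence (0, 1).
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
alpha_log = forward_log(A, B, pi=[0.5, 0.5], d=[0, 1])
```

Exponentiating the last row of `alpha_log` gives the joint probabilities $p(h_T, v_{1:T})$ that eq. (19) uses to initialise the prediction.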


Algorithm 6: Beta recursion

Input:
  - Transition matrix A
  - Emission matrix B
  - Training data d
  - Number of states N
  - T = length(d)
Output:
  - βlog as a T × N matrix

→ Initialize βlog with 0
for t = (T − 1) : −1 : 1 do
    for i = 1:N do
        βtemp = NaN
        for j = 1:N do
            βtemp = βtemp ⊕ [ln(Aij) ∗ ln(B(j, dt+1)) ∗ βlog(t + 1, j)]    ▷ with ∗ and ⊕ defined in algorithms 2 and 3
        end
        βlog(t, i) = βtemp
    end
end

4.3.3 Training the Parameters of HMMs

Since there is no analytical way of calculating the parameters of an HMM, we have to employ a statistical method [35]. To estimate the parameters we apply the Baum-Welch algorithm, which is a variant of the expectation maximisation (EM) method [33]. The algorithm locally maximizes the likelihood $p(v_{1:T} \mid \mathbf{A}, \mathbf{B}, \boldsymbol{\pi})$ of the observation data $v_{1:T}$. Here $v_{1:T}$ are the visible states, A is the transition matrix, B is the emission matrix and $\boldsymbol{\pi} = \gamma(h_1)$ [26].

The method itself is not numerically stable and therefore some measures to deal with very small numbers need to be taken. We choose an approach illustrated by several authors, like Barber [33] and Murphy [32], which is to apply logarithms to all parameters. Additionally, the functions for addition and multiplication defined in algorithm 2 and algorithm 3 are used.


For the parameter training we first need to determine α and β from algorithm 5 and algorithm 6 for each user. We then calculate γ from eq. (32). This also gives us $\pi_i = \gamma_1(i)$, which is the estimated probability to start in state i [34]. Utilizing the previous $\mathbf{A}^{(0)}$ and $\mathbf{B}^{(0)}$, ξ can be determined as

$$\xi_t(i, j) = p(h_t = i, h_{t+1} = j \mid v_{1:T}) \tag{33a}$$

$$= \frac{\alpha_t(i)\, A^{(0)}_{ij}\, B^{(0)}_j(v_{t+1})\, \beta_{t+1}(j)}{\sum_{k=1}^{N} \sum_{l=1}^{N} \alpha_t(k)\, A^{(0)}_{kl}\, B^{(0)}_l(v_{t+1})\, \beta_{t+1}(l)}. \tag{33b}$$

We now get the entries for the matrix A as

$$A_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \tag{34}$$

and matrix B as

$$B_j(k) = \frac{\sum_{t:\, v_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}. \tag{35}$$
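The re-estimation in eqs. (34) and (35) amounts to normalised sums over γ and ξ. The following sketch shows this for a single observation sequence. It works in the linear domain for readability, whereas the implementation described above uses logarithms. The function name `reestimate` and the numerical values are hypothetical, and the sketch assumes that every state has a non-zero total γ so that no division by zero occurs.

```python
def reestimate(gamma, xi, obs, n_symbols):
    """Re-estimate A, B and pi from gamma and xi (eqs. (34) and (35)).

    gamma[t][i] : p(h_t = i | v_{1:T}),              t = 0..T-1
    xi[t][i][j] : p(h_t = i, h_{t+1} = j | v_{1:T}),  t = 0..T-2
    obs[t]      : zero-based index of the visible state at time t
    """
    T, n = len(gamma), len(gamma[0])
    # eq. (34): expected transitions i -> j over expected visits of i
    A = [[sum(xi[t][i][j] for t in range(T - 1)) /
          sum(gamma[t][i] for t in range(T - 1))
          for j in range(n)] for i in range(n)]
    # eq. (35): expected visits of j while observing symbol k
    B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
          sum(gamma[t][j] for t in range(T))
          for k in range(n_symbols)] for j in range(n)]
    pi = list(gamma[0])  # pi_i = gamma_1(i)
    return A, B, pi


# Illustrative posteriors for T = 3 and two states (made-up values).
gamma = [[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
xi = [[[0.2, 0.6], [0.1, 0.1]],
      [[0.2, 0.1], [0.2, 0.5]]]
A_new, B_new, pi_new = reestimate(gamma, xi, obs=[0, 1, 0], n_symbols=2)
```

Because the rows of each `xi[t]` sum to the matching entries of `gamma[t]`, every row of the re-estimated A and B sums to one.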

The training for the parameters of the HMM based method is outlined in algo-

rithm 7.


Algorithm 7: Training of HMM Parameters [34, 35]

Input:
  - Training data V
  - Initial transition matrix A(0)
  - Initial emission matrix B(0)
  - Initial starting state π(0)
  - Number of states N
Output:
  - Transition matrix A
  - Emission matrix B
  - π

while ‖A − A(0)‖ > ε do
    foreach user in V do
        v1:T ← Vuser
        α = fwd(v1:T, A, B, π)    ▷ defined (def.) in algorithm 5
        β = bwd(v1:T, A, B, π)    ▷ def. in algorithm 6
        → Calc. γt(i) ∀t and ∀i ∈ N def. in eq. (32)
        → Calc. ξt(i, j) ∀t and ∀i, j ∈ N def. in eq. (33)
        → Calculate πi
        for i = 1:N do
            π(i) = γ1(i)
        end
        → Calc. each entry Aij of A def. in eq. (34)
        → Calc. each entry Bj(k) of B def. in eq. (35)
    end
    A(0) = A
    B(0) = B
    π(0) = π
end

4.4 State Space Model

Switching from predicting an individual user location to predicting crowd movement, we now describe the state of the system with the aggregated sum of users per AP. This leads us to a linear state space model as an alternative modelling approach to a simple Markov chain or HMM. We again postulate a Markov assumption, but now assume that the state only depends on the timestep before. We illustrate this idea with a simple example with three states and an "output/input state" o as seen in fig. 13. This could represent a three cell network where users can move freely between cells and leave or enter the system via state o.

[Figure 13 shows a trellis with the states x, y, z and o at timesteps 1 and 2. The edges leaving x1 are labelled fx(x1 → x2), fy(x1 → y2), fz(x1 → z2) and fo(x1 → o2).]

Figure 13: Trellis diagram to motivate a linear state space model to model the movement between cells.

For the proposed example we can write the equation for timestep 1 to timestep 2 as

\begin{align}
x_2 &= f_x(x_1 \to x_2) + g_x(y_1 \to x_2) + h_x(z_1 \to x_2) + k_x(o_1 \to x_2) \tag{36a} \\
y_2 &= f_y(x_1 \to y_2) + g_y(y_1 \to y_2) + h_y(z_1 \to y_2) + k_y(o_1 \to y_2) \tag{36b} \\
z_2 &= f_z(x_1 \to z_2) + g_z(y_1 \to z_2) + h_z(z_1 \to z_2) + k_z(o_1 \to z_2). \tag{36c}
\end{align}

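We now assume that the transition function is linear, $f_x(x_1 \to x_2, t) = p(x_1 \to x_2)\, x_1$, resulting in a transition model

$$\begin{pmatrix} x_{k+1} \\ y_{k+1} \\ z_{k+1} \end{pmatrix} =
\begin{pmatrix}
p(x_k \to x_{k+1}) & p(y_k \to x_{k+1}) & p(z_k \to x_{k+1}) \\
p(x_k \to y_{k+1}) & p(y_k \to y_{k+1}) & p(z_k \to y_{k+1}) \\
p(x_k \to z_{k+1}) & p(y_k \to z_{k+1}) & p(z_k \to z_{k+1})
\end{pmatrix}
\begin{pmatrix} x_k \\ y_k \\ z_k \end{pmatrix} +
\begin{pmatrix} p(o_k \to x_{k+1}) \\ p(o_k \to y_{k+1}) \\ p(o_k \to z_{k+1}) \end{pmatrix} u_k. \tag{37}$$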

Entering and exiting the system is now modelled with the input $u_k$. This leads us to the linear state space model

$$\mathbf{x}_{k+1} = \mathbf{A}_k \mathbf{x}_k + \mathbf{B}_k u_k \tag{38}$$

where the state $\mathbf{x}_k$ describes how many users are at the APs at a certain point in time. Here $\mathbf{A}_k$ is the transition probability matrix and $\mathbf{B}_k$ defines the weights for the input $u_k$.
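A single step of eq. (38) for the three cell example can be sketched as follows. The numbers are made up for illustration and the function name `step` is hypothetical. The sketch assumes a scalar input and a column-stochastic A, meaning that column c holds the probabilities of moving from cell c to each cell.

```python
def step(A, B, x, u):
    """One step of eq. (38): x_{k+1} = A x_k + B u_k with scalar u_k."""
    n = len(x)
    return [sum(A[r][c] * x[c] for c in range(n)) + B[r] * u
            for r in range(n)]


# Every column of A sums to one, so no user is lost in a transition.
A = [[0.8, 0.1, 0.0],
     [0.2, 0.8, 0.3],
     [0.0, 0.1, 0.7]]
B = [1.0, 0.0, 0.0]      # assumption: new users appear in the first cell
x0 = [10.0, 5.0, 0.0]    # users per cell at timestep k
x1 = step(A, B, x0, u=2.0)
```

Because the columns of A sum to one, the total number of users changes only by the input: here the total goes from 15 to 17.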

A linear model is the basis for many control methods. Our particular task is to predict the next state $\mathbf{x}_{k+1}$ while minimizing the MSE. A well known estimator for such a prediction task is the Kalman filter, which we introduce in the next section.

4.5 Kalman Filter

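Assuming the system we want to characterize can be described by a linear state space model,

\begin{align}
\mathbf{x}_{k+1} &= \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k \tag{39} \\
\mathbf{y}_k &= \mathbf{C}\mathbf{x}_k + \mathbf{D}\mathbf{u}_k, \tag{40}
\end{align}

estimating the future state in the next timestep is equal to minimizing the error between the estimated state $\hat{\mathbf{x}}_{k+1}$ and the real state $\mathbf{x}_{k+1}$. The easiest state estimator is a simulator,

\begin{align}
\hat{\mathbf{x}}_{k+1} &= \mathbf{A}\hat{\mathbf{x}}_k + \mathbf{B}\mathbf{u}_k \tag{41} \\
\hat{\mathbf{y}}_k &= \mathbf{C}\hat{\mathbf{x}}_k + \mathbf{D}\mathbf{u}_k \tag{42}
\end{align}

which comes down to copying the system and then calculating the next step based on the linear model. This trivial state estimator does not utilize available measurements that might improve the prediction. It therefore only works if the system is described well by the linear model and no noise is present [31].

A better state estimator is the Luenberger observer

\begin{align}
\hat{\mathbf{x}}_{k+1} &= \mathbf{A}\hat{\mathbf{x}}_k + \mathbf{B}\mathbf{u}_k + \mathbf{K}\,(\mathbf{y}_k - \hat{\mathbf{y}}_k) \tag{43} \\
\hat{\mathbf{y}}_k &= \mathbf{C}\hat{\mathbf{x}}_k + \mathbf{D}\mathbf{u}_k. \tag{44}
\end{align}

It adds the observation error $\mathbf{e}_{obs} = \mathbf{y}_k - \hat{\mathbf{y}}_k$ weighted by a constant K to the prediction of the simulator [31]. If K is chosen correctly, the new estimate $\hat{\mathbf{x}}_{k+1}$ is improved by counteracting the estimation error $e_{est} = \|\mathbf{x}_{k+1} - \hat{\mathbf{x}}_{k+1}\|$.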
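The difference between the simulator and the Luenberger observer can be seen in a scalar toy example. The sketch below is illustrative only: the system values, the gain and the function name `simulate` are assumptions, and the input is set to zero so that only the effect of a wrong initial estimate remains.

```python
def simulate(a, c, k_gain, x_true, x_hat, steps):
    """Track the scalar system x_{k+1} = a x_k, y_k = c x_k.

    k_gain = 0 reproduces the simulator of eqs. (41) and (42),
    k_gain != 0 gives the Luenberger observer of eqs. (43) and (44).
    Returns the absolute estimation error after every step.
    """
    errors = []
    for _ in range(steps):
        y = c * x_true                 # measurement of the real system
        y_hat = c * x_hat              # output expected by the estimator
        x_hat = a * x_hat + k_gain * (y - y_hat)
        x_true = a * x_true
        errors.append(abs(x_true - x_hat))
    return errors


sim_err = simulate(a=1.0, c=1.0, k_gain=0.0, x_true=5.0, x_hat=0.0, steps=10)
obs_err = simulate(a=1.0, c=1.0, k_gain=0.5, x_true=5.0, x_hat=0.0, steps=10)
```

The estimation error evolves as $e_{k+1} = (a - Kc)\, e_k$ in this example. With K = 0 the initial error of 5 never shrinks, while K = 0.5 halves it in every step.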

[Figure 14: block diagram with the blocks System and State Estimator and the signals u, Disturbance, y and x.]

Figure 14: Block diagram showing the relationship between state estimator and system.

The open question remains how to optimally choose K. Utilizing the performance metric introduced in eq. (1) and including the assumptions

\begin{align}
E\{\mathbf{v}_k\} &= 0, & E\{\mathbf{v}_k \mathbf{v}_j^T\} &= \mathbf{R}\,\delta_{kj} \tag{45a} \\
E\{\mathbf{w}_k\} &= 0, & E\{\mathbf{w}_k \mathbf{w}_j^T\} &= \mathbf{Q}\,\delta_{kj} \tag{45b} \\
& & E\{\mathbf{w}_k \mathbf{v}_j^T\} &= 0 \tag{45c}
\end{align}

about measurement noise $\mathbf{v}_k$ and driving noise $\mathbf{w}_k$, the optimal solution for K is given by the Kalman filter [36]. It combines sequential linear minimum mean square error (LMMSE) estimation and a linear state space model of the signal [37].

Modelling our network scenario

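For our purposes we assume a linear, possibly time varying model

\begin{align}
\mathbf{x}_{k+1} &= \mathbf{A}_k \mathbf{x}_k + \mathbf{B}_k \mathbf{u}_k + \mathbf{G}_k \mathbf{w}_k, \qquad \mathbf{x}(0) = \mathbf{x}_0 \tag{46a} \\
\mathbf{y}_k &= \mathbf{C}_k \mathbf{x}_k + \mathbf{D}_k \mathbf{u}_k + \mathbf{H}_k \mathbf{w}_k + \mathbf{v}_k \tag{46b}
\end{align}

where $\mathbf{x} \in \mathbb{R}^n$ is the state, $\mathbf{u} \in \mathbb{R}^p$ is the input, $\mathbf{y} \in \mathbb{R}^q$ is the output, $\mathbf{w} \in \mathbb{R}^r$ is the driving noise and $\mathbf{v} \in \mathbb{R}^q$ is the measurement noise.

For the Kalman filter to be optimal in terms of MSE we need several assumptions to be fulfilled. First we assume v and w are zero mean, white and stationary as defined in eq. (45) with $\mathbf{Q} \ge 0$, $\mathbf{R} \ge 0$ and

$$\delta_{kj} = \begin{cases} 1, & k = j \\ 0, & k \neq j. \end{cases} \tag{47}$$

We also assume that the expectation of the initial value $\mathbf{m}_0 = E\{\mathbf{x}_0\}$ and the initial error covariance $\mathbf{P}_0 = E\{(\mathbf{x}_0 - \hat{\mathbf{x}}_0)(\mathbf{x}_0 - \hat{\mathbf{x}}_0)^T\} \ge 0$ are known. Additionally the noise and the initial state are considered to be uncorrelated, $E\{\mathbf{w}_k \mathbf{x}_0^T\} = 0$ and $E\{\mathbf{v}_l \mathbf{x}_0^T\} = 0$.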

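The Kalman filter

The Kalman filter is a recursive formulation of sequential LMMSE estimation [37]. It consists of three steps: calculating the Kalman gain matrix, updating the estimate and then determining the new error covariance [36].

First, the Kalman gain matrix

$$\mathbf{K}_k = \mathbf{A}_k \mathbf{P}(k|k-1)\, \mathbf{C}_k^T \left( \mathbf{C}_k \mathbf{P}(k|k-1)\, \mathbf{C}_k^T + \mathbf{H}_k \mathbf{Q} \mathbf{H}_k^T + \mathbf{R} \right)^{-1} \tag{48}$$

is calculated. Then the state is updated by combining the state space model and the difference of the measured output $\mathbf{y}_k$ and the output from the model

$$\hat{\mathbf{x}}(k+1|k) = \mathbf{A}_k \hat{\mathbf{x}}(k|k-1) + \mathbf{B}_k \mathbf{u}_k + \mathbf{K}_k \left( \mathbf{y}_k - \mathbf{C}_k \hat{\mathbf{x}}(k|k-1) - \mathbf{D}_k \mathbf{u}_k \right). \tag{49}$$

Lastly the new error covariance is determined

$$\mathbf{P}(k+1|k) = \mathbf{A}_k \mathbf{P}(k|k-1)\, \mathbf{A}_k^T + \mathbf{G}_k \mathbf{Q} \mathbf{G}_k^T - \mathbf{K}_k \mathbf{C}_k \mathbf{P}(k|k-1)\, \mathbf{A}_k^T. \tag{50}$$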

If the system matrices A, B, C, H and the noise covariance matrices Q and R are not time dependent, the Kalman gain K and the error covariance P can be precomputed before knowing any measurements $\mathbf{y}_k$. If they vary over time, they have to be calculated at each step, including an inversion of a q × q matrix. Depending on the size of the system, this might increase computational time significantly because the complexity of inverting an n × n matrix grows with $O(n^3)$.

The recursion is either initialized with prior knowledge of $\hat{\mathbf{x}}_0 = \mathbf{m}_0$ and $\mathbf{P}_0$ or, if there is no information about the initial state or error covariance, we set $\hat{\mathbf{x}}_0 = 0$ and $\mathbf{P}_0 = \alpha \mathbf{I}$ with $\alpha \gg 1$ [36].

The error behaviour of the filter can be influenced by choosing the covariance matrices Q and R. R can be seen as a measure of confidence in the sensors: the less we trust a measurement, the larger we choose the corresponding entry in the main diagonal of R. For Q no such interpretation can be found and therefore it is commonly set by trial and error [36].
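The three steps of the recursion can be traced in a scalar sketch, where all matrices of eqs. (48) to (50) reduce to numbers. The parameter values, the measurements and the function name `kalman_step` are assumptions chosen for illustration only.

```python
def kalman_step(x_hat, p, y, u, a, b, c, d, g, h, q, r):
    """One scalar step of the one-step predictor, eqs. (48) to (50)."""
    k_gain = a * p * c / (c * p * c + h * q * h + r)               # eq. (48)
    x_next = a * x_hat + b * u + k_gain * (y - c * x_hat - d * u)  # eq. (49)
    p_next = a * p * a + g * q * g - k_gain * c * p * a            # eq. (50)
    return x_next, p_next, k_gain


# No prior knowledge: x_hat = 0 and a large initial error covariance.
x_hat, p = 0.0, 100.0
gains = []
for y in [3.1, 2.9, 3.0, 3.2, 2.8]:   # made-up noisy measurements around 3
    x_hat, p, k_gain = kalman_step(x_hat, p, y, u=0.0, a=1.0, b=0.0, c=1.0,
                                   d=0.0, g=1.0, h=0.0, q=0.01, r=1.0)
    gains.append(k_gain)
```

The gain and the error covariance never use the measurement y, which is why they can be precomputed for time-invariant parameters. With the large initial covariance the first gain is close to one, so the first measurement is trusted almost completely, and the gain then decreases as the estimate settles.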

4.5.1 Kalman Filter Implementation

We implemented the Kalman Filter prediction method in Matlab, see appendix A.2

for the code. In the following we will outline the most important design decisions.

Prediction implementation

We know that the number of people at each AP cannot be smaller than zero and therefore add an additional step to ensure that our estimate is never negative.


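There are two methods suggested by Gupta and Hauser [38] to account for inequality constraints when working with a Kalman filter:

1. Project the unconstrained estimate $\hat{\mathbf{x}}_{k+1} = \hat{\mathbf{x}}(k+1|k)$ into a space that fulfils the constraints. This can be written as the optimization problem

\begin{align}
\mathbf{x}^{(p)}_{k+1} &= \arg\min_{\mathbf{x}} \left\{ (\mathbf{x} - \hat{\mathbf{x}}_{k+1})^T (\mathbf{x} - \hat{\mathbf{x}}_{k+1}) \right\} \tag{51a} \\
&\quad \text{s.t. } \mathbf{C}\mathbf{x} \le \mathbf{d}. \tag{51b}
\end{align}

2. Restrict the optimal Kalman gain to only allow solutions that fit the constraints. This results in the optimization problem

\begin{align}
\mathbf{K}^{(p)}_k &= \arg\min_{\mathbf{K} \in \mathbb{R}^{n \times m}} \operatorname{trace} \left\{ (\mathbf{I} - \mathbf{K}\mathbf{C}_k)\, \mathbf{P}(k|k-1)\, (\mathbf{I} - \mathbf{K}\mathbf{C}_k)^T + \mathbf{K}\mathbf{R}\mathbf{K}^T \right\} \tag{52a} \\
&\quad \text{s.t. } \mathbf{C} \left( \hat{\mathbf{x}}_{k+1} + \mathbf{K}_k \boldsymbol{\nu}_k \right) \le \mathbf{d} \tag{52b}
\end{align}

where $\boldsymbol{\nu}_k$ is the measurement residual $\boldsymbol{\nu}_k = \mathbf{y}_k - \mathbf{C}_k \hat{\mathbf{x}}(k|k-1) - \mathbf{D}_k \mathbf{u}_k$.

In the implementation summarized in algorithm 8 we first calculate the standard Kalman estimate and then modify the result to fulfil the constraint $\mathbf{x} \ge 0$ by solving the minimization problem outlined in eq. (51). For the optimization we apply the Matlab function fmincon.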
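For the special case used here, C = −I and d = 0, the objective in eq. (51a) is a sum of independent squared terms, one per component. Its minimiser is therefore the element-wise maximum of the unconstrained estimate and zero. The following sketch shows this closed form. It is an illustration of the projection, not a reproduction of the fmincon based code, and the function name is hypothetical.

```python
def project_nonnegative(x_hat):
    """Closest point to x_hat in Euclidean distance with all entries >= 0."""
    return [max(value, 0.0) for value in x_hat]


x_p = project_nonnegative([1.5, -0.2, 0.0, -3.0])
```

A general purpose solver is still needed as soon as the constraint matrix couples several components, which is the general setting of eq. (51b).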


Algorithm 8: Kalman filter prediction

Input:
  - New measurement y
  - Estimate x̂_k
  - Error covariance P_k
  - System parameters A, B, C, D, G, H, R, Q
  - Input u_{k:k+h}
  - Number of steps to predict h
Output:
  - Estimates x̂_{k+1:k+h}
  - Error covariance P_{k+h}

for i = 1:h do
    K = A P_{k+i−1} C^T (C P_{k+i−1} C^T + H Q H^T + R)^{−1}
    x̂_{k+i} = A x̂_{k+i−1} + B u_k + K (y − C x̂_{k+i−1} − D u_k)
    P_{k+i} = A P_{k+i−1} A^T + G Q G^T − K C P_{k+i−1} A^T
    if x̂_{k+i} < 0 then
        x̂_{k+i} ← calculate x^{(p)}_{k+i} according to eq. (51) with C = −I and d = 0
    end
end

Training implementation

For the Kalman filter prediction method we primarily need to train the parameters A and B. For A we estimate the probability of a transition between states with the sample mean of each transition from the AP changes in the individual user sequences.

To estimate B we also utilize the individual user sequences. We sum over all first and last locations of each user and then normalize the resulting vectors. This gives us the probabilities of where users appear in the system, Bin, and where users disappear from the system, Bout. Depending on the sign of u we later work with Bin or Bout as an input for the Kalman filter described in algorithm 8.

In order to minimize the number of iterations, we calculate all parameters A, Bin and Bout in one function as outlined in algorithm 9.

Q and R were determined by optimizing for a minimum MSE using the fmincon function in Matlab.

Algorithm 9: Calculation of transition matrix A for Kalman filter prediction

Input:
  - Training data Vusers
Output:
  - Transition matrix A
  - Input transition vector Bin
  - Output transition vector Bout

foreach v1:T ∈ Vusers do
    Bin(v1) += 1
    Bout(vT) += 1
    for k = 1 : length(v1:T) − 1 do
        TransRow = vk
        TransColumn = vk+1
        A(TransRow, TransColumn) += 1
    end
end
Normalize Bin and Bout
Normalize columns of A
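The counting scheme of algorithm 9 can be sketched in Python as follows. This is a hedged illustration and not the Matlab code of this work. The function name is hypothetical, AP indices are assumed to be zero-based, every sequence is assumed to be non-empty, and the counts are stored as `A[to][from]` so that the normalised columns match the column convention of eq. (38).

```python
def estimate_parameters(sequences, n_states):
    """Count-based estimates of A, B_in and B_out from user sequences."""
    A = [[0.0] * n_states for _ in range(n_states)]
    b_in = [0.0] * n_states
    b_out = [0.0] * n_states
    for seq in sequences:
        b_in[seq[0]] += 1          # first location of the user
        b_out[seq[-1]] += 1        # last location of the user
        for frm, to in zip(seq, seq[1:]):
            A[to][frm] += 1        # assumed layout: column = origin state
    total = float(len(sequences))
    b_in = [v / total for v in b_in]
    b_out = [v / total for v in b_out]
    for c in range(n_states):
        col_sum = sum(A[r][c] for r in range(n_states))
        if col_sum > 0:            # states that are never left stay zero
            for r in range(n_states):
                A[r][c] /= col_sum
    return A, b_in, b_out


# Three made-up user traces over three APs.
A_est, b_in, b_out = estimate_parameters([[0, 0, 1], [0, 1, 2], [1, 1]], 3)
```

Iterating over the pairs `zip(seq, seq[1:])` visits exactly length − 1 transitions per sequence, so the last location of a trace only contributes to `b_out`.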

4.6 Machine Learning and Neural Networks

To compare the algorithms described in previous sections to a purely data-driven method, we now introduce the basics of machine learning (ML) and its popular variant, the neural network (NN). It also offers an AP based prediction approach, directly inferring the number of users for every AP for each prediction step.

We give an overview of ML, describe the structure of an NN and summarize the main points of our NN based prediction implementation.

4.6.1 A short Machine Learning Overview

ML is a general term for prediction, estimation or classification methods that do not rely on an underlying data model. Murphy [32] defines ML as "a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty [...]."


As summarized by [32], we can generally differentiate between two types of ML.

1. Predictive learning or supervised learning: The goal is to learn a mapping of inputs x to outputs y given a labelled set of input-output pairs, which is called the training set. Here x can be a scalar, a vector or more complex data like an image, sentence or graph. The types of problem where y is from a finite set (e.g. y ∈ {red, blue, green}) are called classification or pattern recognition tasks, in contrast to regression tasks where y is real valued.

2. Descriptive or unsupervised learning: With this type of application the goal is to find important patterns in data.

Generally we can define ML problems as function fitting problems [39]. We assume that y can be estimated as the weighted sum of selected basis functions $\phi_j(\mathbf{x})$

$$y(\mathbf{w}, \mathbf{x}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) \tag{53}$$

or with $\phi_0(\mathbf{x}) = 1$ as

$$y(\mathbf{w}, \mathbf{x}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}). \tag{54}$$

If the $\phi_j(\mathbf{x})$ are nonlinear functions, the model for y itself is nonlinear in x, but it is linear with regard to the M parameters $w_j$.

4.6.2 Definition of Neural Network

There are different approaches on how to choose the basis functions $\phi_j(\mathbf{x})$. The neural network (NN) method is the most common ML method, which uses a fixed number of basis functions and only adapts the weights w during training. Figure 15 shows an example of an NN with three layers. The layers are connected by weights. In NNs the $\phi_j(\mathbf{x})$ are nonlinear activation functions h(·) applied to linear functions of x [39]. Frequently used activation functions are sigmoidal functions such as the logistic sigmoid or tanh [39].


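[Figure 15 shows a directed graph with an input layer (inputs x1, x2, x3; nodes N1(1) to N3(1)), a hidden layer (nodes N1(2) to N4(2)) and an output layer (nodes N1(3) and N2(3); outputs y1, y2). The edges are labelled with the weights w(1) and w(2).]

Figure 15: Exemplary graph for a neural network.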

In each layer the weights and the input of the layer are used to calculate the activations $a_j$ for $j = 1, \dots, M$. The coefficients $a_j$ can be described as M linear transformations of the input variables $\mathbf{x} = (x_1, \dots, x_D)^T$

\begin{align}
a_j &= \sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0} \tag{55} \\
&= \sum_{i=0}^{D} w^{(1)}_{ji} x_i \quad \text{with } x_0 = 1. \tag{56}
\end{align}

The index (1) indicates that it is the first layer of the network. The variables $w^{(1)}_{ji}$ are weights and $w^{(1)}_{j0}$ are often called biases [39].

In a neural network the corresponding outputs $z_j$ of the basis functions $\phi_j(\mathbf{x})$ are often written as a function of $a_j$, evaluating the activation function h(·)

$$z_j = h(a_j). \tag{57}$$

For the second, and in our case last, layer the $z_j$ are linearly combined to give the output unit activation

$$a_k = \sum_{j=1}^{M} w^{(2)}_{kj} z_j + w^{(2)}_{k0}. \tag{58}$$

The outputs are calculated as $y_k = \sigma(a_k)$ with an appropriate activation function σ(·). For regression no last activation function is required and therefore $y_k = a_k$ [39].

Combining eq. (55) and eq. (58), the calculation of a neural network for regression results in

$$y_k(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M} w^{(2)}_{kj}\, h\Big( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \Big). \tag{59}$$

NNs can be displayed by a directed graph as seen in fig. 15. This example shows an NN with three input variables, one hidden layer with four nodes and an output layer with two nodes⁶.
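The forward calculation of eq. (59) for the dimensions of this example (D = 3, M = 4, K = 2) can be sketched in plain Python. The weights below are arbitrary illustrative numbers, the function name `forward` is hypothetical, and tanh is assumed as the activation function h(·).

```python
import math


def forward(x, w1, w2, h=math.tanh):
    """Two-layer regression network following eq. (59).

    w1[j] = [w_j0, w_j1, ..., w_jD]   bias and weights of hidden node j
    w2[k] = [w_k0, w_k1, ..., w_kM]   bias and weights of output node k
    """
    x_ext = [1.0] + list(x)                                       # x_0 = 1
    a = [sum(w * xi for w, xi in zip(row, x_ext)) for row in w1]  # eq. (56)
    z = [1.0] + [h(a_j) for a_j in a]                             # eq. (57)
    return [sum(w * zj for w, zj in zip(row, z)) for row in w2]   # eq. (58)


# Arbitrary weights: 4 hidden nodes with 3 inputs each, 2 output nodes.
w1 = [[0.1, 0.5, -0.3, 0.2],
      [0.0, -0.4, 0.1, 0.3],
      [-0.2, 0.2, 0.2, -0.1],
      [0.3, 0.0, -0.5, 0.4]]
w2 = [[0.1, 1.0, -1.0, 0.5, 0.2],
      [-0.3, 0.4, 0.4, -0.6, 1.0]]
y = forward([1.0, 2.0, 3.0], w1, w2)
```

Prepending the constant 1 to the inputs and to the hidden outputs absorbs the biases into the weight rows, which is what allows eq. (59) to start both sums at index zero.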

The number of hidden layers and the number of nodes in each hidden layer can be chosen to enhance the performance of the NN. The goal of training an NN is to find the optimal weights w in terms of an error function E(w). If the weights are determined successfully, eq. (59) gives us the results of the prediction. NNs are always trained on a training set and their performance is evaluated with a test set. The test set consists of new data that was not known during the training phase. During training the NN is evaluated and then the weights are adjusted in a way that minimizes the chosen error function. If the NN maps the characteristics of the training data perfectly, the system is called overfitted. In this case the performance on the test set decreases, because the NN does not represent the essential properties of the data in general, but only the characteristics of the training set. To prevent overfitting, methods like dropout, which temporarily removes a certain percentage of neurons randomly in each training iteration, are used [40].

There are numerous training methods for neural networks, and specific optimization algorithms are outside the scope of this work. We selected the methods implemented later according to the state of the art [39].

⁶ For this example the dimensions are D = 3, M = 4 and K = 2.


4.6.3 Practical Aspects of the Neural Network Prediction Method

The NN was implemented with the well known deep learning framework Keras in Python [41]. It offers an easy to use, flexible and efficient implementation of the most common ML methods. We work with a standard configuration of a two layer neural network with 256 neurons each. The optimal number of neurons is determined by an optimization using the Python package Hyperopt [42]. For the activation function we choose relu for the two hidden layers, because it is a very successful and widely used activation function [43]. It is defined as the element-wise maximum of x and zero

$$\operatorname{relu}(x) = \max(x, 0). \tag{60}$$

During training we apply dropout in each layer to prevent overfitting [40]. The number of nodes in the output layer is proportional to the number of states and prediction steps. For optimization we choose the algorithm Adam, with the MSE as a loss function, because it is a computationally efficient method for problems with many parameters or large datasets [44].

5 Performance Analysis

For the Wi-Fi data from our test scenario we cannot know the ground truth. Therefore we first investigate several different theoretical data models to benchmark the prediction methods.

The scenarios we look at are setups where we assume to be able to measure the current number of users at each AP. However, because each measurement creates load on the system, we want to minimize the number of measurements that we have to take while still knowing how many users are at the APs at each point in time. To do so we first investigate theoretical data, which offers the possibility to test how sensitive the prediction performance is to noise on the parameters. Here we test the algorithms for different prediction horizons or prediction steps. A prediction horizon of p, or p prediction steps, implies that we know every pth state of the system, $\mathbf{x}_{1:p:k}$, and predict $\mathbf{x}_{k+1:k+p}$ for $k = 1, \dots, T - p$.

5.1 Theoretical Data

To test our prediction methods we set up two different theoretical examples. The first scenario generates data on an AP basis with a state space model, giving us the opportunity to investigate how an increasing prediction horizon and parameter errors influence the performance of the Kalman predictor. The second set of data consists of individual user sequences generated by an HMM. Its underlying statistical model reproduces the randomness that is also visible in our collected Wi-Fi data. It offers a first benchmark of the order-L Markov, HMM and Kalman predictors.

5.1.1 Data generated by State Space Model

We generate a time series with six states, where each state represents an AP. The movement diagram, including the possibility for users to enter and exit via state o, is displayed in fig. 16. Users may only enter the system in state 1 and leave the system in state 5.

[Figure 16 shows the states 1 to 6 and the enter/exit state o.]

Figure 16: State diagram for the theoretical state space model example.

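We model this system with the state space equations

\begin{align}
\mathbf{x}_{k+1} &= \mathbf{A}\mathbf{x}_k + \mathbf{B}(u_k)\, u_k \tag{61a} \\
\mathbf{y}_k &= \mathbf{C}\mathbf{x}_k + \mathbf{D} u_k. \tag{61b}
\end{align}

In this model we are free to define any transition matrix. The transition matrix we choose is

$$\mathbf{A} = \begin{pmatrix}
0.5 & 0 & 0 & 0 & 0 & 0 \\
0.5 & 0.25 & 0.5 & 0.5 & 0 & 0 \\
0 & 0.25 & 0.5 & 0 & 0 & 0 \\
0 & 0.25 & 0 & 0.5 & 0 & 0 \\
0 & 0.25 & 0 & 0 & 0.5 & 0.5 \\
0 & 0 & 0 & 0 & 0.5 & 0.5
\end{pmatrix}. \tag{62}$$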

Apart from the defined movement between the states, according to the definition of A in eq. (62), it is also possible that users remain in a state for several timesteps, because the entries in the main diagonal are not zero. The diagram in fig. 16 implies that B depends on the sign of $u_k$, because users can only leave and enter the system in certain states. This results in

$$\mathbf{B} = \begin{cases}
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}^T, & u_k > 0 \\
\begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}^T, & u_k \le 0.
\end{cases} \tag{63}$$


We select the matrices C, H, G, Q, R to be identity matrices, set D to zero and therefore reduce the model from eqs. (61a) and (61b) to

\begin{align}
\mathbf{x}_{k+1} &= \mathbf{A}\mathbf{x}_k + \mathbf{B}(u_k)\, u_k \tag{64a} \\
\mathbf{y}_k &= \mathbf{x}_k, \tag{64b}
\end{align}

and set the initial state as $\mathbf{x}_0 = 0$.

As an input function we choose

$$u_k = \sin\Big(\frac{k - 60}{100}\Big) + \sin\Big(\frac{k}{500}\Big) \tag{65}$$

to simulate a rise and fall of users arriving over a day. The resulting number of users over timesteps can be seen in fig. 17. In the plot it is visible that occasionally the number of users is below zero. This type of model does not guarantee a non-negative result, which makes adaptations to prediction methods that are based on a state space model necessary. Our chosen modifications are outlined in section 4.5.1.
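The data generation described by eqs. (62) to (65) can be sketched in a few lines. The sketch is an illustration and not the code used to produce fig. 17: the start index k = 0 and the run length of 3500 steps are assumptions, so the resulting curves need not match the figure exactly.

```python
import math

# Transition matrix of eq. (62); every column sums to one.
A = [[0.5, 0.0,  0.0, 0.0, 0.0, 0.0],
     [0.5, 0.25, 0.5, 0.5, 0.0, 0.0],
     [0.0, 0.25, 0.5, 0.0, 0.0, 0.0],
     [0.0, 0.25, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0, 0.0, 0.5, 0.5],
     [0.0, 0.0,  0.0, 0.0, 0.5, 0.5]]


def u_input(k):
    """Input signal of eq. (65)."""
    return math.sin((k - 60) / 100) + math.sin(k / 500)


def generate(n_steps):
    """Iterate eq. (64a) from x_0 = 0 and return the list of all states."""
    x = [0.0] * 6
    states = [x]
    for k in range(n_steps):
        u = u_input(k)
        entry = 0 if u > 0 else 4      # eq. (63): state 1 or state 5
        x = [sum(A[r][c] * x[c] for c in range(6)) for r in range(6)]
        x[entry] += u
        states.append(x)
    return states


states = generate(3500)
```

With these assumptions the input is negative at k = 0, so the fifth state is below zero after the first step. This reproduces the effect mentioned above: the linear model itself does not prevent negative user counts.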

[Figure 17: line plot of Nr. of users (y-axis) over Timesteps 0 to 3500 (x-axis), with one curve each for Router 1 to Router 6.]

Figure 17: Number of users at each AP over time generated by the state space model example described in section 5.1.1.


Kalman Filter Prediction

We employ the data generated by the state space model to analyse the stability of the Kalman filter prediction in terms of parameter noise and prediction horizon. Due to our precise knowledge of the parameters, the prediction is perfect no matter how long the prediction horizon gets. However, if we introduce noise to the parameter A, as in fig. 18⁷, the MSE increases with larger prediction horizons. Noisy parameters occur due to measurement errors or wrong models. Figure 18 also shows that the minimal MSE grows more slowly than the maximal MSE.

[Figure 18: boxplot of the MSE (y-axis, scaled by 10^−4) over the Prediction Horizon 1 to 40 (x-axis).]

Figure 18: Variation of MSE for increasing prediction horizon and additive uniform noise on A with variance 10^−2.

Another factor that affects the performance of real world prediction is the sensitivity of the predictor regarding parameter estimation. For our theoretical example we simulated noise on either parameter A or B with a variance between 0.01 and 1 for different prediction horizons. As seen in fig. 19, noise on different parameters influences the performance of the prediction in different ways. For both A and B, noise has the least influence for the lowest prediction horizon. There is almost no impact on the MSE for noise on B with a prediction horizon of one, while an error in A increases the MSE significantly.

⁷ The boxplot of 100 realizations shows the median MSE with the red line. The lower limit of the blue box is defined by the median of all values below the overall median and the upper limit is defined by the median of all values above the median MSE. The whiskers are at most 1.5 times the interquartile range long and therefore represent ±2.7σ or 99.3% of the data if the values were normally distributed. The red crosses display outliers outside of that whisker length.


[Figure 19: MSE (y-axis, scaled by 10^−3) over Noise variance 0 to 1 (x-axis). The legend "Parameter with Noise | Prediction Horizon" lists the curves B | 1, B | 20, B | 40, A | 1, A | 20 and A | 40.]

Figure 19: Results for MSE over noise variance for different prediction horizons and noisy parameters A and B for the state space model example.

The steepness of the increase in MSE also differs between the two parameters. For A the error increases more steeply in the beginning, while for B the increase in MSE starts slowly but then follows an almost exponential growth.

In conclusion, the Kalman filter prediction can be very sensitive to errors in the chosen parameters. The results also show that more resources should be put into determining A than into estimating B.

5.1.2 Data generated by Hidden Markov Model (HMM)

For an initial benchmark of the order-L Markov, HMM and Kalman predictor, we generate individual user sequences with an HMM. The generated data, as seen in fig. 20, shows similar fast fluctuations as the Wi-Fi data in fig. 7, giving us insight into the behaviour of the prediction methods when dealing with potential randomness in user movement. The HMM example has six states and the transition matrix

$$\mathbf{A} = \begin{pmatrix}
0.882 & 0.029 & 0 & 0 & 0 & 0 \\
0.027 & 0.838 & 0.027 & 0.027 & 0 & 0 \\
0 & 0.029 & 0.886 & 0 & 0 & 0 \\
0 & 0.029 & 0 & 0.857 & 0.029 & 0 \\
0 & 0.116 & 0 & 0 & 0.698 & 0.116 \\
0 & 0 & 0 & 0 & 0.029 & 0.882
\end{pmatrix} \tag{66}$$

and the emission matrix

$$\mathbf{B} = \begin{pmatrix}
0.75 & 0.25 & 0 & 0 & 0 & 0 \\
0.2 & 0.6 & 0.2 & 0 & 0 & 0 \\
0 & 0.2 & 0.6 & 0.2 & 0 & 0 \\
0 & 0 & 0.2 & 0.6 & 0.2 & 0 \\
0 & 0 & 0 & 0.2 & 0.6 & 0.2 \\
0 & 0 & 0 & 0 & 0.25 & 0.75
\end{pmatrix}. \tag{67}$$

[Figure 20: stacked plot of Nr. of users (y-axis) over Timesteps 0 to 500 (x-axis) for Router 1 to Router 6.]

Figure 20: Number of users over time stacked for each AP for the theoretical HMM data.

Because in a real world scenario not all users would have the same sequence length⁸, we draw the trace length for each user from a uniform distribution between 0 and 500. We then choose a start timestep so that the sequence ends at timestep 500 at the latest.

Prediction with order-L Markov predictor without parameter noise

The order-L Markov predictor does not need any pre-trained parameters for its prediction. Therefore there is no concern with noise in the parameters. However, if the assumptions presented in section 4.1 are not fulfilled, there is no way to tailor variables to the problem. The approach also has a convergence phase at the beginning of the prediction, where the algorithm has not yet seen enough data to make a prediction.

⁸ Sequence length is defined as the length of the user's movement history.

[Figure 21: MSE (y-axis) over Prediction Horizon 0 to 20 (x-axis) for the Markov predictors of Order 1 to Order 6.]

Figure 21: Results for Markov predictors with different order L for the HMM example data.

For the Markov predictor the order L, representing the length of context that influences the next step, could be important in terms of prediction accuracy. For the HMM example we compare different orders of Markov predictors in fig. 21. For one or two prediction steps the order-2 predictor has a lower MSE, however for higher prediction horizons the order-3 predictor shows better results. Interestingly, the order-1 Markov predictor has the worst performance for higher prediction horizons, together with the order-6 predictor. This shows that increasing the context length increases the MSE if the additionally considered data is not relevant to the prediction. Consequently, the right order of the Markov assumption needs to be determined in order for this approach to work well.

Prediction with HMM based predictor without parameter noise

With a certain probability a user either moves to one AP or the other, therefore the prediction of the theoretical HMM data with maximum likelihood is not perfect, even for just one prediction step.

For a higher prediction horizon this error propagates and increases each time an error has been made in the previous prediction. This behaviour is displayed in fig. 22a, where, without any noise added, the MSE increases with larger prediction horizons.

Prediction with HMM based predictor with parameter noise

Since a larger prediction horizon leads to an MSE increase even without parameter noise, we now want to answer the question of what happens if the parameters are not known exactly. To do so, we look at the HMM example data again and add uniformly distributed noise with varying variance to the transition matrix A.

[Figure 22: two panels showing the MSE (y-axis) over Prediction Horizon 0 to 20 (x-axis). (a) without noise. (b) uniform noise added with different variances to transition matrix A; the legend lists the noise variances 0.00, 0.05, 0.10, 0.15, 0.20 and 0.25.]

Figure 22: MSE results for HMM theoretical example data applying the HMM based prediction.

The results in fig. 22b confirm that the MSE increases with higher noise variance. The plot also shows that, for all noise variances, the MSE for a prediction horizon lower than 5 does not increase significantly when compared to the rapid MSE change for more than 5 prediction steps. This suggests that even with high parameter noise up to 5 prediction steps are possible without a marked decrease of prediction performance. Based on this observation we assume that there is a limit for how many prediction steps are possible with the HMM method before the MSE increases abruptly.


Prediction with Kalman filter

Compared to the prediction with an HMM based method, the Kalman filter estimate stays more constant over the prediction horizon. In fig. 22b the MSE increases rapidly after a certain prediction horizon. In contrast, and also compared to the MSE increase in fig. 19, the error in fig. 23 does not significantly increase, even for higher prediction horizons. This suggests that even though we found in section 5.1.1 that noise on A creates a steep increase of the MSE, the error does not seem to propagate as fast as for the HMM prediction if the data is generated by an HMM.

Figure 23: Results for the HMM theoretical data applying the Kalman filter predictor. The plot shows the MSE over the prediction horizon (0 to 20) with one curve per noise variance between 0.00 and 0.25.

Comparison of prediction methods for data generated by HMM

To compare the different prediction methods we look at the MSE of all three methods considering increasing prediction horizon and noise variance. The results in fig. 24 show that HMM and Kalman filter display a very similar behaviour for no noise over increasing prediction horizons. However, as soon as we introduce parameter noise, the Kalman filter prediction generally outperforms the HMM estimate. We also compare the results to the order-3 Markov prediction, because it showed the best results in our previous analysis in fig. 21. For prediction lengths over 10, the Markov based predictor delivers better results than the HMM method, but the Kalman based approach always results in a lower MSE. This leads us to the conclusion that the Kalman filter approach is very promising in providing a low MSE prediction, especially for a long prediction horizon. It delivers relatively constant results for increasing prediction horizon and is stable when parameter noise is added.

Figure 24: Comparison of prediction performance for different prediction methods and parameter noise variances for the HMM theoretical example data. The plot shows the MSE over the prediction horizon (0 to 20) for the HMM and Kalman predictors at noise variances 0.00, 0.10 and 0.25, and for the order-3 Markov predictor.

Table 5: MSE for different prediction methods for prediction horizon 1, 5 and 20. The results for Kalman and HMM based predictions are displayed for noise variance 0.1.

                         Order-3 Markov    Kalman          HMM
Prediction Horizon 1     2.14 · 10^-2      1.08 · 10^-2    1.48 · 10^-2
Prediction Horizon 5     3.75 · 10^-2      1.77 · 10^-2    1.93 · 10^-2
Prediction Horizon 20    4.75 · 10^-2      2.23 · 10^-2    14.20 · 10^-2

As summarized in table 5, the MSE for a one step Kalman filter prediction increases by 2.18 · 10^-3 and for a five step prediction by 1.50 · 10^-2. Figure 25 depicts the sum of users of AP 1 and the prediction results for a one step prediction from all three methods (examples of other APs can be seen in appendix A.5). The original data has many fast changes that are not represented by the Kalman estimate; however, the smoothed curve that the Kalman filter approach delivers seems to be the best estimate overall. This property might be a disadvantage if the number of users fluctuates and the increase or decrease needs to be predicted.

Figure 25: Results for a one step prediction for HMM, Order-3 Markov and Kalman filter prediction. The plot shows the number of users at AP Nr. 1 over timesteps 0 to 500 for the original data and the three predictors.

5.1.3 Data generated by Agent based Simulation

The data generated by the Anylogic simulation offers the possibility to benchmark the Kalman filter approach with regard to fast user fluctuations, which are missing in the previous HMM model data. The number of users fluctuates very abruptly and the prediction needs to follow with a small error. We will first describe the Anylogic simulation and how the parameters for the Kalman predictor were chosen. We then present results for the prediction performance for different prediction horizons.

During the simulation Anylogic records how many people are in the defined router areas in every simulated timestep. Because the Anylogic simulation does not provide traces for single users, we first need to set the parameters for the predictor. We initialize the parameters according to the floor plan, resulting in the values for A and B as stated in eq. (68) and eq. (69) in appendix A.3. We then call the Matlab function fmincon iteratively, accepting a new parameter set only if it delivers better results than the previous one, in order to find the optimized parameters. We repeat this until the absolute difference between the old and the new MSE is smaller than the threshold 10^-5. The resulting matrices from this optimization are stated in eqs. (70) to (72) in appendix A.3.

Figure 26: Plot of normalized data from the Anylogic simulation and its Kalman prediction for 5 prediction steps. (a) AP Nr. 1. (b) AP Nr. 2. Both panels show the estimate and the data over timesteps 0 to 700.
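The stopping rule of this iterative search can be sketched as follows. The sketch is written in Python and does not call fmincon; the function `local_optimize` is a hypothetical stand-in for one call of a local optimizer, and the toy objective is an assumption chosen only to make the loop runnable.

```python
def refine(theta0, mse, local_optimize, tol=1e-5, max_rounds=100):
    """Repeat a local optimization until the MSE changes by less than tol."""
    best, best_mse = theta0, mse(theta0)
    for _ in range(max_rounds):
        candidate = local_optimize(best)
        candidate_mse = mse(candidate)
        change = abs(best_mse - candidate_mse)
        # Keep the candidate only if it improves on the best value so far.
        if candidate_mse < best_mse:
            best, best_mse = candidate, candidate_mse
        if change < tol:
            break
    return best, best_mse


def toy_mse(theta):
    """Hypothetical objective with its minimum at theta = 2."""
    return (theta - 2.0) ** 2 + 0.01


def toy_optimizer(theta):
    """Hypothetical stand-in for one optimizer call: halves the distance to 2."""
    return theta - 0.5 * (theta - 2.0)


theta_opt, mse_opt = refine(0.0, toy_mse, toy_optimizer)
print(theta_opt, mse_opt)
```

The cap `max_rounds` is an added safeguard for the case that the optimizer keeps returning a candidate that is not better than the current best.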

For five prediction steps we get an MSE of 2.203 · 10^-3 and the results for two APs are displayed in fig. 26. The Kalman filter does not deal ideally with sudden steep rises. As seen in fig. 26a, the estimate sometimes cuts off peaks in the data. This appears to happen especially if the peaks are small compared to the data at other APs. At AP Nr. 2 the total number of users is higher and the predictor overestimates the values of the peaks. This difference in dynamics is influenced by the parameters of the predictor. When choosing the parameters there is a trade-off between dynamically reacting to fast fluctuations and also being able to predict when the number of users at an AP is not changing. This shows that the Kalman method can react to sudden changes in the number of users, but if the parameters are not set in an optimal way it will over or underestimate the fluctuations. This points to a practical problem in choosing the parameters.
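The trade-off can be illustrated with a strongly simplified sketch: a scalar Kalman filter with a random-walk state model instead of the matrices A and B used in this work. The process noise `q`, the measurement noise `r` and both test signals are hypothetical values chosen for illustration.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model.

    Returns the one step ahead prediction made before each measurement.
    """
    x, p = x0, p0
    predictions = []
    for z in measurements:
        # Predict: the state is kept, the uncertainty grows by q.
        p = p + q
        predictions.append(x)
        # Update with the new measurement z.
        gain = p / (p + r)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
    return predictions


def mean_sq_error(predictions, truth):
    return sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth)


# Hypothetical signals: a sudden step and a constant level with alternating noise.
step_signal = [0.0] * 20 + [1.0] * 20
noisy_flat = [0.5 + 0.1 * (-1) ** t for t in range(40)]

reactive = dict(q=1e-1, r=1e-2)   # trusts new measurements, follows fast changes
smooth = dict(q=1e-4, r=1e-2)     # trusts the model, averages over measurements

for name, params in (("reactive", reactive), ("smooth", smooth)):
    step_err = mean_sq_error(kalman_1d(step_signal, **params)[20:], step_signal[20:])
    flat_err = mean_sq_error(kalman_1d(noisy_flat, **params)[20:], [0.5] * 20)
    print(f"{name}: error after step = {step_err:.4f}, error on flat part = {flat_err:.6f}")
```

In this toy setting the reactive parameter set follows the step more closely, while the smooth parameter set gives the lower error when the underlying level does not change.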


The Anylogic example also gives us the possibility to investigate how the MSE for the Kalman filter approach changes with increasing prediction horizon. As fig. 27 shows, this increase in MSE has a linear trend.

Figure 27: Development of MSE with the Kalman filter prediction for increasing prediction horizon applied on the Anylogic data. The plot shows the MSE (in units of 10^-3) over the prediction horizon (0 to 40).

5.2 Wi-Fi Data

For the performance comparison with our collected Wi-Fi data we define one week of data for training and one week of test data to measure the mean prediction error. We will first determine the best order for the Markov predictor and then benchmark it against the HMM, Kalman and ML prediction methods. We also determined results for a predictor that always repeats the last step for the length of the prediction horizon, to analyse whether the other predictors can outperform a trivial predictor. We call this predictor use last step (ULS).
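A minimal sketch of such a ULS baseline and its MSE for a given prediction horizon is shown below. The function names and the example series are hypothetical; the series stands for the normalized number of users at one AP.

```python
def uls_predict(series, horizon):
    """Predict series[t + horizon] by repeating the last known value series[t]."""
    return [series[t] for t in range(len(series) - horizon)]


def mse(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def uls_mse(series, horizon):
    """MSE of the ULS baseline for the given prediction horizon."""
    predicted = uls_predict(series, horizon)
    actual = series[horizon:]
    return mse(predicted, actual)


# Hypothetical normalized number of users at one AP.
counts = [0.10, 0.10, 0.12, 0.20, 0.30, 0.30, 0.28, 0.15]
for h in (1, 2, 5):
    print(f"horizon {h}: ULS MSE = {uls_mse(counts, h):.4f}")
```

Any other predictor can be compared against this baseline by evaluating both on the same test series and horizon.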

Training parameters of different models with Wi-Fi data

We calculate the transition and input matrix for the Kalman filter method and define the input function by fitting a spline curve to the summed data of the training week. We also estimate the transition and emission matrix with the training data for our HMM prediction method. The order-L prediction method does not need any training because it learns all parameters online. Utilizing training data to help initialize the method did not improve the result.
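One simple way to obtain a transition matrix from per-user AP sequences is to count observed AP-to-AP transitions and normalize each row. The sketch below shows this counting estimate; it is an assumption for illustration and not necessarily the training procedure used here (for hidden states, a HMM is usually trained with the Baum-Welch algorithm). The trajectories, the AP indices and the handling of unseen APs are hypothetical.

```python
def estimate_transition_matrix(trajectories, n_aps):
    """Estimate a row-stochastic transition matrix from AP index sequences."""
    counts = [[0] * n_aps for _ in range(n_aps)]
    for trajectory in trajectories:
        # Count every observed transition from one timestep to the next.
        for src, dst in zip(trajectory, trajectory[1:]):
            counts[src][dst] += 1
    A = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: for an AP without observed transitions, users stay.
            A.append([1.0 if j == i else 0.0 for j in range(n_aps)])
        else:
            A.append([c / total for c in row])
    return A


# Hypothetical trajectories of two users over three APs (indices 0, 1, 2).
trajectories = [[0, 0, 1], [1, 1, 0, 0]]
A_est = estimate_transition_matrix(trajectories, n_aps=3)
for row in A_est:
    print(row)
```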

5.2.1 Order-L Markov Predictor

Because we have seen in section 5.1.2 that the order of the Markov predictor can play a significant role in the performance of the method, we first test several variations of L in order to determine which delivers the best results in terms of MSE. Figure 28 shows the results for the Markov predictor for the Wi-Fi data with different context lengths. For one prediction step the results for all orders are very close and for higher prediction steps the lower order predictors outperform the higher order methods. Because order-1 delivers slightly better results for prediction horizons 2, 3 and 5 and is the least complex, we choose order-1 for later evaluations.
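The role of the context length L can be made concrete with a minimal sketch of an order-L Markov predictor that learns online. The predictor returns the most frequent successor of the last L visited APs; the fallback to the current AP for unseen contexts, the class name and the example sequence are assumptions for illustration.

```python
from collections import Counter, defaultdict


class OrderLMarkov:
    """Online order-L Markov predictor for the next AP of a single user."""

    def __init__(self, order):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        self.counts = defaultdict(Counter)  # context tuple -> next-AP counts
        self.history = []

    def update(self, ap):
        """Record a newly observed AP and learn from the preceding context."""
        if len(self.history) >= self.order:
            context = tuple(self.history[-self.order:])
            self.counts[context][ap] += 1
        self.history.append(ap)

    def predict(self):
        """Return the most frequent successor of the current context."""
        if len(self.history) >= self.order:
            context = tuple(self.history[-self.order:])
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        # Assumption: for an unseen context the user is predicted to stay.
        return self.history[-1] if self.history else None


# Hypothetical AP sequence of one user.
predictor = OrderLMarkov(order=2)
for ap in ["A", "B", "C", "A", "B", "C", "A", "B"]:
    predictor.update(ap)
print(predictor.predict())
```

A higher order needs more observations before a context has been seen, which is one possible reason why lower orders can perform better on limited data.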

Figure 28: MSE results for different order Markov predictors applied on the measured Wi-Fi data. The plot shows the MSE (in units of 10^-3) over the prediction horizon (1 to 5) for orders 1 to 5.


5.2.2 Benchmarking of different Prediction Methods

We now benchmark the results for the prediction of the number of users per AP for the order-1 Markov, Kalman, HMM, ML and ULS predictor.

In a first step we analyse the MSE for different prediction horizons. Figure 29 shows that for a one step prediction the order-1 Markov, ULS and HMM method result in an almost identical MSE, which is lower than the Kalman filter result. For longer prediction horizons the HMM and Kalman approach outperform the order-1 and ULS method. This confirms that most users in the Wi-Fi network stay in one location for at least a certain period of time, making the last location a very good predictor for one step. Only for longer prediction horizons can we see the benefit of working with a model predicting movement.

Figure 29: MSE for HMM, Kalman, order-1 and ULS prediction for increasing prediction horizon using the Wi-Fi data from Campus Gußhaus. The plot shows the MSE (in units of 10^-3) over the prediction horizon (1 to 10).

A difference between the trends of order-1 Markov, HMM, Kalman, ML and ULS can be examined in fig. 30b. The ML, order-1 Markov, ULS and HMM curves follow the data almost perfectly. The Kalman filter estimate seems to follow the data a step behind and sometimes resembles a moving average result. Figure 31 shows the results for the same dataset, but for a five step prediction, which equals a time of 2.5 minutes. Figure 31a reveals that for 5 steps the order-1 Markov and ULS method do not predict the trend of the data any more and the performance drops drastically. In figure 31b we can see that the ML, Kalman and HMM approach follow the general trend of the data, but deliver worse results than for the one step prediction.

Table 6: Comparison of MSE for 1 and 5 prediction steps for the recorded Wi-Fi data.

                  Prediction Horizon 1    Prediction Horizon 5
ULS               0.64 · 10^-4            26.92 · 10^-4
Order-1 Markov    0.67 · 10^-4            19.73 · 10^-4
HMM               0.76 · 10^-4            1.95 · 10^-4
Kalman            4.35 · 10^-4            4.81 · 10^-4
ML                0.19 · 10^-4            1.1 · 10^-4

Finally, we summarize our results in table 6 for prediction horizon one and five. The ML approach results in the lowest MSE for both horizons. The ULS method delivers the second best result for prediction horizon one; however, the MSE increases significantly for the 5 step prediction. The order-1 predictor displays a similar behaviour. The Kalman filter approach results in the least MSE increase when comparing results from one and 5 step predictions, while the HMM approach delivers the second best MSE for the 5 step prediction, being only 1.7 times higher than the result for ML.
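The comparisons in this summary can be reproduced directly from the values in table 6. The short sketch below computes, for each method, the MSE at prediction horizon 5 relative to ML and the growth of the MSE from horizon 1 to horizon 5; the variable names are chosen for illustration.

```python
# MSE values from table 6 in units of 1e-4: (horizon 1, horizon 5).
mse_table6 = {
    "ULS": (0.64, 26.92),
    "Order-1 Markov": (0.67, 19.73),
    "HMM": (0.76, 1.95),
    "Kalman": (4.35, 4.81),
    "ML": (0.19, 1.1),
}

ml_h1, ml_h5 = mse_table6["ML"]

# MSE at horizon 5 relative to the ML result at horizon 5.
ratio_to_ml_h5 = {m: h5 / ml_h5 for m, (h1, h5) in mse_table6.items()}

# Growth of the MSE between one and five prediction steps.
growth = {m: h5 / h1 for m, (h1, h5) in mse_table6.items()}

for method in mse_table6:
    print(f"{method}: {ratio_to_ml_h5[method]:.2f} x ML at horizon 5, "
          f"growth factor {growth[method]:.2f} from horizon 1 to 5")
```

The smallest growth factor belongs to the Kalman filter, which matches the observation that its MSE increases least between one and five prediction steps.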

Figure 30: Normalized number of users over time for AP-CF02-7 for a one step prediction for the Wi-Fi data from TU Wien for the 26th of November 2018. Panels (a) and (b).


Figure 31: Normalized number of users over time for AP-CF02-7 for a five step prediction for the Wi-Fi data from TU Wien for the 26th of November 2018. (a) Number of users from 07:41 to 11:29 for the data and the order-1 Markov, HMM, ULS, Kalman and ML predictions. (b) Number of users from 08:19 to 08:42 for the data and the order-1 Markov, HMM, Kalman and ML predictions.


6 Discussion

Our objective was to benchmark different prediction techniques estimating how many users will be where in future timesteps. This could be useful for future networks that employ a small cell network architecture with spatially distributed antenna and processing units. We analyse the prediction methods for different prediction horizons, evaluating the limits in terms of predictable timespan. This offers us the possibility to adopt the number of prediction steps within a margin of error as a feasibility design parameter in future applications. We select MSE on an AP level as performance metric, because it enables us to benchmark individual user and AP based methods in terms of crowd prediction accuracy.
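How predictions for individual users can be scored with an AP level metric is sketched below. The predicted user locations are aggregated into a share of users per AP before the MSE is computed; the normalization by the number of users, the AP names and the example locations are assumptions for illustration.

```python
from collections import Counter


def users_per_ap(user_locations, ap_ids):
    """Aggregate individual user locations into the share of users per AP."""
    counts = Counter(user_locations)
    total = len(user_locations)
    return [counts[ap] / total for ap in ap_ids]


def ap_level_mse(predicted_locations, actual_locations, ap_ids):
    """MSE between predicted and actual user shares, averaged over all APs."""
    predicted = users_per_ap(predicted_locations, ap_ids)
    actual = users_per_ap(actual_locations, ap_ids)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(ap_ids)


ap_ids = ["AP1", "AP2", "AP3"]
actual = ["AP1", "AP1", "AP2", "AP3"]    # true location of four users
swapped = ["AP2", "AP1", "AP1", "AP3"]   # two users predicted at each other's AP
shifted = ["AP1", "AP2", "AP2", "AP3"]   # one user predicted at the wrong AP

print(ap_level_mse(swapped, actual, ap_ids))
print(ap_level_mse(shifted, actual, ap_ids))
```

The swapped prediction is wrong for two individual users but yields an AP level error of zero, which shows why this metric measures crowd prediction accuracy rather than individual accuracy.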

Benchmark results for theoretical data

A first review of prediction performance with data generated by a state space model allows us to understand how a parameter estimation error increases the MSE for the Kalman filter predictor. Figure 18 shows that even though it might be obvious that the error increases with more prediction steps, it is important to notice that parameter errors of the same magnitude can have a varied effect by prediction horizon 40. While the variation of MSE for one prediction step is small, by prediction horizon 40 the MSE can vary between 10 and almost 16 times the MSE from one step. Each parameter is analysed separately, as it affects the result differently. This gives us insight into where we may spend effort during parameter training and helps us understand real world performance issues. For the particular Kalman method, the result is more sensitive to an error in parameter A than to an error in parameter B.

Data generated from a more complex statistical model, based on a HMM, offers the opportunity to model random fluctuations that we encounter in the Wi-Fi trace collected from the WLAN network at TU Wien. The HMM data still cannot reproduce the temporal evolution found in the Wi-Fi data, but shows how the prediction methods perform when applied on data with an underlying random model. For the created test vectors from the HMM, the Kalman filter predictor outperforms the HMM and order-L Markov based methods. The investigation of different orders of Markov predictors identifies that for this HMM data the order-3 predictor results in the lowest MSE. The HMM predictor is less robust against parameter noise and the propagation of error with increasing prediction horizons than the order-L Markov and the Kalman method. Without parameter noise the HMM predictor delivers only slightly higher MSE values than the Kalman predictor for prediction steps 1 to 20, outperforming the order-3 Markov predictor. With added parameter noise, the MSE for the HMM predictor is higher than the MSE of the order-3 Markov predictor for a prediction horizon over 8.

To model the time of day effects of a network we generate agent based simulation data with the software Anylogic. It simulates a scaled down scenario with fewer lecture rooms, no Wi-Fi noise as described in section 3.2 and only users that walk from pre-defined points of entry to designated lecture rooms.

This simulation enables us to examine how the Kalman method performs on data that is created with a more realistic movement pattern than the HMM data. Due to the linear nature of the Kalman predictor it over or underestimates fast fluctuations. This points to a trade-off in selecting the parameters of the Kalman predictor in terms of dynamic or static MSE performance.

Benchmark results for Wi-Fi data

As a last step we compare the prediction methods with the Wi-Fi data that we collected at TU Wien. This scenario is closely linked to a small cell deployment as introduced in section 1. Here we also cross evaluate the introduced methods with a purely data driven prediction, a neural network based machine learning predictor, and a trivial predictor that always uses the last location as the predicted location, which we named ULS. We find that the machine learning approach delivers the best results with an MSE of 0.19 · 10^-4 for a one step prediction and 1.1 · 10^-4 for a five step prediction (table 6). The second best predictors for a one step prediction, with almost identical results, are the order-1 Markov, ULS and HMM predictor with an approximately 3.5 times higher MSE. For the five step prediction the MSE for the order-1 predictor increases to 19.73 · 10^-4, almost 18 times higher than the result for the ML method for the same number of prediction steps. The ULS predictor has an around 25 times higher MSE than the ML approach for 5 prediction steps, while the Kalman filter results in a 4 times and the HMM predictor in a 1.7 times higher MSE than ML.

The HMM predictor has the second best performance. However, due to the adaptations to the algorithm required to solve the numerical issues during training and prediction, this method also takes the longest time for each prediction, making it unsuitable for larger scale applications unless a more efficient implementation is found. This shows that the model based methods considered can capture some dynamics of the system; however, the machine learning approach delivered the best results overall.

Implementation considerations

Prediction accuracy alone is not sufficient to determine the best prediction method for a use case in a mobile network. We need to consider the time required for a prediction, computational complexity and storage scalability too.

Individual user based methods, like HMM and order-L predictors, have the disadvantage that they need trajectories for each user during training and prediction. The network is required to store these trajectories, while AP based methods, like the Kalman and ML predictor, rely on the number of active users per AP as an input for the prediction, reducing the memory and signalling resources needed to acquire the data. The AP methods, Kalman filter and ML, are also faster than the user based methods, because they predict the total number of users for each AP in one iteration.

The effort required for each prediction is another important factor. It depends on the number of APs and the number of users in the network.

The most time consuming step for the order-L Markov predictor is searching through all previous contexts for each user. For the HMM prediction several matrix multiplications with the dimension of the number of APs N have to be calculated for each user. Each prediction with a Kalman predictor takes an inversion of an N-dimensional matrix and some matrix multiplications with the same dimensions. The complexity of each prediction for the neural network scales with the number of layers and, depending on the activation function, is proportional to the number of neurons in each layer.

Assuming the number of users is significantly higher than the number of APs, this makes the Kalman and ML predictor better at scaling with an increasing network size.

The order-L Markov predictor is the only method not requiring any data for training; however, this compromises the prediction accuracy in the initial phase. Especially in a network with a high number of APs this initialization phase gets long, because the amount of data the order-L Markov predictor requires until every context was part of the training data grows with the number of APs and the context length.

The HMM and Kalman predictors both require a similar amount of individual user sequences for the training. In our experiments the training for the Kalman predictor is significantly faster. The ML approach needs only training data on AP level, but as it purely relies on finding the characteristic patterns from the data, it might require more data than a model based approach, like the Kalman predictor.

In our scenario a purely data driven ML prediction has the lowest MSE. Contrary to the ML method, adding or removing an AP does not make it necessary to completely retrain the parameters of the Kalman predictor. Depending on the application, for longer prediction horizons, the Kalman filter may offer a low complexity solution to the prediction problem, where the parameters characterise known parts of the system. Here, further research into time dependent variations or online updating of parameters may offer further insight into performance for different applications. Depending on the data, the HMM predictor offers a great performance for systems with a low number of users. As the comparison with ULS shows, more complex models only deliver better results for longer prediction times. In future work, adaptations of the order-L Markov predictor for more than one step could be investigated.

7 Conclusion

This research aimed to benchmark different location prediction methods for potential application in small cell environments and investigate the limits of crowd mobility prediction.

The order-L Markov, HMM, Kalman and ML predictor all have their theoretical advantages and we illustrate that each method performs very differently depending on the data that is used. Generally the simple Markov chain based prediction yields impressive results for its simplicity. However, this method proved to perform no better than a trivial estimator, resulting in the same performance as a predictor just repeating the previous step. This raises the question of the predictability of the data itself. If users mostly stay in one location, every trained predictor will get biased towards this location. This also might be a considerable difference between indoor and outdoor data, which needs to be further analysed.

Based on practical Wi-Fi data we conclude that the purely data driven ML method results in the lowest MSE overall. The HMM based method delivered the second lowest MSE; however, with its numerical instability during training and estimation, a more efficient way for the prediction is necessary to make it feasible for real world applications.

Generally, techniques based on users aggregated per AP offer a more memory efficient and faster way to predict the number of users at each AP in future time steps than individual user based methods. Model based approaches, like the Kalman predictor, are easier to adapt to a new network architecture, as they do not require complete retraining of parameters if e.g. an AP is added or removed. The Kalman predictor therefore presents a low complexity predictor while being easily scalable with the number of users. Additionally, results from both the theoretical and practical analysis show that this predictor works well for increasing prediction horizon. Here future research could address time dependent parameters and efficient ways to train them to further improve results. In terms of training, the ML method requires more complex training techniques than the approach we applied for the Kalman filter, possibly making it less usable for increasing network sizes.

Generally the suggested model based approaches were apparently not able to fully capture the characteristic of the data, as they were outperformed by the model-less ML approach. This either calls for a more complex model, or shows that a general model like the one of a NN is the best at realizing the underlying complex movement pattern in a small cell, indoor environment.

We also conclude that the type of performance measure is important. Accuracy metrics based on individual users suggested in the literature are not applicable for crowd based applications. MSE offers a good indicator of how well the predictions match the data; it might however fail as a performance indicator if an application depends on under or overestimation being avoided.
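This limitation of the MSE can be shown with a small example: two predictions with opposite systematic errors have the same MSE. The user counts below are hypothetical, and the mean signed error is used here only as one possible complementary indicator.

```python
def mse(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def mean_signed_error(predicted, actual):
    """Positive values indicate overestimation, negative values underestimation."""
    return sum(p - a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical number of users at one AP over four timesteps.
actual = [10, 12, 15, 11]
under = [a - 2 for a in actual]   # always two users too few
over = [a + 2 for a in actual]    # always two users too many

print(mse(under, actual), mean_signed_error(under, actual))
print(mse(over, actual), mean_signed_error(over, actual))
```

Both predictions are indistinguishable in terms of MSE, although an application that must avoid underestimation would only accept the second one.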

Improvements could be made for all methods; however, we think that the crowd based methods are better suited to a network with small cells. Prediction methods for aggregated user data scale better, offer higher accuracy on a global scale and have the additional benefit of securing the privacy of the individual users, while making it unnecessary to remember trajectories after the training phase.

In the future, different ML techniques that incorporate more external knowledge like time of day, weekday/weekend or even weather for outside networks could be considered. As training is the most computationally complex part of an ML predictor, efficient training methods need to be found to handle large amounts of data from long sequences.

8 REFERENCES

8 References

[1] Cisco. Cisco visual networking index (VNI) global mobile data tra�c forecast

update, 2017-2022. Tech. rep. 2019. url: http://www.gsma.com/spectrum/

wp-content/uploads/2013/03/Cisco%7B%5C_%7DVNI-global-mobile-

data-traffic-forecast-update.pdf.

[2] Ericsson. Ericsson Mobility Report Q4 Update February 2019. Tech. rep.

2018. url: www.ericsson.com/mobility-report.

[3] NGMN Alliance. 5G White Paper. Tech. rep. 2015, pp. 1�125.

[4] 3rd Generation Partnership Project (3GPP). TS 123 501 - V15.3.0 - 5G;

System Architecture for the 5G System (3GPP TS 23.501 version 15.3.0 Re-

lease 15). 2018. url: https://portal.etsi.org/TB/ETSIDeliverableStatus.

aspx.

[5] Gianfranco Nencioni et al. �Orchestration and Control in Software-De�ned

5G Networks: Research Challenges�. In: Wireless Communications and Mo-

bile Computing 2018 (2018), pp. 1�18. issn: 1530-8669. doi: 10.1155/2018/

6923867.

[6] Chih-Lin I et al. �Recent Progress on C-RAN Centralization and Cloudi-

�cation�. In: IEEE Access 2 (2014), pp. 1030�1039. issn: 2169-3536. doi:

10.1109/ACCESS.2014.2351411. url: http://ieeexplore.ieee.org/

document/6882182/.

[7] Aleksandra Checko et al. �Cloud RAN for Mobile Networks�A Technol-

ogy Overview�. In: IEEE Communications Surveys & Tutorials 17.1 (2015),

pp. 405�426. issn: 1553-877X. doi: 10.1109/COMST.2014.2355255. url:

http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=

6897914.

[8] Christine Cheng, Ravi Jain, and Eric van der Berg. �Location Prediction

Algorithms for Mobile Wireless Systems�. In: Handbook of Wireless Internet.

CRC Press, 2003. Chap. 11, pp. 245�261. isbn: 0849315026.

74

http://www.gsma.com/spectrum/wp-content/uploads/2013/03/Cisco%7B%5C_%7DVNI-global-mobile-data-traffic-forecast-update.pdf



www.ericsson.com/mobility-report

https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx

https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx

https://doi.org/10.1155/2018/6923867

https://doi.org/10.1155/2018/6923867

https://doi.org/10.1109/ACCESS.2014.2351411

http://ieeexplore.ieee.org/document/6882182/


https://doi.org/10.1109/COMST.2014.2355255

http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6897914


8 REFERENCES

[9] James O Malley. Here's What TfL Learned From Tracking Your Phone On

the Tube. 2017. url: http://www.gizmodo.co.uk/2017/02/heres-what-

tfl-learned-from-tracking-your-phone-on-the-tube/.

[10] Marc-olivier Killijian, Sebastien Gambs, and Miguel Nunez del Prado Cortez.

�Show Me How You Move and I Will Tell You Who You Are�. In: Transac-

tions on Data Privacy 4 (2011), pp. 103�126.

[11] Dietmar Bauer et al. �Quasi-Dynamic Estimation of OD Flows From Tra�c

Counts Without Prior OD Matrix�. In: IEEE Transactions on Intelligent

Transportation Systems 19.6 (June 2018), pp. 2025�2034. issn: 1524-9050.

doi: 10.1109/TITS.2017.2741528. url: https://ieeexplore.ieee.org/

document/8032480/.

[12] Sharminda Bera and K. V.Krishna Rao. �Estimation of origin-destination

matrix from tra�c counts: The state of the art�. In: European Transport -

Trasporti Europei 49 (2011), pp. 3�23.

[13] Fabio Pinelli et al. �Data-Driven Transit Network Design From Mobile Phone

Trajectories�. In: IEEE Transactions on Intelligent Transportation Systems

17.6 (June 2016), pp. 1724�1733. issn: 1524-9050. doi: 10.1109/TITS.

2015.2496783. url: http://ieeexplore.ieee.org/document/7471487/.

[14] Anselmo Ramalho Pitombeira Neto, Francisco Moraes Oliveira Neto, and

Carlos Felipe Grangeiro Loureiro. �Statistical models for the estimation of

the origin-destination matrix from tra�c counts�. In: Transportes 25 (Dec.

2017), pp. 1�13. issn: 2237-1346. doi: 10.14295/transportes.v25i4.1344.

url: https://www.revistatransportes.org.br/anpet/article/view/

1344.

[15] Jacob Ziv and Abraham Lempel. �Compression of lndividual Sequences via

Variable-Rate Coding�. In: IEEE transactions on Information Theory 24.5

(1978), pp. 530�536.

[16] George Liu and Gerald Maguire. �A class of mobile motion prediction al-

gorithms for wireless mobile computing and communications�. In: Mobile

Networks and Applications 1.2 (June 1996), pp. 113�121. issn: 1383-469X.

75

http://www.gizmodo.co.uk/2017/02/heres-what-tfl-learned-from-tracking-your-phone-on-the-tube/

http://www.gizmodo.co.uk/2017/02/heres-what-tfl-learned-from-tracking-your-phone-on-the-tube/

https://doi.org/10.1109/TITS.2017.2741528

https://ieeexplore.ieee.org/document/8032480/

https://ieeexplore.ieee.org/document/8032480/




https://doi.org/10.14295/transportes.v25i4.1344

https://www.revistatransportes.org.br/anpet/article/view/1344

https://www.revistatransportes.org.br/anpet/article/view/1344

8 REFERENCES

doi: 10.1007/BF01193332. url: http://link.springer.com/10.1007/

BF01193332.

[17] J Chan, S Zhou, and A Seneviratne. �A QoS adaptive mobility predic-

tion scheme for wireless networks�. In: IEEE GLOBECOM 1998 (Cat. NO.

98CH36250). Vol. 3. IEEE, 1998, pp. 1414�1419. isbn: 0-7803-4984-9. doi:

10.1109/GLOCOM.1998.776573. url: http://ieeexplore.ieee.org/

document/776573/.

[18] Yihang Cheng, Yuanyuan Qiao, and Jie Yang. �An improved Markov method

for prediction of user mobility�. In: 2016 12th International Conference on

Network and Service Management (CNSM). IEEE, Oct. 2016, pp. 394�399.

isbn: 9783901882852. doi: 10 . 1109 / CNSM . 2016 . 7818454. url: http :

//ieeexplore.ieee.org/document/7818454/.

[19] Alicia Rodriguez-Carrion et al. �Study of LZ-Based Location Prediction and

Its Application to Transportation Recommender Systems�. In: Sensors 12.6

(June 2012), pp. 7496�7517. issn: 1424-8220. doi: 10.3390/s120607496.

url: http://www.mdpi.com/1424-8220/12/6/7496.

[20] Libo Song. �Evaluating Mobility Predictors in Wireless Networks for Improv-

ing Hando� and Opportunistic Routing�. PhD thesis. Dartmouth College,

2008, pp. 1�200. url: http://www.cs.dartmouth.edu/reports/TR2008-

611.pdf.

[21] Amnir Hadachi et al. �Cell phone subscribers mobility prediction using en-

hanced Markov Chain algorithm�. In: IEEE Intelligent Vehicles Symposium,

Proceedings June 2016 (2014), pp. 1049�1054. doi: 10.1109/IVS.2014.

6856442.

[22] Libo Song et al. �Evaluating Next-Cell Predictors with Extensive Wi-Fi Mo-

bility Data�. In: IEEE Transactions on Mobile Computing 5.12 (Dec. 2006),

pp. 1633�1649. issn: 1536-1233. doi: 10.1109/TMC.2006.185. url: http:

//ieeexplore.ieee.org/document/1717434/.

[23] Patrick Kenny, Matthew Lennig, and Paul Mermelstein. �A Linear Predictive

HMM for Vector-Valued Observations with Applications to Speech Recog-

76

https://doi.org/10.1007/BF01193332

http://link.springer.com/10.1007/BF01193332

http://link.springer.com/10.1007/BF01193332

https://doi.org/10.1109/GLOCOM.1998.776573



https://doi.org/10.1109/CNSM.2016.7818454



https://doi.org/10.3390/s120607496

http://www.mdpi.com/1424-8220/12/6/7496

http://www.cs.dartmouth.edu/reports/TR2008-611.pdf

http://www.cs.dartmouth.edu/reports/TR2008-611.pdf

https://doi.org/10.1109/IVS.2014.6856442

https://doi.org/10.1109/IVS.2014.6856442

https://doi.org/10.1109/TMC.2006.185



8 REFERENCES

nition�. In: IEEE Transactions on Acoustics, Speech, and Signal Processing

38.2 (1990), pp. 220�225. issn: 00963518. doi: 10.1109/29.103057.

[24] Simon L. Cawley and Lior Pachter. “HMM sampling and applications to gene finding and alternative splicing”. In: Bioinformatics 19.Suppl. 2 (Sept. 2003), pp. ii36–ii41. issn: 1367-4803. doi: 10.1093/bioinformatics/btg1057. url: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btg1057.

[25] Arpad Gellert and Lucian Vintan. “Person Movement Prediction Using Hidden Markov Models”. In: Studies in Informatics and Control 15.1 (2006), pp. 17–30. url: http://webspace.ulbsibiu.ro/arpad.gellert/html/SIC_HMM.pdf.

[26] Hongbo Si et al. “Mobility Prediction in Cellular Network Using Hidden Markov Model”. In: 2010 7th IEEE Consumer Communications and Networking Conference (2010), pp. 1–5. doi: 10.1109/CCNC.2010.5421684. url: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5421684.

[27] The Anylogic Company. Anylogic. 2019. url: https://www.anylogic.de/.

[28] Gebäude und Technik TU Wien. GUT Grundrisspläne. 2019. url: https://www.gut.tuwien.ac.at/wir_fuer_sie/immobilienmanagement/grundrisse_objekte/.

[29] Sanjeev Dhawan. “Analogy of Promising Wireless Technologies on Different Frequencies: Bluetooth, WiFi, and WiMAX”. In: The 2nd International Conference on Wireless Broadband and Ultra Wideband Communications (AusWireless 2007). AusWireless. IEEE, Aug. 2007, pp. 14–14. isbn: 0-7695-2842-2. doi: 10.1109/AUSWIRELESS.2007.27. url: http://ieeexplore.ieee.org/document/4299663/.

[30] C/LM - LAN/MAN Standards Committee. “IEEE Standard for Information technology - Local and metropolitan area networks - Specific requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications Amendment 2: Fast Basic Service Set (BSS) Transition”. In: IEEE Std 802.11r-2008 (Amendment to IEEE Std 802.11-2007 as amended by IEEE Std 802.11k-2008) (2008), pp. 1–126. doi: 10.1109/IEEESTD.2008.4573292. url: https://standards.ieee.org/standard/802_11r-2008.html#Standard.

[31] Andreas Kugi. Automatisierung. 2019. url: https://www.acin.tuwien.ac.at/file/teaching/bachelor/automatisierung/Gesamtskriptum.pdf.

[32] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press, 2012, pp. 73–78, 216–244. isbn: 9780262018029.

[33] David Barber. Bayesian Reasoning and Machine Learning. Cambridge: Cambridge University Press, 2011, p. 646. isbn: 9780511804779. doi: 10.1017/CBO9780511804779. url: http://ebooks.cambridge.org/ref/id/CBO9780511804779.

[34] Tobias P Mann. “Numerically Stable Hidden Markov Model Implementation”. In: An HMM scaling tutorial (2006), pp. 1–8. url: http://bozeman.genome.washington.edu/compbio/mbt599_2006/hmm_scaling_revised.pdf.

[35] L.R. Rabiner. “A tutorial on hidden Markov models and selected applications in speech recognition”. In: Proceedings of the IEEE 77.2 (1989), pp. 257–286. issn: 00189219. doi: 10.1109/5.18626. url: http://ieeexplore.ieee.org/document/18626/.

[36] Wolfgang Kemmetmüller and Andreas Kugi. Regelungssysteme 1. 2018. url: https://www.acin.tuwien.ac.at/master/regelungssysteme-2/.

[37] Franz Hlawatsch. Parameter estimation Methods. 2016, pp. 1–116.

[38] Nachi Gupta and Raphael Hauser. Kalman Filtering with Equality and Inequality State Constraints. Tech. rep. Oxford University Computing Laboratory, Sept. 2007. arXiv: 0709.2791. url: http://arxiv.org/abs/0709.2791.


[39] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag, 2006. isbn: 0-387-31073-8.

[40] Nitish Srivastava et al. “Dropout: a simple way to prevent neural networks from overfitting”. In: The Journal of Machine Learning Research 15.1 (2014), pp. 1929–1958. url: http://jmlr.org/papers/v15/srivastava14a.htm.

[41] François Chollet et al. Keras. 2015. url: https://keras.io.

[42] James Bergstra, Daniel L K Yamins, and D Cox. “Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures”. In: Proceedings of the 30th International Conference on Machine Learning (2013), pp. 115–123. url: http://jmlr.org/proceedings/papers/v28/bergstra13.html.

[43] Prajit Ramachandran, Barret Zoph, and Quoc V Le. “Searching for Activation Functions”. In: CoRR. Oct. 2017, pp. 1–13. arXiv: 1710.05941. url: http://arxiv.org/abs/1710.05941.

[44] Diederik P. Kingma and Jimmy Ba. “Adam: A Method for Stochastic Optimization”. In: International Conference on Learning Representations (Dec. 2014). arXiv: 1412.6980. url: http://arxiv.org/abs/1412.6980.


A Appendix

A.1 Floor Plans of Campus Gußhaus

Figure 32: Floor plan of the second floor of Campus Gußhaus with APs marked in green [28].

Figure 33: Floor plan of the third floor of Campus Gußhaus with APs marked in green [28].

Figure 34: Floor plan of the fourth floor of Campus Gußhaus with APs marked in green [28].

A.2 Kalman Filter Implementation in Matlab

function [estimate,errorCovariance] ...
    = KalmanFilterEstimation(y,previousEstimate,previousErrorCovariance, ...
                             A,B,C,D,G,H,R,Q,u)
% One-step Kalman prediction with projection onto non-negative states.
% System:
%   x(k+1) = A*x(k)+B*u(k)+G*w(k)
%   y(k)   = C*x(k)+D*u(k)+H*w(k)+v(k)
%   E{v_k*v_j} = R*delta_kj
%   E{w_k*w_j} = Q*delta_kj
NrOfStates = length(A);
% Kalman gain (right division instead of an explicit inverse)
K = A*previousErrorCovariance*C'/(C*previousErrorCovariance*C'+H*Q*H'+R);
% Predicted state for the next time step
estimate = A*previousEstimate+B*u+K*(y-C*previousEstimate-D*u);
% Error covariance of the prediction
errorCovariance = A*previousErrorCovariance*A'+G*Q*G'-K*C*previousErrorCovariance*A';
% Project onto x >= 0 as soon as any state estimate is negative
% (a bare "if estimate<0" is only true when all entries are negative)
if any(estimate<0)
    projEstimate = @(x) (x-estimate)'*(x-estimate);
    estimate = fmincon(projEstimate,estimate,-eye(NrOfStates),zeros(NrOfStates,1));
end
end
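The recursion implemented by the listing above can be written compactly as follows. This only restates the code. The symbols $\hat{x}_k$ (previousEstimate), $P_k$ (previousErrorCovariance) and $K_k$ (the gain K) are notation assumed here for illustration and may differ from the notation used in the main text.

```latex
\begin{align*}
K_k &= A P_k C^{\mathsf T}\left(C P_k C^{\mathsf T} + H Q H^{\mathsf T} + R\right)^{-1},\\
\hat{x}_{k+1} &= A\hat{x}_k + B u_k + K_k\left(y_k - C\hat{x}_k - D u_k\right),\\
P_{k+1} &= A P_k A^{\mathsf T} + G Q G^{\mathsf T} - K_k C P_k A^{\mathsf T},\\
\hat{x}_{k+1} &\leftarrow \arg\min_{x \ge 0}\,(x-\hat{x}_{k+1})^{\mathsf T}(x-\hat{x}_{k+1})
\quad\text{(only when the negativity test in the listing fires).}
\end{align*}
```

The last line is the constrained least-squares problem passed to fmincon with the inequality $-x \le 0$. Because the objective is an unweighted squared distance, its minimizer is the componentwise maximum of the estimate and zero, which can serve as a quick sanity check on the solver output.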

A.3 Parameters for Anylogic Simulation

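\begin{equation}
A_0 =
\begin{bmatrix}
\tfrac{1}{3} & \tfrac{2}{3} & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.2 & 0 & 0.2 & 0 \\
0 & \tfrac{1}{3} & \tfrac{1}{3} & 0 & \tfrac{1}{3} & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0.2 & 0.2 & 0 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & \tfrac{2}{3} & \tfrac{1}{3}
\end{bmatrix}
\tag{68}
\end{equation}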

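\begin{equation}
B_{\mathrm{in},0} = B_{\mathrm{out},0} =
\begin{bmatrix}
\tfrac{1}{3} \\ 0 \\ \tfrac{1}{3} \\ 0 \\ 0 \\ \tfrac{1}{3}
\end{bmatrix}
\tag{69}
\end{equation}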

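\begin{equation}
A_{\mathrm{opt}} =
\begin{bmatrix}
0.278728 & 0 & 0 & 2.47 \times 10^{-8} & 0 & 5.34 \times 10^{-8} \\
2.970083 & 0.992932 & 4.43 \times 10^{-7} & 0 & 6.90 \times 10^{-8} & 0 \\
0.209038 & 0.002664 & 0.220401 & 0 & 0 & 0.576441 \\
0 & 2.65 \times 10^{-7} & 1.15 \times 10^{-6} & 0.702691 & 0.04961 & 0.557901 \\
0.280836 & 0 & 0.231901 & 0.845527 & 0.787416 & 2.713404 \\
5.27 \times 10^{-8} & 7.79 \times 10^{-5} & 0 & 0 & 0.002155 & 0.747378
\end{bmatrix}
\tag{70}
\end{equation}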

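\begin{equation}
B_{\mathrm{out,opt}} =
\begin{bmatrix}
0.000 \\ 0.149 \\ 0.000 \\ 0.142 \\ 0.709 \\ 0.000
\end{bmatrix}
\tag{71}
\end{equation}

\begin{equation}
B_{\mathrm{in,opt}} =
\begin{bmatrix}
0.090 \\ 0.243 \\ 0.288 \\ 0.189 \\ 0.125 \\ 0.064
\end{bmatrix}
\tag{72}
\end{equation}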

C = eye(NrOfStates);

D = zeros(NrOfStates,1);

H = eye(NrOfStates);

G = eye(NrOfStates);

P0 = eye(NrOfStates)*100;

x0 = mean(Data,2);

A.4 Anylogic Simulation Results

[Plot, AP Nr. 1: x-axis Timesteps (0 to 700); curves: Estimate, Data]

Figure 35: MSE for Kalman prediction of the Anylogic simulation for AP 1.

[Plot, AP Nr. 2: x-axis Timesteps (0 to 700); curves: Estimate, Data]


[Plot, AP Nr. 3: x-axis Timesteps (0 to 700); curves: Estimate, Data]


[Plot, AP Nr. 4: x-axis Timesteps (0 to 700); curves: Estimate, Data]


[Plot, AP Nr. 5: x-axis Timesteps (0 to 700); curves: Estimate, Data]


[Plot, AP Nr. 6: x-axis Timesteps (0 to 700); curves: Estimate, Data]


[Plot: x-axis 0 to 700; curves: Data, estimate]

Figure 41: Sum of users for the Anylogic example and estimate using the Kalman filter prediction with prediction horizon one.

[Plot: x-axis 0 to 700; curves: Data, estimate]

Figure 42: Sum of users for the Anylogic example and estimate using the Kalman filter prediction with prediction horizon five.


A.5 HMM Example Simulation Results

[Plot, AP Nr. 2: x-axis 0 to 500; curves: Original Data, HMM, Order-3 Markov, Kalman]

Figure 43: Plot of original HMM data and prediction result for 1 prediction step for HMM, Markov order-3 and Kalman prediction showing AP 2.

[Plot, AP Nr. 3: x-axis 0 to 500; curves: Original Data, HMM, Order-3 Markov, Kalman]


[Plot, AP Nr. 4: x-axis 0 to 500; curves: Original Data, HMM, Order-3 Markov, Kalman]


[Plot, AP Nr. 5: x-axis 0 to 500; curves: Original Data, HMM, Order-3 Markov, Kalman]


[Plot, AP Nr. 6: x-axis 0 to 500; curves: Original Data, HMM, Order-3 Markov, Kalman]

I hereby declare that this thesis was prepared in accordance with the Code of Conduct (rules for ensuring good scientific practice, in the current version of the respective TU Wien bulletin), in particular without impermissible help from third parties and without the use of aids other than those stated. Data and concepts taken directly or indirectly from other sources are marked with a reference to the source.

This thesis has not previously been submitted in the same or a similar form in any other examination procedure, either in Austria or abroad.

Vienna, June 2019

Miriam Leopoldseder
