FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO
A computer vision-based proposal for seat occupancy monitoring applied to
FEUP’s library
José Miguel Seruca Veloso
Mestrado Integrado em Engenharia Eletrotécnica e de Computadores
Supervisor: Paulo José Lopes Machado Portugal
July 30, 2021
© José Veloso, 2021
Abstract
The surveillance of occupancy has been an area of great interest, both for resource management and behavioral analysis. The use of infrared-based technology is already well documented and its limits are known. Consequently, it is necessary to explore new methods that allow the extraction of information to determine occupancy.
This dissertation focuses on the analysis, design and implementation of a seat occupancy monitoring system based on computer vision. Convolutional neural networks are explored to provide the ability to create a status map of all seating options available in FEUP's library. The thesis focuses on the study and conceptual validation of the proposed detection system, which includes the implementation of a cloud-hosted dashboard and database.
The feasibility of implementing this system is confirmed by the results obtained on a selected test site. From observational evidence, it is possible to prove the general concept and show the applicability of the developed work in a real, functional context.
Acknowledgments
First of all, I wish to thank Professor Paulo Portugal for the availability, patience and dedication he has shown since I began the process of completing this dissertation.
To the Library Services, in particular Dr. Cristina Lopes, and to the Technical Services, for their indispensable assistance.
To the community of the Faculdade de Engenharia da Universidade do Porto, which welcomed and guided me over these last 5 years.
To my parents, Helena Seruca and José Veloso, for their support and understanding, for all the sacrifices they made, and for everything they taught me.
To my sister, Joana Veloso, for accompanying and cheering me on. To my family, who always gave me a sense of pride and belonging. To my course companions, André Reis, Tomás Fonseca and Tiago Sousa, for their solidarity and influence at different moments of this journey. To my lifelong friends, Alex Himmel, Joana Morais and Matias Schöner, for all the experiences we shared and for the companionship that never ceased to exist.
To all of them, and to so many others who in some way left their mark on me, thank you very much.
José Veloso
“Take time to smell the roses”
Sir Robert William Robson
Contents
1 Introduction
  1.1 Context
  1.2 Motivation
  1.3 Goals
  1.4 Subjects and Structure
2 Literature Review
  2.1 Seat Occupancy Monitoring
    2.1.1 Based on PIR sensors
    2.1.2 Based on computer vision
  2.2 Summary
3 System Requirements
  3.1 Functional Requirements
  3.2 Non-Functional Requirements
    3.2.1 Occupancy Detection
    3.2.2 Dashboard
  3.3 Summary
4 System Architecture
  4.1 Vision-based Solution
    4.1.1 Processing Unit
    4.1.2 Camera Module
    4.1.3 Definition of detection algorithm
  4.2 Dashboard
  4.3 Summary
5 System Implementation
  5.1 Module Composition
    5.1.1 Processing Unit
    5.1.2 Camera Module
  5.2 Image Segmentation and Analysis
    5.2.1 General Concept
    5.2.2 Algorithm Execution
  5.3 Dashboard and Data Storage
  5.4 Summary
6 System Testing
  6.1 Methods and Results
    6.1.1 Model Behavior
    6.1.2 Dashboard
  6.2 Summary
7 Conclusion and Future Development
  7.1 Conclusion
  7.2 Future Development
References
List of Figures
2.1 Wireless desk sensor with polyethylene lens (altered) [1]
2.2 Fresnel Design [2]
2.3 Concept of IR-detection [3]
2.4 Commercial architecture for a sensor network [1]
2.5 Use case of installation under a desk [1]
2.6 Overview of a CNN's architecture and training process [4]
2.7 Convolution operation [4]
2.8 Max Pooling, downsampling of an input tensor by a factor of 2 [4]
2.9 Prototype example. Raspberry Pi 3 equipped with Intel Neural Compute Stick 2 and wide-angle camera [5]
2.10 Prototype from Figure 2.9. Image capture and classification [5]
2.11 Summary of Models in the R-CNN family [6]
2.12 YOLOv1 Architecture [7]
2.13 Bounding Box Structure (altered) [7]
2.14 SSD Architecture, featuring VGG16 as feature extractor [7]
4.1 Common study area (FEUP Library: 2nd-4th floor) [8]
4.2 Comparison of several detection frameworks [9], COCO dataset [10]
4.3 GPU time (milliseconds) for each model, for an image resolution of 300x300 [9]
4.4 Data Structure, each library floor contains a number of active devices responsible for monitoring several seating areas
4.5 Model representing major system components and communications
5.1 Raspberry Pi 4 Model B (8GB RAM) [11]
5.2 Raspberry Pi Camera Module v2 [12]
5.3 2nd Floor Plan, camera position (dotted circle) and covered area (black)
5.4 Still from the equivalent area depicted in Figure 5.3
5.5 Seating areas
5.6 Contention Overlapping Areas, final status determined by the dashboard
5.7 Image processing loop in application
5.8 Frame selection cycle, each transition occurs after sampling rate × processing units
5.9 Use of OpenCV modules for image resizing, normalization and network forwarding
5.10 A visual representation of mean subtraction where the RGB mean (center) has been calculated and subtracted from the original image (left), resulting in the output image (right) [13]
5.11 Thingsboard Monolithic Architecture [14]
5.12 cURL requests: URL composed of host, access token and telemetry specification
6.1 Module's CPU and Memory Usage while running the application
6.2 Original (green) and provisional (red) cameras, targeting Area 1
6.3 Seating areas tested, targeting Area 1
6.4 Area 0 On-Off Status Chart, 08:00 - 19:30, 24-05-21
6.5 Area 1 On-Off Status Chart, 08:00 - 19:30
6.6 Area 1 On-Off Status Chart (w/ provisional camera), 08:00 - 19:30, 24-05-21
6.7 Area 2 On-Off Status Chart, 08:00 - 19:30, 24-05-21
6.8 Area 3 On-Off Status Chart, 08:00 - 19:30, 24-05-21
6.9 Area 0 On-Off Status Chart, 08:00 - 17:00
6.10 Volume of transport messaging
6.11 Capacity for telemetry data storage
6.12 Transport hourly activity, 14-day period
6.13 Telemetry persistence hourly activity, 14-day period
6.14 Hourly average of state changes over a 2-week period, 4-colored stacked bars
6.15 Hourly average of state changes over a 2-week period, 4-colored lines
6.16 4 Area Status Charts over a 2-week period
6.17 Total hourly average over a 2-week period
6.18 Current total calculated upon clicking Update
6.19 View of widget's (Figure 6.18) HTML editor
6.20 Image Map with zero occupied areas (all green) and thermometer (blue)
6.21 Image Map with two occupied (red) and vacant (green) areas, and thermometer (green)
6.22 Remaining versions of thermometer marker [15]
List of Tables
3.1 Client Needs
3.2 Requirement Analysis: Detection System
3.3 Requirement Analysis: Dashboard
4.1 Main specifications of testing system
4.2 Comparison of detection frameworks (from Table 3 [16]), PASCAL VOC dataset [17]
Abbreviations
API  Application Programming Interface
CNN  Convolutional Neural Network
CoAP  Constrained Application Protocol
CPU  Central Processing Unit
CSI  Camera Serial Interface
cURL  Client URL
FC  Fully Connected
FEUP  Faculdade de Engenharia da Universidade do Porto
FOV  Field of View
FPS  Frames per Second
GDPR  General Data Protection Regulation
GPIO  General-Purpose Input/Output
GPU  Graphics Processing Unit
HTML  HyperText Markup Language
HTTP  Hypertext Transfer Protocol
IR  Infrared
JSON  JavaScript Object Notation
JVM  Java Virtual Machine
mAP  mean Average Precision
MQTT  Message Queuing Telemetry Transport
MS COCO  Microsoft Common Objects in Context
OS  Operating System
PC  Personal Computer
PIR  Pyroelectric Infra-Red
QR  Quick Response
RAM  Random Access Memory
R-CNN  Region-Based Convolutional Neural Networks
RGB  Red Green Blue
RPN  Region Proposal Network
ReLU  Rectified Linear Unit
REST  Representational State Transfer
SPI  Serial Peripheral Interface
SQL  Structured Query Language
SSD  Single Shot Detector
UART  Universal Asynchronous Receiver-Transmitter
UI  User Interface
UML  Unified Modeling Language
URL  Uniform Resource Locator
USB  Universal Serial Bus
VOC  Visual Object Classes
YOLO  You Only Look Once
Chapter 1
Introduction
1.1 Context
The library of the Faculdade de Engenharia da Universidade do Porto (FEUP) is a public space, highly sought after by the student community. There is therefore a need for control and efficient use of the available seating areas. Occupation spikes and congestion are a daily occurrence, and some periods of the year, especially the examination season, prove particularly challenging in terms of management.
As a result of these concerns, the Library's Services have identified the need to modernize the monitoring systems currently in place, a need aggravated by the heightened alert and restrictions caused by the ongoing COVID-19 pandemic.
Though commercial solutions for occupancy detection are available, they rely on expensive components and proprietary software. An opportunity therefore arises for the internal development of a system that is more economically viable while also supporting integration, modification and expansion.
1.2 Motivation
The library managed to set up a temporary solution [18] for determining seat occupancy. The system, however, relies on the cooperation of students, requiring each one to signal their presence by scanning a Quick Response (QR) code. This poses a significant issue for data integrity, as the information cannot be trusted to determine the current status of an individual seat. At present, the system can only display total counts for each floor, and even these cannot be interpreted literally, indicating only a trend in overall occupation. This represents a further problem, not only for real-time monitoring but also for compiling the historical and statistical figures that could inform future management methods and initiatives.
To address this, the library's administration is seeking a form of seat status monitoring able to assign an occupation state to every seating option, which requires the means to actively detect and process occupancy across the building. The problem demands the
conception of a distributed system operating in near real time. Widely available applications currently tend to use modules based on infrared sensing. Beyond economic and development concerns, the technology itself proves relatively limited in detection range and reliability.
To this end, commercial alternatives and ongoing research in the field of computer vision have emerged to tackle the problem of building monitoring. The promise shown by early applications of convolutional neural networks reveals the potential of developing and implementing a vision-based solution, built around a camera and on-board processing.
A solution similar to these applications has the power to accurately assess the status of seating areas. It also minimises the amount of hardware required relative to the number of seats analysed, potentially driving down costs. If the system proves reliable and flexible enough for related applications, it can constitute a significant advancement for investigations of this nature. Perfecting such a system opens possibilities for diversifying and expanding the Library's current management methods.
1.3 Goals
The primary goal of this dissertation is the development of a system capable of detecting people and mapping out floor occupancy in a reliable and efficient manner. The proposal must be capable of status-mapping entire floors, indicating the location and occupation of available seating areas through a cloud-hosted platform accessible to administrators and users alike. To support this pursuit, the dissertation encompasses the following goals:
• Research and develop a system concept centered on people detection, including the physical alignment of module and camera, so as to optimise precision and data quality. Part of this thesis focuses on the interaction between physical factors and the efficacy of certain convolutional neural networks. Studying how these are affected becomes vital for defining an equilibrium, which will determine the choice of a detector.
• Develop and implement a working dashboard and database, hosted and available online. Once occupancy status is determined at the low level, data must be transmitted and stored appropriately to secure its integrity. A web interface manages and manipulates the data to display past statistical or real-time status information in a form digestible to the user.
• Implement and test the prototype, as the dissertation's completion relies on the verification and validation of the system for the desired use case. Installation on a verifiable test site is therefore essential to observe and confirm model behavior. Confirmation of the concept implies the possibility of replication across the entire building, and opens up the possibility of expansion to other immediate applications, namely the detection of common objects (books, handbags, coats, etc.).
1.4 Subjects and Structure
The dissertation comprises 7 main chapters, including this Introduction. In the second chapter, a study and review of current occupancy monitoring technologies and applications is presented. Chapter 3 explains the process of defining the system requirements and enumerates them. In Chapter 4, a general system architecture is proposed, including specifications for the object-detecting module and the cloud-hosted platform. Chapter 5 details the subsequent component selection and the implementation of the entire system, looking to prove the general concept for future, widespread replication. The sixth chapter defines the working conditions of the module and application and details the respective results; the efficacy of the system is evaluated, and the display options of the dashboard with the collected data are presented. The thesis concludes with the final chapter, which proposes areas for future development and improvement.
Chapter 2
Literature Review
The aim of this dissertation is primarily the design of a general concept and the implementation of a system capable of seat occupancy monitoring, in accordance with the specificity of the environment. To this effect, prior relevant research and developments are detailed. The existing body of work serves as a starting point for possible development and improvement.
The state of the art for situations similar to the problem at hand, namely the occupancy monitoring of seating areas, rooms or buildings, is generally composed of two main approaches. In Section 2.1, techniques largely based on Pyroelectric Infra-Red (PIR) sensor (2.1.1) and computer vision (2.1.2) technologies are presented. Section 2.2 addresses possible limitations and the choice of groundwork for the thesis.
2.1 Seat Occupancy Monitoring
2.1.1 Based on PIR sensors
Commercially the most common solution [1, 19, 20, 21, 22, 23, 24] for deploying an intelligent monitoring network, the PIR sensor is considered an effective, low-powered alternative for applications centered on mobile computing [25].
These sensors detect and interpret infrared radiation [26, 27], invisible to the human eye, as its wavelength is longer than that of visible light. Every concept hinging on a network of PIR sensors for occupancy monitoring determines human presence by targeting body heat, which emits infrared radiation. However, infrared radiation in the human range (1.33 µm - 16.67 µm) [28] is subject to fluctuations provoked by changing conditions and, most importantly, is regularly blocked by non-transmitting materials, which include most glass and plastic. Typical versions of an IR sensor therefore employ lenses made of polyethylene (Figure 2.1), which is adept at filtering out radiation outside the human range. To concentrate the received radiation further, lenses usually adopt the shape of a Fresnel lens.
Figure 2.1: Wireless desk sensor with polyethylene lens (altered) [1]
The Fresnel style [29] is designed with its grooves facing the IR sensing element, presenting a smooth surface to the subject side of the lens (Figure 2.2).
Figure 2.2: Fresnel Design [2]
The pyroelectric sensor is made of a crystalline material that generates a surface electric charge when exposed to heat. When the level of radiation changes, the charge is altered and can then be measured (Figure 2.3). The varying signal is typically fed to an amplifier with signal conditioning circuits. The next stage involves a window comparator, responding to positive and negative transitions of the sensor output signal.
Figure 2.3: Concept of IR-detection [3]
Occupancy detection applications of this physical phenomenon incorporate IR sensors in battery-powered modules, capable of connecting wirelessly to an access point or gateway [1, 22], relaying relevant data to a local or cloud platform for post-processing (Figure 2.4 stands as an example).
Figure 2.4: Commercial architecture for a sensor network [1]
A common alignment of the modules involves installation beneath a desk, directly covering the person seated (Figure 2.5). Installations taking advantage of a room's ceiling [19, 20, 24] are also considered for full-area control. Unlike the first option, however, this situation is more prone to radiation blocking and should mainly be thought of as a complement to existing data sources. Furthermore, the technology cannot detect inanimate objects, given that they emit no measurable infrared radiation. One final factor to consider is the dimensions of the targeted space [18]. Implementing a system based on one module per seat multiplies costs and thwarts the pursuit of flexibility in floor layout, implying constant vigilance and maintenance of every module. Changes in the location, condition and battery status of multiple devices can result in loss of data integrity and temporary system failure.
Figure 2.5: Use case of installation under a desk [1]
2.1.2 Based on computer vision
Though not as widespread as other techniques, the use of vision to accomplish occupancy monitoring has appeared more frequently in recent years [4, 5, 30, 31, 32, 33].
The primary goal of this field is to enable machines to perform tasks such as image and video recognition, image analysis and classification, media recreation, etc. Advancements in computer vision with deep learning have evolved primarily around one particular algorithm: the Convolutional Neural Network (CNN) [34].
In traditional computer vision, most of the work consists of hand-engineering filters which, when applied to an image, extract its features [35]. The more features that can be extracted, the more accurate a prediction is. A major drawback of this approach is that each feature must be manually engineered in the design process, which makes scaling these types of algorithms challenging. Convolutional neural networks work in the opposite direction: the designer chooses how many features the CNN will extract, and the extraction process itself is learned during training [36].
Figure 2.6: Overview of a CNN’s architecture and training process [4]
A CNN consists of an input layer, hidden layers and an output layer (Figure 2.6). As in any feed-forward neural network, all middle layers are considered hidden, as their inputs and outputs are masked by the activation function and final convolution.
Typically, this includes a layer that performs a dot product of the convolution kernel with the layer's input matrix. As the convolution kernel slides along the input matrix, the convolution operation generates a feature map (Figure 2.7), which in turn contributes to the input of the next layer. Before serving as input for the next layer, however, outputs from convolution operations are subjected to an activation function. The most common nonlinear activation function used presently is the rectified linear unit (ReLU) [37], defined as f(x) = max(0, x). This process is followed by other layers such as pooling layers, fully connected layers and normalization layers (Figure 2.6).
Figure 2.7: Convolution operation. [4]
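As a minimal illustration of the operations just described (a sketch in NumPy, not code from the dissertation; strictly speaking, deep learning frameworks compute cross-correlation rather than flipped convolution, which is the variant shown here), the sliding dot product and the ReLU activation can be written as:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the input and take
    the dot product at each position, producing a feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: f(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

# a toy 4x4 "image" and a 2x2 diagonal-difference kernel (illustrative)
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1., 0.], [0., -1.]])
feature_map = relu(conv2d(image, kernel))
print(feature_map.shape)  # (3, 3)
```

Note how a 2x2 kernel over a 4x4 input yields a 3x3 feature map: each output cell summarises one kernel-sized window of the input.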
Similar to the convolutional layer, the pooling layer is responsible for reducing the spatial size of the convolution output. This decreases the computational power required by reducing dimensionality. It is also useful for extracting dominant features that are invariant to positional and rotational alterations, thus avoiding inefficiencies in the training process. There are two main types of pooling, namely max pooling and average pooling. Max pooling, the most popular method, downsamples by returning only the maximum value from the portion of the image covered by the kernel (exemplified by Figure 2.8). Global average pooling, on the other hand, performs an extreme form of downsampling, in which a feature map of size height × width is reduced to a single 1 × 1 value. This is done by averaging all the elements in each feature map, while the depth (the number of feature maps) is retained. This operation is typically applied only once, before the fully connected layers.
Figure 2.8: Max Pooling, downsampling of an input tensor by a factor of 2 [4]
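Both pooling variants described above can be sketched in a few lines of NumPy (an illustrative sketch under the stated window/stride assumptions, not code from the dissertation):

```python
import numpy as np

def max_pool2d(x, k=2):
    """Max pooling with a k x k window and stride k: each window is
    replaced by its maximum, downsampling height and width by k."""
    h, w = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def global_avg_pool(feature_maps):
    """Global average pooling over a (depth, h, w) stack: each feature
    map collapses to a single average value; depth is retained."""
    return feature_maps.mean(axis=(1, 2))

x = np.array([[1., 3., 2., 4.],
              [5., 7., 6., 8.],
              [9., 2., 1., 0.],
              [3., 4., 5., 6.]])
print(max_pool2d(x).shape)  # (2, 2) -- the 4x4 map halved in each dimension
```

With k = 2 this is exactly the factor-of-2 downsampling shown in Figure 2.8: each 2x2 block of the input survives only as its maximum value.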
Fully connected (FC) layers map the features acquired through convolution and pooling operations to the final outputs of the network, connecting inputs to outputs through trained weights. In classification tasks, the outputs at the end of such networks are usually sets of probabilities, one per class, and the final fully connected layer typically has as many output nodes as there are classes.
A normalization layer is typically defined as the activation function applied to the last FC layer in the network, as it usually differs from the activation functions used in previous layers. Normalization layers vary according to the task at hand. For multiclass classification, a softmax function is adopted, which normalizes the real-valued outputs of the last fully connected layer into target class probabilities: each value lies between 0 and 1 and all values sum to 1.
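The softmax normalization just described can be sketched directly from its definition (the score values below are hypothetical, for illustration only):

```python
import numpy as np

def softmax(logits):
    """Normalize raw scores from the last FC layer into class
    probabilities: each output lies in (0, 1) and all sum to 1.
    Subtracting the max first keeps the exponentials numerically stable."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# hypothetical raw outputs for three classes
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs.argmax())  # 0 -> the highest-scoring class wins
```

Because the exponential is monotonic, softmax preserves the ordering of the raw scores while turning them into a valid probability distribution.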
Figure 2.9: Prototype example. Raspberry Pi 3 equipped with Intel Neural Compute Stick 2 andwide-angle camera. [5]
Affordable options for embedded computing, including camera use, are rapidly becoming available for both industrial applications and research (example in Figure 2.9). These devices are essential for hardware projects that rely on image-based analysis, discarding the use of a computer for external processing. This modularity makes it possible to implement computer vision on-site for applications that would otherwise not be feasible due to cost, mobility or size constraints. Running applications on the on-board camera's imaging feed proves highly beneficial in another respect: it permits the collection of detailed information while keeping data private. Ensuring that data collection does not involve storage on external systems is key to complying with current privacy regulations and standards [38].
Figure 2.10: Prototype from Figure 2.9. Image capture and classification [5]
Most implementations related to room occupancy resort to an architecture similar to that of Figure 2.4. Portable camera modules are placed with a clear view of the targeted area and process the image to harness relevant data. This information is then transmitted to a central platform for adequate handling.
By training object-recognition networks on occupants, objects and architectural context, it is possible to adapt and improve existing solutions for the dissertation's purpose. That is, however, not the only avenue for adapting to a problem's own set of characteristics. A wide variety of pre-trained object detection algorithms have been developed in recent years, capable of running with reasonable accuracy on devices with limited computing power. These present their own trade-offs [9] depending on the situation, and can serve as building blocks for small-scale applications [39, 40, 41].
Among this set of algorithms, three main subgroups stand out, both by frequency of use in similar applications [42, 43] and by fit to the specified needs, considering the limited capability of whatever processing unit is chosen [7]. The ability to distinguish people as well as common objects, and the prioritisation of processing efficiency while maximising detection precision, stand at the forefront.
Region-based Convolutional Neural Networks (R-CNN)
In the R-CNN setting (and its many variants, Figure 2.11), detection happens in two stages. During the first stage, called the region proposal network (RPN), images are processed by a feature extractor. The extraction [44] is a necessary step for automatic identification of objects, which are associated with certain attributes that characterise and differentiate them. Similarity between images can be determined through features, which are represented as vectors. The various contents of an image, such as color, texture and shape, are used to represent and index an image or an object, and are used to predict bounding boxes.
Figure 2.11: Summary of Models in the R-CNN family [6]
In the second stage, these box proposals are used to crop features from the same intermediate feature map, which are subsequently fed to the remainder of the feature extractor in order to predict a specific class for each proposal. The loss function [6] is identical for both stages, with the second stage using the RPN's results as anchors. During this process, part of the computation must be run once per region, so the running time depends on the number of regions proposed by the RPN. This normally represents substantially longer processing times compared to the other options evaluated [42, 43].
You Only Look Once (YOLO)
Unlike the previous detectors, this algorithm employs a single convolutional network for its predictions, framing object detection as a regression problem from full images to spatially separated bounding boxes and associated class probabilities, all in one evaluation. The complete YOLOv1 network architecture (Figure 2.12) features 24 convolutional layers and 2 fully connected layers.
Figure 2.12: YOLOv1 Architecture [7]
The algorithm [16, 45, 46, 47] divides any given image into an S×S grid. Each grid cell on the input image predicts a fixed number of boundary (anchor) boxes for an object. For each boundary box, the network outputs 4 offset values (bx, by, bh, bw), one confidence score pc and C conditional class probabilities. The coordinates (bx, by) represent the bounding box's center relative to the bounds of the grid cell in the input image, while bw and bh are the box's width and height, respectively. The confidence pc is equivalent to the probability that the box contains an object, and the C conditional class probabilities indicate the likelihood that the detected object belongs to a given class i (Figure 2.13).
Figure 2.13: Bounding Box Structure (altered) [7]
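As a concrete illustration of the output structure just described, the original YOLOv1 configuration uses S=7 grid cells per side, B=2 boxes per cell and C=20 classes (PASCAL VOC), yielding a 7×7×30 prediction tensor. The small sketch below computes that size; the values are taken from the YOLOv1 paper, not from this dissertation's setup:

```python
# Size of the YOLOv1 output tensor: S x S x (B * 5 + C).
# Each of the B boxes carries 4 offsets (bx, by, bh, bw) plus one confidence pc;
# the C conditional class probabilities are shared per grid cell.
def yolo_output_size(S: int, B: int, C: int) -> int:
    return S * S * (B * 5 + C)

# YOLOv1 defaults on PASCAL VOC: S=7, B=2, C=20 -> 7 * 7 * 30 = 1470 values.
print(yolo_output_size(7, 2, 20))  # prints 1470
```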
Single Shot Detector (SSD)
SSD is a single convolutional neural network, less complex than the methods it intends to
surpass. Its architecture is separated into two parts: a base network, most commonly MobileNet
[48] or VGG16 [49] (Figure 2.14), providing high-quality image classification at the front, and
several convolutional feature layers added afterward to predict object detections.
There are several features in the SSD model, such as multi-scale feature maps for detection,
where the sizes of convolutional feature layers added decrease gradually. This allows for pre-
dictions at different scales. Each feature layer uses a different convolutional model to predict
detections.
Each added feature layer or any existing feature layer from the base network can generate a
fixed set of detection predictions by using a set of convolutional filters [50] that are displayed on
the top of the SSD architecture.
For a feature layer of size m × n with p channels, SSD applies small convolution filters, 3
× 3 in size, to compute the location and class scores for each cell. Subsequently, predictions for a
fixed set of default bounding boxes are made. Each prediction contains its own boundary, expressed
as shape offsets relative to its default box, and scores for all classes. The class set includes a class 0,
reserved for outputs signaling no object detection. The YOLO model, instead of using a convolutional
filter, adopts an intermediate fully connected layer, which SSD discards.
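Following the SSD formulation, a feature layer of size m × n with k default boxes per cell and c classes produces (c + 4) · k · m · n output values. The sketch below applies that count to hypothetical figures (the 19×19 map, 6 boxes and 21 classes are illustrative, not taken from this work's deployment):

```python
# Number of SSD outputs for one feature layer: (c + 4) * k * m * n,
# where each of the k default boxes per cell predicts c class scores
# plus 4 shape offsets relative to its default box.
def ssd_layer_outputs(m: int, n: int, k: int, c: int) -> int:
    return (c + 4) * k * m * n

# Hypothetical example: a 19x19 feature map, 6 default boxes per cell,
# 21 classes (20 object classes plus the "no detection" class 0).
print(ssd_layer_outputs(19, 19, 6, 21))  # prints 54150
```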
Figure 2.14: SSD Architecture, featuring VGG16 as feature extractor [7]
2.2 Summary
Both technologies have been proven, through commercial applications and research projects, to
be of use in the context of large-scale, real-time occupancy detection. The PIR technology has
been featured far more regularly with seat occupancy monitoring, through installation of sensors
beneath desks and tables. Computer vision, and more concretely CNNs, on the other hand, has
been relied upon primarily for overall room occupancy and movement tracking.
PIR sensors do show some limitations, however, namely the confinement of a module's efficacy
to a single seating option. The technology lacks range and is subject to blockage, and so is
limited in its potential for sweeping larger areas. This represents a great cost in securing coverage
for a full building, as a large quantity of components becomes necessary to build and maintain a
network. Due to the form of installation, this ad hoc nature also prevents greater flexibility, both in
management of the layout and in feature expansion.
By contrast, the concept of networked modules running CNNs is far more open-ended
in its capabilities. The potential area coverage per module far exceeds that of PIR sensing,
and the wide array of tools accessible with limited computing power provides a pathway for
further improvement and diversification of the application, namely the option of detecting common
objects occupying seating areas.
Algorithm families such as You Only Look Once (YOLO) [16] and Single-Shot Detector
(SSD) [51] present a more realistic option for video stream and real-time processing as one-stage
detectors, while Region Based Convolutional Neural Networks [52, 53, 54, 55] (R-CNN) provide
more reliable results at the cost of speed with its two-stage approach [7, 6].
Existing examples from the promising field are an incentive for further research and the pursuit
of a solution to the problem at hand using this technology.
Chapter 3
System Requirements
The following analysis aims to define and explore the client and system requirements. Addition-
ally, it provides information about the needs met by the product, its capabilities, its operating
environment, properties and user experience.
Firstly, the functional requirements are set (Section 3.1). These client needs were determined
after meeting with the Library's Services, confirming and elaborating on the main objectives of
the solution.
Once the needs are established, the non-functional requirements, whose main purpose is to
support and guide the resolution of the customer's needs, are listed. They are split into two
main subgroups, namely the area dealing with the occupancy detection itself (Subsection 3.2.1)
and the corresponding interface for data visualisation and statistical upkeep (Subsection 3.2.2).
3.1 Functional Requirements
These are a direct result of ongoing discussions during client and orientation meetings, engaging
with the needs met by the product, its capabilities, operating environment, properties and user
experience.
#    Description
CN1  The system must be able to track and map each seating area's current status.
CN2  The system should be devoid of proprietary components.
CN3  The system must include a user interface and cloud-hosted dashboard, including
     occupancy history and statistics for administrators.
CN4  The system should work modularly and allow for easy expansion.
CN5  Any system setup should guarantee a low or competitive cost, relative to similar solutions.
CN6  The modules should be compact and non-invasive.
Table 3.1: Client Needs
3.2 Non-Functional Requirements
How the product will be designed, in order to satisfy the previously mentioned needs, is essential
for securing a coherent system design and architecture. The following are divided into two sub-
sections, each corresponding to a different index (Occupation Detection: OD; Dashboard: DB).
Every requirement is classified according to its priority (Mandatory or Desirable) and satisfies at
least one Client Need (CN).
3.2.1 Occupancy Detection
#    Description                                                          Needs          Priority
OD1  The system shall reliably detect, count and signal the presence      CN1            Mandatory
     of people within a certain area.
OD2  The system should reliably detect the presence of common             CN1            Desirable
     objects within a certain area.
OD3  The system must be able to support usual data operations             CN1, CN3       Mandatory
     (including transmission, processing and storage) for all modules
     over the expected lifetime of the system.
OD4  The system's technical design (hardware, databases, etc.) must       CN4, CN5, CN6  Mandatory
     be able to scale and support projected use across floors.
OD5  The system must provide a near real-time response to detections      CN1            Mandatory
     and submit them to the central dashboard.
OD6  The system must support the repair or upgrade of a component in      CN2, CN4, CN5  Desirable
     a running system, or with minimised downtime.
OD7  The system must not record any type of personal data, as             CN1            Mandatory
     established per GDPR [38] rulings (image, video, body
     temperature, etc.).
Table 3.2: Requirement Analysis: Detection System
The defined needs are fulfilled by this set of requirements on the hardware level. In terms of
targeted detection, the priority remains detecting people (OD1) occupying certain areas. However,
it became apparent that the possibility of signaling the presence of common objects (OD2), such as
books, handbags or folders, is of interest to the Library's management. OD3 and OD5 relate
to reliability and transmission of the necessary data for a proper integration of the dashboard.
OD4 and OD6 demand an open-sourced, modular system, so as to secure ease of maintenance and
expandability. Privacy concerns (OD7) dictate that the system must be capable of processing relevant
inputs without resorting to storage of sensitive data.
3.2.2 Dashboard
#    Description                                                          Needs      Priority
DB1  The user must have access to seat status mapping, and                CN1, CN3   Mandatory
     administrators to relevant information, such as timestamps of
     state changes and statistics related to the time each seat is
     occupied.
DB2  The system must be extensible and/or have the ability to accept      CN2, CN4   Desirable
     new features or functionalities.
DB3  The system must include access to a modifiable, cloud-hosted         CN3        Desirable
     database.
Table 3.3: Requirement Analysis: Dashboard
The constitution of the dashboard is of a less defined nature, as its main feature is to display
a mapping of all relevant areas, as well as a complete history of the collected data. DB1 alludes to a
more concrete structure of the required parameters, including timestamps and a status designation
for each seating area.
3.3 Summary
The set of requirements listed creates guidelines for the design of a functional system. For this
purpose, an overall analysis of the client needs was followed by a more detailed approach, ac-
cording to a preliminary division of the system in two sections. After an initial stage of defining
the product’s main goals, as well as the necessary market research, the blueprint is laid out for an
adequate definition of the system’s architecture.
Chapter 4
System Architecture
Given the definition of the problem, the context of commercially available solutions, and the re-
quirements set by the previous chapter’s analysis, it is reasonable to consider the most fitting
approach to be one based on vision and image processing.
This chapter will address the main blocks needed for the implementation of the proposed
application, as well as the respective interactions. Sections 4.1 and 4.2 engage in a similar division
to Chapter 3, detailing the components related to the aspect of image processing and analysis,
followed by the system visible to the user, namely the one encompassing a dashboard, supported
by an accessible web interface and database.
4.1 Vision-based Solution
In comparison to an architecture relying on an individual device per seat, such as an infrared-sensor
(IR-sensor) [1, 19, 20, 21, 22, 23, 24], this proposal intends to multiply the detection potential per
module. Regardless of other diverging parameters, every module consists of three key blocks,
namely a processing unit, an individual camera attached to the unit, and an application centered
around a detection algorithm, selected and modified according to the specificity of the targeted
space.
4.1.1 Processing Unit
The processing unit in the presented situation is tasked with running a detection algorithm, while
controlling and processing the video stream captured by the camera. The dimensions of the device
should allow for greater flexibility and ease of installation, especially regarding access to power
and network sources. Additionally, it guarantees the transmission of data to be accessed by the
developed interface, meaning the inclusion of an Ethernet port is necessary for added reliability
and stability.
4.1.2 Camera Module
Functioning in an integrated form with the computing unit, the camera captures and provides direct
access to its video feed. The device preferably includes a Camera Serial Interface (CSI) [56] so as to
secure higher bit rates (over 2 Gbit/s). The desired mounting with the respective processing unit
will reduce power cabling, as the camera module is supplied through the mentioned CSI port.
The lens option should cover the necessary field of view (FOV) for the targeted areas, which are
mostly replicated across the library’s floors, and clearly delimited by shelves populating the space
(Figure 4.1). Resolution must be maximized and takes priority over frame rate, since quality of
image greatly impacts the success of object detection [9] and movement is largely non-existent or
infrequent in a seated studying environment.
Figure 4.1: Common study area (FEUP Library: 2nd-4th floor) [8]
4.1.3 Definition of detection algorithm
During the initial phase of research, applications similar [39, 40, 57, 58, 59, 60] to the one proposed
used already known models with a clear set of characteristics. YOLOv2 and SSD compare very
favorably regarding speed of processing [42, 43], clocking at around 10x greater FPS than the more
accurate YOLOv3 or the analyzed R-CNNs. Simultaneously, SSD demonstrated lower accuracy
rates, defined as mean Average Precision (or mAP [61, 62]), in its predictions and boundary box
positioning in comparison to its counterparts. In addition to this, there was an issue with lower
resolution imagery and smaller objects, where the SSD failed regularly in detecting an object,
while other models succeeded [48].
However, these traits are relevant for considerably powerful processing units. Even the system
where these findings were confirmed (Table 4.1) possesses far greater computing power than a
potential unit fitting the defined system architecture. Bearing these factors in mind, it becomes
clear that two-stage detectors such as the R-CNNs would perform poorly with lesser hardware. This
remains true even though they reveal high-level precision and sufficient frame rate or speed
of processing for the proposed application, where activity is reduced and movement detection is
not required.
Description  Component
CPU          AMD Ryzen 5 3600 6-Core Processor @3.60 GHz
GPU          NVIDIA GeForce GTX 1650 SUPER (1740 MHz, 4 GB GDDR6)
RAM          16.0 GB
Table 4.1: Main specifications of testing system
The conclusion is further exemplified by comparisons of detection frameworks, such as the
one presented on the research paper introducing the second iteration of the YOLO algorithm [16].
In this instance, trials were run using the PASCAL VOC dataset, and the Geforce GTX Titan X
(1000 MHz, 12 GB GDDR5) as the graphics processing unit (GPU). The comparison (Table 4.2)
shows the faster iterations of the R-CNN framework to clock at around 7 FPS, rather insufficient
given the predicted downgrade in processing capabilities.
Detection framework    Train      mAP   FPS
Fast R-CNN             2007+2012  70.0  0.5
Faster R-CNN VGG-16    2007+2012  73.2  7
Faster R-CNN ResNet    2007+2012  76.4  5
YOLO                   2007+2012  63.4  45
SSD300                 2007+2012  74.3  46
SSD500                 2007+2012  76.8  19
YOLOv2 288x288         2007+2012  69.0  91
YOLOv2 352x352         2007+2012  73.7  81
YOLOv2 416x416         2007+2012  76.8  67
YOLOv2 480x480         2007+2012  77.8  59
YOLOv2 544x544         2007+2012  78.6  40
Table 4.2: Comparison of detection frameworks (from Table 3 [16]), PASCAL VOC dataset [17]
Regarding the two remaining one-stage detectors, which are indeed the most common options in
similar applications, SSD300 and YOLOv2 stand out as the better compromise between quality
and speed. Given their comparable level, both qualify as reasonable possibilities requiring further
testing in the final environment, though previous analysis [9, 43] does show distinctions in
performance when working with different datasets and differing resolution inputs. In that regard,
while smaller objects tend to be detected less frequently using SSDs, they do present better outcomes
in the fastest detectors category, especially when using the Common Objects in Context dataset
(Figure 4.2). This factor constitutes a considerable advantage, as the COCO [10] dataset is preferred
for this type of solution, providing a large-scale, open-source and particularly effective tool, when
targeting people recognition. It is also in constant development, sponsored by some of the biggest
entities of the field such as Microsoft, the Common Visual Data Foundation and Facebook.
Figure 4.2: Comparison of several detection frameworks [9], COCO dataset [10]
The model of reference going forward is therefore the SSD300, coupled with MobileNet as
feature extractor (lowest GPU time per Figure 4.3) and utilizing the COCO dataset.
Figure 4.3: GPU time (milliseconds) for each model, for image resolution of 300x300 [9]
4.2 Dashboard
Working in parallel to this array of modules is a web interface, displaying a dashboard with rel-
evant data, which in turn will be accessed, stored, and manipulated in a database. These two
elements could be fully integrated or work in tandem, the essential aspect being they represent a
separate system from the vision-based design.
The interface is hosted through a web service, receiving data entries in JSON [63] format from
each individual processing unit. The data is of low complexity, constituted by arrays of integers
containing seating area designation and its respective binary status: occupied or vacant. These
entries can be executed on each module, either by posting directly to an interface based on HTML
through HTTP requests using cURL [64], or populating an independent database (Figure 4.4) by
running SQL commands.
The database would consist of three main classes for structuring of the data. A Floor is iden-
tified by its unique id integer, correspondent to the layout of the building. Every iteration of this
class has one or more detection modules, or Device, associated to itself. The main supported
method is updateTotalCount(), which pulls the total amount of occupied seats indicated by the
pool of associated modules.
A Device is equally identified by its unique id, defined by the administrator, and is responsible
for monitoring at least one designated seating area. The main method getStatus(in id:integer)
indicates the current status for the identified Area. The Area class contains reference to the status,
which can alternate between 0 and 1, and the respective timestamp to each update.
Figure 4.4: Data Structure; each library floor contains a number of active devices responsible for monitoring several seating areas
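The three classes described above can be sketched in a few lines of Python, purely for illustration: the class and method names follow Figure 4.4, while the attribute layout and in-memory storage are assumptions (the real data would live in the SQL database):

```python
class Area:
    """A monitored seating area: binary status plus timestamp of the last update."""
    def __init__(self, id: int):
        self.id = id
        self.status = 0          # 0 = vacant, 1 = occupied
        self.timestamp = None    # set on every status update

class Device:
    """A detection module, responsible for at least one seating area."""
    def __init__(self, id: int, areas):
        self.id = id
        self.areas = {a.id: a for a in areas}

    def getStatus(self, id: int) -> int:
        """Current status of the identified Area."""
        return self.areas[id].status

class Floor:
    """A library floor aggregating one or more detection modules."""
    def __init__(self, id: int, devices):
        self.id = id
        self.devices = devices

    def updateTotalCount(self) -> int:
        """Pull the total amount of occupied seats across all associated modules."""
        return sum(a.status for d in self.devices for a in d.areas.values())
```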
The simple, and singular, transmission of data for each change of state allows for reliability in
communication between systems, and flexibility in building the different tools for the display of
information.
4.3 Summary
The complete architecture of the vision-based system rounds out the specifications of each major
component, according to the product’s objectives and adapted to the final environment. The com-
munication between subsystems will consist of status updates sent out by all modules to the web
interface. Variances throughout the targeted space can possibly determine changes to the overall
design, primarily with regards to the camera module’s lens, though the building remains fairly
consistent and uniform in layout. The implementation of the presented concept is to be thoroughly
tested in a limited area, before being replicated across major areas of the building. The object
detection algorithm, MobileNet SSD300, is selected for deployment, based on past experiences
and characteristics of available models.
Figure 4.5: Model representing major system components and communications
Chapter 5
System Implementation
Following the definition of the solution’s overall architecture, the next step consists of selecting
fitting components towards the implementation of the system. As the system revolves around the
concept of signaling the presence of people (and possibly a certain set of objects), the preliminary
stage of development began with researching suitable object detection algorithms within the scope
of the concept. Analysing these initial findings provided a framework for targeting the remnants
of the system, which include a processing unit and corresponding camera module. Isolated from
these elements is the platform used for the construction of a dashboard and data storage, which
will be reviewed as well.
5.1 Module Composition
5.1.1 Processing Unit
Considering the wide range of possibilities regarding the choice of processing unit, there are solu-
tions designed directly for the purpose of running neural networks, such as the Jetson Nano Devel-
oper Kit [65]. While including specialised software libraries for deep learning, computer vision,
GPU computing and multimedia processing, which facilitate the development process, it does rep-
resent higher costs per unit and proprietary challenges. The Raspberry Pi models, however, are a
popular, versatile and community-supported option for this type of application [66, 67, 68].
As such, the recent Raspberry Pi 4 Model B [69] (Figure 5.1) became a natural solution,
providing more flexibility regarding algorithm adjustments, given its maximization of computing
capability, as well as familiarity of use. It presents a significant leap in processing (Broadcom
BCM2711, quad-core Cortex-A72 (ARM v8) 64-bit SoC @1.5GHz) and connectivity (Wireless
2.4 GHz and 5.0 GHz IEEE 802.11b/g/n/ac) when compared to previous devices. It also remains
silent and portable, as well as requiring similarly low energy consumption levels.
In terms of communications protocols, much like its counterpart the Jetson Nano, it supports
the standards GPIO, I2C, I2S, SPI, and UART. More vitally, the inclusion of Gigabit Ethernet and
a CSI port secures the transmission of data to the dashboard and the connection to the on-board
camera module, respectively, as per the system architecture. The performance resembles that of a
basic x86 PC at a
reduced cost. Should system demands rise in the future, either through substitution of the running
detection algorithm or another sort of feature expansion, the Raspberry Pi enables the inclusion of
computer vision accelerators. The leading options currently available are the Coral USB Acceler-
ator [70] and the Intel Neural Compute Stick 2 [71].
Figure 5.1: Raspberry Pi 4 Model B (8GB RAM) [11]
5.1.2 Camera Module
Given the characteristics of the algorithm and the respective processing unit, the resolution of the
captured image is fundamental to enhance the precision of the results. As for physical features,
the overall size of the module must be supported by the mounted unit, and include the CSI interface
so as to maximize speed of processing.
The Raspberry Pi Camera Module v2 [72] (Figure 5.2) supports 1080p30 and 720p60 video
streaming, providing the necessary image quality for a normal execution of SSD, which, as
previously mentioned, can function abnormally when confronted with lower resolutions. A 15 cm
ribbon cable connects to the CSI port on the Raspberry Pi and allows for many options in positioning
and angling of the lens.
The combined sensor image area (3.68 x 2.76 mm - 4.6 mm diagonal), optical size (1/4”),
focal length (3.04 mm), horizontal (62.2 degrees) and vertical (48.8 degrees) fields of view (FOV)
create an image capable of covering and detecting objects around 5m to either side and 10m deep,
as determined per tests in select areas of the library. This configuration also allows for minimal
distortion, compared to other cameras with greater FOV. This distortion, in turn, can be practically
eliminated by proper calibration. Compatibility for attaching other types of lenses is extensive,
should different areas require another approach. Finally, numerous third-party libraries are created
and referenced, including the Picamera Python library [73].
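As a rough plausibility check of the quoted coverage, the horizontal half-width covered at a given depth follows from the module's 62.2-degree horizontal FOV. The sketch below assumes a simple pinhole model and ignores lens distortion, so it is only an approximation of the field-tested figures:

```python
import math

def half_width_at_depth(depth_m: float, hfov_deg: float = 62.2) -> float:
    """Horizontal half-width covered at a given depth, pinhole-camera model."""
    return depth_m * math.tan(math.radians(hfov_deg / 2.0))

# At 10 m depth, the v2 module's 62.2-degree HFOV spans roughly 6 m to either
# side, broadly consistent with the ~5 m of usable detection width observed.
print(round(half_width_at_depth(10.0), 2))
```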
Figure 5.2: Raspberry Pi Camera Module v2 [12]
5.2 Image Segmentation and Analysis
Figure 5.3: 2nd Floor Plan, camera position (dotted circle) and covered area (black)
The complete module can be installed to cover practically any given location of the library, as
there are continuous wiring and communications ports distributed across the false ceilings of the
main floors. Since these floors are generally identical, once the concept is proven and tested in a
selected area, it can be replicated to others with minor adaptations.
The test site selected is the area depicted on the 2nd floor plan (Figure 5.3). Represented are
the camera module placement, as well as the covered area, resulting in the image of Figure 5.4. On
this site, 4 seating areas (Figure 5.5) are adequately covered and were the basis for the diagnosis
and correction of the system functions.
Figure 5.4: Still from equivalent area depicted on Figure 5.3
Figure 5.5: Seating areas
5.2.1 General Concept
With the module in place and capable of capturing image frames, it is possible to outline and
identify the planned seating areas, as broadly illustrated in Figure 5.5. These zones are
to be designated either as "occupied" or "vacant", somewhat independently of the number of people
that could find themselves near these spots, as the goal remains to count and identify used seats,
not a total number of people present on the floor. Therefore, single or multiple detections within
these areas indicate the same outcome.
A detection in a targeted area is defined by the formation of singular bounding boxes indicating
the presence of an object labeled as "person", classified according to the COCO dataset. Once
that occurs, the status of the area can be considered "occupied". The inverse status, "vacant", is
confirmed following a defined limit of non-detections. This procedure accounts for shortcomings
of the module in detecting people on every frame presented, even if they are indeed present, and for
the fact that a seated library environment signifies fewer occasions of movement or state changes,
allowing for longer periods of analysis. Higher limits naturally result in lower reactivity of the
system.
Figure 5.6: Contention Overlapping Areas, final status determined by the dashboard
Another relevant aspect is the idea of overlapping areas, one prime example being "Area 1"
(Figure 5.5). As the architecture presupposes the formation of a network of cameras across an
entire floor, certain predetermined zones are to be covered simultaneously by a pair of modules.
Assuming another unit is installed on the other side of the shelving, two devices could provide
feedback on the status of "Area 1", which minimizes the occurrence of errors. In these cases,
status updates relating to a common identified area are transmitted by all devices involved and
arbitration is done at the higher level of the dashboard, demanding full agreement to consider the
final status to be "vacant". If any singular device sends an "occupied" status as its most recent
update, the area is considered "occupied" (Figure 5.6).
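The arbitration rule for such overlapping zones can be condensed into a single predicate. The sketch below is a simplified illustration of the dashboard-side logic, not the actual Thingsboard configuration:

```python
def arbitrate(latest_updates) -> int:
    """Final status of a shared area from the most recent update of each device.

    The area is only considered vacant (0) when every contending device
    agrees; a single "occupied" (1) update forces the occupied status.
    """
    return 0 if all(s == 0 for s in latest_updates) else 1

# Two modules covering "Area 1": one still reports occupied.
print(arbitrate([0, 1]))  # prints 1
print(arbitrate([0, 0]))  # prints 0
```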
5.2.2 Algorithm Execution
The deployment of the SSD300 on the processing unit, using the Caffe framework [74], is
accomplished through a script running on Python 3.7. The script relies on OpenCV [75] as its main
library, as it is fully open-sourced and possesses extensive documentation on its modules [76],
including functionalities relating to image processing, object detection, neural networks, and camera
calibration [77]. The core of the application (Figure 5.7) is responsible for processing the captured
frames and classifying the status of each area.
Figure 5.7: Image processing loop in application
5.2.2.1 Frame Selection
Firstly, a video capture object is created, from which frames will be periodically collected. The
period depends on the overall processing time, combined with a counter. This counter (area_cycle in
Figure 5.7) acts as a sampling rate, hard-coded to define how many consecutive frame updates
of the same area are passed through the network, thereby minimizing the error rate. The image
passed through the neural network on each phase of the round-robin cycle (Figure 5.8) is a 70x70
cropping of the resized frame, equivalent to the bounding boxes present on Figure 5.5. A full
cycle's duration, where every area is analyzed once, depends on the total number of areas associated
to a device, the defined sampling rate and the average processing time per frame. Therefore, it
lasts:
areas_total × sampling_rate × processing_time_average (5.1)
Figure 5.8: Frame selection cycle, each transition occurring after sampling_rate processed frames.
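Plugging illustrative values into Equation 5.1 (the figures below are hypothetical, not measured on the actual unit) gives, for instance:

```python
def cycle_duration(total_areas: int, sampling_rate: int,
                   avg_processing_time: float) -> float:
    """Duration of a full round-robin cycle, per Equation 5.1."""
    return total_areas * sampling_rate * avg_processing_time

# Hypothetical figures: 4 areas, 5 frames sampled per area, and 0.8 s of
# average processing time per frame yield a 16 s full cycle.
print(cycle_duration(4, 5, 0.8))
```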
5.2.2.2 Status Analysis
Once the area for analysis is defined, the corresponding 70x70 input is resized to 300x300, the
standard for MobileNet, and is normalized by performing a mean subtraction (127.5, 127.5,
127.5) on the RGB channels. The resulting input from using blobFromImage() [78] is then
forwarded through the neural network for detection (Figure 5.9).
Figure 5.9: Use of OpenCV modules for image resizing, normalization and network forwarding
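The effect of this preprocessing on a single pixel can be reproduced in plain Python. Only the mean subtraction described above is sketched here; the actual blobFromImage() call additionally handles resizing, channel ordering and an optional scale factor:

```python
MEAN = (127.5, 127.5, 127.5)  # per-channel means used for CaffeNet-style models

def subtract_mean(pixel, mean=MEAN):
    """Mean-subtract one RGB pixel, as done channel-wise by blobFromImage()."""
    return tuple(c - m for c, m in zip(pixel, mean))

# A mid-grey channel lands at zero; black and white map to -127.5 and +127.5.
print(subtract_mean((127.5, 0.0, 255.0)))  # prints (0.0, -127.5, 127.5)
```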
Mean subtraction is used to help counter illumination changes present in input images, serving as
an aiding technique for Convolutional Neural Networks. Each image contains a certain
average pixel intensity for each of the Red, Green, and Blue channels. The mean values differ for
each training set, as is the case for ImageNet (example in Figure 5.10), for which the RGB figures
are R=103.93, G=116.77, and B=123.68. In the case of CaffeNet, the values all stand at 127.5.
Figure 5.10: A visual representation of mean subtraction, where the RGB mean (center) has been calculated and subtracted from the original image (left), resulting in the output image (right) [13].
There are two alternative courses upon finding the output from the network, according to the
definition of the general concept.
In case there is a detection belonging to the "person" classification of the dataset, a detection
is immediately signaled by attributing a "1" value to the correspondent position of the detect array
(Figure 5.7). Every position in the mentioned array is a placeholder for the equivalent seating area,
i.e. Area 1 status would be changed by accessing detect[1].
On the other hand, should no detection of the "person" class emerge, a no_hit counter is incremented;
upon hitting a predetermined limit, it results in a change of status to "0". On completion, in both
cases, the process returns to the frame updating phase. The area_cycle counter defines which area
is to be analysed next.
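The branch just described can be condensed into a few lines. This is a simplification of the actual script: the detect array and no_hit counter come from the text above, while NO_HIT_LIMIT is an illustrative name and value for the predetermined limit:

```python
NO_HIT_LIMIT = 3  # predetermined limit of consecutive non-detections (illustrative)

def update_status(detect, no_hit, area, person_found):
    """Update one area's slot in the detect array after a network pass."""
    if person_found:
        detect[area] = 1       # immediate "occupied" on any "person" detection
        no_hit[area] = 0
    else:
        no_hit[area] += 1      # count the miss...
        if no_hit[area] >= NO_HIT_LIMIT:
            detect[area] = 0   # ...and flip to "vacant" once the limit is hit

detect, no_hit = [0, 0, 0, 0], [0, 0, 0, 0]
update_status(detect, no_hit, 1, True)   # person seen in Area 1 -> occupied
for _ in range(NO_HIT_LIMIT):            # three misses in a row -> vacant again
    update_status(detect, no_hit, 1, False)
print(detect[1])  # prints 0
```

Higher values of NO_HIT_LIMIT trade reactivity for robustness, matching the observation that longer limits lower the system's responsiveness.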
5.3 Dashboard and Data Storage
The system architecture described in Chapter 4 established the necessity of not only a dashboard, but
also the creation and maintenance of an available database to support it, even considering the possibility
of having both components be fully integrated. In that regard, Thingsboard [15] provides a sound
alternative for implementation, as it includes both aspects in its architecture (Figure 5.11).
Thingsboard is a free of cost, open-sourced, thoroughly documented [14], and customisable
platform, including a HTTP-based (along with MQTT, CoAP) API for connectivity and REST
APIs for the server-side, based on Java and Python. All ThingsBoard components can be launched
in a single Java Virtual Machine (JVM) and share the same OS resources. The memory required
to run ThingsBoard is also kept to a minimum, allowing for launches with 256 or 512 MB of RAM
in constrained environments, such as a Raspberry Pi's operating system. The same OS is fully
supported by a native installation of the platform.
Figure 5.11: Thingsboard Monolithic Architecture [14]
Dashboards can be created and hosted by a Web UI on the server side, collecting data from the
Thingsboard Core (Figure 5.11). The Core is, in turn, supported by a database, either PostgreSQL
or Cassandra as the NoSQL option. The content on the dashboard is presented through use of
a built-in widget library [79], editable by tools supporting HTML and Javascript advanced com-
mands. The administrator can also create their own applications based on the appropriate JSON
format.
In the context of the dissertation, Thingsboard functionalities are harnessed in two major as-
pects, namely the organization and post-processing of data. Device profiles are created for every
existing "Area" controlled by the module. These profiles support the management and storage
of telemetry data, which is essentially constituted of area designation and status variables. These
integer types are packaged using the JSON format.
Each profile contains its own unique access token, used by the HTTP protocol to reference the
destination of each cURL request sent by the application running on the Raspberry Pi, either to
a local server or the cloud. A request is sent to its respective device telemetry upon each status
change, from "0" to "1" and vice-versa.
Figure 5.12 details how the process is executed. Once the condition for a state change of a
particular area is triggered, the saved status (previous_detect) is updated. A data bundle is
then built in JSON format, containing the area designation and the corresponding detection
condition. Depending on the area, the destination URL is constructed, referencing (in order)
the host, the API, the access token, and the telemetry category.
Figure 5.12: cURL requests: URL composed by host, access token and telemetry specification
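The request construction just described can be sketched as follows. This is an illustrative reimplementation, not the dissertation's code: the host value and token string are placeholders, and the payload field names are assumptions; only the URL scheme (ThingsBoard's device HTTP API, `http(s)://host/api/v1/ACCESS_TOKEN/telemetry`) follows the platform documentation [14].

```python
import json

# Placeholder host; a deployment would point at its local server or cloud instance.
THINGSBOARD_HOST = "http://localhost:8080"

def build_telemetry_request(access_token, area, status):
    """Compose the destination URL, JSON payload, and equivalent cURL command
    for one status update ("0"/"1" as integers) of one seating area."""
    # ThingsBoard device HTTP API: the access token addresses the device profile.
    url = f"{THINGSBOARD_HOST}/api/v1/{access_token}/telemetry"
    # Data bundle: area designation and detection condition, packaged as JSON.
    payload = json.dumps({"area": area, "status": status})
    # The shell command the application would issue via cURL.
    curl_cmd = ["curl", "-X", "POST",
                "-H", "Content-Type: application/json",
                "-d", payload, url]
    return url, payload, curl_cmd
```

A status change from "0" to "1" for Area 0 would thus produce one POST to that area's token-specific telemetry endpoint.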
The concept of telemetry facilitates the organization and use of data relevant to the dashboard,
including processing capabilities such as accessing both the current and the previously received
values. This feature is relevant for the arbitration of overlapping zones, introduced in Section
5.2.1 (Figure 5.6). Applying the concept, contending modules send out their status updates using
an identical access token, identifying the same seating area. When presenting information on the
dashboard, such zones are only considered "vacant" upon the reception of two consecutive,
agreeing updates. Since data is only sent upon a change of status, this logic keeps overlapped
zones coherent with the other, singly monitored areas.
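The arbitration rule above can be expressed as a small state holder per overlapped zone. This is an illustrative sketch: the class and method names are not from the implementation; only the two-consecutive-agreeing-updates rule comes from the text.

```python
class OverlapArbiter:
    """Arbitrates updates that share one access token for an overlapped zone."""

    def __init__(self):
        self.last_update = None  # previous status received ("0" or "1")
        self.reported = "1"      # status currently shown on the dashboard

    def receive(self, status):
        """Process one update from either contending module; return the
        status to present on the dashboard."""
        if status == "1":
            # An occupancy report from either module marks the zone occupied.
            self.reported = "1"
        elif self.last_update == "0":
            # "Vacant" only after two consecutive agreeing "0" updates.
            self.reported = "0"
        self.last_update = status
        return self.reported
```

Under this rule a single "0" from one module leaves the zone "occupied" until the second module agrees, while any "1" immediately restores occupancy.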
5.4 Summary
In accordance with the specifications of the system architecture, the Raspberry Pi 4 Model B was
selected as the on-board processing unit. The Raspberry Pi Camera Module v2 was the indicated
mounted camera for carrying out image capturing, thus completing the modular prototype.
The image processing and analysis pipeline is outlined, including the definition and description
of the test site. The general concept, to be proved for replication to other areas, is presented.
Execution of the algorithm implementing the concept is chronicled, along with the in-depth aspects
of frame selection and status analysis.
Dashboard and data storage access are combined through the use of the Thingsboard plat-
form. Compatibility with the processing unit, architecture, and communication processes with
each module are addressed.
Chapter 6
System Testing
This chapter provides insight into the methodology of system testing and the evaluation of the
presented results. Subsection 6.1.1 examines the working condition of the module and analyses
each seating area's individual results. Subsection 6.1.2 presents the usage data provided by the
tools of the dashboard's API and explores some options for presenting data to the user. Section
6.2 concludes the chapter with a summary of the analysis.
6.1 Methods and Results
6.1.1 Model Behavior
The test configuration of the application is set to sample 6 frames per area on the round-robin
cycle, requiring a value of at least 100 on the corresponding no_hit counter to change a seating
area's status from "occupied" to "vacant". Each phase's processing time varies slightly around 3
seconds, equating to 2 FPS, so each full turn of the round-robin lasts in the range of 12-13
seconds, in accordance with the estimation in Subsection 5.2.2.1. Together, the defined no_hit
limit and the duration of a full cycle produce a delay of around 50-55 seconds for each state
transition to "vacant".
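The counter logic can be sketched as below. Only the names no_hit and previous_detect come from the text; the class structure is illustrative, and since the exact granularity of the counter increments is not restated here, the sketch simply treats each call as one detection attempt for the area.

```python
NO_HIT_LIMIT = 100  # the limit used in the test configuration described above

class AreaState:
    """Per-area occupancy state driven by the round-robin detection cycle."""

    def __init__(self):
        self.previous_detect = 0  # 0 = vacant, 1 = occupied
        self.no_hit = 0           # consecutive attempts without a detection

    def update(self, person_detected):
        """Process one detection attempt; return the new status on a
        transition (sent to the dashboard), or None when nothing changes."""
        if person_detected:
            self.no_hit = 0
            if self.previous_detect == 0:
                self.previous_detect = 1
                return 1  # transition to "occupied" is immediate
        else:
            self.no_hit += 1
            if self.previous_detect == 1 and self.no_hit >= NO_HIT_LIMIT:
                self.previous_detect = 0
                return 0  # transition to "vacant" only after the limit
        return None
```

The asymmetry matches the behaviour described above: occupancy is asserted on the first confident detection, while vacancy is asserted only after a sustained run of misses.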
Figure 6.1: Module’s CPU and Memory Usage while running the application
The delay parameters, as previously stated, can be altered to eliminate unwanted intervals in
the context of library management. Short breaks (under 5-10 minutes) by students who will quickly
return to their place are effectively irrelevant for determining seat availability. However, at
this stage it remains useful to maintain a minimal temporal window, so as to analyse more
effectively the responsiveness of the model across differing areas.
Privacy concerns constrain the evaluation of the system, since footage including the likeness of
other people must not be recorded and stored. Excluding this possibility closes off some avenues
for large-scale, systematic evaluation, such as confidently rating the mAP. In the context of
this application, however, it is more important to gather observational evidence and to evaluate
the model qualitatively against the desired outcomes.
Figure 6.2: Original (green) and provisional (red) cameras, targeting Area 1
For this purpose, state transitions were supervised during certain periods while monitoring the
test site live. State transitions are depicted as ON to OFF, and vice-versa, equivalent to
declaring a certain seating area occupied (ON) or vacant (OFF). Theoretically, there can be no
false ON designations, and none were observed: any detection and subsequent classification must
clear the defined confidence score threshold (0.2 out of 1.0). Unjustified downtime, however,
remains a concern. These are cases where a student remains in place, yet an OFF-status update
still occurs due to a lack of detections over several round-robin cycles, thus reaching the
no_hit limit. The most relevant window of observation was the 24th of May, when a temporary
installation of a second camera (Figure 6.2) on the other side of the shelving made it possible
to test the concept of overlapping areas, introduced previously (Subsection 5.2.1, Figure 5.6).
Figure 6.3: Seating areas tested, targeting Area 1
Figure 6.4: Area 0 On-Off Status Chart, 08:00 - 19:30, 24-05-21
Area 0 (Figure 6.4) finds itself in essentially ideal conditions of lighting, distance, and image
resolution. Status updates are consistent and coherent, apart from select downtimes lasting less
than a minute. These downtimes correspond to a missed detection momentarily translating into
"vacant" status, immediately inverted after the next cycle completes.
These minor errors, however, would be eliminated in a fully working system tolerant of small
intervals. Such intervals, as previously defined, include short absences by each occupant
(totaling under 5 minutes), which are irrelevant for declaring a seating option vacant for
other users. So while there are shortcomings in responsiveness, the individual 1-minute
OFF-miscategorisations do not represent a failure of the system.
Figure 6.5: Area 1 On-Off Status Chart, 08:00 - 19:30
Area 1 (Figure 6.5) shows the effect of its location, at an extreme distance compared to its
counterparts. Minimal intervals, as seen in Area 0, persist and become more frequent. More
importantly, wider, unjustified gaps (see arrow) over 5 minutes long begin to emerge, which are
damaging to data integrity. Figure 6.6 shows how the overlap logic, with a second monitoring
input enabled by another camera, can help: the dual input compensates for low detection rates
and brings performance close to that of sufficiently covered zones, such as Area 0.
Figure 6.6: Area 1 On-Off Status Chart (w/ provisional camera), 08:00 - 19:30, 24-05-21
Figure 6.7: Area 2 On-Off Status Chart, 08:00 - 19:30, 24-05-21
Sharing the same common table space, Area 2 (Figure 6.7) and Area 3 (Figure 6.8) illustrate the
limitations of the model. Area 2 matches the efficacy of Area 0, as it sits centrally in the main
camera's FOV. Area 3, however, evidences a degradation in performance due to its location at the
edge of the frame, suffering the negative effects of lower resolution and minor distortion. This
zone also experiences minor downtimes more frequently, yet no wider gaps were recorded that could
affect a more tolerant configuration of the system, unlike the single-camera version of Area 1's
monitoring.
Figure 6.8: Area 3 On-Off Status Chart, 08:00 - 19:30, 24-05-21
Introducing a greater permanence of the ON-status (exemplified by Figure 6.9) proves the concept
functional. In another testing round, the no_hit limit was raised from 100 to 500, showing how
shorter absences, not meaningful for asserting a seat's availability, are ignored, along with the
minor interruptions evidenced in the previous testing round. With this configuration, there is no
practical discrepancy between the number of people observed and the number reported through
detection.
Figure 6.9: Area 0 On-Off Status Chart, 08:00 - 17:00
The more demanding trial run, however, showed the influence environmental factors can have on
each area's results. When scaling the concept to a building-wide system, adaptations have to be
carefully undertaken to ensure precision and efficacy. These adaptations can range from
alternative camera positions and differing lens options, for greater resolution or less
distortion, to assigning certain areas to more appropriate modules.
6.1.2 Dashboard
The dashboard supported by ThingsBoard includes several built-in options for the display of data,
including diagnostics. The main categories relate to API usage, namely the volume of transport
messages (i.e. HTTP requests, Figure 6.10) and the telemetry data points stored in the database
(Figure 6.11). It becomes clear that a single host is capable of supporting an expansion from the
test site to the entirety of the library's public space.
Figure 6.10: Volume of transport messaging
Figure 6.11: Capacity for telemetry data storage
Looking further into the statistical data over a 2-week period (Figures 6.12, 6.13), the two
graphs predictably mirror each other.
Figure 6.12: Transport hourly activity, 14-day period
Figure 6.13: Telemetry persistence hourly activity, 14-day period
The equivalent information is conveyed through a Time-series [80] Stacked Bar Chart on the public
dashboard, programmed to display the hourly average of state changes over the same time period.
Figure 6.14 shows the variance and spikes of activity across multiple days, and how, for example,
the entrance (08:00-09:00) and lunch (around 13:00) periods are usually the busiest in terms of
movement.
Figure 6.14: Hourly average of state changes over a 2-week period, 4-colored stacked bars
Applying the same time-series data to an equivalent Line Chart (Figure 6.15) demonstrates that
all areas, though displaying some differing traits, follow common trends over longer samples of
time. This is to be expected, given that all areas are located in the same zone and share
behavioural patterns.
Figure 6.15: Hourly average of state changes over a 2-week period, 4-colored lines
Separate area Status Charts (Figure 6.16), similar to the ones viewed in the previous subsection
(6.1.1), can also be included, with their time windows altered to display changes over longer
periods. Combining the four graphs into a single one in stacking mode offers a historical review
of the hourly average count (Figure 6.17).
Figure 6.16: 4 Area Status Charts over a 2-week period
Figure 6.17: Total hourly average over a 2-week period
For the display of the real-time total count (Figure 6.18), an HTML card from the widget library
is customised, including a script that pulls data from the required entities and adds up the
status values (Figure 6.19).
Figure 6.18: Current total calculated upon clicking Update
Figure 6.19: View of widget’s (Figure 6.18) HTML editor
Status mapping is intended to display the location and current occupation of each individual
seating area. This is accomplished by creating an Image Map [81] widget, utilizing the layout of
the selected floor as a background (Figures 6.20 and 6.21).
Figure 6.20: Image Map with zero occupied areas (all green) and thermometer (blue)
Permanent markers are placed on their areas' respective positions, in accordance with the
represented layout. The status of each area is pictured by changing the marker's colour
dynamically: green for "vacant" and red for "occupied". A marker is also placed to indicate the
status of the entire zone, grouping the status of all its areas. Tracking the total count, the
makeshift thermometer evolves in level and colour, rising from blue through green and orange to
red (Figure 6.22), which indicates full capacity.
Figure 6.21: Image Map with two occupied (red) and remaining vacant (green) areas, and thermometer (green)
Figure 6.22: Remaining versions of thermometer marker [15]
6.2 Summary
The usage, parameters, and configuration of the module's running application are presented and
contextualized before analysing the scanned areas' distinct traits. Comparison and contrast
highlight the limitations behind some of the trends, generally influenced by factors such as
resolution, distortion, or lighting. The working version of the system is proven to function
correctly, utilizing a more lenient no_hit limit to eliminate minor errors and irrelevant short
absences.
Usage and diagnostics data provided by the dashboard's own API are discussed and related to
activity information presented on the platform. Further widgets, customised from the ThingsBoard
library, are depicted. Their creation, as well as their variation according to received data, is
described in detail.
Chapter 7
Conclusion and Future Development
7.1 Conclusion
Current widespread methods for seat occupancy monitoring rely on infrared sensing technology for
determining the presence of humans. This technique often proves limiting in terms of reliability,
and is mostly applied to individual seating to avoid inaccuracies caused by blocking or lack of
range. Generally, commercial systems show a strong dependency on large-scale hardware
installations and an inability to detect other forms of occupation, namely by common objects.
As such, in an effort to explore systems capable of greater efficiency and expansion in terms of
functionality, the work of the dissertation focused on the conception, development and validation
of a computer vision-based solution, utilizing the capabilities of convolutional neural networks.
The thesis began with a deeper understanding of the current situation in building occupancy
detection. Researching the state of the art unearthed the main technologies and systems being
applied and developed. From this starting point, it was possible to determine that the
dissertation's proposal for a working system in the context of a library is valuable for its
potential and worthy of exploration.
Future system requirements were defined in accordance with input and demands from the Li-
brary’s services, resulting in the elaboration and analysis of a general architecture. The architec-
ture of the vision-based system rounds out the specifications of each major component, according
to the defined objectives and adapted to the final environment. This includes the definition of
communication protocols and data structure for the dashboard. Options regarding object detection
algorithms are explored, resulting in a narrower selection based on past experiences and charac-
teristics of the models.
Subsequently, a working prototype was implemented on a select testing area, aiming to prove
and replicate the concept over the entire building. The module consists of a Raspberry Pi 4 Model
B for on-board processing, and a mounted Raspberry Pi Camera Module v2 for image capture.
The running application is supported by the MobileNet SSD300 object detector, deployed through
the Caffe deep learning framework. To enable image processing and analysis functions, an
open-source, real-time-optimised computer vision library (OpenCV) was used. The behaviour of the
model was analysed through live observation of the targeted seating areas and the corresponding
state transitions.
The results obtained under stringent testing conditions point to some limitations in areas
experiencing non-ideal conditions, which suffer from occasional mismatches between determined
status and reality. However, the working system, accounting for short breaks and harnessing the
concept of overlapping redundancy, eliminates such errors. Small gaps and real periods of absence
below the five-minute mark are considered irrelevant to determining the true availability of a
seating area for interested parties.
In summary, the primary goals of the dissertation were achieved, with the exception of the
desirable option of detecting occupation by common objects, a task deemed more complex than the
scope of this work allowed. The necessary stages of analysis, design, and implementation of a
computer vision system conceived for seat occupancy monitoring were accomplished.
7.2 Future Development
Though the principal objectives were achieved, it remains possible to ensure greater real-time
accuracy and to allow for a more complete mapping of occupancy. Considering these aspects, future
development should focus on some key areas for system improvement.
The work of this dissertation determined the functionality of the general concept and ascertained
the differing aspects between the analysed areas. Verifying the error in total people counting
would demand a wider testing setup to guarantee a large enough sample, as was done in previously
reviewed research. Francesco Paci et al. (2014) characterized their system through full-day
observations, but also by calculating the Mean Absolute Error and Root Mean Square Error [32].
Single modules were proven to function correctly, yet assessing a wider network, including the
complications introduced by time synchronisation, involves the use of these metrics.
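For reference, the two metrics can be computed directly from paired samples of observed and reported counts; the lists in this sketch are illustrative values, not measured data.

```python
import math

def mae(observed, reported):
    """Mean Absolute Error between ground-truth and detector-reported counts."""
    return sum(abs(o - r) for o, r in zip(observed, reported)) / len(observed)

def rmse(observed, reported):
    """Root Mean Square Error over the same paired samples; penalises
    larger counting errors more heavily than MAE."""
    return math.sqrt(
        sum((o - r) ** 2 for o, r in zip(observed, reported)) / len(observed))

# Illustrative only: people observed vs. reported in five sampling periods.
observed = [4, 4, 3, 2, 4]
reported = [4, 3, 3, 2, 4]
print(mae(observed, reported), rmse(observed, reported))
```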
Working with the current prototype, the main bottleneck in the model's efficiency is tied to the
limitations of available datasets, such as COCO or PASCAL VOC. Though these are constructed to
serve data-hungry neural networks, they can be expanded by applying the technique of Data
Augmentation. The method involves progressive resizing, random image rotations, shifting, and
vertical and horizontal flipping of existing images, creating and multiplying elements of
the dataset.
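A minimal sketch of such augmentation on raw image arrays is given below. This is NumPy-only and illustrative: a training pipeline would normally use a framework's own augmentation utilities, and the 10% shift and the particular set of transforms chosen here are arbitrary.

```python
import numpy as np

def augment(image):
    """Yield simple variants of one image array: flips, a shift, and a
    (coarse, 90-degree) rotation, multiplying the dataset elements."""
    h, w = image.shape[:2]
    yield image[:, ::-1]        # horizontal flip
    yield image[::-1, :]        # vertical flip
    # Shift right/down by 10% of the frame, zero-padding the exposed border.
    dy, dx = h // 10, w // 10
    shifted = np.zeros_like(image)
    shifted[dy:, dx:] = image[:h - dy, :w - dx]
    yield shifted
    yield np.rot90(image)       # rotation (random small angles would need
                                # interpolation, e.g. an affine warp)
```

Each source image thus yields several labelled variants, at the cost of recomputing the bounding-box annotations to match each transform.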
Further experimentation and fine-tuning with the same prototype in other areas of the building
should also be considered. Hyper-parameter tuning can include optimisation of the learning rate,
and alterations in batch size to maximize the capacity of the processing unit. Increased model
capacity, either by adding layers to deepen the network or by adding extra filters in each
convolutional layer, is also worthy of examination.
Departing from the current prototype, more powerful combinations of hardware components,
neural networks, feature extractors, and datasets can be targets of experimentation. This pathway
might enable a system capable of better overall precision and of non-human occupancy detection.
References
[1] Pressac Communications Limited. Wireless desk occupancy sensors, 2021. https://www.pressac.com/desk-occupancy-sensors/, accessed: 27.06.2021.
[2] David Van Ess. Pyroelectric Infrared Motion Detector, PSoC Style, 2009. https://www.cypress.com/file/90886/download, accessed: 27.06.2021.
[3] Tony DiCola and Lady Ada. PIR Motion Sensor, 2014. https://learn.adafruit.com/pir-passive-infrared-proximity-motion-sensor/how-pirs-work, accessed: 27.06.2021.
[4] Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do, and Kaori Togashi. Convolutional neural networks: an overview and application in radiology. Insights into Imaging, 9(4):611–629, Aug 2018.
[5] Jens Jørgensen, Martin Tamke, and Kåre Poulsgaard. Occupancy-informed: Introducing a method for flexible behavioural mapping in architecture using machine vision. In Proceedings of the 2020 eCAADe Conference, September 2020.
[6] Lilian Weng. Object Detection for Dummies Part 3: R-CNN Family. lilianweng.github.io/lil-log, 2017. http://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html, accessed: 27.06.2021.
[7] G.S. Peng. Performance and Accuracy Analysis in Object Detection. California State University San Marcos, 2019.
[8] Biblioteca Serviço de Documentação e Informação. GUIA DA BIBLIOTECA PARA NOVOS ESTUDANTES: Espaços para estar, 2021. https://feup.libguides.com/novosestudantes/espacos, accessed: 27.06.2021.
[9] Jonathan Huang, V. Rathod, Chen Sun, Menglong Zhu, A. Balan, A. Fathi, Ian S. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy. Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3296–3297, 2017.
[10] Tsung-Yi Lin, M. Maire, Serge J. Belongie, James Hays, P. Perona, D. Ramanan, Piotr Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
[11] PC Componentes. Raspberry Pi 4 Modelo B 8GB. https://www.pccomponentes.pt/raspberry-pi-4-modelo-b-8gb, accessed: 27.06.2021.
[12] SparkFun. Raspberry Pi Camera Module V2. https://www.sparkfun.com/products/14028, accessed: 27.06.2021.
[13] Adrian Rosebrock. Deep learning: How OpenCV's blobFromImage works, Nov 2017. https://www.pyimagesearch.com/2017/11/06/deep-learning-opencvs-blobfromimage-works/, accessed: 27.06.2021.
[14] ThingsBoard. ThingsBoard Documentation, 2021. https://thingsboard.io/docs/, accessed: 27.06.2021.
[15] ThingsBoard. ThingsBoard: Open-source IoT Platform, 2021. https://thingsboard.io/, accessed: 27.06.2021.
[16] Joseph Redmon and Ali Farhadi. YOLO9000: Better, Faster, Stronger. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6517–6525, 2017.
[17] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision, 111(1):98–136, January 2015.
[18] Serviços Biblioteca FEUP. Ocupação dos Pisos, 2021. https://sites.google.com/g.uporto.pt/ocupa-pisos-bibfeup/home, accessed: 27.06.2021.
[19] Abintra Consulting. Occupancy, 2021. https://abintra-consulting.co.uk/products/occupancy/, accessed: 27.06.2021.
[20] Pressac Communications Limited. Wireless room occupancy sensors, 2021. https://www.pressac.com/room-occupancy-sensors/, accessed: 27.06.2021.
[21] Floorsense. Make sense of the modern workplace, 2021. https://floorsen.se/, accessed: 27.06.2021.
[22] Workplace Occupancy. Workplace Efficiency Monitoring Systems, 2021. https://workplaceoccupancy.com/#occupancy, accessed: 27.06.2021.
[23] infsoft. infsoft Occupancy, 2021. https://www.infsoft.com/solutions/products/infsoft-occupancy, accessed: 27.06.2021.
[24] FM:Systems. Occupancy Sensors, 2021. https://fmsystems.com/our-solutions/employee-experience/workplace-occupancy-utilization-sensors/, accessed: 27.06.2021.
[25] Noureddine Lasla, Messaoud Doudou, Djamel Djenouri, Abdelraouf Ouadjaout, and Cherif Zizoua. Wireless energy efficient occupancy-monitoring system for smart buildings. Pervasive and Mobile Computing, 59:101037, 2019.
[26] Khirod Chandra Sahoo and Umesh Chandra Pati. IoT based intrusion detection system using PIR sensor. In 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information Communication Technology (RTEICT), pages 1641–1645, 2017.
[27] Shengjun Xiao, Linwang Yuan, Wen Luo, Dongshuang Li, Chunye Zhou, and Zhaoyuan Yu. Recovering Human Motion Patterns from Passive Infrared Sensors: A Geometric-Algebra Based Generation-Template-Matching Approach. ISPRS International Journal of Geo-Information, 8(12), 2019.
[28] W.L. Yu, Zhen Wang, and Lei Jin. The experiment study on infrared radiation spectrum of human body. In Proceedings of 2012 IEEE-EMBS International Conference on Biomedical and Health Informatics, pages 752–754, 2012.
[29] Marine City. Fresnel Lenses, 2007. https://web.archive.org/web/20070927021951/http://www.marinecitymich.org/Blank%20Page.htm, accessed: 27.06.2021.
[30] TrueOccupancy. Workplace Occupancy Sensors: True Occupancy technology, 2021. https://www.trueoccupancy.com/technology, accessed: 27.06.2021.
[31] Analog Devices. ADI Vision-Based Occupancy Sensing Solutions, 2021. https://www.analog.com/en/design-center/landing-pages/002/apm/vision-based-occupancy-sensing-solutions.html#, accessed: 27.06.2021.
[32] Francesco Paci, Davide Brunelli, and Luca Benini. 0, 1, 2, many — A classroom occupancy monitoring system for smart public buildings. In Proceedings of the 2014 Conference on Design and Architectures for Signal and Image Processing, pages 1–6, 2014.
[33] Zhi Liu, Jie Zhang, and Li Geng. An Intelligent Building Occupancy Detection System Based On Sparse Auto-encoder. 2017 IEEE Winter Conference on Applications of Computer Vision Workshops, 2017.
[34] Yosefa Gilon, Fei-Fei Li, Ranjay Krishna, and Danfei Xu. Convolutional Neural Networks (CNNs / ConvNets), 2021. https://cs231n.github.io/convolutional-networks/, accessed: 27.06.2021.
[35] Isabelle Guyon, Steve Gunn, Masoud Nikravesh, and Lotfi A. Zadeh. Feature Extraction. Studies in Fuzziness and Soft Computing. Springer Berlin Heidelberg, 2006.
[36] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[37] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. Searching for Activation Functions. arXiv preprint arXiv:1710.05941, 7:1, 2017.
[38] Intersoft Consulting. General Data Protection Regulation (GDPR), 2021. https://gdpr-info.eu/, accessed: 27.06.2021.
[39] Shashank Karthik M., Rohit Poduri, and Sachchidanand Deo. Seat Occupancy Detection, 2016. http://icsl.ee.columbia.edu/iot-class/2016fall/group11/#system, accessed: 27.06.2021.
[40] Aditya Kunar. Object Detection with SSD and MobileNet, Jul 2020. https://aditya-kunar-52859.medium.com/object-detection-with-ssd-and-mobilenet-aeedc5917ad0, accessed: 27.06.2021.
[41] Adrian Rosebrock. YOLO object detection with OpenCV. https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv, accessed: 27.06.2021.
[42] Rafael Padilla, Wesley L. Passos, Thadeu L. B. Dias, Sergio L. Netto, and Eduardo A. B. da Silva. A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics, 10(3), 2021.
[43] Jonathan Hui. Object detection: speed and accuracy comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3), March 2018. https://jonathan-hui.medium.com/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359, accessed: 27.06.2021.
[44] Aastha Tiwari, Anil Kumar Goswami, and Mansi Saraswat. Feature Extraction for Object Recognition and Image Classification. International Journal of Engineering Research and Technology (IJERT), 02(10), October 2013.
[45] Joseph Redmon, S. Divvala, Ross B. Girshick, and A. Farhadi. You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016.
[46] Joseph Redmon and Ali Farhadi. YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767, 2018.
[47] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv e-prints, pages arXiv–2004, 2020.
[48] Sik-Ho Tsang. Review: SSD — Single Shot Detector (Object Detection), 2018. https://towardsdatascience.com/review-ssd-single-shot-detector-object-detection-851a94607d11, accessed: 27.06.2021.
[49] Xiangyu Zhang, Jianhua Zou, Kaiming He, and Jian Sun. Accelerating Very Deep Convolutional Networks for Classification and Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38:1943–1955, 2016.
[50] Yawei Li, Shuhang Gu, Luc Van Gool, and Radu Timofte. Learning Filter Basis for Convolutional Neural Network Compression. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 5622–5631, 2019.
[51] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single Shot MultiBox Detector. In ECCV, 2016.
[52] Ross B. Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
[53] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:1137–1149, 2015.
[54] Ross B. Girshick. Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV), pages 1440–1448, 2015.
[55] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Advances in Neural Information Processing Systems, volume 29, pages 379–387, 2016.
[56] MIPI Alliance. MIPI Camera Serial Interface 2 (MIPI CSI-2), 2021. https://www.mipi.org/specifications/csi-2, accessed: 27.06.2021.
[57] Praveen Pavithran. How to run object detection on CCTV feed, 2020. https://cloudxlab.com/blog/how-to-run-yolo-on-cctv-feed/, accessed: 27.06.2021.
[58] Faizan Shaikh. Using Deep Learning, 2017. https://www.analyticsvidhya.com/blog/2017/08/finding-chairs-deep-learning-part-i/, accessed: 27.06.2021.
[59] eMaster Class Academy. Python: Real Time Object Detection (Image, Webcam, Video files) with Yolov3 and OpenCV, 2020. https://www.youtube.com/watch?v=1LCb1PVqzeY, accessed: 27.06.2021.
[60] Igor Panteleyev. How To Implement Object Recognition on Live Stream, 2017. https://www.iotforall.com/objects-recognition-live-stream-yolo-model, accessed: 27.06.2021.
[61] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[62] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[63] Tim Bray. The JavaScript Object Notation (JSON) Data Interchange Format. RFC 8259, December 2017.
[64] Daniel Stenberg. Everything cURL. 2018.
[65] NVIDIA. Jetson Nano Developer Kit, 2021. https://developer.nvidia.com/embedded/jetson-nano-developer-kit, accessed: 27.06.2021.
[66] Leigh Johnson. Real-time Object Tracking with TensorFlow, Raspberry Pi, and Pan-Tilt HAT, 2019. https://towardsdatascience.com/real-time-object-tracking-with-tensorflow-raspberry-pi-and-pan-tilt-hat-2aeaef47e134, accessed: 27.06.2021.
[67] Shawn Hymel. How to Perform Object Detection with TensorFlow Lite on Raspberry Pi. https://www.digikey.com/en/maker/projects/how-to-perform-object-detection-with-tensorflow-lite-on-raspberry-pi/b929e1519c7c43d5b2c6f89984883588, accessed: 27.06.2021.
[68] Klym Yamkovyi. Object detection with Raspberry Pi and Python, 2018. https://medium.datadriveninvestor.com/object-detection-with-raspberry-pi-and-python-bc6b3a1d4972, accessed: 27.06.2021.
[69] Raspberry Pi Foundation. Raspberry Pi 4 Computer Model B, 2021. https://datasheets.raspberrypi.org/rpi4/raspberry-pi-4-product-brief.pdf, accessed: 27.06.2021.
[70] Coral AI. USB Accelerator, 2019. https://coral.ai/docs/accelerator/datasheet/, accessed: 27.06.2021.
[71] Intel. Intel® Neural Compute Stick 2, 2018. https://www.intel.com/content/dam/support/us/en/documents/boardsandkits/neural-compute-sticks/NCS2_Datasheet-English.pdf, accessed: 27.06.2021.
[72] Raspberry Pi Foundation. Camera Module. https://www.raspberrypi.org/documentation/hardware/camera/, accessed: 27.06.2021.
[73] Dave Jones. Picamera, 2016. https://picamera.readthedocs.io/en/release-1.13/, accessed: 27.06.2021.
[74] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B. Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding. Proceedings of the 22nd ACM International Conference on Multimedia, 2014.
[75] G. Bradski. The OpenCV library. Dr Dobb's J. Software Tools, 25:120–125, 2000.
[76] OpenCV. Modules. https://docs.opencv.org/4.5.2/modules.html, accessed: 27.06.2021.
[77] Alexander Mordvintsev. Camera Calibration. https://docs.opencv.org/master/dc/dbb/tutorial_py_calibration.html, accessed: 27.06.2021.
[78] OpenCV. Deep Neural Network module. https://docs.opencv.org/3.4/d6/d0f/group__dnn.html#ga29f34df9376379a603acd8df581ac8d7, accessed: 27.06.2021.
[79] ThingsBoard. Dashboards. https://thingsboard.io/docs/user-guide/dashboards/#widgets, accessed: 27.06.2021.
[80] ThingsBoard. Widgets library: Time-series. https://thingsboard.io/docs/user-guide/ui/widget-library/#time-series, accessed: 27.06.2021.
[81] ThingsBoard. Widgets library: Maps widgets. https://thingsboard.io/docs/user-guide/ui/widget-library/#maps-widgets, accessed: 27.06.2021.