Actinfo: Information Platform for Physical Activity
Alexandre Silva Carreira
Thesis to obtain the Master of Science Degree in
Biomedical Engineering
Supervisors: Prof. Dr. Mário Jorge Costa Gaspar da Silva
Prof. Dr. Maria de Fátima Marcelina Baptista
Examination Committee
Chairperson: Prof. Dr. João Miguel Raposo Sanches
Supervisor: Prof. Dr. Mário Jorge Costa Gaspar da Silva
Member of the Committee: Dr. Pedro Alexandre Barracha da Guerra Júdice
June 2019
Preface
The work presented in this thesis was performed at the Exercise and Health Lab, Faculty of Human Kinetics, University of Lisbon (Lisbon, Portugal), from September 2018 to March 2019, under the supervision of Prof. Fátima Baptista. The thesis was co-supervised at Instituto Superior Técnico by Prof. Mário J. Silva and Prof. Bruno Martins.
Declaration
I declare that this document is an original work of my own authorship and that it fulfills all the require-
ments of the Code of Conduct and Good Practices of the Universidade de Lisboa.
Acknowledgments
First and foremost, I would like to express my appreciation to Professor Fátima Baptista, Professor Mário J. Silva and Professor Bruno Martins for all their assistance, guidance and availability throughout this project.
I would like to offer my special thanks to João Magalhães and Pedro Júdice for all their valuable input and feedback, and for welcoming me and showing me the ropes during my time at the Exercise and Health Lab.
I also wish to acknowledge the help provided by all the researchers at the lab who allowed me to use the data that made this work possible and, in particular, those who took the time to test the platform, providing valuable feedback for evaluating the prototype.
Finally, I wish to thank my parents and brother for their support and encouragement throughout this
journey.
Resumo
This dissertation presents the development and assessment of a new information platform for the management of physical activity (PA) data, Actinfo. Built on a stack of open-source JavaScript technologies (MEAN), this platform is both a repository of PA studies and a tool for operating on actigraphy files. Actinfo provides visualizations of relevant statistics generated from information in PA studies, as well as tools for comparing PA data from different studies. The data model used is based on the FHIR standard, ensuring interoperability with clinical information. To validate the accuracy of the implemented data processing tools, a comparative study was conducted to compare PA time indicators and compliance with PA recommendations between two groups: a population of adults with type II diabetes, study “D2FIT” (n=73), and a sample of the adult population of the municipality of Lisbon, study “ProjCML” (n=69). It was concluded that a lower percentage of participants in the D2FIT study comply with PA recommendations (18% vs. 35%), and that, on average, they spend less time in sedentary behaviour (71.89% vs. 72.12% of accelerometer wear time), less time in moderate- to vigorous-intensity PA (3.99% vs. 4.88% of accelerometer wear time) and have a higher number of interruptions in sedentary behaviour (10.04 vs. 9.64 interruptions/hour in sedentary behaviour) per day. In summary, a functional prototype of a PA data management platform with good usability was developed.
Keywords: Physical activity, actigraphy, web platform, MEAN stack, FHIR
Abstract
This dissertation presents the development and assessment of Actinfo, a new information platform for the management of physical activity (PA) data. Built on a full stack of open-source JavaScript technologies (MEAN: MongoDB, Express, Angular and Node.js), this platform is both a repository of PA studies and a tool for performing a number of operations on actigraphy files. Actinfo provides visualizations of relevant statistics derived from the information in PA studies, as well as tools for comparing PA data from different studies. Data in Actinfo is modelled after the FHIR standard for healthcare information exchange, to ensure interoperability with clinical data. To validate the accuracy of the implemented data processing tools, a comparative study was carried out to compare computed PA time indicators and compliance with PA recommendations between two studies: “D2FIT” (n=73), a population of adult patients with type II diabetes, and “ProjCML” (n=69), a sample of the adult population of the municipality of Lisbon. A lower percentage of participants in D2FIT attain sufficient physical activity (18% vs. 35%) and, per day, subjects in this study average less sedentary time (71.89% vs. 72.12% of accelerometer wear time), less time in moderate- to vigorous-intensity PA (3.99% vs. 4.88% of accelerometer wear time) and a higher number of interruptions in sedentary behaviour (10.04 vs. 9.64 breaks/hour of sedentary time). In summary, a functional prototype of a PA data management platform with good usability was achieved.
Keywords: Physical activity, actigraphy, web platform, MEAN stack, FHIR
Contents
1 Introduction
1.1 Objectives
1.2 Methods
1.3 Contributions
1.4 Thesis Outline
2 Background on physical activity
2.1 Physical activity and health
2.1.1 Physical activity: role in health
2.1.2 Classifying physical activity
2.1.3 WHO’s recommendations on physical activity
2.2 Objectively measured physical activity: actigraphy
2.3 Chapter overview
3 Supporting technology
3.1 Web application architecture
3.2 The MEAN stack
3.2.1 Angular
3.2.2 Node.js
3.2.3 Express
3.2.4 MongoDB
3.3 The FHIR standard
3.4 Security and authentication
3.5 Chapter overview
4 ActInfo
4.1 Overview of the platform
4.1.1 Database
4.1.2 Web server
4.1.3 Client
4.2 User interface
4.2.1 Administrator role
4.2.2 Researcher
4.3 Compliance with data protection regulations
4.4 Chapter overview
5 Assessing Actinfo
5.1 Conformity with requirements
5.2 Platform usability
5.3 Comparative study with two adult populations
5.3.1 Studies
5.3.2 Data preparation
5.3.3 Analysis
5.3.4 Discussion of experimental results
5.4 Chapter overview
6 Conclusions and future work
6.1 Conclusions
6.2 Future Work
Bibliography
A Entity types and document fields
B User interface
C Exported Excel file example
D Consent forms
List of Tables
5.1 Conformity of the platform’s features.
5.2 SUS scores of Actinfo’s evaluation
5.3 Number of breaks per hour of sedentary time
5.4 Daily average ST, from Actilife.
5.5 Daily average time in MVPA, from Actilife.
5.6 Daily average number of breaks/ST hour, from Actilife.
5.7 Mean absolute error for each computed PA time indicator.
A.1 Fields in documents with the user entity type
A.2 Fields in documents with the researchStudy entity type
A.3 Fields in documents with the studyGroup entity type
A.4 Fields in documents with the researchSubject entity type
A.5 Fields in documents with the file entity type
List of Figures
2.1 Actigraph Corp. activity monitors
2.2 Processing of accelerometer data
2.3 *.agd file schema
3.1 3-tier architecture of web applications
3.2 Request-response flow in the MEAN stack
3.3 Relational database vs. MongoDB
3.4 Modelling relationships in MongoDB
4.1 Data model for Actinfo
4.2 Actinfo homepage, login menu and profile page
4.3 “admin” interface
4.4 “My studies” interface
4.5 “New study” form
4.6 Interface for a created study
4.7 Study group interface
4.8 Validation settings interface
4.9 Output interface (summary)
4.10 Output interface (detailed)
4.11 “Group statistics” page.
5.1 SUS questionnaire
5.2 Subject information and validation details (example)
5.3 Demographics for the analyzed population
5.4 Distribution of compliance with PA recommendations
5.5 Distribution of compliance with physical activity recommendations (males vs. females)
5.6 Distribution of daily sedentary time for both studies
5.7 Distribution of daily sedentary time for both studies (males vs. females)
5.8 Distribution of daily time in MVPA for both studies
5.9 Distribution of daily time in MVPA for both studies (males vs. females)
B.1 File uploader interface
B.2 Custom cut point form
B.3 Custom cut point example
B.4 Commonly used bouts and breaks
B.5 Custom bout form
C.1 “Summary” sheet for exported Excel file
C.2 “Daily” sheet for exported Excel file
D.1 Consent form signed by participants of the D2FIT study (front)
D.1 Consent form signed by participants of the D2FIT study (back)
D.2 Consent form signed by participants of the CML study
Chapter 1
Introduction
The effect of physical activity (PA) on the health of individuals has been studied for decades and has motivated the development of more robust methods for exploring the PA-health relationship. Although the benefits of meeting adequate levels of physical activity are widely accepted (and even part of the common sense of the general population), interest in investigating more intricate aspects of this relationship has by no means declined. In fact, research in the field of PA has progressed tremendously over the last few decades, making use of various emerging technologies and methods along the way. Specifically, current research often relies on accelerometers (usually integrated in an activity monitor, called an actigraph) for the objective measurement of physical activity. These devices emerged in the 1980s and 1990s and are now commercialized on a large scale (Troiano et al., 2014). The availability and accuracy of such tools have made it easier to profile PA, not only at the level of the individual, but also at the scale of a population, allowing PA to be characterized across a group of subjects.
However, new tools also imply more data and, with it, the need to properly process and store that data and to extract relevant information from it. In research facilities, in particular, there is a growing tendency to perform statistical analyses over larger datasets, resulting from the aggregation of population samples from multiple PA studies. This creates a rising need for systems that centralize the vast amount of information generated and store it under internationally recognized standards while, at the same time, allowing the stored data to be analyzed.
At the Exercise and Health Laboratory (EHLab, Faculty of Human Kinetics, University of Lisbon), research involving this type of data is part of everyday life. As stated in its mission, the lab strives to “(...) lead and innovate in research and dissemination of models, methods, and interventions to treat or prevent the unhealthy effects of sedentary behavior, to further understand the role of physical activity in health and disease (...)”¹. It is, therefore, no surprise that there is a constant need to generate, process and store data collected from activity monitors, in the form of actigraphy files. The problem, however, arises when this storage is scattered and unstandardized, hindering the researchers’ workflow when they try to reuse previously collected data for new studies. No system for centralizing the collected data is in place, nor is it straightforward to, for instance, compare the results of different physical activity studies with one another. Additionally, there is a lack of readily available tools for analyzing this data, as the lab staff relies heavily on proprietary software to do so (currently, a very limited number of software licenses makes it difficult for more than two researchers to use it at the same time). Tackling these two problems, namely the absence of a system in which physical activity data is centralized and stored under international standards, and the heavy dependence on proprietary software, would greatly benefit research conducted at the EHLab, in terms of both workflow optimization and data integrity.
¹ EHLab: http://www.fmh.utl.pt/en/research/exercise-and-health
There is, therefore, a clear need for a centralized information system, coupled with a more accessible tool for handling and processing actigraphy files. Additionally, there is an interest in improving certain features of the software currently used to score and validate actigraphy data. This makes the EHLab an ideal setting for the development and implementation of a new platform capable of meeting these needs. Such a platform should be able to store physical activity information under international information standards, ensuring interoperability with clinical data as far as possible. Furthermore, it should be equipped with tools for processing sets of actigraphy files and generating relevant statistical information from them.
1.1 Objectives
The main objectives of the work presented in this dissertation were as follows:
• Develop a new web information platform, Actinfo, for PA data, with the following requirements:
– Allow the centralization of data from physical activity studies at the EHLab, which is currently stored in scattered locations, integrating multiple studies under one system;
– Allow the visualization and export of results, not only from individual studies, but also from comparisons between different physical activity studies in the platform;
– Offer tools for processing actigraphy files from studies uploaded to the platform, namely for validating accelerometer wear time, computing physical activity time indicators and generating visualizations of relevant population statistics;
– Store data in the platform’s database following international standards for healthcare information as closely as possible, so as to maximize interoperability with clinical data;
– Process data in compliance with the EU General Data Protection Regulation (GDPR) and with the requirements of Portugal’s Data Protection Authority (CNPD).
• Evaluate the platform’s current prototype in three steps:
– Assessment of the conformity of the platform’s features with requirements;
– Assessment of the platform’s usability using a standard usability questionnaire;
– Using two distinct populations, previously studied only independently of one another, demonstrate in a comparative study Actinfo’s ability to handle accelerometer data and produce relevant population statistics, namely the distributions of physical activity time indicators and of compliance with recommendations for physical activity.
1.2 Methods
The work comprised the following tasks:
1. Review of the literature on current research on physical activity and health, as well as on physical activity objectively measured through actigraphy;
2. Study of the workflow of the EHLab staff when conducting physical activity studies, in particular the life cycle of accelerometry files after they have been collected and downloaded from the corresponding device, to understand how and where the generated information is currently stored;
3. Identification and characterization of the tools the platform should offer for processing actigraphy files, extracting relevant information and analyzing the results of both individual studies and sets of studies;
4. Identification of the requirements for a platform suited to the researchers’ needs: survey of standards for information storage and exchange, as well as of the steps required to make the platform compliant with the GDPR and CNPD requirements;
5. Selection of the state-of-the-art technologies to be employed in the development of Actinfo, namely the front-end and back-end technologies, the database management system and the security protocols;
6. Development of Actinfo: creation of a framework, complete with an authentication system, and implementation of the previously defined features for data storage and processing;
7. Consultation of researchers at the EHLab regarding the implemented features: presentation of the prototype, creation of user accounts and implementation of the adjustments necessary to better meet the researchers’ needs;
8. Testing of the platform:
(a) Analysis of the conformity of the platform’s features with the requirements;
(b) Assessment of usability, through a standard, anonymous questionnaire administered to Actinfo’s active users;
(c) A comparative study to test the platform’s tools for comparing physical activity studies, computing physical activity time indicators and performing statistical analysis.
1.3 Contributions
Two main contributions resulted from the work conducted in this dissertation, which can be summarized
as follows:
1. Actinfo, a platform for the centralization of PA studies and actigraphy data, storing them in a stan-
dardized manner, compliant with the EU’s GDPR and Portugal’s CNPD. Additionally, the platform
allows users to easily access the studies’ contents and perform comparisons between different
sets of data, and provides tools for processing accelerometry data.
2. A comparative study of the objectively measured PA profiles of two adult populations, conducted using Actinfo, which identified differences in the distributions of several computed PA metrics.
Regarding the first item, I developed a functional prototype of an integrated information platform that acts as a 2-in-1 system: on the one hand, a repository of PA studies, giving access to readily available information not only from individual studies, but also from comparisons between different studies stored in the database; on the other hand, a tool for processing actigraphy data, equipped with features for a number of operations on accelerometer files. Tailored to the needs of researchers studying PA not only at the EHLab, but also at other faculties conducting research in this area, Actinfo benefited from constant feedback during its conceptualization and implementation, which helped align it with current research standards. As for the comparative study conducted using Actinfo, it not only allowed the distributions of PA time indicators to be compared between the two adult populations, but also validated the platform as a tool for storing, processing and generating relevant statistical information from actigraphy files and PA studies.
1.4 Thesis Outline
The remainder of the dissertation is structured as follows: Chapter 2 provides key definitions regarding physical activity, explores the PA-health relationship and elaborates on the role of actigraphy in objectively measuring PA, describing the state-of-the-art technologies in this field; Chapter 3 explores the technology used to build Actinfo, specifically web development technologies and standards for information exchange; Chapter 4 describes the platform in detail, focusing on its requirements, architecture, data model and the various interfaces and features for storing, processing and visualizing relevant data extracted from actigraphy files; Chapter 5 details the steps of the assessment phase, starting with a conformity analysis, then addressing the usability of the platform and, lastly, presenting the results of the proof-of-concept comparative study conducted to test Actinfo’s tools for processing PA data. Finally, Chapter 6 summarizes the most significant conclusions and achievements of this dissertation and reflects on future work to improve the platform.
Chapter 2
Background on physical activity
The benefits of physical activity (PA) for overall health are well established today and by no means a recent discovery. In fact, the relationship between PA and health has been a subject of interest since ancient times. Over the past century, this connection became a thoroughly researched topic, mostly due to the proven effectiveness of habitual PA in preventing a myriad of chronic diseases, such as certain types of cancer, cardiovascular disease, obesity and depression (Warburton et al., 2006). More recently, tools have emerged that allow the objective measurement of PA, most notably hip-worn accelerometers (Troiano et al., 2014). It is, therefore, no surprise that more and more researchers have taken an interest in using such technologies to quantify the PA of individuals. The data collected in such studies must, in turn, be stored and organized, which has led to the creation of systems to support the gathering of information. This chapter presents key definitions and relevant aspects of the impact of PA on health, as well as the importance of actigraphy data, emphasizing the major advancements in the area. Furthermore, existing technologies for gathering and storing this type of health data are also explored.
2.1 Physical activity and health
First and foremost, the term “physical activity” is not to be confused with the terms “physical exercise” or “physical fitness”. Although often used interchangeably, these expressions have distinct definitions. The World Health Organization (WHO) describes PA as “(...) any bodily movement produced by skeletal muscles that requires energy expenditure” (World Health Organization, 2017). The same entity goes on to define the term “exercise” as “(...) a subcategory of PA that is planned, structured, repetitive, and purposeful in the sense that the improvement or maintenance of one or more components of physical fitness is the objective” (World Health Organization, 2017). These definitions are based on Caspersen et al. (1985), who additionally consider “physical fitness” to be “(...) a set of acquired (genetic) or developed (training) attributes related to the ability to perform PA”. This section focuses on various aspects of PA which, simply put, is a behaviour resulting in energy expenditure above resting levels (Hills et al., 2014).
2.1.1 Physical activity: role in health
Hypertension, coronary heart disease, type II diabetes, stroke, colon and breast cancer and depression are some of the most common noncommunicable diseases (NCDs). Globally, NCDs are estimated to account for 63% of all deaths, corresponding to 36 million people dying from these diseases annually (World Health Organization, 2013). Risk factors for NCDs include an unhealthy diet, smoking, being overweight, high blood pressure and cholesterol, obesity and physical inactivity. Five of these are related to PA, which is a major independent and modifiable risk factor for NCDs. PA is known to reduce blood pressure, improve HDL cholesterol levels and blood sugar control, and reduce the risk of developing colon cancer, breast cancer (in women) and prostate cancer (in men) (European Union, 2008). It is fundamental for controlling body weight and energy balance, which in turn provides additional benefits in preventing obesity. Regarding the musculoskeletal system, PA plays a role in preserving or enhancing bone mineralization and in maintaining and improving muscular strength and endurance (World Health Organization, 2004). Additionally, PA helps preserve cognitive function, decreases the risk of depression and dementia, reduces stress, and improves sleep quality and self-esteem, which in turn reduces absenteeism. In older adults, PA is associated with a lower risk of falls and fewer functional limitations, as well as with the delay (and even prevention) of chronic diseases associated with aging (European Union, 2008; World Health Organization, 2004).
Physical inactivity is one of the leading risk factors for global mortality (World Health Organization, 2010). When addressing insufficient PA, it is important to distinguish physical inactivity from “sedentary behaviour”. Despite some inconsistencies in its definition (Yates et al., 2011; Pate et al., 2008), sedentary behaviour can be described from an energy expenditure point of view, as explained later in Section 2.1.2, whereas “physical inactivity” simply means not meeting the recommended amounts of PA (Gonzalez et al., 2017). Nevertheless, both represent a threat to population health, and both translate into low levels of PA. Attaining sufficient levels of PA has become an increasingly difficult task. It is estimated that one in five people are not physically active enough, aggravating the global situation with regard to the increasing number of people suffering from chronic diseases (Gonzalez et al., 2017). On top of that, the effects of physical inactivity on a country’s economy are not to be ignored (Janssen, 2012): reducing the prevalence of physical inactivity has a significant impact on reducing healthcare costs (Cadilhac et al., 2011).
Regarding sedentary behaviour, i.e., daytime activities performed in a sitting or reclining position,
research has shown the various adverse outcomes of high levels of sedentary time (ST) on both the risk
of disease and the risk of death (Brocklebank et al., 2015; Ekelund et al., 2016). As ST occupies the
better part of an individual's day (Baptista et al., 2012; Clark et al., 2011), its effects on health are worth
investigating, especially since it is possible for a person to meet the minimum recommended levels of
PA but still spend most of their time in sedentary behaviour (Tremblay et al., 2017). In fact, meeting
the recommended amounts of PA does not attenuate the risks associated with high ST. The effects
of sedentary behaviour may be minimized if the person accumulates a weekly PA of moderate
or higher intensity of at least four times the recommended amount, i.e., approximately 600 min/week
(Ekelund et al., 2016). Aside from studying the total ST per day for an individual, research has also
focused on the patterns of accumulation of ST, i.e., bouts of sedentary activity and breaks (interruptions
in ST). Typically, in the context of PA research, the term bout refers to a period of ST; however, a
bout can also correspond to a time period in which a subject's level of PA is equal to, or greater than, some
specific intensity, during a given time frame (Barrett et al., 2017). For the purposes of this dissertation,
the term "bout" refers to sedentary time, with "breaks" being interruptions
in ST, unless otherwise specified.
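To make the notions of bouts and breaks concrete, the sketch below scans a per-epoch series of sedentary flags. It makes several assumptions that are not taken from the studies cited above. A bout is any run of consecutive sedentary epochs that is at least `min_bout_epochs` long, and this minimum is a free parameter. A break is counted every time a sedentary epoch is immediately followed by a non-sedentary one. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

def sedentary_runs(is_sedentary: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (start_epoch, length_in_epochs) for every run of consecutive sedentary epochs."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, sedentary in enumerate(is_sedentary):
        if sedentary and start is None:
            start = i                          # a run of ST begins
        elif not sedentary and start is not None:
            runs.append((start, i - start))    # the run ended at the previous epoch
            start = None
    if start is not None:                      # run still open at the end of the recording
        runs.append((start, len(is_sedentary) - start))
    return runs

def bouts_and_breaks(is_sedentary: Sequence[bool],
                     min_bout_epochs: int = 1) -> Tuple[List[Tuple[int, int]], int]:
    """Return sedentary bouts of at least min_bout_epochs epochs and the number of breaks.

    Assumed, simplified definition: a break is counted whenever a sedentary
    epoch is immediately followed by a non-sedentary one.
    """
    bouts = [run for run in sedentary_runs(is_sedentary) if run[1] >= min_bout_epochs]
    breaks = sum(1 for prev, cur in zip(is_sedentary, is_sedentary[1:]) if prev and not cur)
    return bouts, breaks

# True = sedentary epoch, False = non-sedentary epoch
epochs = [True, True, True, False, True, False, False, True, True]
print(bouts_and_breaks(epochs, min_bout_epochs=2))  # ([(0, 3), (7, 2)], 2)
```

Multiplying a bout length in epochs by the epoch duration gives its length in time. For example, a 3-epoch bout in a 60 s epoch file lasts 3 minutes.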
Various studies have been conducted to investigate the connection between bouts and breaks in ST
and subjects' physiology and health. Chastin et al. (2015) have demonstrated the positive effect that
interrupting sedentary behaviour has on controlling adiposity and blood sugar levels. In fact, in patients
suffering from certain NCDs, such as type II diabetes, breaking up ST has been shown to be a useful
method to mitigate the negative effects of sedentary behaviours (Sardinha et al., 2017). Prolonged bouts
of ST, on the other hand, have been shown to be associated with obesity (Judice et al., 2015). Sardinha
et al. (2015) showed improved physical function from breaking up ST. Recently, Santos et al. (2018)
have investigated how patterns of ST accumulation change throughout the lifespan, concluding, against
previous expectations, that longer bouts of ST are less common in adulthood than in late adolescence,
highlighting the possible existence of crucial periods in which ST increases, namely in adolescence and
the transition from adulthood into old age. This work emphasizes the importance of analyzing not only
total ST, but also how it is distributed across the day.
Current and past research shows the importance of adequate levels of PA and avoidance of pro-
longed periods of sedentary behaviour. Nevertheless, PA of any duration is better than none, with
various benefits for health (Saint-Maurice et al., 2018).
2.1.2 Classifying physical activity
The question "how much PA is enough?" is the next logical inquiry to arise when trying to relate PA and
health. Such a question can only be answered by first quantifying PA.
PA can be quantified through a variety of approaches. One approach is its quantification through
energy expenditure, usually expressed in metabolic equivalents (METs) or kcal. The World Health Organization (2014) defines MET as "(...) the ratio of a person's working metabolic rate relative to their
resting metabolic rate". It can be interpreted as the intensity of a specific task relative to the resting
metabolism. As such, one MET can be defined as the energy cost of sitting quietly, which in quantitative
terms translates into a consumption of 1 kcal/kg/hour. It is important to note that, depending on fitness
levels, the subjective perception of effort or the heart rate during the execution of a particular task may
vary amongst individuals. Hence the importance of using an absolute method of categorizing PA that
does not depend on the individual performing the task, such as expressing intensity in terms of METs.
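Because one MET corresponds to roughly 1 kcal/kg/hour, the energy cost of an activity can be approximated by multiplying its MET value by body mass and duration. The following minimal sketch applies that approximation. The function name is illustrative, and the 4 MET value used for brisk walking is an assumed example rather than a figure from this dissertation.

```python
def energy_expenditure_kcal(met: float, body_mass_kg: float, duration_min: float) -> float:
    """Approximate energy expenditure, assuming 1 MET = 1 kcal/kg/hour."""
    if met < 0 or duration_min < 0 or body_mass_kg <= 0:
        raise ValueError("MET and duration must be non-negative; body mass must be positive")
    hours = duration_min / 60.0
    return met * body_mass_kg * hours

# A 70 kg adult walking briskly (assumed to be 4 METs) for 30 minutes:
print(energy_expenditure_kcal(4.0, 70.0, 30.0))  # 140.0
```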
On an absolute scale, using METs as a reference, the World Health Organization (2010) splits PA
into two major levels:
1. Moderate PA, with an energy expenditure between 3 and 6 METs;
2. Vigorous PA, corresponding to an activity being performed at an intensity greater than 6 METs.
For the purposes of the work developed in this dissertation, however, two additional categories must be
considered:
3. Light PA, with an energy expenditure between 1.6 and 2.9 METs (Kim et al., 2013);
4. Sedentary behaviour, for activities with an energy cost ≤ 1.5 METs (Sedentary Behaviour Research Network, 2012).
These four categories serve as a basis for grouping periods of PA or ST, for a specific individual, in a
given time frame.
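The four categories can be expressed as a simple mapping from a MET value to an intensity level. In this sketch, the published ranges (≤ 1.5, 1.6 to 2.9, 3 to 6, and > 6 METs) are treated as contiguous intervals, so that values such as 1.55 or 2.95 do not fall into a gap. This boundary handling is an assumption. The sketch also does not check posture, which the definition of sedentary behaviour requires (sitting or reclining).

```python
from enum import Enum

class Intensity(Enum):
    SEDENTARY = "sedentary"
    LIGHT = "light"
    MODERATE = "moderate"
    VIGOROUS = "vigorous"

def classify_met(met: float) -> Intensity:
    """Map a MET value to one of the four intensity categories.

    Assumption: the ranges are treated as contiguous, i.e. (<= 1.5], (1.5, 3.0),
    [3.0, 6.0] and (> 6.0). Posture is not taken into account.
    """
    if met <= 1.5:
        return Intensity.SEDENTARY
    if met < 3.0:
        return Intensity.LIGHT
    if met <= 6.0:
        return Intensity.MODERATE
    return Intensity.VIGOROUS
```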
2.1.3 WHO’s recommendations on physical activity
The World Health Organization reports that, in 2010, an estimated 23% of adults worldwide (20% of
men and 27% of women) were not active enough1. Sedentary behaviours both at home and in the workplace, low activity during leisure and the choice of more passive means of transportation are the main
causes of physical inactivity. With the main goal of preventing NCDs through an increase in PA, the
WHO published the "Global Recommendations on Physical Activity for Health" (World Health Organization, 2010), a document detailing not only the recommended amounts of PA (in terms of both time and
intensity) for different age groups, but also various policies to meet those recommended levels globally.
These range from measures for the nationwide implementation of guidelines to enhance PA, to the monitoring of those implemented measures, so as to ensure the promotion and maintenance of adequate
PA.
The guidelines state that the recommended levels of PA should be met through accumulation during the
day or week, meaning that the recommended time can be distributed across various activities during the
given time period. Additionally, for inactive individuals, the increase in PA should be gradual, with
frequency, duration and intensity increasing over time.
Physical activity recommendations for children and youth aged 5 to 17 years old
An accumulated minimum of 60 minutes of moderate- to vigorous-intensity PA (MVPA, i.e., PA
that is at least of moderate intensity) per day is recommended for this age group. The WHO goes on to specify
that these amounts of PA should mostly come from aerobic activities (i.e., activities in which energy is
produced mainly through aerobic metabolism, using oxygen, such as walking, running, swimming or
cycling), and that PA of vigorous intensity should be performed at least three times per week.
Examples of activities suited for children include planned physical exercise, sports, games, transportation
and physical education, in the context of school, family or community activities.
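The recommendation for this age group can be checked against one week of daily summaries, as in the sketch below. Several interpretations here are assumptions and are not taken from the WHO document. The 60 min/day target must be met on every one of 7 days, although some studies use the weekly average instead. "Vigorous PA at least three times per week" is read as any vigorous minutes on at least three days. The function name is illustrative.

```python
from typing import Sequence

def meets_youth_guideline(daily_mvpa_min: Sequence[float],
                          daily_vigorous_min: Sequence[float]) -> bool:
    """Check one week of data against the WHO recommendation for ages 5 to 17.

    Assumptions: one value per day for exactly 7 days; the 60 min MVPA target
    applies to every day; vigorous PA counts if present on at least 3 days.
    """
    if len(daily_mvpa_min) != 7 or len(daily_vigorous_min) != 7:
        raise ValueError("expected exactly 7 days of data")
    every_day_60 = all(m >= 60 for m in daily_mvpa_min)
    vigorous_days = sum(1 for v in daily_vigorous_min if v > 0)
    return every_day_60 and vigorous_days >= 3
```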
1Data retrieved from https://www.who.int/news-room/fact-sheets/detail/physical-activity
Physical activity recommendations for adults aged 18 to 64 years old
Adults in this age range are recommended to accumulate at least 150 minutes of moderate PA or at least
75 minutes of vigorous PA per week. Alternatively, an equivalent combination of moderate and vigorous PA
can be used to achieve these recommendations.
Furthermore, the WHO recommends that aerobic PA be performed in periods lasting at least 10 minutes.
Additional health benefits are possible by increasing moderate or vigorous PA to double the
recommended weekly amount, i.e., an accumulated 300 minutes of moderate PA or 150 minutes of
vigorous PA (or an equivalent combination of the two). Finally, engaging in muscle-strengthening activities
that target the major muscle groups is advised twice per week.
PA in adults can take the form of work, transportation, chores, leisure, sports or planned physical
exercise, in the context of daily, family and community activities.
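The adult recommendation can be sketched as a weekly tally of aerobic activity periods. This sketch makes two assumptions. First, only periods lasting at least 10 minutes are counted. Second, one minute of vigorous PA is treated as equivalent to two minutes of moderate PA, which mirrors the 150 min moderate versus 75 min vigorous equivalence described above. The names `ActivityPeriod` and `weekly_adult_status` are illustrative. Muscle-strengthening activities are not assessed.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class ActivityPeriod:
    minutes: float
    vigorous: bool  # False means moderate intensity

def weekly_adult_status(periods: Iterable[ActivityPeriod],
                        min_period_min: float = 10.0) -> str:
    """Classify a week of aerobic MVPA against the WHO adult recommendations.

    Assumptions: periods shorter than min_period_min are ignored, and one
    vigorous minute counts as two moderate minutes.
    """
    moderate_equiv = 0.0
    for p in periods:
        if p.minutes >= min_period_min:
            moderate_equiv += 2 * p.minutes if p.vigorous else p.minutes
    if moderate_equiv >= 300:
        return "additional benefits"
    if moderate_equiv >= 150:
        return "meets recommendation"
    return "insufficient"
```

For example, five 30-minute moderate walks and three 25-minute vigorous runs both amount to 150 moderate-equivalent minutes. Twenty 8-minute periods contribute nothing, because each is shorter than the 10-minute minimum.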
Physical activity recommendations for adults aged 65 years old and above
In later stages of adulthood, the WHO recommends amounts of accumulated weekly PA similar to
those for the previous age group: either at least 150 minutes of moderate-intensity PA or 75
minutes of vigorous PA (or a combination of both), with added benefits from doubling the duration,
and with aerobic activity performed in periods no shorter than 10 minutes.
Strength training should also be included, two days per week.
If health conditions prevent older adults from complying with these recommendations, they should be as
physically active as their abilities and conditions allow.
In older adults, PA can be included in transportation, chores, work (if the person is still working),
leisure, sports or planned physical exercise, in the context of daily, family and community activities.
For all the aforementioned age groups, the benefits of meeting the described recommendations outweigh
any potential drawbacks. In children and adolescents, an improvement in both muscular and cardiorespiratory
fitness is to be expected when aiming for the described targets; in adults aged 18 to 64
years old, engaging in the weekly recommended PA amounts also contributes greatly to bone health and
reduces the risk of NCDs; in older adults, there is an additional benefit relating to maintaining cognitive
function and functional health.
2.2 Objectively measured physical activity: actigraphy
When trying to accurately measure PA, estimation errors should be minimized as much as possible, by
using appropriate methods and avoiding subjective assessments of PA intensity.
Although self-report methods exist to measure PA, for instance through interviews, self-administered
surveys, questionnaires, diaries or a combination of these methods, questions arise regarding the va-
lidity and accuracy of such measures in estimating PA intensity, volume, bouts and breaks (Helmerhorst
et al., 2012; Ainsworth et al., 2012; Loney et al., 2011; Monyeki et al., 2018). To overcome these lim-
itations, the use of activity monitors has become standard for providing an objective measurement of
PA.
Actigraphy, a non-invasive method for monitoring human activity, is typically used for objectively
measuring PA. The devices used for this type of assessment, actigraphs, are small, portable watch-like
units, which are worn by the participants of a given study for a given period of time
(for example, five to seven consecutive days) and record the wake-time activity of the subject. Data
is then extracted from the devices, most of which include a USB connection for this purpose. In PA research,
these devices are worn on the hip, and the main sensor for registering activity is a built-in accelerometer.
Depending on the manufacturer and model, devices may contain additional sensors, such as light
and temperature sensors. In the context of this dissertation, as accelerometry is the sole method by
which PA was assessed, data from that sensor contains the most important information extracted from
the devices. Actigraphs are not exclusive to PA research: they are also employed in sleep research, being
worn on participants' wrists during sleep time (Ibanez et al., 2018).
The use of actigraphs in PA research has been validated as a reliable, objective method for quantifying
activity (Plasqui et al., 2013). Among the various manufacturers of these devices, one stands
out as having the most widely used and validated devices: Actigraph Corp (Actigraph, LLC;
Ft. Walton Beach, FL2). Of all Actigraph Corp.'s activity monitor models, two main devices are
used in the research conducted at the EHLab: Actigraph model GT1M3 and Actigraph model wGT3X+4
(Figure 2.1 (a) and (b), respectively). The main difference between these two devices is that the
former only allows for uniaxial accelerometer data collection, while the latter allows triaxial accelerometer
data recording. Kaminsky and Ozemek (2012) compared the recordings of both models in
uniaxial mode and found them to be comparable. To ensure comparability between findings using
different devices, researchers at the EHLab typically focus on data from only one axis and, as such, the
work developed in this dissertation follows the same method when making use of accelerometer data.
Actigraph's accelerometers have been thoroughly validated since their release. One study in particular
deserves special attention regarding the validation of these devices in an experimental setting. Using
data from the United States' 2003-2004 National Health and Nutrition Examination Survey (NHANES)5,
Troiano et al. (2008) characterized a representative sample of the United States population (n=6329)
2Actigraph Corp: https://www.actigraphcorp.com/
3Actigraph Model GT1M: https://www.actigraphcorp.com/support/activity-monitors/gt1m/
4Actigraph model wGT3X+: https://actigraphcorp.com/support/activity-monitors/wgt3xplus/
5NHANES: https://www.cdc.gov/nchs/nhanes/index.htm
Figure 2.1: Actigraph model GT1M (a) and wGT3X+ (b).
in terms of their levels of PA, integrating accelerometer data measured using an Actigraph model 7164
from children (6 to 11 years), adolescents (12 to 19 years), and adults (20 years and older). Since then,
Actigraph Corp. has released new, improved devices, and studies have been conducted comparing
newer devices with the ones used in the aforementioned study (Cain et al., 2013). Nevertheless,
the settings for accelerometer initialization and data processing originally described by Troiano et al.
(2008) are still employed in current research, in particular in studies conducted by the EHLab.
Data from Actigraph Corp.'s devices: collection, conversion and processing
Since the work developed in this dissertation greatly revolves around processing data from Actigraph
Corp.’s accelerometers, it is important to address how that data is collected in the devices. Although
equivalent in terms of accuracy (Robusto and Trost, 2012), GT1M and wGT3X+ devices can produce
different files upon download, as they record accelerometer data in distinct manners. As per Actigraph’s
documentation (Actigraph Software Department, 2012), in the older GT1M devices, data is sampled at a
fixed 30 Hz, then filtered and accumulated into epochs of user-determined size (for example, 15 s,
30 s or 60 s). This process occurs in the device, with data being processed in relatively small chunks.
Data can then be downloaded using Actigraph Corp.'s proprietary software, Actilife, producing an epoch-level
file (containing, amongst other information, accelerometer data), an *.agd file. The newer wGT3X+
devices, however, sample data at a user-defined frequency, ranging from 30 Hz to 100 Hz, with every
sample being stored in the device without accumulation. Downloading data from these devices produces
a raw *.gt3x data file. This file is then filtered and accumulated into epochs through Actilife, creating the
*.agd file. Because processing occurs only after download, users can create different epoch-level files
from a single *.gt3x file. Figure 2.2 illustrates the differences between both devices.
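The accumulation step can be pictured as summing consecutive values into fixed-size epochs. At a sampling frequency f and an epoch length E, each epoch contains f × E samples; for example, 30 Hz × 15 s = 450 samples. The same summation converts shorter epochs into longer ones, such as four 15 s epochs into one 60 s epoch. The sketch below illustrates only this summation and rests on several assumptions. It does not reproduce Actigraph's proprietary filter, so its input is taken to be already-filtered values. It also drops an incomplete trailing epoch. The function name is illustrative.

```python
from typing import List, Sequence

def accumulate_epochs(values: Sequence[float], samples_per_epoch: int) -> List[float]:
    """Sum consecutive values into fixed-size epochs, dropping an incomplete trailing epoch."""
    if samples_per_epoch <= 0:
        raise ValueError("samples_per_epoch must be positive")
    n_full = len(values) // samples_per_epoch
    return [sum(values[i * samples_per_epoch:(i + 1) * samples_per_epoch])
            for i in range(n_full)]

# Re-epoching 15 s epochs into 60 s epochs: 60 // 15 = 4 old epochs per new epoch.
fifteen_s_counts = [10, 20, 30, 40, 5, 5, 5, 5]
print(accumulate_epochs(fifteen_s_counts, 60 // 15))  # [100, 20]
```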
The typical workflow at the EHLab produces already filtered, epoch-level *.agd files, which are
then used to extract all the necessary activity data through Actilife. Researchers also
keep the raw *.gt3x files when data collection is performed using wGT3X+ devices, so as to allow for future
analysis at different epoch lengths. Day-to-day analysis, however, is carried out on the filtered and processed
*.agd files, relying on Actilife for extracting information from the accelerometer files. The software is needed
Figure 2.2: Differences in the processing of accelerometer data between older devices (a) and more recent models (b). Adapted from Actilife 6 User's Manual (Actigraph Software Department, 2012).
Figure 2.3: *.agd file schema. From Actilife 6 User’s Manual (Actigraph Software Department, 2012).
for converting the raw *.gt3x files into *.agd files, because *.gt3x files are binary
files in a proprietary format belonging to copyright-protected software, in this case
Actilife. While this is true for the *.gt3x files, the *.agd files can be easily accessed with the right
tools, without relying on Actilife, as described in the next section.
*.agd file format
The *.agd files, on which all operations on actigraphy data are performed, are in the open SQLite
format6, with the schema represented in Figure 2.3.
SQLite is a C-language library and a widely used database engine. Since *.agd files are SQLite databases,
queries can be performed on them to operate on their contents, completely independently of Actilife. It
6SQLite: www.sqlite.org
is, therefore, easy to develop code to process information stored in these files. As can be observed in the
file schema presented, files may contain more than just accelerometer data. Here, however, we are
interested in the "data" table and, in particular, in the columns "dataTimestamp" and "axis1" (the vertical
axis, which researchers at the EHLab often use to measure activity). This is the time series for
the recorded accelerometer data, accumulated in user-defined epochs. Each row of the "axis1" column
contains a numeric value of "counts", i.e., Actigraph Corp.'s unit of measurement of activity. According
to the company's documentation, counts are obtained by adding post-filtered accelerometer data into
epoch-sized chunks. Count values vary according to the frequency and intensity of the raw acceleration.
Counts are produced by a proprietary filter, exclusive to Actigraph Corp7.
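Because an *.agd file is an SQLite database, its activity time series can be read with any SQLite client. The sketch below uses Python's standard `sqlite3` module. The table and column names (`data`, `dataTimestamp`, `axis1`) are those described above. The file name in the comment is hypothetical. The demonstration table is a simplified stand-in that contains only the two columns used here, whereas real files hold more columns and tables. The encoding of `dataTimestamp` is passed through unchanged and not interpreted.

```python
import sqlite3
from typing import List, Tuple

def read_axis1(conn: sqlite3.Connection) -> List[Tuple[int, int]]:
    """Return (dataTimestamp, axis1) pairs from the "data" table, in time order."""
    cur = conn.execute("SELECT dataTimestamp, axis1 FROM data ORDER BY dataTimestamp")
    return [(int(ts), int(counts)) for ts, counts in cur.fetchall()]

# With a real file, opening it read-only avoids accidental changes (hypothetical path):
#   conn = sqlite3.connect("file:subject01.agd?mode=ro", uri=True)

# Self-contained demo: an in-memory table with only the two columns used above.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE data (dataTimestamp INTEGER, axis1 INTEGER)")
demo.executemany("INSERT INTO data VALUES (?, ?)", [(2, 150), (1, 0), (3, 2500)])
print(read_axis1(demo))  # [(1, 0), (2, 150), (3, 2500)]
```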
Having understood how activity is measured in Actigraph Corp.'s accelerometers, it is now necessary
to understand how different values of counts translate into different levels of PA intensity. Based on
research conducted over the years using Actigraph Corp.'s devices, several cut point sets are implemented
in Actilife in order to map counts to sedentary activity, light PA, moderate PA or vigorous PA. These
sets were originally defined for 60 s epoch files (their unit being, therefore, counts per minute, CPM)
and are linearly scaled for files with epochs shorter than 60 s. Out of the 13 cut point sets currently
implemented in the software8, we will focus on two: the "Evenson Children" and the "Troiano" cut point
sets. In research conducted by the EHLab, these cut point sets are used, respectively, for children and
youth aged 17 or younger and for adults aged 18 or older.
The ”Evenson Children” cut point set was based on Evenson et al. (2008), who determined threshold
values for the intensity of physical activities in children, using Actigraph's accelerometers. The resulting
cut point set, currently implemented in Actilife, is as follows:
• Sedentary activity, for values in the range 0 to 100 CPM;
• Light PA, for counts between 101 and 2295 CPM;
• Moderate PA, for values between 2296 and 4011 CPM;
• Vigorous PA, when activity counts are higher than 4011 CPM.
As for the "Troiano" cut point set, it was derived from previously cited research (Troiano et al., 2008).
Although the study mentions thresholds for various age groups, the cut point set implemented in Actilife
applies solely to adults (18 or older). The thresholds are the following:
• Sedentary activity, for values in the range 0 to 99 CPM;
• Light PA, for counts between 100 and 2019 CPM;
• Moderate PA, for values between 2020 and 5998 CPM;
• Vigorous PA, when activity counts are higher than 5998 CPM.
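The two cut point sets can be applied to each epoch of an *.agd file, as in the sketch below. The threshold values are transcribed from the lists above. For epochs shorter than 60 s, the sketch applies linear scaling by converting the epoch's counts to counts per minute and then comparing them with the 60 s thresholds. This is one way to implement the scaling; Actilife's exact rounding behaviour is not documented here and may differ. The dictionary keys and the function name are illustrative.

```python
from typing import Dict, Tuple

# Inclusive upper bounds (in CPM) for sedentary, light and moderate activity;
# anything above the last bound is vigorous. Values from the lists above.
CUT_POINTS: Dict[str, Tuple[int, int, int]] = {
    "evenson_children": (100, 2295, 4011),
    "troiano_adults": (99, 2019, 5998),
}

def classify_epoch(counts: float, epoch_s: int, cut_set: str) -> str:
    """Classify one epoch's axis1 counts using a cut point set defined for 60 s epochs."""
    if not 0 < epoch_s <= 60:
        raise ValueError("epoch length must be in (0, 60] seconds")
    cpm = counts * 60 / epoch_s  # linear scaling to counts per minute
    sed_max, light_max, mod_max = CUT_POINTS[cut_set]
    if cpm <= sed_max:
        return "sedentary"
    if cpm <= light_max:
        return "light"
    if cpm <= mod_max:
        return "moderate"
    return "vigorous"
```

For example, 600 counts in a 15 s epoch correspond to 2400 CPM, which the Troiano set classifies as moderate PA.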
7Actigraph Corp.'s definition of counts: https://actigraphcorp.force.com/support/s/article/What-are-counts
8Different cut point sets in Actilife: https://actigraphcorp.force.com/support/s/article/What-s-the-difference-among-the-Cut-Points-available-in-ActiLife
These cut point sets will serve as a basis for part of the work developed in this dissertation. Specif-
ically, the thresholds will guide the implementation of features for processing *.agd files, allowing the
distinction between the various levels of intensity of PA in the data contained in the actigraphy files.
Other tools for processing and storing accelerometer data
Apart from Actilife, another product from Actigraph Corp. deserves attention in the context of the work
developed in this dissertation. CentrePoint9 is Actigraph Corp.'s cloud-based system for managing
and analyzing accelerometer data. It replicates Actilife's functionalities in a web platform, without the
need to install a heavy software package. A solution of this type could be employed to tackle the
problem presented here of centralizing accelerometer data from various studies conducted at the EHLab;
however, the problem of it being a commercial product would still apply. Moreover, the fact that certain
Actilife features do not directly satisfy the needs of the researchers indicates that a more personalized,
simple, free to use and easily deployable solution would be ideal. Specifically, some tools (such as the
detection of bouts and breaks in ST) are not implemented in a way that produces the exact results
researchers would want to extract from that information.
Regarding existing databases for accelerometry data, two deserve attention, as their data
can be explored using solutions which replicate Actilife's functionalities. In Portugal, the National
Observatory for Physical Activity and Sports (Observatório Nacional da Actividade Física e do Desporto,
ONAFD)10 aims, amongst other objectives, to monitor PA in the Portuguese population via analysis of
data collected using Actigraph's accelerometers. The work conducted by its researchers has the ultimate
goal of promoting PA and health in Portugal. As such, great benefits can arise from having a readily
available tool for working with the collected data, without being so dependent on proprietary software.
Regarding international databases for accelerometry, ICAD, the International Children's Accelerometry
Database11, contains, as the name indicates, accelerometry data from children aged 3 to 18 years from various
countries, comprising data from over 37000 subjects. ICAD's contents have resulted in sound research
over the years, some of it in collaboration with the EHLab (Tarp et al., 2018; Hansen et al., 2018;
Tarp et al., 2018; Kuzik et al., 2017). Once again, the accelerometer files used in this database are similar to
those that can be processed via Actilife, which means they can also be handled with a tool capable
of the same operations.
2.3 Chapter overview
Great benefits arise from attaining adequate levels of PA, such as those described by the WHO. PA plays a
crucial role in the prevention of chronic diseases, such as diabetes, cardiovascular diseases and certain
types of cancer. In contrast, physical inactivity constitutes a major factor in global mortality.
9CentrePoint: https://www.actigraphcorp.com/centrepoint/
10ONAFD, physical activity: http://observatorio.idesporto.pt/Conteudos.aspx?id=3
11ICAD: http://www.mrc-epid.cam.ac.uk/research/studies/icad/
Sedentary behaviour has been shown to have numerous adverse health effects. As important as the study
of total ST over a person's day, patterns of accumulation of ST (bouts of sedentary activity and breaks
in ST) have emerged as a research topic over the last decade, with findings mainly highlighting the
importance of breaking up ST.
PA can be categorized, according to the intensity of the task performed, into four levels: sedentary
activity, light PA, moderate PA and vigorous PA. Based on these levels, the WHO recommends minimum
amounts of MVPA to be achieved by individuals over a day or a week. These values vary
according to the age group of the individual.
Objective measurement of PA via accelerometry, using activity monitors (actigraphs), has emerged as
a reliable, effective method for assessing PA. Out of the various commercially available solutions, Actigraph
Corp.'s devices are the most widely used and validated. Activity data from these devices is currently
processed using proprietary software; however, the file format used for actigraphy files, *.agd, is based
on SQLite, which makes these files easy to access with the right tools. Large databases of accelerometer
data exist, which contain files in this very format and are ready to be explored by researchers.
Current research is somewhat hindered by relying too heavily on proprietary software for processing
accelerometer data. A free solution that is easy to use and deploy would be of great benefit to researchers
working in the field of PA evaluation through accelerometry. Additionally, centralizing PA information
would allow for an improved workflow.
Chapter 3
Supporting technology
As a web platform, Actinfo was developed using various technologies. This chapter describes the pro-
gramming languages, architecture and standards used to develop the platform, detailing the approach
followed to build Actinfo and model data in the platform. Furthermore, security and authentication mea-
sures are addressed.
3.1 Web application architecture
Web applications are computer programs which, as opposed to desktop applications, run on a computing
server and are accessed via a web browser. These applications follow the client-server model, a computing
model in which communication between the provider of a service (server) and
the requesters of that service (clients) occurs over a network. Advantages of web applications over
desktop ones include:
• Rapid deployment, without the need to download or install additional software apart from an inter-
net browser;
• Broad compatibility across different platforms, such as smartphones and tablets;
• Ease of development, due to the vast amount of resources and open source technologies available.
Over the last decade, web applications have witnessed a large increase in their capabilities, as more
and more tools are being added to web browsers. Specifically, JavaScript and HTML, the most widely
used languages in web development1, have experienced phenomenal gains in terms of
performance, making it possible to develop web applications comparable to desktop ones. Although
JavaScript was initially conceived to allow for the execution of client-side scripts in browsers, it is now
also used in server-side software. In fact, frameworks and libraries have been developed
to extend Vanilla JavaScript (i.e., plain JavaScript, without any additions), further increasing the number
of different features which can be developed using this programming language.
1Data from the 2019 Stack Overflow’s Developer Survey, available at: https://insights.stackoverflow.com/survey/2019
Figure 3.1: 3-tier architecture of web applications.
Architecture of a web application
Applications of this nature are organized in tiers, each with specific roles. Although the number of tiers
in web applications can vary depending on the type of technologies used, the 3-tier architecture is the
most commonly used (and the one followed by Actinfo). In this structure, represented in Figure 3.1,
three logical modules comprise the web application:
• A presentation tier, accessible through a web browser, which acts as the client, and through which
information is presented via a graphical interface to end users;
• An application tier (or application server), containing the core logic which drives the application;
• A data tier (also referred to as the database server) for handling database functions.
To understand how communication between the three tiers occurs, one must refer to the term API.
Short for Application Programming Interface, it consists of a set of rules and methods acting as a
communication medium between tiers. Amongst the various existing types of API, REST APIs are the most
popular. REST is an acronym for Representational State Transfer and it corresponds to an architectural
style originally described by Fielding (2000). Web services using REST (i.e., RESTful web services)
make use of HTTP methods (GET, POST, PUT and DELETE being the most relevant in this context, for
getting, sending, updating or deleting contents, respectively (Fielding and Reschke, 2014)) to operate
on resources. In the World Wide Web, a resource is an item of interest which is identifiable by a Uniform
Resource Identifier (URI)2. The flow of information in an application using REST APIs can be simplified
as follows: an agent (be it a person or software) makes a request to a specific resource, identifiable
by a URI, via a URL (Uniform Resource Locator, a specific type of URI). This request (which, since
the application makes use of the HTTP protocol, can be of the type GET, POST, PUT or DELETE, indicating
the action to be performed on the resource) will generate a response in a specific format. In
the context of this dissertation, responses use a JavaScript-based format known as JSON (JavaScript
Object Notation3). The contents of the response indicate whether the method used in the request produced
the desired effect on the resource.
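The request-response flow described above can be illustrated without a web server. The sketch below maps the four HTTP methods onto operations on resources identified by URIs and returns a status code together with a JSON body. It is a simplified illustration and not Actinfo's implementation. An in-memory dictionary stands in for the data tier, and the resource URI used in the example is hypothetical. For brevity, POST creates the resource at the given URI, whereas many REST APIs instead POST to a collection URI and let the server assign the identifier. PUT on a missing resource returns 404 rather than creating it.

```python
import json
from typing import Dict, Optional, Tuple

# In-memory store standing in for the data tier; keys are resource URIs.
store: Dict[str, object] = {}

def handle(method: str, uri: str, body: Optional[str] = None) -> Tuple[int, str]:
    """Dispatch an HTTP-style request on a resource URI and return (status, JSON body)."""
    if method == "GET":                                   # read a resource
        if uri not in store:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(store[uri])
    if method == "POST":                                  # create a resource
        if uri in store:
            return 409, json.dumps({"error": "already exists"})
        store[uri] = json.loads(body or "{}")
        return 201, json.dumps(store[uri])
    if method == "PUT":                                   # replace an existing resource
        if uri not in store:
            return 404, json.dumps({"error": "not found"})
        store[uri] = json.loads(body or "{}")
        return 200, json.dumps(store[uri])
    if method == "DELETE":                                # remove a resource
        if uri not in store:
            return 404, json.dumps({"error": "not found"})
        del store[uri]
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})
```

A client would, for instance, call `handle("POST", "/subjects/s01", '{"age": 34}')` to create a resource, `handle("GET", "/subjects/s01")` to retrieve it as JSON, and `handle("DELETE", "/subjects/s01")` to remove it.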
2Definition by the World Wide Web Consortium: https://www.w3.org/TR/2003/WD-webarch-20031209/
3JSON definition: https://www.json.org/
3.2 The MEAN stack
Typically, to create a web application, different technologies are combined together to form what is called
the full stack for web development, that is, software for implementing the various tiers. A very popular
stack, and one deeply embedded in many applications in production today, is the LAMP stack. LAMP
is an acronym which originally stood for Linux, Apache, MySQL and PHP (Lawton, 2005). Each one
of these serves a specific purpose in the application’s structure: Linux is the operating system, at the
base of the application; on top of that, Apache is used as the web server; MySQL is used for the data
tier, as the relational database management system (RDBMS); finally, PHP is used as the scripting
language, which offers the needed programming support. LAMP has since evolved to incorporate more
programming languages and frameworks for building applications and APIs which are still compatible
with the stack (Louridas, 2016).
In recent years, the MEAN stack has emerged as a strong competitor, slowly replacing LAMP as
the first choice for developing web applications (Louridas, 2016). MEAN is an acronym for MongoDB,
Express, Angular and Node.js. These open source components come together to create an end-to-
end framework for application development, from the database to the presentation tier. As such, the
MEAN stack uses: MongoDB as the non-relational database management system; Express as the
web framework, which runs on top of Node.js; Node.js, for the web server-side implementation of the
application in JavaScript; and Angular for the presentation tier of the application. All of the aforementioned
components are operated using JavaScript, thus allowing developers to use a single language for all
the tiers of the application. This constitutes an advantage over the LAMP stack, which requires different
programming languages and format conversions to exchange data between tiers.
A MEAN stack application follows the previously described 3-tier architecture: MongoDB can be
viewed as the data tier; Node.js contains code for the web server; Express is used to create REST
APIs and can be interpreted as a channel to allow communication between the server and presenta-
tion layer, comprising, together with Node.js, the application tier; Angular is the presentation tier. The
typical request-response flow is illustrated in Figure 3.2. Since only one format is used across all tiers
to structure the data (the JSON format), it is possible to avoid the data conversions required in
applications using SQL-based databases. In MEAN applications it is, therefore, faster to make HTTP
requests and to present and store the data, as there is no need for reformatting.
Similar to MEAN, other stacks exist which share some of its components. These variants typically replace
one of the components of the stack with another JavaScript framework. For example, in a MERN stack, the
front-end environment, which is used to build the presentation tier, is replaced by React.js, a JavaScript
library developed by Facebook4; in a MEVN stack, Vue.js, ”(...) a progressive framework for building
user interfaces (...)”5, is used in the presentation tier. Nevertheless, I chose the original MEAN variant
for the platform developed in this project, as it is more widespread than the emerging alternatives, with
extensive documentation on each component available online, both individually and as part of the stack.
4React.js: https://reactjs.org/
5Vue.js: https://vuejs.org/
Figure 3.2: Request-response flow in a 3-tier architecture MEAN stack application.
The following sections explore each one of the four components in a MEAN stack application, from front
(starting in the client side, Angular) to back, ending with the database (MongoDB).
3.2.1 Angular
At the time of the development of Actinfo, Angular 6 was the current version of this presentation environment.
Developed and supported by Google, Angular6 is a framework for building interactive single
page applications (i.e., a web application in which user interaction is based on the rewrite of the current
page instead of loading new pages from the server, thus making for a better user experience, without in-
terruptions between pages, much like a desktop application). It is written in TypeScript, a typed superset
of JavaScript developed by Microsoft, and builds the client application in HTML and TypeScript.
Since it was first conceived, the MEAN stack has benefited from upgrades to its various technologies, Angular being one of them. The most recent iteration of the stack makes use of a component-based
architecture for the presentation tier:
1. Angular’s building blocks, NgModules, provide context for compiling these components;
2. Metadata for these components associate them with templates, which define views, i.e., screen
elements which can be modified according to the program’s logic and data;
3. The templates combine HTML with Angular directives (providers of program logic) and binding markup (which connects the application data to the DOM, Document Object Model, of the page), which allow Angular to modify the HTML;
4. The page is finally rendered for display to the end user.
Angular’s power comes not only from its speed and performance, as it makes use of code splitting
to only load what is required to render the view the user requested, but also from its versatility, as a
cross-platform client environment, with views that are easily rendered on mobile devices. The Angular CLI (command line interface) makes it easy to generate components, which can be routed to one another and connected via services for sharing methods and data. Its ease of testing and deployment makes it an obvious choice for use with the MEAN stack.
6Angular: https://angular.io/
3.2.2 Node.js
Node.js7 is a server-side JavaScript runtime environment. It is built on top of Chrome’s V8 JavaScript engine and has an event-driven architecture, running on a single thread with an asynchronous, non-blocking I/O (input/output) model. By using a single thread to service all requests, the creators of Node.js aimed to overcome the I/O bottleneck by moving away from servicing the requests arriving at the server synchronously. This way, when the code being executed needs, for instance, to query the database, the web server does not wait for the data to be returned; the main thread keeps running, moving on to the next API call. When the database operation finishes, its callback is queued and executed once the event loop is free to handle the response. This event-driven style of programming has proven Node.js to be extremely efficient in I/O operations and resource utilization (Chaniotis et al., 2015).
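The non-blocking behaviour described above can be illustrated with a minimal sketch. The function fakeDbQuery is a hypothetical stand-in for a database driver call, simulated with setTimeout rather than a real MongoDB query; the point is only the order in which the work happens.

```javascript
// Shared log of what the single thread does, in order.
const log = [];

// Hypothetical stand-in for a database call: it schedules the callback
// for later instead of blocking the thread while "waiting for data".
function fakeDbQuery(requestId, latencyMs, callback) {
  setTimeout(() => callback(null, { requestId }), latencyMs);
}

function handleRequest(requestId, latencyMs) {
  log.push(`received ${requestId}`);
  // The query is started, but the handler does not wait for its result.
  fakeDbQuery(requestId, latencyMs, (err, row) => {
    if (err) throw err;
    // Runs later, when the event loop picks up the completed operation.
    log.push(`responded ${row.requestId}`);
  });
  log.push(`moved on after ${requestId}`);
}

handleRequest(1, 30); // slow query
handleRequest(2, 5); // fast query

// At this point only the synchronous work has run:
// ["received 1", "moved on after 1", "received 2", "moved on after 2"]
```

Once both simulated queries complete, the log ends with "responded 2" followed by "responded 1": the second request is answered first because its query finished earlier, and neither request blocked the other.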
One major feature of Node.js is its extensibility through its pre-installed package manager, npm (Node.js package manager). A command line interface and an online registry of public and paid-for packages greatly enhance the versatility of Node.js by providing open source JavaScript development tools, with 750,000 packages available8.
Node.js allows for building highly scalable, real-time JavaScript web applications and is currently
used by many top companies and organizations, such as Netflix, PayPal, LinkedIn and NASA9.
3.2.3 Express
Specifically built for Node.js, Express10 is a web framework, providing developers with features for build-
ing web applications. Although minimalist and lightweight, it contains numerous HTTP utility methods,
facilitating the creation of APIs. Express uses middleware functions to handle requests. These functions have access to the request and response objects, as well as to the next middleware function in the request-response cycle. Through middleware, Express therefore handles HTTP requests by either returning a response or passing control to the next middleware function. Express is also a routing framework11, allowing developers to determine how the application responds to a client request to a given endpoint (a URI combined with an HTTP request method, such as GET, POST, PUT or DELETE).
Express is crucial in the application tier, allowing the creation of REST APIs and thus ensuring commu-
nication between the server and the client.
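To make the middleware and routing concepts concrete, the following is a simplified, hypothetical re-creation of Express-style request handling in plain JavaScript (createMiniApp, route and handle are illustrative names, not the Express API). It keeps the (req, res, next) signature and the method-plus-path matching described above.

```javascript
// Simplified stand-in for Express-style dispatch (NOT Express itself).
function createMiniApp() {
  const middleware = [];
  const routes = new Map(); // key: "GET /studies" -> handler

  return {
    use(fn) { middleware.push(fn); },
    route(method, path, handler) { routes.set(`${method} ${path}`, handler); },
    handle(req) {
      const res = {
        statusCode: 200,
        body: undefined,
        status(code) { this.statusCode = code; return this; },
        json(body) { this.body = body; return this; },
      };
      // The router is simply the last function in the middleware chain.
      const router = (request, response) => {
        const handler = routes.get(`${request.method} ${request.path}`);
        if (handler) handler(request, response);
        else response.status(404).json({ error: "Not found" });
      };
      const chain = [...middleware, router];
      let index = 0;
      // next() passes control to the following function in the chain.
      const next = () => {
        const fn = chain[index++];
        if (fn) fn(req, res, next);
      };
      next();
      return res;
    },
  };
}

const app = createMiniApp();
// Middleware: either ends the cycle with a response or calls next().
app.use((req, res, next) => {
  if (!req.headers.authorization) return res.status(401).json({ error: "Unauthorized" });
  next();
});
app.route("GET", "/studies", (req, res) => res.json([{ title: "Example study" }]));

const ok = app.handle({ method: "GET", path: "/studies", headers: { authorization: "Bearer x" } });
const denied = app.handle({ method: "GET", path: "/studies", headers: {} });
const missing = app.handle({ method: "DELETE", path: "/studies", headers: { authorization: "Bearer x" } });
```

Here ok.statusCode is 200, denied.statusCode is 401 because the middleware ended the cycle without calling next(), and missing.statusCode is 404 because no DELETE route was registered. In real Express the same ideas are expressed with app.use() and app.get().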
7Node.js: https://nodejs.org/
8Data gathered from https://www.npmjs.com/products/enterprise
9Information retrieved from https://www.netguru.com/blog/top-companies-used-nodejs-production
10Express: https://expressjs.com/
11Routing in Express: https://expressjs.com/en/guide/routing.html
(a) Relational model.
(b) Data as documents.
Figure 3.3: Data models for a relational database (a) and MongoDB (b). Adapted from MongoDB Architecture Guide.
3.2.4 MongoDB
MongoDB is a non-relational database management system which stores data as documents, using a
binary representation of the JSON format called BSON (Binary JSON). BSON documents can contain
one or more fields, each field containing a value of a specific data type (such as arrays, strings, numbers,
Booleans, objects, binary data or sub-documents). Documents with similar structure are organized in
MongoDB as collections. In a traditional relational database, the equivalent of a collection would be
a table, with documents being the rows and fields being the columns. An example from MongoDB’s
Architecture Guide12 comparing a relational data model with MongoDB’s ”data as documents” can be
found in Figure 3.3. In this example, which shows the modeling of data for a blogging application, the
relational approach would require multiple tables (here, we’re considering the tables ”Category”, ”User”,
”Article”, ”Tag” and ”Comment”). In MongoDB, on the other hand, it is possible to model data using two
collections of documents: one for the users, and another for the articles.
In an article of the blog, multiple comments, tags and categories may exist, each one expressed as
an embedded array in the article document. This approach of localizing data, using a single document for all the data of a single record, is not only simpler for developers, but also increases scalability and performance, as a full document, with all related data, can be retrieved in a single read from the database, in contrast with relational databases, where data is spread across multiple tables.
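The difference between the two models in Figure 3.3 can be sketched with plain JavaScript data. The rows, the field names (articleId, authorId) and the assembleArticle function are hypothetical; arrays stand in for relational tables and a Map stands in for a MongoDB collection.

```javascript
// Relational model: hypothetical rows spread over several "tables".
const articleTable = [{ id: 1, title: "Hello MongoDB", authorId: 10 }];
const commentTable = [
  { id: 100, articleId: 1, text: "Nice post" },
  { id: 101, articleId: 1, text: "Thanks for sharing" },
];
const tagTable = [{ articleId: 1, name: "databases" }];
const categoryTable = [{ articleId: 1, name: "tutorials" }];

// Rebuilding one article requires a lookup in every related table (the joins).
function assembleArticle(articleId) {
  const article = articleTable.find((a) => a.id === articleId);
  if (!article) return null;
  return {
    ...article,
    comments: commentTable.filter((c) => c.articleId === articleId).map((c) => ({ text: c.text })),
    tags: tagTable.filter((t) => t.articleId === articleId).map((t) => t.name),
    categories: categoryTable.filter((c) => c.articleId === articleId).map((c) => c.name),
  };
}

// Document model: the same record stored once, with related data embedded.
const articleCollection = new Map([
  [1, {
    id: 1,
    title: "Hello MongoDB",
    authorId: 10,
    comments: [{ text: "Nice post" }, { text: "Thanks for sharing" }],
    tags: ["databases"],
    categories: ["tutorials"],
  }],
]);

// A single read returns the full record.
const article = articleCollection.get(1);
```

Both paths produce the same record, but the document model gets there with one lookup instead of four.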
Because data storage in MongoDB is flexible, fields in the JSON documents can be altered, effectively changing the data structure of the document. These updates do not affect other documents in the database, which makes it possible to have documents with different fields in the same collection. The dynamic schema that MongoDB offers allows developers to add or remove fields in documents as they are needed, without altering the database schema, as would be required in relational databases.
12MongoDB Architecture Guide: https://www.mongodb.com/collateral/mongodb-architecture-guide
The use of flexible schemas in MongoDB means the collections do not enforce document structure by
default. As such, it is the responsibility of the developer to define constraints which ensure the integrity
of the data. While SQL databases enforce referential integrity through foreign key constraints (i.e., fields in one table which reference the primary key of another table, identifying the related row), MongoDB does not enforce such references, and their use is optional. This may result in the insertion of invalid data in a document. Therefore, data must be modelled in a way which ensures its integrity, while still meeting the performance requirements.
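Since MongoDB does not check references by itself, a constraint like the one described above has to live in application code. A minimal sketch, assuming in-memory stand-ins for the researchSubject and file collections and a hypothetical insertFile helper:

```javascript
// In-memory stand-ins for two collections (hypothetical data).
const researchSubjects = new Map([["subj1", { _id: "subj1" }]]);
const fileCollection = [];

// MongoDB would accept a file whose "subject" points nowhere, so the
// application enforces the reference itself before inserting.
function insertFile(fileDoc) {
  if (typeof fileDoc.subject !== "string" || !researchSubjects.has(fileDoc.subject)) {
    throw new Error(`Unknown researchSubject: ${fileDoc.subject}`);
  }
  fileCollection.push(fileDoc);
  return fileDoc;
}

insertFile({ filename: "visit1.agd", subject: "subj1" }); // accepted
```

In a real deployment the check would be a query against the researchSubject collection rather than a Map lookup.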
A typical MongoDB database may contain several collections of entities (i.e., documents). The en-
tities in each collection share attributes, defining a loose entity type. To create the logical data model
of the document database, the developer needs to define relationships between the entity types. There
are two types of relationships relevant to this work: one-to-many relationships and many-to-many rela-
tionships. The official documentation for MongoDB13 provides strategies to model both types:
One-to-many relationships: there are two methods to model these types of relationships:
Embedded documents: this strategy was already alluded to in the example from Figure 3.3 (b):
an article of the blog may contain multiple comments, tags or categories. In this example, the
one-to-many relationship between the blog article and the comments, tags and categories
is expressed by embedding the collections of entities on the ”many” side of the relationship
in the document for the blog article. In the JSON document, this is the equivalent of using
embedded arrays for each one of the entity types on the ”many” side. An example of this
type of relationship in Actinfo is shown in Figure 3.4 (a), where the document belongs to a
collection of the type researchStudy. Documents in this collection may be linked to multiple
documents of the type studyGroup. Therefore, the studyGroup entity type is represented in
the form of embedded arrays of sub-documents in the researchStudy documents. This is
a denormalized data model, in which it is possible to retrieve all information regarding the
groups in a study through a single read.
References: this method consists of including a foreign key field in the documents on the ”many”
side of the relationship. Figure 3.4 (b) shows an example from Actinfo. The figure displays
two documents of the file type, which are linked to the same document of the researchSubject
type in the database. The ”subject” field acts as a foreign key, linking the file documents to the
researchSubject document. One advantage of this strategy is that it will not grow the original
document for the subject as more files are uploaded to the database. Instead, the references
are stored in the documents for each new file.
13Data models in MongoDB: https://docs.mongodb.com/manual/data-modeling/
(a) Embedded documents.
(b) Documents with references.
(c) One way embedding.
Figure 3.4: Modelling relationships in MongoDB (examples from Actinfo, highlighting the relevant fields). Some fields were collapsed to improve readability.
Many-to-many relationships: to model these relationships, we can use a strategy similar to the referencing explained for one-to-many relationships, embedding the references on one side of the relationship and thus creating an array of foreign keys. This method of
”one way embedding” is often chosen for optimizing the read performance of a relationship of this
type, particularly when the relationship is uneven. Figure 3.4 (c) shows an example from Actinfo,
for a document in the collection with the researchSubject type. Documents in this collection can
be linked to many documents of the type studyGroup. Additionally, studyGroup documents can be
linked to many documents of the researchSubject type. Since there are many more subjects in a
particular group than groups linked to a single subject, we can embed the references to groups in
the subjects.
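The three strategies can be summarized with illustrative documents. Apart from the subject and groups fields mentioned in the text, field names and values are hypothetical, and plain array methods stand in for the MongoDB queries noted in the comments.

```javascript
// 1. Embedded documents (one-to-many): the groups live inside the study.
const study = {
  _id: "study1",
  title: "Example PA study",
  studyGroups: [ // hypothetical field name
    { _id: "g1", name: "Control" },
    { _id: "g2", name: "Intervention" },
  ],
};

// 2. References (one-to-many): each file document points to its subject.
const fileDocs = [
  { _id: "f1", filename: "visit1.agd", subject: "subj1" },
  { _id: "f2", filename: "visit2.agd", subject: "subj1" },
  { _id: "f3", filename: "visit1.agd", subject: "subj2" },
];

// 3. One way embedding (many-to-many): subjects hold an array of group ids.
const subjects = [
  { _id: "subj1", groups: ["g1"] },
  { _id: "subj2", groups: ["g1", "g2"] },
];

// A single read of the study already contains all of its groups.
const groupNamesOf = (studyDoc) => studyDoc.studyGroups.map((g) => g.name);

// Same idea as a find({ subject: subjectId }) query on the file collection.
const filesOfSubject = (subjectId) => fileDocs.filter((f) => f.subject === subjectId);

// Same idea as find({ groups: groupId }) on the researchSubject collection,
// which matches documents whose groups array contains groupId.
const subjectsInGroup = (groupId) => subjects.filter((s) => s.groups.includes(groupId));
```

Note how the subject documents stay small: a subject belongs to few groups, so its array of group ids grows slowly, whereas a group may gather many subjects.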
Notice also the ”_id” field in the example documents. This is a unique ID automatically generated by MongoDB for each document, which acts as a primary key. The value of this field is of the type ObjectId: a 12-byte value, usually displayed as a 24-digit hexadecimal string, which encodes the timestamp of the creation of the document (in seconds), a random value and an incrementing counter. This information is accessible through MongoDB’s methods (e.g., ObjectId.getTimestamp() in the mongo shell).
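Because the first four bytes of an ObjectId hold the creation time, it can be decoded without any database call. A minimal sketch, assuming the 12-byte layout given in the MongoDB documentation (4-byte big-endian timestamp in seconds, 5-byte random value, 3-byte counter); objectIdTimestamp and makeObjectIdHex are hypothetical helpers, not driver functions.

```javascript
// Reads the creation time encoded in an ObjectId's 24-digit hex string.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) throw new Error("Expected 24 hex digits");
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes
  return new Date(seconds * 1000);
}

// Builds an ObjectId-like hex string for a given date (illustration only;
// real ObjectIds should be generated by MongoDB or its drivers).
function makeObjectIdHex(date, randomHex = "0123456789", counter = 1) {
  const seconds = Math.floor(date.getTime() / 1000);
  return (
    seconds.toString(16).padStart(8, "0") + // 4-byte timestamp
    randomHex + // 5-byte random value (10 hex digits)
    counter.toString(16).padStart(6, "0") // 3-byte counter
  );
}
```

For example, objectIdTimestamp("5c0000000000000000000000") returns the date 2018-11-29T15:04:32.000Z.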
Lastly, it is also important to address the storage requirements of the platform. Since Actinfo should
be able to handle the storage of actigraphy files of a relatively large size, GridFS14 was used for storing
and retrieving all files uploaded to the platform. GridFS is a MongoDB specification for storing and retrieving files that exceed the 16MB size limit of BSON documents (useful in Actinfo, which supports the upload of the raw *.gt3x actigraphy files, with sizes ranging from 50 to 100MB). GridFS automatically generates the collections needed for file storage and, as such, no schema was needed for the files. Nevertheless, additional fields with metadata were added as needed, for associating files with
documents in other collections in the database, as explained in more detail in Chapter 4. GridFS works
by dividing files into data chunks, storing them as separate documents. Therefore, one file can be asso-
ciated with more than one chunk, depending on its size. The file’s metadata and its actual contents are
kept in separate collections. In other words, GridFS creates one document per file in the collection for
file metadata, while all of the binary chunks are stored in a different collection. As such, updating file metadata only requires accessing a single collection.
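The chunking described above can be sketched as follows. The sketch assumes GridFS's default chunk size of 255 KiB and the field names of the files and chunks collections defined in the GridFS specification (files_id, n, data, length, chunkSize, uploadDate, metadata); storage is simulated in memory, and toGridFsDocuments and chunkCount are hypothetical helpers.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261,120 bytes

// Splits a file's bytes into chunk documents plus one file metadata document.
function toGridFsDocuments(fileId, filename, buffer, chunkSize = DEFAULT_CHUNK_SIZE, metadata = {}) {
  const chunks = [];
  for (let n = 0, offset = 0; offset < buffer.length; n++, offset += chunkSize) {
    // n is the chunk's sequence number within the file
    chunks.push({ files_id: fileId, n, data: buffer.subarray(offset, offset + chunkSize) });
  }
  // One metadata document per file, kept apart from the binary chunks.
  const fileDoc = {
    _id: fileId,
    filename,
    length: buffer.length,
    chunkSize,
    uploadDate: new Date(),
    metadata, // e.g., a reference to the subject the file belongs to
  };
  return { fileDoc, chunks };
}

const chunkCount = (length, chunkSize = DEFAULT_CHUNK_SIZE) => Math.ceil(length / chunkSize);
```

With these defaults, a 100MB *.gt3x file would be stored as chunkCount(100 * 1024 * 1024) = 402 chunk documents but only one metadata document, which is why updating metadata touches a single collection.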
3.3 The FHIR standard
Despite the advantages of MongoDB’s schema flexibility, some additional constraints must be imple-
mented when trying to standardize the storage and sharing of data. Ultimately, Actinfo should be inter-
operable with clinical data to the fullest extent, making use of well established, international standards for
information exchange. Schema design cannot, therefore, be discarded simply because of MongoDB’s
flexible document structure.
As such, to ensure interoperability, the HL7 FHIR specification was used, whenever possible, for
modeling data in Actinfo. Published by HL7 (Health Level 715, a not-for-profit organization dedicated to
developing standards for handling electronic health information, including exchange, integration, sharing
and retrieval), FHIR16 (pronounced ”fire”) is an acronym for Fast Healthcare Interoperability Resources.
HL7 is supported by over 1,600 members from over 50 countries, with stakeholders representing not only healthcare providers, but also government, pharmaceutical companies and consulting firms17. Its standards were not, initially, open to the public. As the community using the standards grew, the need for an open license to use the organization’s specifications for developing interoperable applications became increasingly pressing. Thus, FHIR was created, with the goal of providing a simple, easy to implement API for healthcare (Mandel et al., 2016; Bender and Sartipi, 2013).
14GridFS: https://docs.mongodb.com/manual/core/gridfs/
15Health Level 7: http://www.hl7.org/
16FHIR: https://www.hl7.org/fhir/
17Data from HL7’s website: https://www.hl7.org/about/index.cfm?ref=nav
In FHIR, data are represented as resources. Each of these modular components has a set of well-
defined fields, with specific data types, and can have multiple representations, in different formats. Re-
sources in FHIR have clear, intuitive definitions for their data elements and contain references to one
another, defining constraints and relationships between them. Together they constitute a collection of
information models and are one of the two main pillars of FHIR, the other being its RESTful APIs to
operate on resources. In the current official release of FHIR, version R4 (as of April 2019), resources are grouped into five layers, according to their role in the application: Foundation, Base, Clinical,
Financial and Specialized. With the current iteration of Actinfo being research-focused (and not exactly
a clinical application in the eyes of FHIR), this last layer contains the relevant resources used, as they
were designed for public health and research: the ResearchStudy and ResearchSubject resources.
FHIR resources can be described in multiple formats, such as XML, Turtle and, of particular interest
for the work developed in this dissertation, JSON. The representation of FHIR resources in the JSON format allows for integration with a MEAN stack application such as Actinfo, as information can be
stored in the database with the template defined in FHIR’s documentation. It is, however, important
to mention that FHIR resources were not used out-of-the-box when modelling data in Actinfo, as they
either lack fields needed in this specific context, or provide ones which are not relevant under the scope
of this work. The resources were, therefore, followed as closely as possible, with some minor extensions
and some fields ignored, due to missing data (but no fields specifically required in the FHIR resource in
question were left empty). This is safeguarded by FHIR’s extensibility: FHIR resources were conceived
in such a way that, provided the document’s structure remains true to the original template, i.e., its
schema having all the fields specified in the documentation, and provided the required fields are not
left empty, documents can be extended with additional fields and the non-required fields can be ignored
without compromising the correct use of the standard. This flexibility is of great importance in Actinfo, as
it makes it possible for the platform to use the standard, even though some fields specific to a healthcare
context are left empty, due to the nature of this project being more oriented towards PA research.
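The rule above (required elements present and non-empty, optional elements omittable, extra fields allowed) can be sketched as a small check. The required list assumes FHIR R4's ResearchSubject resource (status, study and individual); missingRequired and the sedentaryTimeMin field are hypothetical, and this is not a complete FHIR validator (formal FHIR extensions would normally be placed in the extension element).

```javascript
// Required elements per resource type (assumed from FHIR R4 ResearchSubject).
const REQUIRED = {
  ResearchSubject: ["status", "study", "individual"],
};

function isEmpty(value) {
  if (value === undefined || value === null || value === "") return true;
  if (Array.isArray(value)) return value.length === 0;
  if (typeof value === "object") return Object.keys(value).length === 0;
  return false;
}

// Returns the required fields that are missing or empty; extra fields are ignored.
function missingRequired(resource) {
  const required = REQUIRED[resource.resourceType];
  if (!required) throw new Error(`No rules for ${resource.resourceType}`);
  return required.filter((field) => isEmpty(resource[field]));
}

const exampleSubject = {
  resourceType: "ResearchSubject",
  status: "on-study",
  study: { reference: "ResearchStudy/study1" },
  individual: { reference: "Patient/p1" },
  sedentaryTimeMin: 540, // hypothetical project-specific field
};
```

Here missingRequired(exampleSubject) returns an empty array: the added field does not invalidate the document, while omitting study or individual would.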
The use of MongoDB as a flexible database proves to be a major advantage, since integration with
FHIR is possible by modeling documents after FHIR resources, in the JSON format. To accomplish this
goal, a library for MongoDB, Mongoose18, was used to model data, allowing the creation of schemas
following the structure of FHIR resources.
3.4 Security and authentication
In a platform such as Actinfo, where several requirements must be met to ensure compliance with the GDPR (the General Data Protection Regulation) and the guidelines of Portugal’s data protection authority, CNPD (Comissão Nacional de Protecção de Dados), some measures must be implemented to establish a secure connection between the client and the server. Furthermore, client authentication for navigating the platform must also be considered.
18Mongoose: https://mongoosejs.com/
HTTPS and SSL
An extension of HTTP, HTTPS is a protocol for secure communication over a network. When using HTTPS, the browser is signaled to add an encryption layer that protects the traffic, using SSL (Secure Sockets Layer) or its successor, TLS (Transport Layer Security). By wrapping normal traffic in this encrypted layer, the server and client can communicate without the risk of their messages being intercepted by outside parties, protecting against what are referred to as ”man-in-the-middle” attacks.
Actinfo is currently hosted on an SSL-protected domain. An SSL certificate was issued for the platform and installed on the web server. This certificate contains a digital signature from a Certificate Authority (CA). When the browser connects to the server, it verifies the authenticity of the certificate by checking the entity that issued it against a list of trusted organizations. In this case, the SSL certificate is signed by DigiCert19, a company focused on digital security. Once the identity of the website has been verified, an encrypted session is started to allow client-server communication. Users can also verify the identity of the website they are visiting: in most modern browsers, a padlock in the address bar indicates a secure connection.
Client authentication: JWT
I implemented a platform-specific user authentication system in Actinfo to control access to the platform, as explained in more detail in Chapter 4. Users registered in the platform are provided with login credentials for navigating Actinfo. When a registered user successfully logs in to the platform, a JSON Web Token (JWT, pronounced ”jot”) is generated. A JWT is an open standard (RFC 7519)20 which allows for the secure transmission of information between parties as a JSON object. Furthermore, because it is digitally signed, the information can be trusted and verified. A JWT is a string composed of three Base64URL-encoded parts, a header, a payload and a signature, separated by dots.
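The three-part structure can be seen by decoding a token. This sketch uses the widely circulated example token from jwt.io (not an Actinfo token) and the base64url Buffer encoding available in recent Node.js versions; decodeJwt is a hypothetical helper. Decoding verifies nothing: the header and payload are only encoded, not encrypted.

```javascript
// Splits a JWT into its three dot-separated parts and decodes the first two.
function decodeJwt(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("A JWT has exactly three parts");
  const [header, payload] = parts
    .slice(0, 2)
    .map((part) => JSON.parse(Buffer.from(part, "base64url").toString("utf8")));
  // The signature stays encoded; checking it requires the signing key.
  return { header, payload, signature: parts[2] };
}

// Public example token (header.payload.signature).
const exampleToken =
  "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9" +
  ".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ" +
  ".SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c";

const decoded = decodeJwt(exampleToken);
```

The header decodes to {"alg":"HS256","typ":"JWT"}, naming the signing algorithm, and the payload carries the claims, here sub, name and iat.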
The most common use for JWTs (and the one Actinfo makes use of) is authorization. Once the user logs in, every subsequent request includes the JWT generated for that user, which allows access to routes, services and resources, provided these are permitted with that token. The token is passed in the HTTP Authorization header with every API call; the server then checks for a valid JWT which, if present, allows access to the protected routes. Because the JWT is saved in the browser’s local storage, there is no need to exchange credentials to verify the user’s identity with every request. However, JWTs should not be kept in local storage for longer than required. I defined an expiration time of one day for tokens generated for Actinfo’s users. Once the JWT expires, the user needs to log back in to access the platform’s contents. Additionally, JWTs are removed from local storage every time a user logs out of the platform.
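The issue-and-check cycle described above can be sketched with Node.js's built-in crypto module. Actinfo itself relies on the jsonwebtoken and passport-jwt packages; this sketch instead signs tokens with HMAC-SHA256 (the HS256 algorithm) directly, assumes the Bearer scheme for the Authorization header, and uses hypothetical names (signJwt, authenticate). The one-day expiry mirrors the value stated above.

```javascript
const crypto = require("node:crypto");

const ONE_DAY_SECONDS = 24 * 60 * 60;

const b64url = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");
const hmac = (data, secret) => crypto.createHmac("sha256", secret).update(data).digest();

// Issues a token at login, valid for one day from nowSeconds.
function signJwt(payload, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const claims = { ...payload, iat: nowSeconds, exp: nowSeconds + ONE_DAY_SECONDS };
  const unsigned = `${b64url({ alg: "HS256", typ: "JWT" })}.${b64url(claims)}`;
  return `${unsigned}.${hmac(unsigned, secret).toString("base64url")}`;
}

// Checks an "Authorization: Bearer <token>" header before a protected route runs.
function authenticate(authorizationHeader, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const match = /^Bearer (\S+)$/.exec(authorizationHeader || "");
  if (!match) return { ok: false, reason: "missing token" };
  const parts = match[1].split(".");
  if (parts.length !== 3) return { ok: false, reason: "malformed token" };
  const [header, payload, signature] = parts;
  const expected = hmac(`${header}.${payload}`, secret);
  const given = Buffer.from(signature, "base64url");
  // Constant-time comparison of the signatures.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return { ok: false, reason: "bad signature" };
  }
  // Only decode the contents after the signature has been verified.
  const decode = (part) => JSON.parse(Buffer.from(part, "base64url").toString("utf8"));
  if (decode(header).alg !== "HS256") return { ok: false, reason: "unexpected algorithm" };
  const claims = decode(payload);
  if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) {
    return { ok: false, reason: "expired" };
  }
  return { ok: true, claims };
}
```

Verifying the signature before trusting any claim, and rejecting tokens whose exp has passed, is what forces users to log back in once the day is over.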
To implement this authentication protocol, Passport21, a Node.js middleware for authentication, was
used. Passport is flexible and modular, and integrates easily into Express-based applications. To authenticate requests, Passport uses authentication mechanisms called strategies, which are packaged as individual modules. In Actinfo, the JWT strategy was employed, along with a strategy for authentication using a Google account, as explained in more detail in Chapter 4.
19DigiCert: https://www.digicert.com/
20JWT description: https://tools.ietf.org/html/rfc7519
21Passport: http://www.passportjs.org/
3.5 Chapter overview
Actinfo was built using the MEAN stack, a full stack of JavaScript-based open source technologies which, combined, form a framework for web application development following the 3-tier architecture: Angular is used for the presentation tier; together, Express and Node.js make up the application tier; MongoDB is used as the non-relational database in the data tier. Communication between the client and the server takes place through REST APIs, in a request-response flow.
MongoDB’s flexibility and storage of data as JSON-like documents makes it possible to employ stan-
dards for electronic information exchange, such as the HL7 FHIR standard. FHIR addresses interop-
erability with clinical data using well structured data models, called resources. In its current iteration,
Actinfo does not deal with clinical data, focusing on accelerometry and physical activity (PA) data. However, the platform should be prepared for future integration with data from multiple sources, many of which contain health indicators, allowing that information to be compared with PA time indicators. Additionally, the FHIR resources used here can be adapted to the information storage needs of the work developed which, paired with the robustness of the standard, makes FHIR an obvious choice for modelling data in Actinfo.
To ensure protection from man-in-the-middle attacks and provide a trusted, verifiable identity for the website, an SSL certificate was installed, which allows encryption of the data exchanged between the web server and browsers. Additionally, Actinfo is equipped with its own authentication system, which uses JWTs to authenticate users and allows for safe navigation of the platform.
Chapter 4
ActInfo
The present chapter describes Actinfo, the platform developed in the scope of this dissertation. Actinfo is currently hosted by FCCN1 on a virtual machine managed by INESC-ID2, at the domain https://actinfo.inesc-id.pt. This chapter explains the platform in detail, starting with an overview of its architecture, followed by a description of its various features and tools. Lastly, it explains how Actinfo complies with current security and data protection regulations.
4.1 Overview of the platform
As introduced in Chapter 3, Actinfo follows a 3-tier architecture based on the MEAN stack. Each of these tiers can be analyzed individually to understand its contribution to the platform as a whole.
4.1.1 Database
Figure 4.1 shows the logical data model for the database. The relationships between the different entity types were modelled as explained in Section 3.2.4. Five different entity types exist in this model, four of which are stored in collections of their own, as studyGroup entities are embedded in researchStudy documents:
• user: a collection of information profiles for the registered users, including the access credentials for navigating the platform. The ”role” field serves a specific purpose which is addressed in
Subsection 4.1.2.
• researchStudy: the collection for data from physical activity (PA) studies, its documents modelled after FHIR’s ResearchStudy resource. A researchStudy establishes a many-to-many relationship with the documents in the user collection via embedding of the user references in the researchStudy document, under the ”userPermissions” field. These documents contain metadata for the PA studies. Additionally, documents in this collection contain embedded documents for groups of study participants, establishing a one-to-many relationship. As explained in Section 3.3, FHIR resources were used as a basis for modelling this type of data. However, because these resources are more clinically oriented, some additional fields were needed for the purposes of this project, while others were intentionally left unused due to the lack of FHIR-specific information. This is not to say the core FHIR schema is incomplete, but rather that a request to a researchStudy in Actinfo returns a response which differs slightly from the original FHIR resource, as only non-empty fields are present in the JSON. Future iterations of the platform may contain documents with more FHIR-related fields, which now act only as placeholders. A more detailed explanation of the contents of each field can be found in Appendix A.
1FCCN: https://www.fccn.pt/
2INESC-ID: https://www.inesc-id.pt/
Figure 4.1: Data model for Actinfo.
• studyGroup: documents with this entity type appear in the form of embedded documents in the
researchStudy collection. As such, the studyGroup does not warrant the creation of a separate
MongoDB collection to store its documents. The studyGroup contains metadata resulting from
grouping the study participants.
• researchSubject: a collection for data from participants in the PA studies. Similarly to the researchStudy collection, its documents were modelled after a FHIR resource, the ResearchSubject resource. Once again, several non-required fields were not used, while some were added, extending the original FHIR resource. This is the collection in which the outputs resulting from operations on actigraphy files are stored, i.e., the PA time indicators (sedentary time, time in light PA, time in moderate- to vigorous-intensity PA and breaks/bouts) and other information extracted from the files. The ”groups” field in the documents of this collection allows the establishment of a many-to-many relationship with the studyGroups the subject belongs to, via one way embedding. Note: there is also a ”study” field, which references the study the subject is a part of. It was included because it is required in the FHIR resource’s model, even though it is not used to establish a relationship in the data model chosen for Actinfo.
• file: a collection for metadata for actigraphy files uploaded to the platform. A file establishes
a many-to-one relationship with the researchSubject, by including a field which references the
subject the file belongs to.
The documents of each collection follow schemas modelled using Mongoose (see Section 3.3), except for those in the file collection, for which the model is generated automatically by GridFS. Note that, as explained in Section 3.2.4, all entities have an ”_id” field, generated automatically by MongoDB, in addition to the fields represented in Figure 4.1. A more detailed description of the fields in each entity
of the data model can be found in Appendix A.
4.1.2 Web server
The platform’s web server is where I developed APIs to perform all the required tasks for CRUD (Create,
Read, Update, Delete) operations on documents in the researchStudy, researchSubject, user and file
collections, as well as to process actigraphy data. When a user makes a request, for example by clicking a button, an Angular service is triggered to send this HTTP request to the application tier, where the appropriate responses are generated and sent to the web client. Different methods are therefore needed to handle requests targeting documents in different collections. A more structured explanation of the operations performed in the application tier with each request is given in Section 4.2.
The user management API
With the main goal of controlling access to the platform, contents of the user collection describe who
is navigating Actinfo. Methods defined in this API support not only the creation and management of user accounts, but also user authentication at login. When new user accounts are created, based on the information received from the web client, the actual passwords are never stored in the database; instead, when the account is created, the password is hashed and only the resulting string is stored in the ”password” field of the user document. Regarding authentication, an API method checks the password provided at login against the hash stored in the document; if they match, a JWT is generated (see Section 3.4) for user navigation on the platform. Optionally, users with a Google account can access the platform without providing a password, using an alternative authentication path based on a Passport.js strategy for that purpose: a single sign-on method is employed, using an OAuth3 provider (in this case, Google), which grants Actinfo access to a portion of the user’s Google account profile, specifically, their email address. This required registering Actinfo as a third-party application, able to request access to this information, in the official Google API console.
The implementation of the user authentication API required the installation of npm packages (see
Section 3.2) to handle registration and authentication, namely the bcryptjs package, for hashing the
passwords, and the passport, passport-jwt and jsonwebtoken packages for authentication.
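The store-only-the-hash approach can be sketched with Node.js's built-in crypto module. Actinfo uses bcryptjs; because that package may not be installed, this sketch swaps in scrypt (crypto.scryptSync) as the password-hashing function, with hypothetical helpers hashPassword and verifyPassword and a salt:hash string format chosen for illustration.

```javascript
const crypto = require("node:crypto");

// Hashes a password with a random salt; only the "salt:hash" string is stored.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Re-hashes the login attempt with the stored salt and compares in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}
```

Because each account gets its own random salt, two users with the same password end up with different stored strings, and the plain password never reaches the database.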
3OAuth: https://oauth.net/
In addition to the general authentication system, I implemented an authorization protocol, to further
restrict navigation in the platform, via the creation of the ”role” field in the user documents. Currently,
Actinfo supports two different types of user accounts, depending on the contents to be accessed, result-
ing in two different roles: ”admin”, for administrator accounts and ”researcher”, for researcher accounts.
While the former is only able to register and manage users in the platform, being restricted to these API
methods, the latter has access to all of the remaining features of the platform.
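The role check can be sketched as a small permission table plus an Express-style middleware factory. The area names, the PERMISSIONS table and the canAccess and requireArea functions are hypothetical, and the sketch assumes the authenticated user is available as req.user, which is where Passport places it after successful authentication.

```javascript
// Hypothetical permission table for the two roles described above.
const PERMISSIONS = {
  admin: new Set(["users"]),
  researcher: new Set(["studies", "subjects", "files"]),
};

function canAccess(role, area) {
  const allowed = Object.prototype.hasOwnProperty.call(PERMISSIONS, role) ? PERMISSIONS[role] : null;
  return Boolean(allowed && allowed.has(area));
}

// Middleware factory: call next() if the role may use this area, otherwise respond 403.
function requireArea(area) {
  return (req, res, next) => {
    const role = req.user ? req.user.role : undefined;
    if (canAccess(role, area)) return next();
    return res.status(403).json({ error: "Forbidden" });
  };
}
```

Attaching, say, requireArea("users") to the user-management routes would let an admin through while a researcher receives a 403 response, and the reverse for the study routes.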
The researchStudy and researchSubject API
A repository of PA studies such as Actinfo requires standard CRUD operations for researchStudy and researchSubject documents. Users with the ”researcher” role can make requests in the front-end involving the creation of new studies, access to their contents, the update of studies with new information (e.g., addition or removal of participants) or their deletion. This is done via POST, GET, PUT and DELETE requests, respectively, generating server-side responses which are sent to the presentation tier, reflecting these changes. Similar methods for these operations are also defined for the researchSubject entity type.
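The mapping between HTTP methods and CRUD operations can be sketched with an in-memory store. The handlers, the string ids, the status codes and the merge-on-update behaviour are assumptions for illustration; in Actinfo the handlers run in Express and read and write MongoDB documents.

```javascript
// In-memory stand-in for the researchStudy collection (hypothetical).
const studies = new Map();
let nextId = 1;

// Each HTTP method maps to one CRUD operation and returns [status, body].
const handlers = {
  // Create
  POST(id, body) {
    const study = { ...body, _id: String(nextId++) };
    studies.set(study._id, study);
    return [201, study];
  },
  // Read
  GET(id) {
    return studies.has(id) ? [200, studies.get(id)] : [404, null];
  },
  // Update: merges the new fields into the stored study
  PUT(id, body) {
    if (!studies.has(id)) return [404, null];
    const updated = { ...studies.get(id), ...body, _id: id };
    studies.set(id, updated);
    return [200, updated];
  },
  // Delete
  DELETE(id) {
    return studies.delete(id) ? [204, null] : [404, null];
  },
};

function request(method, id, body) {
  const handler = handlers[method];
  return handler ? handler(id, body) : [405, null];
}
```

For example, request("POST", null, { title: "PA study" }) creates a study with _id "1", and a later request("GET", "1") returns it; the same pattern applies to researchSubject documents.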
The file API
Lastly, to support Actinfo’s functionalities for managing files uploaded to the platform and for computing metrics from their contents, I defined a set of API methods for accessing and operating on *.agd files. Using Express on Node.js, I created routes handling POST requests for file upload, GET requests for downloading and accessing files, and DELETE requests for deleting files from the database. Since each *.agd file contains an SQLite database, I read its contents using the npm package sqlite3 together with Node.js’s native file streaming methods, which allow queries to be performed on files saved to GridFS and thus give access to the tables presented in Figure 2.3 (Section 2.2). On top of this, I developed methods for operating on the retrieved accelerometer time series and for retrieving personal information stored in the device, such as the sex, height, weight and age of the subject.
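As an illustration of the last step, the sketch below maps name/value rows into a subject profile. It assumes, without this being confirmed here, that the device settings can be read with a query such as `SELECT settingName, settingValue FROM settings` (for example through the `all` method of a sqlite3 `Database`); the table, column and setting names (`sex`, `height`, `mass`, `age`) are assumptions to be checked against the schema in Figure 2.3.

```typescript
// A name/value row, as assumed to be returned by a query on the *.agd SQLite database.
interface SettingRow {
  settingName: string;
  settingValue: string;
}

interface SubjectInfo {
  sex?: string;
  height?: number;
  weight?: number;
  age?: number;
}

// Builds a subject profile from the settings rows (setting names are hypothetical).
function subjectInfoFromSettings(rows: SettingRow[]): SubjectInfo {
  const value = (name: string): string | undefined => {
    const row = rows.filter(r => r.settingName.toLowerCase() === name)[0];
    return row ? row.settingValue : undefined;
  };
  // Numeric settings are parsed; missing or non-numeric values become undefined.
  const numeric = (name: string): number | undefined => {
    const raw = value(name);
    const n = raw === undefined ? NaN : parseFloat(raw);
    return isNaN(n) ? undefined : n;
  };
  return { sex: value("sex"), height: numeric("height"), weight: numeric("mass"), age: numeric("age") };
}
```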
4.1.3 Client
The presentation tier of the platform is built on various Angular components, which render the different pages the user has access to. On the web client side, Actinfo uses the popular HTML/CSS and JavaScript library Bootstrap⁴, originally developed at Twitter, with which I created a responsive, user-friendly set of web pages. Furthermore, some small details were implemented for a better user experience, such as the flash messages module for Angular, available from npm, which displays notification messages to the end user upon success or failure of certain actions. For more complex visualizations, such as charts of the distributions of PA time indicators, the open-source libraries D3.js⁵ and Chart.js⁶ were used to generate HTML plots from JavaScript arrays obtained through the implemented APIs.
⁴ Bootstrap: https://getbootstrap.com/
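As a sketch of how an API array becomes chart input, the function below builds a pie chart configuration, in the `{ type, data: { labels, datasets } }` shape that Chart.js expects, from a list of per-subject genders. The function name and dataset label are hypothetical, and only a minimal subset of the configuration is typed.

```typescript
// Minimal subset of a Chart.js configuration object.
interface PieChartConfig {
  type: "pie";
  data: { labels: string[]; datasets: { label: string; data: number[] }[] };
}

// Turns an array of per-subject genders (e.g. from an API response) into pie chart data,
// where each slice holds the percentage of subjects in that category.
function genderPieConfig(genders: string[]): PieChartConfig {
  const labels: string[] = [];
  const counts: number[] = [];
  for (const g of genders) {
    const i = labels.indexOf(g);
    if (i < 0) {
      labels.push(g);
      counts.push(1);
    } else {
      counts[i]++;
    }
  }
  const data = counts.map(c => (100 * c) / genders.length);
  return { type: "pie", data: { labels, datasets: [{ label: "% of subjects", data }] } };
}
```

In the browser, the resulting object would be passed to `new Chart(canvas, config)`.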
4.2 User interface
Upon landing on Actinfo’s website, the end user is presented with the homepage, shown in Figure 4.2 (a). The homepage, the login menu (Figure 4.2 (b)) and the user profile page (Figure 4.2 (c)) are the only pages common to all users. Once logged in, users are presented with different interfaces depending on the role of the account, from which the features of the platform described in this section can be explored.
The many operations that can be performed in Actinfo are grouped in two main categories, one for each user role. The pages accessible to the researcher role are where the platform’s potential truly manifests: Actinfo presents itself both as a hub for PA studies and as a tool for processing actigraphy data. Accordingly, various interfaces were created for these two purposes: at a broader level, CRUD operations on studies and files; at a more specific level, data processing and visualization of the computed metrics, with plots and relevant statistics. This section describes these features following the intended flow for navigating the platform: account creation and management, for administrator accounts, and the various steps from study creation to data visualization, for researcher accounts.
4.2.1 Administrator role
Users with the ”admin” role can perform CRUD operations on user accounts. Separating the users who access the platform’s features for data storage and processing from the users who manage accounts allows control over who can access the different contents in Actinfo.
Upon login, administrators are shown the page in Figure 4.3 (a). From this interface, they can update the information of specific accounts or remove registered users from the database. Additionally, new users can be registered by clicking the ”New user” button; Figure 4.3 (b) presents the form for creating user accounts.
Most of the form fields are self-explanatory. Note, however, that the user’s email is collected not only as an alternative login method, as previously explained, but also as the user’s contact information.
⁵ D3.js: https://d3js.org/
⁶ Chart.js: https://www.chartjs.org/
Figure 4.2: Actinfo’s homepage (a), login menu (b) and profile page (c).
Figure 4.3: The administrator interface (dashboard (a) and ”Register” form (b)).
Figure 4.4: ”My studies” interface, presented to users with researcher accounts upon login.
4.2.2 Researcher role
Figure 4.4 shows the interface presented to users with researcher accounts upon successful login. Users
are shown the studies they have access to and are presented with an option to create a new study, by
clicking the ”Create new study” button. Additionally, clicking on a row of the table redirects the user to
that specific study’s interface. Lastly, an option to ”Compare studies” is also available.
New study
Upon clicking the ”Create new study” option, the user is redirected to the study creation form. As shown in Figure 4.5, several fields are required to create a new PA study, most of which correspond to FHIR-specific fields derived from the ResearchStudy resource (as explained in Section 3.3):
• Identifier: an identifier attributed to the research study by the responsible researcher.
• Title: a short and descriptive label for the study.
• Responsible person: the name of the researcher who oversees the study.
• Start and end dates: when the study began and ended, mapped to the ”period” field in the
corresponding JSON document.
• Status: the status of the study, based on FHIR version R3 (note: at the time of implementation
of this field, version R4 had not yet been officially released, hence the choice for the options from
version R3). Hovering over each one of the radio button options shows the user a description of
each status.
Figure 4.5: ”New study” form.
• Study groups: when designing the interface for studies, researchers at the EHLab requested a system to group files within each study, resulting in the implementation of an organization method based on study groups. Users must specify at least one group for uploading the files corresponding to a specific study; more groups can be created using the ”New group” button. This feature was designed to allow study participants to be grouped by their geographical location and/or corresponding cohort. As such, a study in which accelerometer data were collected at distinct moments for the same subjects can have multiple groups, one for each of those moments. The cohort index distinguishes, for instance, a first baseline PA assessment from a second one some time later, after participants followed a certain physical exercise protocol. Alternatively, if data were collected only once for each subject, a cohort index of 1 should be specified for all groups, as described in the helper text in the form.
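The mapping from this form to a FHIR-style document can be sketched as below. The status codes are those of the ResearchStudy resource in FHIR STU3 (R3); mapping ”Responsible person” to `principalInvestigator`, returning the groups separately and the `StudyGroup` fields are assumptions made for illustration, as is the function name `buildStudy`.

```typescript
// ResearchStudy.status codes in FHIR STU3 (R3).
type StudyStatusR3 = "draft" | "in-progress" | "suspended" | "stopped" | "completed" | "entered-in-error";

// Hypothetical shape of a study group: a location and/or a cohort index (1 if data were collected once).
interface StudyGroup {
  name: string;
  location?: string;
  cohort: number;
}

interface NewStudyForm {
  identifier: string;
  title: string;
  responsible: string;
  start: string; // ISO 8601 date, e.g. "2018-09-01"
  end?: string;
  status: StudyStatusR3;
  groups: StudyGroup[];
}

// Validates the form and returns a FHIR-like study document plus its groups.
function buildStudy(form: NewStudyForm) {
  if (form.groups.length === 0) {
    throw new Error("At least one study group is required");
  }
  for (const g of form.groups) {
    if (Math.floor(g.cohort) !== g.cohort || g.cohort < 1) {
      throw new Error(`Invalid cohort index for group ${g.name}`);
    }
  }
  const study = {
    resourceType: "ResearchStudy",
    identifier: [{ value: form.identifier }],
    title: form.title,
    status: form.status,
    period: { start: form.start, end: form.end }, // start and end dates map to "period"
    principalInvestigator: { display: form.responsible }, // assumed mapping of "Responsible person"
  };
  return { study, groups: form.groups };
}
```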
Figure 4.6: Interface for a created study. Two groups exist in this example study, one with files already uploaded to it, the other still empty.
Clicking the ”Continue” button redirects users to the interface of the created study (Figure 4.6), where they are prompted to upload the corresponding files to each of the created study groups. This interface also introduces options to manage various aspects of the study, namely: managing the study’s permissions, deleting the study, creating new groups and visualizing statistics from the data in the study. Permissions were introduced as a way to restrict access to a study’s contents. By default, only the researcher who created the study can access it, unless he/she updates the permissions by granting access to other registered researchers. The features of the ”Compute group statistics” button are explained in more detail later in this section. As for the groups list, the ”Validated?” column indicates whether the *.agd files in the study group have already been processed with Actinfo’s tools for actigraphy files, as explained later.
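The access rule for studies can be summarized in a few lines. In this hypothetical sketch, a study records its creator (`owner`) and the researchers granted access (`allowed`); as in the current version of the platform, permissions can only be added, not revoked.

```typescript
interface StudyAccess {
  owner: string; // researcher who created the study
  allowed: string[]; // researchers granted access afterwards
}

// Only researcher accounts can open studies, and only their own or shared ones.
function canAccessStudy(study: StudyAccess, username: string, role: string): boolean {
  if (role !== "researcher") return false; // admin accounts never see study contents
  return study.owner === username || study.allowed.indexOf(username) >= 0;
}

// Granting is additive only and returns a new object instead of mutating the study.
function grantAccess(study: StudyAccess, username: string): StudyAccess {
  if (username === study.owner || study.allowed.indexOf(username) >= 0) return study;
  return { owner: study.owner, allowed: study.allowed.concat(username) };
}
```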
Clicking on an item of the group list redirects the user to the interface for that study group, where users can delete the group, upload *.gt3x/*.dat/*.agd files to it or download those files. The file uploader interface is shown in Appendix B (Figure B.1). Figure 4.7 shows an example group with files already uploaded to it, on which users can perform common file management operations such as file deletion, download or upload.
Figure 4.7: Study group interface, indicating files uploaded to it. Options to search for files by name are available to users.
*.agd file validation and processing
The processing of actigraphy files starts with the ”Validate *.agd files” operation, available from the interface in Figure 4.7. Clicking this button redirects users to the page for setting the parameters for validating and processing the actigraphy files in a specific study group, shown in Figure 4.8. The processing of actigraphy data involves three major operations, performed through the API based on the input parameters:
1. Wear time validation: to remove periods of non-wear time from the analysis, based on standardized criteria for accelerometer data reduction settings (International Children’s Accelerometry Database (ICAD), 2017). A period of time in the *.agd time series (i.e., the ”dataTimestamp” and ”axis1” columns of the ”data” table, referring to the file schema in Figure 2.3) is considered non-wear time (that is, a period during which the subject is considered not to have been wearing the monitor) if it consists of at least 60 minutes of consecutive zero-value activity counts (in the ”axis1” column). From this definition, a day is considered valid if it accumulates at least 600 minutes of valid wear time. Lastly, at the file level, a file is considered valid if it has at least three valid days, one of them being a weekend day. These criteria are applied regardless of the chosen settings. Additionally, if the option ”Define maximum wear time” is selected, the wear time validation loop uses the input value as the maximum wear time, in minutes, that a day may have in order to be considered valid. This option was implemented at the request of users, to account for situations in which participants do not follow the exact instructions for wearing the accelerometer. Specifically, participants should remove the activity monitor before going to bed; if they do not, the accelerometer wrongly registers sleep time as wear time, which is then incorrectly classified as sedentary time when the cut point set is applied. By defining a maximum wear time, adjusted for the number of hours of sleep, it is possible to remove sleep time from the analysis.
Figure 4.8: Interface for setting the validation parameters for *.agd files. The ”Compute bouts” option is not selected, for better readability.
2. Application of cut points: users can select the cut point set to be applied to the *.agd time
series, to compute sedentary time and times in light, moderate and vigorous PA. The ”Troiano”
and ”Evenson” cut point sets are given as options, as they are the most commonly used sets in
research conducted by the EHLab for adults and children, respectively. Both were implemented following Actilife’s own versions of these cut point sets, with the thresholds for each level of PA intensity described in Section 2.2. Additionally, users can define their own cut point set, thus not being limited to the usual four levels of intensity. Choosing this ”Custom” option presents users with a form for defining the thresholds of each custom level, along with an example form to help them fill in the fields. The custom form and the example are shown in Appendix B, Figures B.2 and B.3, respectively. Based on the chosen cut point set, the duration of the time periods in each level of PA intensity is calculated, resulting in the accumulated time in each intensity for each valid day.
3. (if selected) Computation of bouts/breaks: for each valid wear time period, bout detection takes place for each of the selected bouts. To avoid manual input of the thresholds for commonly used bouts and breaks, a list of the most frequently chosen ones is presented to the user. Additionally, users can define a custom bout of PA, not restricted to sedentary behaviour, via manual input, in a similar fashion to the custom cut point set definition. A custom bout requires the definition of a minimum duration (in minutes) and a count level (in counts per minute). The complete list of options for the commonly used bouts (all referring to sedentary behaviour) is shown in Appendix B, Figure B.4, and the form for custom bout definition is presented in Figure B.5. Detecting a bout consists of looping through a valid wear time period and checking whether the activity (data in the ”axis1” column) remains within the bout’s interval of counts for at least the bout’s minimum duration, without exceeding its maximum duration. The total number of detected bouts and their accumulated duration are saved for each selected bout.
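The wear time criteria in item 1 can be sketched as follows, assuming the ”axis1” series has already been read into memory with timestamps converted to JavaScript `Date` objects. The sketch applies the rules exactly as stated (no tolerance for short non-zero interruptions), groups epochs by UTC calendar day and treats the optional maximum wear time as an upper limit for a valid day; these choices, and all names, are assumptions rather than Actinfo’s code.

```typescript
// One epoch of the "axis1" series (timestamps assumed already converted to Dates).
interface Epoch {
  timestamp: Date;
  axis1: number;
}

interface DayResult {
  date: string;
  weekend: boolean;
  wearMinutes: number;
  valid: boolean;
}

const NON_WEAR_MINUTES = 60; // consecutive zero counts that define non-wear time
const MIN_WEAR_MINUTES = 600; // minimum wear time for a valid day
const MIN_VALID_DAYS = 3; // minimum valid days for a valid file (one on a weekend)

// true = worn; false = inside a run of at least 60 minutes of consecutive zero counts.
function wearMask(series: Epoch[], epochSec: number): boolean[] {
  const minRun = (NON_WEAR_MINUTES * 60) / epochSec; // run length in epochs
  const mask: boolean[] = series.map(() => true);
  let runStart = -1;
  for (let i = 0; i <= series.length; i++) {
    const zero = i < series.length && series[i].axis1 === 0;
    if (zero && runStart < 0) runStart = i;
    if (!zero && runStart >= 0) {
      if (i - runStart >= minRun) {
        for (let j = runStart; j < i; j++) mask[j] = false;
      }
      runStart = -1;
    }
  }
  return mask;
}

// Accumulates wear time per UTC calendar day and applies the day-level criteria.
function validateDays(series: Epoch[], epochSec: number, maxWearMinutes?: number): DayResult[] {
  const mask = wearMask(series, epochSec);
  const days: DayResult[] = [];
  const byDate: Record<string, DayResult> = {};
  series.forEach((e, i) => {
    const date = e.timestamp.toISOString().slice(0, 10);
    if (!byDate[date]) {
      const dow = e.timestamp.getUTCDay();
      byDate[date] = { date, weekend: dow === 0 || dow === 6, wearMinutes: 0, valid: false };
      days.push(byDate[date]);
    }
    if (mask[i]) byDate[date].wearMinutes += epochSec / 60;
  });
  days.forEach(d => {
    d.valid = d.wearMinutes >= MIN_WEAR_MINUTES &&
      (maxWearMinutes === undefined || d.wearMinutes <= maxWearMinutes);
  });
  return days;
}

// File-level rule: at least three valid days, at least one of them on a weekend.
function isValidFile(days: DayResult[]): boolean {
  const valid = days.filter(d => d.valid);
  return valid.length >= MIN_VALID_DAYS && valid.some(d => d.weekend);
}
```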
These three operations are performed by streaming each file from the database to a temporary on-disk location, using native Node.js methods in conjunction with the sqlite3 package. It is important to note that, because cut point sets and count levels for bouts are defined for one-minute periods, they must be linearly scaled to account for the epoch length of each *.agd file (for example, the files used in Chapter 5 consist of 15-second epochs, a typical epoch length in studies conducted at the EHLab).
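The linear scaling and the application of a cut point set can be sketched as below. For 15-second epochs, a threshold of 100 counts per minute becomes 100 × 15/60 = 25 counts per epoch. The Troiano lower bounds used here (100, 2020 and 5999 counts per minute for light, moderate and vigorous PA) are the values commonly configured in Actilife and should be checked against Section 2.2; the names are hypothetical, and the function would be applied to the epochs of each valid day separately.

```typescript
// A PA intensity level, defined by its lower bound.
interface Level {
  name: string;
  minCount: number;
}

// Troiano (adult) lower bounds in counts per minute on axis1 (assumed values, see Section 2.2).
const TROIANO: Level[] = [
  { name: "sedentary", minCount: 0 },
  { name: "light", minCount: 100 },
  { name: "moderate", minCount: 2020 },
  { name: "vigorous", minCount: 5999 },
];

// Linear scaling from counts per minute to counts per epoch.
function scaleToEpoch(perMinute: Level[], epochSec: number): Level[] {
  return perMinute.map(l => ({ name: l.name, minCount: (l.minCount * epochSec) / 60 }));
}

// Minutes spent in each level, counting only worn epochs.
function timeInLevels(counts: number[], worn: boolean[], perMinute: Level[], epochSec: number): Record<string, number> {
  const levels = scaleToEpoch(perMinute, epochSec).sort((a, b) => a.minCount - b.minCount);
  const minutes: Record<string, number> = {};
  levels.forEach(l => { minutes[l.name] = 0; });
  counts.forEach((c, i) => {
    if (!worn[i]) return; // non-wear epochs are excluded
    let level = levels[0];
    for (const l of levels) {
      if (c >= l.minCount) level = l; // highest level whose lower bound is reached
    }
    minutes[level.name] += epochSec / 60;
  });
  return minutes;
}
```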
Output page
The results of these operations are presented to the end user in the output page, an example of which is shown in Figures 4.9 and 4.10. The page is divided into two sections: a ”Summary” section, condensing the important information for the valid files, and a section with detailed outputs for all files (both valid and invalid). Each section presents a list with one item per file, which can be expanded by clicking the arrow on the right side of its card, revealing the file information. Options to expand or collapse all items at once are also available, and users can search for specific files by filename. Additionally, users can export the analysis to an Excel file by clicking the ”Export to Excel file” button at the top of the page (an example *.xlsx file is shown in Appendix C) and/or save the output to the database by clicking the ”Save output” button. The latter makes an API call that generates researchSubject documents from the files and saves them to MongoDB. This allows users to return to an already validated dataset through the ”study groups” menu from Figure 4.7 without having to set the validation parameters again. Saved validations can also be wiped from the database if users decide to process the files with different settings.
In the ”Summary” section, only valid files are shown. For each item of the list, three tables are
presented to the user:
• ”Subject info. and validation details”: general information extracted from the *.agd file including the
subject ID, the file epoch, gender and age. The chosen cut point set is also shown.
• ”Wear time validation”: summary information on the wear time validation results, including the number of valid days, total valid wear time, total times for the PA indicators, and an indication of whether the subject meets the WHO’s recommendations for PA (World Health Organization, 2010). Compliance with the PA recommendations is assessed taking into account the age of the participant.
• (if bout/break detection was selected) ”Bouts summary”: A summary of the total number and time
in each of the selected bouts.
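The text does not detail how compliance with the WHO recommendations is computed. One simplified reading of the 2010 recommendations (at least 60 minutes of MVPA per day for those under 18; at least 150 minutes of moderate-intensity PA per week, or an equivalent combination with vigorous minutes counted double, from 18 years on) is sketched below; the function name, the use of averages over valid days and the omission of the 10-minute bout requirement are all assumptions.

```typescript
// Average minutes per valid day in each intensity, as produced by the cut point step.
interface DailyActivity {
  moderate: number;
  vigorous: number;
}

// Simplified, hypothetical reading of WHO (2010); the >= 10-minute bout requirement is ignored.
function meetsWhoRecommendation(age: number, avg: DailyActivity): boolean {
  if (age < 18) {
    // Children/adolescents: at least 60 minutes of MVPA per day.
    return avg.moderate + avg.vigorous >= 60;
  }
  // Adults and elders: at least 150 moderate-equivalent minutes per week.
  const weeklyModerateEquivalent = 7 * (avg.moderate + 2 * avg.vigorous);
  return weeklyModerateEquivalent >= 150;
}
```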
41
The ”Detailed validation outputs” section follows a structure similar to the ”Summary” section, except that all files are included, regardless of whether they are valid. Invalid files are highlighted in red in the list. Additional information is presented to the user in each table, for each file:
• ”Subject info. and validation details”: the table in this section is divided into ”Subject information” and ”Validation details”. The former presents personal information extracted from the *.agd file; the latter shows the validation details, including the user-defined settings.
• ”Wear time validation details”: in addition to the summary information for the wear time validation
results, it includes the detailed information by day for the PA time indicators (ST, time in light PA,
time in MVPA and breaks/bouts). Invalid days are highlighted in red.
• (if bout/break detection was selected) ”Bouts”: contains a summary of the total number and time
in each of the selected bouts plus the number and time in each bout by day.
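The per-day bout counts and durations in the ”Bouts” table follow from the bout detection loop described for the processing step. A sketch of that loop for one day of epochs is given below; the `BoutDefinition` fields and the treatment of non-wear epochs as interruptions are assumptions, and breaks are not covered.

```typescript
// A bout definition; count limits are expressed in counts per minute, as in the UI.
interface BoutDefinition {
  name: string;
  minMinutes: number;
  maxMinutes?: number; // open-ended if omitted
  lowCpm: number;
  highCpm: number;
}

interface BoutResult {
  bouts: number;
  minutes: number;
}

// Counts runs of worn epochs whose counts stay within [lowCpm, highCpm] (scaled to the
// epoch length) for at least minMinutes and, if given, at most maxMinutes.
function detectBouts(counts: number[], worn: boolean[], def: BoutDefinition, epochSec: number): BoutResult {
  const scale = epochSec / 60;
  const low = def.lowCpm * scale;
  const high = def.highCpm * scale;
  const result: BoutResult = { bouts: 0, minutes: 0 };
  let run = 0; // length of the current qualifying run, in epochs
  for (let i = 0; i <= counts.length; i++) {
    const inRange = i < counts.length && worn[i] && counts[i] >= low && counts[i] <= high;
    if (inRange) {
      run++;
      continue;
    }
    // The run has ended (or the day is over): check its duration against the definition.
    const runMinutes = run * scale;
    if (run > 0 && runMinutes >= def.minMinutes &&
        (def.maxMinutes === undefined || runMinutes <= def.maxMinutes)) {
      result.bouts++;
      result.minutes += runMinutes;
    }
    run = 0;
  }
  return result;
}
```

For instance, a sedentary bout of at least 10 minutes could be expressed as `{ name: "ST10", minMinutes: 10, lowCpm: 0, highCpm: 99 }` (a hypothetical definition).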
It is important to mention that, while height and weight are extracted from the *.agd file, this informa-
tion is not used to compute any metric. Future iterations of the platform may make use of this information,
as explained in Chapter 6.
”Group statistics” and ”Compare studies” functionalities
The pages for these features are similar in the type of information they present. Both contain visualiza-
tions for the metrics shown in the output page. The difference between the two resides in the datasets
used to generate the plots and tables: while the ”Group statistics” functionality applies to participants
in user-selected study groups from a particular study, the ”Compare studies” page displays information
comparing results from two or more studies. This section will focus on the ”Group statistics” page, while
the ”Compare studies” tool is demonstrated in Chapter 5, as part of the comparative study. Figure 4.11
shows an example of the ”Group statistics” interface. The page is divided into two sections, ”Demographics” and ”Physical activity and sedentary time”, and options to filter the data are also available to users.
The menu at the top of the page allows users to apply age and/or gender filters to the data. The
plots and tables are promptly updated with the new datasets. Regarding the age filters, I implemented
options for three distinct age groups, based on input from researchers at the EHLab. Any combination
of the three can be chosen:
• ”Children/adolescents”, for subjects aged 17 years or younger;
• ”Adults”, for participants aged 18 to 65 years;
• ”Elders”, for subjects older than 65 years.
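The age and gender filters amount to a simple predicate over subjects, sketched below with hypothetical type and function names. Ages are assumed to be integers in completed years, so that the boundaries 17, 18 and 65 match the three groups above.

```typescript
type AgeGroup = "children" | "adults" | "elders";

interface SubjectSummary {
  id: string;
  age: number; // completed years (assumed)
  gender: string;
}

// Maps an age to one of the three groups offered in the filter menu.
function ageGroupOf(age: number): AgeGroup {
  if (age <= 17) return "children";
  if (age <= 65) return "adults";
  return "elders";
}

// Keeps subjects in any selected age group and, if given, any selected gender.
function applyFilters(subjects: SubjectSummary[], groups: AgeGroup[], genders?: string[]): SubjectSummary[] {
  return subjects.filter(s =>
    groups.indexOf(ageGroupOf(s.age)) >= 0 &&
    (genders === undefined || genders.indexOf(s.gender) >= 0));
}
```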
Figure 4.9: Output interface (”Summary” section). Two items (files) are shown in the list. The first (filename: ”LIS 015915sec.agd”) is expanded, revealing the summary information, while the second (filename: ”LIS 019015sec.agd”) is collapsed.
The ”Demographics” section contains three elements. At the top, users can view a summary table with the number of subjects and their distribution among the different age groups. Below the table, two cards are displayed: one with a violin plot of the distribution of the participants’ ages and one with a pie chart of the distribution of genders. In the ”Ages” card, users can interact with the violin diagram by hovering over or clicking the plot, which shows quantitative information: the y coordinate (in this case, the age), the maximum and minimum, the mean, the median, and the first and third quartiles. Additionally, since the D3.js library was used to generate the plot, users have access to axis manipulation options, namely zoom and pan, as well as an option to download a .png image of the diagram. In the ”Gender” card, interacting with one of the slices of the pie chart (i.e., hovering over or clicking it) shows the user the percentage of subjects in each category.
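The quantitative information shown for the violin plot can be computed as below. The quartiles use linear interpolation between closest ranks, the rule implemented by `d3.quantile`; whether Actinfo’s plot uses exactly this rule is an assumption, and the names are hypothetical.

```typescript
// Quantile of an ascending-sorted sample by linear interpolation between closest ranks.
function quantileSorted(sorted: number[], p: number): number {
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (h - lo);
}

interface Summary {
  min: number;
  q1: number;
  median: number;
  q3: number;
  max: number;
  mean: number;
}

// Summary statistics of the kind displayed when interacting with the "Ages" violin plot.
function summarize(values: number[]): Summary {
  if (values.length === 0) throw new Error("empty sample");
  const s = values.slice().sort((a, b) => a - b); // copy before sorting numerically
  const mean = s.reduce((acc, v) => acc + v, 0) / s.length;
  return {
    min: s[0],
    q1: quantileSorted(s, 0.25),
    median: quantileSorted(s, 0.5),
    q3: quantileSorted(s, 0.75),
    max: s[s.length - 1],
    mean,
  };
}
```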
Figure 4.10: Output interface (”Detailed validation outputs” section). Only the first item (filename: ”LIS 015915sec.agd”) is expanded, for better readability. File ”LIS 015015sec.agd” was flagged as invalid, which is why it is highlighted in red.
Following the demographics, users can view information on the PA time indicators. Three cards are presented in the ”Physical activity and sedentary time” section: one for the distribution of subjects who meet the PA recommendations, one for the distribution of ST and one for the distribution of time in MVPA. The charts offer the same interactions and options as the ones in the previous section. In the tables of the ST and MVPA cards, in addition to the average time per day in each behaviour, I chose to also present these times as a percentage of wear time. Since it accounts for differences in wear time between participants, this gives a more accurate representation of the indicators than the average time per day alone, and it replicates the metrics used in current research conducted by the EHLab (Santos et al., 2018).
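The two representations in the ST and MVPA tables can be sketched as follows for one subject’s valid days. Here the percentage is computed as total time in the behaviour divided by total wear time; averaging per-day percentages instead would weight short and long days equally, so which variant Actinfo uses is an assumption, as are the names.

```typescript
// Per-day totals for one subject's valid days, in minutes.
interface ValidDay {
  wearMinutes: number;
  sedentaryMinutes: number;
  mvpaMinutes: number;
}

interface BehaviourSummary {
  stMinPerDay: number;
  mvpaMinPerDay: number;
  stPctWear: number; // sedentary time as % of total wear time
  mvpaPctWear: number; // MVPA as % of total wear time
}

function summarizeBehaviour(days: ValidDay[]): BehaviourSummary {
  if (days.length === 0) throw new Error("no valid days");
  const sum = (pick: (d: ValidDay) => number): number => days.reduce((acc, d) => acc + pick(d), 0);
  const wear = sum(d => d.wearMinutes);
  const st = sum(d => d.sedentaryMinutes);
  const mvpa = sum(d => d.mvpaMinutes);
  return {
    stMinPerDay: st / days.length,
    mvpaMinPerDay: mvpa / days.length,
    stPctWear: (100 * st) / wear,
    mvpaPctWear: (100 * mvpa) / wear,
  };
}
```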
Figure 4.11: ”Group statistics” page.
4.3 Compliance with data protection regulations
The General Data Protection Regulation, GDPR (European Parliament and Council, 2016), in effect since 25 May 2018, is the European regulation on personal data protection. In Portugal, the Comissão Nacional de Protecção de Dados (CNPD) is the data protection authority responsible for controlling the processing of personal data, thus ensuring respect for individual rights. Until the GDPR came into application in the European Union, personal data in Portugal were protected under the Lei da Protecção de Dados Pessoais no. 67/98, the LPDP (Assembleia da República, 1998). As of June 2019, the LPDP is still in effect for all matters that do not contradict the GDPR.
Actinfo aims to comply with the GDPR to the fullest extent possible. For the type of data handling performed in the platform, the relevant articles of this regulation are, in general, more comprehensive (covering a broader range of aspects) than the articles of the LPDP on similar matters. As such, ensuring compliance with the GDPR also makes Actinfo compliant with the LPDP.
The relevant articles of the GDPR in the context of the work developed in this dissertation, with a
summary description of the key aspects and an explanation of how Actinfo complies with each one are
presented in the following list:
• Articles 5 and 6: ”Principles relating to processing of personal data” and ”Lawfulness of
processing”:
– Description: personal data must be processed based on legitimate purposes and in a trans-
parent manner, informing subjects about the processing activities on the data. The data must
be collected for specified, explicit and legitimate purposes. The subjects must have given
consent to the processing of their personal data for one or more specific purposes.
– Actinfo: in Actinfo, user accounts are created by an administrator, for a specific individual,
who is present during the account creation process and gives verbal consent for the input of
his/her data. The user is informed of the purposes of the data collected: personal information
for the identification of the user and an email for contact. The data is not used for purposes
other than the ones stated to the user.
As for the data in actigraphy files, from study participants, its handling is safeguarded by
signed consents, which explain how the data is to be used. The participants sign the consent
forms prior to the data collection. The researcher who is responsible for the study must ensure
the data is used for the specific purposes stated to the participant. This must be guaranteed
by the researcher before the actigraphy data is uploaded to the platform. It is outside the
scope of this project to verify if all active users with researcher accounts are indeed complying
with the regulation.
• Articles 12 to 23: ”Rights of the data subject”:
– Description: the data subjects have the right to ask about the information stored and how it
is being handled and processed. The subjects have the right to ask for corrections, object to
the processing of their data or even have it deleted.
– Actinfo: clicking the ”About” link in the platform’s navigation bar shows users an explanation
of how to contact the administrator regarding any questions, concerns or requests regarding
the platform. The administrator must attend to these requests, taking the necessary course of action to protect the users’ rights.
Regarding data from study participants (i.e., actigraphy files), it is, once again, protected by
signed consents. This is external to Actinfo; the researcher who is responsible for the study
is the one who the participants must contact regarding any concerns about their data.
• Article 25 and Articles 32 to 34: ”Data protection by design and by default” and ”Security of personal data”:
– Description: aspects regarding privacy and protection should be considered from the start, at system design (”data protection by design”). Users must be notified of any personal data breaches, if they occur. The system must be regularly tested to ensure the security of the data. In the event of
a technical or physical incident, a mechanism must exist to restore the availability and access
to personal data.
– Actinfo: the administrator of the platform must ensure the software installed in the virtual
machine hosting the application is up to date, to prevent exploits of possible vulnerabilities
in the system by a party with malicious intent. Regular back-ups of the data must also be
performed. Furthermore, the platform is hosted in the FCCN data center, equipped with
protocols for testing the servers, to further aid in identifying possible technical issues.
In addition to the described articles from the GDPR, Article 15 of the LPDP, which addresses security
concerns, also refers to access control. As explained in Section 4.1.2, only registered users can navigate
the platform. Additionally, the ”role” system further restricts access to the data: only researcher accounts
have access to PA studies and tools to operate them. Finally, with the implemented permissions system,
the studies are only available to researchers with permissions to access them.
The current version of Actinfo tries to comply with data protection regulations in all possible aspects,
both for the EU regulation and the guidelines defined by the CNPD. As the platform grows, improvements
must be made to further ensure compliance with these regulations and increase security.
4.4 Chapter overview
Actinfo is a web application following a 3-tier architecture, built on the MEAN stack. The database is divided into collections of entities for user data, studies, study groups, study participants and actigraphy files.
In the application tier, I developed methods for performing the necessary operations on each of the documents in the database, such as CRUD operations on users, studies and participants, and methods for computing PA time indicators from the information in *.agd files.
As for the presentation tier, I extended Angular with libraries for generating data visualizations, such as plots and charts.
The platform’s features can be divided in two major groups, depending on the role of the user account.
Upon login, administrator accounts are presented with the option to manage user accounts. Researcher
accounts, on the other hand, have options to operate on studies. Several features are available to
researcher accounts:
• Creation of studies (upload of actigraphy files);
• Validation of files in studies, while at the same time applying cut point sets and bout detection;
• Export of the validation results to an Excel file;
• Visualization of PA statistics for files within a specific study;
• Comparison of PA statistics between two or more studies.
Actinfo aims to be compliant with current data protection regulations, namely the GDPR and the
LPDP. Continuous improvements must be made to ensure compliance with these regulations and further
increase the protection of personal data.
Chapter 5
Assessing Actinfo
Once a functional prototype of the platform had been achieved, I conducted an assessment phase
in order to evaluate the degree to which the implemented features are in agreement with what was
proposed and assess the website’s usability. Furthermore, I performed a comparative study with two
datasets managed by Actinfo, as a practical demonstration of the various tools for handling actigraphy
files. As such, this chapter is divided into three major sections, each tackling one of the aforementioned topics.
In Section 5.1, I assess the extent of the completion of the platform’s major functionalities, i.e.,
the conformity of the implementation of the major features in Actinfo with the requirements for each
functionality. An overview of the most important features is given, accompanied by relevant comments,
including some gathered from the platform’s users.
Section 5.2 provides insight into user feedback regarding the ease of use of Actinfo. To assess the
website’s usability, an anonymous, standardized questionnaire was provided to the platform’s active
users.
Finally, Section 5.3 details the comparative study I conducted, demonstrating Actinfo’s capabilities not only for extracting relevant data to characterize the sedentary time and objectively measured physical activity (PA) profiles of two populations, but also for comparing the metrics extracted for both. Data from two previously conducted studies were uploaded to the platform, allowing the two populations to be compared in terms of PA time indicators. Additionally, the results were validated using the ”de facto standard” software for actigraphy data analysis, Actilife.
5.1 Conformity with requirements
During the development of Actinfo, I collected input from researchers at the EHLab regarding the various
functionalities as they were being implemented. This feedback served to determine exactly how each
one of the main features of the platform should operate, i.e., what was required of each functionality.
With the input from the researchers, it was possible to evaluate the implementation status of the major functionalities of the platform once the first prototype of Actinfo was completed. This assessment allowed me to perform a conformity analysis, aimed at evaluating whether the implemented functionalities performed in accordance with the requirements. When a feature was found not to perform exactly as expected, this was mostly due to requirements revised in later stages of development, as the platform approached completion: as I gathered more feedback from researchers, I was able to identify specific details they would like to see implemented differently (mostly pointing towards more customization options and interfaces). Table 5.1 presents an overview of the main features implemented in Actinfo and their conformity with the requirements. These features were grouped in three major areas of interest: ”Users”, for functionalities relating to the user database and account management; ”Studies”, for features affecting the manipulation of studies in the platform and the corresponding outputs; and ”Actigraphy files”, for the handling of actigraphy files and accelerometer data analysis.
Table 5.1: Conformity of the platform’s features.
Category | Feature | Implementation status
Users | Account management | Performing as expected.
Users | Access control | Performing as expected.
Studies | Interface | Lacking in options to group studies.
Studies | Study permissions | Currently, users can only add permissions to a study, not remove them.
Studies | Comparison of studies | Performing as expected.
Studies | Group statistics | Performing as expected.
Studies | Filters for computed metrics and plots | Performing as expected, for age and gender.
Actigraphy files | File management | File handling is incomplete.
Actigraphy files | Wear time validation | Lacking in customization options.
Actigraphy files | Application of cut point sets | Performing as expected.
Actigraphy files | Computation of activity bouts | Performing as expected.
Actigraphy files | Visualization of the obtained output | User interface needs improvements for better readability.
Actigraphy files | Export analysis to an Excel file | Exported file needs additional information.
As the table shows, the major features whose implementation is not fully in accordance with the requirements concern the handling of studies and actigraphy files:
• Studies:
– Regarding the CRUD operations on studies, the creation of studies in particular should include more FHIR-specific fields, improving interoperability with clinical information. Additionally, it should be possible to create studies from files already uploaded to the platform. As for the interface itself, it should be possible to group the participants by location and/or cohort index.
– In the feature ”Study permissions”, permission management should be more complete, allow-
ing users to also remove access from a specific study.
• Actigraphy files:
– Regarding file management, in the current version of the platform, files must be associated with a specific study. Future iterations should allow the dataset to be separated from the study. Additionally, users showed interest in being able to perform operations on actigraphy files (such as wear time validation and application of cut point sets) on selected files only, instead of having to operate on all files in a study group.
– In the tool for wear time validation, users reported that the minimum wear time should be a customizable field. Furthermore, when the maximum wear time is exceeded, users should be notified of the specific file(s) in which this occurs, so that sleep time can be defined and subtracted from sedentary time. Lastly, the options for marking a file as valid should include more customizable fields: users have shown interest in being able to define the minimum number of valid days and which days of the week should have valid data.
– As for the visualization of the validation output, some tweaks to the interface are needed to make it more intuitive: a larger font size and the grouping of the ”Summary” and ”Detailed” menus under a single section. In the tables containing detailed information, a final row should present the averages of the time indicators over all valid days.
– Lastly, regarding the generated Excel file, users stated that the file should contain two additional sheets, ”Summary” and ”Daily”, covering valid files only.
This conformity analysis clarified the current status of Actinfo’s main features. While the majority are performing in accordance with the requirements, some adjustments are necessary in future versions of the platform, mainly the addition of customization options and the redesign of certain pages, for improved readability and user experience.
5.2 Platform usability
Although it was conceived over two decades ago by Brooke (1996), the System Usability Scale (SUS) is still widely regarded as the industry standard method for assessing usability. The SUS consists of a 10-item questionnaire which provides a ”quick and dirty” method to evaluate the usability of various systems, and it has been widely tested for measuring the usability of a range of products, including smartphone applications, websites, web applications and an array of other hardware and software, as it is an inexpensive tool to assess perceived usability (Lewis, 2018). Because it is a simple (yet effective) tool that remains reliable even when the sample size is small, I employed the SUS to evaluate Actinfo’s usability.
From Actinfo’s ”About” section, accessible through the website’s navigation bar, users can open a version of the SUS tailored to the website, as shown in Figure 5.1. The questionnaire contains 10 Likert items, i.e., questions with five levels of response, ranging from ”strongly disagree” to ”strongly agree”. The survey is composed of an equal number of positively connoted and negatively connoted items. Users are asked not to leave any item unanswered, as is standard when answering a questionnaire of this type (Bangor et al., 2009).
Figure 5.1: SUS questionnaire, found in Actinfo’s ”About” section.
Currently, there are nine registered users, seven of whom have accounts with the ”researcher” role. These users were asked to answer the survey after testing Actinfo’s various features, and their answers were collected to obtain the SUS score. Following the method originally described by Brooke (1996), each response lies on a scale from 1 (”strongly disagree”) to 5 (”strongly agree”), and x was taken as the average of all users’ responses to a given item. For every positively connoted item, the expression x − 1 was applied; for every negatively connoted item, the expression 5 − x was applied, so that each item contributes between 0 and 4. The item scores were then added and multiplied by 2.5, yielding the final SUS score. A score of 87.50 out of a possible 100 was obtained for Actinfo, resulting from the individual scores presented in Table 5.2.
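The scoring rule above can be sketched in a few lines of Python. This is an illustration of Brooke's formula rather than Actinfo's own code; the function name and the split of Table 5.2's items into five positive and five negative ones (in the order the table lists them) are assumptions of this sketch. Because the rule is linear, applying it to per-item means gives the same result as averaging each user's individual score.

```python
def sus_score(responses, positive):
    """Return the SUS score (0-100) for ten item responses.

    responses -- ten values between 1 and 5 (one user's answers or per-item means)
    positive  -- ten booleans, True for positively worded items
    """
    if len(responses) != 10 or len(positive) != 10:
        raise ValueError("SUS expects exactly 10 items")
    total = 0.0
    for x, is_positive in zip(responses, positive):
        if not 1 <= x <= 5:
            raise ValueError("responses must lie between 1 and 5")
        # Positive items contribute x - 1, negative items 5 - x (each 0 to 4).
        total += (x - 1) if is_positive else (5 - x)
    return total * 2.5


# Per-item mean scores as listed in Table 5.2 (rounded to two decimals);
# the first five items are positively worded, the last five negatively.
TABLE_5_2_MEANS = [4.33, 4.67, 5, 4.33, 5, 1.67, 2.33, 1.67, 1, 1.67]
TABLE_5_2_POSITIVE = [True] * 5 + [False] * 5

score = sus_score(TABLE_5_2_MEANS, TABLE_5_2_POSITIVE)
print(f"SUS score: {score:.1f}")
```

Since the tabulated means are rounded, the unrounded product is about 87.47, which rounds to the reported 87.5.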
Over 10 years of research compiled by Bangor et al. (2008) made it possible to establish benchmarks for an array of systems and products evaluated using the SUS, websites included, for which an average score of 70 was obtained. In a later study, Bangor et al. (2009) mapped results from 1000 surveys to a 7-point adjective scale, comprising words often associated with usability: ”Awful” (no score data), ”Worst imaginable” (mean SUS score of 25.00), ”Poor” (mean SUS score of 25.00), ”OK” (mean SUS score of 52.01), ”Good” (mean SUS score of 72.75), ”Excellent” (mean SUS score of 85.58) and ”Best imaginable” (mean SUS score of 100). The score obtained for Actinfo falls between the mean scores of the ”Excellent” and ”Best imaginable” categories, which indicates that the prototype has good usability. However, this score is by no means final, since usability must be periodically assessed as more users register and new features are implemented in the platform.
Table 5.2: SUS scores of Actinfo’s evaluation (n=7)
Item | Mean score | Mean SUS item score
”I felt very confident using the website” | 4.33 | 3.33
”I found the various functions on this website were well integrated” | 4.67 | 3.67
”I think that I would like to use this website frequently” | 5 | 4
”I thought the website was easy to use” | 4.33 | 3.33
”I would imagine that most people would learn to use this website very quickly” | 5 | 4
”I thought there was too much inconsistency on this website” | 1.67 | 3.33
”I needed to learn a lot of things before I could get going with this website” | 2.33 | 2.67
”I found the website unnecessarily complex” | 1.67 | 3.33
”I found the website very cumbersome to use” | 1 | 4
”I think that I would need the support of a technical person to be able to use this website” | 1.67 | 3.33
Final SUS score | | 87.5
5.3 Comparative study with two adult populations
As an integrated information system with the power to aggregate multiple PA studies, Actinfo is equipped with tools for establishing comparisons between them, as explained in Chapter 4. To demonstrate the platform’s ability to compute and compare PA metrics and their distributions between different populations that had never before been compared, I conducted a comparative study using the various tools implemented in Actinfo. To this end, accelerometer data from two populations were used, drawn from two different studies conducted by the EHLab: one study collected accelerometer data with the goal of assessing PA in a sample of the population of the municipality of Lisbon (from this point forward referred to as ProjCML); the other consisted of actigraphy data collected as part of a clinical trial aiming to determine the effect of different physical exercise protocols on biomarkers in patients with type II diabetes (study with the identifier D2FIT).
Using these two datasets, I employed the platform’s tools for computing PA time indicators to obtain the average sedentary time (ST) per day, the average time in MVPA (moderate- to vigorous-intensity PA) per day and the average number of breaks per day per hour of sedentary time. I chose this last metric as an alternative to simply obtaining the number of breaks per day, as current literature is moving towards making it a standard for evaluating patterns of sedentary behaviour (Chen et al., 2018). This also served as a means to validate Actinfo’s break detection method. The platform also allowed for comparing the distribution of subjects meeting the WHO’s recommendations for PA (World Health Organization, 2010). Actinfo therefore made it possible to explore differences in the PA profiles of a sample of the adult population of the municipality of Lisbon and an adult population with type II diabetes.
5.3.1 Studies
The D2FIT study was a 12-month randomized controlled trial with the goal of assessing the efficacy of different physical exercise protocols on biomarkers and quality of life of patients with type II diabetes. The project divided patients into three groups: two groups to which specific exercise protocols were administered and a control group. Only data from the control group were used for the comparison with the ProjCML study; in particular, actigraphy files from the participants’ first evaluation moment (where the study baseline was established) were used. The PA component of the study was measured using Actigraph’s wGT3X+ activity monitor: participants wore the hip-worn device for 7 consecutive days and were asked to remove it for any water-based activities and during sleep. The devices were initialized for data recording on the morning of the first day. After the recording period ended, the EHLab staff downloaded the data from the devices, converted them into 15-second epoch *.agd files using Actilife and stored them for later analysis. These were the files uploaded to Actinfo.
ProjCML was an assessment of the current level of physical fitness of the residents of the municipality of Lisbon, aimed at understanding the interventions needed to promote a more active lifestyle in the population. As such, actigraphy was used to objectively measure PA in participants. As in the D2FIT study, participants wore Actigraph’s wGT3X+ monitor, with the same recommendations and initialization conditions. Once the collection period was over, files were downloaded and converted into 15-second epoch *.agd files using Actilife.
For both studies, informed consent was obtained from all participants prior to data collection, for the specific purpose of evaluating PA, as explained in the consent forms in Appendix C. The goals of the comparative study described in this section, i.e., evaluating PA via time indicators (average sedentary time per day, average time in moderate- to vigorous-intensity PA and average number of breaks per ST hour) and assessing compliance with the global recommendations for PA, are aligned with the objectives stated in the consent forms. In fact, this analysis follows the same methodology as the research conducted at the EHLab with these exact data, ensuring compliance with the EU GDPR’s principles regarding the processing of personal data (see Section 4.3). Additionally, the researchers responsible for the study at the EHLab attributed a unique code to each participant, so as to protect the subjects’ identities.
5.3.2 Data preparation
Prior to the analysis, I used Actinfo to prepare the data for the comparative study. This process can be
summarized in the following steps:
1. I uploaded actigraphy files from a total of 174 subjects to the platform, grouping them by study (80 corresponding to the D2FIT study and 94 to ProjCML).
2. Using the implemented tools for validating files, I performed a preliminary screening analysis. The platform flagged files with missing data needed for the ”Group statistics” feature, which I excluded from the study. Additionally, I employed the platform’s wear time validation functionality to identify invalid files, i.e., files in which the total wear time did not meet the criteria to be considered valid. Of the original dataset of 174 files, a total of 32 were excluded, either due to missing data or, in nine cases, because Actinfo flagged them as invalid. The remaining 142 files were used for the analysis: 73 from the D2FIT study and 69 from the ProjCML study.
3. Lastly, I employed Actinfo’s tools for computing PA time indicators: wear time validation, application of cut point sets and break computation. Since the participants in both studies are adults, I opted for the ”Troiano” cut point set. For break detection, I selected the option for detecting one-minute breaks from the list shown in Figure B.4, Appendix B.
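The cut point step in item 3 can be sketched as follows. This is not Actinfo’s implementation: the threshold values are assumed to be the commonly cited Troiano et al. (2008) adult cut points (the text above only names the set), and the sketch assumes 15-second epochs are first summed into 1-minute epochs so that counts/min thresholds apply directly. All function names are hypothetical.

```python
# Hypothetical sketch, not Actinfo's code. Cut points assumed to be the
# commonly cited Troiano et al. (2008) adult thresholds, in counts/min.
TROIANO_CUTS = [
    (0, "sedentary"),     # 0-99 counts/min
    (100, "light"),       # 100-2019 counts/min
    (2020, "moderate"),   # 2020-5998 counts/min
    (5999, "vigorous"),   # >= 5999 counts/min
]


def to_minutes(epoch_counts, epochs_per_minute=4):
    """Sum 15-second epoch counts into 60-second epochs (a trailing partial minute is dropped)."""
    n_minutes = len(epoch_counts) // epochs_per_minute
    return [
        sum(epoch_counts[m * epochs_per_minute:(m + 1) * epochs_per_minute])
        for m in range(n_minutes)
    ]


def classify(cpm, cuts=TROIANO_CUTS):
    """Return the intensity label of the highest cut point not exceeding cpm."""
    label = cuts[0][1]
    for lower_bound, name in cuts:
        if cpm >= lower_bound:
            label = name
    return label


def time_indicators(minute_counts, worn):
    """Sedentary and MVPA minutes, and ST as % of wear time, over worn minutes only."""
    sedentary = mvpa = wear = 0
    for cpm, is_worn in zip(minute_counts, worn):
        if not is_worn:
            continue
        wear += 1
        label = classify(cpm)
        if label == "sedentary":
            sedentary += 1
        elif label in ("moderate", "vigorous"):
            mvpa += 1
    pct = 100.0 * sedentary / wear if wear else 0.0
    return {"sedentary_min": sedentary, "mvpa_min": mvpa, "sedentary_pct_wear": pct}


minutes = to_minutes([10, 20, 30, 30, 600, 600, 600, 600, 0, 0, 0, 0])
print(minutes)                                         # [90, 2400, 0]
print(time_indicators(minutes, [True, True, False]))
# {'sedentary_min': 1, 'mvpa_min': 1, 'sedentary_pct_wear': 50.0}
```

Restricting the counts to worn minutes is what makes ST expressible as a percentage of wear time, the second ST measure mentioned later in the analysis.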
Regarding the wear time validation described in step two, as previously explained in Chapter 4, a period is considered non-wear time whenever at least 60 minutes of consecutive zero-value activity counts occur. Valid days are then defined as having at least 600 minutes of valid wear time. Additionally, valid data must be present for at least three days, one of which must be a weekend day. These conditions apply to all participants, regardless of study, age group or sex. The described data reduction settings are adapted from standardized criteria (International Children’s Accelerometry Database (ICAD), 2017).
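The three validation criteria above can be sketched in Python as follows. This is an illustration under stated assumptions, not Actinfo’s implementation: minute-level counts are assumed to be contiguous from a known start time, zero runs are detected over the whole recording (so a run may cross midnight), and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def wear_mask(minute_counts, min_zero_run=60):
    """True for worn minutes; runs of >= min_zero_run consecutive zero counts are non-wear."""
    mask = [True] * len(minute_counts)
    i, n = 0, len(minute_counts)
    while i < n:
        if minute_counts[i] != 0:
            i += 1
            continue
        j = i
        while j < n and minute_counts[j] == 0:  # find the end of this zero run
            j += 1
        if j - i >= min_zero_run:
            mask[i:j] = [False] * (j - i)
        i = j
    return mask


def validate_file(start, minute_counts, min_wear_min=600, min_valid_days=3):
    """Return (is_valid, valid_dates) under the criteria described above.

    start -- datetime of the first minute; minutes are assumed contiguous.
    A day is valid with >= min_wear_min worn minutes; a file is valid with
    >= min_valid_days valid days, at least one of them on a weekend.
    """
    wear_per_day = defaultdict(int)
    for idx, worn in enumerate(wear_mask(minute_counts)):
        if worn:
            wear_per_day[(start + timedelta(minutes=idx)).date()] += 1
    valid_days = sorted(d for d, m in wear_per_day.items() if m >= min_wear_min)
    has_weekend = any(d.weekday() >= 5 for d in valid_days)  # 5 = Sat, 6 = Sun
    return len(valid_days) >= min_valid_days and has_weekend, valid_days


# Three days of synthetic data starting on Friday 1 March 2019:
# 700 worn minutes per day followed by 740 zero-count (non-wear) minutes.
day = [50] * 700 + [0] * 740
print(validate_file(datetime(2019, 3, 1), day * 3))
```

The same synthetic recording started on a Monday would yield three valid days but an invalid file, because none of them falls on a weekend.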
As for the break detection referred to in step three, we define a ”break” as an interruption in ST, i.e., a period lasting at least one minute in which activity was higher than 100 counts/min. The daily number of breaks was then divided by the daily ST, in hours, yielding the variable number of breaks/ST hour, to ensure consistency with current research in the field of objectively measured PA that makes use of this parameter (Chen et al., 2018; Santos et al., 2018).
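The following Python sketch shows one way to implement the break definition and the breaks/ST hour variable. It works on minute-level counts with a precomputed wear mask. It follows the text in treating activity above 100 counts/min as a break and in restricting everything to worn minutes. Two choices are assumptions, not confirmed details of Actinfo: a run only counts when it starts right after a sedentary minute, and non-wear closes a run. The function names are hypothetical.

```python
def count_breaks(minute_counts, worn, threshold=100, min_break_len=1):
    """Count interruptions of sedentary time (ST) within worn minutes.

    A break is a run of >= min_break_len consecutive worn minutes with counts
    above `threshold` that starts right after a worn sedentary minute.
    Non-wear minutes end a run and are never treated as sedentary.
    """
    breaks = 0
    run = 0                      # length of the current active run
    run_after_sedentary = False  # did the current run start right after ST?
    prev = None                  # "sedentary", "active" or "nonwear"
    for cpm, is_worn in zip(minute_counts, worn):
        if is_worn and cpm > threshold:
            if run == 0:
                run_after_sedentary = prev == "sedentary"
            run += 1
            prev = "active"
            continue
        # A sedentary or non-wear minute closes any active run.
        if run_after_sedentary and run >= min_break_len:
            breaks += 1
        run, run_after_sedentary = 0, False
        prev = "sedentary" if is_worn else "nonwear"
    if run_after_sedentary and run >= min_break_len:  # run reaching the end of data
        breaks += 1
    return breaks


def breaks_per_st_hour(minute_counts, worn, threshold=100, min_break_len=1):
    """Number of breaks divided by sedentary time expressed in hours."""
    st_minutes = sum(1 for c, w in zip(minute_counts, worn) if w and c <= threshold)
    if st_minutes == 0:
        return 0.0
    return count_breaks(minute_counts, worn, threshold, min_break_len) / (st_minutes / 60)


# Two interruptions of 117 sedentary minutes, all minutes worn.
counts = [0] * 30 + [200] * 2 + [0] * 28 + [300] + [50] * 59
worn = [True] * len(counts)
print(count_breaks(counts, worn))                     # 2
print(f"{breaks_per_st_hour(counts, worn):.2f}")      # 1.03
```

Normalizing by ST hours, rather than reporting breaks per day, keeps subjects with very different amounts of sedentary time comparable.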
Figure 5.2 presents the ”Subject info. and validation details” tab of Actinfo’s output page for one example subject, resulting from the described validation process. It is important to note that, while information regarding race, height and weight is extracted from the *.agd file, it is not used in this analysis, as discussed in Chapter 6.
Once the data had been prepared, I used Actinfo’s ”Compare studies” feature for the analysis, de-
scribed in Section 5.3.3.
5.3.3 Analysis
After saving the outputs resulting from wear time validation, application of cut point sets and break
computation to the database, Actinfo’s ”Compare studies” tool was used to produce the visualizations
for the various computed metrics. Figure 5.3 shows the first section of the comparison page, with demographic data for both populations.
Figure 5.2: Subject information and validation details for an example file, from Actinfo’s output page.
Observing the violin plots for the distribution of ages, we can confirm that both studies follow similar distributions, with populations composed of subjects close to late adulthood. As explained in Chapter 4, hovering the mouse over or clicking the violin shows the user quantitative information, mainly the mean, median and interquartile range. As for the gender distribution, both studies show a higher percentage of females than males. However, study D2FIT has a more balanced distribution of genders, as opposed to study ProjCML, in which there is a much higher percentage of females than males.
Regarding compliance with the global recommendations for PA (World Health Organization, 2010), in both studies the percentage of subjects who do not reach the recommended levels of PA is far greater than the percentage of subjects who do, as shown in the bar chart in Figure 5.4. Using Actinfo’s option to filter by gender, study ProjCML shows a much higher percentage of females (40%) than males (24%) meeting the recommendations, while in study D2FIT the opposite happens, with a much smaller difference (16% of females vs. 19% of males reaching the recommended amounts of PA), as shown in Figure 5.5.
As for the distribution of sedentary time, the plots shown in Figure 5.6 allow a comparison of daily average ST between participants from both studies. Although mean ST per day is similar in both studies, study ProjCML shows a slightly narrower interquartile range, as observed in the violin plot. In addition to ST per day, sedentary time was also calculated as a % of wear time, as explained in Chapter 4. Applying Actinfo’s gender filters shows that, in both studies, males average approximately 20 min/day more ST than females, as observed in Figure 5.7.
Still regarding time indicators, the distributions of daily time in PA of moderate or greater intensity (moderate- to vigorous-intensity PA, MVPA) were obtained for the two populations, as shown in Figure 5.8. An overall lower average time in MVPA/day was obtained for study D2FIT, which also shows a slightly narrower interquartile range. When filtering by gender, females show a higher mean time in MVPA per day than males in study ProjCML, while in study D2FIT the opposite occurs, as shown in Figure 5.9.
Figure 5.3: Demographics for the analyzed population, from Actinfo’s ”Compare studies” tool.
Figure 5.4: Distribution of compliance with PA recommendations (no filters active).
(a) Males (b) Females
Figure 5.5: Distribution of compliance with PA recommendations, with filtering by males (a) and females (b).
Figure 5.6: Distribution of daily sedentary time for both studies (no filters active).
(a) Males (b) Females
Figure 5.7: Distribution of daily sedentary time for both studies, filtering by males (a) and females (b).
Figure 5.8: Distribution of daily time in MVPA for both studies (no filters active).
(a) Males (b) Females
Figure 5.9: Distribution of daily time in MVPA for both studies, with filtering by males (a) and females (b).
Table 5.3: Number of breaks per hour of sedentary time (mean ± SD)
Study | No filter | Males | Females
ProjCML | 9.64 ± 3.60 | 9.91 ± 3.80 | 9.52 ± 3.49
D2FIT | 10.04 ± 4.82 | 8.86 ± 4.45 | 11.21 ± 4.88
Lastly, using Actinfo’s ”Export” function, I obtained the average daily number of breaks per hour of sedentary time. Unlike the previously presented metrics, which were obtained directly through Actinfo’s ”Compare studies” tool, this parameter was computed from the exported Excel files for each study. The outputs are summarized in Table 5.3. Males in ProjCML show more interruptions in sedentary time than males in D2FIT, while the opposite holds for females. Within ProjCML, males show a higher number of breaks/ST hour than females, whereas the opposite happens in study D2FIT.
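Computing the group statistics of Table 5.3 from exported files amounts to grouping per-subject values by study, optionally filtering by gender, and reporting mean ± SD. The Python sketch below illustrates this under several assumptions: the record keys ("study", "gender", "breaks_per_st_hour") are hypothetical column names rather than the actual layout of Actinfo’s export, the SD is taken as the sample standard deviation, and the values are toy numbers, not study data.

```python
from statistics import mean, stdev


def summarize(rows, value_key, gender=None):
    """Return {study: (mean, sd)} for one exported per-subject variable.

    rows   -- per-subject records; "study", "gender" and value_key are
              hypothetical column names for the exported data.
    gender -- optional filter, e.g. "M" or "F"; None keeps all subjects.
    SD is the sample standard deviation (an assumption of this sketch).
    """
    groups = {}
    for row in rows:
        if gender is not None and row["gender"] != gender:
            continue
        groups.setdefault(row["study"], []).append(row[value_key])
    return {
        study: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for study, values in groups.items()
    }


# Toy values for illustration only.
rows = [
    {"study": "A", "gender": "M", "breaks_per_st_hour": 8.0},
    {"study": "A", "gender": "F", "breaks_per_st_hour": 10.0},
    {"study": "A", "gender": "F", "breaks_per_st_hour": 12.0},
    {"study": "B", "gender": "M", "breaks_per_st_hour": 9.0},
]
for study, (m, sd) in summarize(rows, "breaks_per_st_hour").items():
    print(f"{study}: {m:.2f} ± {sd:.2f}")
```

Passing gender="M" or gender="F" reproduces the filtered columns of the table.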
Results validation
To assess the validity of the obtained results, I processed the same files used for the described analysis in Actigraph Corp.’s Actilife software (regarded as the ”de facto standard” for actigraphy and the software currently used at the EHLab). I selected equivalent parameters for wear time validation and data scoring (i.e., using the Troiano cut point set and the same definition of a valid wear time period). No optional screening parameters were selected in the software. Specifically, Actilife’s options to define an activity threshold and a spike tolerance were not used in the analysis. Had the former not been set to zero, a period of wear time would be classified as non-wear unless a specific intensity had been reached. The latter allows users to define a time threshold below which activity does not prevent a period from being tagged as non-wear, and was also set to zero. These settings ensure consistency with Actinfo’s own implementation of wear time validation and segmentation of the activity time series according to the defined cut points. They also replicate the research conducted at the EHLab, which uses the same settings.
Tables 5.4, 5.5 and 5.6 show the daily average ST, daily time in MVPA and number of breaks/ST
hour, respectively, as mean ± SD, obtained from Actilife.
Table 5.4: Daily average ST, from Actilife (min/day, mean ± SD).
Study | All subjects | Males | Females
ProjCML | 624.35 ± 80.56 | 639.26 ± 97.14 | 620.62 ± 77.14
D2FIT | 622.24 ± 104.39 | 624.79 ± 112.86 | 615.77 ± 92.22
Table 5.5: Daily average time in MVPA, from Actilife (min/day, mean ± SD).
Study | All subjects | Males | Females
ProjCML | 43.07 ± 30.09 | 40.40 ± 24.95 | 44.93 ± 26.46
D2FIT | 34.27 ± 27.81 | 39.95 ± 26.14 | 31.63 ± 25.62
Table 5.6: Daily average number of breaks/ST hour, from Actilife (mean ± SD).
Study | All subjects | Males | Females
ProjCML | 7.25 ± 3.69 | 7.55 ± 3.43 | 7.11 ± 3.78
D2FIT | 11.27 ± 3.03 | 10.58 ± 2.84 | 11.96 ± 3.06
Table 5.7 presents the obtained error values for each PA time indicator. Rather than simply comparing the averages of the time indicators obtained from Actilife with those obtained from Actinfo, I computed the mean absolute error (MAE), to quantify the per-participant error associated with the outputs of the platform. The MAE is given by
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - x_i \rvert, \qquad (5.1)$$
where $y_i$ is the output from Actilife for participant $i$, $x_i$ is the result from Actinfo for participant $i$ and $n$ is the number of participants in the study. The MAE was computed, for each study, for daily sedentary time, daily time in MVPA and number of breaks/day (since the variable number of breaks/ST hour is calculated from the number of breaks per day and daily ST), to assess the difference between the outputs of the two methods (Actilife vs. Actinfo). This allows for an evaluation against what is considered the ”de facto standard” software for objectively measured PA.
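Equation 5.1 translates directly into Python. The sketch below (function name hypothetical) shows the reasoning behind using the MAE: with the toy values shown, the two methods produce identical group means while disagreeing by 40 units for every participant, a discrepancy that only a per-participant error measure reveals.

```python
def mean_absolute_error(reference, estimate):
    """MAE of Equation 5.1: (1/n) * sum(|y_i - x_i|) over paired participants.

    reference -- y_i values (e.g. Actilife outputs), one per participant
    estimate  -- x_i values (e.g. Actinfo outputs), in the same participant order
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(y - x) for y, x in zip(reference, estimate)) / len(reference)


# Toy values: identical group means can hide large per-participant errors.
y = [600.0, 640.0]
x = [640.0, 600.0]
print(sum(y) / len(y) - sum(x) / len(x))   # 0.0
print(mean_absolute_error(y, x))           # 40.0
```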
Table 5.7: Mean absolute error for each computed PA time indicator.
Study | ST (min/day) | MVPA (min/day) | Number of breaks/day
ProjCML | 0.11 | 0.13 | 29.06
D2FIT | 0.11 | 0.14 | 26.06
5.3.4 Discussion of experimental results
The distributions obtained in the ”Demographics” section of Actinfo’s ”Compare studies” tool indicate similar age distributions for both studies, with a high density of participants close to the late adulthood stage of life (around 60 years old). Actinfo’s age filters were not employed in this analysis, as the populations of both studies consist of a mixture of older adults and elderly subjects; these filters would be more relevant for studies covering a broader range of ages, for which they were intended. Gender filters, on the other hand, allow a distinction between males and females in both studies, for the various distributions. It is important to note, however, that while study D2FIT shows an even distribution of subjects across both genders, the same is not observable for study ProjCML, where the percentage of males is much lower than that of females (30% vs. 70%).
Comparing the studies in terms of the distribution of subjects who meet the recommended amounts of PA, patients with type II diabetes show a much lower percentage of subjects attaining sufficient PA than participants from ProjCML (18% vs. 35%). When comparing genders, a higher percentage of females than males meet the recommendations in ProjCML, while in study D2FIT the opposite happens. The large discrepancy observed between the percentages of males and females who meet the recommendations in ProjCML may be attributed to the uneven distribution of subjects across genders in this study.
As for ST, participants in study D2FIT average lower sedentary times per day than subjects in ProjCML, among both males and females. A wider interquartile range was also found in the former. Additionally, males in both studies were observed to spend more wear time in sedentary behaviours than females.
Observing the plots and metrics for time in MVPA, patients with type II diabetes spend less time in moderate- to vigorous-intensity PA per day than subjects in ProjCML. A narrower interquartile range was also found for study D2FIT. Comparing males and females, while in study ProjCML the latter spend a higher percentage of wear time in MVPA, the opposite happens in study D2FIT.
Lastly, comparing the number of breaks/ST hour between studies, an overall higher number of interruptions in ST was obtained for study D2FIT, with the exception of males in this study, who show fewer breaks/ST hour than males in study ProjCML. Additionally, while in ProjCML males show a higher number of breaks/ST hour than females, male participants in D2FIT present fewer interruptions in ST than females in the same study.
Based on the obtained metrics, patients with type II diabetes not only show lower sedentary times but also interrupt ST more often than participants from ProjCML. However, participants from study D2FIT spend less time in MVPA on average. As explained in Chapter 2, breaking up sedentary time has positive health outcomes and can even mitigate the negative effects of long periods of sedentary time. Females from both studies show lower sedentary times. Regarding time in MVPA, females from ProjCML spend more time in PA of this intensity than males, while the opposite happens in study D2FIT. As for interruptions in ST, a higher number of breaks was found in females in study D2FIT, the opposite happening in ProjCML. It is important to understand, however, that a small sample was used to obtain these statistics, which may not accurately represent the PA profiles of larger populations with the same characteristics.
Although stored in the actigraphy files, the subjects’ height, weight and race were not taken into account in this analysis (as discussed in Chapter 6, these data are relevant for exploring energy expenditure, a feature to be implemented in future iterations of Actinfo).
When addressing the deviations in the computation of ST via the application of cut points to the *.agd files’ time series, we should take into account the different implementations of the wear time validation cycles. Although Actilife’s code is proprietary, comparing the SQLite time series in *.agd files (see Section 2.2) with the data scoring export file the software produces reveals that some filtering of the data occurs during validation. Currently, Actigraph Corp.’s documentation does not explain why this happens, even when a spike tolerance of zero and no activity threshold are selected when scoring the files: Actilife provides the option to keep classifying a certain period of wear time as non-wear unless a specified time threshold of non-zero values is exceeded or, similarly, unless a certain intensity is reached. For this analysis, neither of these options was selected, which should, in theory, result in every non-zero value being considered an interruption of non-wear time. This, however, does not seem to be the case, since Actilife reports a lower wear time than can be derived from the actual data. This likely explains the slight error in the sedentary times and time in MVPA per day when comparing Actinfo’s outputs with the export from Actilife.
Regarding the number of breaks/ST hour, possible explanations for the high MAE point, once again, towards different implementation strategies when computing this parameter. Actinfo’s bout and break detection tools were implemented based on feedback from the researchers at the EHLab. The main goal was to develop a tool better suited to the specific research conducted in the field of profiling accelerometer-derived sedentary time. Currently, whenever break detection is needed in a specific study conducted by researchers at the lab, Actilife’s sedentary analysis tools are used. This may sometimes constitute a problem since, per Actigraph Corp.’s own documentation (Actigraph Corp., 2018), the total time in breaks may be larger than the total wear time, as breaks are computed by subtracting the time in sedentary bouts from the total time, without taking non-wear time into consideration. In Actinfo, by contrast, bouts and breaks are detected only within valid wear time periods.
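The difference between the two strategies can be illustrated with a toy minute-level example. In the Python sketch below, the function name and data are hypothetical. Setting restrict_to_wear=False mirrors the subtraction described above (total time minus time in sedentary bouts), while restrict_to_wear=True counts break time only over worn minutes, as the text describes for Actinfo. With the subtraction approach, break time can exceed total wear time.

```python
def break_minutes(in_sedentary_bout, worn, restrict_to_wear=True):
    """Minutes outside sedentary bouts, counted as break time.

    restrict_to_wear=False mirrors the subtraction described above (total time
    minus time in sedentary bouts), so non-wear minutes end up counted as breaks;
    restrict_to_wear=True counts only worn minutes.
    """
    return sum(
        1
        for bout, is_worn in zip(in_sedentary_bout, worn)
        if not bout and (is_worn or not restrict_to_wear)
    )


# Toy recording: 10 minutes, the first 5 worn, 3 of which lie in a sedentary bout.
bout = [True] * 3 + [False] * 7
worn = [True] * 5 + [False] * 5
print(break_minutes(bout, worn, restrict_to_wear=False))  # 7, more than the 5 worn minutes
print(break_minutes(bout, worn))                          # 2
```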
5.4 Chapter overview
This chapter focused on the evaluation of the platform’s conformity, usability and main tools for processing and analyzing actigraphy data.
The conformity analysis shows that, while most features are performing as expected, some tweaks and improvements must be made to achieve a more complete version of Actinfo, with more customization options for managing studies and files in the platform and for operating on those files.
Usability was assessed by administering a survey following the System Usability Scale method. The current prototype of Actinfo obtained a score which falls within the range of ”Excellent” and ”Best imaginable”. Nevertheless, usability must be continuously assessed as the platform’s user base grows.
A comparative study was conducted as a proof of concept of Actinfo’s ability to cross different studies and compare demographics, the distribution of compliance with PA recommendations and the distributions of computed PA time indicators between studies. Data from participants of two different studies, D2FIT, comprising accelerometer data from patients with type II diabetes, and ProjCML, covering a sample of the residents of the municipality of Lisbon, were uploaded to the platform and used to extract PA time indicators (average sedentary time per day, average time in MVPA and average number of breaks per hour of ST). In study D2FIT, lower sedentary times and less time per day spent in MVPA were observed, as well as a higher number of interruptions in ST. Overall, females in both studies were less sedentary than males.
Some deviations were found when assessing the validity of the obtained results by comparison with the outputs of the Actilife software (the de facto standard for actigraphy); these may be explained by different implementations of the wear time validation computations. Additionally, the indirect method by which Actilife’s break detection works can, to an extent, justify the error obtained for this parameter.
Chapter 6
Conclusions and future work
6.1 Conclusions
The work presented in this dissertation included the conceptualization, development and implementation
of Actinfo. This platform is, on the one hand, a repository of physical activity (PA) studies and, on the
other hand, a tool for operating on PA data. The motivation for the development of Actinfo derived from
the lack of a system to aggregate PA data from multiple studies, storing it in a standardized manner,
while also being able to perform typical processing tasks on actigraphy files.
The platform follows a 3-tier architecture, based on the MEAN stack, a full stack of JavaScript components for developing web applications. These are open-source technologies, which facilitate further improvements to the platform’s features and ease its maintenance. I followed the FHIR standard, whenever possible, to model the data. This standard, however, is oriented towards healthcare and clinical research, while the current version of Actinfo is focused on accelerometry data. Nevertheless, FHIR was followed to serve as a basis for future iterations of the platform, which may integrate data from multiple sources, some of which may contain health indicators.
When developing the platform, I aimed to follow current regulations regarding the protection of per-
sonal data, namely the GDPR and LPDP.
During the development phase, I obtained input from researchers at the EHLab, in order to create tools that support the needs of researchers tackling PA as well as possible. Following development, I conducted an assessment of the platform to evaluate its conformity with the requirements, its usability and the implemented tools for operating on actigraphy files. To assess the features for handling PA data (studies and files), I performed a comparative study with data from two adult populations using Actinfo. The study revealed differences between the objectively measured PA profiles of the two populations.
In summary, I achieved a functional prototype of a PA management platform with good usability. The centralization of actigraphy data, paired with statistical analysis and validation features, makes the platform a valuable tool for research in the field of PA. With Actinfo, it is possible to improve PA analysis workflows while also contributing to the interoperability with clinical information and the reusability of the data, which are stored in a standardized manner. The current version of the platform was evaluated for conformity and usability. Furthermore, I assessed the accuracy of the tools for processing actigraphy data against the Actilife software, regarded as the ”de facto standard” for analyzing objectively measured PA via actigraphy.
6.2 Future Work
While the current version of Actinfo establishes a solid foundation for a system of this kind, the
platform can be improved to support the growing needs of PA research. The following list describes
additional adjustments that could improve Actinfo:
• Encryption of the database: data stored in MongoDB should be encrypted as an extra measure
of protection, which would also make Actinfo more compliant with the GDPR. Currently, only the
paid version of MongoDB, MongoDB Enterprise, supports "Encryption at rest", which would solve
this problem. Ways to secure the data could be explored, either by paying for a service such as
MongoDB Enterprise or by hosting the application on a protected server.
• Data from multiple sources: ultimately, Actinfo should allow for the integration of data from
multiple sources, not just from actigraphy files. In future iterations, users should be able to cross
PA data with information from sources such as imaging exams (e.g., X-rays and DEXA scans for
bone density), physical performance tests or medical exams from which health indicators could be
extracted.
• Expand on FHIR: if Actinfo moves towards making use of clinical information, interoperability with
clinical data could be improved by using more FHIR-specific fields in the database documents.
• Energy expenditure: a feature for computing energy expenditure could be implemented in future
versions of the platform. Such a feature would make use of the height and weight information
extracted from the *.agd files, which the current version of Actinfo does not use.
• Read *.gt3x files: to access the true raw data recorded by the activity monitor, we would need to
be able to operate on the data stored in the *.gt3x files. A tool for extracting raw accelerometer
data from these files could be implemented in Actinfo.
• Re-integrate *.agd files: Actilife offers a feature to re-integrate *.agd files to longer epochs,
which could be replicated in Actinfo.
• Expand the database: the platform could benefit from having a larger dataset of PA files and
studies available to its users, allowing for more comparisons and meta-analyses across different
populations.
• Improve the data model: the current data model could be expanded. For example, it could
support a different type of user account for study participants, who could then upload their
actigraphy data directly to the platform. Additionally, the model could separate datasets from
studies, allowing users to create studies from files already uploaded to the platform. This would
require a more generalized model that supports new relationships between collections of entities.
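Re-integrating *.agd files, as proposed in the list above, amounts to summing consecutive epochs into a longer one. The following TypeScript sketch illustrates the idea under stated assumptions: per-axis counts are taken to have been read already from the *.agd file into arrays, the `EpochSeries` layout and the `reintegrate` function are hypothetical, and the target epoch must be an integer multiple of the original one. An incomplete final block is dropped here; Actilife's own handling of that case may differ.

```typescript
// Hypothetical in-memory layout of an epoch series read from an *.agd file.
interface EpochSeries {
  epochSeconds: number; // length of each epoch, in seconds
  start: Date;          // timestamp of the first epoch
  counts: number[][];   // one [axis1, axis2, axis3] triple per epoch
}

// Re-integrate to a longer epoch by summing each axis over consecutive epochs.
function reintegrate(series: EpochSeries, targetSeconds: number): EpochSeries {
  const factor = targetSeconds / series.epochSeconds;
  if (!Number.isInteger(factor) || factor < 1) {
    throw new Error(
      `Target epoch (${targetSeconds} s) must be a multiple of ${series.epochSeconds} s`,
    );
  }
  const out: number[][] = [];
  // Walk the series in blocks of `factor` epochs; a partial final block is skipped.
  for (let i = 0; i + factor <= series.counts.length; i += factor) {
    const block = series.counts.slice(i, i + factor);
    // Sum each axis independently across the block.
    const summed = block.reduce(
      (acc, epoch) => acc.map((value, axis) => value + epoch[axis]),
      new Array<number>(block[0].length).fill(0),
    );
    out.push(summed);
  }
  return { epochSeconds: targetSeconds, start: series.start, counts: out };
}
```

For example, four 15 s epochs with axis-1 counts of 1, 4, 7 and 10 become a single 60 s epoch with 22 axis-1 counts.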
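The energy expenditure item in the list above would rely on the body mass already extracted from the *.agd files. A minimal TypeScript sketch follows, assuming the adult equation commonly attributed to Freedson et al. (1998), kcal/min = 0.00094 × CPM + 0.1346 × mass (kg) − 7.37418, where CPM is counts per minute. The coefficients are an assumption to be checked against the ActiLife 6 User's Manual, and the function names are hypothetical. Because the equation was calibrated on treadmill walking and running, it can return negative values for low-count minutes; this sketch simply floors those at zero.

```typescript
// Assumed coefficients of the Freedson et al. (1998) adult equation.
const FREEDSON_1998 = { slope: 0.00094, massCoef: 0.1346, intercept: -7.37418 };

// kcal/min = 0.00094 * CPM + 0.1346 * mass(kg) - 7.37418
function kcalPerMinute(countsPerMinute: number, massKg: number): number {
  const { slope, massCoef, intercept } = FREEDSON_1998;
  return slope * countsPerMinute + massCoef * massKg + intercept;
}

// Total energy expenditure over a recording of 60 s epochs,
// flooring negative per-minute estimates at zero.
function totalKcal(minuteCounts: number[], massKg: number): number {
  return minuteCounts.reduce(
    (sum, cpm) => sum + Math.max(0, kcalPerMinute(cpm, massKg)),
    0,
  );
}
```

For instance, a 70 kg subject recording 3000 counts in a minute gives kcalPerMinute(3000, 70) of about 4.87 kcal/min under these assumed coefficients.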
Bibliography
Actigraph Corp. (2018). How does Sedentary Analysis work? Retrieved from https://actigraphcorp.force.com/support/s/article/How-does-Sedentary-Analysis-work.
Actigraph Software Department (2012). ActiLife 6 User’s Manual. Actigraph Corp.
Ainsworth, B. E., Caspersen, C. J., Matthews, C. E., Masse, L. C., Baranowski, T., and Zhu, W. (2012).
Recommendations to improve the accuracy of estimates of physical activity derived from self report.
Journal of physical activity & health, 9 Suppl 1:76–84.
Assembleia da República (1998). Lei da Protecção de Dados Pessoais. Diário da República n.º
247/1998, Série I-A de 1998-10-26.
Bangor, A., Kortum, P., and Miller, J. (2009). Determining What Individual SUS Scores Mean: Adding
an Adjective Rating Scale. Technical report.
Bangor, A., Kortum, P. T., and Miller, J. T. (2008). An Empirical Evaluation of the System Usability Scale.
International Journal of Human-Computer Interaction, 24(6):574–594.
Baptista, F., Santos, D. A., Silva, A. M., Mota, J., Santos, R., Vale, S., Ferreira, J. P., Raimundo, A. M.,
Moreira, H., and Sardinha, L. B. (2012). Prevalence of the Portuguese Population Attaining Sufficient
Physical Activity. Medicine & Science in Sports & Exercise, 44(3):466–473.
Barrett, C., Dominick, G., and Winfree, K. N. (2017). Assessing bouts of activity using modeled clinically
validated physical activity on commodity hardware. In 2017 IEEE EMBS International Conference on
Biomedical & Health Informatics (BHI), pages 269–272. IEEE.
Bender, D. and Sartipi, K. (2013). HL7 FHIR: An Agile and RESTful approach to healthcare information
exchange. In Proceedings of the 26th IEEE International Symposium on Computer-Based Medical
Systems, pages 326–331. IEEE.
Brocklebank, L. A., Falconer, C. L., Page, A. S., Perry, R., and Cooper, A. R. (2015). Accelerometer-
measured sedentary time and cardiometabolic biomarkers: A systematic review. Preventive Medicine,
76:92–102.
Brooke, J. (1996). SUS - A quick and dirty usability scale. In Jordan, P. W., Thomas, B., Weerdmeester,
B. A., and McClelland, I. L., editors, Usability Evaluation in Industry, chapter 22, pages 189–194. Taylor & Francis.
Cadilhac, D. A., Cumming, T. B., Sheppard, L., Pearce, D. C., Carter, R., and Magnus, A. (2011). The
economic benefits of reducing physical inactivity: an Australian example. International Journal of
Behavioral Nutrition and Physical Activity, 8(1):99.
Cain, K. L., Conway, T. L., Adams, M. A., Husak, L. E., and Sallis, J. F. (2013). Comparison of older and
newer generations of ActiGraph accelerometers with the normal filter and the low frequency extension.
International Journal of Behavioral Nutrition and Physical Activity, 10(1):51.
Caspersen, C. J., Powell, K. E., and Christenson, G. M. (1985). Physical activity, exercise, and physical
fitness: definitions and distinctions for health-related research. Public health reports (Washington,
D.C. : 1974), 100(2):126–31.
Chaniotis, I. K., Kyriakou, K.-I. D., and Tselikas, N. D. (2015). Is Node.js a viable option for building
modern web applications? A performance evaluation study. Computing, 97(10):1023–1044.
Chastin, S. F., Egerton, T., Leask, C., and Stamatakis, E. (2015). Meta-analysis of the relationship
between breaks in sedentary behavior and cardiometabolic health. Obesity, 23(9):1800–1810.
Chen, T., Kishimoto, H., Honda, T., Hata, J., Yoshida, D., Mukai, N., Shibata, M., Ninomiya, T., and
Kumagai, S. (2018). Patterns and Levels of Sedentary Behavior and Physical Activity in a General
Japanese Population: The Hisayama Study. Journal of Epidemiology, 28(5):260–265.
Clark, B. K., Healy, G. N., Winkler, E. A. H., Gardiner, P. A., Sugiyama, T., Dunstan, D. W., Matthews,
C. E., and Owen, N. (2011). Relationship of Television Time with Accelerometer-Derived Sedentary
Time. Medicine & Science in Sports & Exercise, 43(5):822–828.
Ekelund, U., Steene-Johannessen, J., Brown, W. J., Fagerland, M. W., Owen, N., Powell, K. E., Bauman,
A., and Lee, I.-M. (2016). Does physical activity attenuate, or even eliminate, the detrimental associ-
ation of sitting time with mortality? A harmonised meta-analysis of data from more than 1 million men
and women. The Lancet, 388(10051):1302–1310.
European Parliament and Council (2016). Regulation (EU) 2016/679 of the European Parliament and
of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of
personal data and on the free movement of such data, and repealing Directive 95/46/EC (General
Data Protection Regulation). OJ 2016 L 119/1.
European Union (2008). EU Physical Activity Guidelines Recommended Policy Actions in Support of
Health-Enhancing Physical Activity. Technical report.
Evenson, K. R., Catellier, D. J., Gill, K., Ondrak, K. S., and McMurray, R. G. (2008). Calibration of two
objective measures of physical activity for children. Journal of Sports Sciences, 26(14):1557–1565.
Fielding, R. and Reschke, J. (2014). Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content.
Technical report.
Fielding, R. T. (2000). Architectural styles and the design of network-based software architectures.
PhD thesis, University of California, Irvine.
Gonzalez, K., Fuentes, J., and Marquez, J. L. (2017). Physical Inactivity, Sedentary Behavior and
Chronic Diseases. Korean journal of family medicine, 38(3):111–115.
Hansen, B. H., Anderssen, S. A., Andersen, L. B., Hildebrand, M., Kolle, E., Steene-Johannessen, J.,
Kriemler, S., Page, A. S., Puder, J. J., Reilly, J. J., Sardinha, L. B., van Sluijs, E. M. F., Wedderkopp, N.,
Ekelund, U., and the International Children's Accelerometry Database (ICAD) Collaborators (2018). Cross-Sectional Associations of Reallocating
Time Between Sedentary and Active Behaviours on Cardiometabolic Risk Factors in Young People:
An International Children’s Accelerometry Database (ICAD) Analysis. Sports Medicine, 48(10):2401–
2412.
Helmerhorst, H. J., Brage, S., Warren, J., Besson, H., and Ekelund, U. (2012). A systematic review
of reliability and objective criterion-related validity of physical activity questionnaires. International
Journal of Behavioral Nutrition and Physical Activity, 9(1):103.
Hills, A. P., Mokhtar, N., and Byrne, N. M. (2014). Assessment of physical activity and energy expendi-
ture: an overview of objective measures. Frontiers in nutrition, 1:5.
Ibanez, V., Silva, J., and Cauli, O. (2018). A survey on sleep assessment methods. PeerJ, 6:e4849.
International Children’s Accelerometry Database (ICAD) (2017). Suggested settings for accelerometer
data reduction in ICAD 2.0. Technical report.
Janssen, I. (2012). Health care costs of physical inactivity in Canadian adults. Applied Physiology,
Nutrition, and Metabolism, 37(4):803–806.
Judice, P. B., Silva, A. M., Santos, D. A., Baptista, F., and Sardinha, L. B. (2015). Associations of breaks
in sedentary time with abdominal obesity in Portuguese older adults. Age (Dordrecht, Netherlands),
37(2):23.
Kaminsky, L. A. and Ozemek, C. (2012). A comparison of the Actigraph GT1M and GT3X accelerometers
under standardized and free-living conditions. Physiological Measurement, 33(11):1869–1876.
Kim, J., Tanabe, K., Yokoyama, N., Zempo, H., and Kuno, S. (2013). Objectively measured light-intensity
lifestyle activity and sedentary time are independently associated with metabolic syndrome: a cross-
sectional study of Japanese adults. International Journal of Behavioral Nutrition and Physical Activity,
10(1):30.
Kuzik, N., Carson, V., Andersen, L. B., Sardinha, L. B., Grøntved, A., Hansen, B. H., and Ekelund,
U. (2017). Physical Activity and Sedentary Time Associations with Metabolic Health Across Weight
Statuses in Children and Adolescents. Obesity, 25:1762–1769.
Lawton, G. (2005). LAMP lights enterprise development efforts. Computer, 38(9):18–20.
Lewis, J. R. (2018). The System Usability Scale: Past, Present, and Future. International Journal of
Human–Computer Interaction, 34(7):577–590.
Loney, T., Standage, M., Thompson, D., Sebire, S. J., and Cumming, S. (2011). Self-report vs. objectively
assessed physical activity: which is right for public health? Journal of physical activity & health,
8(1):62–70.
Louridas, P. (2016). Component Stacks for Enterprise Applications. IEEE Software, 33(2):93–98.
Mandel, J. C., Kreda, D. A., Mandl, K. D., Kohane, I. S., and Ramoni, R. B. (2016). SMART on FHIR: a
standards-based, interoperable apps platform for electronic health records. Journal of the American
Medical Informatics Association, 23(5):899–908.
Monyeki, M. A., Moss, S. J., Kemper, H. C. G., and Twisk, J. W. R. (2018). Self-Reported Physical
Activity is Not a Valid Method for Measuring Physical Activity in 15-Year-Old South African Boys and
Girls. Children (Basel, Switzerland), 5(6).
Pate, R. R., O’Neill, J. R., and Lobelo, F. (2008). The Evolving Definition of "Sedentary".
Exercise and Sport Sciences Reviews, 36(4):173–178.
Plasqui, G., Bonomi, A. G., and Westerterp, K. R. (2013). Daily physical activity assessment with
accelerometers: new insights and validation studies. Obesity Reviews, 14(6):451–462.
Robusto, K. M. and Trost, S. G. (2012). Comparison of three generations of ActiGraph™ activity monitors
in children and adolescents. Journal of sports sciences, 30(13):1429–35.
Saint-Maurice, P. F., Troiano, R. P., Matthews, C. E., and Kraus, W. E. (2018). Moderate-to-Vigorous
Physical Activity and All-Cause Mortality: Do Bouts Matter? Journal of the American Heart Associa-
tion, 7(6).
Santos, D. A., Judice, P. B., Magalhaes, J. P., Correia, I. R., Silva, A. M., Baptista, F., and Sardinha,
L. B. (2018). Patterns of accelerometer-derived sedentary time across the lifespan. Journal of Sports
Sciences, pages 1–9.
Sardinha, L. B., Magalhaes, J. P., Santos, D. A., and Judice, P. B. (2017). Sedentary Patterns, Physical
Activity, and Cardiorespiratory Fitness in Association to Glycemic Control in Type 2 Diabetes Patients.
Frontiers in Physiology, 8:262.
Sardinha, L. B., Santos, D. A., Silva, A. M., Baptista, F., and Owen, N. (2015). Breaking-up Sedentary
Time Is Associated With Physical Function in Older Adults. The Journals of Gerontology Series A:
Biological Sciences and Medical Sciences, 70(1):119–124.
Sedentary Behaviour Research Network (2012). Letter to the Editor: Standardized use of the terms
“sedentary” and “sedentary behaviours”. Applied Physiology, Nutrition, and Metabolism, 37(3):540–
542.
Tarp, J., Bugge, A., Andersen, L. B., Sardinha, L. B., Ekelund, U., Brage, S., and Møller, N. C. (2018).
Does adiposity mediate the relationship between physical activity and biological risk factors in youth?:
a cross-sectional study from the International Children’s Accelerometry Database (ICAD). Interna-
tional Journal of Obesity, 42(4):671–678.
Tremblay, M. S., Aubert, S., Barnes, J. D., Saunders, T. J., Carson, V., Latimer-Cheung, A. E., Chastin,
S. F., Altenburg, T. M., and Chinapaw, M. J. (2017). Sedentary Behavior Research Network (SBRN)
– Terminology Consensus Project process and outcome. International Journal of Behavioral Nutrition
and Physical Activity, 14(1):75.
Troiano, R. P., Berrigan, D., Dodd, K. W., Masse, L. C., Tilert, T., and Mcdowell, M. (2008). Physical
Activity in the United States Measured by Accelerometer. Medicine & Science in Sports & Exercise,
40(1):181–188.
Troiano, R. P., McClain, J. J., Brychta, R. J., and Chen, K. Y. (2014). Evolution of accelerometer methods
for physical activity research. British journal of sports medicine, 48(13):1019–23.
Warburton, D. E. R., Nicol, C. W., and Bredin, S. S. D. (2006). Health benefits of physical activity:
the evidence. CMAJ : Canadian Medical Association journal = journal de l’Association medicale
canadienne, 174(6):801–9.
World Health Organization (2004). Global Strategy on Diet, Physical Activity and Health. Technical
report.
World Health Organization (2010). Global Recommendations on Physical Activity for Health. Technical
report.
World Health Organization (2013). 2013-2020 Global action plan for the prevention and control of non-
communicable diseases.
World Health Organization (2014). WHO | What is Moderate-intensity and Vigorous-intensity Physical
Activity? Retrieved from https://www.who.int/dietphysicalactivity/physical_activity_intensity/en/.
World Health Organization (2017). WHO | Physical Activity. Retrieved from https://www.who.int/dietphysicalactivity/pa/en/.
Yates, T., Wilmot, E. G., Davies, M. J., Gorely, T., Edwardson, C., Biddle, S., and Khunti, K. (2011).
Sedentary Behavior. American Journal of Preventive Medicine, 40(6):e33–e34.
Appendix A
Entity types and document fields
Table A.1: Fields in documents with the user entity type
Field Data type Description
name String Name of the registered user.
email String Email of the registered user.
username String Username of the registered user. Identifies the user who navigates the platform. Part of the login credentials.
role String Role of the registered user, either "admin" or "researcher", for controlling access to the platform's features.
password String A string resulting from the hash of the user's password.
Table A.2: Fields in documents with the researchStudy entity type
Field Data type FHIR-specific? Description
identifier String Yes An identifier assigned to the research study by the responsible researcher.
userPermissions Array No Array containing usernames of users with permission to access the study.
meta Array No Array containing metadata for the study, such as the user who created it and a timestamp of the last modification.
resourceType String Yes String identifying the type of FHIR resource this document is (set to "ResearchStudy" for all documents in this collection).
title String Yes A descriptive, short and user-friendly label for the study.
status String Yes The current state of the study.
period Object Yes Object with "start" and "end" timestamps for the study.
principalInvestigator String Yes Name of the researcher who oversees the study.
studyGroup Array No An array of embedded documents with metadata for the groups created for the study.
Table A.3: Fields in documents with the studyGroup entity type
Field Data type Description
groupName String Name of the study group.
cohort Number Index of the cohort.
country String Country of the subjects in the group.
city String City of the subjects in the group.
Table A.4: Fields in documents with the researchSubject entity type
Field Data type FHIR-specific? Description
individual Object Yes Object with an identifier for the participant in the study (a pseudonym is used in Actinfo).
period Object Yes Object with "start" and "end" timestamps for the participant's involvement in the study.
outputInfo Object No Object containing the outputs resulting from running the file through Actinfo's validation and processing tool. Specifically, information regarding the computed physical activity time indicators and the personal information obtained from the *.agd file, such as the sex, height, mass and age of the subject. The validation parameters are also included as properties of this field (i.e., cut point set chosen and bouts selected). Additionally, some auxiliary arrays are saved for generating tables in the platform's output page.
study String Yes Study the subject is part of.
status String Yes The current state of the subject in the study (set to "on-study" by default).
groups Array No References to the studyGroup the subject is part of.
resourceType String Yes String identifying the type of FHIR resource this document is (set to "ResearchSubject" for all documents in this collection).
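Table A.4 distinguishes the FHIR-specific fields of a researchSubject document from Actinfo-only fields. As a hedged illustration of the "Expand on FHIR" direction described in Section 6.2, the TypeScript sketch below projects such a document onto a FHIR ResearchSubject resource, keeping only the FHIR-specific fields and turning the plain `study` string into a FHIR Reference. The exact shapes of `individual` and `period`, and the names `ActinfoResearchSubject` and `toFhirResearchSubject`, are assumptions for illustration, not the platform's actual code.

```typescript
// Hypothetical typing of a researchSubject document (fields from Table A.4).
interface ActinfoResearchSubject {
  resourceType: "ResearchSubject";
  individual: { identifier: string };     // assumed shape: participant pseudonym
  period: { start: string; end?: string }; // assumed ISO 8601 timestamps
  study: string;                           // id of the researchStudy document
  status: string;                          // e.g. "on-study"
  outputInfo?: Record<string, unknown>;    // Actinfo-only processing outputs
  groups?: string[];                       // Actinfo-only studyGroup references
}

// Minimal subset of the FHIR Reference data type.
interface FhirReference {
  reference?: string;
  identifier?: { value: string };
}

interface FhirResearchSubject {
  resourceType: "ResearchSubject";
  status: string;
  period: { start: string; end?: string };
  study: FhirReference;
  individual: FhirReference;
}

// Keep only FHIR-specific fields; Actinfo-only fields are not exported.
function toFhirResearchSubject(doc: ActinfoResearchSubject): FhirResearchSubject {
  return {
    resourceType: "ResearchSubject",
    status: doc.status,
    period: doc.period,
    // Literal FHIR references take the "<ResourceType>/<id>" form.
    study: { reference: `ResearchStudy/${doc.study}` },
    // The participant stays pseudonymised: only a logical identifier is exported.
    individual: { identifier: { value: doc.individual.identifier } },
  };
}
```

Exporting only the pseudonym as a logical identifier keeps the exported resource consistent with the pseudonymisation already applied in Actinfo.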
Table A.5: Fields in documents with the file entity type
Field Data type Description
length Number Size of the file, in bytes.
chunkSize Number Size of each file chunk, in bytes.
uploadDate Date Timestamp for when the document was stored in the database.
filename String Randomly generated filename, created by GridFS (unique names are generated to allow upload of the same file under different studies).
contentType String MIME type for the GridFS file.
subject String Reference to the subject the file belongs to.
Appendix B
User interface
Figure B.1: File uploader interface.
Figure B.2: Custom cut point form.
Figure B.3: Custom cut point example.
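The custom cut point form lets users define intensity categories through count thresholds. The TypeScript sketch below shows one way such a set could be applied to 60 s epochs. The `CutPoint` type and the function names are hypothetical, and the example set uses the adult axis-1 thresholds of Troiano et al. (2008): sedentary below 100, light from 100 to 2019, moderate from 2020 to 5998 and vigorous from 5999 counts per minute. Boundary handling in Actinfo itself may differ.

```typescript
// A category with an inclusive lower bound in counts per minute (CPM);
// its upper bound is the next category's lower bound.
interface CutPoint {
  label: string;
  minCpm: number;
}

// Adult cut points from Troiano et al. (2008), axis 1, 60 s epochs.
const TROIANO_2008: CutPoint[] = [
  { label: "sedentary", minCpm: 0 },
  { label: "light", minCpm: 100 },
  { label: "moderate", minCpm: 2020 },
  { label: "vigorous", minCpm: 5999 },
];

// Assumes cut points are sorted by ascending minCpm: the last category
// whose lower bound the count reaches wins.
function classify(cpm: number, cutPoints: CutPoint[]): string {
  let label = cutPoints[0].label;
  for (const cp of cutPoints) {
    if (cpm >= cp.minCpm) label = cp.label;
  }
  return label;
}

// Minutes spent in each category over a recording of 60 s epochs.
function minutesPerCategory(minuteCounts: number[], cutPoints: CutPoint[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const cp of cutPoints) totals[cp.label] = 0;
  for (const cpm of minuteCounts) totals[classify(cpm, cutPoints)] += 1;
  return totals;
}
```

A custom cut point set from the form would simply replace `TROIANO_2008` with the user's own sorted list of categories.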
Figure B.4: List of commonly used bouts and breaks presented to users. Each bout is accompanied by a table detailing its settings (duration and count levels).
Figure B.5: Custom bout form. When de-selected, the maximum duration and count level are set to ∞.
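A bout, as configured in the forms above, is defined by duration and count levels, with de-selected maxima set to ∞. The TypeScript sketch below shows a strict interpretation: a bout is a run of consecutive 60 s epochs whose counts all lie within the count levels, kept only if its length lies within the duration limits. The `BoutSettings` type and `findBouts` are hypothetical names. This sketch allows no interruption tolerance, and runs longer than the maximum duration are discarded rather than split, which is only one possible reading of the settings.

```typescript
// Hypothetical bout settings; omitted maxima default to Infinity,
// mirroring the de-selected fields of the custom bout form.
interface BoutSettings {
  minDuration: number;  // minimum bout length, in epochs (minutes)
  maxDuration?: number; // maximum bout length, in epochs
  minCount: number;     // minimum counts per epoch
  maxCount?: number;    // maximum counts per epoch
}

interface Bout {
  startIndex: number; // index of the first epoch in the bout
  duration: number;   // number of epochs in the bout
}

function findBouts(counts: number[], s: BoutSettings): Bout[] {
  const maxDuration = s.maxDuration ?? Infinity;
  const maxCount = s.maxCount ?? Infinity;
  const bouts: Bout[] = [];
  let runStart = -1; // start of the current qualifying run, -1 if none

  // Iterate one step past the end so a run reaching the last epoch is closed.
  for (let i = 0; i <= counts.length; i++) {
    const inRange = i < counts.length && counts[i] >= s.minCount && counts[i] <= maxCount;
    if (inRange && runStart < 0) {
      runStart = i;
    } else if (!inRange && runStart >= 0) {
      const runLength = i - runStart;
      if (runLength >= s.minDuration && runLength <= maxDuration) {
        bouts.push({ startIndex: runStart, duration: runLength });
      }
      runStart = -1;
    }
  }
  return bouts;
}
```

For example, sedentary bouts of at least 10 minutes below 100 counts per minute would correspond to `findBouts(counts, { minDuration: 10, minCount: 0, maxCount: 99 })`.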
Appendix C
Exported Excel file example
Figure C.1: "Summary" sheet for an example exported Excel file.
Figure C.2: "Daily" sheet for an example exported Excel file.
Appendix D
Consent forms
Figure D.1: Consent form signed by participants of the D2FIT study (front).
Figure D.1: Consent form signed by participants of the D2FIT study (back).
Figure D.2: Consent form signed by participants of the CML study.