
IN DEGREE PROJECT COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2018

Model comparison of patient volume prediction in digital health care

SASHA HELLSTENIUS

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Degree Programme in Computer Science and Engineering
Date: June 7, 2018
Supervisor: Pawel Herman
Examiner: Erik Fransén
Swedish title: Jämförelse av modeller för förutsägelse av patientvolym inom digital vård
School of Electrical Engineering and Computer Science


Abstract

Accurate predictions of patient volume are an essential tool for improving resource allocation and doctor utilization in the traditional as well as the digital health care domain. Various methods for patient volume prediction within the traditional health care domain have been studied in contemporary research, while the concept remains underexplored within the digital health care domain. This thesis presents an evaluation of how two different non-linear, state-of-the-art time series prediction models compare when predicting patient volume within the digital health care domain. The models compared are the feed-forward Multi-layer Perceptron (MLP) and the recurrent Long Short-Term Memory (LSTM) network. The results imply that the prediction problem itself is straightforward, while also indicating that there are significant differences in prediction accuracy between the evaluated models. The conclusions state that the LSTM model offers substantial prediction advantages that outweigh the complexity overhead for the given problem.


Summary

An accurate prediction of patient volume is essential for improving the allocation of physicians in traditional as well as digital health care. Different methods for predicting patient volume have been investigated within traditional health care, whereas similar studies within the digital sector are lacking. This thesis investigates two modern non-linear methods for time series analysis of patient volume within digital health care. The models investigated are the multi-layer perceptron (MLP) and the Long Short-Term Memory (LSTM) network. The results presented indicate that the problem itself is uncomplicated, while there turn out to be significant differences in prediction accuracy between the models. The conclusions presented indicate that the LSTM model offers significant advantages that outweigh the cost in complexity and performance.

Contents

1 Introduction
1.1 Problem Statement
1.2 Scope and Approach
1.3 Thesis Outline

2 Background
2.1 Patient Volume
2.2 Time Series
2.3 Artificial Neural Networks
2.4 Related Work
2.4.1 Predicting Patient Volume
2.4.2 Comparative Analysis of Different Architectures for Sequence Prediction
2.4.3 Sequence Prediction with LSTM

3 Method
3.1 Data
3.2 Models
3.3 Evaluation

4 Results
4.1 MLP
4.1.1 Univariate Approach
4.1.2 Multivariate Approach
4.2 LSTM
4.2.1 Univariate Approach
4.2.2 Multivariate Approach
4.3 Comparison



5 Discussion
5.1 Critical Evaluation
5.2 Ethics and Sustainability
5.3 Further Research

6 Conclusions

Bibliography

Chapter 1

Introduction

The process of predicting patient volume in the emergency department (ED) has been studied extensively [1–3], because ED overcrowding has been identified as a threat to patient safety [4] and the ability to forecast patient volume is crucial to combat overcrowding [5]. However, there are other approaches that can potentially help reduce ED overcrowding by unburdening the system. One example is digital health care, i.e. contacting physicians via video call. As in the traditional setting, a mismatch between supply and demand can occur.

The practical implications of patient volume prediction are effective resource allocation and scheduling of physicians. The benefits include improved quality of patient care and better doctor utilization. This optimization concerns the flow of patients in general and is therefore applicable within traditional as well as digital health care. However, patient volume prediction within the digital health care domain is underexplored compared to that of the ED.

Wargon et al. [5] performed a systematic review of patient volume prediction in the ED. The results indicate that different Auto Regressive (AR) models have been evaluated previously and that patient volume can be studied with the assistance of time series analysis. Time series analysis is a subject under continuous development, and an alternative approach is to use Artificial Neural Networks (ANN), more specifically a Multi-layer Perceptron (MLP) [6]. Although an MLP can in theory approximate any function, it is not specifically designed to handle temporal sequences. In contrast to the MLP, the Recurrent Neural Network (RNN) is designed to promote the persistence of information. A variant of the RNN is the Long Short-Term Memory (LSTM) network, which is capable of learning long-term dependencies and is considered to be one of the best performing RNN architectures [7]. Originally proposed in 1997 by Hochreiter and Schmidhuber [8], the LSTM has recently been compared to other state-of-the-art methodologies in several fields. The potential of the MLP and LSTM models in the specific area of patient volume prediction in the digital health care domain has not yet been examined. In general, research indicates that LSTMs outperform other state-of-the-art methodologies when predicting temporal sequences [9–11]. The lack of research on time series forecasting with neural networks in the digital health care domain motivates further inspection.

Predicting samples solely from the historical load, i.e. univariate approaches, offers the benefit of simplicity and speed. However, factors other than the historical load may affect the outcome. This raises a second question: whether a multivariate approach, aggregating additional features, offers benefits in performance that outweigh the complexity overhead. Previous research has indicated that patient volume in the ED is correlated with weather, weekdays and domain-specific variables [5, 12, 13]. It remains an open question whether this is also the case in the digital health care domain.

1.1 Problem Statement

The aim of this study is to evaluate how two state-of-the-art time series prediction models compare when predicting patient volume within the digital health care domain. A secondary aim is to examine the contribution of additional features.

1.2 Scope and Approach

To answer the questions posed in the thesis, two different neural network architectures will be evaluated: the recurrent LSTM and the classical feed-forward MLP. Both architectures will be used to create a univariate model, predicting patient load based on historical patient load. A linear AR model will be compared to the MLP and LSTM to ensure that non-linear models are suitable for the problem at hand. To address the secondary objective of the study, the combination of historical patient load with features such as weekdays, holidays and current waiting times will be examined. Due to the time frame of this work, model evaluation is limited to the MLP and LSTM.

A limiting factor affecting this study is that only a single digital health care actor is examined, so the results may not be applicable in a generic context. It should also be kept in mind that the domain of digital health care has not yet reached a mature stage. More specifically, the data examined in this project is supplied by the digital health care provider KRY [14], which has grown considerably since the company launched its digital health care platform. Thus, data collected at an early stage may be too sparse and lack features that are important for training.

1.3 Thesis Outline

The concepts of patient flow, time series analysis and ANNs are introduced in Chapter 2. This is followed by a review of related work on patient volume prediction, a comparative analysis of different ANN architectures for sequence prediction, and lastly the application of LSTMs in sequence prediction. Chapter 3 presents the methods used for model creation and evaluation. The results are presented in Chapter 4. Chapter 5 discusses the implications of the results, along with ethical and sustainability considerations and a critical evaluation of the work performed. Lastly, the conclusions are presented in Chapter 6.

Chapter 2

Background

2.1 Patient Volume

The health care system is something that a majority of the population is subjected to at some point in their lives. It is a vital aspect of society that affects many people, which motivates research on and development of the topic. An important aspect that affects the health care system's performance is patient flow analysis, defined by Hall et al. [15] as "the study of how patients move through the health-care system".

An important factor that affects patient flow is the volume of patients, defined as the number of patients seeking care at a given point in time. Since this variable changes over time, it can be analyzed as a time series. If the number of patients requesting assistance exceeds the capacity of the health care provider, the waiting times will increase. Estimating the upcoming patient volume makes it possible to improve resource scheduling and thereby decrease waiting times; however, this requires that resources are available to accommodate patient demand.
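To make the relation between capacity and waiting times concrete, the following is a minimal Python sketch under a simplifying assumption that is not part of the thesis: patients who cannot be served within an hour wait for the next hour. The function name `backlog` and all numbers are hypothetical.

```python
def backlog(arrivals, capacity):
    """Return the number of patients still waiting at the end of each hour.

    Hypothetical model: each hour, waiting patients plus new arrivals are
    served up to the hourly capacity; the rest carry over to the next hour.
    """
    waiting = 0
    history = []
    for demand, supply in zip(arrivals, capacity):
        waiting = max(0, waiting + demand - supply)
        history.append(waiting)
    return history


# Made-up example: demand exceeds a flat capacity of 10 patients per hour
# in hours 2 and 3 (zero-indexed), so a queue builds up and drains slowly.
print(backlog([8, 9, 14, 15, 7, 6], [10] * 6))  # [0, 0, 4, 9, 6, 2]
```

Even a short period in which demand exceeds capacity leaves a queue that persists for several hours in this toy model, which is why anticipating peaks matters for scheduling.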

2.2 Time Series

A time series is a set of sequential observations collected over time. The temporal order may provide additional information due to the possible serial dependence of observations. The main components of numerous time series are trend and seasonal variations [16, 17]. Trend is the long-term increase or decrease of the data over time, and seasonality is the oscillation that occurs at, e.g., a daily, weekly or monthly interval. The seasonal variations are recurring while the trend is persistent. These concepts are visualized in Figure 1.

(a) Trend (b) Seasonality

Figure 1: Examples of how different time series components vary over time.
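One classical way to separate the two components is a moving average spanning exactly one seasonal period, which averages out the seasonal oscillation and leaves the trend. The sketch below uses a synthetic hourly series (an assumption for illustration, not KRY data) with a linear trend and a 24-hour cycle; `moving_average` is a hypothetical helper.

```python
import numpy as np


def moving_average(x, window):
    """Average each run of `window` consecutive samples (full windows only)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")


# Synthetic hourly series: linear trend plus a daily (24-hour) seasonality.
hours = np.arange(24 * 14)
trend = 0.05 * hours + 10.0
seasonality = 3.0 * np.sin(2 * np.pi * hours / 24)
series = trend + seasonality

# A 24-hour moving average cancels the daily cycle, leaving the trend
# evaluated at the centre of each window (hour k + 11.5).
smoothed = moving_average(series, 24)
centre = hours[: len(smoothed)] + 11.5
print(np.allclose(smoothed, 0.05 * centre + 10.0))  # True
```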

There are two main areas of study regarding time series, namely analysis and forecasting. Time series analysis focuses on analyzing a sequence in order to understand the underlying properties of the data, while time series forecasting deals with predicting future samples. Time series analysis can thus be used to understand the correlation of observations over time and the underlying features of the generative process. This information can then be used to identify a suitable model of the data to generate predictions [17].
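A common tool for studying the correlation of observations over time is the sample autocorrelation function. The following sketch computes it with NumPy for a synthetic series with a pure 24-hour cycle (an illustrative assumption; `autocorrelation` is a hypothetical helper): observations 24 hours apart are strongly positively correlated and observations 12 hours apart strongly negatively correlated, which would point towards a model with daily seasonality.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation r_k for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    denom = np.dot(centred, centred)
    return np.array([
        np.dot(centred[: len(x) - k], centred[k:]) / denom
        for k in range(max_lag + 1)
    ])


# Ten days of a synthetic hourly series with a pure 24-hour cycle.
hours = np.arange(24 * 10)
series = np.sin(2 * np.pi * hours / 24)
acf = autocorrelation(series, 24)
print(round(acf[12], 3), round(acf[24], 3))
```

The values shrink slightly below 1 in magnitude (about -0.95 and 0.9 here) because fewer pairs of observations are available at larger lags.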

Time series analysis can in turn be divided into different categories. When data is collected on a single variable the time series is defined as univariate; however, there are cases where several related sequences may be of relevance. Thus, when the data is collected on multiple variables the time series is defined as multivariate. A second distinction is whether the model is linear or non-linear. If the model is linear, the resulting predictions are a linear combination of previous samples. In contrast, non-linear models are not constrained to this property. Historically, time series forecasting was dominated by linear approaches such as Auto Regressive (AR) models. However, for some sequences the linear assumption is not appropriate, which motivates the use of non-linear models [16, 18, 19]. Contemporary research has indicated that ANNs are serious contenders to linear models [20].
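The linear AR(p) model mentioned above predicts the next sample as an intercept plus a weighted sum of the p previous samples. Below is a minimal least-squares sketch, assuming a noise-free series generated with made-up coefficients so that the fit can be checked; `fit_ar` and `predict_next` are hypothetical names, not the AR baseline implementation used in the thesis.

```python
import numpy as np


def fit_ar(x, p):
    """Least-squares fit of x_t = a_1 x_{t-1} + ... + a_p x_{t-p} + c."""
    x = np.asarray(x, dtype=float)
    # Row t holds the p previous samples (most recent first) plus an intercept.
    design = np.array([np.r_[x[t - p:t][::-1], 1.0] for t in range(p, len(x))])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef  # [a_1, ..., a_p, c]


def predict_next(x, coef):
    """One-step forecast: a linear combination of the last p samples."""
    p = len(coef) - 1
    recent = np.asarray(x[-p:], dtype=float)[::-1]
    return float(np.dot(coef[:p], recent) + coef[p])


# Noise-free AR(2) series generated with known (made-up) coefficients.
x = [1.0, 2.0]
for _ in range(50):
    x.append(0.6 * x[-1] - 0.2 * x[-2] + 1.0)

coef = fit_ar(x, p=2)
print(coef, predict_next(x, coef))
```

On real data the fit is no longer exact, and the residual error of such a linear baseline is what the non-linear models are compared against.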

By modeling forecasting as a supervised learning problem it is possible to use ANNs. However, the decomposition of a time series into a set of input and output sequences can be done in several different ways, thus introducing new parameters. One distinction depends on the number of output values, i.e. the number of time steps predicted: the problem is either one-step or multi-step forecasting. According to Bontempi et al. [20], multi-step forecasting is more complicated than one-step forecasting due to several factors, among them the propagation of error and uncertainty. Another relevant parameter is the number of previous time steps taken into account, the lag or window size. A visualization of how a sequence can be decomposed is presented in Figure 2.

(a) Original sequence (b) Window size: 1 (c) Window size: 3

Figure 2: A sequence of five time steps (a) decomposed into two different input/output sequences (b, c). Each node represents a time step in the sequence. A grey node represents a target.
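The decomposition in Figure 2 can be sketched as a sliding window for one-step forecasting. In this minimal sketch (the name `make_windows` is hypothetical), each input is the `window` previous values and the target is the value that follows:

```python
def make_windows(series, window):
    """Split a sequence into (input window, next-step target) pairs."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


sequence = [1, 2, 3, 4, 5]          # five time steps, as in Figure 2a
print(make_windows(sequence, 1))    # ([[1], [2], [3], [4]], [2, 3, 4, 5])
print(make_windows(sequence, 3))    # ([[1, 2, 3], [2, 3, 4]], [4, 5])
```

A window of size w turns a sequence of length n into n - w input/target pairs, so larger windows give each sample more context but leave fewer samples.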

2.3 Artificial Neural Networks

There are different types of artificial neural networks. An MLP is a feed-forward neural network that builds on the concepts first introduced by Rosenblatt [21] in 1958. Inputs are fed in and propagated to the hidden layer and then to the output layer, as visualized in Figure 3. The output produced by the network is then compared to the targets, and the weights connecting the different layers are updated with the help of gradients to improve the predictions. To introduce non-linearity, a transfer function is often applied at the hidden layers as well as at the output layer. When an MLP is used for time series prediction, the sequence is transformed into labeled data as described in Section 2.2.


Figure 3: MLP architecture with one hidden layer. Circles represent nodes and connections represent weights.
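A forward pass through the one-hidden-layer MLP of Figure 3 can be written in a few lines of NumPy. This sketch assumes a tanh transfer function in the hidden layer, a linear output, randomly initialized weights and arbitrary layer sizes; it omits the gradient-based weight update and is not the Keras model used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with a tanh transfer function and a linear output."""
    hidden = np.tanh(x @ w_hidden + b_hidden)
    return hidden @ w_out + b_out


window, n_hidden = 3, 4                      # arbitrary illustrative sizes
w_hidden = rng.normal(size=(window, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(size=(n_hidden, 1))
b_out = np.zeros(1)

# A batch of two input windows, e.g. normalized patient volume for 3 hours.
batch = np.array([[0.1, 0.2, 0.3],
                  [0.2, 0.3, 0.4]])
prediction = mlp_forward(batch, w_hidden, b_hidden, w_out, b_out)
print(prediction.shape)  # (2, 1)
```

Each input window is treated as an unordered vector of features: the MLP has no notion that the three values are consecutive hours.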

The RNN started to appear in the 1980s when Hopfield [22] introduced a family of RNNs that were specifically designed to handle pattern recognition. In contrast to the MLP, the RNN is an artificial neural network that is specially adapted to handle temporal sequences by remembering what it has seen so far. Instead of only seeing the input, as the MLP does, the RNN also sees the hidden state from the previous time step in the sequence. This creates memory in the network. The conceptual difference between an MLP and an RNN can be visualized with directed graphs, see Figure 4.

Figure 4: Visualization of MLP (left) and RNN (right).
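The difference in Figure 4 lies in the recurrence: the hidden state at each step is computed from both the current input and the previous hidden state. Below is a minimal sketch, assuming a simple Elman-style RNN with a tanh activation (the thesis does not specify this variant, and `rnn_forward` is a hypothetical name):

```python
import numpy as np


def rnn_forward(inputs, w_x, w_h, b):
    """Run a simple RNN over a sequence and return all hidden states."""
    h = np.zeros(w_h.shape[0])
    states = []
    for x_t in inputs:
        # The new state depends on the current input AND the previous state.
        h = np.tanh(w_x @ x_t + w_h @ h + b)
        states.append(h)
    return np.array(states)


rng = np.random.default_rng(1)
n_hidden = 3
w_x = rng.normal(size=(n_hidden, 1))
w_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

seq = np.array([[0.1], [0.5], [0.2]])
changed = seq.copy()
changed[0] = 0.9                  # alter only the FIRST input
states = rnn_forward(seq, w_x, w_h, b)
states_changed = rnn_forward(changed, w_x, w_h, b)
print(states.shape)  # (3, 3)
print(np.abs(states[-1] - states_changed[-1]).max())
```

Changing only the first input also changes the final hidden state: this carried-over state is the memory described above.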

The problem with the RNN that the LSTM network solves is that of the vanishing gradient. The gradient is used to update the weights in the network. When an RNN is unfolded in time, it becomes as deep as the sequence length. The gradient, which depends on the weights, saturates and converges to zero as it is propagated back through many time steps, and thus updates cease. The LSTM solves this by changing the structure of the recurrent layer. Instead of one layer as in the traditional RNN, there are four layers that interact with each other. Along with the layers, an additional input, labeled the cell state, is passed on from the previous time step in the sequence. The changes are visualized in Figure 5.

Figure 5: Visualization of LSTM.

The cell state is subject to minor changes from the outputs of three of the layers, which are called gates. These three gates have different purposes: forgetting irrelevant information from the previous cell state, deciding how to update the cell state with the new input, and deciding what to output from the cell state to the hidden state. This allows the network to remember long-term dependencies. The weight updates thus depend not only on the weights in the hidden layer but also on the cell state. Since the cell state passes through almost unchanged, the gradient does not converge to zero.
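The gate computations above can be sketched for a single time step in NumPy. This follows the standard LSTM formulation with a forget gate; stacking the four layers' parameters into `W`, `U` and `b`, and their order, are assumptions of this sketch and not the internal layout of Keras.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4n, n_in), U: (4n, n), b: (4n,) hold the parameters of the four
    interacting layers, stacked as [forget, input, candidate, output].
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:n])              # forget gate: what to drop from c_prev
    i = sigmoid(z[n:2 * n])          # input gate: how much new content to write
    g = np.tanh(z[2 * n:3 * n])      # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])      # output gate: what to expose as h_t
    c_t = f * c_prev + i * g         # keep part of the old state, add new content
    h_t = o * np.tanh(c_t)
    return h_t, c_t


# Forget gate saturated open (bias +50) and input gate closed (bias -50).
n, n_in = 2, 1
W = np.zeros((4 * n, n_in))
U = np.zeros((4 * n, n))
b = np.concatenate([np.full(n, 50.0), np.full(n, -50.0), np.zeros(n), np.zeros(n)])
h, c = np.zeros(n), np.array([0.7, -0.3])
for x in [0.1, 0.9, 0.4]:
    h, c = lstm_step(np.array([x]), h, c, W, U, b)
print(c)
```

With the forget gate at 1 and the input gate at 0, the cell state is carried through all three steps unchanged, which is the mechanism that lets information persist over long spans.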

2.4 Related Work

2.4.1 Predicting Patient Volume

2.4.1.1 Univariate Approach

Time series forecasting of patient volume in a digital health care setting has not been identified in previous research. However, the neighboring field of forecasting patient volume in a traditional health care setting has been studied. Early research is centered around linear, univariate approaches. In 1994, Tandberg and Qualls [23] presented a comparison of five simple univariate statistical models to predict hourly patient volume in the ER, and their results indicated that simple models performed well with their data. The results of Abdel-Aal and Mangoud [24], presented in 1998, support these findings. In their study, two linear univariate time series models were evaluated.

More recently, in 2008, Jones et al. [1] compared multiple linear models and also included an ANN; however, their results stated that the ANN did not provide consistent predictions. This result may be due to the choice of a simple architecture with few hidden nodes. Verifying this result is difficult because the authors did not explain how the network was trained or how the parameters were selected. Predicting patient volume in the ED was also examined by Marcilio et al. [2] in 2013. The authors presented six different univariate models for predicting daily patient volume. The results were in line with previous research and indicated that linear univariate models could predict daily patient attendances.

2.4.1.2 Multivariate Approach

Around this time, research examining the forecasting of patient volume in a multivariate setting started to appear. The effect of additional features on load prediction has been explored, and the results indicate that patient volume in the ED is correlated with weekday, month and holiday features. This result was presented by Sun et al. [13], who predicted daily patient attendances with an AR model in a multivariate setting. Kam et al. [12] confirmed that patient volume in the ER is characterized by seasonal patterns and that the weather impacts patient volume.

2.4.2 Comparative Analysis of Different Architectures for Sequence Prediction

Lipton et al. [9] conducted the first study comparing an LSTM to an MLP to identify patterns in time series of clinical data in order to classify diagnoses. The authors concluded that the LSTM showed promising results in the context of diagnosis recognition, while acknowledging the difficulty of interpreting neural networks applied to complex medical problems. A similar study was conducted by Choi et al. [25], who compared an RNN architecture with Gated Recurrent Units (GRU) to a few baseline models, among them an MLP, to predict clinical events. The results support those of Lipton et al., indicating that RNNs can be used as a clinical tool. Choi et al. also state that information learned at one hospital can be transferred to another, further encouraging the continued development of neural networks as a clinical aid.

Applying neural networks to complex medical problems is, however, not trivial. Amirkhan et al. [26] recently published a novel comparison of RNN predictions in the context of colorectal cancer. The dataset used for training the networks consisted of patients above a selected age who had been diagnosed with the disease. The patient data was supplied by a health care center in the Netherlands and contained a vast number of features. Different granularities of the temporal timeline were evaluated. The results were ambiguous. The network architectures selected for evaluation were unclear, although a few parameters were presented and motivated.

Contemporary research thus indicates that there are many opportunities in sequence prediction with neural networks, while also demonstrating the difficulty of applying them in practice.

2.4.3 Sequence Prediction with LSTM

The LSTM has reportedly been successful in many fields handling sequence-to-sequence learning, including speech recognition [27] and text translation [28]. Sutskever et al. [28] state that the LSTM's ability to model complex phenomena without detailed assumptions regarding problem specifics gives reason to believe that the model will perform well on a variety of sequence-related problems. This reasoning is supported by Chung et al. [29], who conducted an empirical evaluation of recurrent units in the context of sequence modeling and concluded that the LSTM was among the superior units.

Previous research has thus indicated that the LSTM is suitable for sequence-to-sequence modeling; however, time series data is a continuous sequence that is not segmented into subsequences. The original LSTM was not equipped to handle this type of problem, but the introduction of forget gates in 1999 by Gers et al. [30] enabled the LSTM to learn to reset itself, thereby removing the dependence on subsequences.

The LSTM has been evaluated within the context of time series prediction in a variety of fields. Zhang et al. [10] compared the LSTM with other baseline models in the context of sewer overflow monitoring. Their results indicated that the LSTM outperformed other state-of-the-art models, including the MLP. Ma et al. [11] presented a novel LSTM model for predicting travel speed, with results indicating that the LSTM outperformed other RNN and MLP models. The authors state that this result is likely due to the LSTM's intrinsic capability of determining an optimal window size. Ma et al. also examine the effect of an additional feature, stating that combining speed and volume to predict future speed improved prediction accuracy, although not significantly.

In contrast to this reported success, Gers et al. [31] stated as early as 2002 that the MLP outperformed the LSTM on particular time series prediction tasks, specifically on simpler tasks where all the information needed was contained in the most recent time steps. They thus pointed out that the strength of the LSTM lies in its ability to preserve information over arbitrarily long time spans. When this ability is not needed, simpler approaches may be preferable.

Chapter 3

Method

3.1 Data

The primary data supplied by KRY [14] is the historical patient volume from January 2015 to February 2018, consisting of all time stamps at which patients paid for their appointments. Payments are made along with bookings. The data is partitioned into hourly segments, creating a sequence of patient volume per hour from January 2015 to February 2018. Due to confidentiality restrictions, the data is not visualized; it does, however, show a clear growth trend. KRY also supplied data containing waiting times per minute. This data was disregarded because waiting times from the first two years were not tracked and data reconstruction is out of the scope of this project. Consequently, the effect of the domain-specific feature, waiting time, is not examined.
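The partitioning of payment time stamps into hourly segments can be sketched with the Python standard library. The timestamps below are made up, since the real data is confidential, and the sketch assumes naive (time-zone-free) timestamps and an hour-aligned `start`; hours without payments are filled with zero so that the sequence has no gaps. `hourly_volume` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_volume(timestamps, start, end):
    """Count events per hour in [start, end), including hours with zero events."""
    counts = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    volume = []
    hour = start
    while hour < end:
        volume.append(counts.get(hour, 0))
        hour += timedelta(hours=1)
    return volume


# Hypothetical payment timestamps.
payments = [
    datetime(2018, 2, 1, 8, 5),
    datetime(2018, 2, 1, 8, 47),
    datetime(2018, 2, 1, 10, 12),
]
print(hourly_volume(payments, datetime(2018, 2, 1, 8), datetime(2018, 2, 1, 12)))
# [2, 0, 1, 0]
```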

Three different datasets are created: one univariate sequence consisting of patient volume, one multivariate sequence consisting of patient volume and weekday values, and one multivariate sequence consisting of patient volume and holiday values.

To preprocess the data, normalization is performed. The historical data and the weekday values are scaled independently to values between 0 and 1. The data is then split into training, validation and test sets. The first 80% of the data is used for training and the last 20% for testing. The last 20% of the training data is used for validation. The data is split chronologically to preserve the temporal ordering of the sequence. A large training set was chosen because of the modest patient volume during the first year.
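Below is a minimal NumPy sketch of the scaling and chronological split described above, with an increasing stand-in series in place of the confidential data. The helper names are hypothetical; the split fractions follow the text (20% test, then 20% of the remaining data for validation), which leaves 64% of the samples for training.

```python
import numpy as np


def min_max_scale(x):
    """Scale values linearly to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def chronological_split(x, test_frac=0.2, val_frac=0.2):
    """Split without shuffling, in time order: train | validation | test."""
    n_test = int(round(len(x) * test_frac))
    train_full, test = x[:len(x) - n_test], x[len(x) - n_test:]
    n_val = int(round(len(train_full) * val_frac))
    train = train_full[:len(train_full) - n_val]
    val = train_full[len(train_full) - n_val:]
    return train, val, test


volume = min_max_scale(np.arange(100) + 5.0)   # stand-in for hourly volume
train, val, test = chronological_split(volume)
print(len(train), len(val), len(test))          # 64 16 20
```

The sketch scales the full series before splitting, matching the order described above. Fitting the minimum and maximum on the training portion only is a common alternative that keeps test-set statistics out of preprocessing.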



To transform the data into a supervised learning problem, three different window sizes were selected: 1 (W1), 3 (W3) and 5 (W5). W1 was created by setting the target of each time step to the normalized patient volume at the next time step. W3 was created by pairing the sequence of normalized patient volume from three consecutive time steps with the target at the next time step, then shifting one time step and repeating the process. The same procedure, with five consecutive time steps, was used to create W5. This was performed on the three datasets independently, resulting in nine datasets, listed in Table 1.

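Dataset  Features                 Window size
U_W1     Patient volume           1
U_W3     Patient volume           3
U_W5     Patient volume           5
MW_W1    Patient volume, weekday  1
MW_W3    Patient volume, weekday  3
MW_W5    Patient volume, weekday  5
MH_W1    Patient volume, holiday  1
MH_W3    Patient volume, holiday  3
MH_W5    Patient volume, holiday  5

Table 1: Final datasets: univariate (U), multivariate weekday (MW), and multivariate holiday (MH).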

3.2 Models

The MLP and LSTM were implemented in the Keras framework. Both model types were configured with the Adam optimizer, an extension of stochastic gradient descent. Training was run for a maximum of 300 epochs with regularization in the form of early stopping with a patience of 5. The patience is the number of consecutive epochs without improvement in validation loss that are tolerated before training stops. The data was not shuffled during training. Parameters were selected with a grid search, in which the combination producing the lowest generalization error on the validation data was chosen. The parameters and ranges examined are presented in Table 2.
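The patience rule can be stated precisely in a few lines. This pure-Python sketch assumes, as in the Keras `EarlyStopping` callback with its default `min_delta=0`, that any decrease of the validation loss counts as an improvement; the function name and loss values are invented. In Keras, the configuration described above would presumably correspond to passing `EarlyStopping(monitor="val_loss", patience=5)` to `model.fit(..., epochs=300, shuffle=False)`.

```python
def early_stopping_epoch(val_losses, patience=5, max_epochs=300):
    """Return (epochs_run, best_epoch) under patience-based early stopping.

    Training stops once `patience` consecutive epochs pass without the
    validation loss improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch


# Made-up validation losses: the best value is reached at epoch 4.
losses = [0.50, 0.40, 0.35, 0.33, 0.34, 0.36, 0.33, 0.35, 0.37, 0.38]
print(early_stopping_epoch(losses))   # (9, 4)
```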

Parameter            Value
Activation function  ReLU, Sigmoid, Tanh
Loss function        MAE, MSE
Batch size           1-5% of training data size

Table 2: Domain for parameter grid search.
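The grid search over the domain in Table 2 amounts to evaluating every combination and keeping the one with the lowest validation error. In this sketch the batch sizes are assumed to be taken in 1% steps of the training-set size (the text gives only the 1-5% range), and `evaluate` is a hypothetical callback standing in for training a model and measuring its validation error.

```python
from itertools import product

activations = ["relu", "sigmoid", "tanh"]
losses = ["mae", "mse"]
batch_fractions = [0.01, 0.02, 0.03, 0.04, 0.05]   # assumed 1% steps


def grid_search(evaluate, n_train):
    """Return (best_validation_error, params) over the Table 2 domain."""
    best = None
    for activation, loss, frac in product(activations, losses, batch_fractions):
        params = {"activation": activation, "loss": loss,
                  "batch_size": max(1, round(frac * n_train))}
        error = evaluate(params)
        if best is None or error < best[0]:
            best = (error, params)
    return best


# Toy stand-in for training: pretend tanh + MSE with the smallest batch wins.
def fake_evaluate(p):
    return (p["activation"] != "tanh") + (p["loss"] != "mse") + p["batch_size"] / 1000


print(grid_search(fake_evaluate, n_train=2000))
```

The domain contains 3 × 2 × 5 = 30 combinations under these assumptions, each of which requires a full training run in practice.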

The architectures evaluated for both the MLP and the LSTM are presented in Table 3. The number of hidden nodes and layers is varied: single- and two-layer architectures are tested, and in the two-layer case the number of hidden nodes is equal in both layers. Ten independent trials are performed for each architecture on the training set, and the resulting predictions over the test set are averaged. This is done because of the stochastic nature of training and initialization.

Parameter      Value
Hidden layers  1, 2
Hidden units   25, 50, 100, 200, 500, 1000

Table 3: Range of layers and hidden units evaluated.
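The repeated-trial protocol can be sketched as a loop over the architectures in Table 3 that averages the test-set predictions of ten differently seeded runs. `train_and_predict` and `noisy_model` are hypothetical stand-ins for building and training a Keras model; only the averaging logic is illustrated.

```python
import numpy as np
from itertools import product

hidden_layers = [1, 2]
hidden_units = [25, 50, 100, 200, 500, 1000]


def averaged_predictions(train_and_predict, n_trials=10):
    """Average the test predictions of `n_trials` seeded runs per architecture.

    `train_and_predict(layers, units, seed)` must return a 1-D array of
    test-set predictions.
    """
    results = {}
    for layers, units in product(hidden_layers, hidden_units):
        runs = [train_and_predict(layers, units, seed) for seed in range(n_trials)]
        results[(layers, units)] = np.mean(runs, axis=0)
    return results


# Toy stand-in: "predictions" are the truth plus seed-dependent noise.
truth = np.linspace(0.0, 1.0, 24)


def noisy_model(layers, units, seed):
    rng = np.random.default_rng(seed)
    return truth + rng.normal(scale=0.1, size=truth.shape)


averaged = averaged_predictions(noisy_model)
print(len(averaged))  # 12 architectures
```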

3.3 Evaluation

To examine the accuracy of the different models and the effect of additional features, the Mean Absolute Error (MAE) and the Mean Squared Error (MSE) of each resulting sequence are evaluated. The respective formulas are presented in Equations 3.1 and 3.2.

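$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \lvert y_t - \hat{y}_t \rvert \tag{3.1}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n} (y_t - \hat{y}_t)^2 \tag{3.2}$$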
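Equations 3.1 and 3.2, together with the per-hour error sequence z, translate directly into NumPy. The target and prediction values below are made up for illustration:

```python
import numpy as np


def mae(y, y_hat):
    """Mean absolute error, Equation 3.1."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))


def mse(y, y_hat):
    """Mean squared error, Equation 3.2."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))


y = np.array([3.0, 5.0, 2.0, 8.0])       # made-up hourly targets
y_hat = np.array([2.0, 5.0, 4.0, 7.0])   # made-up predictions
z = y - y_hat                            # error sequence z_t = y_t - y_hat_t
print(z)
print(mae(y, y_hat), mse(y, y_hat))      # 1.0 1.5
```

Squaring z gives the per-hour contributions to the MSE; the single error of 2 dominates the MSE far more than it dominates the MAE.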

To evaluate the difference in prediction between the model types, a sequence representing the error $z$ of each model prediction $\hat{y}$ from the true target $y$ is created, i.e. the prediction is subtracted from the target value: $z_t = y_t - \hat{y}_t$. This results in sequences containing the error of each model prediction at each hour. These sequences are referred to as model results.

Two-way ANOVA tests are performed on the squared sequences z to examine how window sizes, the number of units and different features interact in affecting predictions. To evaluate whether the differences in predictions are statistically significant, one-way ANOVA tests are performed on the squared sequences z. For each model, sample independence is assumed. A significance level of 0.05 is selected. The ANOVA test indicates whether there are significant differences in the population means, with the null hypothesis that the data comes from the same population. The ANOVA test does not indicate which populations differ from each other, and therefore a post hoc Tukey test is performed. The Tukey test indicates whether there are significant differences between pairs of populations, again with the null hypothesis that the data comes from the same population.
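The one-way ANOVA F-statistic compares the variance between group means with the variance within groups. The sketch below computes it by hand for three made-up squared-error sequences (`one_way_anova_f` is a hypothetical name). In practice a library routine such as SciPy's `scipy.stats.f_oneway`, which also returns the p-value, and `scipy.stats.tukey_hsd` for the post hoc comparison would typically be used.

```python
import numpy as np


def one_way_anova_f(*groups):
    """F-statistic of a one-way ANOVA: between-group over within-group variance."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up squared-error sequences from three hypothetical models.
model_a = [1, 2, 3]
model_b = [2, 3, 4]
model_c = [5, 6, 7]
print(one_way_anova_f(model_a, model_b, model_c))
```

Here the between-group mean square is 26 / 2 = 13 and the within-group mean square is 6 / 6 = 1, so F is 13 (up to floating-point rounding); a large F means the group means differ by much more than the spread within groups.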

Chapter 4

Results

4.1 MLP

4.1.1 Univariate Approach

As described in Chapter 3, different network architectures and window sizes were evaluated for each model. Initially, the effect of the window size and the number of hidden units was examined in the MLP context. This was done by training MLPs with a varying number of hidden nodes. Each architecture was trained on the dataset containing historical patient volume with three different window sizes. Ten independent trials were run for each combination of architecture and window size. The resulting predictions on the test set from the ten trials were averaged to produce a sequence representing the specific architecture and window size. To examine the effect of the two independent variables and their interaction on the test MSE, a two-way ANOVA test was performed. The resulting F-statistics and p-values are presented in Table 4.

Factors                        F-statistic  p-value    Effect
Number of units                169          1.224e-38  ****
Window size                    4105         0.000      ****
Number of units * Window size  33           8.414e-09  ****

Table 4: Two-way ANOVA test. Effect of the number of units and window sizes on the MSE obtained with the MLP on the test set. Low p-values indicate strong evidence of an effect.
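Assuming a balanced design, which holds when every combination of number of units and window size is evaluated on the same test hours, the two-way ANOVA F-statistics can be computed from marginal and cell means. Below is a minimal NumPy sketch on a made-up 2 × 2 design with two replicates per cell (`two_way_anova_f` is a hypothetical name); for unbalanced data, or to obtain p-values, a library such as statsmodels (`ols` together with `anova_lm`) would be the usual choice.

```python
import numpy as np


def two_way_anova_f(cells):
    """Balanced two-way ANOVA with interaction.

    cells: shape (a, b, r), r replicates for every combination of a factor A
    level (e.g. number of units) and a factor B level (e.g. window size).
    Returns the F-statistics for A, B and the A*B interaction.
    """
    cells = np.asarray(cells, dtype=float)
    a, b, r = cells.shape
    grand = cells.mean()
    mean_a = cells.mean(axis=(1, 2))          # marginal means of factor A
    mean_b = cells.mean(axis=(0, 2))          # marginal means of factor B
    mean_ab = cells.mean(axis=2)              # cell means
    ss_a = b * r * ((mean_a - grand) ** 2).sum()
    ss_b = a * r * ((mean_b - grand) ** 2).sum()
    interaction = mean_ab - mean_a[:, None] - mean_b[None, :] + grand
    ss_ab = r * (interaction ** 2).sum()
    ss_err = ((cells - mean_ab[:, :, None]) ** 2).sum()
    ms_err = ss_err / (a * b * (r - 1))
    return {
        "A": float((ss_a / (a - 1)) / ms_err),
        "B": float((ss_b / (b - 1)) / ms_err),
        "A*B": float((ss_ab / ((a - 1) * (b - 1))) / ms_err),
    }


# Made-up 2 x 2 design with two replicates per cell.
cells = [[[1, 3], [5, 7]],
         [[2, 4], [10, 12]]]
print(two_way_anova_f(cells))  # {'A': 9.0, 'B': 36.0, 'A*B': 4.0}
```

A non-zero interaction term means the effect of one factor depends on the level of the other, which shows up as non-parallel lines in an interaction plot.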



The results in Table 4 indicate that both the number of units and the window size have a statistically significant main effect on the MSE. The test also indicates that the interaction between the number of units and the window size has a statistically significant impact on prediction accuracy. The main effects are shown as box plots in Figure 6a and 6b, and the interaction is reflected in the non-parallel lines of Figure 6c.


Figure 6: Effect of the window size (a), the number of units (b), and the interaction effect of both factors (c) on the MSE obtained with the MLP on the test set. Boxes (a, b) extend from the first to the third quartile, with a line at the median.


A post hoc Tukey test demonstrated that the effect of different window sizes was statistically significant, see also Figure 6a. Because of this, and the significant interaction between the factors, the effect of the number of units was evaluated for a specific window size. Since a window size of three hours produced the lowest average MSE on the test set, this window size was examined further. A one-way ANOVA test was thus performed to examine the specific effect of the number of hidden units on the MSE when the window size is set to three hours. The resulting F-statistic and p-value are presented in Table 5.

F-statistic  p-value    Effect
81           3.722e-85  ****

Table 5: One-way ANOVA test. Effect of the number of units on the MSE obtained with the MLP on the test set, with the window size set to three hours.

Since the one-way ANOVA test demonstrated a significant effect, a post hoc Tukey test was performed to examine the origins of the observed differences. The resulting average MSE on the test set with 95% confidence intervals is presented in Figure 7.

Figure 7: Average MSE obtained with the MLP on the test set for different numbers of units, with the window size set to three hours.

Figure 7 indicates that there are significant differences between models with different numbers of units. When the number of units is set to 500, the predictions deviate the least from the target, and the difference from the other models is statistically significant. There is no significant difference in predictions between models with 200 and 1000 units; both produce less accurate predictions than the 500-unit model. When the number of units is 25, 50 or 100, the predictions are the least accurate. This indicates that a model with 500 units is a suitable architecture. A plausible explanation is that increasing the number of units beyond this point increases model complexity and leads to overfitting. Decreasing the number of units reduces model complexity to the point where underfitting occurs.
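The post hoc Tukey test compares every pair of groups against a common threshold derived from the studentized range distribution. The sketch below implements the Tukey-Kramer form of these pairwise 95% intervals in plain Python. It assumes the caller supplies the critical value `q_crit`. That value can be taken from a studentized range table or from `scipy.stats.studentized_range.ppf(0.95, k, df)` in recent SciPy versions. The helper name and the example numbers are hypothetical. Recent SciPy versions also provide a complete implementation as `scipy.stats.tukey_hsd`.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_pairwise(groups, q_crit):
    """Tukey-Kramer pairwise comparisons of group means (hypothetical helper).

    groups: dict mapping a group name (e.g. a number of units) to observations.
    q_crit: critical studentized range value for the chosen confidence level,
            len(groups) groups and the within-group degrees of freedom;
            supplied by the caller in this sketch.
    Returns tuples (name_a, name_b, mean difference, (low, high), significant).
    """
    names = list(groups)
    n_total = sum(len(groups[g]) for g in names)
    df_within = n_total - len(names)
    means = {g: mean(groups[g]) for g in names}
    ms_within = sum((x - means[g]) ** 2 for g in names for x in groups[g]) / df_within

    results = []
    for a, b in combinations(names, 2):
        diff = means[a] - means[b]
        # Half-width of the simultaneous confidence interval for this pair.
        half_width = q_crit * sqrt(ms_within / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        interval = (diff - half_width, diff + half_width)
        # Significant when the interval for the difference excludes zero.
        results.append((a, b, diff, interval, abs(diff) > half_width))
    return results
```

A pair is flagged as significantly different when its interval for the mean difference excludes zero.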

4.1.2 Multivariate Approach

To examine the effect of additional features in the MLP context, models were trained on three datasets:

- historical patient volume alone (univariate);
- historical patient volume combined with the sequence representing holidays (multivariate holiday);
- historical patient volume combined with the sequence representing weekdays (multivariate weekday).

Architectural decisions were based on the methods described in Section 3.2. The MSE of the test-set predictions from ten independent trials was averaged to produce a sequence representing each dataset and window size. The results are illustrated in Figure 8. A two-way ANOVA test was performed, and the results indicate significant main effects but no statistically significant interaction between the two factors (see Table 6).

Factors                 F-statistic   p-value     Effect
Dataset                 320           1.652e-71   ****
Window size             252           2.672e-78   ****
Dataset * Window size   0.71          0.397       NS

Table 6: Two-way ANOVA test. Effect of the additional features and window sizes on the MSE obtained with the MLP on the test set.


Figure 8: Effect of the additional features (a), the window size (b), and the interaction effect of both factors (c) on the MSE obtained with the MLP on the test set. Boxes (a, b) extend from the first to the third quartile, with a line at the median.


A post hoc Tukey test indicated a significant difference between the MSE of models trained on the multivariate weekday data and that of models trained on the multivariate holiday and univariate datasets. There was no significant difference between the latter two. This is visualized in Figure 9, where the average MSE and the corresponding 95% confidence intervals are presented. Thus, there is no indication that a multivariate approach yields significant improvements.

Figure 9: Average MSE on the test set obtained with MLP models trained on univariate, multivariate weekday and multivariate holiday data with a window size of three hours.
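The univariate and multivariate datasets differ only in what each input vector contains. The sketch below shows one way to build sliding-window inputs for one-step-ahead prediction, optionally appending an aligned holiday or weekday sequence. The helper `make_windows` is hypothetical. So is the choice to append the extra feature's values for the same past hours rather than for the target hour. The thesis's own preprocessing is not reproduced here.

```python
def make_windows(volume, window, extra=None):
    """Build (inputs, targets) for one-step-ahead prediction (hypothetical helper).

    volume: hourly patient volume.
    window: number of past hours used as input.
    extra:  optional aligned feature sequence (e.g. a holiday indicator or a
            weekday value in [0, 1]). Appending its values for the same past
            hours is an assumption of this sketch.
    """
    if extra is not None and len(extra) != len(volume):
        raise ValueError("feature sequences must be aligned")
    inputs, targets = [], []
    for t in range(window, len(volume)):
        x = list(volume[t - window:t])          # past patient volume
        if extra is not None:
            x += list(extra[t - window:t])      # past values of the extra feature
        inputs.append(x)
        targets.append(volume[t])               # the next hour is the target
    return inputs, targets
```

With `extra=None` the function yields the univariate dataset. Passing the holiday or weekday sequence yields the corresponding multivariate dataset, with twice as many inputs per window.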

4.2 LSTM

4.2.1 Univariate Approach

The effect of the number of units and the window size was also examined for the LSTM model. To evaluate this, LSTM models were trained with varying numbers of hidden units. Each architecture was trained on the historical patient volume with three different window sizes, and ten independent trials were run for each combination of architecture and window size. The test-set predictions of the ten trials were averaged to produce a sequence representing the specific architecture and window size. A two-way ANOVA test was then performed to evaluate the effect of the factors on the MSE.
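Averaging the test-set predictions of several independent trials before computing the MSE is a simple way to reduce the influence of any single random initialization. Below is a minimal sketch of the two steps. The helper names are hypothetical.

```python
from statistics import fmean


def average_predictions(trials):
    """Element-wise mean of prediction sequences for the same test set (hypothetical helper)."""
    if len({len(p) for p in trials}) != 1:
        raise ValueError("all trials must cover the same test set")
    return [fmean(step) for step in zip(*trials)]


def mse(predictions, targets):
    """Mean squared error between two aligned sequences."""
    if len(predictions) != len(targets):
        raise ValueError("sequences must be aligned")
    return fmean((p - y) ** 2 for p, y in zip(predictions, targets))
```

With ten trials, `trials` would simply hold ten prediction lists of equal length.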


Factors                         F-statistic   p-value     Effect
Number of units                 188           7.120e-43   ****
Window size                     8864          0.000       ****
Number of units * Window size   274           1.698e-61   ****

Table 7: Two-way ANOVA test. Effect of the number of units and window sizes on the MSE obtained with the LSTM on the test set. Low p-values indicate strong evidence of an effect.

The results in Table 7 indicate significant main effects of both the window size and the number of hidden units, as well as a significant interaction between the two. Figure 10 supports these results.


Figure 10: Effect of the window size (a), the number of units (b), and the interaction effect of both factors (c) on the MSE obtained with the LSTM on the test set. Boxes (a, b) extend from the first to the third quartile, with a line at the median.


A post hoc Tukey test indicated that a window size of one hour consistently produced a significantly lower MSE than the other window sizes evaluated. This is supported by the box plot in Figure 10a. Given this observation and the identified interaction between the factors, the effect of the number of units with the window size set to one hour was examined further. As in the case of the MLP, a one-way ANOVA test was performed; the results are presented in Table 8. Due to the significant outcome, the nature of the effects was evaluated with a post hoc Tukey test. The absolute difference with a 95% confidence interval is presented in Figure 11.

F-statistic   p-value   Effect
1646          0.000     ****

Table 8: One-way ANOVA test. Effect of the number of units on the MSE obtained with the LSTM on the test set, with the window size set to one hour.

Figure 11: Average MSE obtained with the LSTM on the test set, evaluating the number of units with the window size set to one hour.

Figure 11 illustrates that LSTMs with a large number of units produce results that are significantly different from those with fewer units. Models with many units (200, 500 and 1000) produce the predictions with the lowest average MSE.


4.2.2 Multivariate Approach

To examine the effect of additional features in the LSTM context, models were trained on three datasets:

- historical patient volume alone (univariate);
- historical volume combined with the sequence representing holidays (multivariate holiday);
- historical volume combined with the sequence representing weekdays (multivariate weekday).

Architectural decisions were based on the methods described in Section 3.2. The MSE of the test-set predictions from ten independent trials was averaged to produce a sequence representing each dataset and window size. The results are illustrated in Figure 12. A two-way ANOVA test was performed, and the results indicate significant main effects as well as a statistically significant interaction between the two factors (see Table 9).

Factors                 F-statistic   p-value     Effect
Dataset                 1970          0.000       ****
Window size             34            4.389e-09   ****
Dataset * Window size   1678          0.000       ****

Table 9: Two-way ANOVA test. Effect of window sizes and additional features on the MSE obtained with the LSTM on the test set. Low p-values indicate strong evidence of an effect.


Figure 12: Effect of the window size (a), the additional features (b), and the interaction effect of both factors (c) on the MSE obtained with the LSTM on the test set. Boxes (a, b) extend from the first to the third quartile, with a line at the median.


A post hoc Tukey test demonstrated a statistically significant difference between the window sizes, with a window size of one hour producing the lowest average MSE (see also Figure 12a). Given this difference and the significant interaction between the two factors, a one-way ANOVA test was run on the predictions of models with a window size of one hour. The resulting F-statistic and p-value are presented in Table 10.

F-statistic   p-value   Effect
849           0.000     ****

Table 10: One-way ANOVA test. Effect of the additional features on the MSE obtained with the LSTM on the test set, with the window size set to one hour.

Due to the significant outcome of the one-way ANOVA test, a post hoc Tukey test was performed to study the nature of the effects. As seen in Figure 13, the univariate approach produced the lowest average MSE. This indicates that the multivariate approaches do not offer significant improvements in MSE.

Figure 13: Average MSE on the test set obtained with LSTM models trained on univariate, multivariate weekday and multivariate holiday data with a window size of one hour.


4.3 Comparison

After evaluating the MLP and LSTM models separately, it was concluded that multivariate approaches offer no statistically significant improvements (see Sections 4.1.2 and 4.2.2). The models trained solely on historical patient volume were therefore selected for comparison. The MLP selected for comparison was the model trained with a window size of three hours, since it produced a significantly lower MSE than the other window sizes evaluated. For the same reason, the LSTM model trained with a window size of one hour was selected. Ten independent trials were run for each model type. The averaged test-set predictions were compared with each other and with a linear autoregressive (AR) model. Figure 14a visualizes the predictions over the last week of the test set.
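The linear AR baseline predicts the next hour as a constant plus a weighted sum of the previous hours. This section does not state the order or fitting procedure of the AR model. The sketch below therefore assumes an AR(p) model with an intercept, fitted by ordinary least squares through the normal equations. `fit_ar` and `predict_ar` are hypothetical helpers written with the standard library only. In practice a library routine such as statsmodels' `AutoReg` would be the more robust choice.

```python
def fit_ar(series, p):
    """Fit y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} by ordinary least squares.

    Hypothetical helper; the order p is a free choice in this sketch.
    Returns [c, phi_1, ..., phi_p].
    """
    rows = [[1.0] + [series[t - i] for i in range(1, p + 1)]
            for t in range(p, len(series))]
    ys = [series[t] for t in range(p, len(series))]
    m = p + 1
    # Augmented normal equations [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]


def predict_ar(coeffs, history):
    """One-step-ahead forecast from the most recent observations in history."""
    c, phis = coeffs[0], coeffs[1:]
    return c + sum(phi * history[-i] for i, phi in enumerate(phis, start=1))
```

For example, a series generated exactly by y_t = 2 + 0.5 y_{t-1} is recovered as coefficients close to [2, 0.5] with `fit_ar(series, 1)`.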

The errors visualized in Figures 14b and 14c show that the MLP predictions tend to overshoot the target, while the LSTM predictions stay close to the target. The predictions of the AR model both overshoot and undershoot the target.
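Whether a model over- or undershoots can be summarized from its signed error sequence. The sketch below assumes the error is defined as prediction minus target, so positive values are overshoots. The sign convention used in Figure 14b is not stated, and `error_profile` is a hypothetical helper.

```python
def error_profile(predictions, targets):
    """Signed errors and the share of over- and undershoots (hypothetical helper).

    Assumes error = prediction - target, so a positive error is an overshoot.
    """
    if len(predictions) != len(targets):
        raise ValueError("sequences must be aligned")
    errors = [p - y for p, y in zip(predictions, targets)]
    overshoot = sum(e > 0 for e in errors) / len(errors)
    undershoot = sum(e < 0 for e in errors) / len(errors)
    return errors, overshoot, undershoot
```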


Figure 14: Predictions over the last week of the test set (a), comparison of errors (b), and comparison of MSE (c). The magenta lines correspond to the LSTM model, the dark grey lines to the MLP model and the light grey lines to the AR model. The black line corresponds to the target.


To evaluate whether the differences in the MSE obtained from model predictions on the test set were statistically significant, a one-way ANOVA test was performed. The results are presented in Table 11. Due to the significant outcome, a post hoc Tukey test was performed.

F-statistic   p-value     Effect
75            4.330e-33   ****

Table 11: One-way ANOVA test. Effect of model type on the MSE obtained from predictions on the test set. The low p-value indicates strong evidence of an effect.

The post hoc Tukey test indicated that the differences between the means of all pairs were statistically significant. This is supported by Figure 15, which illustrates the average MSE of each model's predictions and the corresponding 95% confidence interval.

Figure 15: Average MSE on the test set obtained with different model types.

The MSE differs significantly between every pair of models. The LSTM model outperforms both the MLP and the AR model in terms of the MSE of its predictions on the test set.

Chapter 5

Discussion

The results of this study indicate that ANNs can produce accurate predictions of patient volume within the domain of digital health care. The LSTM was identified as the preferable choice, with a statistically significant advantage in MSE. This is in line with contemporary research [10, 11] indicating that LSTMs are suitable for sequence prediction in a wide range of practical contexts. This work contributes to the current understanding of how neural networks can be applied to time series prediction in a practical context.

To the author's knowledge, there are no previous evaluations of patient flow prediction within digital health care, and the results of this study therefore offer an original contribution to the domain. As society moves towards increased digitalization in general, research conducted in traditional settings should also be performed in the equivalent digital settings. Concepts applicable in traditional settings are not necessarily transferable to digital settings, and developments within both domains are equally important. Traditional health care has developed over a long period of time, while digital health care is still in its infancy. To utilize the full capacity of digital health care, it is important to drive research within this domain forward at a high pace.

Since digital health care is a relatively new domain, the data available for evaluation is limited. This means that the data may not be fully representative and that seasonal factors can only be evaluated to a limited extent. Both limitations could affect the results.

An important observation about patient flow in digital health care is that the prediction problem is straightforward. More specifically, the results indicate that an LSTM with a minimal window size produces the most accurate predictions. Gers et al. [31] stated that success with a window size of one time step indicates that the problem at hand is not necessarily complicated and that a linear model may be suitable. In this case, however, the non-linear models improved significantly over the AR model. The problem at hand is likely not particularly complicated, but this is not necessarily an argument to discard the LSTM. As seen in this study, the LSTM performs remarkably well as a simple, relatively unoptimized model, which was also noted by Sutskever et al. [28]. This implies that the model can be used as a black box, without extensive understanding of time series analysis or manual adjustment to continuous changes in patient volume. This is beneficial in a practical context, since it decreases the need for human involvement. The model therefore remains maintainable over time even if resources are limited.

The main drawback of LSTMs is that they are computationally heavy due to the large number of weights that require training. This is, however, most prominent in deep LSTM networks, as Ma et al. [11] mention in their work. In the research presented here, only shallow architectures were evaluated, so the computational burden was not critical.
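The difference in computational weight can be made concrete by counting trainable parameters. The sketch below assumes a single hidden layer for both models, each followed by a single-output dense layer. The LSTM count follows the common convention of four gates, each with input weights, recurrent weights and one bias vector. PyTorch keeps a second, recurrent bias vector per gate, which the flag covers. The helper names are hypothetical.

```python
def mlp_params(window, features, units):
    """Trainable parameters of a one-hidden-layer MLP with one output (hypothetical helper)."""
    n_in = window * features           # flattened input window
    hidden = n_in * units + units      # weights + biases of the hidden layer
    output = units * 1 + 1             # weights + bias of the output unit
    return hidden + output


def lstm_params(features, units, separate_recurrent_bias=False):
    """Trainable parameters of one LSTM layer plus a single-output dense layer.

    Each of the four gates (input, forget, candidate cell, output) has an
    input weight matrix, a recurrent weight matrix and a bias vector; some
    frameworks (e.g. PyTorch) keep a second, recurrent bias vector per gate.
    """
    biases = 2 * units if separate_recurrent_bias else units
    lstm = 4 * (units * features + units * units + biases)
    output = units * 1 + 1
    return lstm + output
```

Consider a hypothetical configuration with one input feature and 500 units. `mlp_params(3, 1, 500)` gives 2,501 parameters for an MLP with a three-hour window, whereas `lstm_params(1, 500)` gives 1,004,501. The recurrent weight matrices grow quadratically with the number of units, which explains the gap.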

The inclusion of the weekday feature did not yield a significant improvement in prediction accuracy. This result is not in line with previous research: Jones et al. [1] and Sun et al. [13] both found that patient volume in the ED was characterized by weekly patterns. However, these previous findings were based on data from the ED, a clinical setting that is not equivalent to the digital health care setting examined in this report. The effect of weekdays may be less apparent when patients can access health care remotely. An alternative explanation could lie in the representational choice for the feature. A 1-of-N encoding would have increased the dimensionality of the input to an extent that was deemed too large relative to the amount of available data. This motivated the choice to encode weekdays as values between 0 and 1. The results can be interpreted as an indication that the relationship between weekdays is not accurately represented in one dimension. An embedding of weekdays could therefore have been an interesting alternative. With an embedded representation, the model would not need to learn the relevance of weekdays from the historical sequence. It could then be expected to learn other relevant features from the history of patient flow.
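The three weekday representations discussed above differ in dimensionality and in what they assume about the similarity between days. The sketch below contrasts them. The exact mapping of weekdays to values in [0, 1] used in the thesis is not given here, so `day / 6` is an assumption, as are the helper names and the embedding size. In a real model the embedding table would be a trainable layer (for example `torch.nn.Embedding(7, dim)`) rather than a fixed random table.

```python
import random

N_DAYS = 7  # 0 = Monday, ..., 6 = Sunday (an assumed numbering)


def scalar_encoding(day):
    """Single input in [0, 1]; the mapping day / 6 is an assumption."""
    return day / (N_DAYS - 1)


def one_hot_encoding(day):
    """1-of-N encoding: seven inputs per time step, one of them set to 1."""
    return [1.0 if i == day else 0.0 for i in range(N_DAYS)]


def make_embedding_table(dim, seed=0):
    """One dense vector per weekday (hypothetical helper).

    Randomly initialised here for illustration; in a trained model these
    vectors would be learned together with the rest of the network.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(N_DAYS)]


def embedding_encoding(day, table):
    """Look up the dense vector for one weekday."""
    return table[day]
```

With the scalar encoding, Sunday (1.0) ends up as far as possible from Monday (0.0), even though the two days are adjacent. This illustrates why a single dimension may misrepresent the relationship between weekdays.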

The addition of the holiday feature improved the MLP model slightly, but the corresponding LSTM model produced predictions that deviated significantly. This is an interesting result, since the main distinguishing property of the added sequence is that it is highly sparse. A speculative explanation for this behaviour could lie in the recurrent nature of the LSTM.

5.1 Critical Evaluation

Finding an optimal network architecture is a difficult and time-consuming task, carried out by varying the depth and width of the network and examining performance. In general, deeper networks tend to require fewer units per layer and thus fewer parameters. However, deeper networks can be more difficult to optimize [7]. This motivated the decision to limit the depth of the networks, which increased the risk that the architectures evaluated were not optimal. In hindsight, deeper MLPs should have been evaluated, and this shortcoming is seen as a weakness in the comparison of the two architectures. The insight of Lipton et al. [32] highlights the reasoning behind this decision: as the number of combinable techniques increases, exploring all combinations becomes less practical. In this case, the depth of the networks was not explored to a great extent.

Furthermore, the results in this report are not necessarily generalizable, since the data used for evaluation comes from one specific digital health care provider. To claim that the results hold for the digital health care domain in general, evaluation over multiple providers would be necessary. A final limitation is the difficulty of reproducing the results presented in this report: since the data supplied by the digital health care provider is confidential, reproducing the exact results is problematic.

5.2 Ethics and Sustainability

Much work has been done to gain acceptance for neural networks as a clinical aid [9, 25], but their application in the general health care domain is more straightforward. This is possibly because the consequences of mistakes in this domain are less serious than in clinical decision support.

Identifying a suitable model for patient volume prediction implies that scheduling improvements can be made. Effective scheduling increases doctor availability and in turn improves patient care by decreasing waiting times. This is a desirable effect not only within the traditional health care domain but also within the growing digital health care domain. The health care system in Sweden is under high stress, and different aspects of digitalization can unburden the system. Digital health care has the intrinsic benefit that patients need to visit a physical clinic to a lesser extent. This has a positive impact on the individual patient: it decreases travel costs and time spent in transit, and it reduces the increased risk of infection when sick individuals gather in one location. With effective scheduling, the number of patients who can be offered help through a digital provider increases. This offers improvements for each individual patient while simultaneously reducing the environmental burden connected to travel.

Two important aspects to consider when promoting digitalization of the health care system are security and integrity, since digitalizing care involves the storage and use of sensitive information. However, to provide high-quality care that is accessible and satisfies the present demand, the exploration and continued introduction of digital solutions is needed. The possible positive effects of digital health care, such as assisted diagnosis, prediction of disease and accessibility, outweigh the risks introduced. As long as improvements are made with the patient in mind, the digitalization of the entire health care system should continue.

5.3 Further Research

Previous research has indicated that weather affects patient volume in the ED [12], suggesting that people are less likely to seek help for minor issues when the weather is bad. Access to a digital platform is not necessarily affected by the weather, so weather could be expected to have less of an impact on patient volume in the digital setting. Moreover, patients of a digital health service are not bound to any specific location, so it is not trivial to decide where the weather should be measured. Aside from the weather, there are interesting domain-specific features that could affect prediction performance. With the use of data reconstruction, or by limiting the amount of historical data, the waiting time could be examined. Marketing trends could also be evaluated to see what effect they have on patient volume, e.g. email promotions and location-based ads.

In a broader context, patient volume prediction in traditional and digital health care settings should be compared. If similarities and differences in the factors affecting predictions within these domains can be identified, the possibility of transferring knowledge between them increases. The traditional health care system has historically provided the majority of care, so research within this area is more extensive than research on its digital counterpart. Both areas are important parts of a sustainable health care system, and research that could increase the understanding of the relations between the two is highly encouraged.

This study indicates that the LSTM model is an appropriate choice for forecasting in this specific application. However, during the production of this work, Bai et al. [33] published a study indicating that a convolutional neural network (CNN) architecture outperformed the LSTM in many sequence modeling tasks. This shows that the field is under constant development and that the frontiers of knowledge are moving fast.

Chapter 6

Conclusions

Predicting patient volume in the digital health care setting is a relatively straightforward problem. There are significant differences in accuracy between non-linear models in a univariate setting. The LSTM model provides significantly more accurate predictions and is therefore deemed preferable for the problem at hand. Its computational complexity is a minor issue compared to the benefits of increased performance in this setting. When additional features such as holidays and weekdays are added, the LSTM model shows no significant improvement.

Bibliography

[1] Spencer S Jones, Alun Thomas, R Scott Evans, Shari J Welch, Peter J Haug, and Gregory L Snow. “Forecasting daily patient volumes in the emergency department”. In: Academic Emergency Medicine 15.2 (2008), pp. 159–170.

[2] Izabel Marcilio, Shakoor Hajat, and Nelson Gouveia. “Forecasting daily emergency department visits using calendar variables and ambient temperature readings”. In: Academic Emergency Medicine 20.8 (2013), pp. 769–777.

[3] Andreas Ekström, Lisa Kurland, Nasim Farrokhnia, Maaret Castrén, and Martin Nordberg. “Forecasting emergency department visits using internet data”. In: Annals of Emergency Medicine 65.4 (2015), pp. 436–442.

[4] Robert M Cowan and Stephen Trzeciak. “Clinical review: emergency department overcrowding and the potential impact on the critically ill”. In: Critical Care 9.3 (2004), p. 291.

[5] M Wargon, B Guidet, TD Hoang, and G Hejblum. “A systematic review of models for forecasting the number of emergency department visits”. In: Emergency Medicine Journal 26.6 (2009), pp. 395–399.

[6] Stephen Marsland. Machine learning: an algorithmic perspective. CRC Press, 2015.

[7] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. Vol. 1. MIT Press, Cambridge, 2016.

[8] Sepp Hochreiter and Jürgen Schmidhuber. “Long short-term memory”. In: Neural Computation 9.8 (1997), pp. 1735–1780.

[9] Zachary C Lipton, David C Kale, Charles Elkan, and Randall Wetzel. “Learning to diagnose with LSTM recurrent neural networks”. In: arXiv:1511.03677 (2015).

[10] Duo Zhang, Geir Lindholm, and Harsha Ratnaweera. “Use long short-term memory to enhance Internet of Things for combined sewer overflow monitoring”. In: Journal of Hydrology 556 (2018), pp. 409–418.

[11] Xiaolei Ma, Zhimin Tao, Yinhai Wang, Haiyang Yu, and Yunpeng Wang. “Long short-term memory neural network for traffic speed prediction using remote microwave sensor data”. In: Transportation Research Part C: Emerging Technologies 54 (2015), pp. 187–197.

[12] Hye Jin Kam, Jin Ok Sung, and Rae Woong Park. “Prediction of daily patient numbers for a regional emergency medical center using time series analysis”. In: Healthcare Informatics Research 16.3 (2010), pp. 158–165.

[13] Yan Sun, Bee Hoon Heng, Yian Tay Seow, and Eillyne Seow. “Forecasting daily attendances at an emergency department to aid resource planning”. In: BMC Emergency Medicine 9.1 (2009), p. 1.

[14] Kry. URL: https://kry.se.

[15] Randolph Hall, David Belson, Pavan Murali, and Maged Dessouky. “Modeling patient flows through the health care system”. In: Patient Flow. Springer, 2013, pp. 3–42.

[16] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. Time series analysis: forecasting and control. John Wiley & Sons, 2015.

[17] Andrew V Metcalfe and Paul SP Cowpertwait. Introductory time series with R. 2009.

[18] Holger Kantz and Thomas Schreiber. Nonlinear time series analysis. Vol. 7. Cambridge University Press, 2004. Chap. 1.

[19] Eric Zivot and Jiahui Wang. Modeling financial time series with S-PLUS®. Vol. 191. Springer Science & Business Media, 2007. Chap. 18.

[20] Gianluca Bontempi, Souhaib Ben Taieb, and Yann-Aël Le Borgne. “Machine learning strategies for time series forecasting”. In: European Business Intelligence Summer School. Springer. 2012. Chap. 12, pp. 62–77.

[21] Frank Rosenblatt. “The perceptron: a probabilistic model for information storage and organization in the brain.” In: Psychological Review 65.6 (1958), p. 386.

[22] John J Hopfield. “Neural networks and physical systems with emergent collective computational abilities”. In: Proceedings of the National Academy of Sciences 79.8 (1982), pp. 2554–2558.

[23] Dan Tandberg and Clifford Qualls. “Time series forecasts of emergency department patient volume, length of stay, and acuity”. In: Annals of Emergency Medicine 23.2 (1994), pp. 299–306.

[24] Radwan Abdel-Aal and Abdallah Mangoud. “Modeling and forecasting monthly patient volume at a primary health care clinic using univariate time-series analysis”. In: Computer Methods and Programs in Biomedicine 56.3 (1998), pp. 235–247.

[25] Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F Stewart, and Jimeng Sun. “Doctor AI: Predicting clinical events via recurrent neural networks”. In: Machine Learning for Healthcare Conference. 2016, pp. 301–318.

[26] Ryan Amirkhan, Mark Hoogendoorn, Mattijs E Numans, and Leon Moons. “Using recurrent neural networks to predict colorectal cancer among patients”. In: Computational Intelligence (SSCI). IEEE. 2017, pp. 1–8.

[27] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. “Speech recognition with deep recurrent neural networks”. In: Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2013, pp. 6645–6649.

[28] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. “Sequence to sequence learning with neural networks”. In: Advances in Neural Information Processing Systems. 2014, pp. 3104–3112.

[29] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. “Empirical evaluation of gated recurrent neural networks on sequence modeling”. In: arXiv:1412.3555 (2014).

[30] Felix A Gers, Jürgen Schmidhuber, and Fred Cummins. “Learning to forget: Continual prediction with LSTM”. In: (1999).

[31] Felix A Gers, Douglas Eck, and Jürgen Schmidhuber. “Applying LSTM to time series predictable through time-window approaches”. In: Neural Nets WIRN Vietri-01. Springer, 2002, pp. 193–200.

[32] Zachary C Lipton, John Berkowitz, and Charles Elkan. “A critical review of recurrent neural networks for sequence learning”. In: arXiv:1506.00019 (2015).

[33] Shaojie Bai, J Zico Kolter, and Vladlen Koltun. “An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling”. In: arXiv:1803.01271 (2018).
