
Predictions of PM2.5 and PM10 Concentrations Using Static and Mobile Sensors

Mantas Miksys

4th Year Project Report
Computer Science

School of Informatics
University of Edinburgh

2016


Abstract

Air quality data (PM2.5 and PM10 particulate concentrations) were collected in Edinburgh using a network of stationary (SEPA) air quality monitors and a wearable mobile (AirSpeck) air quality monitor, both developed by the Center of Speckled Computing.

Temporal prediction methods were developed using the time series particulate concentration data, both with and without meteorological information. Results are presented for the comparison of temporal prediction methods based on ANN, decision tree, random forest and multiple linear regression models. Spatial prediction models were developed using data from the network of stationary monitors. Results were compared for spatial prediction methods based on inverse distance weighting, kriging, radial basis functions, random forest and ANN. Both temporal and spatial prediction models substantially outperformed the baseline methods used. Finally, the best performing temporal and spatial prediction models were combined in a successful experiment in spatio-temporal predictions of air quality in the Meadows area in Edinburgh.

This is the first time spatial and temporal predictions have been made using data from a network of inexpensive stationary sensors, and spatio-temporal predictions have been made using a combination of stationary sensors and validated using mobile sensors.


Acknowledgements

First of all, I would like to give special thanks to my supervisor, Prof. D. K. Arvind, for proposing this interesting project and for his guidance throughout.

Secondly, I am grateful to the people working at the Center of Speckled Computing for their advice and for their help with application deployment and sensor installation.

Thirdly, I would like to thank Aart Meijer, who worked on AirSpeck devices previously and provided me with instructions on how to use them.

Table of Contents

1 Introduction
  1.1 Main contributions
  1.2 Structure of the report

2 Background
  2.1 Particulate matter
  2.2 SEPA and AirSpeck sensors
  2.3 Previous Work on AirSpeck Sensors
    2.3.1 Characterisation and calibration of the Personal Exposure Monitor (PEM) by Ilias Galanopoulos
    2.3.2 Sensing Spaces: A Study in Mapping the Public Space from Personal Data by Konstantin Kotsev
    2.3.3 Predicting Air Quality using Personal Exposure Monitors on Cyclists by Aart Meijer
    2.3.4 Evaluation of previous work
  2.4 Related work
    2.4.1 Temporal PM concentration predictions
    2.4.2 Spatial PM concentration predictions

3 Methodology
  3.1 AQ data collection
  3.2 Meteorological and STL data
  3.3 Data cleaning and analysis
  3.4 Models for temporal predictions
    3.4.1 Artificial neural networks
    3.4.2 Decision trees and random forests
    3.4.3 Multiple linear regression
    3.4.4 Input feature selection
  3.5 Models for spatial predictions
    3.5.1 Inverse distance weighting
    3.5.2 Kriging
    3.5.3 Radial basis functions
  3.6 Model evaluation

4 Implementation
  4.1 Web application for AQ data analysis
  4.2 Android application for AQ data collection
  4.3 Implementation of models

5 Results and discussion
  5.1 Analysis of static datasets
  5.2 Temporal predictions
    5.2.1 ANN models
    5.2.2 Further optimization of ANN-1 and ANN-2 models
    5.2.3 DT models
    5.2.4 RF models
    5.2.5 MLR models
    5.2.6 PER model
  5.3 Comparison of temporal prediction methods
    5.3.1 SEPA datasets
    5.3.2 STL datasets
  5.4 Spatial predictions
    5.4.1 IDW model
    5.4.2 RBF model
    5.4.3 ANN model
    5.4.4 RF model
    5.4.5 NN model
  5.5 Comparison of spatial prediction methods
  5.6 Spatio-temporal predictions

6 Conclusions

Bibliography

Appendices

A Data sources

B Model parameters

Chapter 1

Introduction

Air pollution is a serious issue as it severely affects people's health, especially that of people with respiratory conditions such as asthma [21]. Particulate matter (PM) may be the air pollutant that most commonly affects people's health [19]. Therefore, forecasting PM2.5 and PM10 concentrations is an important problem, as this information could help people with respiratory problems.

In order to have good spatial coverage and make accurate PM concentration predictions, it is necessary to have many air quality (AQ) monitors. However, professional AQ monitors are expensive and therefore not many of them can be installed in a city. One alternative approach to dealing with this problem could be to use cheaper monitoring devices. Another possibility could be to use mobile monitoring devices and move them around the area to increase spatial coverage. We propose to use a combination of these approaches: low-priced stationary and mobile AQ monitors. SEPA (stationary) and AirSpeck (mobile) sensors were developed by the Centre of Speckled Computing.

The main goal of the project is to develop models that can accurately perform temporal and spatial PM concentration predictions. It is a difficult task because there are many different sources of particulate matter, such as traffic or the atmosphere, and there might be rapid changes in concentration levels. Certain health and environment agencies are required to provide 24-hour PM10 forecasts [24]. These institutions use a small number of expensive sensors for their regional predictions. However, if the proposed approach of using a larger number of inexpensive sensors works well and accurate predictions can be made from their data, it could be an alternative to the current monitoring and prediction approaches.

1.1 Main contributions

Air quality data was collected between 28th October and 2nd November 2015 using stationary and mobile air quality sensors. Eight stationary sensors were installed on lampposts in the Meadows area (Edinburgh), and mobile data collection sessions were performed by traversing the area covered by the stationary sensors several times a day.

A web application was developed for real-time and historical air quality data analysis. It also performed temporal and spatial PM concentration predictions and provided an API for the Android application. The AirSpeck Android application for data collection was extended with new functionality for monitoring and analysing AQ data and predictions.

The collected air quality data was cleaned, analyzed for correlation and diurnal effects, and used to develop prediction models. Some temporal prediction models used meteorological data obtained from an external source; others used only time series PM concentration data. Apart from the datasets collected using AirSpeck and SEPA devices, AQ data for 2015 from a nearby monitoring station (St. Leonard's) was also used for temporal predictions. Machine learning techniques (ANN, decision trees, random forests and multiple linear regression) were used for temporal predictions. For spatial predictions, distance-based (inverse distance weighting), geostatistical (kriging) and machine learning (radial basis functions, ANN and random forests) techniques were used.

Both temporal and spatial prediction models performed significantly better than the baseline methods (the persistence model for temporal and nearest neighbour for spatial predictions). Artificial neural networks and random forests gave the best results for both temporal and spatial predictions. Moreover, no significant correlation with meteorological data was found for any dataset, and there was no improvement in the models' performance when meteorological input data was added.

The novel contribution of this work is that it is the first time spatial and temporal predictions have been made using data from a network of inexpensive stationary sensors, and spatio-temporal predictions have been made using a combination of stationary sensors and validated using mobile sensors.

1.2 Structure of the report

Chapter 2 presents background information about PM, the SEPA and AirSpeck devices, previous studies with AirSpecks and related work in the area of temporal and spatial PM concentration predictions.

Chapter 3 explains the methodology used for stationary and mobile AQ data collection. It also describes the source of meteorological data, other sources of data and the data cleaning process. The models used for temporal and spatial predictions, the model optimization techniques and the evaluation methods are also presented in this chapter.

Chapter 4 presents the functionality of the web application for AQ data analysis and how it was implemented. The chapter continues with a description of how the AirSpeck Android application was extended with real-time data monitoring and prediction analysis functionality. Details about how the prediction models and experiments were implemented are also given in this chapter.

Chapter 5 presents statistical properties and analysis of the AQ data used in predictions. In addition, prediction results for each of the temporal and spatial models are presented, analysed and compared. The chapter also contains details and analysis of the results of the spatio-temporal prediction experiment.

Chapter 6 summarizes the report, presents conclusions and suggestions for future work.

Chapter 2

Background

2.1 Particulate matter

Particulate matter (PM) is a mixture of small solid particles and liquid droplets, such as dust, ash, dirt or smoke [19]. PM has various sources, both natural, like volcanoes and water mist, and man-made, like factories and power plants; some particles are formed during chemical reactions in the atmosphere [13, 19]. PM2.5 (fine particles) includes particles not greater than 2.5 µm in diameter, while PM10 includes particles not greater than 10 µm. Both of these types of particulate matter may originate from the same sources and they have similar effects on health. The health effects include allergic effects, cancer, toxic effects, aggravation of asthma and others [3].

According to EU legislation, yearly averages of PM2.5 and PM10 concentrations should not exceed 25 µg m⁻³ and 40 µg m⁻³, respectively [7]. Furthermore, the daily PM10 average should not exceed 50 µg m⁻³ more than 35 times per year [7].
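As a minimal sketch, the limit values quoted above can be expressed as a simple check over one year of daily averages. The function and constant names are hypothetical and only illustrate the arithmetic; they are not part of the project code.

```python
# Limit values as quoted in the text (all in micrograms per cubic metre).
PM25_ANNUAL_LIMIT = 25.0
PM10_ANNUAL_LIMIT = 40.0
PM10_DAILY_LIMIT = 50.0
PM10_MAX_DAILY_EXCEEDANCES = 35


def mean(values):
    return sum(values) / len(values)


def pm25_within_limits(daily_means):
    """True if the yearly PM2.5 average does not exceed the annual limit."""
    return mean(daily_means) <= PM25_ANNUAL_LIMIT


def pm10_within_limits(daily_means):
    """True if both the yearly average and the exceedance count are acceptable."""
    exceedances = sum(1 for value in daily_means if value > PM10_DAILY_LIMIT)
    return (mean(daily_means) <= PM10_ANNUAL_LIMIT
            and exceedances <= PM10_MAX_DAILY_EXCEEDANCES)
```

For example, a year with 35 days at 60 µg m⁻³ and the remaining days at 20 µg m⁻³ would still satisfy both PM10 conditions, whereas a 36th such day would not.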

PM is a combination of various chemical compounds, like ammonium or nitrates, and their behaviour is significantly affected by atmospheric conditions [28]. Researchers from the US investigated how 9 meteorological predictor variables are correlated with daily PM concentration levels and found that these variables can explain up to 50% of PM2.5 variation. Moreover, correlations with temperature, relative humidity and wind direction differ for individual PM2.5 components, like sulphates or nitrates, and there are regional differences in correlation [23]. Researchers from Switzerland investigated which meteorological variables affect PM10 concentrations for each season separately. They found that precipitation and wind gusts have a significant impact in all seasons, while temperature is mostly important in summer and winter. Furthermore, they found that the day of the week affects PM10 concentration levels (they are high on Thursdays and low on Sundays) [28].


Figure 2.1: AirSpeck device used for mobile data collection

2.2 SEPA and AirSpeck sensors

The SEPA device was used for static AQ measurements and the AirSpeck for mobile ones. The SEPA device measured AQ every 5 minutes, while the mobile AirSpeck performed measurements every 5 seconds. Both SEPA and AirSpeck devices use the Alphasense OPC-N2 sensor for measuring PM concentrations [1]. The Optical Particle Monitor measures PM1, PM2.5 and PM10 concentrations for particles whose diameters range from 0.38 µm to 17 µm. Apart from PM concentrations, the AirSpeck and SEPA devices also measured temperature and relative humidity. However, we found that these relative humidity and temperature values disagreed with the values measured at the nearby JCMB weather monitoring station. Therefore, meteorological data from the monitoring station was used instead.

2.3 Previous Work on AirSpeck Sensors

2.3.1 Characterisation and calibration of the Personal Exposure Monitor (PEM) by Ilias Galanopoulos

Ilias Galanopoulos used the SpeckSim event-based simulator for modelling networks of mobile AQ sensors to develop a calibration model for the mobile sensor. The author compared data obtained using the AirSpeck mobile sensor with data obtained using a more expensive AQ sensor and developed a calibration method based on a Multi-hop algorithm.


Figure 2.2: SEPA device used for static data collection

2.3.2 Sensing Spaces: A Study in Mapping the Public Space from Personal Data by Konstantin Kotsev

The author developed the initial version of the Android application for mobile AQ data collection. It was able to connect to the Alphasense OPC-N2 sensor, display real-time AQ data, annotate different urban areas and write the collected data into a CSV file. Konstantin Kotsev collected mobile data from various urban environments in Edinburgh and manually annotated them. He also developed a model using the K-means clustering algorithm for automatic detection of urban areas from AQ data.

2.3.3 Predicting Air Quality using Personal Exposure Monitors on Cyclists by Aart Meijer

Aart Meijer collected mobile air quality data while cycling along a fixed route around the Meadows in Edinburgh. Furthermore, the author added new functionality to the Android application: automatic and manual uploading of data files to a server (see Figure 4.4). He used ANN and Support Vector Machine (SVM) regression models to make spatial and temporal predictions. Instead of using a geographic coordinate system for spatial predictions, the author developed a grid system for representing areas. Aart Meijer concluded that combining ANN and SVM into a single model improves performance.


2.3.4 Evaluation of previous work

All of the previous projects are significantly different from this one. This project is most similar to A. Meijer's work, as he also worked on temporal and spatial prediction models. However, his approach was completely different because he did not use stationary SEPA sensors and only used the nearest mobile AQ data for predictions. Furthermore, in this study mobile data was collected at walking speed, while he collected it while cycling. In addition, he only used machine learning methods (ANN and SVM) for both types of predictions, while we explore a wider range of prediction methods.

2.4 Related work

2.4.1 Temporal PM concentration predictions

PM concentration prediction models can be classified into 5 categories: empirical models, fuzzy logic-based systems, simulation models, data-driven statistical models and model-driven statistical learning methods [36]. The problem with most types of data-driven statistical models is that they are developed and trained specifically for a certain region, and some modification might be necessary for the models to work well in other areas [42]. However, data-driven statistical models can discover nuances in a wide variety of data that cannot be discovered using rule-based or fuzzy logic-based systems [36]. Classification and Regression Tree analysis (CART) and artificial neural networks are examples of data-driven statistical models. The data-driven statistical model approach was taken in this study.

One approach to PM predictions is to interpret concentrations as time series data and use lagged variables for forecasting. Such an approach is often used, for instance, in economics for currency value predictions. Alternatively, additional data can be used in the models, which is usually done in PM predictions because of the correlation of PM with meteorological data (as discussed in section 2.1). Both of these approaches were tested in the project.

In 2000 Perez et al. presented a neural network model for predicting hourly PM2.5 concentrations several hours in advance [39]. PM2.5 concentrations, humidity and wind velocity were used as predictor variables for the model. For their linear perceptron and three-layer network, an input window of size 24 was used. To improve forecast accuracy, they used noise reduction, consideration of meteorological variables and superposition in the training sets. They found that using meteorological data together with PM2.5 series data improves the performance of the model.

In 2005 Ordieres et al. proposed a different model for fine particulate matter forecasting. They compared three different topologies of neural networks: Multilayer Perceptron (MLP), Radial Basis Function (RBF) and Square Multilayer Perceptron (SMLP). Furthermore, results from the neural network models were compared with results from classical models (persistence and linear regression). Their study showed that all neural network topologies outperform the classical models. Of the three topologies investigated, RBF was the model with the most stable predictions and the shortest training time [29].

In 2006 Perez et al. presented a neural network model for forecasting the maximum of the 24-hour PM10 concentration average 24 hours in advance. They compared it with a linear model and found that the neural network model outperforms the linear one [38]. In the same year Grivas et al. evaluated neural network models for hourly PM10 concentration prediction. They used a genetic algorithm optimization procedure for the selection of the input variables; however, the best performance (0.89 index of agreement) was achieved by the neural network model that used the entire set of input variables [26]. Moreover, in a different study they compared neural network models with multiple regression models for PM10 values. They used wind direction index, relative humidity, wind speed, surface temperature and PM10 values as predictors for their multiple regression models. They found that neural network models outperform regression models [22].

From the analysis of related work it seems that artificial neural network and linear regression models are promising and should be used in the study. Decision trees and random forests are used for PM forecasting less frequently; however, they are promising models and were therefore also used in the project.

2.4.2 Spatial PM concentration predictions

There are 3 categories of spatial interpolation methods (SIMs): non-geostatistical interpolation methods, geostatistical interpolation methods and combined methods [30]. Many of the interpolation methods are based on quite simple weighted averaging; however, their performance is data specific and it might be difficult to choose an appropriate method [30].

In 2004 David et al. compared the performance of four weighted average methods for spatial O3 and PM10 concentration predictions, namely spatial averaging, nearest neighbour, inverse distance weighting (IDW) and kriging [25]. They found that the methods performed quite similarly; however, kriging gave the most realistic estimates. Contrary to this, in 2015 Zhang et al. compared IDW, kriging and trend surface and found that IDW is a more accurate spatial interpolation method than the other two [40].

Spatial interpolation methods usually use Euclidean distance; however, it cannot account for complex nonlinear features [34]. A study performed in 2014 suggests incorporating wind-field data into the interpolator, as this was found to give a significant improvement [34]. Another study used a Geographic Information System (GIS)-based spatial interpolation model. It found that IDW performed better than the nearest neighbour interpolation method and that the GIS model performed significantly better than both of them [32].

Inverse distance weighting, radial basis functions and nearest neighbour are non-geostatistical SIMs, while kriging is a geostatistical one. Those methods were chosen for spatial interpolation in the study. However, machine learning methods, such as random forests (RF) or support vector machines (SVM), and their combinations with other interpolation methods, such as SVM with ordinary kriging, are also used for spatial interpolation.
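To make the distance-based approach concrete, the following is a minimal sketch of inverse distance weighting together with the nearest neighbour baseline. It assumes planar coordinates, Euclidean distance and a power parameter of 2; the function names and the default power are illustrative assumptions rather than the configuration used in this study.

```python
import math


def idw(known, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    known:  list of ((x, y), value) pairs for the monitored locations
    target: (x, y) location to estimate
    power:  distance exponent (assumed default, not taken from the report)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a monitored location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def nearest_neighbour(known, target):
    """Baseline: return the value measured at the closest monitored location."""
    closest = min(
        known,
        key=lambda item: math.hypot(item[0][0] - target[0], item[0][1] - target[1]),
    )
    return closest[1]
```

With two sensors reading 10 and 20 at (0, 0) and (2, 0), the IDW estimate at the midpoint (1, 0) is 15, while the nearest neighbour baseline at (0.5, 0) simply returns 10.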

In 2014 Nevtipilova et al. used an ANN (multilayer perceptron) for spatial interpolation and compared its performance with IDW and ordinary kriging interpolators (using RMSD) [43]. Different configurations of the ANN were tested; however, the number of hidden units was the main setting that was changed. They found that an ANN can be used for spatial interpolation even though its RMSD was higher than those of the IDW and ordinary kriging methods and it took longer to train the ANN model [43].

In 2011 Li et al. compared the performance of 23 spatial interpolation methods, including machine learning methods, like support vector machines and random forests (RF), and geostatistical methods, like ordinary kriging (OK) [31]. They found that RF is an effective interpolator, especially when used in combination with OK or inverse distance squared (IDS) methods, as these models performed up to 19% better than the usual methods, such as IDS [31]. Based on this previous work it was decided to use ANN and RF for spatial prediction models in this dissertation.

Chapter 3

Methodology

3.1 AQ data collection

The Meadows public park in Edinburgh was used for static and mobile AQ data collection. The location was chosen because of its diversity: on the south is a busy road called Melville Drive and on the northern perimeter is a well-used cycle path. Eight stationary sensors were mounted in the park at the locations marked on the map in Figure 3.1. SEPA7, SEPA6, SEPA4 and SEPA3 were mounted by a busy road, whereas SEPA1, SEPA2, SEPA5 and SEPA8 were mounted in the park, away from the street. The route used for mobile AQ data collection covered all of the static sensors and is represented by a red line in Figure 3.1. All of the static sensors were mounted on lampposts at a height of approximately 2 meters (see Figure 3.2). The distances between stationary sensors were relatively small: the largest distance between two sensors was 0.6 km. The mobile sensor was attached to a belt and worn as displayed in Figure 3.3.

All the mobile and stationary data (from SEPA1 - 8) were collected during the period from 28th October to 2nd November 2015, when the stationary sensors were available. During that period 2 - 4 mobile AQ data collection sessions were performed every day, with at least one in the morning and one in the evening. Mobile data was collected at walking speed with no stops (as there were no junctions along the route). Each of the sessions lasted approximately 20 - 60 minutes, depending on walking speed and the number of times the route was walked.

During the sessions mobile AQ data was measured every 5 seconds and stored in a CSV file together with GPS information from the phone. Stationary sensors measured AQ every 5 minutes and uploaded the data to the server in real time over the mobile cellular network (they included a SIM card).


Figure 3.1: Meadows park map. Grey pins represent static sensors and the red line represents the mobile data collection route

Figure 3.2: SEPA static AQ sensor device mounted on a pole

3.2 Meteorological and STL data

Meteorological data for 2015 was obtained from the GeoSciences Weather Station [20]. The meteorological variables used in the research were wind speed (m/s), wind direction (degrees, 0 for N), surface temperature (°C) and relative humidity (%).

Since wind direction had a wide range of values (0-360 degrees), it might have resulted either in the neural network giving a very small weight to this input variable or in the variable having a disproportionately large impact on the performance of the model. Therefore, the sine of the wind direction was used as the input variable instead, restricting the values to the range -1 to 1.
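The transformation described above amounts to a one-line feature mapping. The sketch below assumes the direction is given in degrees with 0 for north; the function name is hypothetical.

```python
import math


def wind_direction_feature(direction_degrees):
    """Map a wind direction in degrees (0 = N) to its sine, which lies in [-1, 1]."""
    return math.sin(math.radians(direction_degrees))
```

Under this mapping an easterly wind (90 degrees) gives 1 and a westerly wind (270 degrees) gives -1. Note that the sine alone maps north (0 degrees) and south (180 degrees) to the same value of approximately 0.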

The datasets obtained using SEPA and AirSpeck devices covered only one week, which might have affected the models' performance. Therefore, to test the models on a much larger dataset (a whole year of data), AQ data for 2015 from the nearby St. Leonard's (STL) AQ monitoring station was also used in the experiments [4].


Figure 3.3: AirSpeck device worn as a belt (by the author)

3.3 Data cleaning and analysis

SEPA sensors collected data every 5 minutes; however, the data was averaged hourly for the predictions because an hourly forecast was required and finer granularity in the input would only add unnecessary noise. Some of the data in the SEPA datasets was missing or invalid because of device malfunctions. The data was missing from all SEPA datasets at similar times because the malfunctions were caused by certain weather conditions (dense fog). There were a few extreme outliers with very high PM values (greater than 250 µg m⁻³), which were removed from the datasets. A few data items were missing from the STL datasets as well. Missing data items were filled in with the average of the nearest rows.
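The cleaning steps described above can be sketched as follows. This is an illustrative reconstruction, not the project's implementation: it assumes readings arrive as (hour index, value) pairs, that invalid readings are represented by None, that outliers are dropped before averaging, and that "the average of the nearest rows" means the mean of the closest valid value on each side of a gap.

```python
from collections import defaultdict

OUTLIER_THRESHOLD = 250.0  # threshold quoted in the text


def hourly_means(readings):
    """Average 5-minute readings per hour, skipping invalid values and outliers.

    readings: iterable of (hour_index, value) pairs; value may be None.
    Returns a dict mapping hour_index to the mean of its valid readings.
    """
    buckets = defaultdict(list)
    for hour, value in readings:
        if value is not None and value <= OUTLIER_THRESHOLD:
            buckets[hour].append(value)
    return {hour: sum(values) / len(values) for hour, values in buckets.items()}


def fill_gaps(series):
    """Replace None entries with the mean of the nearest valid neighbours."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        before = next((series[j] for j in range(i - 1, -1, -1)
                       if series[j] is not None), None)
        after = next((series[j] for j in range(i + 1, len(series))
                      if series[j] is not None), None)
        neighbours = [v for v in (before, after) if v is not None]
        filled[i] = sum(neighbours) / len(neighbours) if neighbours else None
    return filled
```

Hours in which every reading is invalid do not appear in the result of `hourly_means`; after placing the hourly values in a list with None for such hours, `fill_gaps` can be applied.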

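To understand the data better, the Pearson correlation coefficient was calculated for each of the SEPA datasets and for the STL dataset using the formula below:

$$R = \frac{\sum_{i=0}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=0}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=0}^{n}(y_i-\bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the two variables.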
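As a minimal sketch, the Pearson correlation coefficient can be computed directly from its definition as the sum of products of deviations from the means, divided by the product of the square roots of the summed squared deviations. The function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the summed squared deviations.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance_sum / (spread_x * spread_y)
```

The function returns values close to 1 for perfectly increasing linear relationships and close to -1 for perfectly decreasing ones; it is undefined (division by zero) if either sequence is constant.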

Dataset   SEPA1  SEPA2  SEPA3  SEPA4  SEPA5  SEPA6  SEPA7  SEPA8  STL
Size      141    101    142    123    100    100    127    45     8726

Table 3.1: Sizes of SEPA and STL datasets used for predictions

From Table 3.1 we see that all of the SEPA datasets are small because the data was only collected for a week. Because of the lack of data, the samples used in predictions were overlapping (consecutive samples differed by only one hour), which means that there was some duplication in the samples. The SEPA8 dataset was especially small because of a malfunction of the device and was therefore not used in the predictions. The STL dataset was more than 61 times the size of the biggest SEPA dataset.


3.4 Models for temporal predictions

In all models the output was a 12-hour PM2.5 or PM10 concentration forecast. To check whether the performance of the models improves when meteorological data is added, two variants of each model were developed. When no meteorological data was included in the model's input, the input consisted purely of lagged PM2.5 or PM10 concentration values. When meteorological data was included, the meteorological data for the forecasted 12 hours was also part of the input.
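The construction of training samples from an hourly series can be sketched as a sliding window. The 12-hour horizon comes from the text; the number of lagged inputs (`n_lags`) is a hypothetical default, since the window size used by each model is a tuned parameter. Consecutive samples are shifted by one hour and therefore overlap.

```python
def make_samples(series, n_lags=24, horizon=12):
    """Build (lagged inputs, future targets) pairs from an hourly series.

    series:  list of hourly concentration values
    n_lags:  number of past hours used as input (assumed value)
    horizon: number of future hours to forecast (12 in this study)
    """
    samples = []
    for start in range(len(series) - n_lags - horizon + 1):
        inputs = series[start:start + n_lags]
        targets = series[start + n_lags:start + n_lags + horizon]
        samples.append((inputs, targets))
    return samples
```

A series of length N therefore yields N - n_lags - horizon + 1 samples, which illustrates why one week of hourly data produces only a small training set.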

Several different models were used for temporal predictions of PM concentrations: artificial neural networks (ANNs), decision trees (DTs), random forests (RFs), multiple linear regression (MLR) and the persistence model (PER). Each of the models (except for persistence) was optimized to find the parameters which gave the lowest RMSD.

Separate model parameters were obtained for each of the datasets: each of the 7 SEPA sensors and STL. Furthermore, models were optimized for the PM2.5 and PM10 datasets separately because PM2.5 and PM10 concentrations sometimes vary differently, as the nature of these types of particulate matter is different.

There were thus 16 models in total (8 datasets, each with PM2.5 and PM10) for each type of temporal prediction model. To make it easier to compare the overall performance of the models, the results of all SEPA datasets were averaged in some of the tables.

It is difficult to define a target performance for the models because no previous predictions have been made using these datasets that we could compare against. In addition, different data might have a significant impact on a model's performance; therefore, we cannot compare the performance of our models on our datasets with the performance of models implemented by other researchers on other datasets. However, it is important to know what performance we are aiming for; thus, it was decided that an RMSD below 5 and an index of agreement above 0.7 would indicate good performance of a model.
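The two evaluation measures and the performance target can be sketched as below. The index of agreement is assumed here to be Willmott's definition, d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)², where O are observations, P are predictions and Ō is the mean observation; the function names are hypothetical.

```python
import math


def rmsd(observed, predicted):
    """Root-mean-square deviation between observations and predictions."""
    squared = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition), between 0 and 1."""
    mean_observed = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - mean_observed) + abs(o - mean_observed)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - numerator / denominator


def meets_target(observed, predicted, max_rmsd=5.0, min_agreement=0.7):
    """Check the performance target stated in the text."""
    return (rmsd(observed, predicted) < max_rmsd
            and index_of_agreement(observed, predicted) > min_agreement)
```

For observations [1, 2, 3] and predictions [2, 3, 4], the RMSD is 1 and the index of agreement is 1 - 3/11, roughly 0.73, so the target would be met.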

3.4.1 Artificial neural networks

Parameter tuning is of utmost importance in machine learning because the performance of models depends significantly on the values of the parameters used. If a more sophisticated machine learning model (e.g. a neural network) is poorly tuned, it can be outperformed by a simpler method (e.g. kNN). There are many parameters that need to be optimized for a neural network, e.g. the learning rate, the number of hidden layers and the number of hidden units.

Three of the most popular model tuning methods are coordinate descent, grid search and random search. When coordinate descent is used, neural network parameters are optimized one by one while all other parameters are kept fixed. Once the optimal value for a parameter is found, the parameter is set to that value before the next parameter is optimized. However, this method can miss important points in the search space because the performance of the model depends on combinations of parameters, not on independent parameters. For example, a different optimal number of hidden units might be obtained after the number of hidden layers is changed. Grid search solves this problem by testing all possible parameter combinations and exploring the entire search space. However, it only works for parameters with discrete values and, most importantly, it is computationally very inefficient because its running time is exponential in the number of parameters. Random search, where the parameter values are chosen at random, is much faster and finds good parameters with high probability; however, it can still miss optimal points in the search space. Smarter hyperparameter optimization methods include Bayesian optimization, random forests and derivative-free optimization methods (e.g. genetic algorithms and Nelder-Mead) [9].
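The difference between grid search and random search can be illustrated with a small sketch. The objective function below is a hypothetical stand-in for the validation RMSD of a trained network, and the parameter names and ranges are assumptions made for the example only.

```python
import itertools
import random


def grid_search(objective, grid):
    """Evaluate every combination; grid maps parameter name -> list of values."""
    names = list(grid)
    best = None
    for combination in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combination))
        score = objective(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


def random_search(objective, space, n_trials, seed=0):
    """Sample each parameter uniformly; space maps name -> (low, high)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


def toy_objective(params):
    """Hypothetical stand-in for validation RMSD (lower is better)."""
    return ((params["learning_rate"] - 0.1) ** 2
            + (params["hidden_units"] - 20) ** 2 / 1000.0)
```

With three candidate values for each of two parameters, grid search needs 3² = 9 evaluations, and the count multiplies with every additional parameter. Random search uses a fixed budget of `n_trials` evaluations regardless of the number of parameters.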

In this study, two different hyperparameter tuning methods were tested, namely random search and Bayesian optimization using Tree-structured Parzen Estimators.

Unlike stochastic gradient descent (SGD), the resilient backpropagation training method (Rprop) only takes into account the signs of the partial derivatives when making parameter updates, so the step size for each parameter does not depend on the magnitude of the gradient [16]. Rprop is a local adaptive learning scheme in which the harmful influence of the size of the partial derivative on the weight step is eliminated [17]. RPROPMinus is a variation of the Rprop supervised learning heuristic for artificial neural networks in which backtracking is removed.
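A minimal sketch of a sign-based update without backtracking is given below. The increase and decrease factors and the step bounds are commonly quoted default values and should be treated as assumptions; the function is an illustration of the update rule, not the library implementation used in the project.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def rprop_minus_step(weights, grads, prev_grads, steps,
                     eta_plus=1.2, eta_minus=0.5,
                     step_min=1e-6, step_max=50.0):
    """Apply one Rprop update without backtracking, modifying weights and steps.

    Only the sign of each partial derivative is used. A step size grows while
    the sign stays the same and shrinks when the sign flips.
    Returns a copy of grads to pass as prev_grads on the next call.
    """
    for i, grad in enumerate(grads):
        product = grad * prev_grads[i]
        if product > 0:
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif product < 0:
            steps[i] = max(steps[i] * eta_minus, step_min)
        weights[i] -= sign(grad) * steps[i]
    return list(grads)
```

As a usage example, minimising f(w) = (w - 3)² from w = 0 with an initial step of 0.1 first accelerates towards the minimum and then halves the step each time the gradient changes sign, so that w settles close to 3.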

Long short-term memory (LSTM) is a recurrent neural network architecture which is designed to overcome error back-flow problems [12]. With conventional Real-Time Recurrent Learning or Back-Propagation Through Time, error signals flowing backwards in time tend to either blow up or vanish. This can lead to oscillating weights, a prohibitive amount of time needed to learn long time lags, or a network that does not learn at all [12]. The LSTM architecture can solve these problems. This is achieved by using a gradient-based algorithm which enforces constant error flow through the internal states of special units.

3.4.2 Decision trees and random forests

A decision tree (DT) is a machine learning method which can be used for both classification and regression. A DT model develops a set of rules which can be used for prediction through a repetitive process of splitting [27].

The model represents a segmentation of the data which is developed by applying a series of rules. As a result, a DT gives a model which can be represented and interpreted as logic statements [27]. Furthermore, the model is faster than an ANN model. However, DTs tend to overfit the learning set and as a result often give quite poor results. DTs tend to perform worse than ANNs on nonlinear data because they cannot ignore noise as well as neural networks do. Therefore, DTs are less frequently used for time series predictions.

A random forest (RF) is an ensemble method, which means that it combines several weaker predictors. A random forest model averages the results of several decision trees.


3.4.3 Multiple linear regression

Multiple linear regression is a modelling technique used to investigate the relationship between a dependent variable and multiple independent variables [24]. We can write the multiple regression model as

$$ Y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon $$

where the model has $k$ independent variables, $x_i$ are the independent variables, $\beta_i$ are the regression coefficients and $\varepsilon$ is the stochastic error associated with the regression [24]. The least squares algorithm was used to choose the optimal coefficients for the regression.
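For reference, the least squares step can be written out explicitly. This is the standard ordinary least squares formulation rather than anything specific to this project. The design matrix $X$ (one row per sample, with a leading column of ones for the intercept), the observation vector $y$ and the number of samples $N$ are notation introduced here.

$$ \hat{\beta} = \arg\min_{\beta} \sum_{j=1}^{N} \Big( y_j - \beta_0 - \sum_{i=1}^{k} \beta_i x_{ji} \Big)^2 = (X^\top X)^{-1} X^\top y $$

The closed form on the right holds when $X^\top X$ is invertible, which fails for example when two input features are perfectly correlated.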

Multiple linear regression (MLR) models are fast, easy to implement and require relatively little hyperparameter optimization, e.g. compared with ANN.

3.4.4 Input feature selection

In order to avoid the curse of dimensionality and achieve optimal model performance, it is important to choose the most important features. This mattered most for multiple linear regression and much less for neural networks, since neural networks are known for being able to neglect noisy input variables.

For models where only lagged PM concentrations were used, there were 24 features to select from, while for models which used meteorological data there were 72 possible input features. It was not possible to try all feature combinations exhaustively, as there would have been $2^{24}$ or $2^{72}$ of them. Therefore, three different feature selection methods were used to find the optimal input variables for the MLR models.

Variance threshold feature selection works by choosing features whose variance exceeds a certain threshold. Univariate feature selection works by selecting the k features which have the highest F-test scores. The Lasso method works by constructing a linear model that selects features through regularization. The Lasso model should also be able to solve the problem of two or more correlated features all being selected.
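As a rough illustration of the simplest of these methods, the sketch below applies a variance threshold to a small made-up feature matrix. The data, the threshold and the function name are hypothetical, and this is a pure-Python sketch rather than the library routine used in the project.

```python
from statistics import pvariance

def variance_threshold(rows, threshold):
    """Return the indices of the columns whose variance exceeds `threshold`."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]

# Hypothetical samples (one row per sample); column 1 is almost constant.
rows = [
    [12.0, 5.0, 30.0],
    [18.0, 5.1, 10.0],
    [9.0, 5.0, 25.0],
    [21.0, 4.9, 15.0],
]

kept = variance_threshold(rows, threshold=1.0)   # keeps columns 0 and 2
```

If all columns have similar variance, as can happen with lagged values of the same series, a single threshold keeps either all of them or none of them.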

3.5 Models for spatial predictions

There is a wide variety of spatial interpolation methods that can be used for air quality prediction. This study aims to analyse the performance of different spatial interpolation techniques. The spatial interpolation methods investigated were inverse distance weighting, kriging, radial basis functions and nearest neighbour, and the machine learning methods investigated were random forests and artificial neural networks.


3.5.1 Inverse distance weighting

Inverse distance weighting (IDW) is a deterministic, gradual, exact interpolation method which is based on mathematical formulas [8]. IDW interpolation explicitly implements the assumption that things that are closer are more similar than those further apart, as points closer to the measured data points receive more weight in the averaging formula [8].

$$ v_e = \frac{\sum_{i=1}^{n} \frac{1}{d_i^{p}} v_i}{\sum_{i=1}^{n} \frac{1}{d_i^{p}}} $$

where $v_e$ is the value to be estimated and $v_i$ is the known value at one of the $n$ data points, located at distance $d_i$ from the estimation point. The power $p$ determines how strongly nearer points are weighted relative to those further away. When a lower power value is used, a smoother interpolated surface is generated.
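The formula translates almost directly into code. The following is a minimal sketch assuming planar coordinates and Euclidean distances. The station positions and values are invented for illustration and are not measurements from this study.

```python
import math

def idw(known, target, p=2):
    """Inverse distance weighting.

    known:  list of ((x, y), value) pairs for the measured points
    target: (x, y) location at which to estimate the value
    p:      power parameter; larger values favour the nearest points
    """
    numerator = denominator = 0.0
    for (x, y), value in known:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value          # exact interpolator: reproduce the measurement
        weight = 1.0 / d ** p
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Two hypothetical stations on a line, with values 10 and 20.
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint = idw(stations, (1.0, 0.0))    # equidistant, so the plain average
near_first = idw(stations, (0.5, 0.0))  # pulled towards the first station
far_away = idw(stations, (100.0, 0.0))  # tends towards the mean of the samples
```

The estimate is a weighted average with positive weights, so it can never leave the range of the measured values.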

On the one hand, IDW is a simple, fast and widely used interpolation method. For example, the IDW method is used in the EPA's AirNow program to produce real-time ozone maps and air quality forecasts [5]. On the other hand, this non-statistical averaging method is constrained by the range of the measured data points and cannot interpolate any values which fall outside the observed data range. Moreover, distant areas outside of the sample are flattened to the mean value [5].

3.5.2 Kriging

Kriging is a stochastic geostatistical interpolation method which is gradual, local and not necessarily exact [5]. The kriging method has some similarities with the IDW method, as it also computes weights for the measured data points when interpolating values for an unmeasured location; however, the points are weighted according to spatial covariance values.

The basic form of the kriging estimator is

$$ Z^{*}(u) - m(u) = \sum_{\alpha=1}^{n} \lambda_{\alpha} \left[ Z(u_{\alpha}) - m(u_{\alpha}) \right] $$

where n is the number of data points used for estimation.

The kriging interpolation method helps to compensate for the effects of data clustering by assigning isolated data points more weight than points within a cluster [11].

There are different types of kriging: simple kriging, ordinary kriging, kriging with a trend, cokriging and indicator kriging. However, only simple kriging was investigated in this research.

3.5.3 Radial basis functions

The radial basis function (RBF) method is a simple multivariate interpolator of n-dimensional scattered data [10]. RBF is an exact interpolator, which means that at the known data points it reproduces the given values exactly.


The form of RBF model:

$$ f(x) = \sum_{i} w_i \, \varphi\left( \lvert x - c_i \rvert \right) $$

where $\varphi$ is the basis function, $w_i$ are the weight coefficients and $c_i$ are the interpolation centers. The weights are calculated from a linear system, the interpolation centers coincide with the input grid, and the basis function can be chosen [10]. The SciPy library was used for the implementation of RBF, and the following basis functions were available: multiquadric, inverse, Gaussian, linear, cubic, quintic and thin plate.
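To make the linear system explicit, here is a small sketch that fits and evaluates an RBF interpolant with NumPy only. The project itself used the SciPy implementation. The centers, the values and the `epsilon` shape parameter below are assumptions for illustration.

```python
import numpy as np

def multiquadric(r, epsilon=1.0):
    """Multiquadric basis function; epsilon is an assumed shape parameter."""
    return np.sqrt(r ** 2 + epsilon ** 2)

def fit_rbf(centers, values, phi=multiquadric):
    """Solve Phi w = values, where Phi[i, j] = phi(|c_i - c_j|)."""
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    return np.linalg.solve(phi(dists), values)

def predict_rbf(x, centers, weights, phi=multiquadric):
    """Evaluate f(x) = sum_i w_i * phi(|x - c_i|)."""
    r = np.linalg.norm(centers - x, axis=-1)
    return float(phi(r) @ weights)

# Four hypothetical sensor locations and their measured values.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([10.0, 12.0, 14.0, 20.0])

weights = fit_rbf(centers, values)
estimate = predict_rbf(np.array([0.5, 0.5]), centers, weights)
```

Evaluating the fitted function at any of the centers returns the corresponding measured value up to floating-point error, which is what makes the method an exact interpolator.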

3.6 Model evaluation

Four statistics were used to evaluate the performance of the different models: absolute bias, root mean square deviation, standard deviation, and index of agreement.

Absolute bias (AB) is a measure of the difference between the average of the predicted values and the average of the measured values.

$$ AB = \bar{P} - \bar{O} $$

Root mean square deviation (RMSD) is a measure of the differences between measured and predicted values.

$$ RMSD = \sqrt{\frac{\sum_{i=1}^{n} (O_i - P_i)^2}{n}} $$

Standard deviation (SD) is a measure of the variation of a set of data values; closer SD values for the observed and predicted sets indicate better performance of the model.

$$ SD = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} $$

Index of agreement (d) is a measure of the accuracy of model prediction. It ranges from 0 to 1; the closer the index of agreement is to 1, the better the match between the predicted and observed values.

$$ d = 1 - \frac{\sum_{i=1}^{n} (P_i - O_i)^2}{\sum_{i=1}^{n} \left( \lvert P_i - \bar{O} \rvert + \lvert O_i - \bar{O} \rvert \right)^2} $$

In all of the equations above, $P$ stands for the predicted dataset and $O$ stands for the observed dataset.
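The four statistics can be computed with a few lines of Python. This is a minimal sketch of the formulas above, not the project's evaluation script. The observed and predicted values are made up so that the results are easy to check by hand.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def absolute_bias(pred, obs):
    """AB: difference between the predicted and observed averages."""
    return mean(pred) - mean(obs)

def rmsd(pred, obs):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def sd(xs):
    """Population standard deviation (divides by n, as in the formula above)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def index_of_agreement(pred, obs):
    o_bar = mean(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for p, o in zip(pred, obs))
    return 1 - num / den

# Hypothetical observed and predicted concentrations.
obs = [10.0, 12.0, 14.0, 16.0]
pred = [11.0, 11.0, 15.0, 15.0]

ab = absolute_bias(pred, obs)        # 0.0: the two averages coincide
error = rmsd(pred, obs)              # 1.0: every prediction is off by 1
d = index_of_agreement(pred, obs)    # 1 - 4/68, roughly 0.94
```

The example also shows why several statistics are reported together: the bias is zero even though no single prediction is correct.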

Chapter 4

Implementation

Three pieces of software were developed: a web application, an Android application, and scripts for the prediction models, their optimization and the experiments.

4.1 Web application for AQ data analysis

A website was developed to provide functionality for analysing real-time and historical AQ data online and to provide PM concentration data and its forecast as an API to the mobile application.

The AngularJS JavaScript framework [2] was used for the front end of the web application. It was chosen because it is a popular and powerful framework with many extensions and libraries needed for the project (e.g. Google Maps or graphing libraries). The back end of the website was developed in Python using the Pyramid framework [18]. Pyramid was used because back end functionality for collecting mobile data and storing it on the server had already been provided by the Centre of Speckled Computing. Furthermore, Pyramid seemed like a good choice because it is a general, simple, fast and minimalistic framework [18].

The web application has three pages for analysing current, historical and forecasted AQ data. On the Current Data page (see Figure 4.1), a table with the latest and daily average data from all SEPA monitoring stations is displayed. All of the SEPA sensors are shown on the map as pins, coloured according to the latest PM concentration values measured at them. When a pin is selected, the latest PM concentration data from that SEPA device is displayed in a window on the map, and a graph with hourly averages of concentrations for the day is displayed below (as in Figure 4.3). In addition, the user can upload mobile data in a CSV file in order to display it on the map as coloured circles (depending on PM concentration levels). Each measurement made by a mobile AirSpeck sensor during a session is represented by a single circle on the map. Moreover, there is an extra red pin which can be dragged around to get spatial predictions at the location where the pin is positioned.


Figure 4.1: Current Data page for analysing real-time AQ data

On the Historical Data page (see Figure 4.2), users can analyse historical data for any SEPA sensor on any available day or month. The date can be chosen from a calendar, and sensors and pollutants can be chosen from drop-down lists. For the selected sensor and day, a 24 hour window of the chosen particulates is displayed. If the "Monthly analysis" tab is selected, the user can choose a month and the daily PM concentration averages for that month are displayed in the graph. On the AQ Forecast page (see Figure 4.3), users can see a 12 hour PM2.5 and PM10 forecast as a graph for any of the SEPA sensors.

The back end API had four functions: the first for getting the latest stationary data from the SEPA sensors as JSON, the second and third for getting 12 hour temporal and spatial PM2.5 and PM10 concentration predictions for the SEPA sensors as JSON, and the fourth for uploading mobile AQ data to the server.

The application could be further improved by adding functionality for historical mobile data analysis. Another improvement would be to represent spatial predictions as a heatmap over a larger area rather than having a single pin at which spatial predictions are performed. Moreover, spatio-temporal predictions could be included in the application, which would allow the user to get a 12 hour PM concentration forecast for any chosen location.

4.2 Android application for AQ data collection

A native Android application was developed for data collection and analysis. Some of the functionality for the application had already been implemented by students who worked on AirSpeck sensors previously [33, 37]. However, the functionality of the previous version of the application was restricted to communication with the AirSpeck device, data collection, writing data to a file and manual upload of data files to a server with a specified IP address (see Figure 4.4). Moreover, all of the functionality was previously implemented in a single file, activity and class. Therefore, the application was refactored and restructured, and modules were introduced in order to make the code easier to read, understand and extend.

Figure 4.2: Historical Data page for analysing previously collected AQ data

The application was developed using three activities (DataCollection, MapData and HistoricalData) and a single service (DataCollectionService). The DataCollection activity was responsible for establishing a connection with the AirSpeck device, showing the real-time data being collected and displaying both spatial and temporal predictions. Instead of having a separate activity for each of these tasks, a fragment was created for each of them and they were combined in a ViewPager. As a result, the application became more intuitive to use and easier to navigate (see Figures 4.5 and 4.6).

In DataCollectionFragment (left in Figure 4.5), a connection with the AirSpeck device can be established and a data collection session (during which AQ data is written to a CSV file) can be started. Moreover, the latest PM concentration data from the AirSpeck is displayed in the fragment together with spatially predicted concentrations, which were calculated using the latest data from the stationary sensors and the best performing spatial prediction method. In LiveDataFragment (right in Figure 4.5), all of the latest available AQ data is presented, including relative humidity, O3 and NO2 concentrations and counts for individual bins.

Figure 4.3: AQ forecast page for analysing temporal PM concentration predictions for SEPA sensors

In LiveGraphFragment (left in Figure 4.6), all of the PM2.5 and PM10 concentration data from the current session is displayed as an automatically updated graph so that users can analyse how the concentrations change over time. The graph used a sliding window of 100 data items so that the plot would remain clearly visible even when more data was collected. In PredictionsFragment (right in Figure 4.6), the 12 hour PM concentration forecast for each of the SEPA sensors, downloaded from the API, is displayed.
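The sliding window behaviour of the live graph can be sketched with a bounded queue. The snippet below is written in Python purely for illustration and is not the Android code of the application. The names `pm25_window` and `on_new_reading` are hypothetical.

```python
from collections import deque

WINDOW = 100  # number of most recent readings kept for plotting

pm25_window = deque(maxlen=WINDOW)

def on_new_reading(value):
    """Append a reading; the oldest one is dropped once the window is full."""
    pm25_window.append(value)
    return list(pm25_window)   # points to hand over to the plotting code

# Simulate a session with 250 readings.
for i in range(250):
    points = on_new_reading(float(i))
```

After 250 readings only the last 100 remain in the window, so the plot keeps a constant horizontal resolution however long the session lasts.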

In the previous version of the application, where all of the functionality was included in a single activity, the app always needed to be active during a data collection session. Users would have to monitor the screen for a long period of time during data collection or risk losing data if the phone went into sleep mode. Not only was this a bad user experience, but it was also a poor design decision because a long task was running on the UI thread and made the application less responsive. Therefore, the data collection functionality (receiving, decoding and storing) was moved to a separate service (DataCollectionService). The DataCollection activity establishes a connection with the AirSpeck device via Bluetooth LE, then passes the connection information to the service and starts the service. The service runs as a separate thread, collects data (regardless of the activity's state) and regularly sends updates to the activity (if it is still active) to be displayed on the screen in real time.

The MapData activity was used for displaying real-time stationary and mobile PM concentration data on a map. As in the web application, stationary sensors were represented by pins and mobile data items by circles (left in Figure 4.7). Each of them was coloured red, yellow or green depending on the PM level at the location. Furthermore, the HistoricalData activity allowed users to choose the date for which they wished to see historical stationary data and graph it as shown in Figure 4.6.

4.3 Implementation of models

All of the temporal and spatial prediction models were implemented in Python. It was chosen because the plan was to integrate the best performing models into the server, which had previously been developed in Python. The SciPy, NumPy, PyBrain [14] and PyKriging [15] libraries were used to implement the models. The hyperopt Python library was used to implement hyperparameter optimization for the ANN models [6].

Figure 4.4: Previous version of AirSpeck Android application

Figure 4.5: DataCollectionFragment and LiveDataFragment from DataCollection activity

In all the experiments with the models, five-fold cross-validation (where 80% of the dataset is used for training and 20% is used for testing) was used given the limited amount of data.
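The 80%/20% split per fold can be sketched as follows. Whether the folds in the experiments were contiguous or shuffled is not stated here, so the contiguous split below is an assumption, and `k_fold_indices` is a hypothetical helper rather than the function used in the project.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so that sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

folds = list(k_fold_indices(100, k=5))
```

Each sample is used for testing exactly once and for training in the other four folds, which makes better use of a small dataset than a single fixed split.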


Figure 4.6: LiveGraphFragment and PredictionsFragment from DataCollection activity

Figure 4.7: MapData and HistoricalData activities

Chapter 5

Results and discussion

5.1 Analysis of static datasets

Dataset   PM2.5 AVG   PM10 AVG   PM2.5 SD   PM10 SD   PM2.5 R   PM10 R
SEPA1        7.30       17.44      30.26     289.56     0.55      0.23
SEPA2       14.61       25.34     133.94     383.51     0.71      0.60
SEPA3       16.26       28.99     334.87     753.53     0.26      0.17
SEPA4       22.29       35.61     445.74     709.84     0.57      0.52
SEPA5       16.02       25.22     169.37     445.16     0.57      0.49
SEPA6       17.34       23.11     214.39     300.44     0.63      0.61
SEPA7       22.81       36.71     486.36     753.91     0.68      0.65
STL          6.72       11.28      46.02      76.38     -         -

Table 5.1: Statistical properties of SEPA and STL datasets: averages (AVG), standard deviations (SD), Pearson correlation coefficients (R)

Table 5.1 summarizes the statistical properties of the PM10 and PM2.5 data for the seven stationary sensors (SEPA1-7) and the reference air monitor at St. Leonards (STL), which is part of the Automatic Urban and Rural Network (AURN) managed by Bureau Veritas for the Department for Environment, Food & Rural Affairs. One observes that the PM2.5 averages are significantly lower than the PM10 averages for all the sensors. The SEPA1 dataset has significantly lower averages and SDs of PM2.5 and PM10 concentrations than any other SEPA dataset, followed by the SEPA2 and SEPA5 datasets. These three SEPA sensors were installed away from the road (as shown in Figure 3.1), while the rest of the SEPA devices were mounted next to the road. Surprisingly, higher PM concentrations and diurnal variations during peak traffic times were not detected by these sensors.

Table 5.1 shows that the SD values for the SEPA PM10 datasets and for the PM2.5 datasets of some sensors are high (over 300). This is also reflected in Figures 5.1 and 5.2, where spikes in the concentrations can be seen. The Pearson correlation coefficients are greater than 0.5 for most of the SEPA datasets (all PM2.5 datasets except SEPA3 and four of the seven PM10 datasets), indicating that they are correlated with the STL dataset.

Figure 5.1: Hourly PM2.5 concentrations of STL and SEPA datasets during the data collection period

Additionally, inspection of Figures 5.1 and 5.2 shows that the STL and SEPA datasets have peaks around the same times although the STL sensor is located 0.6 km away from the SEPA sensors. This demonstrates that PM concentrations are affected not only by local pollution sources such as vehicular traffic but also by other factors, such as meteorological conditions.

Pearson correlation coefficients were calculated between the datasets and the meteorological data (wind speed (WS), wind direction (WD), surface temperature (ST) and relative humidity (RH)). For the STL dataset, none of the meteorological variables gave either a high positive or a strongly negative correlation. In the case of the SEPA datasets, higher absolute values of the Pearson correlation coefficients were obtained; however, there was only one pair of datasets (SEPA3 PM10 and ST) that gave an absolute coefficient value higher than 0.5 (-0.51). See Appendix A for all correlation values.
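For reference, the Pearson coefficient used in this analysis can be computed as in the sketch below. The temperature and PM10 series are invented for illustration and are not values from the SEPA or STL datasets.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical hourly values: PM10 falls as surface temperature rises.
temperature = [2.0, 4.0, 6.0, 8.0, 10.0]
pm10 = [30.0, 27.0, 22.0, 20.0, 16.0]

r = pearson(temperature, pm10)   # strongly negative for this made-up series
```

A coefficient near -1 or 1 indicates a strong linear relationship, while the threshold of 0.5 in absolute value used above marks only a moderate one.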

Autocorrelation was computed on the SEPA and STL datasets. In the case of STL, it dropped below 0.5 in just 4 hours for the PM2.5 dataset and in under 5 hours for the PM10 dataset. In the case of the SEPA datasets, for both PM2.5 and PM10 concentrations, the autocorrelation dropped below 0.5 in 3 hours for the SEPA1, SEPA4 and SEPA5 sensors, in 4 hours for SEPA2, SEPA3 and SEPA7, and in 6 hours for SEPA6. This means that PM concentrations for the SEPA sensors are correlated with themselves only for a short period of time of up to 6 hours.
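The lag at which the autocorrelation first drops below 0.5 can be found as sketched below. The estimator shown is one common definition of the sample autocorrelation and may differ in detail from the one used in the analysis. The sinusoidal series with a 24 hour period is synthetic.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    m = sum(series) / n
    var = sum((x - m) ** 2 for x in series)
    cov = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return cov / var

def first_lag_below(series, threshold=0.5, max_lag=24):
    """Return the first lag at which the autocorrelation falls below threshold."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) < threshold:
            return lag
    return None

# Synthetic hourly series with a daily cycle (10 days of data).
series = [math.sin(2 * math.pi * t / 24) for t in range(240)]
lag = first_lag_below(series)
```

The returned lag gives a rough indication of how many past hours carry useful information for a forecast, which motivates the input window sizes explored later.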

Figure 5.2: Hourly PM10 concentrations of STL and SEPA datasets during the data collection period

Figure 5.3: Hourly PM2.5 and PM10 concentrations of STL dataset for 2015

5.2 Temporal predictions

5.2.1 ANN models

The Random Search (RS) and Tree of Parzen Estimators (TPE) optimization methods were used to find the optimal parameters for the models. The following parameters were optimized:

• input window size (ranging between 1 and 24)

• number of hidden units in each hidden layer (ranging between 1 and 20)

• number of hidden layers (ranging between 0 and 5)

• recurrent network (true or false)


• hidden layer activation function (choice between Sigmoid, Linear, LSTM, MDLSTM or Tanh).

The number of training epochs could also have been included in the optimization experiments; however, it was fixed at 25 in the interest of time, and an additional experiment was included to optimize this parameter afterwards. For the optimization of the SEPA models the number of iterations was set at 100, and for the STL models at 30, as it was a significantly larger dataset. The RPROPMinus neural network trainer [14] was used for the ANN models because it performed significantly better than the usual back-propagation trainer in all of the initial experiments.
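The search space listed above can be written down directly, as in the following random search sketch. It uses only the standard library rather than hyperopt, assumes a single hidden-unit count shared by all hidden layers, and uses a placeholder objective in place of the cross-validated RMSD of a trained network. All names are hypothetical.

```python
import random

ACTIVATIONS = ["Sigmoid", "Linear", "LSTM", "MDLSTM", "Tanh"]

def sample_configuration(rng):
    """Draw one configuration uniformly from the search space listed above."""
    return {
        "window_size": rng.randint(1, 24),    # input window size in hours
        "hidden_units": rng.randint(1, 20),   # assumed shared by all hidden layers
        "hidden_layers": rng.randint(0, 5),
        "recurrent": rng.choice([True, False]),
        "activation": rng.choice(ACTIVATIONS),
    }

def random_search(evaluate, n_iter=100, seed=0):
    """Keep the configuration with the lowest score over n_iter random draws."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_iter):
        cfg = sample_configuration(rng)
        score = evaluate(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def placeholder_rmsd(cfg):
    """Stand-in objective; a real run would train and cross-validate a network."""
    return abs(cfg["window_size"] - 23) + 0.1 * cfg["hidden_layers"]

best_cfg, best_score = random_search(placeholder_rmsd, n_iter=100)
```

TPE differs only in how the next configuration is proposed: instead of drawing uniformly, it uses the scores of earlier trials to favour promising regions of the space.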

Dataset   RS PM2.5   RS PM10   TPE PM2.5   TPE PM10
SEPA1       3.03      11.17       3.17       11.75
SEPA2       6.70      13.70       6.47       14.63
SEPA3      11.81      18.67      11.16       18.29
SEPA4       8.89      15.84       8.41       15.26
SEPA5       7.77      14.50       7.42       14.14
SEPA6       7.67      11.87       7.33       10.86
SEPA7       7.97      15.75       8.09       15.73
STL         3.97       5.22       3.39        4.82

Table 5.2: Optimization results (RMSD) of ANN-1 models using RS and TPE methods

ANN-1. From Table 5.2 we see that the lowest RMSD was obtained for the SEPA1 dataset for both PM2.5 and PM10. Its PM2.5 RMSD (3.03 and 3.17 for RS and TPE, respectively) was significantly lower than the RMSD on any other SEPA dataset. The largest RMSDs were obtained for the SEPA3 datasets: 11.16 and 18.29 for PM2.5 and PM10 using TPE optimization. This is explained by the large SDs of the SEPA3 datasets (see Table 5.1): the data varied widely, which made it difficult to forecast. In contrast, in the case of SEPA1 the variation in concentrations and the SDs were the lowest of all and, therefore, it was much easier to predict the values with a higher degree of accuracy. The RMSDs for the models on the other SEPA datasets were better than for SEPA3, ranging between 6.47 and 8.41 for PM2.5 and between 11.17 and 15.26 for PM10.

Furthermore, in Table 5.2 one can observe that the RS and TPE optimization methods performed quite similarly. In most of the cases (except for SEPA1 and SEPA7) TPE was superior to RS, but not by much, in spite of the fact that it uses a more sophisticated technique to choose model parameters. However, when the number of optimization iterations is large, even RS can give good results.

From Table 5.3 we see that the ANN-1 model performed quite well on the STL datasets, with low RMSD (3.97 and 4.82 for PM2.5 and PM10), AB (in absolute terms) and SD Diff values. However, the index of agreement (d) was high (0.78) for PM2.5 but not for PM10 (only 0.49).

The performance indicators of ANN-1 for the SEPA datasets were much worse. From Table 5.2 we see that the RMSD for SEPA1 PM2.5 was low, but it was higher for all other datasets, as was the average RMSD. Index of agreement values were low for the SEPA datasets: even for SEPA1 PM2.5 it was only 0.26, and a d value of at least 0.70 was not obtained on any of the datasets.

Dataset        AB     RMSD   SD Diff    d
SEPA PM2.5   -0.57    7.54    95.99    0.33
SEPA PM10     0.18   14.27   296.91    0.33
STL PM2.5     0.13    3.97    24.10    0.78
STL PM10      3.10    4.82    30.13    0.49

Table 5.3: ANN-1 models' performance on SEPA and STL datasets

ANN-2. For ANN-2, the meteorological variables wind speed, wind direction, surface temperature and relative humidity were used as extra inputs for the neural network. This resulted in a significant increase in the input size, since an extra 48 input data items were added to each sample. To account for the additional input variables, the upper limit of the range for the number of hidden units was increased from 20 to 200. Apart from that, all other optimization parameters and their possible value ranges were left unchanged.

Dataset   RS PM2.5   RS PM10   TPE PM2.5   TPE PM10
SEPA1       3.29      11.69       3.04       11.60
SEPA2       6.78      14.96       6.71       14.05
SEPA3      11.17      18.13      12.05       18.24
SEPA4       8.66      15.77       8.49       15.98
SEPA5       7.68      14.43       7.77       14.49
SEPA6       7.72      10.91       7.91       10.95
SEPA7       7.95      15.63       8.06       15.75
STL         3.85       5.82       3.90        5.32

Table 5.4: Optimization results (RMSD) of ANN-2 models using RS and TPE methods

From Table 5.4 we see that for some SEPA datasets adding meteorological data and increasing the number of hidden units improved performance, while for others it worsened it. However, the changes were not very significant (less than 1 RMSD in all cases) and, in general, the trends for the best and worst performing datasets were similar to those of the ANN-1 model. From this it seems that the introduction of meteorological variables did not give any noticeable improvement. However, with a larger number of hidden units and weights to train, more training epochs are needed, and there might be an improvement in performance after experimenting with the number of training epochs.

From Table 5.5 we see that the average performance on all SEPA datasets changed very little. In the case of the STL datasets the changes were slightly more significant. On the PM2.5 dataset there was a marked increase in the absolute AB value (from 0.13 to -4.05) and a decrease in the index of agreement and SD Diff. For the PM10 dataset, the AB value changed from 3.10 to -3.24, which means that the average of the predicted PM10 values significantly decreased. Furthermore, there was an increase in the d value (up to 0.62); however, it is still quite low.

Dataset        AB     RMSD   SD Diff    d
SEPA PM2.5   -0.45    7.53    99.44    0.30
SEPA PM10    -1.26   14.47   301.38    0.32
STL PM2.5    -4.05    3.85    10.23    0.60
STL PM10     -3.24    5.28    41.15    0.62

Table 5.5: ANN-2 model performance on SEPA and STL datasets

One possible explanation for the poor performance of the ANN-1 and ANN-2 models on the SEPA datasets is the comparatively small size of these datasets. The number of data items in each of the SEPA datasets ranged between 100 and 141. Even though 5-fold cross-validation was used, this was still not enough for the ANN models to learn the subtle variation in concentrations. We see that the models perform significantly better on the STL dataset, which is much larger and displays regular variation patterns.

LSTM, Sigmoid and Tanh activation functions were chosen for the ANN-1 and ANN-2 models. Most of the optimal models were recurrent, and the number of input hours (window size) ranged between 22 and 24. See Appendix B for the parameters chosen for all the models.

5.2.2 Further optimization of ANN-1 and ANN-2 models

After the models were optimized with the automatic hyper-parameter selection methods (as described in Section 5.2.1), a couple of additional experiments were performed in order to optimize the models even further.

In the experiment to choose the optimal number of learning epochs, the following numbers of training epochs were tested on the models with the optimal hyper-parameters: 10, 25, 50, 100, 200 and 300. Surprisingly, increasing the number of epochs did not result in a noticeable improvement in performance for any of the models. For ANN-1 on the SEPA datasets, changing the number of training epochs did not result in any improvement for either PM2.5 or PM10. For ANN-2 on the SEPA datasets, there was a small decrease in RMSD when the number of learning epochs was increased to 50 for PM2.5 and decreased to 10 for PM10. However, these changes were quite small. Generally, when the number of training epochs was increased above the default of 25, there was a decrease in performance. This could be explained by over-training of the model with a large number of epochs. Similar results were obtained for the ANN-2 model on the STL datasets. There was no improvement for PM2.5 (25 learning epochs gave the lowest RMSD), and there was a small improvement for the PM10 dataset when the number of learning epochs was increased to 50.

Another experiment was performed by providing more input data to the ANN models. Instead of making predictions for each of the SEPA sensors independently, all 7 sensors were treated as a network, and data from all SEPA sensors was used as input to the models. As a result, the number of PM concentration input dimensions increased 7 times. The maximum number of input dimensions became 168 for ANN-1 and 216 for ANN-2. The same optimization parameters as for ANN-2 were used to optimize the models. However, after optimization there was a small decline in the models' performance.

5.2.3 DT models

There were two parameters that needed to be optimized for the Decision Tree models: the input window size and the maximum depth of the tree.

• The input window size optimization space was restricted to values between 1 and 24 hours.

• The maximum depth optimization space was restricted to values between 1 and 30.

Given that there are only 720 (24 x 30) possible combinations of these parameters, a grid search was performed over the parameter space to find the optimal one.

DT-1. As with the ANN models, the best performance on the SEPA datasets was obtained with SEPA1 (4.44 and 16.24 RMSD). The performance on all other SEPA datasets was much worse (range 9.24 - 17.97 for PM2.5, and 21.52 - 31.58 for PM10). Moreover, for all datasets the RMSDs for PM10 concentrations were much greater than for PM2.5. The difference was particularly significant for SEPA1 (PM2.5: 4.44 RMSD, PM10: 16.24 RMSD).

Dataset        AB      RMSD   SD Diff    d
SEPA PM2.5    1.09    14.67    88.14    0.47
SEPA PM10     0.39    25.6    249.31    0.51
STL PM2.5     0.05     4.75    20.33    0.79
STL PM10     -0.015    3.35    -0.45    0.88

Table 5.6: DT-1 model performance on SEPA and STL datasets

From Table 5.6 we see that on the STL dataset the RMSD for PM10 was lower than for PM2.5 (3.35 and 4.75, respectively). Furthermore, all of the performance indicators for the models on both STL datasets were very good: RMSDs lower than 5 and d values greater than 0.70. It was surprising that the difference in SDs was tiny (only -0.45) for the STL PM10 dataset.

The best performance on the STL dataset was achieved with 3 input hours and a maximum depth of 4 for PM2.5. The corresponding values for PM10 were 11 input hours and a maximum depth of 30. In the case of the SEPA datasets, the window size was usually large (24) and the maximum depth was very small (1).

DT-2. Another decision tree model was developed with meteorological data in addition to the PM concentration levels (same input sample format as for the ANN-2 model).


As for DT-1, the best SEPA performance was obtained for the SEPA1 dataset (4.44 and 16.24). The RMSD was slightly lower for the SEPA2 dataset (9.08 and 21.38) when the DT-2 model was used. Performance on the other SEPA datasets was slightly better than with DT-1, but still much worse than on SEPA1 (14.05 - 17.61 and 21.46 - 31.36 RMSD for PM2.5 and PM10, respectively).

Dataset        AB     RMSD   SD Diff    d
SEPA PM2.5    0.32   16.92    47.75    0.33
SEPA PM10     1.88   25.97   144.12    0.59
STL PM2.5     0.02    4.82    19.78    0.79
STL PM10      0.07    6.65    40.60    0.75

Table 5.7: DT-2 model performance on SEPA and STL datasets

In Table 5.7 we observe that when meteorological data was introduced to the model, its performance on the STL PM10 dataset worsened significantly: the RMSD almost doubled (to 6.65) and the index of agreement fell (to 0.75). Even though the performance on STL PM10 worsened, the overall DT-2 performance on the STL datasets is not particularly bad. The RMSD is lower than 5 for the PM2.5 dataset and the d value is greater than 0.7 for both datasets. In the case of the SEPA datasets, the RMSD increased slightly (by 2.25 and 0.37), SD Diff almost halved, and d decreased for PM2.5 and increased for PM10. However, the performance on the SEPA datasets remained relatively poor (RMSD greater than 10 for most datasets).

The best performance for STL PM2.5 (4.82 RMSD) was achieved with 3 input hours and a maximum depth of 3. For the STL PM10 dataset, the lowest RMSD was achieved with 15 input hours and a maximum depth of 5. In the case of the SEPA datasets, the optimal window size remained high and the maximum tree depth remained low.

5.2.4 RF models

Dataset        AB     RMSD   SD Diff    d
SEPA PM2.5   -0.22   13.34   104.94    0.42
SEPA PM10     0.16   24.65   346.90    0.46
STL PM2.5     0.03    4.63    21.22    0.80
STL PM10     -0.16    3.11    14.15    0.91

Table 5.8: RF-1 model performance on SEPA and STL datasets

As with the DT models, the RF models were optimized using grid search. In addition to the input window size and the maximum tree depth, the maximum number of trees was also optimized (range 1 - 10). Therefore, there were 7200 possible parameter combinations in the search space.

RF-1. From Table 5.8 we see that the RF-1 models perform slightly better than the DT-1 models. In addition, there is an increase in SD Diff on the SEPA datasets compared to the values obtained with DT-1, where only a single decision tree was used. The difference was positive, which means that the predicted values had a lower SD than the observed ones. This could be explained by the fact that several DTs are averaged in the RF model, which lowers the SD.

Dataset      AB     RMSD   SD Diff  d
SEPA PM2.5   -0.67  13.78  109.38   0.34
SEPA PM10    0.35   24.07  373.01   0.51
STL PM2.5    0.05   4.67   21.82    0.79
STL PM10     -0.38  5.66   34.48    0.81

Table 5.9: RF-2 model performance on SEPA and STL datasets

RF-2. From Table 5.9 we see that introducing meteorological data to the RF model had a similar impact as it had on the DT model. Performance on STL PM10 worsened and performance on the other datasets was not significantly affected. However, there was no significant decrease in SD Diff of the kind observed between the DT-1 and DT-2 models.

5.2.5 MLR models

Multiple linear regression (MLR) models are easy to implement and require relatively little hyperparameter optimization compared with ANN. For MLR models it was only necessary to choose appropriate input features. Three methods (Univariate, K-best and Lasso) were used for this task (as described in section 3.4.4).

There were 24 possible input features for MLR-1 models and 72 features for MLR-2 models. The Univariate selection method with the chosen variance thresholds did not manage to reduce the number of input features for MLR-1. That was because all of the previous PM concentration variables had very similar variance, and as a result a given threshold removed either none of them or all of them (for all datasets). Features selected with the K-best and Lasso methods gave performance indicators as good as the full input feature set. The aim was to minimize the number of input features to make the models simpler without losing any performance. For the STL datasets K-best gave the best performance with 15 input features for both PM2.5 and PM10, whereas Lasso needed only 5 and 14 input features (for PM2.5 and PM10, respectively) to achieve equally good results. For the SEPA datasets K-best and Lasso performed better than the Univariate method, and K-best slightly outperformed Lasso. For MLR-2 models the input features chosen using the K-best method slightly outperformed the others on STL and more significantly on SEPA datasets. See Appendix B for full details of the feature selection experiment results.

MLR-1. In Table 5.10 we see that the MLR-1 model performed better than the DT and RF models but worse than the ANN models on SEPA datasets. The MLR-1 model performed much better on SEPA7 than on any other SEPA dataset (RMSD values of 5.81 and 11.30, and d values of 0.75 and 0.74). This was surprising because the SEPA7 datasets had a large SD, which implied that values for this dataset should be difficult to predict.


Dataset      AB     RMSD   SD Diff  d
SEPA PM2.5   0.01   11.02  82.99    0.40
SEPA PM10    0.10   19.81  215.34   0.43
STL PM2.5    -0.01  4.44   19.47    0.81
STL PM10     0.01   6.56   42.34    0.75

Table 5.10: MLR-1 model performance on SEPA and STL datasets

Dataset      AB     RMSD   SD Diff  d
SEPA PM2.5   0.01   11.02  82.99    0.40
SEPA PM10    0.15   19.64  215.58   0.44
STL PM2.5    -0.13  4.48   19.28    0.81
STL PM10     0.20   6.63   42.57    0.75

Table 5.11: MLR-2 model performance on SEPA and STL datasets

On the STL datasets good performance was achieved for PM2.5 (low RMSD and high d value), but RMSD for PM10 was not low enough even though the index of agreement was high.

MLR-2. In Table 5.11 we see that the introduction of meteorological data to the MLR model did not have any significant impact on performance. As with the MLR-1 models, SEPA7 stood out and gave noticeably better results than the other SEPA datasets.

5.2.6 PER model

The persistence model (PER) requires no model optimization and simply uses the PM concentration from 24 hours ago as its prediction. This model was included for comparison, to see how much better the developed machine learning models perform than this naive approach.
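The persistence baseline described above can be sketched in a few lines. This is a minimal illustration on a made-up hourly series, not the project's implementation; the function names and the toy data are assumptions.

```python
import math


def persistence_forecast(series, lag=24):
    """Pair each observation with the value measured `lag` hours earlier."""
    predicted = series[:-lag]  # values from `lag` hours ago
    observed = series[lag:]    # values being predicted
    return predicted, observed


def rmsd(predicted, observed):
    """Root mean square deviation between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Toy hourly series: a perfectly repeating daily cycle over two days...
series = [float(hour % 24) for hour in range(48)]
# ...plus one 12-unit spike on the second day that persistence cannot anticipate.
series[30] += 12.0

predicted, observed = persistence_forecast(series)
per_rmsd = rmsd(predicted, observed)
```

On a perfectly periodic series the persistence error would be zero; the single spike is what produces a non-zero RMSD here, which mirrors why spiky datasets are hard for this baseline.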

5.3 Comparison of temporal prediction methods

5.3.1 SEPA datasets

In Table 5.12 we see that none of the models performed well enough on all SEPA datasets. Most of the methods performed well on the SEPA1 datasets (especially PM2.5) and MLR models performed well on the SEPA7 dataset. However, apart from a couple of exceptions the models demonstrated poor results on SEPA datasets: RMSDs are high and d values are low.


Model  Dataset  AB     RMSD   SD Diff  d
ANN-1  PM2.5    -0.57  7.54   95.99    0.33
ANN-1  PM10     0.18   14.27  296.91   0.33
ANN-2  PM2.5    -0.45  7.53   99.44    0.30
ANN-2  PM10     -1.26  14.47  301.38   0.32
DT-1   PM2.5    1.09   14.67  88.14    0.47
DT-1   PM10     0.39   25.60  294.31   0.51
DT-2   PM2.5    -0.32  16.92  47.75    0.33
DT-2   PM10     1.88   25.97  144.12   0.59
RF-1   PM2.5    0.22   13.34  104.94   0.42
RF-1   PM10     0.16   24.65  346.90   0.46
RF-2   PM2.5    -0.67  13.78  109.38   0.34
RF-2   PM10     0.35   24.07  373.01   0.51
MLR-1  PM2.5    0.01   11.02  82.99    0.40
MLR-1  PM10     0.10   19.81  215.34   0.43
MLR-2  PM2.5    0.01   11.02  82.99    0.40
MLR-2  PM10     0.15   19.64  215.58   0.44
PER    PM2.5    -2.32  19.19  8.88     0.24
PER    PM10     -0.81  30.38  150.35   0.27

Table 5.12: Comparison of temporal prediction models on SEPA datasets

ANN-1 and ANN-2 achieved the best results on SEPA datasets with relatively low RMSD values (which differ by at most 0.86 (11%) and 1.93 (13%) for PM2.5 and PM10). Unfortunately, index of agreement values were low for the ANN-1 and ANN-2 models as well. The PER model performed the worst of all models, which was expected as it is a very naive prediction method. Its RMSD values (19.19 and 30.38) are more than twice those obtained with the ANN models and roughly 1.5 to 1.7 times those obtained with MLR-2.
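For reference, the indicators compared in these tables can be computed as in the sketch below. It assumes the standard definitions (average bias as the mean of predicted minus observed, and Willmott's index of agreement for d); the definitions given earlier in the report take precedence if they differ, and SD Diff is omitted because its exact definition is not restated in this chapter.

```python
import math


def average_bias(predicted, observed):
    """Mean signed error; positive means over-prediction (assumed convention)."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def rmsd(predicted, observed):
    """Root mean square deviation."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def index_of_agreement(predicted, observed):
    """Willmott's d: 1 is a perfect match, values near 0 indicate no agreement."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for p, o in zip(predicted, observed)
    )
    return 1.0 - numerator / denominator


# Toy example: every prediction is exactly one unit too high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

ab = average_bias(predicted, observed)
error = rmsd(predicted, observed)
d = index_of_agreement(predicted, observed)
```

The toy case shows why RMSD and d are reported together: a constant offset gives a clearly non-zero RMSD while d stays fairly high, because the predictions still follow the shape of the observations.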

5.3.2 STL datasets

In Table 5.13 we observe that a few prediction models performed relatively well in terms of RMSD: ANNs, DT-1 and RF-1. DT-1 and RF-1 models performed best on the PM10 dataset and slightly worse on the PM2.5 dataset, while ANNs performed well on both datasets. ANN models gave the lowest RMSDs on PM2.5, while RF-1 gave the lowest RMSD and the highest d value on PM10.

As expected, the PER model performed the worst of all (6.90 and 9.69 RMSDs). Even though the performance of the model is quite poor, it still performed better on STL datasets than most of the models did on SEPA datasets. This emphasizes the importance of the dataset used for temporal predictions.


Model  Dataset  AB     RMSD  SD Diff  d
ANN-1  PM2.5    0.13   3.38  24.10    0.78
ANN-1  PM10     3.10   4.82  30.13    0.49
ANN-2  PM2.5    -4.05  3.85  10.23    0.60
ANN-2  PM10     -3.24  4.28  41.15    0.62
DT-1   PM2.5    0.05   4.75  20.33    0.79
DT-1   PM10     -0.02  3.35  -0.45    0.88
DT-2   PM2.5    0.02   4.82  19.78    0.79
DT-2   PM10     0.07   6.65  40.60    0.75
RF-1   PM2.5    0.04   4.63  21.232   0.80
RF-1   PM10     0.16   3.11  14.15    0.90
RF-2   PM2.5    0.05   4.67  21.82    0.80
RF-2   PM10     -0.38  5.66  34.49    0.81
MLR-1  PM2.5    -0.01  4.44  19.47    0.81
MLR-1  PM10     0.01   6.56  42.34    0.75
MLR-2  PM2.5    -0.13  4.48  19.28    0.81
MLR-2  PM10     0.204  6.63  42.57    0.75
PER    PM2.5    0.02   6.90  -0.05    0.68
PER    PM10     -0.01  9.69  0.03     0.62

Table 5.13: Comparison of temporal prediction models on STL datasets

Figure 5.4: ANN-1 model’s performance on STL PM2.5 dataset.


Figure 5.5: RF-1 model’s performance on STL PM10 dataset.

From Figures 5.4 and 5.5 we see that in both cases the models fail to accurately predict large concentrations (the predicted values are too low), while lower values (< 40 µg/m³) are predicted quite accurately.

5.4 Spatial predictions

Figure 5.6: All mobile data collected during data collection period

Dataset         PM2.5 AVG  PM2.5 SD  PM10 AVG  PM10 SD
Mobile dataset  13.08      80.38     26.17     375.79

Table 5.14: Statistical properties of mobile dataset (MOB)


Four datasets were used for spatial predictions: the mobile (MOB) and stationary (SEPA) datasets, each of which included PM2.5 and PM10 concentrations.

The mobile dataset contained 8536 data items. From Table 5.14 we see that the average and SD of the PM10 dataset were significantly larger than those of the PM2.5 dataset.

All of the interpolation techniques were tested on the mobile dataset and the hourly averaged stationary sensor dataset. In the experiments on stationary data, the sensor whose data was being predicted was left out and only the data from the remaining 6 sensors was used. PM2.5 and PM10 datasets were tested separately. The SEPA8 dataset was not used in spatial predictions as it was significantly smaller than all other SEPA datasets because of a malfunction of the device (as described in section 3.3).

The MOB and SEPA datasets had slightly different input sample formats for the machine learning methods, because the coordinates were fixed for stationary data but not for mobile data. A SEPA sample contained the PM concentrations of the 6 other SEPA sensors and a constant (-1) for the missing sensor. The position of each SEPA sensor's PM concentration within the sample was fixed. A sample from the MOB dataset contained the stationary sensor concentrations together with the latitude and longitude of the mobile data item under test. Additionally, 5-fold cross validation was used when testing spatial interpolation with the machine learning methods.
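The two sample formats can be illustrated as follows. This is a sketch under assumptions: the function names, the dictionary-based input and the example readings and coordinates are hypothetical, and only the layout (fixed sensor order, -1 placeholder, appended latitude and longitude) comes from the description above.

```python
# Fixed sensor order, so that each sensor always occupies the same input position.
SENSORS = ["SEPA1", "SEPA2", "SEPA3", "SEPA4", "SEPA5", "SEPA6", "SEPA7"]
MISSING = -1.0  # constant used in place of the left-out sensor


def sepa_sample(readings, target):
    """Build (features, label) for predicting the stationary sensor `target`."""
    features = [MISSING if name == target else readings[name] for name in SENSORS]
    return features, readings[target]


def mob_sample(readings, latitude, longitude, measured):
    """Build (features, label) for one mobile data item."""
    features = [readings[name] for name in SENSORS] + [latitude, longitude]
    return features, measured


# Made-up hourly readings for illustration only.
readings = {"SEPA1": 5.0, "SEPA2": 9.0, "SEPA3": 11.0, "SEPA4": 20.0,
            "SEPA5": 8.0, "SEPA6": 12.0, "SEPA7": 10.0}

sepa_features, sepa_label = sepa_sample(readings, "SEPA3")
mob_features, mob_label = mob_sample(readings, 55.941, -3.191, 7.5)
```

Keeping the sensor positions fixed lets one model be trained for all left-out sensors, since the position of the -1 placeholder tells the model which sensor is being predicted.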

5.4.1 IDW model

Different power values can be used in the IDW formula. Therefore, an experiment was performed to choose the optimal power value (p) for IDW. As we can see from Table 5.15, IDW performs best, or within 0.01 RMSD of the best, when p is 2 for both PM2.5 and PM10 datasets. It is interesting that the model performed particularly poorly when the power values were 1, 3, 5 and 9, while for all other values the differences in performance were much less significant.

p value      1        2      3       4      5      6      7      8      9      10
PM2.5 SEPA   11.59    8.46   17.05   8.79   25.30  9.17   10.82  9.38   9.94   9.48
PM10 SEPA    18.36    12.98  23.79   14.26  33.46  15.01  16.87  15.34  16.00  15.47
PM2.5 MOB    1195.05  10.37  235.29  10.45  27.41  10.44  12.92  10.43  26.54  10.42
PM10 MOB     2439.67  18.75  283.32  18.79  54.10  18.76  25.44  18.74  56.09  18.74

Table 5.15: Performance of IDW method with different power values (RMSD)
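To make the role of the power value concrete, the following sketch implements the standard IDW estimate, a weighted average with weights 1 / distance^p. It is a simplified illustration rather than the project's code: it assumes planar coordinates and Euclidean distance, and the sensor positions and values are made up.

```python
import math


def idw(target, sensors, p=2):
    """Inverse distance weighted estimate at `target`.

    `sensors` is a list of ((x, y), value) pairs in planar coordinates
    (an assumption; geographic coordinates would need a suitable distance).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in sensors:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a sensor
        weight = 1.0 / distance ** p
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two made-up sensors on a line, 2 units apart.
sensors = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]

midpoint = idw((1.0, 0.0), sensors, p=2)       # equal weights
near_first = idw((0.5, 0.0), sensors, p=2)     # dominated by the first sensor
near_first_p1 = idw((0.5, 0.0), sensors, p=1)  # lower p, flatter weighting
```

A larger p makes the nearest sensor dominate the estimate, while a smaller p moves the estimate towards the plain average of all sensors.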

5.4.2 RBF model

Different functions could be used for interpolation with RBF and an experiment wasperformed to choose the optimal one.


Function      PM2.5 SEPA  PM10 SEPA  PM2.5 MOB  PM10 MOB
Multiquadric  17.09       25.60      13.95      22.71
Inverse       12.37       17.78      9.25       16.70
Gaussian      14.50       21.40      8.67       15.71
Linear        19.10       28.40      16.56      25.34
Cubic         131.56      192.31     206.19     323.30
Quintic       1809.22     2743.58    58172.37   92340.12
Thin plate    36.52       53.72      25.82      37.18

Table 5.16: Performance of different radial-basis functions (RMSD)

As we can see from Table 5.16, the Inverse and Gaussian functions gave the best results. The Inverse function gave the best fit for the PM2.5 and PM10 SEPA datasets, whereas the Gaussian function gave the best results on the MOB datasets. The Quintic function was not able to fit any of the datasets and gave the worst results. It is interesting that even though IDW with its best power value performed better on the SEPA datasets, RBF with the Inverse and Gaussian functions performed better on the MOB datasets.

5.4.3 ANN model

The same TPE hyperparameter optimization technique that was used for the temporal prediction ANN models was used for the spatial interpolation models. The hyperparameters that were optimized were:

• number of hidden units in each hidden layer (1 - 20)

• number of hidden layers (0 - 5)

• recurrent network (true or false)

• number of learning epochs (10 - 200)

• trainer (Backprop or RPropMinus).
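The search space listed above can be written down explicitly, as in the sketch below. Note the substitution: plain random sampling is used here as a simple stand-in for TPE so that the example needs only the standard library, and `toy_loss` is a hypothetical placeholder for training an ANN and measuring its RMSD.

```python
import random


def sample_config(rng):
    """Draw one configuration from the ranges listed above."""
    return {
        "hidden_units": rng.randint(1, 20),   # per hidden layer
        "hidden_layers": rng.randint(0, 5),
        "recurrent": rng.choice([True, False]),
        "epochs": rng.randint(10, 200),
        "trainer": rng.choice(["Backprop", "RPropMinus"]),
    }


def random_search(objective, n_trials, seed=0):
    """Random search (a stand-in for TPE): keep the configuration with the lowest loss."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = objective(config)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best


def toy_loss(config):
    # Hypothetical stand-in for "train the ANN with `config`, return RMSD".
    return abs(config["epochs"] - 150) + config["hidden_layers"]


best_loss, best_config = random_search(toy_loss, n_trials=50)
```

TPE differs from this stand-in in that it uses the losses of earlier trials to decide where to sample next, which usually reaches good configurations in fewer trials.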

Hyperparameter optimization gave a significant improvement in the performance of the ANN models, and the models achieved very good interpolation results with optimal parameters. With the default parameters the ANN model gave RMSDs of 16.83 and 23.24 on SEPA datasets, whereas after optimization the RMSDs were 5.84 and 5.56 (for PM2.5 and PM10). For SEPA PM2.5 the model performed best on SEPA4 and SEPA5 (low RMSDs and high d values) and worse on the others, while for SEPA PM10 the model performed exceptionally well on the SEPA1, SEPA2, SEPA3 and SEPA4 datasets and worse on the SEPA5, SEPA6 and SEPA7 datasets (high d value but also quite high RMSD). The results on MOB datasets were good as well (RMSDs of 2.33 and 6.00).

For all datasets the number of learning epochs chosen by hyperparameter optimizationalgorithm was large (> 100). In addition, all of the models used RPropMinus trainerand none of the models were recurrent.


5.4.4 RF model

The same technique that was used to optimize the temporal prediction models (grid search over the entire parameter space) was used to optimize the spatial prediction RF models. No input window size optimization was required in this case, and only the number of trees (range 1 - 10) and maximum tree depth (range 1 - 30) were optimized.

For SEPA PM2.5 the optimal maximum tree depth was 29 and the optimal number of trees was 5; for SEPA PM10 they were 26 and 10, respectively. For MOB PM2.5 the optimal maximum tree depth was 3 and the optimal number of trees was 5, while for MOB PM10 both values were 1. After optimization the model gave very good results for both SEPA and MOB datasets.

5.4.5 NN model

The nearest neighbour (NN) model was used as a baseline method for comparison. No optimization was required for this method as it simply takes the PM concentration value of the nearest SEPA sensor.

5.5 Comparison of spatial prediction methods

Model    Dataset  AB     RMSD   SD Diff  d
IDW      PM2.5    0.26   7.98   45.88    0.88
IDW      PM10     0.23   12.56  72.84    0.88
Kriging  PM2.5    0.09   12.85  -61.74   0.82
Kriging  PM10     0.93   15.97  -128.47  0.85
RBF      PM2.5    2.47   12.37  100.40   0.78
RBF      PM10     3.68   17.79  160.82   0.80
NN       PM2.5    -0.65  13.67  12.03    0.81
NN       PM10     1.26   19.32  13.35    0.81
ANN      PM2.5    -1.15  5.84   156.99   0.93
ANN      PM10     1.82   5.56   47.99    0.92
RF       PM2.5    0.49   4.46   66.48    0.96
RF       PM10     1.42   5.05   122.25   0.97

Table 5.17: Performance of different spatial interpolation methods on SEPA datasets

From Table 5.17 one observes that the machine learning spatial prediction methods significantly outperformed all the others. The RF method performed slightly better than the ANN method, but both gave good results. IDW was the averaging-based method that gave the best results (7.98 and 12.56 RMSDs). All other non-ML interpolation models gave RMSDs higher than 10; however, the index of agreement was relatively high (at least 0.78) for all models. Only the RBF method had a high AB value, which could be explained by the fact that it is not an averaging-based method. The NN method gave very low SD Diff values because it used only the closest data items from the dataset in its predictions, and no averaging or other manipulation of any kind was performed on them. As expected, the NN method performed the worst; nonetheless, its performance was not much worse than that of IDW, kriging and RBF. That was because in most cases the values of all SEPA datasets were highly correlated, similar and close to the average.

Model    Dataset  AB     RMSD   SD Diff  d
IDW      PM2.5    7.96   10.08  44.41    0.49
IDW      PM10     13.41  17.91  123.21   0.48
Kriging  PM2.5    8.63   10.50  43.62    0.50
Kriging  PM10     14.34  18.27  112.52   0.48
RBF      PM2.5    6.39   9.25   47.39    0.46
RBF      PM10     11.03  16.70  132.15   0.47
NN       PM2.5    7.97   10.41  51.30    0.48
NN       PM10     13.27  18.11  131.30   0.47
ANN      PM2.5    0.98   2.33   4.35     0.84
ANN      PM10     0.20   6.00   36.68    0.20
RF       PM2.5    -0.19  2.08   4.85     0.62
RF       PM10     -0.23  6.02   37.71    0.19

Table 5.18: Performance of different spatial interpolation methods on MOB datasets

From Table 5.18 we see that the machine learning methods performed much better than all the others. Both ANN and RF gave low RMSDs, but their index of agreement was low for PM10 (0.20 and 0.19). One of the factors that might have contributed to the poor results of the other methods was that PM concentration levels measured by stationary sensors were slightly higher than the ones measured by the mobile sensors. As a result, all of the averaging methods predicted values that were too high. This is also reflected in the AB values, which are high and positive for all of these methods.

There are a few possible reasons why AirSpeck sensors might have measured lower values than SEPA ones. Firstly, SEPA sensors were stationary while AirSpeck devices were moving. Secondly, AirSpeck devices were mounted at a height of about 1 metre while SEPA devices were mounted at around 2 metres. Thirdly, in the SEPA device the Alphasense OPC-N2 (the PM concentration measuring device) is enclosed in a box, while in AirSpeck it is not. As a result air circulation was better for the mobile sensor. Fourthly, the AirSpeck sensor made measurements at a higher frequency than SEPA, and this might have affected the PM concentration levels measured.

From Figures 5.7 and 5.8 we see that lower concentration values are predicted better than higher ones. Moreover, SEPA1 is furthest away from the perfect-fit line and is close to the x-axis, which means that too large concentration values were predicted for the SEPA1 datasets. That happened because SEPA1 was installed in the least polluted area and had the lowest average of all SEPA datasets (see Table 5.1). Data items from the SEPA4 datasets are positioned closest to the x-axis because the SEPA4 datasets had averages much larger than all the others. The same trends were observed for both PM2.5 and PM10 datasets.

Figure 5.7: RF model performance on SEPA PM2.5 datasets

Figure 5.8: RF model performance on SEPA PM10 datasets

5.6 Spatio-temporal predictions

An experiment was performed to see how accurately we could make spatio-temporal predictions. The best temporal prediction model on the SEPA dataset (ANN-1) was used together with the best performing spatial prediction model on the MOB dataset (ANN). For each data item, the temporal prediction model forecasted PM concentrations 12 hours ahead for each of the SEPA sensors; the last forecast hour from each SEPA dataset was then used for spatial interpolation, and the result was compared with the value measured by the mobile sensor.
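The two-stage procedure described above can be summarised as a small pipeline. The sketch below only shows how the stages are chained: `repeat_last` and `plain_mean` are hypothetical stand-ins for the temporal and spatial ANN models, and the sensor histories and coordinates are made up.

```python
def spatio_temporal_predict(histories, forecast, interpolate, location, horizon=12):
    """Forecast each sensor `horizon` hours ahead, then interpolate at `location`.

    `histories` maps a sensor name to its list of past hourly PM values.
    """
    final_hour = {}
    for name, series in histories.items():
        predictions = forecast(series, horizon)  # one value per forecast hour
        final_hour[name] = predictions[-1]       # keep only the last hour
    return interpolate(final_hour, location)


# Hypothetical stand-ins so that the sketch runs; they are NOT the ANN models.
def repeat_last(series, horizon):
    """Temporal stand-in: repeat the most recent observation."""
    return [series[-1]] * horizon


def plain_mean(values_by_sensor, location):
    """Spatial stand-in: ignore the location and average all sensors."""
    return sum(values_by_sensor.values()) / len(values_by_sensor)


histories = {"SEPA1": [8.0, 9.0, 10.0], "SEPA2": [12.0, 13.0, 14.0]}
estimate = spatio_temporal_predict(histories, repeat_last, plain_mean,
                                   location=(55.941, -3.191))
```

Passing the two models in as functions keeps the stages independent, so either one can be replaced (for example ANN by RF) without changing the pipeline.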

When constructing training and testing datasets for the temporal prediction ANN-1 model, all data except for the data sample under test was used for training the model.

Dataset  AB    RMSD  SD Diff  d
PM2.5    0.01  1.96  3.57     0.77
PM10     0.03  5.60  29.37    0.54

Table 5.19: Performance of spatio-temporal predictions model on MOB datasets

From Table 5.19 we see that a low RMSD can be achieved by combining temporal and spatial predictions. A very low RMSD and a high d value were achieved for PM2.5, and slightly worse performance was achieved for the PM10 dataset. RMSD values for both datasets were lower than those obtained when spatial predictions were made independently of temporal prediction. The reason was that both the temporal and the combined models were trained for each sample individually, and consequently a larger dataset was used for training.

However, in this project all of the spatial interpolation points were within a short distance of the stationary sensors, which made predictions easier.

Chapter 6

Conclusions

The current method for forecasting air quality is to use data from a very small number of very expensive and highly accurate air quality monitoring stations. A lot of effort goes into developing sophisticated fluid dynamics models for estimating the spatial distribution of air quality on a city-wide basis.

This project is concerned with the use of a network of inexpensive stationary and mobile air quality sensors for airborne particulates (PM2.5 and PM10), and with using their data to estimate air quality in different parts of the city. Such an approach is radically different and is therefore worthy of further investigation and of developing new techniques for spatial and temporal forecasting.

The analysis is based on real data collected from a network of seven stationary monitors deployed for one week in the Meadows area of Edinburgh. This work has adopted a data-driven approach, i.e. no chemical or atmospheric models are used for estimating the air quality. Instead, machine learning and statistical methods have been applied to the data and the prediction results have been reported.

Artificial neural networks and random forests were found to be the best methods for temporal PM concentration predictions. The ANN-1 model gave average RMSDs of 7.54 and 14.27 for the SEPA PM2.5 and PM10 datasets. For STL datasets the best results were achieved with ANN-1 for the PM2.5 dataset (3.38 RMSD) and with the RF-1 model for the PM10 dataset (3.11 RMSD). Such results were expected because SDs for STL datasets were significantly smaller (46.02 and 76.38 for PM2.5 and PM10) than those of SEPA datasets (259.26 and 519.41) and the number of large spikes for SEPA was greater.

It was confirmed that the dataset has an important impact on the performance of the models. All models performed much better on the STL dataset, which was significantly larger than SEPA. Furthermore, predictions for PM2.5 datasets, which had lower variation and fewer outliers than PM10 ones, were more accurate in most cases.

For spatial predictions it was found that the RF and ANN machine learning methods outperform the traditional geostatistical and distance-based interpolation methods (for both SEPA and MOB datasets). Interpolation results on both datasets were quite good: the optimal RMSDs on SEPA were 4.46 and 5.05, while for MOB they were 2.08 and 6.00.


Spatio-temporal predictions can be performed with quite good accuracy (using ANN-1 and RF methods) when interpolation is performed within a short radius. When the best spatial and temporal predictors were combined, the performance of spatio-temporal prediction in the area defined by the sensors improved by 0.14 and 0.42 RMSD (for PM2.5 and PM10).

Recommendations for future work would be to perform a longer data collection using both AirSpeck and SEPA sensors for better spatial and temporal predictions. The prediction methods developed in this work should be applied to larger stationary networks distributed over longer distances. Other methods worth exploring include hierarchical Bayesian or kernel convolution models for spatial predictions and hidden semi-Markov models for temporal predictions [41, 35]. This project has advocated a purely data-driven approach. Future work could consider hybrid models which include traffic information to improve predictions, given the influence of vehicles on pollution in urban environments.

To improve the implementation of both applications, it would be useful to be able to view spatio-temporal predictions on the map and track how these predictions change at certain locations (e.g. the user’s home). Moreover, it would be useful to have functionality for analysing historical mobile data on a map.

Bibliography

[1] Alphasense OPC-N2 sensor specifications. http://www.alphasense.com/index.php/products/optical-particle-counter/. [Online; accessed 15-March-2016].

[2] AngularJS. https://angularjs.org/. [Online; accessed 15-March-2016].

[3] Australia department of environment report on particulate matter. http://www.npi.gov.au/resource/particulate-matter-pm10-and-pm25. [Online; accessed 15-March-2016].

[4] Department for Environment, Food and Rural Affairs. http://uk-air.defra.gov.uk/. [Online; accessed 15-March-2016].

[5] Developing spatially interpolated surfaces and estimating uncertainty. http://www3.epa.gov/airtrends/specialstudies/dsisurfaces.pdf/. [Online; accessed 15-March-2016].

[6] Distributed asynchronous hyperparameter optimization Python library. https://github.com/hyperopt/hyperopt. [Online; accessed 15-March-2016].

[7] EU Air Quality Standards. http://ec.europa.eu/environment/air/quality/standards.htm. [Online; accessed 15-March-2016].

[8] How inverse distance weighted interpolation works. http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=How_Inverse_Distance_Weighted_(IDW)_interpolation_works/. [Online; accessed 15-March-2016].

[9] How to evaluate machine learning models: hyperparameter tuning. http://blog.dato.com/how-to-evaluate-machine-learning-models-part-4-hyperparameter-tuning/. [Online; accessed 15-March-2016].

[10] Introduction to RBF. http://www.alglib.net/interpolation/introductiontorbfs.php. [Online; accessed 15-March-2016].

[11] Kriging. http://people.ku.edu/~gbohling/cpe940/Kriging.pdf/. [Online; accessed 15-March-2016].

[12] Long short-term memory. http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf/. [Online; accessed 15-March-2016].


[13] Particulate matter: little things can cause big problems. http://www.hcdoes.org/airquality/monitoring/pm.htm/.

[14] Pybrain - machine learning library for Python documentation. http://pybrain.org/docs/. [Online; accessed 15-March-2016].

[15] PyKriging - Python kriging toolbox. http://pykriging.com. [Online; accessed 15-March-2016].

[16] Rprop. http://theanets.readthedocs.org/en/v0.5.3/generated/theanets.trainer.Rprop.html/. [Online; accessed 15-March-2016].

[17] TD. http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=DCB72FB7182C92B09F5DE1A68C44D204?doi=10.1.1.27.7876&rep=rep1&type=pdf/. [Online; accessed 15-March-2016].

[18] The Pyramid web framework. http://www.pylonsproject.org/. [Online; accessed 15-March-2016].

[19] Understanding particle pollution. http://www3.epa.gov/airtrends/aqtrnd04/pmreport03/pmunderstand_2405.pdf/. [Online; accessed 15-March-2016].

[20] Weather Station Data. http://www.ed.ac.uk/geosciences/weather-station/weather-station-data/. [Online; accessed 15-March-2016].

[21] Why Should You Be Concerned About Air Pollution? http://www3.epa.gov/airquality/peg_caa/concern.html/. [Online; accessed 15-March-2016].

[22] N. Spyrellis A. Chaloukakou, G. Grivas. Neural network and multiple regression models for PM10 prediction in Athens: a comparative assessment.

[23] D. Jacob A. Tai, L. Mickley. Correlation between fine particulate matter (PM2.5) and meteorological variables in the United States: Implications for the sensitivity of PM2.5 to climate change.

[24] N. A. Ramli H. A. Hamid A.Z. UI-Saufie, A. S. Yahya. Comparison between multiple linear regression and feed forward back propagation neural network models for predicting PM10 concentration level based on gaseous and meteorological parameters.

[25] S. A. Perlin D. W. Wong, L. Yuan. Comparison of spatial interpolation methods for the estimation of air quality data.

[26] A. Chaloulakou G. Grivas. Artificial neural network models for prediction of PM10 hourly concentrations in the Greater Area of Athens, Greece. Atmospheric Environment, 40:1216–1229, 2006.

[27] K.K.W. Yau G.K.F. Tso. Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks. Energy, 32:1761–1768, 2007.


[28] J. Keller S. Henne A. S. H. Prevot I. Barmpadimos, C. Hueglin. Influence of meteorology on PM10 trends and variability in Switzerland from 1991 to 2008.

[29] R.S. Capuz R. E. Salaraz J. B. Ordieres, E. P. Vergara. Neural network prediction model for fine particulate matter (PM2.5) on the US-Mexico border in El Paso (Texas) and Ciudad Juarez (Chihuahua).

[30] A. D. Heap J. Li. Spatial interpolation methods applied in environmental sciences: A review. Environmental Modelling and Software, 53:173–189.

[31] A. Potter J. J. Daniell J. Li, A. D. Heap. Application of machine learning methods to spatial interpolation of environmental variables. Environmental Modelling and Software, 26:1647–1659, 2011.

[32] J. Schwartz F. Laden R. Puett H. H. Suh J.D. Yanosky, C.J. Paciorek. Spatio-temporal modeling of chronic PM10 exposure for the Nurses’ Health Study. Atmospheric Environment, 42:4047–4062, 2008.

[33] Konstantin Kotsev. Sensing spaces: A study in mapping the public space from personal data. Bachelor’s Thesis, University of Edinburgh, 2015.

[34] J. Zhou L. Li, J. Gong. Spatial interpolation of fine particulate matter concentrations using the shortest wind-field path distance. Journal Plos One, 9, 2014.

[35] Y. Kuang D. He S. Erdal D. Kenski M.Dong, D. Yang. PM2.5 concentration prediction using hidden semi-Markov model-based time series data mining. Expert Systems with Applications, 36:9046–9055.

[36] Y. Kuang D. He S. Erdal D. Kenski M.Dong, D.Yang. PM2.5 concentration prediction using hidden semi-Markov model-based time series data mining. Expert Systems with Applications, 36:9046–9055, 2009.

[37] Aart Meijer. Predicting air quality using personal exposure monitors on cyclists. Master’s Thesis, University of Edinburgh, 2015.

[38] J. Reyes P. Perez. An integrated neural network model for PM10 forecasting.

[39] J. Reyesh P. Perez, A. Trier. Prediction of PM2.5 concentrations several hours in advance using neural networks in Santiago, Chile. Atmospheric Environment, 34:1189–1196, 2000.

[40] T. Shen P.Zhang. Comparison of different spatial interpolation methods for atmospheric pollutant PM2.5 by using GIS and Spearman correlation. Journal of Chemical and Pharmaceutical Research, 7:452–469, 2015.

[41] K. V. Mardia S. K. Sahu. Recent trends in modeling spatio-temporal data. Meeting of the Italian Statistical Society on Statistics and the Environment, pages 69–83, 2005.

[42] R. B. Jacko S. Thomas. Model for forecasting expressway PM2.5 concentration - Application of regression and neural network models. The Journal of Air and Waste Management Association, 57:480–488.


[43] M. S. Boori V. Vozenilek V. Nevtipilova, J. Pastwa. Testing artificial neural network (ANN) for spatial interpolation. International Journal of Geology and Geosciences, 3:1–9, 2014.

Appendix A

Data sources

Figure A.1: JCMB monitoring station location with respect to SEPA sensors

The distance between the JCMB monitoring station and the Meadows, where all of the stationary sensors were mounted, was approximately 3 km.


Dataset  WD PM2.5  WD PM10  WS PM2.5  WS PM10  ST PM2.5  ST PM10  RH PM2.5  RH PM10
SEPA1    -0.02     0.16     -0.17     -0.07    -0.14     -0.20    0.11      -0.02
SEPA2    -0.16     0.11     -0.20     -0.15    -0.27     -0.42    0.16      -0.06
SEPA3    0.14      0.29     -0.29     -0.27    -0.39     -0.50    0.26      0.20
SEPA4    -0.21     -0.07    -0.33     -0.33    -0.29     -0.45    0.34      0.23
SEPA5    -0.06     0.20     -0.22     -0.11    -0.30     -0.36    0.15      -0.05
SEPA6    -0.25     -0.15    -0.32     -0.33    -0.31     -0.42    0.29      0.19
SEPA7    -0.22     -0.08    -0.30     -0.27    -0.29     -0.41    0.37      0.23
STL      0.18      0.02     -0.01     -0.06    0.02      -0.06    0.09      -0.02

Table A.1: Meteorological data correlation with SEPA and STL datasets
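A coefficient of the kind tabulated above can be sketched as follows. This is a minimal illustration that assumes a plain Pearson correlation; this appendix does not state which coefficient was used. The function name `pearson` and the sample readings are hypothetical and invented for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hourly readings (assumption: not taken from the report).
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]
pm25 = [20.0, 18.0, 15.0, 11.0, 9.0]
r = pearson(wind_speed, pm25)
print(round(r, 2))
```

A negative value here would correspond to concentrations falling as wind speed rises, which is the sign seen in most of the WS columns of Table A.1.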

Appendix B

Model parameters

B.1 Temporal prediction models

B.1.1 ANN models

Dataset      AF       RC     WS  HL   HU
SEPA1 PM2.5  LSTM     True   23   2    2
SEPA1 PM10   Sigmoid  True   13   5   14
SEPA2 PM2.5  LSTM     True   21   5    3
SEPA2 PM10   Sigmoid  True    6   3   19
SEPA3 PM2.5  Sigmoid  True    3   2    2
SEPA3 PM10   LSTM     False   3   3    8
SEPA4 PM2.5  Tanh     True   24   5   18
SEPA4 PM10   Tanh     True   24   2    1
SEPA5 PM2.5  LSTM     False  24   0    -
SEPA5 PM10   Sigmoid  True    3   3   11
SEPA6 PM2.5  Tanh     True   24   3    5
SEPA6 PM10   Tanh     True   24   2    8
SEPA7 PM2.5  LSTM     False  21   0    -
SEPA7 PM10   Sigmoid  True   15   3    5
STL PM2.5    Linear   False  18   1    1
STL PM10     Linear   False  18   1    1

Table B.1: Parameters of ANN-1 models. Activation function (AF), recurrent (RC), window size (WS), hidden layers (HL), hidden units per layer (HU)


Dataset      AF       RC     WS  HL   HU
SEPA1 PM2.5  Sigmoid  True   23   4    2
SEPA1 PM10   Tanh     False   2   3   12
SEPA2 PM2.5  Tanh     True   21   5    3
SEPA2 PM10   Sigmoid  False   3   1   18
SEPA3 PM2.5  Tanh     True    9   3    7
SEPA3 PM10   LSTM     False   5   1   43
SEPA4 PM2.5  Tanh     True   24   2   14
SEPA4 PM10   LSTM     True   23   2    9
SEPA5 PM2.5  LSTM     True   23   2    9
SEPA5 PM10   Tanh     True    9   3    7
SEPA6 PM2.5  LSTM     True   23   2    9
SEPA6 PM10   LSTM     True   23   2    9
SEPA7 PM2.5  LSTM     True   23   2    9
SEPA7 PM10   Sigmoid  True   22   2   10
STL PM2.5    LSTM     True   22   1  132
STL PM10     LSTM     True   22   1  132

Table B.2: Parameters of ANN-2 models. Activation function (AF), recurrent (RC), window size (WS), hidden layers (HL), hidden units per layer (HU)

B.1.2 DT models

       SEPA1     SEPA2     SEPA3     SEPA4     SEPA5     SEPA6     SEPA7     STL
PM2.5  (23, 1)   (18, 2)   (2, 1)    (15, 1)   (24, 2)   (24, 1)   (24, 1)   (3, 4)
PM10   (1, 1)    (18, 2)   (1, 1)    (24, 1)   (23, 1)   (24, 1)   (23, 1)   (11, 30)

Table B.3: Optimized parameters for DT-1. (window size, maximum DT depth)

       SEPA1     SEPA2     SEPA3     SEPA4     SEPA5     SEPA6     SEPA7     STL
PM2.5  (18, 1)   (24, 1)   (5, 1)    (20, 1)   (24, 2)   (23, 1)   (23, 1)   (3, 3)
PM10   (3, 1)    (17, 1)   (4, 1)    (17, 1)   (13, 1)   (24, 1)   (14, 2)   (15, 5)

Table B.4: Optimized parameters for DT-2. (window size, maximum DT depth)
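The window size parameter in the tables above controls how many past readings form the input to a model. A minimal sketch of that idea follows, assuming the target is the single reading immediately after the window; the exact windowing and prediction horizon used in the report are not stated in this appendix. The helper name `make_windows` and the sample readings are hypothetical.

```python
def make_windows(series, window_size):
    """Turn a time series into (features, target) pairs.

    Each feature vector holds `window_size` consecutive readings and the
    target is the reading that immediately follows them (an assumption).
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    pairs = []
    for end in range(window_size, len(series)):
        pairs.append((series[end - window_size:end], series[end]))
    return pairs


# Hypothetical hourly PM2.5 readings (assumption: not taken from the report).
readings = [12.0, 14.0, 13.0, 15.0, 18.0]
pairs = make_windows(readings, window_size=3)
for features, target in pairs:
    print(features, "->", target)
```

With five readings and a window of three, only two training pairs are produced, so larger windows trade available training examples for more context per example.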


B.1.3 RF models

       SEPA1         SEPA2         SEPA3         SEPA4         SEPA5         SEPA6         SEPA7
PM2.5  (19, 1, 3)    (21, 29, 10)  (3, 1, 9)     (23, 1, 8)    (24, 2, 8)    (24, 2, 8)    (24, 1, 8)
PM10   (1, 1, 2)     (19, 3, 8)    (2, 1, 7)     (24, 1, 10)   (21, 1, 10)   (24, 1, 8)    (22, 1, 2)

Table B.5: Optimized RF-1 parameters for SEPA. (window size, maximum DT depth, number of trees in forest)

       SEPA1         SEPA2         SEPA3         SEPA4         SEPA5         SEPA6         SEPA7
PM2.5  (18, 1, 8)    (22, 2, 5)    (8, 1, 5)     (24, 2, 9)    (24, 2, 6)    (24, 2, 5)    (20, 1, 7)
PM10   (3, 13, 10)   (13, 25, 9)   (5, 25, 8)    (24, 1, 7)    (12, 1, 9)    (24, 1, 9)    (13, 2, 8)

Table B.6: Optimized RF-2 parameters for SEPA. (window size, maximum DT depth, number of trees in forest)

       STL RF-1      STL RF-2
PM2.5  (16, 4, 9)    (3, 3, 3)
PM10   (24, 29, 10)  (24, 27, 10)

Table B.7: Optimized RF-1 and RF-2 parameters for STL. (window size, maximum DT depth, number of trees in forest)

B.1.4 MLR models

                             AB     RMSD  SD Diff        d feat. c.
Var. thr. PM2.5 STL       -0.01     4.44    19.14     0.81       24
Var. thr. PM10 STL         0.01     6.56    43.32     0.75       24
Var. thr. PM2.5 SEPA       0.76    15.04    -9.87     0.37       16
Var. thr. PM10 SEPA       -1.74    27.54  -150.96     0.41       24
Univariate PM2.5 STL      -0.01     4.44    19.31     0.81       15
Univariate PM10 STL        0.01     6.56    42.42     0.75       15
Univariate PM2.5 SEPA      0.01    11.02    82.99     0.40        1
Univariate PM10 SEPA       0.10    19.81   215.34     0.43        1
Lasso PM2.5 STL           -0.02     4.44    19.47     0.81        5
Lasso PM10 STL             0.01     6.56    42.37     0.75       18
Lasso PM2.5 SEPA           0.11    12.67    34.90     0.42        5
Lasso PM10 SEPA           -1.59    25.84   -64.38     0.42       14

Table B.8: MLR-1 feature selection performance
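Two of the columns above can be illustrated with a short sketch. It assumes that AB denotes an average (signed) bias computed as predicted minus observed, which this appendix does not define; RMSD is the usual root mean square deviation. The function names and the sample values are hypothetical, and the remaining columns (SD Diff, d, feat. c.) are not covered.

```python
import math


def rmsd(observed, predicted):
    """Root mean square deviation between observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_bias(observed, predicted):
    """Mean signed error; the predicted-minus-observed sign is an assumption."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical concentrations (assumption: not taken from the report).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 18.0]
print(rmsd(observed, predicted), mean_bias(observed, predicted))
```

A bias near zero alongside a large RMSD, as in several SEPA rows, would indicate errors that are large but roughly balanced in sign.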


                             AB     RMSD  SD Diff        d feat. c.
Var. thr. PM2.5 STL       -0.08     4.49    19.10     0.81       60
Var. thr. PM10 STL         0.14     6.60    43.25     0.75       48
Var. thr. PM2.5 SEPA      -1.70    19.71  -160.05     0.31       28
Var. thr. PM10 SEPA       -8.30    38.30  -561.52     0.35       36
Univariate PM2.5 STL      -0.02     4.43    19.45     0.81       10
Univariate PM10 STL        0.01     6.56    42.42     0.75       15
Univariate PM2.5 SEPA      0.01    11.02    82.99     0.40        1
Univariate PM10 SEPA       0.15    19.64   215.58     0.44        1
Lasso PM2.5 STL           -0.02     4.44    19.47     0.81        5
Lasso PM10 STL            -0.15     6.61    42.92     0.75       14
Lasso PM2.5 SEPA          -0.52    15.35   -17.90     0.34       10
Lasso PM10 SEPA           -1.90    28.05  -317.89     0.44       27

Table B.9: MLR-2 feature selection performance

B.2 Spatial prediction models

B.2.1 ANN models

Dataset      AF       RC     HL  HU   LE  T
SEPA PM2.5   Linear   False   1   9  180  RPropMinus
SEPA PM10    Linear   False   1   9  180  RPropMinus
MOB PM2.5    Sigmoid  False   3   3  110  RPropMinus
MOB PM10     Tanh     False   2   3  120  RPropMinus

Table B.10: Parameters of ANN models. Activation function (AF), recurrent (RC), hidden layers (HL), hidden units per layer (HU), learning epochs (LE), trainer (T)
