

Research Article
A DBN-Based Deep Neural Network Model with Multitask Learning for Online Air Quality Prediction

Jiangeng Li,1,2 Xingyang Shao,1,2 and Rihui Sun1,2

1 College of Automation, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
2 Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China

Correspondence should be addressed to Jiangeng Li; lijg@bjut.edu.cn

Received 2 February 2019; Revised 17 April 2019; Accepted 20 May 2019; Published 1 July 2019

Academic Editor: Antonio Visioli

Copyright © 2019 Jiangeng Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

To avoid the adverse effects of severe air pollution on human health, we need accurate real-time air quality prediction. In this paper, to improve the prediction accuracy of air pollutant concentration, a deep neural network model with multitask learning (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting of nonlinear systems and tested on the forecast of air quality time series. The MTL-DBN-DNN model can solve several related prediction tasks at the same time by using shared information contained in the training data of different tasks. In the model, the DBN is used to learn feature representations. Each unit in the output layer is connected to only a subset of units in the last hidden layer of the DBN. Such connection effectively avoids the problem that fully connected networks need to juggle the learning of each task while being trained, so that the trained networks cannot get optimal prediction accuracy for each task. The sliding window is used to take the recent data to dynamically adjust the parameters of the MTL-DBN-DNN model. The MTL-DBN-DNN model is evaluated with a dataset from Microsoft Research. Comparison with multiple baseline models shows that the proposed MTL-DBN-DNN achieves state-of-the-art performance on air pollutant concentration forecasting.

1. Introduction

Air pollution is becoming increasingly serious. To protect human health and the environment, accurate real-time air quality prediction is sorely needed.

There are nonlinear and complex interactions among the variables of air quality prediction data. Artificial neural networks can be used as a nonlinear system to express complex nonlinear maps, so they have been frequently applied to real-time air quality forecasting (e.g., [1–5]).

Deep networks have significantly greater representational power than shallow networks [6]. To solve several difficulties of training deep networks, Hinton et al. proposed the deep belief network (DBN) in [7]. A DBN is trained via a greedy layer-wise training method and automatically extracts deep hierarchical abstract feature representations of the input data [8, 9]. Deep belief networks can be used for time series forecasting (e.g., [10–15]). For these reasons, the prediction model proposed in this paper is based on a deep neural network pretrained by a deep belief network.

Multitask learning can improve learning for one task by using the information contained in the training data of other related tasks [16]. Multitask deep neural networks have already been applied successfully to many real problems, such as multilabel learning [17], compound selectivity prediction [18], traffic flow prediction [19], speech recognition [20], categorical emotion recognition [21], and natural language processing [22]. Collobert and Weston demonstrated that a unified neural network architecture, trained jointly on related tasks, provides more accurate prediction results than a network trained only on a single task [22].

Current air quality prediction studies mainly focus on one kind of air pollutant and perform single-task forecasting. The most studied problem is PM2.5 concentration prediction. However, some of the air pollutants we predict are correlated, so the corresponding prediction tasks are related. For example, SO2 and NO2 are related because they may come from the same pollution sources. Studies have shown that sulfate (SO₄²⁻) is a major PM constituent in the atmosphere [23]. And in 2016,

Hindawi
Journal of Control Science and Engineering
Volume 2019, Article ID 5304535, 9 pages
https://doi.org/10.1155/2019/5304535

[Figure 1 plot: Dongcheng Dongsi air pollutant concentration data. Horizontal axis: hour (0 to 1000); vertical axis: concentration (μg/m³, 0 to 350); series: PM2.5, NO2, and SO2.]

Figure 1: The observed data from 7 o'clock on November 30, 2014, to 22 o'clock on January 10, 2015. In the figure, time is measured along the horizontal axis and the concentrations of three kinds of air pollutants (PM2.5, NO2, and SO2) are measured along the vertical axis. There are some missing values in the data sets. Dongcheng Dongsi is the target air-quality-monitor-station selected in this study.

a discovery revealed that the aqueous oxidation of SO2 by NO2 under specific atmospheric conditions is key to efficient sulfate formation and that this chemical reaction led to the 1952 London “Killer” Fog [24]. A study published in the US journal Science Advances also found that fine water particles in the air act as a reactor, trapping sulfur dioxide (SO2) molecules, which interact with nitrogen dioxide (NO2) to form sulfate [25]. Therefore, we can regard the concentration forecasting of these three kinds of pollutants (PM2.5, SO2, and NO2) as related tasks. Figure 1 shows some of the historical monitoring data for the concentrations of the three kinds of pollutants at the target station (Dongcheng Dongsi air-quality-monitor-station) selected in this study. The three kinds of pollutants show almost the same concentration trend. Therefore, the concentration forecasting of the three kinds of pollutants can indeed be regarded as related tasks.

In this paper, based on the powerful representational ability of the DBN and the ability of multitask learning to allow knowledge transfer, a deep neural network model with multitask learning capabilities (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting of nonlinear systems and tested on the forecast of air quality time series. The DBN is used to learn feature representations, and several related tasks are solved simultaneously by using shared representations.

For multitask learning, a deep neural network with local connections is used in the study. Such connection effectively avoids the problem that fully connected networks need to juggle the learning of each task while being trained, so that the trained networks cannot get optimal prediction accuracy for each task. The locally connected architecture can learn both the commonalities and the differences of multiple tasks.

In order to get a better prediction of future concentrations, the sliding window [26, 27] is used to take the recent data to dynamically adjust the parameters of the prediction model.

The rest of the paper is organized as follows. Section 2 presents the background knowledge of multitask learning, deep belief networks, and DBN-DNN and describes the DBN-DNN model with multitask learning (MTL-DBN-DNN). In Section 3, the proposed MTL-DBN-DNN model is applied to a case study of the real-time forecasting of air pollutant concentration, and the results and analysis are shown. Finally, Section 4 presents the conclusions of the paper.

2. Methods

2.1. Multitask Learning. Multitask learning can improve learning for one task by using the information contained in the training data of other related tasks. Multitask learning learns tasks in parallel, and “what is learned for each task can help other tasks be learned better” [16].

Several related problems are solved at the same time by using a shared representation. Related learning tasks can share the information contained in their input data sets to a certain extent. Multitask learning exploits commonalities among different learning tasks, and this exploitation allows knowledge transfer among them. The difference between a neural network with multitask learning capabilities and a simple neural network with multiple output units is the following: in the multitask case, the input feature vector is made up of the features of each task, and the hidden layers are shared by multiple tasks. Multitask learning is often adopted when training data is very limited for the target task domain [28].

2.2. Deep Belief Networks and DBN-DNN. Deep belief networks (DBNs) [29] are probabilistic generative models built by stacking many layers of restricted Boltzmann machines (RBMs), each of which contains a layer of visible units and a layer of hidden units. A DBN can be trained to extract a deep hierarchical representation of the input data using greedy layer-wise procedures. After a layer of RBM has been trained, the representations of the previous hidden layer are used as inputs for the next hidden layer. A schematic representation of a DBN is shown in Figure 2.

A DBN with $l$ hidden layers contains $l$ weight matrices $W^{(1)}, \ldots, W^{(l)}$. It also contains $l + 1$ bias vectors $b^{(0)}, \ldots, b^{(l)}$, with $b^{(0)}$ providing the biases for the visible layer. The probability distribution represented by the DBN is given by

$$P(h^{(l)}, h^{(l-1)}) \propto \exp\left(b^{(l)\top} h^{(l)} + b^{(l-1)\top} h^{(l-1)} + h^{(l-1)\top} W^{(l)} h^{(l)}\right), \tag{1}$$

$$P(h_i^{(k)} = 1 \mid h^{(k+1)}) = \sigma\left(b_i^{(k)} + W_i^{(k+1)\top} h^{(k+1)}\right) \quad \forall i, \ \forall k \in \{1, \ldots, l-2\}, \tag{2}$$

[Figure 2 diagram: a DBN formed by RBM1 (visible layer $v$ with bias $b^{(0)}$, hidden layer $h^{(1)}$ with bias $b^{(1)}$, weights $W^{(1)}$) and RBM2 (hidden layer $h^{(2)}$ with bias $b^{(2)}$, weights $W^{(2)}$).]

Figure 2: A 2-layer deep belief network, stacked from two RBMs, contains a layer of visible units and two layers of hidden units. Here $h^{(1)}$ and $h^{(2)}$ are the state vectors of the hidden layers, $v$ is the state vector of the visible layer, $W^{(1)}$ and $W^{(2)}$ are the matrices of symmetrical weights, $b^{(1)}$ and $b^{(2)}$ are the bias vectors of the hidden layers, and $b^{(0)}$ is the bias vector of the visible layer.

$$P(v_i = 1 \mid h^{(1)}) = \sigma\left(b_i^{(0)} + W_i^{(1)\top} h^{(1)}\right) \quad \forall i. \tag{3}$$

In the case of real-valued visible units, substitute

$$v \sim \mathcal{N}\left(b^{(0)} + W^{(1)\top} h^{(1)}, \beta^{-1}\right) \tag{4}$$

with $\beta$ diagonal for tractability [30]. Here $\sigma(x) = 1/(1 + \exp(-x))$.

The weights from the trained DBN can be used as the initial weights of a DNN [8, 30]:

$$h^{(1)} = \sigma\left(b^{(1)} + v^{\top} W^{(1)}\right), \tag{5}$$

$$h^{(l)} = \sigma\left(b^{(l)} + h^{(l-1)\top} W^{(l)}\right) \quad \forall l \in \{2, \ldots, m\}, \tag{6}$$

and then all of the weights are fine-tuned by applying backpropagation or other discriminative algorithms to improve the performance of the whole network. When a DBN is used to initialize the parameters of a DNN, the resulting network is called a DBN-DNN [31].
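The forward pass in equations (5) and (6) can be illustrated with a minimal pure-Python sketch. The function names (`sigmoid`, `layer_forward`, `dbn_dnn_forward`) and the toy weights are illustrative assumptions, not the authors' implementation; in practice the weights would come from DBN pretraining followed by fine-tuning.

```python
import math


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute sigma(b + inputs^T W) for one layer.

    weights[i][j] connects input unit i to output unit j.
    """
    return [
        sigmoid(biases[j] + sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(len(biases))
    ]


def dbn_dnn_forward(v, weight_list, bias_list):
    """Propagate a visible vector v through all hidden layers in order."""
    h = v
    for W, b in zip(weight_list, bias_list):
        h = layer_forward(h, W, b)
    return h


# Hypothetical toy parameters: 2 visible units, hidden layers of size 2 and 1.
W1 = [[0.0, 1.0], [0.0, -1.0]]
b1 = [0.0, 0.0]
W2 = [[1.0], [1.0]]
b2 = [0.0]
h_top = dbn_dnn_forward([1.0, 1.0], [W1, W2], [b1, b2])
```

With these toy values both first-layer units receive a net input of zero, so each outputs 0.5, and the top unit outputs the sigmoid of their sum.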

2.3. DBN-Based Deep Neural Network Model with Multitask Learning (MTL-DBN-DNN). In this section, a DBN-based multitask deep neural network prediction model is proposed to solve multiple related tasks simultaneously by using shared information contained in the training data of different tasks.

The DBN-DNN prediction model with multitask learning is constructed from a DBN and an output layer with multiple units. The deep belief network is used to extract better feature representations, and several related tasks are solved simultaneously by using shared representations. The sigmoid function is used as the activation function of the output layer.

Each unit in the output layer is connected to only a subset of units in the last hidden layer of the DBN. Let the number of related tasks to be processed be $N$, and let the size of the subset (that is, the ratio of the number of nodes in the subset to the number of nodes in the entire last hidden layer) be $\alpha$; then $1/(N-1) > \alpha > 1/N$. At the locally connected layer, each output node has a portion of hidden nodes that are connected only to it; if the size of this portion is $\beta$, then $0 < \beta < 1/N$. Two adjacent subsets share a specified number of common units.
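One way to picture the locally connected output layer is to build, for each output unit, the list of hidden units it is connected to. The sketch below is only an assumption about the layout: the text fixes the bounds on $\alpha$ and $\beta$ but not how the subsets are placed, so equally sized contiguous windows spread evenly over the last hidden layer are used here. The names `local_subsets` and `connection_mask` are hypothetical.

```python
def local_subsets(n_hidden, n_tasks, alpha):
    """Return one list of hidden-unit indices per output unit.

    Assumed layout (not specified in the paper): equally sized contiguous
    windows, evenly spread so that adjacent windows overlap.
    """
    if n_tasks < 2:
        raise ValueError("need at least two tasks")
    if not (1.0 / n_tasks < alpha < 1.0 / (n_tasks - 1)):
        raise ValueError("alpha must satisfy 1/N < alpha < 1/(N-1)")
    size = round(alpha * n_hidden)
    stride = (n_hidden - size) / (n_tasks - 1)
    subsets = []
    for k in range(n_tasks):
        start = round(k * stride)
        subsets.append(list(range(start, start + size)))
    return subsets


def connection_mask(subsets, n_hidden):
    """Binary mask: mask[k][i] == 1 if output unit k sees hidden unit i."""
    mask = []
    for subset in subsets:
        members = set(subset)
        mask.append([1 if i in members else 0 for i in range(n_hidden)])
    return mask


# 90 units in the last hidden layer and 3 tasks, as in the case study;
# alpha = 0.4 is an illustrative choice inside the allowed range (1/3, 1/2).
subsets = local_subsets(n_hidden=90, n_tasks=3, alpha=0.4)
mask = connection_mask(subsets, n_hidden=90)
```

Under this layout each output unit sees 36 hidden units, adjacent subsets share 9 units, and the fraction of units unique to each output unit stays below 1/3, consistent with the stated bound on $\beta$. Multiplying the output-layer weight matrix elementwise by such a mask is one common way to enforce local connectivity during fine-tuning.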

The MTL-DBN-DNN model is learned with unsupervised DBN pretraining followed by backpropagation fine-tuning. The architecture of the MTL-DBN-DNN model is shown in Figure 3.

Remark. First, pretraining and fine-tuning ensure that the information in the weights comes from modeling the input data [32]. In other words, the network memorizes the information of the training data via the weights. The network needs to learn not only the commonalities of multiple tasks but also their differences. A locally connected network allows a subset of hidden units to be unique to one of the tasks, and unique units can better model the task-specific information. Therefore, fully connected networks do not learn the information contained in the training data of multiple tasks better than locally connected networks. Second, fully connected networks need to juggle (i.e., balance) the learning of each task while being trained, so that the trained networks cannot get optimal prediction accuracy for each task. For these two reasons, the last (fully connected) layer is replaced by a locally connected layer, and each unit in the output layer is connected to only a subset of units in the previous layer. Two adjacent subsets share a specified number of common units.

Input. As long as a feature is statistically relevant to one of the tasks, the feature is used as an input variable to the model.

When the MTL-DBN-DNN model is used for time series forecasting, the parameters of the model can be dynamically adjusted according to the recent monitoring data taken by the sliding window to achieve online forecasting.

The Setting of the Structures and Parameters. The architecture and parameters of the MTL-DBN-DNN can be set according to the practical guide for training RBMs in the technical report [33].

3. Experiments

3.1. Data Set. In this study, we used a data set that was collected in the Urban Air project (Urban Computing Team, Microsoft Research) over a period of one year (from 1 May 2014 to 30 April 2015) [34]. There are missing values in the data, so the data was preprocessed in this study. We chose the Dongcheng Dongsi air-quality-monitor-station, located in Beijing, as the target station. The hourly concentrations of PM2.5, NO2, and SO2 at the station were predicted 12 hours in advance.

[Figure 3 diagram: MTL-DBN-DNN, consisting of a DBN (input layer and 1st to 4th hidden layers) and an output layer.]

Figure 3: The schematic representation of the DBN-DNN model with multitask learning.

[Figure 4 plots: panels (a), (b), and (c) showing backward trajectories of air masses.]

Figure 4: Three transport corridors, namely, the southeast branch (a), the northwest branch (b), and the southwest branch (c), tracked by 24 h backward trajectories of air masses in the Jing-Jin-Ji area.

3.2. Feature Set. Based on previous research results, we let the factors that may be relevant to the concentration forecasting of the three kinds of air pollutants make up a set of candidate features.

Traffic emission is one of the sources of air pollutants. The traffic flow on weekdays differs from that on weekends. During the morning peak hours and the afternoon rush hours, traffic density increases notably. In this paper, the hour of day and the day of week were used to represent traffic flow data, which is not easy to obtain.

Anthropogenic activities that lead to air pollution differ at different times of the year. The day of year (DAY) [3] was used as a representation of the different times of the year, and it is calculated by

$$\mathrm{DAY} = \cos\left(\frac{2\pi d_{th}}{T}\right), \tag{7}$$

where $d_{th}$ represents the ordinal number of the day in the year and $T$ is the number of days in this year.
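Equation (7) and the two traffic proxies described earlier (hour of day and day of week) can be derived directly from a timestamp. The sketch below is a minimal illustration; the function names are hypothetical, and returning the hour and weekday as raw integers is an assumption, since the paper does not state how these two features are encoded.

```python
import calendar
import math
from datetime import date, datetime


def day_of_year_feature(d):
    """DAY = cos(2 * pi * d_th / T), with T = 365 or 366 for the given year."""
    days_in_year = 366 if calendar.isleap(d.year) else 365
    d_th = d.timetuple().tm_yday  # ordinal number of the day in the year
    return math.cos(2.0 * math.pi * d_th / days_in_year)


def calendar_features(ts):
    """Return (DAY, day of week, hour of day) for a datetime.

    Day of week is 0 for Monday through 6 for Sunday (assumed encoding).
    """
    return day_of_year_feature(ts.date()), ts.weekday(), ts.hour


features = calendar_features(datetime(2014, 11, 30, 7))
```

Because of the cosine, days near the turn of the year map to values close to 1 and midsummer days map to values close to -1, so 31 December and 1 January end up adjacent rather than far apart.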

Regional transport of atmospheric pollutants may be an important factor that affects the concentrations of air pollutants. Three transport corridors are tracked by 24 h backward trajectories of air masses in the Jing-Jin-Ji area [3, 35], and they are presented in Figure 4. According to the current wind direction and the transport corridors of air masses, we selected a nearby city located upwind of Beijing. Then we used the monitoring data of the concentrations of six kinds of air pollutants from a station located in that city to represent the current pollutant concentrations of the selected nearby city.

Candidate features include meteorological data from the target station, whose three kinds of air pollutant concentrations will be predicted (weather, temperature, pressure, humidity, wind speed, and wind direction); the concentrations of six kinds of air pollutants at the present moment from the target station and the selected nearby city (PM2.5, PM10, SO2, NO2, CO, and O3); the hour of day; the day of week; and the day of year.

Table 1: The 21 elements in the candidate feature set.

No.  Feature
1    The current PM2.5 concentration of the target station (μg/m³)
2    The current PM10 concentration of the target station (μg/m³)
3    The current NO2 concentration of the target station (μg/m³)
4    The current CO concentration of the target station (mg/m³)
5    The current O3 concentration of the target station (μg/m³)
6    The current SO2 concentration of the target station (μg/m³)
7    Weather
8    Temperature (°C)
9    Atmospheric pressure (hPa)
10   Relative humidity
11   Wind speed (m/s)
12   Wind direction
13   The current PM2.5 concentration of the selected nearby station (μg/m³)
14   The current PM10 concentration of the selected nearby station (μg/m³)
15   The current NO2 concentration of the selected nearby station (μg/m³)
16   The current CO concentration of the selected nearby station (mg/m³)
17   The current O3 concentration of the selected nearby station (μg/m³)
18   The current SO2 concentration of the selected nearby station (μg/m³)
19   The day of year
20   The day of week
21   The hour of day

Weather has 17 different conditions: sunny, cloudy, overcast, rainy, sprinkle, moderate rain, heavy rain, rain storm, thunder storm, freezing rain, snowy, light snow, moderate snow, heavy snow, foggy, sand storm, and dusty. All features and their numbers are listed in Table 1.

3.3. Evaluation Metrics. In this study, four performance indicators, namely, mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and accuracy (Acc) [34], were used to assess the performance of the models. They are defined by

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|O_i - P_i\right|, \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(O_i - P_i\right)^2}, \tag{9}$$

$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left|\frac{O_i - P_i}{O_i}\right|, \tag{10}$$

$$\mathrm{Acc} = 1 - \frac{\sum_{i=1}^{N} \left|O_i - P_i\right|}{\sum_{i=1}^{N} O_i}, \tag{11}$$

where $N$ is the number of time points and $O_i$ and $P_i$ represent the observed and predicted values, respectively.
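Equations (8) to (11) translate directly into a few lines of code. The following is a minimal pure-Python sketch with hypothetical function names; it assumes equal-length lists of observed and predicted values, and MAPE additionally assumes that no observed value is zero.

```python
import math


def mae(obs, pred):
    """Mean absolute error, equation (8)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error, equation (9)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error, equation (10); undefined if any obs is 0."""
    return 100.0 / len(obs) * sum(abs((o - p) / o) for o, p in zip(obs, pred))


def acc(obs, pred):
    """Accuracy, equation (11): one minus total absolute error over total observed."""
    return 1.0 - sum(abs(o - p) for o, p in zip(obs, pred)) / sum(obs)


# Toy values for illustration only, not data from the study.
observed = [100.0, 50.0]
predicted = [90.0, 60.0]
scores = (mae(observed, predicted), rmse(observed, predicted),
          mape(observed, predicted), acc(observed, predicted))
```

Note that MAPE weights each time point by its own observed value, so errors at low concentrations can dominate it, whereas Acc normalizes the total error by the total observed concentration.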

3.4. Experiment Setup. A new data element arrives each hour. Each data element, together with the features that determine it, constitutes a training sample $[x_t, (y_{1t}, y_{2t}, y_{3t})]$, where $y_{1t}$, $y_{2t}$, and $y_{3t}$ represent the PM2.5, NO2, and SO2 concentrations, respectively, and $x_t$ is a set of features made up of the factors that may be relevant to forecasting the concentrations of the three kinds of pollutants.

Setting the Parameters of Sliding Window (Window Size, Step Size, Horizon). In the study, the concentrations of PM2.5, NO2, and SO2 were predicted 12 hours in advance, so the horizon was set to 12. The window size was 1220; that is, the sliding window always contained 1220 elements. The step size was set to 1. After the current concentration was monitored, the sliding window moved one step forward, the prediction model was trained with the 1220 training samples corresponding to the elements contained in the sliding window, and then the trained model was used to predict the responses of the target instances.
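The sliding-window procedure can be sketched as follows. The code assumes that a sample pairs the features observed 12 hours before hour t with the concentrations at hour t, which is one reading of "predicted 12 hours in advance"; the function names and the toy sizes in the example call are hypothetical.

```python
def make_samples(features, targets, horizon=12):
    """Pair the features observed at hour t - horizon with the targets at hour t."""
    return [(features[t - horizon], targets[t]) for t in range(horizon, len(targets))]


def sliding_windows(n_samples, window_size=1220, step=1):
    """Yield (start, end) index pairs of successive training windows, end exclusive."""
    start = 0
    while start + window_size <= n_samples:
        yield start, start + window_size
        start += step


# Toy example with a window of 3 samples instead of 1220.
windows = list(sliding_windows(n_samples=5, window_size=3, step=1))

# Toy hourly series: the feature is the hour index, the target is ten times it.
samples = make_samples(list(range(20)), [10 * i for i in range(20)], horizon=12)
```

In an online setting, the model would be retrained (or fine-tuned) on the samples inside each window before forecasting the next target, so the cost per hour is one training run on a fixed number of samples.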

Selecting Features Relevant to Each Task. The experimental procedures are as follows.

(1) After the continuous variables were discretized, the features were evaluated and sorted for each task according to the minimal-redundancy-maximal-relevance (mRMR) criterion.

First, the continuous variables were discretized, and the discretized response variable became a class label with numerical significance. In this paper, continuous variables were divided into 20 levels. MIToolbox, a mutual information package by Adam Pocock, was used to evaluate the importance of the features according to the mRMR criterion.
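The discretization step and the relevance term of mRMR can be illustrated in a few lines. This is a sketch under stated assumptions: the paper does not say how the 20 levels are formed, so equal-width binning is assumed here, and the mutual information is computed from empirical frequencies in nats rather than with MIToolbox. The function names are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_levels=20):
    """Map continuous values to integer levels 0 .. n_levels - 1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_levels
    # The maximum value would fall into bin n_levels, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_levels - 1) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


levels = discretize([0.0, 5.0, 10.0, 100.0], n_levels=20)
```

mRMR then ranks features greedily, trading off each feature's mutual information with the discretized target against its average mutual information with the features already selected.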

[Figure 5 plots: three panels, one per task (PM2.5, NO2, and SO2). Horizontal axis: number of selected features; vertical axis: MAE (μg/m³).]

Figure 5: MAE versus the number of selected features on the three tasks.

Table 2: Selected features relevant to each task.

Task                              Selected features, then removed features (feature numbers as in Table 1, in ranked order)
PM2.5 concentration prediction    19 13 1 10 6 20 3 7 12 2 11 4 9 18 21 8 15 5 17 16 14
NO2 concentration prediction      19 21 11 13 10 3 6 12 20 9 2 17 8 7 4 18 5 15 1 16 14
SO2 concentration prediction      19 21 6 13 11 20 18 2 10 15 7 4 9 16 5 17 3 8 14 1 12

(2) The dataset was divided into a training set and a test set. For each task, we used random forest to test the feature subsets from top-1 to top-n according to the feature importance ranking and then selected the first n features corresponding to the minimum value of the MAE as the optimal feature subset. The curves of MAE are depicted in Figure 5. Table 2 shows the selected features relevant to each task.
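The top-n search in step (2) can be sketched as a simple loop over ranked prefixes. In the sketch below, `mae_for_subset` is a hypothetical callable standing in for "train a random forest on this feature subset and return its test MAE"; the toy evaluator used in the example is made up purely so that the code runs on its own.

```python
def best_top_n(ranked_features, mae_for_subset):
    """Return the ranked prefix with the lowest MAE, together with that MAE."""
    best_n, best_mae = 1, float("inf")
    for n in range(1, len(ranked_features) + 1):
        score = mae_for_subset(ranked_features[:n])
        if score < best_mae:  # strict comparison keeps the smallest n on ties
            best_n, best_mae = n, score
    return ranked_features[:best_n], best_mae


def toy_mae(subset):
    """Made-up evaluator whose minimum is at three features (illustration only)."""
    return abs(len(subset) - 3) + 1.0


# First five entries of the PM2.5 ranking in Table 2.
chosen, best = best_top_n([19, 13, 1, 10, 6], toy_mae)
```

Because only prefixes of the mRMR ranking are evaluated, the search needs n model fits per task rather than one fit per possible subset.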

In order to verify whether multitask learning and online forecasting each improve the forecasting accuracy of DBN-DNN, and to assess the capability of the proposed MTL-DBN-DNN to predict air pollutant concentration, we compared the proposed MTL-DBN-DNN model with four baseline models (2–5):

(1) DBN-DNN model with multitask learning using the online forecasting method (OL-MTL-DBN-DNN)

(2) DBN-DNN model using the online forecasting method (OL-DBN-DNN)

(3) DBN-DNN model

(4) Air-Quality-Prediction-Hackathon-Winning-Model (Winning-Model) [36]

(5) A hybrid predictive model (FFA) proposed by Yu Zheng et al. [34]

For the single-task prediction model, the input of the model is the set of selected features relevant to that task. For the multitask prediction model, as long as a feature is relevant to one of the tasks, the feature is used as an input variable to the model.
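The input rule for the multitask model amounts to taking the union of the per-task feature sets. A minimal sketch follows; the function name and the per-task selections in the example are hypothetical and are not the subsets actually chosen in the study.

```python
def multitask_input_features(selected_by_task):
    """Union of the per-task feature sets, sorted by feature number."""
    union = set()
    for features in selected_by_task.values():
        union.update(features)
    return sorted(union)


# Hypothetical selections, using the feature numbers of Table 1.
selected = {
    "PM2.5": [19, 13, 1, 10],
    "NO2": [19, 21, 11, 13],
    "SO2": [19, 21, 6, 13],
}
inputs = multitask_input_features(selected)
```

The length of the resulting list is the number of input variables of the multitask network.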

Remark. For the first two models (MTL-DBN-DNN and DBN-DNN), we used the online forecasting method. To distinguish them from static forecasting models, the models using the online forecasting method are denoted by OL-MTL-DBN-DNN and OL-DBN-DNN, respectively.

For the first three models above, we used the same DBN architecture and parameters. According to the practical guide for training RBMs in the technical report [33] and the dataset used in the study, we set the architecture and parameters of the deep neural network as follows. The deep neural network consisted of a DBN with layers of size G-100-100-100-90 and a top output layer, where G is the number of input variables. The DBN was constructed by stacking four RBMs, and a Gaussian-Bernoulli RBM was used as the first layer. In the pretraining stage, the learning rate was set to 0.00001 and the number of training epochs was set to 50. In the fine-tuning stage, we used 10 iterations, and grid search was used to find a suitable learning rate. For the OL-MTL-DBN-DNN model, the output layer contained three units and simultaneously output the predicted concentrations of the three kinds of pollutants. Each unit in the output layer was connected to only a subset of units in the last hidden layer of the DBN.
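The settings in this paragraph can be collected into a small configuration, from which the weight-matrix shapes of the four stacked RBMs follow. The dictionary keys and the function name below are hypothetical, and G = 21 is used only as an example (the size of the full candidate feature set); the actual G depends on the selected features.

```python
# Hyperparameters quoted in the text; the key names are illustrative.
CONFIG = {
    "hidden_sizes": (100, 100, 100, 90),
    "pretrain_learning_rate": 0.00001,
    "pretrain_epochs": 50,
    "finetune_iterations": 10,
    "n_outputs": 3,  # PM2.5, NO2, and SO2
}


def rbm_shapes(n_inputs, hidden_sizes):
    """Return the (visible, hidden) sizes of each stacked RBM."""
    sizes = [n_inputs, *hidden_sizes]
    return list(zip(sizes[:-1], sizes[1:]))


# Example with G = 21 input variables (an assumption for illustration).
shapes = rbm_shapes(21, CONFIG["hidden_sizes"])
```

Five layer sizes (G plus four hidden layers) give four consecutive pairs, which matches the statement that the DBN is built from four RBMs.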

For Winning-Model, time back was set to 4. Since the dataset used in this study was released by the authors of [34], the experimental results given in the original paper for the FFA model were quoted for comparison.

Because the first two models above use the online forecasting method, their training set changes over time. For the sake of fair comparison, we selected the original 1220 elements contained in the window before the sliding window began to slide forward and used the samples corresponding to

Table 3: Comparison among different models.

                      PM2.5                          NO2                            SO2
Models                MAE    RMSE   MAPE    Acc      MAE    RMSE   MAPE   Acc      MAE    RMSE   MAPE   Acc
Winning-Model [36]    23.63  35.33  243.63  0.40     14.34  19.55  80.50  0.64     8.56   14.24  70.01  0.54
OL-MTL-DBN-DNN        18.52  30.99  200.36  0.53     13.82  20.04  52.62  0.65     8.37   13.64  52.13  0.55
OL-DBN-DNN            25.09  36.66  305.78  0.37     15.06  21.36  65.40  0.62     8.96   14.49  61.52  0.52
DBN-DNN               26.49  36.85  390.55  0.33     17.43  25.90  63.52  0.56     10.11  14.72  80.58  0.46

[Figure 6 plots: panels (a) OL-MTL-DBN-DNN, (b) OL-DBN-DNN, and (c) Winning-Model. Each panel shows observed and predicted data for PM2.5, NO2, and SO2. Horizontal axis: time (hour, 0 to 150); vertical axis: concentration (μg/m³).]

Figure 6: The prediction performances of different models for a 12 h horizon. In the pictures, time is measured along the horizontal axis and the concentrations of three kinds of air pollutants (PM2.5, NO2, and SO2) are measured along the vertical axis.

these elements as the training samples of the static prediction models (DBN-DNN and Winning-Model). The four models were used to predict the concentrations of the three kinds of pollutants in the same period. The experimental results of hourly concentration forecasting for a 12 h horizon are shown in Table 3, where the best results are marked in italics.

35 Results and Discussions Table 3 shows that the bestresults are obtained by using OL-MTL-DBN-DNN methodfor concentration forecasting Three error evaluation criteria(MAE RMSE and MAPE) of the OL-MTL-DBN-DNN arelower than that of the baseline models and its accuracy issignificantly higher than that of the baseline models Theprediction performance of OL-DBN-DNN is better thanDBN-DNN which shows that the use of online forecastingmethod can improve the prediction performanceThe perfor-mance of OL-MTL-DBN-DNN surpasses the performanceof OL-DBN-DNN which shows that multitask learning isan effective approach to improve the forecasting accuracyof air pollutant concentration and demonstrates that it isnecessary to share the information contained in the train-ing data of three prediction tasks It is worth mentioning

that learning tasks in parallel to get the forecast results ismore efficient than training a model separately for eachtask

The experimental results show that the OL-MTL-DBN-DNNmodel proposed in this paper achieves better predictionperformances than the Air-Quality-Prediction-Hackathon-Winning-Model and FFAmodel and the prediction accuracyis greatly improved For example when we predict PM25concentrations compared with Winning-Model MAE andRMSE of OL-MTL-DBN-DNN are reduced by about 511 and434 respectively and accuracy of OL-MTL-DBN-DNN isimproved by about 13 These positive results demonstratethat our model MTL-DBN-DNN is promising in real-timeair pollutant concentration forecasting
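As a quick check of the figures quoted above, the differences can be worked out directly. This assumes the PM2.5 entries of Table 3 are read with two decimal places (MAE 23.63 vs. 18.52, RMSE 35.33 vs. 30.99, and Acc 0.40 vs. 0.53 for Winning-Model vs. OL-MTL-DBN-DNN); the relative MAE reduction in the last line is derived here and is not a figure stated by the authors.

```latex
\begin{align*}
\Delta\mathrm{MAE}  &= 23.63 - 18.52 = 5.11,\\
\Delta\mathrm{RMSE} &= 35.33 - 30.99 = 4.34,\\
\Delta\mathrm{Acc}  &= 0.53 - 0.40 = 0.13,\\
\Delta\mathrm{MAE} / \mathrm{MAE}_{\text{Winning-Model}} &= 5.11 / 23.63 \approx 0.216.
\end{align*}
```

Under this reading, the 13% accuracy improvement is an absolute gain of 0.13 in Acc, and the MAE reduction corresponds to roughly 22% of the Winning-Model error.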

When the prediction horizon is set to 12 hours, some prediction results of three models are presented in Figure 6.

Figure 6 shows that predicted concentrations and observed concentrations match very well when OL-MTL-DBN-DNN is used. The advantage of OL-MTL-DBN-DNN is more obvious when it is used to predict sudden changes and high peaks of concentrations.

8 Journal of Control Science and Engineering

4. Conclusion

In this paper, a deep neural network model with multitask learning (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting of nonlinear systems and tested on the forecast of air quality time series.

The MTL-DBN-DNN model can fulfill several prediction tasks at the same time by using shared information. In the model, each unit in the output layer is connected to only a subset of units in the last hidden layer of the DBN. There are common units with a specified quantity between two adjacent subsets. Such connection effectively avoids the problem that fully connected networks need to juggle the learning of each task while being trained, so that the trained networks cannot get optimal prediction accuracy for each task. The locally connected architecture can learn the commonalities and differences of multiple tasks well.

PM2.5, SO2, and NO2 react chemically with one another and show almost the same concentration trend, so we apply the proposed model to the case study on the concentration forecasting of three kinds of air pollutants 12 hours in advance. Comparison with multiple baseline models shows that our model MTL-DBN-DNN has a stronger capability of predicting air pollutant concentration. Therefore, by combining the advantages of deep learning, multitask learning, and online forecasting, the MTL-DBN-DNN model is able to provide accurate real-time concentration predictions of air pollutants.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

Section 3.2 of this paper (feature set) cites the author's conference paper [37].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by National Natural Science Foundation of China (61873008) and Beijing Municipal Natural Science Foundation (4182008).

References

[1] P. S. G. De Mattos Neto, F. Madeiro, T. A. E. Ferreira, and G. D. C. Cavalcanti, "Hybrid intelligent system for air quality forecasting using phase adjustment," Engineering Applications of Artificial Intelligence, vol. 32, pp. 185–191, 2014.

[2] K. Siwek and S. Osowski, "Improving the accuracy of prediction of PM10 pollution by the wavelet transformation and an ensemble of neural predictors," Engineering Applications of Artificial Intelligence, vol. 25, no. 6, pp. 1246–1258, 2012.

[3] X. Feng, Q. Li, Y. Zhu, J. Hou, L. Jin, and J. Wang, "Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation," Atmospheric Environment, vol. 107, pp. 118–128, 2015.

[4] W. Tamas, G. Notton, C. Paoli, M.-L. Nivet, and C. Voyant, "Hybridization of air quality forecasting models using machine learning and clustering: An original approach to detect pollutant peaks," Aerosol and Air Quality Research, vol. 16, no. 2, pp. 405–416, 2016.

[5] A. Kurt and A. B. Oktay, "Forecasting air pollutant indicator levels with geographic models 3 days in advance using neural networks," Expert Systems with Applications, vol. 37, no. 12, pp. 7986–7992, 2010.

[6] A. Y. Ng, J. Ngiam, C. Y. Foo, Y. Mai, and C. Suen, Deep Networks: Overview, 2013, http://deeplearning.stanford.edu/wiki/index.php/Deep_Networks:_Overview.

[7] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[8] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[9] S. Azizi, F. Imani, B. Zhuang et al., "Ultrasound-based detection of prostate cancer using automatic feature selection with deep belief networks," in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. Frangi, Eds., vol. 9350 of Lecture Notes in Computer Science, pp. 70–77, Springer, Munich, Germany, 2015.

[10] M. Qin, Z. Li, and Z. Du, "Red tide time series forecasting by combining ARIMA and deep belief network," Knowledge-Based Systems, vol. 125, pp. 39–52, 2017.

[11] X. Sun, T. Li, Q. Li, Y. Huang, and Y. Li, "Deep belief echo-state network and its application to time series prediction," Knowledge-Based Systems, vol. 130, pp. 17–29, 2017.

[12] T. Kuremoto, S. Kimura, K. Kobayashi, and M. Obayashi, "Time series forecasting using a deep belief network with restricted Boltzmann machines," Neurocomputing, vol. 137, pp. 47–56, 2014.

[13] F. Shen, J. Chao, and J. Zhao, "Forecasting exchange rate using deep belief networks and conjugate gradient method," Neurocomputing, vol. 167, pp. 243–253, 2015.

[14] A. Dedinec, S. Filiposka, A. Dedinec, and L. Kocarev, "Deep belief network based electricity load forecasting: An analysis of Macedonian case," Energy, vol. 115, pp. 1688–1700, 2016.

[15] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu, "Deep belief network based deterministic and probabilistic wind speed forecasting approach," Applied Energy, vol. 182, pp. 80–93, 2016.

[16] R. Caruana, "Multitask learning," Machine Learning, vol. 28, no. 1, pp. 41–75, 1997.

[17] Y. Huang, W. Wang, L. Wang, and T. Tan, "Multi-task deep neural network for multi-label learning," in Proceedings of the IEEE International Conference on Image Processing, pp. 2897–2900, Melbourne, Australia, 2013.

[18] R. Zhang, J. Li, J. Lu, R. Hu, Y. Yuan, and Z. Zhao, "Using deep learning for compound selectivity prediction," Current Computer-Aided Drug Design, vol. 12, no. 1, pp. 5–14, 2016.

[19] W. Huang, G. Song, H. Hong, and K. Xie, "Deep architecture for traffic flow prediction: deep belief networks with multitask learning," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2191–2201, 2014.

[20] D. Chen and B. Mak, "Multi-task learning of deep neural networks for low-resource speech recognition," IEEE Transactions on Audio, Speech and Language, vol. 23, no. 7, pp. 1172–1183, 2015.


[21] R. Xia and Y. Liu, "Leveraging valence and activation information via multi-task learning for categorical emotion recognition," in Proceedings of the 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, pp. 5301–5305, Brisbane, Australia, April 2014.

[22] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, Helsinki, Finland, July 2008.

[23] R. M. Harrison, A. M. Jones, and R. G. Lawrence, "Major component composition of PM10 and PM2.5 from roadside and urban background sites," Atmospheric Environment, vol. 38, no. 27, pp. 4531–4538, 2004.

[24] G. Wang, R. Zhang, M. E. Gomez et al., "Persistent sulfate formation from London Fog to Chinese haze," Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 48, pp. 13630–13635, 2016.

[25] Y. Cheng, G. Zheng, C. Wei et al., "Reactive nitrogen chemistry in aerosol water as a source of sulfate during haze events in China," Science Advances, vol. 2, Article ID e1601530, 2016.

[26] D. Agrawal and A. E. Abbadi, "Supporting sliding window queries for continuous data streams," in IEEE International Conference on Scientific and Statistical Database Management, pp. 85–94, Cambridge, Massachusetts, USA, 2003.

[27] K. B. Shaban, A. Kadri, and E. Rezk, "Urban air pollution monitoring system with forecasting models," IEEE Sensors Journal, vol. 16, no. 8, pp. 2598–2606, 2016.

[28] L. Deng and D. Yu, "Deep learning: methods and applications," in Foundations and Trends in Signal Processing, vol. 7, pp. 197–391, Now Publishers Inc., Hanover, MA, USA, 2014.

[29] G. E. Hinton, "Deep belief networks," Scholarpedia, vol. 4, no. 5, article no. 5947, 2009.

[30] Y. Bengio, I. Goodfellow, and A. Courville, "Deep generative models," in Deep Learning, MIT Press, Cambridge, Mass, USA, 2017.

[31] G. Hinton, L. Deng, D. Yu et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.

[32] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," The American Association for the Advancement of Science: Science, vol. 313, no. 5786, pp. 504–507, 2006.

[33] G. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[34] Y. Zheng, X. Yi, M. Li et al., "Forecasting fine-grained air quality based on big data," in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '15), pp. 2267–2276, Sydney, Australia, August 2015.

[35] X. Feng, Q. Li, Y. Zhu, J. Wang, H. Liang, and R. Xu, "Formation and dominant factors of haze pollution over Beijing and its peripheral areas in winter," Atmospheric Pollution Research, vol. 5, no. 3, pp. 528–538, 2014.

[36] "Winning Code for the EMC Data Science Global Hackathon (Air Quality Prediction), 2012," https://github.com/benhamner/Air-Quality-Prediction-Hackathon-Winning-Model.

[37] J. Li, X. Shao, and H. Zhao, "An online method based on random forest for air pollutant concentration forecasting," in Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 2018.



Figure 1: The observed data from 7 o'clock on November 30, 2014, to 22 o'clock on January 10, 2015 (Dongcheng Dongsi air pollutant concentration data). In the figure, time (hour) is measured along the horizontal axis and the concentrations of three kinds of air pollutants (PM2.5, NO2, and SO2, in μg/m³) are measured along the vertical axis. There are some missing values in the data sets. Dongcheng Dongsi is the target air-quality-monitor-station selected in this study.

a discovery revealed that the aqueous oxidation of SO2 by NO2 under specific atmospheric conditions is key to efficient sulfate formation, and the chemical reaction led to the 1952 London "Killer" Fog [24]. A study published in the US journal Science Advances also discovered that fine water particles in the air acted as a reactor, trapping sulfur dioxide (SO2) molecules and interacting with nitrogen dioxide (NO2) to form sulfate [25]. Therefore, we can regard the concentration forecasting of these three kinds of pollutants (PM2.5, SO2, and NO2) as related tasks. Figure 1 shows some of the historical monitoring data for the concentrations of the three kinds of pollutants at a target station (Dongcheng Dongsi air-quality-monitor-station) selected in this study. The three kinds of pollutants show almost the same concentration trend, which further supports treating their concentration forecasting as related tasks.

In this paper, based on the powerful representational ability of DBN and the advantage of multitask learning in allowing knowledge transfer, a deep neural network model with multitask learning capabilities (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting of nonlinear systems and tested on the forecast of air quality time series. DBN is used to learn feature representations, and several related tasks are solved simultaneously by using shared representations.

For multitask learning, a deep neural network with local connections is used in the study. Such connection effectively avoids the problem that fully connected networks need to juggle the learning of each task while being trained, so that the trained networks cannot get optimal prediction accuracy for each task. The locally connected architecture can learn the commonalities and differences of multiple tasks well.

In order to get a better prediction of future concentrations, the sliding window [26, 27] is used to take the recent data to dynamically adjust the parameters of the prediction model.

The rest of the paper is organized as follows. Section 2 presents the background knowledge of multitask learning, deep belief networks, and DBN-DNN and describes the DBN-DNN model with multitask learning (MTL-DBN-DNN). In Section 3, the proposed model MTL-DBN-DNN is applied to the case study of the real-time forecasting of air pollutant concentration, and the results and analysis are shown. Finally, in Section 4, the conclusions of the paper are presented.

2. Methods

2.1. Multitask Learning. Multitask learning can improve learning for one task by using the information contained in the training data of other related tasks. Multitask learning learns tasks in parallel, and "what is learned for each task can help other tasks be learned better" [16].

Several related problems are solved at the same time by using a shared representation. Related learning tasks can share the information contained in their input data sets to a certain extent. Multitask learning exploits commonalities among different learning tasks. Such exploitation allows knowledge transfer among different learning tasks. The difference between a neural network with multitask learning capabilities and a simple neural network with multiple outputs lies in the following: in the multitask case, the input feature vector is made up of the features of each task, and the hidden layers are shared by multiple tasks. Multitask learning is often adopted when training data is very limited for the target task domain [28].

2.2. Deep Belief Networks and DBN-DNN. Deep Belief Networks (DBNs) [29] are probabilistic generative models built by stacking many layers of Restricted Boltzmann Machines (RBMs), each of which contains a layer of visible units and a layer of hidden units. A DBN can be trained to extract a deep hierarchical representation of the input data using greedy layer-wise procedures. After a layer of RBM has been trained, the representations of the previous hidden layer are used as inputs for the next hidden layer. A schematic representation of a DBN is shown in Figure 2.

A DBN with $l$ hidden layers contains $l$ weight matrices $W^{(1)}, \ldots, W^{(l)}$. It also contains $l + 1$ bias vectors $b^{(0)}, \ldots, b^{(l)}$, with $b^{(0)}$ providing the biases for the visible layer. The probability distribution represented by the DBN is given by

$$P\left(h^{(l)}, h^{(l-1)}\right) \propto \exp\left(b^{(l)\mathrm{T}} h^{(l)} + b^{(l-1)\mathrm{T}} h^{(l-1)} + h^{(l-1)\mathrm{T}} W^{(l)} h^{(l)}\right), \tag{1}$$

$$P\left(h_i^{(k)} = 1 \mid h^{(k+1)}\right) = \sigma\left(b_i^{(k)} + W_i^{(k+1)\mathrm{T}} h^{(k+1)}\right), \quad \forall i,\ \forall k \in 1, \ldots, l-2, \tag{2}$$

Figure 2: A 2-layer deep belief network that is stacked by two RBMs contains a layer of visible units and two layers of hidden units, where $h^{(1)}$ and $h^{(2)}$ are the state vectors of the hidden layers, $v$ is the state vector of the visible layer, $W^{(1)}$ and $W^{(2)}$ are the matrices of symmetrical weights, $b^{(1)}$ and $b^{(2)}$ are the bias vectors of the hidden layers, and $b^{(0)}$ is the bias vector of the visible layer.

$$P\left(v_i = 1 \mid h^{(1)}\right) = \sigma\left(b_i^{(0)} + W_i^{(1)\mathrm{T}} h^{(1)}\right), \quad \forall i. \tag{3}$$

In the case of real-valued visible units, substitute

$$v \sim \mathcal{N}\left(b^{(0)} + W^{(1)\mathrm{T}} h^{(1)},\ \beta^{-1}\right) \tag{4}$$

with $\beta$ diagonal for tractability [30]. Here $\sigma(x) = 1/(1 + \exp(-x))$.

The weights from the trained DBN can be used as the initial weights of a DNN [8, 30]:

$$h^{(1)} = \sigma\left(b^{(1)} + v^{\mathrm{T}} W^{(1)}\right), \tag{5}$$

$$h^{(l)} = \sigma\left(b_i^{(l)} + h^{(l-1)\mathrm{T}} W^{(l)}\right), \quad \forall l \in 2, \ldots, m, \tag{6}$$

and then all of the weights are fine-tuned by applying backpropagation or other discriminative algorithms to improve the performance of the whole network. When a DBN is used to initialize the parameters of a DNN, the resulting network is called DBN-DNN [31].
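The layer-by-layer propagation in (5) and (6) can be illustrated with a minimal sketch. It assumes that the weights and biases have already been obtained (for example from DBN pretraining, which is not shown), and the function names, layer sizes, and numeric values are hypothetical choices for illustration only.

```python
import math

def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Compute sigma(b + inputs^T W) for one layer.

    weights[i][j] connects input unit i to hidden unit j.
    """
    return [
        sigmoid(biases[j] + sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(len(biases))
    ]

def dbn_dnn_forward(v, weight_matrices, bias_vectors):
    """Propagate a visible vector v through every hidden layer in turn."""
    h = v
    for W, b in zip(weight_matrices, bias_vectors):
        h = layer_forward(h, W, b)
    return h

# Toy network (hypothetical sizes): 3 inputs -> 2 hidden units -> 2 hidden units.
W1 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # all-zero weights give h1 = [0.5, 0.5]
b1 = [0.0, 0.0]
W2 = [[1.0, -1.0], [1.0, -1.0]]
b2 = [0.0, 0.0]
top = dbn_dnn_forward([0.2, 0.5, 0.1], [W1, W2], [b1, b2])
```

In this toy case the top layer equals `[sigmoid(1.0), sigmoid(-1.0)]`; fine-tuning by backpropagation would then adjust `W1`, `W2`, `b1`, and `b2` from such pretrained starting values.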

2.3. DBN-Based Deep Neural Network Model with Multitask Learning (MTL-DBN-DNN). In this section, a DBN-based multitask deep neural network prediction model is proposed to solve multiple related tasks simultaneously by using shared information contained in the training data of different tasks.

The DBN-DNN prediction model with multitask learning is constructed from a DBN and an output layer with multiple units. The deep belief network is used to extract better feature representations, and several related tasks are solved simultaneously by using shared representations. The sigmoid function is used as the activation function of the output layer.

Each unit in the output layer is connected to only a subset of units in the last hidden layer of the DBN. It is assumed that the number of related tasks to be processed is $N$, and it is assumed that the size of the subset (that is, the ratio of the number of nodes in the subset to the number of nodes in the entire last hidden layer) is $\alpha$; then $1/(N-1) > \alpha > 1/N$. At the locally connected layer, each output node has a portion of hidden nodes that are connected only to it, and it is assumed that the share of nodes in this part is $\beta$; then $0 < \beta < 1/N$. There are common units with a specified quantity between two adjacent subsets.
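One possible way to lay out such subsets is sketched below. The paper does not state how the subsets are positioned, so the contiguous, evenly spaced windows used here are an assumption, as are the function names and the subset size of 40 units (chosen only so that $\alpha = 40/90$ lies between $1/3$ and $1/2$ for $N = 3$ tasks and a 90-unit last hidden layer).

```python
def local_connection_subsets(n_hidden, n_tasks, subset_size):
    """Return one contiguous index range per task over the last hidden layer.

    Assumed layout: windows of equal size whose starts are spread evenly,
    so that adjacent windows overlap and every hidden unit is covered.
    """
    alpha = subset_size / n_hidden
    if not (1.0 / n_tasks < alpha < 1.0 / (n_tasks - 1)):
        raise ValueError("alpha must satisfy 1/N < alpha < 1/(N-1)")
    # The first window starts at unit 0 and the last one ends at the final unit.
    step = (n_hidden - subset_size) / (n_tasks - 1)
    return [
        range(round(t * step), round(t * step) + subset_size)
        for t in range(n_tasks)
    ]

def connection_mask(n_hidden, subsets):
    """mask[t][j] is 1 if output unit t is connected to hidden unit j, else 0."""
    return [[1 if j in s else 0 for j in range(n_hidden)] for s in subsets]

subsets = local_connection_subsets(n_hidden=90, n_tasks=3, subset_size=40)
mask = connection_mask(90, subsets)

# Units shared by the first two tasks, and units used only by the first task.
shared_01 = len(set(subsets[0]) & set(subsets[1]))
exclusive_0 = len(set(subsets[0]) - set(subsets[1]) - set(subsets[2]))
```

With these hypothetical numbers the windows are units 0-39, 25-64, and 50-89, so adjacent tasks share 15 units and the first task keeps 25 units to itself ($\beta = 25/90 < 1/3$). During fine-tuning, the output-layer weight matrix would be multiplied elementwise by `mask` so that the missing connections stay at zero.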

The MTL-DBN-DNN model is learned with unsupervised DBN pretraining followed by backpropagation fine-tuning. The architecture of the model MTL-DBN-DNN is shown in Figure 3.

Remark. First, pretraining and fine-tuning ensure that the information in the weights comes from modeling the input data [32]. In other words, the network memorizes the information of the training data via the weights. The network needs not only to learn the commonalities of multiple tasks but also to learn the differences of multiple tasks. A locally connected network allows a subset of hidden units to be unique to one of the tasks, and unique units can better model the task-specific information. Therefore, fully connected networks do not learn the information contained in the training data of multiple tasks better than locally connected networks. Second, fully connected networks need to juggle (i.e., balance) the learning of each task while being trained, so that the trained networks cannot get optimal prediction accuracy for each task. Based on the above two reasons, the last (fully connected) layer is replaced by a locally connected layer, and each unit in the output layer is connected to only a subset of units in the previous layer. There are common units with a specified quantity between two adjacent subsets.

Input. As long as a feature is statistically relevant to one of the tasks, the feature is used as an input variable to the model.

When the MTL-DBN-DNN model is used for time series forecasting, the parameters of the model can be dynamically adjusted according to the recent monitoring data taken by the sliding window to achieve online forecasting.

The Setting of the Structures and Parameters. The architecture and parameters of the MTL-DBN-DNN can be set according to the practical guide for training RBMs in the technical report [33].

3. Experiments

3.1. Data Set. In this study, we used a data set that was collected in the Urban Air project (Urban Computing Team, Microsoft Research) over a period of one year (from 1 May 2014 to 30 April 2015) [34]. There are missing values in the data, so the data was preprocessed in this study. We chose the Dongcheng Dongsi air-quality-monitor-station, located in Beijing, as the target station. The hourly concentrations of PM2.5, NO2, and SO2 at the station were predicted 12 hours in advance.

Figure 3: The schematic representation of the DBN-DNN model with multitask learning (MTL-DBN-DNN): an input layer, a DBN with four hidden layers (1st to 4th), and an output layer.

Figure 4: Three transport corridors, namely, southeast branch (a), northwest branch (b), and southwest branch (c), tracked by 24 h backward trajectories of air masses in Jing-Jin-Ji area.

3.2. Feature Set. According to some research results, we let the factors that may be relevant to the concentration forecasting of three kinds of air pollutants make up a set of candidate features.

Traffic emission is one of the sources of air pollutants. The traffic flow on weekdays differs from that on weekends. During the morning peak hours and the afternoon rush hours, traffic density is notably increased. In this paper, the hour of day and the day of week were used to represent the traffic flow data, which is not easy to obtain.

Anthropogenic activities that lead to air pollution are different at different times of a year. The day of year (DAY) [3] was used as a representation of the different times of a year, and it is calculated by

$$DAY = \cos\left(2\pi \frac{d_{th}}{T}\right), \tag{7}$$

where $d_{th}$ represents the ordinal number of the day in the year and $T$ is the number of days in this year.
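The DAY feature in (7) is straightforward to compute from a calendar date. The sketch below is illustrative; the function name `day_of_year_feature` is a hypothetical choice, and the leap-year handling for $T$ is an assumption about how "the number of days in this year" is meant.

```python
import calendar
import datetime
import math

def day_of_year_feature(date):
    """Return DAY = cos(2 * pi * d_th / T) for a datetime.date."""
    d_th = date.timetuple().tm_yday                   # ordinal day in the year, 1-based
    T = 366 if calendar.isleap(date.year) else 365    # number of days in that year
    return math.cos(2 * math.pi * d_th / T)

end_of_year = day_of_year_feature(datetime.date(2014, 12, 31))   # close to +1
mid_year = day_of_year_feature(datetime.date(2014, 7, 2))        # close to -1
```

The cosine maps dates near the turn of the year to values near +1 and midsummer dates to values near -1, so late December and early January receive similar feature values even though their ordinal day numbers are far apart.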

Regional transport of atmospheric pollutants may be an important factor that affects the concentrations of air pollutants. Three transport corridors are tracked by 24 h backward trajectories of air masses in the Jing-Jin-Ji area [3, 35], and they are presented in Figure 4. According to the current wind direction and the transport corridors of air masses, we selected a nearby city located in the upwind direction of Beijing. Then we used the monitoring data of the concentrations of six kinds of air pollutants from a station located in that city to represent the current pollutant concentrations of the selected nearby city.

Candidate features include meteorological data from the target station whose three kinds of air pollutant concentrations will be predicted (including weather, temperature, pressure, humidity, wind speed, and wind direction), the concentrations of six kinds of air pollutants at the present moment from the target station and the selected nearby city (including PM2.5, PM10, SO2, NO2, CO, and O3), the hour of day, the day of week, and the day of year.

Table 1: The 21 elements in the candidate feature set.

| No. | Feature |
| --- | --- |
| 1 | The current PM2.5 concentration of the target station (μg/m³) |
| 2 | The current PM10 concentration of the target station (μg/m³) |
| 3 | The current NO2 concentration of the target station (μg/m³) |
| 4 | The current CO concentration of the target station (mg/m³) |
| 5 | The current O3 concentration of the target station (μg/m³) |
| 6 | The current SO2 concentration of the target station (μg/m³) |
| 7 | Weather |
| 8 | Temperature (°C) |
| 9 | Atmospheric pressure (hPa) |
| 10 | Relative humidity |
| 11 | Wind speed (m/s) |
| 12 | Wind direction |
| 13 | The current PM2.5 concentration of the selected nearby station (μg/m³) |
| 14 | The current PM10 concentration of the selected nearby station (μg/m³) |
| 15 | The current NO2 concentration of the selected nearby station (μg/m³) |
| 16 | The current CO concentration of the selected nearby station (mg/m³) |
| 17 | The current O3 concentration of the selected nearby station (μg/m³) |
| 18 | The current SO2 concentration of the selected nearby station (μg/m³) |
| 19 | The day of year |
| 20 | The day of week |
| 21 | The hour of day |

Weather has 17 different conditions: sunny, cloudy, overcast, rainy, sprinkle, moderate rain, heavy rain, rain storm, thunder storm, freezing rain, snowy, light snow, moderate snow, heavy snow, foggy, sand storm, and dusty. All feature numbers are presented in Table 1.

3.3. Evaluation Metrics. In this study, four performance indicators, namely, mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and accuracy (Acc) [34], were used to assess the performance of the models. They are defined by

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| O_i - P_i \right|, \tag{8}$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( O_i - P_i \right)^2}, \tag{9}$$

$$MAPE = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{O_i - P_i}{O_i} \right|, \tag{10}$$

$$Acc = 1 - \frac{\sum_{i}^{N} \left| O_i - P_i \right|}{\sum_{i}^{N} O_i}, \tag{11}$$

where $N$ is the number of time points and $O_i$ and $P_i$ represent the observed and predicted values, respectively.
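The four indicators in (8)-(11) translate directly into code. The sketch below uses plain Python lists; the function names and the three sample values are hypothetical and are not taken from the data set.

```python
import math

def mae(observed, predicted):
    """Mean absolute error, equation (8)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error, equation (9)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mape(observed, predicted):
    """Mean absolute percentage error, equation (10); observed values must be nonzero."""
    return 100.0 / len(observed) * sum(abs((o - p) / o) for o, p in zip(observed, predicted))

def acc(observed, predicted):
    """Accuracy, equation (11): one minus total absolute error over total observed."""
    return 1.0 - sum(abs(o - p) for o, p in zip(observed, predicted)) / sum(observed)

observed = [100.0, 50.0, 200.0]    # hypothetical observed concentrations
predicted = [90.0, 60.0, 180.0]    # hypothetical predicted concentrations
scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "Acc": acc(observed, predicted),
}
```

Note that MAPE divides by each observed value, so hours with very low observed concentrations can dominate it, whereas Acc normalizes by the total observed concentration.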

3.4. Experiment Setup. A new data element arrives each hour. Each data element, together with the features that determine the element, constitutes a training sample $[x_t, (y_{1t}, y_{2t}, y_{3t})]$, where $y_{1t}$, $y_{2t}$, and $y_{3t}$ represent the PM2.5 concentration, NO2 concentration, and SO2 concentration, respectively, and $x_t$ is a set of features made up of the factors that may be relevant to the concentration forecasting of the three kinds of pollutants.

Setting the Parameters of Sliding Window (Window Size, Step Size, Horizon). In the study, the concentrations of PM2.5, NO2, and SO2 were predicted 12 hours in advance, so the horizon was set to 12. The window size was equal to 1220; that is, the sliding window always contained 1220 elements. The step size was set to 1. After the current concentration was monitored, the sliding window moved one step forward, the prediction model was trained with the 1220 training samples corresponding to the elements contained in the sliding window, and then the well-trained model was used to predict the responses of the target instances.
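The retrain-and-predict loop described above can be sketched as follows. The window size of 1220 and step size of 1 come from the text; the `train` and `predict` callables are hypothetical placeholders for the real model, and how features are aligned with targets 12 hours ahead is not modeled here.

```python
def sliding_windows(n_elements, window_size=1220, step_size=1):
    """Yield (start, end) index pairs; each window covers elements[start:end]."""
    start = 0
    while start + window_size <= n_elements:
        yield start, start + window_size
        start += step_size

def online_forecast(samples, train, predict, window_size=1220, step_size=1):
    """Retrain on every window and forecast from the newest sample in it.

    train(window_samples) returns a fitted model; predict(model, sample)
    returns its forecast. Both are stand-ins supplied by the caller.
    """
    forecasts = []
    for start, end in sliding_windows(len(samples), window_size, step_size):
        model = train(samples[start:end])
        forecasts.append(predict(model, samples[end - 1]))
    return forecasts

# Toy run: the "model" is just the window mean, and the forecast returns it.
toy_samples = list(range(10))
windows = list(sliding_windows(len(toy_samples), window_size=4))
toy_forecasts = online_forecast(
    toy_samples,
    train=lambda window: sum(window) / len(window),
    predict=lambda model, sample: model,
    window_size=4,
)
```

Because the window has a fixed size, each step drops the oldest element and adds the newest one, which keeps the training cost per step constant while letting the parameters track recent conditions.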

Selecting Features Relevant to Each Task. The experimental procedures are as follows.

(1) After the continuous variables were discretized for the different tasks, the features were evaluated and sorted according to the minimal-redundancy-maximal-relevance (mRMR) criterion.

First, the continuous variables were discretized, and the discretized response variable became a class label with numerical significance. In this paper, continuous variables were divided into 20 levels. MIToolbox, a mutual information package by Adam Pocock, was used to evaluate the importance of the features according to the mRMR criterion.
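A minimal sketch of this ranking step is given below. It does not use MIToolbox; the equal-width binning, the difference form of the mRMR score (relevance minus mean redundancy), and all function names are assumptions made for illustration, since the text does not specify these details.

```python
import math
from collections import Counter

def discretize(values, n_levels=20):
    """Equal-width binning into integer levels 0 .. n_levels - 1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_levels
    return [min(int((v - lo) / width), n_levels - 1) for v in values]

def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences of equal length."""
    n = len(xs)
    count_x, count_y, count_xy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )

def mrmr_rank(features, target):
    """Greedy mRMR ranking: pick the feature with the best relevance minus redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    ranked, remaining = [], sorted(features)
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[r]) for r in ranked
            ) / len(ranked)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Toy data with 2 levels instead of 20 so the result is easy to follow.
target = discretize([1.0, 2.0, 9.0, 10.0], n_levels=2)
features = {
    "a": discretize([0.1, 0.2, 0.8, 0.9], n_levels=2),   # tracks the target
    "b": discretize([0.1, 0.9, 0.2, 0.8], n_levels=2),   # unrelated to the target
}
ranking = mrmr_rank(features, target)
```

In the toy data, feature `a` carries one bit of information about the target and `b` carries none, so `a` is ranked first. With the 21 candidate features, the same procedure would be run once per task, using that task's discretized concentration as the target.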

Figure 5: MAE vs. different numbers of selected features on three tasks (PM2.5, NO2, and SO2). In each panel, the number of selected features is measured along the horizontal axis and MAE (μg/m³) is measured along the vertical axis.

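Table 2: Selected features relevant to each task (feature numbers as in Table 1).

| Task | Features in ranked order (selected features first, then removed features) |
| --- | --- |
| PM2.5 concentration prediction | 19, 13, 1, 10, 6, 20, 3, 7, 12, 2, 11, 4, 9, 18, 21, 8, 15, 5, 17, 16, 14 |
| NO2 concentration prediction | 19, 21, 11, 13, 10, 3, 6, 12, 20, 9, 2, 17, 8, 7, 4, 18, 5, 15, 1, 16, 14 |
| SO2 concentration prediction | 19, 21, 6, 13, 11, 20, 18, 2, 10, 15, 7, 4, 9, 16, 5, 17, 3, 8, 14, 1, 12 |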

(2) The dataset was divided into a training set and a test set. For each task, we used random forest to test the feature subsets from top-1 to top-n according to the feature importance ranking and then selected the first n features corresponding to the minimum value of the MAE as the optimal feature subset. The curves of MAE are depicted in Figure 5. Table 2 shows the selected features relevant to each task.
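The prefix search in step (2) can be sketched independently of the regressor. In the sketch below, `evaluate_mae` is a hypothetical callable standing in for "train a random forest on the training set with these features and return the MAE on the test set", and the MAE curve is made-up illustrative data, not a result from the paper.

```python
def best_prefix(ranked_features, evaluate_mae):
    """Try top-1, top-2, ..., top-n prefixes and keep the one with the lowest MAE.

    Returns (selected features, removed features, best MAE).
    """
    best_n, best_mae = 0, float("inf")
    for n in range(1, len(ranked_features) + 1):
        current = evaluate_mae(ranked_features[:n])
        if current < best_mae:
            best_n, best_mae = n, current
    return ranked_features[:best_n], ranked_features[best_n:], best_mae

# Hypothetical ranking and MAE curve, indexed by the number of features used.
ranking = [19, 13, 1, 10, 6]
toy_curve = {1: 5.0, 2: 4.2, 3: 3.9, 4: 4.0, 5: 4.3}
selected, removed, lowest = best_prefix(ranking, lambda feats: toy_curve[len(feats)])
```

Because only prefixes of the ranking are tested, the search needs n model fits for n candidate features instead of one fit per possible subset.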

In order to verify whether the application of multitask learning and of online forecasting can each improve the DBN-DNN forecasting accuracy, and to assess the capability of the proposed MTL-DBN-DNN to predict air pollutant concentration, we compared the proposed MTL-DBN-DNN model (1) with four baseline models (2)–(5):

(1) DBN-DNN model with multitask learning using online forecasting method (OL-MTL-DBN-DNN)

(2) DBN-DNN model using online forecasting method (OL-DBN-DNN)

(3) DBN-DNN model

(4) Air-Quality-Prediction-Hackathon-Winning-Model (Winning-Model) [36]

(5) A hybrid predictive model (FFA) proposed by Yu Zheng et al. [34]

For the single task prediction model, the input of the model is the selected features relevant to the single task. For the multitask prediction model, as long as a feature is relevant to one of the tasks, the feature is used as an input variable to the model.
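The input rule for the multitask model amounts to taking the union of the per-task feature subsets. A small sketch follows; the three subsets shown are hypothetical examples and are not the selections reported in Table 2.

```python
def multitask_inputs(selected_by_task):
    """Union of the per-task feature subsets, returned as a sorted list."""
    union = set()
    for features in selected_by_task.values():
        union.update(features)
    return sorted(union)

# Hypothetical per-task selections (feature numbers as in Table 1).
selected_by_task = {
    "PM2.5": [19, 13, 1],
    "NO2": [19, 21, 11],
    "SO2": [19, 21, 6],
}
inputs = multitask_inputs(selected_by_task)
```

The number of input variables of the multitask network is then `len(inputs)`, while each single task model uses only its own subset.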

Remark. For the first two models (MTL-DBN-DNN and DBN-DNN), we used the online forecasting method. To distinguish them from static forecasting models, the models using the online forecasting method were denoted by OL-MTL-DBN-DNN and OL-DBN-DNN, respectively.

For the first three models above, we used the same DBN architecture and parameters. According to the practical guide for training RBMs in the technical report [33] and the dataset used in the study, we set the architecture and parameters of the deep neural network as follows. In this study, the deep neural network consisted of a DBN with layers of size G-100-100-100-90 and a top output layer, where G is the number of input variables. The DBN was constructed by stacking four RBMs, and a Gaussian-Bernoulli RBM was used as the first layer. In the pretraining stage, the learning rate was set to 0.00001 and the number of training epochs was set to 50. In the fine-tuning stage, we used 10 iterations, and grid search was used to find a suitable learning rate. For the OL-MTL-DBN-DNN model, the output layer contained three units and simultaneously output the predicted concentrations of three kinds of pollutants. Each unit at the output layer was connected to only a subset of units at the last hidden layer of the DBN.

For Winning-Model, time back was set to 4. Since the dataset used in this study was released by the authors of [34], the experimental results given in the original paper for the FFA model were quoted for comparison.

Because the first two models above use the online forecasting method, the training set changes over time. For the sake of fair comparison, we selected the original 1220 elements contained in the window before the sliding window begins to slide forward and used samples corresponding to

Journal of Control Science and Engineering 7

Table 3 Comparison among different models

Models PM25 NO2 SO2MAE RMSE MAPE Acc MAE RMSE MAPE Acc MAE RMSE MAPE Acc

Winning-Model [36] 2363 3533 24363 040 1434 1955 8050 064 856 1424 7001 054OL-MTL-DBN-DNN 1852 3099 20036 053 1382 2004 5262 065 837 1364 5213 055OL-DBN-DNN 2509 3666 30578 037 1506 2136 6540 062 896 1449 6152 052DBN-DNN 2649 3685 39055 033 1743 2590 6352 056 1011 1472 8058 046

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50100150

0

50

100

50 100 1500Time (hour)

0-

25

CON

C

2CO

NC

32

CON

C(

gG

-3)

(gG

-3)

(gG

-3)

(a) OL-MTL-DBN-DNN

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50

100

0

50

100

0-

25

CON

C

2CO

NC

32

CON

C

50 100 1500Time (hour)

(gG

-3)

(gG

-3)

(gG

-3)

(b) OL-DBN-DNN

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50

100

0

50

100

50 100 1500Time (hour)

0-

25

CON

C

2CO

NC

32

CON

C(

gG

-3)

(gG

-3)

(gG

-3)

(c) Winning-Model

Figure 6The prediction performances of different models for a 12-h horizon In the pictures time is measured along the horizontal axis andthe concentrations of three kinds of air pollutants (PM25 NO2 SO2) are measured along the vertical axis

these elements as the training samples of the static predictionmodels (DBN-DNN and Winning-Model) The four modelswere used to predict the concentrations of three kinds ofpollutants in the same period The experimental resultsof hourly concentration forecasting for a 12h horizon areshown in Table 3 where the best results are marked withitalic

3.5. Results and Discussions. Table 3 shows that the best results are obtained by using the OL-MTL-DBN-DNN method for concentration forecasting. The three error evaluation criteria (MAE, RMSE, and MAPE) of OL-MTL-DBN-DNN are lower than those of the baseline models in almost every case, the one exception being the RMSE for NO2, where the Winning-Model is slightly lower. Its accuracy is higher than that of the baseline models for all three pollutants, most clearly for PM2.5. The prediction performance of OL-DBN-DNN is better overall than that of DBN-DNN, which shows that the online forecasting method can improve the prediction performance. The performance of OL-MTL-DBN-DNN surpasses that of OL-DBN-DNN, which shows that multitask learning is an effective approach to improving the forecasting accuracy of air pollutant concentrations and demonstrates the value of sharing the information contained in the training data of the three prediction tasks. It is worth mentioning that learning the tasks in parallel to obtain the forecast results is more efficient than training a separate model for each task.

The experimental results show that the OL-MTL-DBN-DNN model proposed in this paper achieves better prediction performance than the Air-Quality-Prediction-Hackathon-Winning-Model and the FFA model, and the prediction accuracy is greatly improved. For example, when predicting PM2.5 concentrations, compared with the Winning-Model, the MAE and RMSE of OL-MTL-DBN-DNN are reduced by about 5.11 and 4.34 μg/m³, respectively, and the accuracy of OL-MTL-DBN-DNN is improved by about 13% (from 0.40 to 0.53). These positive results demonstrate that our MTL-DBN-DNN model is promising for real-time air pollutant concentration forecasting.
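The quoted improvements can be checked directly against the PM2.5 columns of Table 3. The short Python sketch below does this arithmetic; it assumes the Table 3 entries read 23.63, 35.33, and 0.40 for the Winning-Model and 18.52, 30.99, and 0.53 for OL-MTL-DBN-DNN, and the dictionary names are chosen only for this illustration.

```python
# PM2.5 results as read from Table 3 (assumed decimal placement).
winning_model = {"MAE": 23.63, "RMSE": 35.33, "Acc": 0.40}
ol_mtl_dbn_dnn = {"MAE": 18.52, "RMSE": 30.99, "Acc": 0.53}

# Error reductions (in ug/m^3) and accuracy gain of the proposed model.
mae_reduction = round(winning_model["MAE"] - ol_mtl_dbn_dnn["MAE"], 2)
rmse_reduction = round(winning_model["RMSE"] - ol_mtl_dbn_dnn["RMSE"], 2)
acc_gain = round(ol_mtl_dbn_dnn["Acc"] - winning_model["Acc"], 2)

print(mae_reduction, rmse_reduction, acc_gain)
```

The accuracy gain is an absolute difference of 0.13, that is, 13 percentage points.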

With the prediction horizon set to 12 hours, some prediction results of three models are presented in Figure 6.

Figure 6 shows that the predicted concentrations and the observed concentrations match very well when OL-MTL-DBN-DNN is used. The advantage of OL-MTL-DBN-DNN is most obvious when it is used to predict sudden changes and high peaks in concentration.


4. Conclusion

In this paper, a deep neural network model with multitask learning (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting of nonlinear systems and tested on the forecast of air quality time series.

The MTL-DBN-DNN model can fulfill several prediction tasks at the same time by using shared information. In the model, each unit in the output layer is connected to only a subset of the units in the last hidden layer of the DBN, and two adjacent subsets share a specified number of common units. Such a connection effectively avoids the problem that fully connected networks need to juggle the learning of each task while being trained, so that the trained networks cannot reach optimal prediction accuracy for each task. The locally connected architecture can learn both the commonalities and the differences of multiple tasks well.

PM2.5, SO2, and NO2 react chemically with one another and show almost the same concentration trend, so we apply the proposed model to a case study on forecasting the concentrations of these three air pollutants 12 hours in advance. Comparison with multiple baseline models shows that our MTL-DBN-DNN model has a stronger capability of predicting air pollutant concentrations. Therefore, by combining the advantages of deep learning, multitask learning, and online forecasting, the MTL-DBN-DNN model is able to provide accurate real-time concentration predictions of air pollutants.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

Section 3.2 of this paper (feature set) cites the authors' conference paper [37].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61873008) and the Beijing Municipal Natural Science Foundation (4182008).

References

[1] P. S. G. De Mattos Neto, F. Madeiro, T. A. E. Ferreira, and G. D. C. Cavalcanti, “Hybrid intelligent system for air quality forecasting using phase adjustment,” Engineering Applications of Artificial Intelligence, vol. 32, pp. 185–191, 2014.

[2] K. Siwek and S. Osowski, “Improving the accuracy of prediction of PM10 pollution by the wavelet transformation and an ensemble of neural predictors,” Engineering Applications of Artificial Intelligence, vol. 25, no. 6, pp. 1246–1258, 2012.

[3] X. Feng, Q. Li, Y. Zhu, J. Hou, L. Jin, and J. Wang, “Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation,” Atmospheric Environment, vol. 107, pp. 118–128, 2015.

[4] W. Tamas, G. Notton, C. Paoli, M.-L. Nivet, and C. Voyant, “Hybridization of air quality forecasting models using machine learning and clustering: An original approach to detect pollutant peaks,” Aerosol and Air Quality Research, vol. 16, no. 2, pp. 405–416, 2016.

[5] A. Kurt and A. B. Oktay, “Forecasting air pollutant indicator levels with geographic models 3 days in advance using neural networks,” Expert Systems with Applications, vol. 37, no. 12, pp. 7986–7992, 2010.

[6] A. Y. Ng, J. Ngiam, C. Y. Foo, Y. Mai, and C. Suen, Deep Networks: Overview, 2013, http://deeplearning.stanford.edu/wiki/index.php/Deep_Networks:_Overview.

[7] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[9] S. Azizi, F. Imani, B. Zhuang et al., “Ultrasound-based detection of prostate cancer using automatic feature selection with deep belief networks,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. Frangi, Eds., vol. 9350 of Lecture Notes in Computer Science, pp. 70–77, Springer, Munich, Germany, 2015.

[10] M. Qin, Z. Li, and Z. Du, “Red tide time series forecasting by combining ARIMA and deep belief network,” Knowledge-Based Systems, vol. 125, pp. 39–52, 2017.

[11] X. Sun, T. Li, Q. Li, Y. Huang, and Y. Li, “Deep belief echo-state network and its application to time series prediction,” Knowledge-Based Systems, vol. 130, pp. 17–29, 2017.

[12] T. Kuremoto, S. Kimura, K. Kobayashi, and M. Obayashi, “Time series forecasting using a deep belief network with restricted Boltzmann machines,” Neurocomputing, vol. 137, pp. 47–56, 2014.

[13] F. Shen, J. Chao, and J. Zhao, “Forecasting exchange rate using deep belief networks and conjugate gradient method,” Neurocomputing, vol. 167, pp. 243–253, 2015.

[14] A. Dedinec, S. Filiposka, A. Dedinec, and L. Kocarev, “Deep belief network based electricity load forecasting: An analysis of Macedonian case,” Energy, vol. 115, pp. 1688–1700, 2016.

[15] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu, “Deep belief network based deterministic and probabilistic wind speed forecasting approach,” Applied Energy, vol. 182, pp. 80–93, 2016.

[16] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, 1997.

[17] Y. Huang, W. Wang, L. Wang, and T. Tan, “Multi-task deep neural network for multi-label learning,” in Proceedings of the IEEE International Conference on Image Processing, pp. 2897–2900, Melbourne, Australia, 2013.

[18] R. Zhang, J. Li, J. Lu, R. Hu, Y. Yuan, and Z. Zhao, “Using deep learning for compound selectivity prediction,” Current Computer-Aided Drug Design, vol. 12, no. 1, pp. 5–14, 2016.

[19] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: deep belief networks with multitask learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2191–2201, 2014.

[20] D. Chen and B. Mak, “Multi-task learning of deep neural networks for low-resource speech recognition,” IEEE Transactions on Audio, Speech and Language, vol. 23, no. 7, pp. 1172–1183, 2015.

[21] R. Xia and Y. Liu, “Leveraging valence and activation information via multi-task learning for categorical emotion recognition,” in Proceedings of the 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, pp. 5301–5305, Brisbane, Australia, April 2014.

[22] R. Collobert and J. Weston, “A unified architecture for natural language processing: deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, Helsinki, Finland, July 2008.

[23] R. M. Harrison, A. M. Jones, and R. G. Lawrence, “Major component composition of PM10 and PM2.5 from roadside and urban background sites,” Atmospheric Environment, vol. 38, no. 27, pp. 4531–4538, 2004.

[24] G. Wang, R. Zhang, M. E. Gomez et al., “Persistent sulfate formation from London Fog to Chinese haze,” Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 48, pp. 13630–13635, 2016.

[25] Y. Cheng, G. Zheng, C. Wei et al., “Reactive nitrogen chemistry in aerosol water as a source of sulfate during haze events in China,” Science Advances, vol. 2, Article ID e1601530, 2016.

[26] D. Agrawal and A. E. Abbadi, “Supporting sliding window queries for continuous data streams,” in IEEE International Conference on Scientific and Statistical Database Management, pp. 85–94, Cambridge, Massachusetts, USA, 2003.

[27] K. B. Shaban, A. Kadri, and E. Rezk, “Urban air pollution monitoring system with forecasting models,” IEEE Sensors Journal, vol. 16, no. 8, pp. 2598–2606, 2016.

[28] L. Deng and D. Yu, “Deep learning: methods and applications,” in Foundations and Trends in Signal Processing, vol. 7, pp. 197–391, Now Publishers Inc., Hanover, MA, USA, 2014.

[29] G. E. Hinton, “Deep belief networks,” Scholarpedia, vol. 4, no. 5, article no. 5947, 2009.

[30] Y. Bengio, I. Goodfellow, and A. Courville, Deep Generative Models, Deep Learning, MIT Press, Cambridge, Mass, USA, 2017.

[31] G. Hinton, L. Deng, D. Yu et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.

[32] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” The American Association for the Advancement of Science: Science, vol. 313, no. 5786, pp. 504–507, 2006.

[33] G. Hinton, “A practical guide to training restricted Boltzmann machines,” in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[34] Y. Zheng, X. Yi, M. Li et al., “Forecasting fine-grained air quality based on big data,” in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '15), pp. 2267–2276, Sydney, Australia, August 2015.

[35] X. Feng, Q. Li, Y. Zhu, J. Wang, H. Liang, and R. Xu, “Formation and dominant factors of haze pollution over Beijing and its peripheral areas in winter,” Atmospheric Pollution Research, vol. 5, no. 3, pp. 528–538, 2014.

[36] “Winning Code for the EMC Data Science Global Hackathon (Air Quality Prediction), 2012,” https://github.com/benhamner/Air-Quality-Prediction-Hackathon-Winning-Model.

[37] J. Li, X. Shao, and H. Zhao, “An online method based on random forest for air pollutant concentration forecasting,” in Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 2018.


Figure 2: A 2-layer deep belief network, built by stacking two RBMs (RBM1 and RBM2), contains a layer of visible units and two layers of hidden units. Here $h^{(1)}$ and $h^{(2)}$ are the state vectors of the hidden layers, $v$ is the state vector of the visible layer, $W^{(1)}$ and $W^{(2)}$ are the matrices of symmetrical weights, $b^{(1)}$ and $b^{(2)}$ are the bias vectors of the hidden layers, and $b^{(0)}$ is the bias vector of the visible layer.

\[ P\left(v_i = 1 \mid h^{(1)}\right) = \sigma\left(b_i^{(0)} + W_i^{(1)\mathsf{T}} h^{(1)}\right), \quad \forall i. \tag{3} \]

In the case of real-valued visible units, substitute

\[ v \sim \mathcal{N}\left(b^{(0)} + W^{(1)\mathsf{T}} h^{(1)},\ \beta^{-1}\right), \tag{4} \]

with $\beta$ diagonal for tractability [30]; here $\sigma(x) = 1/(1 + \exp(-x))$. The weights from the trained DBN can be used as the initial weights of a DNN [8, 30]:

\[ h^{(1)} = \sigma\left(b^{(1)} + v^{\mathsf{T}} W^{(1)}\right), \tag{5} \]

\[ h^{(l)} = \sigma\left(b^{(l)} + h^{(l-1)\mathsf{T}} W^{(l)}\right), \quad \forall l \in \{2, \ldots, m\}, \tag{6} \]

and then all of the weights are fine-tuned by applying backpropagation or other discriminative algorithms to improve the performance of the whole network. When a DBN is used to initialize the parameters of a DNN, the resulting network is called a DBN-DNN [31].
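As an illustration of the layer-by-layer propagation in (5) and (6), the following minimal Python sketch passes a visible vector through a stack of sigmoid layers. The function names `layer_forward` and `dbn_forward` and the list-of-lists weight layout are assumptions made for this example and are not taken from the original implementation; pretraining and fine-tuning are not shown.

```python
import math


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute h_j = sigma(b_j + sum_i inputs[i] * weights[i][j]).

    weights has one row per input unit and one column per output unit,
    which matches the v^T W form of equation (5).
    """
    return [
        sigmoid(biases[j] + sum(x * row[j] for x, row in zip(inputs, weights)))
        for j in range(len(biases))
    ]


def dbn_forward(v, layers):
    """Propagate v through a list of (weights, biases) pairs, bottom to top."""
    h = v
    for weights, biases in layers:
        h = layer_forward(h, weights, biases)
    return h


# Toy network: 2 visible units -> 3 hidden units, all parameters zero,
# so every hidden activation is sigma(0) = 0.5.
toy_layers = [([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0, 0.0])]
toy_output = dbn_forward([1.0, 2.0], toy_layers)
```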

2.3. DBN-Based Deep Neural Network Model with Multitask Learning (MTL-DBN-DNN). In this section, a DBN-based multitask deep neural network prediction model is proposed to solve multiple related tasks simultaneously by using the shared information contained in the training data of different tasks.

The DBN-DNN prediction model with multitask learning is constructed from a DBN and an output layer with multiple units. The deep belief network is used to extract better feature representations, and several related tasks are solved simultaneously by using shared representations. The sigmoid function is used as the activation function of the output layer.

Each unit in the output layer is connected to only a subset of the units in the last hidden layer of the DBN. Let N be the number of related tasks to be processed, and let α be the size of each subset, expressed as the ratio of the number of nodes in the subset to the number of nodes in the entire last hidden layer; then 1/(N−1) > α > 1/N. At the locally connected layer, each output node has a portion of hidden nodes that are connected only to it; if the size of this portion, expressed as the same kind of ratio, is β, then 0 < β < 1/N. Two adjacent subsets share a specified number of common units.
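One possible way to build such subsets is sketched below in Python. The contiguous, evenly shifted layout and the names `local_subsets` and `connection_mask` are assumptions for illustration only; the text fixes just the bounds on α and β and the existence of common units between adjacent subsets. The example uses 90 units in the last hidden layer, N = 3, and 40 units per subset, so that α = 40/90 lies between 1/3 and 1/2.

```python
def local_subsets(hidden_size, n_tasks, subset_size):
    """Return one range of hidden-unit indices per output unit (task).

    Assumed layout: contiguous windows shifted by a constant stride, so the
    first window starts at unit 0 and the last one ends at the final unit.
    """
    if n_tasks < 2:
        raise ValueError("need at least two tasks")
    alpha = subset_size / hidden_size
    if not (1.0 / n_tasks < alpha < 1.0 / (n_tasks - 1)):
        raise ValueError("alpha must satisfy 1/N < alpha < 1/(N-1)")
    span = hidden_size - subset_size
    if span % (n_tasks - 1) != 0:
        raise ValueError("subset_size does not give an integer stride")
    stride = span // (n_tasks - 1)
    return [range(k * stride, k * stride + subset_size) for k in range(n_tasks)]


def connection_mask(hidden_size, subsets):
    """mask[k][j] is 1 if hidden unit j feeds output unit k, else 0."""
    return [[1 if j in subset else 0 for j in range(hidden_size)]
            for subset in subsets]


subsets = local_subsets(hidden_size=90, n_tasks=3, subset_size=40)
mask = connection_mask(90, subsets)
shared = len(set(subsets[0]) & set(subsets[1]))        # common to outputs 1 and 2
unique_first = len(set(subsets[0]) - set(subsets[1]))  # used only by output 1
```

In this configuration adjacent subsets share 15 units, and the first output unit has 25 units of its own, so that β = 25/90 is below 1/3. During fine-tuning, such a mask would be multiplied elementwise with the output-layer weight matrix so that the masked connections stay at zero.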

The MTL-DBN-DNN model is learned with unsupervised DBN pretraining followed by backpropagation fine-tuning. The architecture of the MTL-DBN-DNN model is shown in Figure 3.

Remark. First, pretraining and fine-tuning ensure that the information in the weights comes from modeling the input data [32]. In other words, the network memorizes the information of the training data via the weights. The network needs to learn not only the commonalities of multiple tasks but also their differences. A locally connected network allows a subset of hidden units to be unique to one of the tasks, and unique units can better model the task-specific information. Therefore, fully connected networks do not learn the information contained in the training data of multiple tasks better than locally connected networks. Second, fully connected networks need to juggle (i.e., balance) the learning of each task while being trained, so that the trained networks cannot reach optimal prediction accuracy for each task. For these two reasons, the last (fully connected) layer is replaced by a locally connected layer, and each unit in the output layer is connected to only a subset of the units in the previous layer. Two adjacent subsets share a specified number of common units.

Input. As long as a feature is statistically relevant to one of the tasks, the feature is used as an input variable to the model.

When the MTL-DBN-DNN model is used for time series forecasting, the parameters of the model can be dynamically adjusted according to the recent monitoring data taken by the sliding window, in order to achieve online forecasting.

The Setting of the Structures and Parameters. The architecture and parameters of the MTL-DBN-DNN can be set according to the practical guide for training RBMs in technical report [33].

3. Experiments

3.1. Data Set. In this study, we used a data set that was collected in the Urban Air project (Urban Computing Team, Microsoft Research) over a period of one year (from 1 May 2014 to 30 April 2015) [34]. There are missing values in the data, so the data was preprocessed in this study. We chose the Dongcheng Dongsi air quality monitoring station, located in Beijing, as the target station. The hourly concentrations of PM2.5, NO2, and SO2 at the station were predicted 12 hours in advance.

Figure 3: The schematic representation of the DBN-DNN model with multitask learning (MTL-DBN-DNN): an input layer, a DBN with four hidden layers, and an output layer.

Figure 4: Three transport corridors, namely, the southeast branch (a), the northwest branch (b), and the southwest branch (c), tracked by 24 h backward trajectories of air masses in the Jing-Jin-Ji area.

3.2. Feature Set. Based on previous research results, we let the factors that may be relevant to the concentration forecasting of the three air pollutants make up a set of candidate features.

Traffic emission is one of the sources of air pollutants. The traffic flow differs between weekdays and weekends, and traffic density increases notably during the morning and afternoon rush hours. Because traffic flow data are not easy to obtain, the hour of day and the day of week were used in this paper to represent them.

Anthropogenic activities that lead to air pollution differ at different times of the year. The day of year (DAY) [3] was used to represent the time of year, and it is calculated by

\[ \mathrm{DAY} = \cos\left(2\pi \frac{d_{th}}{T}\right), \tag{7} \]

where $d_{th}$ represents the ordinal number of the day in the year and $T$ is the number of days in that year.
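Equation (7) can be computed from a calendar date as in the following Python sketch. The function name `day_of_year_feature` is an assumption for this example; the leap-year handling follows from the definition of $T$ as the number of days in the year.

```python
import calendar
import datetime
import math


def day_of_year_feature(date):
    """DAY = cos(2 * pi * d_th / T) for the given calendar date."""
    d_th = date.timetuple().tm_yday                 # ordinal day in the year, from 1
    T = 366 if calendar.isleap(date.year) else 365  # number of days in that year
    return math.cos(2.0 * math.pi * d_th / T)


# The feature is close to 1 around the turn of the year and close to -1 in midsummer.
end_of_year = day_of_year_feature(datetime.date(2014, 12, 31))
midsummer = day_of_year_feature(datetime.date(2014, 7, 2))
```

Because the cosine is periodic, 31 December and 1 January receive nearly identical values, which a raw day count would not provide.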

Regional transport of atmospheric pollutants may be an important factor that affects the concentrations of air pollutants. Three transport corridors are tracked by 24 h backward trajectories of air masses in the Jing-Jin-Ji area [3, 35], and they are presented in Figure 4. According to the current wind direction and the transport corridors of air masses, we selected a nearby city located in the upwind direction of Beijing. We then used the monitoring data of the concentrations of six air pollutants from a station located in that city to represent the current pollutant concentrations of the selected nearby city.

Candidate features include meteorological data from the target station, whose three air pollutant concentrations will be predicted (weather, temperature, pressure, humidity, wind speed, and wind direction); the concentrations of six air pollutants at the present moment from the target station and from the selected nearby city (PM2.5, PM10, SO2, NO2, CO, and O3); the hour of day; the day of week; and the day of year. Weather has 17 different conditions: sunny, cloudy, overcast, rainy, sprinkle, moderate rain, heavy rain, rain storm, thunder storm, freezing rain, snowy, light snow, moderate snow, heavy snow, foggy, sand storm, and dusty. All features and their numbers are listed in Table 1.

Table 1: The 21 elements in the candidate feature set.

No. | Feature
1 | The current PM2.5 concentration of the target station (μg/m³)
2 | The current PM10 concentration of the target station (μg/m³)
3 | The current NO2 concentration of the target station (μg/m³)
4 | The current CO concentration of the target station (mg/m³)
5 | The current O3 concentration of the target station (μg/m³)
6 | The current SO2 concentration of the target station (μg/m³)
7 | Weather
8 | Temperature (°C)
9 | Atmospheric pressure (hPa)
10 | Relative humidity
11 | Wind speed (m/s)
12 | Wind direction
13 | The current PM2.5 concentration of the selected nearby station (μg/m³)
14 | The current PM10 concentration of the selected nearby station (μg/m³)
15 | The current NO2 concentration of the selected nearby station (μg/m³)
16 | The current CO concentration of the selected nearby station (mg/m³)
17 | The current O3 concentration of the selected nearby station (μg/m³)
18 | The current SO2 concentration of the selected nearby station (μg/m³)
19 | The day of year
20 | The day of week
21 | The hour of day

3.3. Evaluation Metrics. In this study, four performance indicators, namely, mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and accuracy (Acc) [34], were used to assess the performance of the models. They are defined by

\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| O_i - P_i \right|, \tag{8} \]

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( O_i - P_i \right)^2}, \tag{9} \]

\[ \mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{O_i - P_i}{O_i} \right|, \tag{10} \]

\[ \mathrm{Acc} = 1 - \frac{\sum_{i=1}^{N} \left| O_i - P_i \right|}{\sum_{i=1}^{N} O_i}, \tag{11} \]

where $N$ is the number of time points and $O_i$ and $P_i$ represent the observed and predicted values, respectively.
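The four indicators in (8)-(11) translate directly into code. The following Python sketch is one straightforward rendering; the function names are chosen for this example, and the toy observed and predicted values are invented purely to exercise the formulas.

```python
import math


def mae(observed, predicted):
    """Mean absolute error, equation (8)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error, equation (9)."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def mape(observed, predicted):
    """Mean absolute percentage error, equation (10); undefined if any O_i is 0."""
    return 100.0 / len(observed) * sum(
        abs((o - p) / o) for o, p in zip(observed, predicted)
    )


def acc(observed, predicted):
    """Accuracy, equation (11): one minus total absolute error over total observed."""
    return 1.0 - sum(abs(o - p) for o, p in zip(observed, predicted)) / sum(observed)


# Toy data (not from the paper).
O = [10.0, 20.0]
P = [12.0, 16.0]
scores = (mae(O, P), rmse(O, P), mape(O, P), acc(O, P))
```

Because MAPE divides by each observation separately, hours with very low observed concentrations can dominate it, whereas Acc normalizes by the total observed concentration.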

3.4. Experiment Setup. A new data element arrives each hour. Each data element, together with the features that determine the element, constitutes a training sample $[x_t, (y_{1t}, y_{2t}, y_{3t})]$, where $y_{1t}$, $y_{2t}$, and $y_{3t}$ represent the PM2.5, NO2, and SO2 concentrations, respectively, and $x_t$ is a set of features made up of the factors that may be relevant to the concentration forecasting of the three pollutants.

Setting the Parameters of Sliding Window (Window Size, Step Size, Horizon). In the study, the concentrations of PM2.5, NO2, and SO2 were predicted 12 hours in advance, so the horizon was set to 12. The window size was 1220; that is, the sliding window always contained 1220 elements. The step size was set to 1. After the current concentration was monitored, the sliding window moved one step forward, the prediction model was trained with the 1220 training samples corresponding to the elements contained in the sliding window, and then the trained model was used to predict the responses of the target instances.
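The sliding-window schedule can be sketched as an index generator. In the Python example below, the function name `sliding_windows` and the convention that the prediction target lies `horizon` positions after the newest element in the window are assumptions for illustration; model training and prediction themselves are omitted.

```python
def sliding_windows(n_elements, window_size=1220, step=1, horizon=12):
    """Yield (train_indices, target_index) pairs for online forecasting.

    train_indices covers the window_size most recent elements; target_index
    is assumed to lie horizon positions after the newest element.
    """
    end = window_size  # exclusive end index of the current window
    while end - 1 + horizon < n_elements:
        yield range(end - window_size, end), end - 1 + horizon
        end += step


# Small demonstration: 10 hourly elements, window of 4, horizon of 2.
schedule = list(sliding_windows(10, window_size=4, step=1, horizon=2))
```

At each yielded pair, the model would be retrained on the samples indexed by `train_indices` and then queried for the element at `target_index`.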

Selecting Features Relevant to Each Task. The experimental procedures are as follows.

(1) After the continuous variables were discretized, the features were evaluated and sorted for each task according to the minimal-redundancy-maximal-relevance (mRMR) criterion.

First, the continuous variables were discretized, and the discretized response variable became a class label with numerical significance. In this paper, continuous variables were divided into 20 levels. MIToolbox, a mutual information package by Adam Pocock, was used to evaluate the importance of the features according to the mRMR criterion.
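A discretization into 20 levels might look like the following Python sketch. Equal-width binning is an assumption here, since the text states only the number of levels, and the function name `discretize` is hypothetical.

```python
def discretize(values, n_levels=20):
    """Map continuous values to integer levels 0 .. n_levels - 1.

    Assumes equal-width bins between the minimum and maximum of values.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # a constant variable falls into one level
    width = (hi - lo) / n_levels
    # The maximum value would land in bin n_levels, so clamp it to the top level.
    return [min(int((v - lo) / width), n_levels - 1) for v in values]


levels = discretize([0.0, 5.0, 10.0])
```

The resulting integer levels preserve the ordering of the original values, which is what gives the discretized response variable its numerical significance as a class label.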

Figure 5: MAE vs. different numbers of selected features on three tasks. Each panel plots the MAE (μg/m³) against the number of selected features.

Table 2: Selected features relevant to each task. Feature numbers refer to Table 1.

Task | Selected features, then removed features
PM2.5 concentration prediction | 19 13 1 10 6 20 3 7 12 2 11 4 9 18 21 8 15 5 17 16 14
NO2 concentration prediction | 19 21 11 13 10 3 6 12 20 9 2 17 8 7 4 18 5 15 1 16 14
SO2 concentration prediction | 19 21 6 13 11 20 18 2 10 15 7 4 9 16 5 17 3 8 14 1 12

(2) The dataset was divided into a training set and a test set. For each task, we used random forest to test the feature subsets from top-1 to top-n according to the feature importance ranking, and then selected the first n features corresponding to the minimum MAE as the optimal feature subset. The MAE curves are depicted in Figure 5. Table 2 shows the selected features relevant to each task.
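The top-n search in step (2) can be sketched as follows in Python. The callback `evaluate_mae` is hypothetical and stands in for training a random forest on the given feature subset and measuring its MAE on the test set; the toy scoring function is invented only to make the example runnable.

```python
def select_top_n(ranked_features, evaluate_mae):
    """Return the ranked prefix with the lowest MAE, together with that MAE.

    ranked_features is ordered from most to least important;
    evaluate_mae(subset) is a hypothetical callback returning the test MAE.
    """
    best_n, best_mae = 0, float("inf")
    for n in range(1, len(ranked_features) + 1):
        current = evaluate_mae(ranked_features[:n])
        if current < best_mae:  # keep the smallest prefix that reaches the minimum
            best_n, best_mae = n, current
    return ranked_features[:best_n], best_mae


# Toy score whose minimum is reached with exactly three features.
def toy_mae(subset):
    return abs(len(subset) - 3) + 1.0


selected, score = select_top_n([19, 13, 1, 10, 6], toy_mae)
```

Only prefixes of the ranking are tested, so the search needs one model evaluation per candidate value of n rather than one per subset of features.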

To verify whether multitask learning and online forecasting can each improve the forecasting accuracy of DBN-DNN, and to assess the capability of the proposed MTL-DBN-DNN to predict air pollutant concentrations, we compared the proposed MTL-DBN-DNN model with four baseline models (2-5):

(1) DBN-DNN model with multitask learning using the online forecasting method (OL-MTL-DBN-DNN)
(2) DBN-DNN model using the online forecasting method (OL-DBN-DNN)
(3) DBN-DNN model
(4) Air-Quality-Prediction-Hackathon-Winning-Model (Winning-Model) [36]
(5) A hybrid predictive model (FFA) proposed by Yu Zheng et al. [34]

For the single-task prediction model, the input of the model is the set of selected features relevant to that single task. For the multitask prediction model, a feature is used as an input variable as long as it is relevant to at least one of the tasks.

Remark For the first two models (MTL-DBN-DNN andDBN-DNN) we used the online forecasting method To

be distinguished from static forecasting models the modelsusing online forecasting method were denoted by OL-MTL-DBN-DNN and OL-DBN-DNN respectively

For the first three models above we used the same DBNarchitecture and parameters According to the practical guidefor training RBMs in technical report [33] and the datasetused in the study we set the architecture and parametersof the deep neural network as follows In this study deepneural network consisted of a DBN with layers of size G-100-100-100-90 and a top output layer and G is the numberof input variables The DBN was constructed by stackingfour RBMs and a Gaussian-Bernoulli RBM was used as thefirst layer In the pretraining stage the learning rate was setto 000001 and the number of training epochs was set to50 In the fine-tuning stage we used 10 iterations and gridsearch was used to find a suitable learning rate For the OL-MTL-DBN-DNN model the output layer contained threeunits and simultaneously output the predicted concentrationsof three kinds of pollutants Each unit at output layer wasconnected to only a subset of units at the last hidden layerof DBN

For Winning-Model time back was set to 4 Since thedataset used in this study was released by the authors of [34]the experimental results given in the original paper for theFFAmodel were quoted for comparison

Because the first twomodels above are themodels that useonline forecastingmethod the training set changes over timeFor the sake of fair comparison we selected original 1220elements contained in the window before sliding windowbegins to slide forward and used samples corresponding to

Journal of Control Science and Engineering 7

Table 3 Comparison among different models

Models PM25 NO2 SO2MAE RMSE MAPE Acc MAE RMSE MAPE Acc MAE RMSE MAPE Acc

Winning-Model [36] 2363 3533 24363 040 1434 1955 8050 064 856 1424 7001 054OL-MTL-DBN-DNN 1852 3099 20036 053 1382 2004 5262 065 837 1364 5213 055OL-DBN-DNN 2509 3666 30578 037 1506 2136 6540 062 896 1449 6152 052DBN-DNN 2649 3685 39055 033 1743 2590 6352 056 1011 1472 8058 046

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50100150

0

50

100

50 100 1500Time (hour)

0-

25

CON

C

2CO

NC

32

CON

C(

gG

-3)

(gG

-3)

(gG

-3)

(a) OL-MTL-DBN-DNN

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50

100

0

50

100

0-

25

CON

C

2CO

NC

32

CON

C

50 100 1500Time (hour)

(gG

-3)

(gG

-3)

(gG

-3)

(b) OL-DBN-DNN

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50

100

0

50

100

50 100 1500Time (hour)

0-

25

CON

C

2CO

NC

32

CON

C(

gG

-3)

(gG

-3)

(gG

-3)

(c) Winning-Model

Figure 6The prediction performances of different models for a 12-h horizon In the pictures time is measured along the horizontal axis andthe concentrations of three kinds of air pollutants (PM25 NO2 SO2) are measured along the vertical axis

these elements as the training samples of the static predictionmodels (DBN-DNN and Winning-Model) The four modelswere used to predict the concentrations of three kinds ofpollutants in the same period The experimental resultsof hourly concentration forecasting for a 12h horizon areshown in Table 3 where the best results are marked withitalic

35 Results and Discussions Table 3 shows that the bestresults are obtained by using OL-MTL-DBN-DNN methodfor concentration forecasting Three error evaluation criteria(MAE RMSE and MAPE) of the OL-MTL-DBN-DNN arelower than that of the baseline models and its accuracy issignificantly higher than that of the baseline models Theprediction performance of OL-DBN-DNN is better thanDBN-DNN which shows that the use of online forecastingmethod can improve the prediction performanceThe perfor-mance of OL-MTL-DBN-DNN surpasses the performanceof OL-DBN-DNN which shows that multitask learning isan effective approach to improve the forecasting accuracyof air pollutant concentration and demonstrates that it isnecessary to share the information contained in the train-ing data of three prediction tasks It is worth mentioning

that learning tasks in parallel to get the forecast results ismore efficient than training a model separately for eachtask

The experimental results show that the OL-MTL-DBN-DNN model proposed in this paper achieves better prediction performance than the Air-Quality-Prediction-Hackathon-Winning-Model and the FFA model, and the prediction accuracy is greatly improved. For example, when predicting PM2.5 concentrations, compared with the Winning-Model, the MAE and RMSE of OL-MTL-DBN-DNN are reduced by about 5.11 and 4.34 μg/m3, respectively, and the accuracy of OL-MTL-DBN-DNN is improved by about 13 percentage points. These positive results demonstrate that our model, MTL-DBN-DNN, is promising for real-time air pollutant concentration forecasting.
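As a quick arithmetic check of the PM2.5 comparison, the following sketch recomputes the differences from the Table 3 entries for the Winning-Model and OL-MTL-DBN-DNN. The helper names are illustrative, and the relative reductions are an additional view that the text itself does not report.

```python
# PM2.5 entries from Table 3 (12-h horizon).
# MAE and RMSE are in ug/m3; Acc is unitless.
winning_model = {"MAE": 23.63, "RMSE": 35.33, "Acc": 0.40}
ol_mtl_dbn_dnn = {"MAE": 18.52, "RMSE": 30.99, "Acc": 0.53}


def absolute_change(baseline, model, key):
    """Signed difference (model - baseline), rounded to two decimals."""
    return round(model[key] - baseline[key], 2)


def relative_reduction(baseline, model, key):
    """Reduction relative to the baseline in percent, rounded to one decimal."""
    return round(100.0 * (baseline[key] - model[key]) / baseline[key], 1)


for key in ("MAE", "RMSE", "Acc"):
    print(key, "absolute change:", absolute_change(winning_model, ol_mtl_dbn_dnn, key))
for key in ("MAE", "RMSE"):
    print(key, "relative reduction (%):",
          relative_reduction(winning_model, ol_mtl_dbn_dnn, key))
```

The absolute changes correspond to the 5.11, 4.34, and 0.13 differences between the two PM2.5 rows of Table 3.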

When the prediction horizon is set to 12 hours, some prediction results of the three models are presented in Figure 6.

Figure 6 shows that the predicted concentrations and the observed concentrations match very well when OL-MTL-DBN-DNN is used. The advantage of OL-MTL-DBN-DNN is more obvious when it is used to predict sudden changes and high peaks of concentrations.

8 Journal of Control Science and Engineering

4. Conclusion

In this paper, a deep neural network model with multitask learning (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting of nonlinear systems and tested on the forecast of air quality time series.

The MTL-DBN-DNN model can fulfill several prediction tasks at the same time by using shared information. In the model, each unit in the output layer is connected to only a subset of units in the last hidden layer of the DBN. Two adjacent subsets share a specified number of common units. Such a connection effectively avoids the problem that fully connected networks need to juggle the learning of each task while being trained, so that the trained networks cannot get optimal prediction accuracy for each task. The locally connected architecture can learn the commonalities and differences of multiple tasks well.

PM2.5, SO2, and NO2 are linked by chemical reactions and show almost the same concentration trend, so we apply the proposed model to a case study on forecasting the concentrations of three kinds of air pollutants 12 hours in advance. Comparison with multiple baseline models shows that our model, MTL-DBN-DNN, has a stronger capability of predicting air pollutant concentration. Therefore, by combining the advantages of deep learning, multitask learning, and online forecasting, the MTL-DBN-DNN model is able to provide accurate real-time concentration predictions of air pollutants.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

Section 3.2 of this paper (feature set) cites the author’s conference paper [37].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61873008) and the Beijing Municipal Natural Science Foundation (4182008).

References

[1] P. S. G. De Mattos Neto, F. Madeiro, T. A. E. Ferreira, and G. D. C. Cavalcanti, “Hybrid intelligent system for air quality forecasting using phase adjustment,” Engineering Applications of Artificial Intelligence, vol. 32, pp. 185–191, 2014.

[2] K. Siwek and S. Osowski, “Improving the accuracy of prediction of PM10 pollution by the wavelet transformation and an ensemble of neural predictors,” Engineering Applications of Artificial Intelligence, vol. 25, no. 6, pp. 1246–1258, 2012.

[3] X. Feng, Q. Li, Y. Zhu, J. Hou, L. Jin, and J. Wang, “Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation,” Atmospheric Environment, vol. 107, pp. 118–128, 2015.

[4] W. Tamas, G. Notton, C. Paoli, M.-L. Nivet, and C. Voyant, “Hybridization of air quality forecasting models using machine learning and clustering: An original approach to detect pollutant peaks,” Aerosol and Air Quality Research, vol. 16, no. 2, pp. 405–416, 2016.

[5] A. Kurt and A. B. Oktay, “Forecasting air pollutant indicator levels with geographic models 3 days in advance using neural networks,” Expert Systems with Applications, vol. 37, no. 12, pp. 7986–7992, 2010.

[6] A. Y. Ng, J. Ngiam, C. Y. Foo, Y. Mai, and C. Suen, Deep Networks: Overview, 2013, http://deeplearning.stanford.edu/wiki/index.php/Deep_Networks:_Overview.

[7] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[9] S. Azizi, F. Imani, B. Zhuang et al., “Ultrasound-based detection of prostate cancer using automatic feature selection with deep belief networks,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. Frangi, Eds., vol. 9350 of Lecture Notes in Computer Science, pp. 70–77, Springer, Munich, Germany, 2015.

[10] M. Qin, Z. Li, and Z. Du, “Red tide time series forecasting by combining ARIMA and deep belief network,” Knowledge-Based Systems, vol. 125, pp. 39–52, 2017.

[11] X. Sun, T. Li, Q. Li, Y. Huang, and Y. Li, “Deep belief echo-state network and its application to time series prediction,” Knowledge-Based Systems, vol. 130, pp. 17–29, 2017.

[12] T. Kuremoto, S. Kimura, K. Kobayashi, and M. Obayashi, “Time series forecasting using a deep belief network with restricted Boltzmann machines,” Neurocomputing, vol. 137, pp. 47–56, 2014.

[13] F. Shen, J. Chao, and J. Zhao, “Forecasting exchange rate using deep belief networks and conjugate gradient method,” Neurocomputing, vol. 167, pp. 243–253, 2015.

[14] A. Dedinec, S. Filiposka, A. Dedinec, and L. Kocarev, “Deep belief network based electricity load forecasting: An analysis of Macedonian case,” Energy, vol. 115, pp. 1688–1700, 2016.

[15] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu, “Deep belief network based deterministic and probabilistic wind speed forecasting approach,” Applied Energy, vol. 182, pp. 80–93, 2016.

[16] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, 1997.

[17] Y. Huang, W. Wang, L. Wang, and T. Tan, “Multi-task deep neural network for multi-label learning,” in Proceedings of the IEEE International Conference on Image Processing, pp. 2897–2900, Melbourne, Australia, 2013.

[18] R. Zhang, J. Li, J. Lu, R. Hu, Y. Yuan, and Z. Zhao, “Using deep learning for compound selectivity prediction,” Current Computer-Aided Drug Design, vol. 12, no. 1, pp. 5–14, 2016.

[19] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: deep belief networks with multitask learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2191–2201, 2014.

[20] D. Chen and B. Mak, “Multi-task learning of deep neural networks for low-resource speech recognition,” IEEE Transactions on Audio, Speech and Language, vol. 23, no. 7, pp. 1172–1183, 2015.

[21] R. Xia and Y. Liu, “Leveraging valence and activation information via multi-task learning for categorical emotion recognition,” in Proceedings of the 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, pp. 5301–5305, Brisbane, Australia, April 2014.

[22] R. Collobert and J. Weston, “A unified architecture for natural language processing: deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, Helsinki, Finland, July 2008.

[23] R. M. Harrison, A. M. Jones, and R. G. Lawrence, “Major component composition of PM10 and PM2.5 from roadside and urban background sites,” Atmospheric Environment, vol. 38, no. 27, pp. 4531–4538, 2004.

[24] G. Wang, R. Zhang, M. E. Gomez et al., “Persistent sulfate formation from London Fog to Chinese haze,” Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 48, pp. 13630–13635, 2016.

[25] Y. Cheng, G. Zheng, C. Wei et al., “Reactive nitrogen chemistry in aerosol water as a source of sulfate during haze events in China,” Science Advances, vol. 2, Article ID e1601530, 2016.

[26] D. Agrawal and A. E. Abbadi, “Supporting sliding window queries for continuous data streams,” in IEEE International Conference on Scientific and Statistical Database Management, pp. 85–94, Cambridge, Massachusetts, USA, 2003.

[27] K. B. Shaban, A. Kadri, and E. Rezk, “Urban air pollution monitoring system with forecasting models,” IEEE Sensors Journal, vol. 16, no. 8, pp. 2598–2606, 2016.

[28] L. Deng and D. Yu, “Deep learning: methods and applications,” in Foundations and Trends in Signal Processing, vol. 7, pp. 197–391, Now Publishers Inc., Hanover, MA, USA, 2014.

[29] G. E. Hinton, “Deep belief networks,” Scholarpedia, vol. 4, no. 5, article no. 5947, 2009.

[30] Y. Bengio, I. Goodfellow, and A. Courville, Deep Generative Models, Deep Learning, MIT Press, Cambridge, Mass, USA, 2017.

[31] G. Hinton, L. Deng, D. Yu et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.

[32] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” The American Association for the Advancement of Science: Science, vol. 313, no. 5786, pp. 504–507, 2006.

[33] G. Hinton, “A practical guide to training restricted Boltzmann machines,” in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[34] Y. Zheng, X. Yi, M. Li et al., “Forecasting fine-grained air quality based on big data,” in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’15), pp. 2267–2276, Sydney, Australia, August 2015.

[35] X. Feng, Q. Li, Y. Zhu, J. Wang, H. Liang, and R. Xu, “Formation and dominant factors of haze pollution over Beijing and its peripheral areas in winter,” Atmospheric Pollution Research, vol. 5, no. 3, pp. 528–538, 2014.

[36] “Winning Code for the EMC Data Science Global Hackathon (Air Quality Prediction), 2012,” https://github.com/benhamner/Air-Quality-Prediction-Hackathon-Winning-Model.

[37] J. Li, X. Shao, and H. Zhao, “An online method based on random forest for air pollutant concentration forecasting,” in Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 2018.


[Figure 3 labels: MTL-DBN-DNN; DBN; input layer; 1st, 2nd, 3rd, and 4th hidden layers; output layer.]

Figure 3: The schematic representation of the DBN-DNN model with multitask learning.

Figure 4: Three transport corridors, namely, southeast branch (a), northwest branch (b), and southwest branch (c), tracked by 24 h backward trajectories of air masses in Jing-Jin-Ji area.

3.2. Feature Set. Based on prior research results, the factors that may be relevant to forecasting the concentrations of the three kinds of air pollutants make up a set of candidate features.

Traffic emission is one of the sources of air pollutants. The traffic flow on weekdays differs from that on weekends. During the morning and afternoon rush hours, traffic density is notably increased. In this paper, the hour of day and the day of week were used to represent the traffic flow data, which are not easy to obtain.

Anthropogenic activities that lead to air pollution are different at different times of a year. The day of year (DAY) [3] was used as a representation of the different times of a year, and it is calculated by

$$\mathrm{DAY} = \cos\left(\frac{2\pi\, d_{th}}{T}\right), \tag{7}$$

where $d_{th}$ represents the ordinal number of the day in the year and $T$ is the number of days in this year.
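Equation (7) can be evaluated directly from a calendar date. The following is a minimal sketch; the function name is illustrative, and using the calendar length of the year (365 or 366) for T is an assumption about how T was obtained.

```python
import calendar
import math
from datetime import date


def day_of_year_feature(d: date) -> float:
    """Evaluate Eq. (7): DAY = cos(2*pi*d_th / T)."""
    d_th = d.timetuple().tm_yday                   # ordinal number of the day in the year
    T = 366 if calendar.isleap(d.year) else 365    # number of days in this year
    return math.cos(2.0 * math.pi * d_th / T)


print(day_of_year_feature(date(2015, 1, 1)))
print(day_of_year_feature(date(2015, 7, 2)))
print(day_of_year_feature(date(2015, 12, 31)))
```

Because the cosine is periodic, the last day of one year and the first day of the next receive nearly the same value, whereas the raw ordinal would jump from T to 1.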

Regional transport of atmospheric pollutants may be an important factor that affects the concentrations of air pollutants. Three transport corridors are tracked by 24 h backward trajectories of air masses in Jing-Jin-Ji area [3, 35], and they are presented in Figure 4. According to the current wind direction and the transport corridors of air masses, we selected a nearby city located in the upwind direction of Beijing. Then we used the monitoring data of the concentrations of six kinds of air pollutants from a station located in the city to represent the current pollutant concentrations of the selected nearby city.

Candidate features include meteorological data from the target station whose three kinds of air pollutant concentrations will be predicted (including weather, temperature, pressure, humidity, wind speed, and wind direction), the concentrations of six kinds of air pollutants at the present moment from the target station and the selected nearby city (including PM2.5, PM10, SO2, NO2, CO, and O3), the hour of day, the day of week, and the day of year.

Table 1: The 21 elements in the candidate feature set.

No. | Feature
1 | The current PM2.5 concentration of the target station (μg/m3)
2 | The current PM10 concentration of the target station (μg/m3)
3 | The current NO2 concentration of the target station (μg/m3)
4 | The current CO concentration of the target station (mg/m3)
5 | The current O3 concentration of the target station (μg/m3)
6 | The current SO2 concentration of the target station (μg/m3)
7 | Weather
8 | Temperature (°C)
9 | Atmospheric pressure (hPa)
10 | Relative humidity
11 | Wind speed (m/s)
12 | Wind direction
13 | The current PM2.5 concentration of the selected nearby station (μg/m3)
14 | The current PM10 concentration of the selected nearby station (μg/m3)
15 | The current NO2 concentration of the selected nearby station (μg/m3)
16 | The current CO concentration of the selected nearby station (mg/m3)
17 | The current O3 concentration of the selected nearby station (μg/m3)
18 | The current SO2 concentration of the selected nearby station (μg/m3)
19 | The day of year
20 | The day of week
21 | The hour of day

Weather has 17 different conditions: sunny, cloudy, overcast, rainy, sprinkle, moderate rain, heavy rain, rain storm, thunder storm, freezing rain, snowy, light snow, moderate snow, heavy snow, foggy, sand storm, and dusty. All features and their numbers are listed in Table 1.

3.3. Evaluation Metrics. In this study, four performance indicators, namely, mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and accuracy (Acc) [34], were used to assess the performance of the models. They are defined by

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| O_i - P_i \right|, \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( O_i - P_i \right)^2}, \tag{9}$$

$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{O_i - P_i}{O_i} \right|, \tag{10}$$

$$\mathrm{Acc} = 1 - \frac{\sum_{i=1}^{N} \left| O_i - P_i \right|}{\sum_{i=1}^{N} O_i}, \tag{11}$$

where $N$ is the number of time points and $O_i$ and $P_i$ represent the observed and predicted values, respectively.
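The four indicators in (8)-(11) translate directly into code. The sketch below uses plain Python lists; the function names and the three-point sample series are illustrative and are not data from the study. As in (10), MAPE is undefined when an observed value is zero.

```python
import math


def mae(observed, predicted):
    """Eq. (8): mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Eq. (9): root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mape(observed, predicted):
    """Eq. (10): mean absolute percentage error (observed values must be nonzero)."""
    return 100.0 / len(observed) * sum(abs((o - p) / o) for o, p in zip(observed, predicted))


def acc(observed, predicted):
    """Eq. (11): one minus total absolute error over total observed value."""
    return 1.0 - sum(abs(o - p) for o, p in zip(observed, predicted)) / sum(observed)


# Illustrative values only.
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]
print(mae(observed, predicted), rmse(observed, predicted),
      mape(observed, predicted), acc(observed, predicted))
```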

3.4. Experiment Setup. A new data element arrives each hour. Each data element, together with the features that determine the element, constitutes a training sample $[x_t, (y_{1t}, y_{2t}, y_{3t})]$, where $y_{1t}$, $y_{2t}$, and $y_{3t}$ represent the PM2.5 concentration, NO2 concentration, and SO2 concentration, respectively; $x_t$ is a set of features made up of the factors that may be relevant to forecasting the concentrations of the three kinds of pollutants.

Setting the Parameters of Sliding Window (Window Size, Step Size, Horizon). In the study, the concentrations of PM2.5, NO2, and SO2 were predicted 12 hours in advance, so the horizon was set to 12. The window size was equal to 1220; that is, the sliding window always contained 1220 elements. The step size was set to 1. After the current concentration was monitored, the sliding window moved one step forward, the prediction model was trained with the 1220 training samples corresponding to the elements contained in the sliding window, and then the well-trained model was used to predict the responses of the target instances.
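The sliding-window procedure can be sketched as a loop over hourly data. This is an illustration under stated assumptions, not the authors' code: a training sample is taken to pair the features observed `horizon` hours earlier with the concentration observed now, and `fit_mean` and `predict_mean` are hypothetical stand-ins for training and applying the real prediction model.

```python
from collections import deque


def online_forecast(features, targets, fit, predict, window_size=1220, horizon=12):
    """Yield (t, forecast for hour t + horizon) once the window is full.

    features[t] is the feature vector x_t; targets[t] is the value observed at hour t.
    The loop advances one hour at a time, which corresponds to a step size of 1.
    """
    window = deque(maxlen=window_size)  # the oldest sample drops out automatically
    for t in range(horizon, len(targets)):
        # The observation at hour t completes the sample (x_{t-horizon}, y_t).
        window.append((features[t - horizon], targets[t]))
        if len(window) < window_size:
            continue
        model = fit(list(window))             # retrain on the samples in the window
        yield t, predict(model, features[t])  # forecast for hour t + horizon


def fit_mean(samples):
    """Placeholder model: the mean of the targets in the window."""
    return sum(y for _, y in samples) / len(samples)


def predict_mean(model, x):
    """Placeholder prediction: ignore x and return the stored mean."""
    return model


# Tiny illustrative run with a window of 5 samples and a horizon of 2.
xs = list(range(30))
ys = [float(v) for v in range(30)]
results = list(online_forecast(xs, ys, fit_mean, predict_mean, window_size=5, horizon=2))
print(results[0], results[-1])
```

A `deque` with `maxlen` keeps the window at a fixed size without explicit bookkeeping, which matches the description that the window always contains the same number of elements.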

Selecting Features Relevant to Each Task. The experimental procedures are as follows:

(1) After the continuous variables were discretized for the different tasks, the features were evaluated and sorted according to the minimal-redundancy-maximal-relevance (mRMR) criterion.

First, the continuous variables were discretized, and the discretized response variable became a class label with numerical significance. In this paper, continuous variables were divided into 20 levels. MIToolbox, a mutual information package by Adam Pocock, was used to evaluate the importance of the features according to the mRMR criterion.

Figure 5: MAE vs. different numbers of selected features on three tasks.

Table 2: Selected features relevant to each task (feature numbers as in Table 1).

Task | Selected features, followed by removed features
PM2.5 concentration prediction | 19, 13, 1, 10, 6, 20, 3, 7, 12, 2, 11, 4, 9, 18, 21, 8, 15, 5, 17, 16, 14
NO2 concentration prediction | 19, 21, 11, 13, 10, 3, 6, 12, 20, 9, 2, 17, 8, 7, 4, 18, 5, 15, 1, 16, 14
SO2 concentration prediction | 19, 21, 6, 13, 11, 20, 18, 2, 10, 15, 7, 4, 9, 16, 5, 17, 3, 8, 14, 1, 12

(2) The dataset was divided into a training set and a test set. For each task, we used random forest to test the feature subsets from top-1 to top-n according to the feature importance ranking and then selected the first n features corresponding to the minimum MAE as the optimal feature subset. The MAE curves are depicted in Figure 5. Table 2 shows the selected features relevant to each task.
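The two steps above can be sketched in plain Python. This is an illustration, not the MIToolbox API: equal-width binning into 20 levels and the difference form of mRMR (relevance minus mean redundancy) are assumptions, and `evaluate_mae` is a hypothetical callback standing in for training a random forest on a feature subset and returning its test MAE.

```python
import math
from collections import Counter


def discretize(values, levels=20):
    """Map continuous values to integer levels 0..levels-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / levels
    return [min(int((v - lo) / width), levels - 1) for v in values]


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr_rank(features, target):
    """Greedy mRMR ranking; features maps a name to its discretized values."""
    relevance = {k: mutual_information(v, target) for k, v in features.items()}
    ranked, remaining = [], set(features)
    while remaining:
        def score(k):
            if not ranked:
                return relevance[k]
            redundancy = sum(mutual_information(features[k], features[s])
                             for s in ranked) / len(ranked)
            return relevance[k] - redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        ranked.append(best)
        remaining.remove(best)
    return ranked


def select_top_n(ranked, evaluate_mae):
    """Step (2): keep the top-n prefix of the ranking with the lowest MAE."""
    best_n, best_mae = 0, float("inf")
    for n in range(1, len(ranked) + 1):
        current = evaluate_mae(ranked[:n])
        if current < best_mae:
            best_n, best_mae = n, current
    return ranked[:best_n], best_mae


# Toy data: "hi" and "lo" each explain half of the target, "hi_copy" is redundant.
target = [0, 0, 1, 1, 2, 2, 3, 3]
toy = {
    "hi": [0, 0, 0, 0, 1, 1, 1, 1],
    "hi_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "lo": [0, 0, 1, 1, 0, 0, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
ranking = mrmr_rank(toy, target)
fake_mae = {1: 5.0, 2: 4.0, 3: 4.2, 4: 4.5}  # made-up MAE per subset size
subset, best = select_top_n(ranking, lambda feats: fake_mae[len(feats)])
print(ranking, subset, best)
```

In the toy run, the redundant copy is ranked below the complementary feature even though both are equally relevant to the target, which is the behavior the mRMR criterion is meant to produce.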

In order to verify whether multitask learning and online forecasting can each improve the DBN-DNN forecasting accuracy and to assess the capability of the proposed MTL-DBN-DNN to predict air pollutant concentration, we compared the proposed MTL-DBN-DNN model with four baseline models (2-5):

(1) DBN-DNN model with multitask learning using the online forecasting method (OL-MTL-DBN-DNN)

(2) DBN-DNN model using the online forecasting method (OL-DBN-DNN)

(3) DBN-DNN model

(4) Air-Quality-Prediction-Hackathon-Winning-Model (Winning-Model) [36]

(5) A hybrid predictive model (FFA) proposed by Yu Zheng et al. [34]

For the single-task prediction model, the input of the model is the set of selected features relevant to that single task. For the multitask prediction model, as long as a feature is relevant to one of the tasks, the feature is used as an input variable to the model.
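The input rule for the multitask model amounts to a set union over the per-task feature subsets. In the sketch below, the subsets are hypothetical examples (short prefixes chosen for illustration, not the actual selections in Table 2), and the function name is illustrative.

```python
def multitask_inputs(selected_by_task):
    """Union of the per-task feature subsets, as sorted feature numbers."""
    union = set()
    for feature_ids in selected_by_task.values():
        union.update(feature_ids)
    return sorted(union)


# Hypothetical per-task subsets (feature numbers as in Table 1).
selected_by_task = {
    "PM2.5": [19, 13, 1, 10],
    "NO2": [19, 21, 11],
    "SO2": [19, 21, 6, 13],
}
inputs = multitask_inputs(selected_by_task)
print(inputs, len(inputs))
```

The length of the union gives the number of input variables of the multitask model for these example subsets.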

Remark. For the first two models (MTL-DBN-DNN and DBN-DNN), we used the online forecasting method. To distinguish them from static forecasting models, the models using the online forecasting method were denoted by OL-MTL-DBN-DNN and OL-DBN-DNN, respectively.

For the first three models above, we used the same DBN architecture and parameters. According to the practical guide for training RBMs in technical report [33] and the dataset used in the study, we set the architecture and parameters of the deep neural network as follows. In this study, the deep neural network consisted of a DBN with layers of size G-100-100-100-90 and a top output layer, where G is the number of input variables. The DBN was constructed by stacking four RBMs, and a Gaussian-Bernoulli RBM was used as the first layer. In the pretraining stage, the learning rate was set to 0.00001, and the number of training epochs was set to 50. In the fine-tuning stage, we used 10 iterations, and grid search was used to find a suitable learning rate. For the OL-MTL-DBN-DNN model, the output layer contained three units and simultaneously output the predicted concentrations of three kinds of pollutants. Each unit at the output layer was connected to only a subset of units at the last hidden layer of the DBN.
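One way to picture the locally connected output layer is as a 0/1 mask per output unit over the 90 units of the last hidden layer. The sketch below is only an illustration: contiguous, equal-sized subsets and an overlap of 15 units between adjacent subsets are assumptions (the subset sizes are not given here), as is the linear form of the output unit.

```python
def output_masks(hidden_units=90, tasks=3, overlap=15):
    """Return one 0/1 mask per output unit over the last hidden layer.

    Subsets are contiguous and equal-sized; adjacent subsets share `overlap` units.
    """
    total = hidden_units + (tasks - 1) * overlap
    if total % tasks != 0:
        raise ValueError("hidden_units and overlap do not give equal-sized subsets")
    size = total // tasks      # number of hidden units seen by each output unit
    stride = size - overlap    # shift between adjacent subsets
    return [
        [1 if k * stride <= j < k * stride + size else 0 for j in range(hidden_units)]
        for k in range(tasks)
    ]


def masked_output(hidden, weights, bias, mask):
    """Linear output unit that reads only the hidden units selected by its mask."""
    return bias + sum(m * w * h for m, w, h in zip(mask, weights, hidden))


masks = output_masks()
shared_01 = sum(a * b for a, b in zip(masks[0], masks[1]))
shared_12 = sum(a * b for a, b in zip(masks[1], masks[2]))
shared_02 = sum(a * b for a, b in zip(masks[0], masks[2]))
print([sum(m) for m in masks], shared_01, shared_12, shared_02)
```

With these assumed sizes, each output unit reads 40 hidden units, adjacent tasks share 15 of them, and the first and third tasks share none, so each task has both shared and task-specific units.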

For the Winning-Model, time back was set to 4. Since the dataset used in this study was released by the authors of [34], the experimental results given in the original paper for the FFA model were quoted for comparison.

Because the first two models above use the online forecasting method, the training set changes over time. For the sake of fair comparison, we selected the original 1220 elements contained in the window before the sliding window begins to slide forward and used the samples corresponding to these elements as the training samples of the static prediction models (DBN-DNN and Winning-Model). The four models were used to predict the concentrations of three kinds of pollutants in the same period. The experimental results of hourly concentration forecasting for a 12-h horizon are shown in Table 3, where the best results are marked in italic.

Table 3: Comparison among different models.

Models | PM2.5 MAE | PM2.5 RMSE | PM2.5 MAPE | PM2.5 Acc | NO2 MAE | NO2 RMSE | NO2 MAPE | NO2 Acc | SO2 MAE | SO2 RMSE | SO2 MAPE | SO2 Acc
Winning-Model [36] | 23.63 | 35.33 | 243.63 | 0.40 | 14.34 | 19.55 | 80.50 | 0.64 | 8.56 | 14.24 | 70.01 | 0.54
OL-MTL-DBN-DNN | 18.52 | 30.99 | 200.36 | 0.53 | 13.82 | 20.04 | 52.62 | 0.65 | 8.37 | 13.64 | 52.13 | 0.55
OL-DBN-DNN | 25.09 | 36.66 | 305.78 | 0.37 | 15.06 | 21.36 | 65.40 | 0.62 | 8.96 | 14.49 | 61.52 | 0.52
DBN-DNN | 26.49 | 36.85 | 390.55 | 0.33 | 17.43 | 25.90 | 63.52 | 0.56 | 10.11 | 14.72 | 80.58 | 0.46



[34] Y Zheng X YiM Li et al ldquoForecasting fine-grained air qualitybased on big datardquo in Proceedings of the 21st ACM SIGKDDConference on KnowledgeDiscovery andDataMining (KDD rsquo15)pp 2267ndash2276 Sydney Australia August 2015

[35] X Feng Q Li Y Zhu JWang H Liang and R Xu ldquoFormationand dominant factors of haze pollution over Beijing and itsperipheral areas in winterrdquoAtmospheric Pollution Research vol5 no 3 pp 528ndash538 2014

[36] ldquoWinning Code for the EMC Data Science Global Hackathon(AirQuality Prediction) 2012rdquo httpsgithubcombenhamnerAir-Quality-Prediction-Hackathon-Winning-Model

[37] J Li X Shao andH Zhao ldquoAn onlinemethod based on randomforest for air pollutant concentration forecastingrdquo inProceedings

of the 2018 37th Chinese Control Conference (CCC) WuhanChina 2018


Journal of Control Science and Engineering 5

Table 1: The 21 elements in the candidate feature set.

No.  Feature
1    The current PM2.5 concentration of the target station (μg/m³)
2    The current PM10 concentration of the target station (μg/m³)
3    The current NO2 concentration of the target station (μg/m³)
4    The current CO concentration of the target station (mg/m³)
5    The current O3 concentration of the target station (μg/m³)
6    The current SO2 concentration of the target station (μg/m³)
7    Weather
8    Temperature (°C)
9    Atmospheric pressure (hPa)
10   Relative humidity
11   Wind speed (m/s)
12   Wind direction
13   The current PM2.5 concentration of the selected nearby station (μg/m³)
14   The current PM10 concentration of the selected nearby station (μg/m³)
15   The current NO2 concentration of the selected nearby station (μg/m³)
16   The current CO concentration of the selected nearby station (mg/m³)
17   The current O3 concentration of the selected nearby station (μg/m³)
18   The current SO2 concentration of the selected nearby station (μg/m³)
19   The day of year
20   The day of week
21   The hour of day

Weather has 17 different conditions: sunny, cloudy, overcast, rainy, sprinkle, moderate rain, heavy rain, rain storm, thunder storm, freezing rain, snowy, light snow, moderate snow, heavy snow, foggy, sand storm, and dusty. All feature numbers are listed in Table 1.

3.3. Evaluation Metrics. In this study, four performance indicators, namely, mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and accuracy (Acc) [34], were used to assess the performance of the models. They are defined by

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| O_i - P_i \right|, \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( O_i - P_i \right)^2}, \tag{9}$$

$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{O_i - P_i}{O_i} \right|, \tag{10}$$

$$\mathrm{Acc} = 1 - \frac{\sum_{i=1}^{N} \left| O_i - P_i \right|}{\sum_{i=1}^{N} O_i}, \tag{11}$$

where $N$ is the number of time points and $O_i$ and $P_i$ represent the observed and predicted values, respectively.
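As an illustration, the four indicators in equations (8)-(11) can be computed as in the following minimal sketch. The function name `evaluation_metrics` and the dictionary layout are assumptions made for this example and do not come from the paper.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return MAE, RMSE, MAPE, and Acc following equations (8)-(11).

    Illustrative helper only; the name and return format are assumptions.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")
    n = len(observed)
    abs_errors = [abs(o - p) for o, p in zip(observed, predicted)]
    mae = sum(abs_errors) / n
    rmse = math.sqrt(sum(e * e for e in abs_errors) / n)
    # MAPE divides by each observation, so a zero observation raises ZeroDivisionError.
    mape = 100.0 / n * sum(e / abs(o) for e, o in zip(abs_errors, observed))
    # Acc normalizes the total absolute error by the total observed concentration.
    acc = 1.0 - sum(abs_errors) / sum(observed)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "Acc": acc}
```

Note that MAPE weights errors at low observed concentrations heavily, which is one reason it can be far larger than the other indicators for the same forecasts.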

3.4. Experiment Setup. A new data element arrives each hour. Each data element, together with the features that determine it, constitutes a training sample $[x_t, (y_{1t}, y_{2t}, y_{3t})]$, where $y_{1t}$, $y_{2t}$, and $y_{3t}$ represent the PM2.5, NO2, and SO2 concentrations, respectively, and $x_t$ is a set of features made up of the factors that may be relevant to forecasting the concentrations of the three pollutants.

Setting the Parameters of the Sliding Window (Window Size, Step Size, Horizon). In this study, the concentrations of PM2.5, NO2, and SO2 were predicted 12 hours in advance, so the horizon was set to 12. The window size was 1220; that is, the sliding window always contained 1220 elements. The step size was set to 1. After the current concentration was monitored, the sliding window moved one step forward, the prediction model was retrained with the 1220 training samples corresponding to the elements contained in the sliding window, and the trained model was then used to predict the responses of the target instances.
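The retraining schedule described above can be sketched as follows. This is only an illustration: the function name `online_schedule` and the indexing convention (sample i pairs the features observed at hour i with the concentrations observed at hour i + horizon) are assumptions, since the text does not spell out the exact indexing.

```python
def online_schedule(n_hours, window_size=1220, step=1, horizon=12):
    """Yield (train_sample_indices, query_hour) for each retraining step.

    Assumed convention: sample i pairs the features observed at hour i with
    the targets observed at hour i + horizon, so the sample becomes usable
    only once hour i + horizon has been monitored.
    """
    first_now = window_size - 1 + horizon  # earliest hour with a full window
    for now in range(first_now, n_hours, step):
        newest = now - horizon                # newest sample with a known target
        oldest = newest - window_size + 1     # window always holds window_size samples
        yield range(oldest, newest + 1), now


# Usage sketch: at each step, retrain on the window, then forecast hour now + horizon
# from the features observed at hour `now`.
for train_idx, now in online_schedule(n_hours=1240):
    pass  # fit the model on train_idx, then predict from the features at `now`
```

With a step size of 1, each new observation drops the oldest sample from the window and adds the newest one, so the model always reflects the most recent 1220 samples.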

Selecting Features Relevant to Each Task. The experimental procedures are as follows.

(1) After the continuous variables were discretized, the features were evaluated and sorted for each task according to the minimal-redundancy-maximal-relevance (mRMR) criterion.

First, the continuous variables were discretized, so that the discretized response variable became a class label with numerical significance. In this paper, continuous variables were divided into 20 levels. MIToolbox, a mutual information package by Adam Pocock, was used to evaluate the importance of the features according to the mRMR criterion.
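The ranking step can be illustrated with a small, self-contained sketch. It is not the MIToolbox implementation: equal-width binning and the difference form of mRMR (relevance minus mean redundancy) are assumptions made for this example, and all function names are hypothetical.

```python
import math
from collections import Counter


def discretize(values, levels=20):
    """Map continuous values to integer levels 0..levels-1 (equal-width bins, an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / levels
    return [min(int((v - lo) / width), levels - 1) for v in values]


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def mrmr_ranking(features, target):
    """Greedy mRMR: repeatedly pick the feature with the highest
    relevance to the target minus mean redundancy with the features already picked.

    `features` maps a feature name to its discretized column.
    """
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    ranked, remaining = [], set(features)
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[r]) for r in ranked)
            return relevance[name] - redundancy / len(ranked)

        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        ranked.append(best)
        remaining.remove(best)
    return ranked
```

In this sketch a feature that duplicates an already selected one is pushed down the ranking, because its redundancy term cancels its relevance.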

Figure 5: MAE (μg/m³) versus the number of selected features on the three tasks.

Table 2: Selected features relevant to each task.

Task                              Selected features followed by removed features
PM2.5 concentration prediction    19, 13, 1, 10, 6, 20, 3, 7, 12, 2, 11, 4, 9, 18, 21, 8, 15, 5, 17, 16, 14
NO2 concentration prediction      19, 21, 11, 13, 10, 3, 6, 12, 20, 9, 2, 17, 8, 7, 4, 18, 5, 15, 1, 16, 14
SO2 concentration prediction      19, 21, 6, 13, 11, 20, 18, 2, 10, 15, 7, 4, 9, 16, 5, 17, 3, 8, 14, 1, 12

(2) The dataset was divided into a training set and a test set. For each task, we used a random forest to test the feature subsets from top-1 to top-n according to the feature importance ranking and then selected the first n features corresponding to the minimum MAE as the optimal feature subset. The MAE curves are depicted in Figure 5. Table 2 shows the selected features relevant to each task.
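The subset search in step (2) amounts to scanning the prefixes of the ranked feature list and keeping the one with the lowest test MAE. A minimal sketch follows; `select_top_n` and the callable `evaluate_mae` are hypothetical names, and `evaluate_mae` stands in for training a random forest on a feature subset and returning its MAE on the test set.

```python
def select_top_n(ranking, evaluate_mae):
    """Return (best_subset, best_mae) over the prefixes top-1 ... top-n of `ranking`.

    `evaluate_mae(subset)` is a placeholder for fitting a model on the given
    features and returning its test MAE.
    """
    best_subset, best_mae = None, float("inf")
    for n in range(1, len(ranking) + 1):
        subset = ranking[:n]
        mae = evaluate_mae(subset)
        if mae < best_mae:  # strict comparison keeps the smallest subset on ties
            best_subset, best_mae = subset, mae
    return best_subset, best_mae
```

Breaking ties in favor of the shorter prefix is a design choice of this sketch; the text only states that the subset with the minimum MAE was selected.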

To verify whether multitask learning and online forecasting each improve the forecasting accuracy of the DBN-DNN, and to assess the capability of the proposed MTL-DBN-DNN to predict air pollutant concentrations, we compared the proposed model (1) with four baseline models (2-5):

(1) DBN-DNN model with multitask learning using the online forecasting method (OL-MTL-DBN-DNN)

(2) DBN-DNN model using the online forecasting method (OL-DBN-DNN)

(3) DBN-DNN model

(4) Air-Quality-Prediction-Hackathon-Winning-Model (Winning-Model) [36]

(5) A hybrid predictive model (FFA) proposed by Yu Zheng et al. [34]

For a single-task prediction model, the input consists of the selected features relevant to that task. For the multitask prediction model, a feature is used as an input variable as long as it is relevant to at least one of the tasks.

Remark. For the first two models (MTL-DBN-DNN and DBN-DNN), we used the online forecasting method. To distinguish them from static forecasting models, the models using the online forecasting method are denoted by OL-MTL-DBN-DNN and OL-DBN-DNN, respectively.

For the first three models above, we used the same DBN architecture and parameters. Following the practical guide for training RBMs [33] and considering the dataset used in this study, we set the architecture and parameters of the deep neural network as follows. The deep neural network consisted of a DBN with layers of size G-100-100-100-90 and a top output layer, where G is the number of input variables. The DBN was constructed by stacking four RBMs, and a Gaussian-Bernoulli RBM was used as the first layer. In the pretraining stage, the learning rate was set to 0.00001 and the number of training epochs was set to 50. In the fine-tuning stage, we used 10 iterations, and grid search was used to find a suitable learning rate. For the OL-MTL-DBN-DNN model, the output layer contained three units that simultaneously output the predicted concentrations of the three pollutants. Each unit in the output layer was connected to only a subset of the units in the last hidden layer of the DBN.
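One way to picture the locally connected output layer is as a binary mask over the weights between the 90 units of the last hidden layer and the three output units. The sketch below builds such a mask from equal-sized consecutive subsets in which adjacent subsets share some units. The function name, the equal subset sizes, and the overlap of 15 units are assumptions chosen for illustration; the text does not give the actual subset sizes.

```python
def output_connection_mask(n_hidden=90, n_tasks=3, overlap=15):
    """Return mask[task][unit] = 1 if the output unit for `task` is wired to hidden `unit`.

    Assumed layout: consecutive, equal-sized subsets of hidden units in which
    each pair of adjacent subsets shares `overlap` units, so that
    size * n_tasks - overlap * (n_tasks - 1) == n_hidden.
    """
    total = n_hidden + overlap * (n_tasks - 1)
    if total % n_tasks:
        raise ValueError("n_hidden, n_tasks, and overlap do not give equal-sized subsets")
    size = total // n_tasks
    stride = size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the subset size")
    mask = []
    for task in range(n_tasks):
        start = task * stride
        mask.append([1 if start <= u < start + size else 0 for u in range(n_hidden)])
    return mask
```

With the default values, the three output units are connected to hidden units 0-39, 25-64, and 50-89. In a training loop, such a mask would typically be multiplied elementwise with the output weight matrix so that the masked connections stay at zero.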

For the Winning-Model, the time back parameter was set to 4. Since the dataset used in this study was released by the authors of [34], the experimental results reported in the original paper for the FFA model were quoted for comparison.

Because the first two models above use the online forecasting method, their training sets change over time. For the sake of a fair comparison, we selected the original 1220 elements contained in the window before the sliding window began to slide forward and used the samples corresponding to these elements as the training samples of the static prediction models (DBN-DNN and Winning-Model). The four models were used to predict the concentrations of the three pollutants over the same period. The experimental results of hourly concentration forecasting for a 12-hour horizon are shown in Table 3.

Table 3: Comparison among different models.

                      PM2.5                           NO2                            SO2
Models                MAE    RMSE   MAPE    Acc       MAE    RMSE   MAPE   Acc       MAE    RMSE   MAPE   Acc
Winning-Model [36]    23.63  35.33  243.63  0.40      14.34  19.55  80.50  0.64      8.56   14.24  70.01  0.54
OL-MTL-DBN-DNN        18.52  30.99  200.36  0.53      13.82  20.04  52.62  0.65      8.37   13.64  52.13  0.55
OL-DBN-DNN            25.09  36.66  305.78  0.37      15.06  21.36  65.40  0.62      8.96   14.49  61.52  0.52
DBN-DNN               26.49  36.85  390.55  0.33      17.43  25.90  63.52  0.56      10.11  14.72  80.58  0.46

Figure 6: The prediction performances of different models for a 12-h horizon: (a) OL-MTL-DBN-DNN, (b) OL-DBN-DNN, and (c) Winning-Model. In each panel, time (hour) is measured along the horizontal axis and the concentrations of three kinds of air pollutants (PM2.5, NO2, SO2; μg/m³) are measured along the vertical axis, with observed and predicted data plotted together.

3.5. Results and Discussions. Table 3 shows that the OL-MTL-DBN-DNN method obtains the best overall results for concentration forecasting. Its three error criteria (MAE, RMSE, and MAPE) are lower than those of the baseline models, with the exception of the RMSE for NO2, where the Winning-Model is slightly lower, and its accuracy is the highest for all three pollutants. The prediction performance of OL-DBN-DNN is better than that of DBN-DNN on all metrics except the MAPE for NO2, which shows that the online forecasting method can improve prediction performance. The performance of OL-MTL-DBN-DNN surpasses that of OL-DBN-DNN, which shows that multitask learning is an effective approach to improving the forecasting accuracy of air pollutant concentrations and suggests that sharing the information contained in the training data of the three prediction tasks is beneficial. It is worth mentioning that learning the tasks in parallel to obtain the forecast results is more efficient than training a separate model for each task.

The experimental results show that the OL-MTL-DBN-DNN model proposed in this paper achieves better prediction performance than the Air-Quality-Prediction-Hackathon-Winning-Model and the FFA model. For example, for PM2.5 concentration prediction, compared with the Winning-Model, the MAE and RMSE of OL-MTL-DBN-DNN are reduced by about 5.11 and 4.34, respectively (from 23.63 to 18.52 and from 35.33 to 30.99), and the accuracy is improved by about 13 percentage points (from 0.40 to 0.53). These positive results indicate that our MTL-DBN-DNN model is promising for real-time air pollutant concentration forecasting.

With the prediction horizon set to 12 hours, some prediction results of three models are presented in Figure 6.

Figure 6 shows that the predicted and observed concentrations match well when the OL-MTL-DBN-DNN is used. Its advantage is most apparent when predicting sudden changes and high peaks in concentration.


4. Conclusion

In this paper, a deep neural network model with multitask learning (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting nonlinear systems and tested on the forecasting of air quality time series.

The MTL-DBN-DNN model can perform several prediction tasks at the same time by using shared information. In the model, each unit in the output layer is connected to only a subset of the units in the last hidden layer of the DBN, and two adjacent subsets share a specified number of common units. This connection scheme avoids the problem that a fully connected network must juggle the learning of all tasks during training, which prevents the trained network from reaching optimal prediction accuracy on each task. The locally connected architecture can learn both the commonalities and the differences of multiple tasks.

PM2.5, SO2, and NO2 react chemically with one another and show almost the same concentration trends, so we apply the proposed model to a case study on forecasting the concentrations of these three air pollutants 12 hours in advance. Comparison with multiple baseline models shows that our MTL-DBN-DNN model has a stronger capability for predicting air pollutant concentrations. Therefore, by combining the advantages of deep learning, multitask learning, and online forecasting, the MTL-DBN-DNN model is able to provide accurate real-time concentration predictions of air pollutants.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

Section 3.2 of this paper (feature set) cites the author's conference paper [37].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61873008) and the Beijing Municipal Natural Science Foundation (4182008).

References

[1] P. S. G. De Mattos Neto, F. Madeiro, T. A. E. Ferreira, and G. D. C. Cavalcanti, “Hybrid intelligent system for air quality forecasting using phase adjustment,” Engineering Applications of Artificial Intelligence, vol. 32, pp. 185–191, 2014.

[2] K. Siwek and S. Osowski, “Improving the accuracy of prediction of PM10 pollution by the wavelet transformation and an ensemble of neural predictors,” Engineering Applications of Artificial Intelligence, vol. 25, no. 6, pp. 1246–1258, 2012.

[3] X. Feng, Q. Li, Y. Zhu, J. Hou, L. Jin, and J. Wang, “Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation,” Atmospheric Environment, vol. 107, pp. 118–128, 2015.

[4] W. Tamas, G. Notton, C. Paoli, M.-L. Nivet, and C. Voyant, “Hybridization of air quality forecasting models using machine learning and clustering: An original approach to detect pollutant peaks,” Aerosol and Air Quality Research, vol. 16, no. 2, pp. 405–416, 2016.

[5] A. Kurt and A. B. Oktay, “Forecasting air pollutant indicator levels with geographic models 3 days in advance using neural networks,” Expert Systems with Applications, vol. 37, no. 12, pp. 7986–7992, 2010.

[6] A. Y. Ng, J. Ngiam, C. Y. Foo, Y. Mai, and C. Suen, Deep Networks: Overview, 2013, http://deeplearning.stanford.edu/wiki/index.php/Deep_Networks:_Overview.

[7] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[9] S. Azizi, F. Imani, B. Zhuang et al., “Ultrasound-based detection of prostate cancer using automatic feature selection with deep belief networks,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. Frangi, Eds., vol. 9350 of Lecture Notes in Computer Science, pp. 70–77, Springer, Munich, Germany, 2015.

[10] M. Qin, Z. Li, and Z. Du, “Red tide time series forecasting by combining ARIMA and deep belief network,” Knowledge-Based Systems, vol. 125, pp. 39–52, 2017.

[11] X. Sun, T. Li, Q. Li, Y. Huang, and Y. Li, “Deep belief echo-state network and its application to time series prediction,” Knowledge-Based Systems, vol. 130, pp. 17–29, 2017.

[12] T. Kuremoto, S. Kimura, K. Kobayashi, and M. Obayashi, “Time series forecasting using a deep belief network with restricted Boltzmann machines,” Neurocomputing, vol. 137, pp. 47–56, 2014.

[13] F. Shen, J. Chao, and J. Zhao, “Forecasting exchange rate using deep belief networks and conjugate gradient method,” Neurocomputing, vol. 167, pp. 243–253, 2015.

[14] A. Dedinec, S. Filiposka, A. Dedinec, and L. Kocarev, “Deep belief network based electricity load forecasting: An analysis of Macedonian case,” Energy, vol. 115, pp. 1688–1700, 2016.

[15] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu, “Deep belief network based deterministic and probabilistic wind speed forecasting approach,” Applied Energy, vol. 182, pp. 80–93, 2016.

[16] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, 1997.

[17] Y. Huang, W. Wang, L. Wang, and T. Tan, “Multi-task deep neural network for multi-label learning,” in Proceedings of the IEEE International Conference on Image Processing, pp. 2897–2900, Melbourne, Australia, 2013.

[18] R. Zhang, J. Li, J. Lu, R. Hu, Y. Yuan, and Z. Zhao, “Using deep learning for compound selectivity prediction,” Current Computer-Aided Drug Design, vol. 12, no. 1, pp. 5–14, 2016.

[19] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: deep belief networks with multitask learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2191–2201, 2014.

[20] D. Chen and B. Mak, “Multi-task learning of deep neural networks for low-resource speech recognition,” IEEE Transactions on Audio, Speech and Language, vol. 23, no. 7, pp. 1172–1183, 2015.

[21] R. Xia and Y. Liu, “Leveraging valence and activation information via multi-task learning for categorical emotion recognition,” in Proceedings of the 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, pp. 5301–5305, Brisbane, Australia, April 2015.

[22] R. Collobert and J. Weston, “A unified architecture for natural language processing: deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, Helsinki, Finland, July 2008.

[23] R. M. Harrison, A. M. Jones, and R. G. Lawrence, “Major component composition of PM10 and PM2.5 from roadside and urban background sites,” Atmospheric Environment, vol. 38, no. 27, pp. 4531–4538, 2004.

[24] G. Wang, R. Zhang, M. E. Gomez et al., “Persistent sulfate formation from London Fog to Chinese haze,” Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 48, pp. 13630–13635, 2016.

[25] Y. Cheng, G. Zheng, C. Wei et al., “Reactive nitrogen chemistry in aerosol water as a source of sulfate during haze events in China,” Science Advances, vol. 2, Article ID e1601530, 2016.

[26] D. Agrawal and A. E. Abbadi, “Supporting sliding window queries for continuous data streams,” in IEEE International Conference on Scientific and Statistical Database Management, pp. 85–94, Cambridge, Massachusetts, USA, 2003.

[27] K. B. Shaban, A. Kadri, and E. Rezk, “Urban air pollution monitoring system with forecasting models,” IEEE Sensors Journal, vol. 16, no. 8, pp. 2598–2606, 2016.

[28] L. Deng and D. Yu, “Deep learning: methods and applications,” in Foundations and Trends in Signal Processing, vol. 7, pp. 197–391, Now Publishers Inc., Hanover, MA, USA, 2014.

[29] G. E. Hinton, “Deep belief networks,” Scholarpedia, vol. 4, no. 5, article no. 5947, 2009.

[30] Y. Bengio, I. Goodfellow, and A. Courville, Deep Generative Models, Deep Learning, MIT Press, Cambridge, Mass, USA, 2017.

[31] G. Hinton, L. Deng, D. Yu et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.

[32] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” The American Association for the Advancement of Science: Science, vol. 313, no. 5786, pp. 504–507, 2006.

[33] G. Hinton, “A practical guide to training restricted Boltzmann machines,” in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[34] Y. Zheng, X. Yi, M. Li et al., “Forecasting fine-grained air quality based on big data,” in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’15), pp. 2267–2276, Sydney, Australia, August 2015.

[35] X. Feng, Q. Li, Y. Zhu, J. Wang, H. Liang, and R. Xu, “Formation and dominant factors of haze pollution over Beijing and its peripheral areas in winter,” Atmospheric Pollution Research, vol. 5, no. 3, pp. 528–538, 2014.

[36] “Winning Code for the EMC Data Science Global Hackathon (Air Quality Prediction), 2012,” https://github.com/benhamner/Air-Quality-Prediction-Hackathon-Winning-Model.

[37] J. Li, X. Shao, and H. Zhao, “An online method based on random forest for air pollutant concentration forecasting,” in Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 2018.


6 Journal of Control Science and Engineering

10 15 20 10 15 20 10 15 20

320-25 2

36

37

38

39

40

41

42

MA

E (

gG

-3)

17

18

19

20

21

22

23

MA

E (

gG

-3)

13

14

15

16

17

MA

E (

gG

-3)

Figure 5 MAE vs different numbers of selected features on three tasks

Table 2 Selected features relevant to each task

Task Selected features Removed featuresPM25 concentration prediction 19 13 1 10 6 20 3 7 12 2 11 4 9 18 21 8 15 5 17 16 14NO2 concentration prediction 19 21 11 13 10 3 6 12 20 9 2 17 8 7 4 18 5 15 1 16 14SO2 concentration prediction 19 21 6 13 11 20 18 2 10 15 7 4 9 16 5 17 3 8 14 1 12

(2) The dataset was divided into training set and testset For each task we used random forest to test the featuresubsets from top1-topn according to the feature importanceranking and then selected the first n features correspondingto the minimum value of the MAE as the optimal featuresubset The curves of MAE are depicted in Figure 5 Table 2shows the selected features relevant to each task

In order to verify whether the application of multitasklearning and online forecasting can improve the DBN-DNNforecasting accuracy respectively and assess the capabilityof the proposed MTL-DBN-DNN to predict air pollutantconcentration we compared the proposed MTL-DBN-DNNmodel with four baseline models (2-5)

(1) DBN-DNN model with multitask learning usingonline forecasting method (OL-MTL-DBN-DNN)

(2) DBN-DNN model using online forecasting method(OL-DBN-DNN)

(3) DBN-DNNmodel(4) Air-Quality-Prediction-Hackathon-Winning-Model

(Winning-Model) [36](5) A hybrid predictive model (FFA) proposed by Yu

Zheng etc [34]For the single task prediction model the input of the

model is the selected features relevant to single task For themultitask prediction model as long as a feature is relevant toone of the tasks the feature is used as an input variable to themodel

Remark For the first two models (MTL-DBN-DNN andDBN-DNN) we used the online forecasting method To

be distinguished from static forecasting models the modelsusing online forecasting method were denoted by OL-MTL-DBN-DNN and OL-DBN-DNN respectively

For the first three models above we used the same DBNarchitecture and parameters According to the practical guidefor training RBMs in technical report [33] and the datasetused in the study we set the architecture and parametersof the deep neural network as follows In this study deepneural network consisted of a DBN with layers of size G-100-100-100-90 and a top output layer and G is the numberof input variables The DBN was constructed by stackingfour RBMs and a Gaussian-Bernoulli RBM was used as thefirst layer In the pretraining stage the learning rate was setto 000001 and the number of training epochs was set to50 In the fine-tuning stage we used 10 iterations and gridsearch was used to find a suitable learning rate For the OL-MTL-DBN-DNN model the output layer contained threeunits and simultaneously output the predicted concentrationsof three kinds of pollutants Each unit at output layer wasconnected to only a subset of units at the last hidden layerof DBN

For Winning-Model time back was set to 4 Since thedataset used in this study was released by the authors of [34]the experimental results given in the original paper for theFFAmodel were quoted for comparison

Because the first twomodels above are themodels that useonline forecastingmethod the training set changes over timeFor the sake of fair comparison we selected original 1220elements contained in the window before sliding windowbegins to slide forward and used samples corresponding to

Journal of Control Science and Engineering 7

Table 3 Comparison among different models

Models PM25 NO2 SO2MAE RMSE MAPE Acc MAE RMSE MAPE Acc MAE RMSE MAPE Acc

Winning-Model [36] 2363 3533 24363 040 1434 1955 8050 064 856 1424 7001 054OL-MTL-DBN-DNN 1852 3099 20036 053 1382 2004 5262 065 837 1364 5213 055OL-DBN-DNN 2509 3666 30578 037 1506 2136 6540 062 896 1449 6152 052DBN-DNN 2649 3685 39055 033 1743 2590 6352 056 1011 1472 8058 046

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50100150

0

50

100

50 100 1500Time (hour)

0-

25

CON

C

2CO

NC

32

CON

C(

gG

-3)

(gG

-3)

(gG

-3)

(a) OL-MTL-DBN-DNN

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50

100

0

50

100

0-

25

CON

C

2CO

NC

32

CON

C

50 100 1500Time (hour)

(gG

-3)

(gG

-3)

(gG

-3)

(b) OL-DBN-DNN

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50

100

0

50

100

50 100 1500Time (hour)

0-

25

CON

C

2CO

NC

32

CON

C(

gG

-3)

(gG

-3)

(gG

-3)

(c) Winning-Model

Figure 6The prediction performances of different models for a 12-h horizon In the pictures time is measured along the horizontal axis andthe concentrations of three kinds of air pollutants (PM25 NO2 SO2) are measured along the vertical axis

these elements as the training samples of the static predictionmodels (DBN-DNN and Winning-Model) The four modelswere used to predict the concentrations of three kinds ofpollutants in the same period The experimental resultsof hourly concentration forecasting for a 12h horizon areshown in Table 3 where the best results are marked withitalic

3.5. Results and Discussions. Table 3 shows that the best results for concentration forecasting are obtained with the OL-MTL-DBN-DNN method. Its three error evaluation criteria (MAE, RMSE, and MAPE) are lower than those of the baseline models in all but one case (for NO2, the RMSE of the Winning-Model is slightly lower, 19.55 versus 20.04), and its accuracy is higher than that of every baseline model for all three pollutants. The prediction performance of OL-DBN-DNN is better than that of DBN-DNN on nearly all criteria, which shows that the online forecasting method can improve prediction performance. The performance of OL-MTL-DBN-DNN surpasses that of OL-DBN-DNN, which shows that multitask learning is an effective approach to improving the forecasting accuracy of air pollutant concentration and indicates that it is useful to share the information contained in the training data of the three prediction tasks. It is worth mentioning that learning the tasks in parallel to get the forecast results is more efficient than training a separate model for each task.
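The error criteria named above can be made concrete with a short sketch. The Python below assumes the standard textbook definitions of MAE, RMSE, and MAPE, with MAPE expressed in percent; the function names and the sample concentrations are illustrative and are not taken from the experiments. The accuracy (Acc) column of Table 3 is left out because its definition is not restated in this section.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (assumed convention)."""
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    ratios = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * ratios / len(observed)


# Hypothetical hourly concentrations, for illustration only.
observed = [50.0, 100.0, 200.0]
predicted = [40.0, 110.0, 220.0]

print(f"MAE  = {mae(observed, predicted):.2f}")
print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"MAPE = {mape(observed, predicted):.2f}%")
```

Because MAPE divides by the observed value, it grows quickly when observed concentrations are small, which is one reason it is usually read alongside MAE and RMSE rather than alone.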

The experimental results show that the OL-MTL-DBN-DNN model proposed in this paper achieves better prediction performance than the Air-Quality-Prediction-Hackathon-Winning-Model and the FFA model, and the prediction accuracy is greatly improved. For example, when we predict PM2.5 concentrations, compared with the Winning-Model, the MAE and RMSE of OL-MTL-DBN-DNN are reduced by about 5.11 (from 23.63 to 18.52) and 4.34 (from 35.33 to 30.99), respectively, and the accuracy of OL-MTL-DBN-DNN is improved by about 13 percentage points (from 0.40 to 0.53). These positive results demonstrate that our model MTL-DBN-DNN is promising for real-time air pollutant concentration forecasting.

Some prediction results of the three models for a prediction horizon of 12 hours are presented in Figure 6.

Figure 6 shows that the predicted concentrations and the observed concentrations match very well when OL-MTL-DBN-DNN is used. The advantage of OL-MTL-DBN-DNN is more obvious when it is used to predict sudden changes and high peaks of concentrations.


4. Conclusion

In this paper, a deep neural network model with multitask learning (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting of nonlinear systems and tested on the forecast of air quality time series.

The MTL-DBN-DNN model can fulfill several prediction tasks at the same time by using shared information. In the model, each unit in the output layer is connected to only a subset of units in the last hidden layer of the DBN, and two adjacent subsets have a specified number of units in common. This connection pattern avoids a problem of fully connected networks, which must juggle the learning of every task during training and therefore cannot reach optimal prediction accuracy for each task. The locally connected architecture can learn both the commonalities and the differences of multiple tasks.
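The locally connected output layer described above can be sketched as a connection mask over the last hidden layer. The Python below is a minimal illustration under stated assumptions: the subsets are taken to be contiguous index windows, and the layer sizes, the overlap count `n_shared`, and all function names are hypothetical choices for the example, not values or code from the paper.

```python
def overlapping_subsets(n_hidden, n_tasks, n_shared):
    """Return n_tasks contiguous index windows covering range(n_hidden),
    where each pair of adjacent windows shares exactly n_shared indices."""
    total = n_hidden + (n_tasks - 1) * n_shared
    if total % n_tasks != 0:
        raise ValueError("sizes do not split into equal windows")
    width = total // n_tasks
    if not 0 <= n_shared < width:
        raise ValueError("overlap must be smaller than the window width")
    step = width - n_shared
    return [list(range(t * step, t * step + width)) for t in range(n_tasks)]


def connection_mask(n_hidden, subsets):
    """mask[t][j] is 1.0 when output unit t is connected to hidden unit j."""
    return [[1.0 if j in subset else 0.0 for j in range(n_hidden)]
            for subset in subsets]


def masked_output(hidden, weights, biases, mask):
    """Linear output layer in which masked-out weights contribute nothing."""
    return [
        sum(m * w * h for m, w, h in zip(mask_row, weight_row, hidden)) + b
        for mask_row, weight_row, b in zip(mask, weights, biases)
    ]


# Hypothetical sizes: 8 hidden units, 3 tasks, 2 units shared by adjacent subsets.
subsets = overlapping_subsets(n_hidden=8, n_tasks=3, n_shared=2)
mask = connection_mask(8, subsets)
hidden = [1.0] * 8
weights = [[1.0] * 8 for _ in range(3)]
biases = [0.0, 0.0, 0.0]
outputs = masked_output(hidden, weights, biases, mask)
print(subsets)
print(outputs)
```

In this sketch the shared units carry information common to two neighboring tasks, while the remaining units in each window are specific to one task. During training, the same mask would typically also be applied to the weight updates so that the masked connections stay at zero; that step is omitted here.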

PM2.5, SO2, and NO2 interact chemically and show almost the same concentration trends, so we apply the proposed model to a case study on forecasting the concentrations of these three air pollutants 12 hours in advance. Comparison with multiple baseline models shows that our model MTL-DBN-DNN has a stronger capability of predicting air pollutant concentrations. Therefore, by combining the advantages of deep learning, multitask learning, and online forecasting, the MTL-DBN-DNN model is able to provide accurate real-time concentration predictions of air pollutants.
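The online forecasting idea, in which a sliding window of recent data is used to adjust the model, can be sketched as a walk-forward loop. The Python below is only an outline under stated assumptions: `fit_mean` is a hypothetical stand-in that predicts the window mean in place of fine-tuning the MTL-DBN-DNN, and the window size, horizon, and series values are invented for illustration.

```python
from collections import deque


def fit_mean(window):
    """Hypothetical stand-in for model adjustment: forecast the window mean."""
    mean = sum(window) / len(window)
    return lambda: mean


def online_forecast(series, window_size, horizon, fit):
    """At each hour t, refit on the latest window_size observations and
    forecast the value at hour t + horizon.
    Returns a list of (target_hour, forecast) pairs."""
    window = deque(maxlen=window_size)  # the oldest sample drops out automatically
    forecasts = []
    for t, value in enumerate(series):
        window.append(value)
        # Forecast only once the window is full and the target hour exists.
        if len(window) == window_size and t + horizon < len(series):
            model = fit(list(window))
            forecasts.append((t + horizon, model()))
    return forecasts


# Invented series, for illustration only.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
forecasts = online_forecast(series, window_size=3, horizon=1, fit=fit_mean)
print(forecasts)
```

Replacing `fit_mean` with a routine that updates a trained network on the current window would give the same loop structure; a static model corresponds to calling `fit` once and never again.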

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

Section 3.2 of this paper (feature set) cites the author’s conference paper [37].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61873008) and the Beijing Municipal Natural Science Foundation (4182008).

References

[1] P. S. G. De Mattos Neto, F. Madeiro, T. A. E. Ferreira, and G. D. C. Cavalcanti, “Hybrid intelligent system for air quality forecasting using phase adjustment,” Engineering Applications of Artificial Intelligence, vol. 32, pp. 185–191, 2014.

[2] K. Siwek and S. Osowski, “Improving the accuracy of prediction of PM10 pollution by the wavelet transformation and an ensemble of neural predictors,” Engineering Applications of Artificial Intelligence, vol. 25, no. 6, pp. 1246–1258, 2012.

[3] X. Feng, Q. Li, Y. Zhu, J. Hou, L. Jin, and J. Wang, “Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation,” Atmospheric Environment, vol. 107, pp. 118–128, 2015.

[4] W. Tamas, G. Notton, C. Paoli, M.-L. Nivet, and C. Voyant, “Hybridization of air quality forecasting models using machine learning and clustering: An original approach to detect pollutant peaks,” Aerosol and Air Quality Research, vol. 16, no. 2, pp. 405–416, 2016.

[5] A. Kurt and A. B. Oktay, “Forecasting air pollutant indicator levels with geographic models 3 days in advance using neural networks,” Expert Systems with Applications, vol. 37, no. 12, pp. 7986–7992, 2010.

[6] A. Y. Ng, J. Ngiam, C. Y. Foo, Y. Mai, and C. Suen, Deep Networks: Overview, 2013, http://deeplearning.stanford.edu/wiki/index.php/Deep_Networks:_Overview.

[7] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[9] S. Azizi, F. Imani, B. Zhuang et al., “Ultrasound-based detection of prostate cancer using automatic feature selection with deep belief networks,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. Frangi, Eds., vol. 9350 of Lecture Notes in Computer Science, pp. 70–77, Springer, Munich, Germany, 2015.

[10] M. Qin, Z. Li, and Z. Du, “Red tide time series forecasting by combining ARIMA and deep belief network,” Knowledge-Based Systems, vol. 125, pp. 39–52, 2017.

[11] X. Sun, T. Li, Q. Li, Y. Huang, and Y. Li, “Deep belief echo-state network and its application to time series prediction,” Knowledge-Based Systems, vol. 130, pp. 17–29, 2017.

[12] T. Kuremoto, S. Kimura, K. Kobayashi, and M. Obayashi, “Time series forecasting using a deep belief network with restricted Boltzmann machines,” Neurocomputing, vol. 137, pp. 47–56, 2014.

[13] F. Shen, J. Chao, and J. Zhao, “Forecasting exchange rate using deep belief networks and conjugate gradient method,” Neurocomputing, vol. 167, pp. 243–253, 2015.

[14] A. Dedinec, S. Filiposka, A. Dedinec, and L. Kocarev, “Deep belief network based electricity load forecasting: An analysis of Macedonian case,” Energy, vol. 115, pp. 1688–1700, 2016.

[15] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu, “Deep belief network based deterministic and probabilistic wind speed forecasting approach,” Applied Energy, vol. 182, pp. 80–93, 2016.

[16] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, 1997.

[17] Y. Huang, W. Wang, L. Wang, and T. Tan, “Multi-task deep neural network for multi-label learning,” in Proceedings of the IEEE International Conference on Image Processing, pp. 2897–2900, Melbourne, Australia, 2013.

[18] R. Zhang, J. Li, J. Lu, R. Hu, Y. Yuan, and Z. Zhao, “Using deep learning for compound selectivity prediction,” Current Computer-Aided Drug Design, vol. 12, no. 1, pp. 5–14, 2016.

[19] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: deep belief networks with multitask learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2191–2201, 2014.

[20] D. Chen and B. Mak, “Multi-task learning of deep neural networks for low-resource speech recognition,” IEEE Transactions on Audio, Speech and Language, vol. 23, no. 7, pp. 1172–1183, 2015.


[21] R. Xia and Y. Liu, “Leveraging valence and activation information via multi-task learning for categorical emotion recognition,” in Proceedings of the 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, pp. 5301–5305, Brisbane, Australia, April 2014.

[22] R. Collobert and J. Weston, “A unified architecture for natural language processing: deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, Helsinki, Finland, July 2008.

[23] R. M. Harrison, A. M. Jones, and R. G. Lawrence, “Major component composition of PM10 and PM2.5 from roadside and urban background sites,” Atmospheric Environment, vol. 38, no. 27, pp. 4531–4538, 2004.

[24] G. Wang, R. Zhang, M. E. Gomez et al., “Persistent sulfate formation from London Fog to Chinese haze,” Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 48, pp. 13630–13635, 2016.

[25] Y. Cheng, G. Zheng, C. Wei et al., “Reactive nitrogen chemistry in aerosol water as a source of sulfate during haze events in China,” Science Advances, vol. 2, Article ID e1601530, 2016.

[26] D. Agrawal and A. E. Abbadi, “Supporting sliding window queries for continuous data streams,” in IEEE International Conference on Scientific and Statistical Database Management, pp. 85–94, Cambridge, Massachusetts, USA, 2003.

[27] K. B. Shaban, A. Kadri, and E. Rezk, “Urban air pollution monitoring system with forecasting models,” IEEE Sensors Journal, vol. 16, no. 8, pp. 2598–2606, 2016.

[28] L. Deng and D. Yu, “Deep learning: methods and applications,” in Foundations and Trends in Signal Processing, vol. 7, pp. 197–391, Now Publishers Inc., Hanover, MA, USA, 2014.

[29] G. E. Hinton, “Deep belief networks,” Scholarpedia, vol. 4, no. 5, article no. 5947, 2009.

[30] Y. Bengio, I. Goodfellow, and A. Courville, Deep Generative Models, Deep Learning, MIT Press, Cambridge, Mass, USA, 2017.

[31] G. Hinton, L. Deng, D. Yu et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.

[32] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” The American Association for the Advancement of Science: Science, vol. 313, no. 5786, pp. 504–507, 2006.

[33] G. Hinton, “A practical guide to training restricted Boltzmann machines,” in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Muller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.

[34] Y. Zheng, X. Yi, M. Li et al., “Forecasting fine-grained air quality based on big data,” in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’15), pp. 2267–2276, Sydney, Australia, August 2015.

[35] X. Feng, Q. Li, Y. Zhu, J. Wang, H. Liang, and R. Xu, “Formation and dominant factors of haze pollution over Beijing and its peripheral areas in winter,” Atmospheric Pollution Research, vol. 5, no. 3, pp. 528–538, 2014.

[36] “Winning Code for the EMC Data Science Global Hackathon (Air Quality Prediction), 2012,” https://github.com/benhamner/Air-Quality-Prediction-Hackathon-Winning-Model.

[37] J. Li, X. Shao, and H. Zhao, “An online method based on random forest for air pollutant concentration forecasting,” in Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 2018.


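Table 3: Comparison among different models.

| Models | PM2.5 MAE | PM2.5 RMSE | PM2.5 MAPE | PM2.5 Acc | NO2 MAE | NO2 RMSE | NO2 MAPE | NO2 Acc | SO2 MAE | SO2 RMSE | SO2 MAPE | SO2 Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Winning-Model [36] | 23.63 | 35.33 | 243.63 | 0.40 | 14.34 | 19.55 | 80.50 | 0.64 | 8.56 | 14.24 | 70.01 | 0.54 |
| OL-MTL-DBN-DNN | 18.52 | 30.99 | 200.36 | 0.53 | 13.82 | 20.04 | 52.62 | 0.65 | 8.37 | 13.64 | 52.13 | 0.55 |
| OL-DBN-DNN | 25.09 | 36.66 | 305.78 | 0.37 | 15.06 | 21.36 | 65.40 | 0.62 | 8.96 | 14.49 | 61.52 | 0.52 |
| DBN-DNN | 26.49 | 36.85 | 390.55 | 0.33 | 17.43 | 25.90 | 63.52 | 0.56 | 10.11 | 14.72 | 80.58 | 0.46 |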

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50100150

0

50

100

50 100 1500Time (hour)

0-

25

CON

C

2CO

NC

32

CON

C(

gG

-3)

(gG

-3)

(gG

-3)

(a) OL-MTL-DBN-DNN

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50

100

0

50

100

0-

25

CON

C

2CO

NC

32

CON

C

50 100 1500Time (hour)

(gG

-3)

(gG

-3)

(gG

-3)

(b) OL-DBN-DNN

0 50 100 1500

100200300

Observed dataPredicted data

0 50 100 1500

50

100

0

50

100

50 100 1500Time (hour)

0-

25

CON

C

2CO

NC

32

CON

C(

gG

-3)

(gG

-3)

(gG

-3)

(c) Winning-Model

Figure 6The prediction performances of different models for a 12-h horizon In the pictures time is measured along the horizontal axis andthe concentrations of three kinds of air pollutants (PM25 NO2 SO2) are measured along the vertical axis

these elements as the training samples of the static predictionmodels (DBN-DNN and Winning-Model) The four modelswere used to predict the concentrations of three kinds ofpollutants in the same period The experimental resultsof hourly concentration forecasting for a 12h horizon areshown in Table 3 where the best results are marked withitalic

35 Results and Discussions Table 3 shows that the bestresults are obtained by using OL-MTL-DBN-DNN methodfor concentration forecasting Three error evaluation criteria(MAE RMSE and MAPE) of the OL-MTL-DBN-DNN arelower than that of the baseline models and its accuracy issignificantly higher than that of the baseline models Theprediction performance of OL-DBN-DNN is better thanDBN-DNN which shows that the use of online forecastingmethod can improve the prediction performanceThe perfor-mance of OL-MTL-DBN-DNN surpasses the performanceof OL-DBN-DNN which shows that multitask learning isan effective approach to improve the forecasting accuracyof air pollutant concentration and demonstrates that it isnecessary to share the information contained in the train-ing data of three prediction tasks It is worth mentioning

that learning tasks in parallel to get the forecast results ismore efficient than training a model separately for eachtask

The experimental results show that the OL-MTL-DBN-DNNmodel proposed in this paper achieves better predictionperformances than the Air-Quality-Prediction-Hackathon-Winning-Model and FFAmodel and the prediction accuracyis greatly improved For example when we predict PM25concentrations compared with Winning-Model MAE andRMSE of OL-MTL-DBN-DNN are reduced by about 511 and434 respectively and accuracy of OL-MTL-DBN-DNN isimproved by about 13 These positive results demonstratethat our model MTL-DBN-DNN is promising in real-timeair pollutant concentration forecasting

When the prediction time interval in advance is set to 12hours some prediction results of three models are presentedin Figure 6

Figure 6 shows that predicted concentrations andobserved concentrations can match very well when the OL-MTL-DBN-DNN is used The advantage of the OL-MTL-DBN-DNN is more obvious when OL-MTL-DBN-DNN isused to predict the sudden changes of concentrations andthe high peaks of concentrations

8 Journal of Control Science and Engineering

4 Conclusion

In this paper a deep neural network model with multitasklearning (MTL-DBN-DNN) pretrained by a deep beliefnetwork (DBN) is proposed for forecasting of nonlinearsystems and tested on the forecast of air quality time series

The MTL-DBN-DNN model can fulfill prediction tasksat the same time by using shared information In the modeleach unit in the output layer is connected to only a subsetof units in the last hidden layer of DBN There are commonunits with a specified quantity between two adjacent subsetsSuch connection effectively avoids the problem that fullyconnected networks need to juggle the learning of each taskwhile being trained so that the trained networks cannotget optimal prediction accuracy for each task The locallyconnected architecture can well learn the commonalities anddifferences of multiple tasks

PM25 SO2 and NO2 have chemical reaction and almostthe same concentration trend so we apply the proposedmodel to the case study on the concentration forecasting ofthree kinds of air pollutants 12 hours in advance Comparisonwith multiple baseline models shows our model MTL-DBN-DNN has a stronger capability of predicting air pollutantconcentration Therefore by combining the advantages ofdeep learning multitask learning and online forecasting theMTL-DBN-DNNmodel is able to provide accurate real-timeconcentration predictions of air pollutants

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Additional Points

Section 32 of this paper (feature set) cites the authorrsquosconference paper [37]

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work was supported by National Natural Science Foun-dation of China (61873008) and Beijing Municipal NaturalScience Foundation (4182008)

References

[1] P S G De Mattos Neto F Madeiro T A E Ferreira andG D C Cavalcanti ldquoHybrid intelligent system for air qualityforecasting using phase adjustmentrdquoEngineering Applications ofArtificial Intelligence vol 32 pp 185ndash191 2014

[2] K Siwek and S Osowski ldquoImproving the accuracy of predictionof PM10 pollution by the wavelet transformation and an ensem-ble of neural predictorsrdquo Engineering Applications of ArtificialIntelligence vol 25 no 6 pp 1246ndash1258 2012

[3] X Feng Q Li Y Zhu J Hou L Jin and J Wang ldquoArtificialneural networks forecasting of PM25 pollution using air masstrajectory based geographic model and wavelet transforma-tionrdquo Atmospheric Environment vol 107 pp 118ndash128 2015

[4] W Tamas G Notton C Paoli M-L Nivet and C VoyantldquoHybridization of air quality forecasting models using machinelearning and clustering An original approach to detect pollu-tant peaksrdquo Aerosol and Air Quality Research vol 16 no 2 pp405ndash416 2016

[5] A Kurt and A B Oktay ldquoForecasting air pollutant indicatorlevels with geographic models 3 days in advance using neuralnetworksrdquo Expert Systems with Applications vol 37 no 12 pp7986ndash7992 2010

[6] A Y Ng J Ngiam C Y Foo Y Mai and C Suen DeepNetworks Overview 2013 httpdeeplearningstanfordeduwikiindexphpDeep Networks Overview

[7] G E Hinton S Osindero andY Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[8] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquo Naturevol 521 no 7553 pp 436ndash444 2015

[9] S Azizi F Imani B Zhuang et al ldquoUltrasound-based detectionof prostate cancer using automatic feature selection with deepbelief networksrdquo in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 N Navab J HorneggerW M Wells and A Frangi Eds vol 9350 of Lecture Notes inComputer Science pp 70ndash77 Springer Munich Germany 2015

[10] M Qin Z Li and Z Du ldquoRed tide time series forecasting bycombining ARIMA and deep belief networkrdquoKnowledge-BasedSystems vol 125 pp 39ndash52 2017

[11] X Sun T Li Q Li Y Huang and Y Li ldquoDeep belief echo-state network and its application to time series predictionrdquoKnowledge-Based Systems vol 130 pp 17ndash29 2017

[12] T Kuremoto S Kimura K Kobayashi andMObayashi ldquoTimeseries forecasting using a deep belief network with restrictedBoltzmann machinesrdquo Neurocomputing vol 137 pp 47ndash562014

[13] F Shen J Chao and J Zhao ldquoForecasting exchange rateusing deep belief networks and conjugate gradient methodrdquoNeurocomputing vol 167 pp 243ndash253 2015

[14] A Dedinec S Filiposka A Dedinec and L Kocarev ldquoDeepbelief network based electricity load forecasting An analysis ofMacedonian caserdquo Energy vol 115 pp 1688ndash1700 2016

[15] H ZWang G BWang G Q Li J C Peng andY T Liu ldquoDeepbelief network based deterministic and probabilistic wind speedforecasting approachrdquoApplied Energy vol 182 pp 80ndash93 2016

[16] R Caruana ldquoMultitask learningrdquoMachine Learning vol 28 no1 pp 41ndash75 1997

[17] Y Huang W Wang L Wang and T Tan ldquoMulti-task deepneural network for multi-label learningrdquo in Proceedings of theIEEE International Conference on Image Processing pp 2897ndash2900 Melbourne Australia 2013

[18] R Zhang J Li J Lu R Hu Y Yuan and Z Zhao ldquoUsingdeep learning for compound selectivity predictionrdquo CurrentComputer-Aided Drug Design vol 12 no 1 pp 5ndash14 2016

[19] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent Transportation Sys-tems vol 15 no 5 pp 2191ndash2201 2014

[20] D Chen and B Mak ldquoMulti-task learning of deep neural net-works for low-resource speech recognitionrdquo IEEE TransactionsonAudio Speech and Language vol 23 no 7 pp 1172ndash1183 2015

Journal of Control Science and Engineering 9

[21] R Xia and Y Liu ldquoLeveraging valence and activation informa-tion via multi-task learning for categorical emotion recogni-tionrdquo in Proceedings of the 40th IEEE International Conferenceon Acoustics Speech and Signal Processing ICASSP 2015 pp5301ndash5305 Brisbane Australia April 2014

[22] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of the 25th International Conference onMachine Learning pp 160ndash167 Helsinki Finland July 2008

[23] R M Harrison A M Jones and R G Lawrence ldquoMajorcomponent composition of PM10 and PM25 from roadside andurban background sitesrdquo Atmospheric Environment vol 38 no27 pp 4531ndash4538 2004

[24] G Wang R Zhang M E Gomez et al ldquoPersistent sulfateformation from London Fog to Chinese hazerdquo Proceedings ofthe National Acadamyof Sciences of the United States of Americavol 113 no 48 pp 13630ndash13635 2016

[25] Y Cheng G Zheng C Wei et al ldquoReactive nitrogen chemistryin aerosol water as a source of sulfate during haze events inChinardquo Science Advances vol 2 Article ID e1601530 2016

[26] D Agrawal and A E Abbadi ldquoSupporting sliding windowqueries for continuous data streamsrdquo in IEEE InternationalConference on Scientific and Statistical Database Managementpp 85ndash94 Cambridge Massachusetts USA 2003

[27] K B Shaban A Kadri and E Rezk ldquoUrban air pollutionmonitoring system with forecasting modelsrdquo IEEE SensorsJournal vol 16 no 8 pp 2598ndash2606 2016

[28] L Deng and D Yu ldquoDeep learning methods and applicationsrdquoin Foundations and Trends in Signal Processing vol 7 pp 197ndash391 Now Publishers Inc Hanover MA USA 2014

[29] G E Hinton ldquoDeep belief networksrdquo Scholarpedia vol 4 no 5article no 5947 2009

[30] Y Bengio I Goodfellow and A Courville Deep GenerativeModels Deep Learning MIT Press Cambridge Mass USA2017

[31] G Hinton L Deng D Yu et al ldquoDeep neural networks foracoustic modeling in speech recognition The shared views offour research groupsrdquo IEEE Signal Processing Magazine vol 29no 6 pp 82ndash97 2012

[32] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo The American Associa-tion for the Advancement of Science Science vol 313 no 5786pp 504ndash507 2006

[33] G Hinton ldquoA practical guide to training restricted Boltz-mann machinesrdquo in Neural Networks Tricks of the Trade GMontavon G B Orr and K-R Muller Eds vol 7700 ofLectureNotes in Computer Science pp 599ndash619 Springer BerlinGermany 2nd edition 2012

[34] Y Zheng X YiM Li et al ldquoForecasting fine-grained air qualitybased on big datardquo in Proceedings of the 21st ACM SIGKDDConference on KnowledgeDiscovery andDataMining (KDD rsquo15)pp 2267ndash2276 Sydney Australia August 2015

[35] X Feng Q Li Y Zhu JWang H Liang and R Xu ldquoFormationand dominant factors of haze pollution over Beijing and itsperipheral areas in winterrdquoAtmospheric Pollution Research vol5 no 3 pp 528ndash538 2014

[36] ldquoWinning Code for the EMC Data Science Global Hackathon(AirQuality Prediction) 2012rdquo httpsgithubcombenhamnerAir-Quality-Prediction-Hackathon-Winning-Model

[37] J Li X Shao andH Zhao ldquoAn onlinemethod based on randomforest for air pollutant concentration forecastingrdquo inProceedings

of the 2018 37th Chinese Control Conference (CCC) WuhanChina 2018

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 8: A DBN-Based Deep Neural Network Model with Multitask ...downloads.hindawi.com/journals/jcse/2019/5304535.pdfforecasting accuracy, respectively, and assess the capability of the proposed

8 Journal of Control Science and Engineering

4 Conclusion

In this paper a deep neural network model with multitasklearning (MTL-DBN-DNN) pretrained by a deep beliefnetwork (DBN) is proposed for forecasting of nonlinearsystems and tested on the forecast of air quality time series

The MTL-DBN-DNN model can fulfill prediction tasksat the same time by using shared information In the modeleach unit in the output layer is connected to only a subsetof units in the last hidden layer of DBN There are commonunits with a specified quantity between two adjacent subsetsSuch connection effectively avoids the problem that fullyconnected networks need to juggle the learning of each taskwhile being trained so that the trained networks cannotget optimal prediction accuracy for each task The locallyconnected architecture can well learn the commonalities anddifferences of multiple tasks

PM25 SO2 and NO2 have chemical reaction and almostthe same concentration trend so we apply the proposedmodel to the case study on the concentration forecasting ofthree kinds of air pollutants 12 hours in advance Comparisonwith multiple baseline models shows our model MTL-DBN-DNN has a stronger capability of predicting air pollutantconcentration Therefore by combining the advantages ofdeep learning multitask learning and online forecasting theMTL-DBN-DNNmodel is able to provide accurate real-timeconcentration predictions of air pollutants

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Additional Points

Section 32 of this paper (feature set) cites the authorrsquosconference paper [37]

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work was supported by National Natural Science Foun-dation of China (61873008) and Beijing Municipal NaturalScience Foundation (4182008)

References

[1] P S G De Mattos Neto F Madeiro T A E Ferreira andG D C Cavalcanti ldquoHybrid intelligent system for air qualityforecasting using phase adjustmentrdquoEngineering Applications ofArtificial Intelligence vol 32 pp 185ndash191 2014

[2] K Siwek and S Osowski ldquoImproving the accuracy of predictionof PM10 pollution by the wavelet transformation and an ensem-ble of neural predictorsrdquo Engineering Applications of ArtificialIntelligence vol 25 no 6 pp 1246ndash1258 2012

[3] X Feng Q Li Y Zhu J Hou L Jin and J Wang ldquoArtificialneural networks forecasting of PM25 pollution using air masstrajectory based geographic model and wavelet transforma-tionrdquo Atmospheric Environment vol 107 pp 118ndash128 2015

[4] W Tamas G Notton C Paoli M-L Nivet and C VoyantldquoHybridization of air quality forecasting models using machinelearning and clustering An original approach to detect pollu-tant peaksrdquo Aerosol and Air Quality Research vol 16 no 2 pp405ndash416 2016

[5] A Kurt and A B Oktay ldquoForecasting air pollutant indicatorlevels with geographic models 3 days in advance using neuralnetworksrdquo Expert Systems with Applications vol 37 no 12 pp7986ndash7992 2010

[6] A Y Ng J Ngiam C Y Foo Y Mai and C Suen DeepNetworks Overview 2013 httpdeeplearningstanfordeduwikiindexphpDeep Networks Overview

[7] G E Hinton S Osindero andY Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[8] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquo Naturevol 521 no 7553 pp 436ndash444 2015

[21] R Xia and Y Liu ldquoLeveraging valence and activation informa-tion via multi-task learning for categorical emotion recogni-tionrdquo in Proceedings of the 40th IEEE International Conferenceon Acoustics Speech and Signal Processing ICASSP 2015 pp5301ndash5305 Brisbane Australia April 2014

[22] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of the 25th International Conference onMachine Learning pp 160ndash167 Helsinki Finland July 2008

[23] R M Harrison A M Jones and R G Lawrence ldquoMajorcomponent composition of PM10 and PM25 from roadside andurban background sitesrdquo Atmospheric Environment vol 38 no27 pp 4531ndash4538 2004

[24] G Wang R Zhang M E Gomez et al ldquoPersistent sulfateformation from London Fog to Chinese hazerdquo Proceedings ofthe National Acadamyof Sciences of the United States of Americavol 113 no 48 pp 13630ndash13635 2016

[25] Y Cheng G Zheng C Wei et al ldquoReactive nitrogen chemistryin aerosol water as a source of sulfate during haze events inChinardquo Science Advances vol 2 Article ID e1601530 2016

[26] D Agrawal and A E Abbadi ldquoSupporting sliding windowqueries for continuous data streamsrdquo in IEEE InternationalConference on Scientific and Statistical Database Managementpp 85ndash94 Cambridge Massachusetts USA 2003

[27] K B Shaban A Kadri and E Rezk ldquoUrban air pollutionmonitoring system with forecasting modelsrdquo IEEE SensorsJournal vol 16 no 8 pp 2598ndash2606 2016

[28] L Deng and D Yu ldquoDeep learning methods and applicationsrdquoin Foundations and Trends in Signal Processing vol 7 pp 197ndash391 Now Publishers Inc Hanover MA USA 2014

[29] G E Hinton ldquoDeep belief networksrdquo Scholarpedia vol 4 no 5article no 5947 2009

[30] Y Bengio I Goodfellow and A Courville Deep GenerativeModels Deep Learning MIT Press Cambridge Mass USA2017

[31] G Hinton L Deng D Yu et al ldquoDeep neural networks foracoustic modeling in speech recognition The shared views offour research groupsrdquo IEEE Signal Processing Magazine vol 29no 6 pp 82ndash97 2012

[32] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo The American Associa-tion for the Advancement of Science Science vol 313 no 5786pp 504ndash507 2006

[33] G Hinton ldquoA practical guide to training restricted Boltz-mann machinesrdquo in Neural Networks Tricks of the Trade GMontavon G B Orr and K-R Muller Eds vol 7700 ofLectureNotes in Computer Science pp 599ndash619 Springer BerlinGermany 2nd edition 2012

[34] Y Zheng X YiM Li et al ldquoForecasting fine-grained air qualitybased on big datardquo in Proceedings of the 21st ACM SIGKDDConference on KnowledgeDiscovery andDataMining (KDD rsquo15)pp 2267ndash2276 Sydney Australia August 2015

[35] X Feng Q Li Y Zhu JWang H Liang and R Xu ldquoFormationand dominant factors of haze pollution over Beijing and itsperipheral areas in winterrdquoAtmospheric Pollution Research vol5 no 3 pp 528ndash538 2014

[36] ldquoWinning Code for the EMC Data Science Global Hackathon(AirQuality Prediction) 2012rdquo httpsgithubcombenhamnerAir-Quality-Prediction-Hackathon-Winning-Model

[37] J Li X Shao andH Zhao ldquoAn onlinemethod based on randomforest for air pollutant concentration forecastingrdquo inProceedings

of the 2018 37th Chinese Control Conference (CCC) WuhanChina 2018
