Time-series forecasting of indoor temperature using pre-trained Deep Neural Networks

P. Romeu, F. Zamora-Martínez, P. Botella-Rocamora, J. Pardo

Embedded Systems and Artificial Intelligence group
Departamento de Ciencias Físicas, Matemáticas y de la Computación
Escuela Superior de Enseñanzas Técnicas (ESET)
Universidad CEU Cardenal Herrera, 46115 Alfara del Patriarca, Valencia (Spain)

ICANN – September 11, 2013


Index

1 Introduction and motivation

2 Stacked Denoising Auto-Encoders

3 Time series forecasting

4 Experimentation

5 Conclusions and future work


Introduction and motivation


Time series forecasting: predicting future values given past data.

$s = s_0, \ldots, s_{i-1}, s_i, s_{i+1}, \ldots$

Non-linear relationships may exist between the elements.

ANNs have been widely used for this task, normally as shallow models.

Deep architectures have been successful in computer vision, speech signal processing, classification, ...

Time series forecasting with deep architectures is starting to receive interest (as far as we know, using Restricted Boltzmann Machines).



Deep architectures on time series

Expectations

Time series are characterized by more or less complex dependencies. For indoor temperature forecasting:

Known dependencies: time of the day, day of the year.
Hidden dependencies: number of people in a room.
Short-term dependencies and long-term dependencies.

Normally, expert knowledge is introduced to take known dependencies into account through data preprocessing: detrending, deseasonalizing.

A deep model could learn some of these dependencies using several layers.



Forecasting of indoor temperature with deep ANNs

What have we done in this work?

Evaluation of pre-training and denoising techniques in a time series forecasting task.

Results: slightly better generalization, less over-fitting.

Problems: lack of data, not complex enough task.

[Figure: indoor temperature (ºC, axis from 15 to 26) versus time (minutes, axis from 0 to 10000).]


Stacked Denoising Auto-Encoders


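A Denoising Auto-Encoder is a neural network that receives a noisy input and produces its cleaned version.

Gaussian additive noise ($\sigma$): $\tilde{x} = x + \mathcal{N}(0, \sigma^2 I)$
Masking noise ($p$): $\tilde{x} = MN(x)$ with probability $p$.
Encoding: $h(\tilde{x}) = \mathrm{softsign}(b + W\tilde{x})$
Decoding (denoising): $\hat{x} = g(h(\tilde{x})) = \mathrm{softsign}(c + W^T h(\tilde{x}))$

[Figure: auto-encoder diagram. The input $x$ is corrupted by GN(x) or MN(x), encoded into $h(\tilde{x})$ with $W$, and decoded with $W^T$.]

$x$ is an input vector, $h(\cdot)$ is the hidden layer vector, $b$ and $c$ are bias vectors, $W$ is a weights matrix, and $\mathrm{softsign}(x) = \frac{x}{1 + |x|}$.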
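The corruption, encoding and decoding steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the random initialization, and the convention that masking noise sets each component to zero with probability p are assumptions, and the two corruptions are applied in sequence only to show both.

```python
import numpy as np

def softsign(z):
    """Softsign activation z / (1 + |z|), applied element-wise."""
    return z / (1.0 + np.abs(z))

def gaussian_noise(x, sigma, rng):
    """Additive Gaussian noise: x + N(0, sigma^2 I)."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def masking_noise(x, p, rng):
    """Masking noise: each component is zeroed with probability p (assumed convention)."""
    keep = rng.random(x.shape) >= p
    return x * keep

def dae_forward(x_noisy, W, b, c):
    """Encode with W, then decode with the tied weights W^T."""
    h = softsign(b + W @ x_noisy)      # hidden representation
    x_hat = softsign(c + W.T @ h)      # reconstruction of the clean input
    return h, x_hat

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 4                  # hypothetical sizes
W = rng.normal(0.0, 0.1, size=(n_hidden, n_in))
b = np.zeros(n_hidden)
c = np.zeros(n_in)

x = rng.uniform(-0.5, 0.5, size=n_in)  # synthetic clean input
x_noisy = masking_noise(gaussian_noise(x, sigma=0.1, rng=rng), p=0.2, rng=rng)
h, x_hat = dae_forward(x_noisy, W, b, c)
```

Because softsign is bounded in (-1, 1), the reconstruction can only match inputs that are scaled to that range.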




Greedy training: build the auto-encoders layer by layer.

Stack all the trained weights to produce the final result.

Stack a forecasting layer (linear activation).

Train the whole neural network.
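The first three steps can be sketched as below, assuming tied weights, masking noise, a squared reconstruction error, and plain full-batch gradient descent. All function names, sizes and learning settings are hypothetical, the output layer is left untrained, and the final fine-tuning of the whole network is omitted.

```python
import numpy as np

def softsign(z):
    return z / (1.0 + np.abs(z))

def softsign_grad(z):
    """Derivative of softsign: 1 / (1 + |z|)^2."""
    return 1.0 / (1.0 + np.abs(z)) ** 2

def train_dae(X, n_hidden, p=0.1, lr=0.05, epochs=50, rng=None):
    """Train one tied-weight denoising auto-encoder on the rows of X."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b, c = np.zeros(n_hidden), np.zeros(d)
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) >= p)    # masking noise
        Z1 = X_noisy @ W.T + b                      # encoder pre-activation
        H = softsign(Z1)
        Z2 = H @ W + c                              # decoder pre-activation
        X_hat = softsign(Z2)
        dZ2 = (X_hat - X) * softsign_grad(Z2) / n   # gradient at decoder
        dZ1 = (dZ2 @ W.T) * softsign_grad(Z1)       # gradient at encoder
        W -= lr * (H.T @ dZ2 + dZ1.T @ X_noisy)     # decoder + encoder terms (tied W)
        b -= lr * dZ1.sum(axis=0)
        c -= lr * dZ2.sum(axis=0)
    return W, b

def pretrain_stack(X, hidden_sizes, **kwargs):
    """Greedy layer-wise pre-training: each DAE is trained on the previous encoding."""
    layers, inputs = [], X
    for n_hidden in hidden_sizes:
        W, b = train_dae(inputs, n_hidden, **kwargs)
        layers.append((W, b))
        inputs = softsign(inputs @ W.T + b)         # clean encoding feeds the next layer
    return layers

def forecast(X, layers, W_out, b_out):
    """Stacked encoders followed by a linear forecasting layer."""
    H = X
    for W, b in layers:
        H = softsign(H @ W.T + b)
    return H @ W_out.T + b_out

rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(64, 12))           # synthetic input windows
layers = pretrain_stack(X, hidden_sizes=[8, 4], epochs=20, rng=rng)
W_out = rng.normal(0.0, 0.1, size=(3, 4))           # untrained forecasting layer
b_out = np.zeros(3)
Y_hat = forecast(X, layers, W_out, b_out)
```

Only the encoder parameters (W, b) of each auto-encoder are kept for the stack; the decoder bias c is discarded after pre-training.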


Time series forecasting






Univariate vs multivariate.

Single-step-ahead vs multi-step-ahead.

Iterative forecasting vs direct forecasting.

Multiple Input One Output vs Multiple Input Multiple Output.

$\hat{s}_{t+1}^{t+H} = F(s_{t-I+1}^{t})$

MIMO modelling is natural in ANNs, because they take advantage of the input/output mapping.

$F$ is a forecasting model, $H$ the number of predicted samples, $I$ the number of past samples taken as input. Here $s_a^b$ denotes the sequence $s_a, \ldots, s_b$.
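To make the MIMO mapping concrete, the sketch below slices a univariate series into pairs of I past samples and H future samples. The function name and the toy series are illustrative assumptions.

```python
import numpy as np

def make_mimo_windows(series, n_inputs, horizon):
    """Build (input, target) pairs for direct multi-step (MIMO) forecasting."""
    series = np.asarray(series, dtype=float)
    X, Y = [], []
    # t is the index of the last observed sample in each input window
    for t in range(n_inputs - 1, len(series) - horizon):
        X.append(series[t - n_inputs + 1 : t + 1])   # s_{t-I+1}, ..., s_t
        Y.append(series[t + 1 : t + horizon + 1])    # s_{t+1}, ..., s_{t+H}
    return np.array(X), np.array(Y)

X, Y = make_mimo_windows(np.arange(10), n_inputs=4, horizon=3)
# X[0] is [0, 1, 2, 3] and Y[0] is [4, 5, 6]
```

A single network with I inputs and H outputs can then be fitted on (X, Y), which is the direct MIMO alternative to iterating a one-step model H times.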


Experimentation


Dataset

Captured during 2011, March and June.

1 minute sampling period.

Reduced and smoothed by computing the mean every 15 samples.

Differences between adjacent samples were computed to remove the trend.

Partition    # of samples   # of days
Training     2016           21
Validation   672            7
Test         672            7
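The two preprocessing steps can be sketched as follows, assuming the 15-sample means are taken over non-overlapping blocks and that a trailing incomplete block is dropped (the slides state neither). The synthetic ramp stands in for real sensor readings.

```python
import numpy as np

def block_mean(x, size=15):
    """Average non-overlapping blocks of `size` samples (assumed reading of the slide)."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // size              # a trailing incomplete block is dropped
    return x[: n_blocks * size].reshape(n_blocks, size).mean(axis=1)

def first_difference(x):
    """Differences between adjacent samples, used to remove the trend."""
    return np.diff(x)

raw = np.arange(45, dtype=float)           # synthetic 1-minute readings
smoothed = block_mean(raw, 15)             # -> [7, 22, 37]
diffs = first_difference(smoothed)         # -> [15, 15]

samples_per_day = 24 * 60 // 15            # 96 samples per day at 15-minute resolution
```

Under this reading, one day yields 96 samples, which agrees with the partition table: 21 days give 2016 samples and 7 days give 672.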


Evaluation measures

Mean Absolute Error (MAE)
Root Mean Square Error (RMSE)

$\mathrm{MAE}^\star(t) = \frac{1}{|D|} \sum_{t=I}^{|D|} \frac{1}{H} \sum_{h=1}^{H} |\hat{s}_{t+h} - s_{t+h}|$

$\mathrm{RMSE}^\star(t) = \frac{1}{|D|} \sum_{t=I}^{|D|} \sqrt{\frac{1}{H} \sum_{h=1}^{H} (\hat{s}_{t+h} - s_{t+h})^2}$

$|D|$ is the size of the dataset, $H$ the future horizon, $\hat{s}_{t+h}$ the forecasted value, $s_{t+h}$ the ground truth.
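Both measures average a per-window error over the forecasting horizon and then over the windows. The sketch below assumes forecasts and targets are stored as arrays of shape (number of windows, H) and normalizes by the number of windows supplied, which is a simplification of the 1/|D| factor above.

```python
import numpy as np

def mae_star(y_pred, y_true):
    """Mean over windows of the mean absolute error across the horizon."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    per_window = np.abs(y_pred - y_true).mean(axis=1)          # (1/H) * sum over h
    return per_window.mean()

def rmse_star(y_pred, y_true):
    """Mean over windows of the root mean square error across the horizon."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    per_window = np.sqrt(((y_pred - y_true) ** 2).mean(axis=1))
    return per_window.mean()

y_true = np.array([[0.0, 0.0], [1.0, 1.0]])   # two windows, horizon H = 2
y_pred = np.array([[3.0, 4.0], [1.0, 1.0]])
mae = mae_star(y_pred, y_true)                # (3.5 + 0) / 2 = 1.75
rmse = rmse_star(y_pred, y_true)              # (sqrt(12.5) + 0) / 2
```

Note that the square root is taken per window before averaging, so RMSE* is not the square root of the overall MSE.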


Experiments

Different training modes comparison

TM-0: standard training of an ANN.

TM-1: pre-training of the ANN using SDAE, then fine-tuning of the whole network.

TM-2: pre-training of the ANN using SDAE, then fine-tuning of only the last layer (forecasting layer).


Training description

Back-propagation with mini-batch size 32.

Mean Square Error (MSE) loss function.

Future horizon of 12 samples (three hours).

Minimum of 50 epochs, maximum of 4000.

Random search hyper-parameter optimization:

learning rate, momentum, weight decay,
number of hidden layers, hidden layer sizes,
number of inputs,
mask noise percentage.

3600 experiments for tuning.


Results

Best topologies

- TM-0: 60 - 756 - 60 - 12
- TM-1: 48 - 648 - 920 - 16 - 12
- TM-2: 96 - 712 - 12

Each topology lists the input size, the hidden layer sizes, and the 12 outputs.

TM-0 has convergence problems with deep networks:

33% of the two-layer network experiments do not converge.

58% of the three-layer network experiments do not converge.

Note that the topologies are not the same in the three cases: we took the best topology for each training mode.


20 random initializations of best hyper-parameters

[Figure: two panels, MAE* and RMSE*, comparing TM-0, TM-1 and TM-2 on the validation and test partitions.]


MSE of training partition during training

[Figure: training MSE (log-scaled) versus epochs (0 to 1400) for TM-0, TM-1 and TM-2.]


MAE* of test partition during training

[Figure: test MAE* (log-scaled) versus epochs (0 to 1400) for TM-0, TM-1 and TM-2, with a "best val" marker for each training mode.]


Conclusions and future work






Pre-training, denoising techniques, and random hyper-parameter optimization were used to train deep ANNs in a forecasting task.

Slightly better generalization performance on the test set and a reduction in over-fitting were observed (TM-1).

Fine-tuning of the whole deep model was needed to ensure good results (TM-1 vs TM-2).

The small benefit of SDAE could be due to the low dimensionality of the task.

In the future, this work will be extended by using a larger forecasting input window combined with multivariate forecasting.



Questions?

Thanks for your attention!


Appendix

Appendix: Hyper-parameter optimization

Grid search part
Train Mode: TM-0, TM-1, TM-2
Number of hidden layers: 1, 2, 3
Mask Noise: 0.02, 0.04, 0.10, 0.20

Random search part
100 random trials for every grid sweep
Input size: 12, 24, 36, 48, 60, 72, 84, 96
Learning rate: $[10^{-3}, 10^{-2}]$
Momentum: $\sim \mathcal{N}(10^{-3}, 5 \times 10^{-3})$, ignoring negative values
Weight decay: $[0, 10^{-5}]$
Hidden layer sizes: $[4, 1024]$
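The combined grid and random search can be sketched as below. Several details are assumptions not stated on the slide: intervals are sampled uniformly, the second parameter of the normal distribution is treated as a standard deviation, and "ignoring negative values" is read as redrawing until the momentum is non-negative. All names are hypothetical.

```python
import numpy as np

INPUT_SIZES = [12, 24, 36, 48, 60, 72, 84, 96]

def sample_momentum(rng, mean=1e-3, std=5e-3):
    """Draw from N(mean, std) and redraw negative values (assumed reading)."""
    while True:
        m = rng.normal(mean, std)
        if m >= 0.0:
            return m

def sample_trial(rng, n_hidden_layers):
    """One random-search trial; uniform sampling of the intervals is an assumption."""
    return {
        "input_size": int(rng.choice(INPUT_SIZES)),
        "learning_rate": rng.uniform(1e-3, 1e-2),
        "momentum": sample_momentum(rng),
        "weight_decay": rng.uniform(0.0, 1e-5),
        # rng.integers excludes the upper bound, so 1025 gives sizes in [4, 1024]
        "hidden_sizes": [int(rng.integers(4, 1025)) for _ in range(n_hidden_layers)],
    }

def grid_points():
    """Every (training mode, number of hidden layers, mask noise) combination."""
    return [(mode, layers, noise)
            for mode in ("TM-0", "TM-1", "TM-2")
            for layers in (1, 2, 3)
            for noise in (0.02, 0.04, 0.10, 0.20)]

rng = np.random.default_rng(0)
trials = [(point, sample_trial(rng, point[1]))
          for point in grid_points()
          for _ in range(100)]             # 100 random trials per grid point
```

With 3 x 3 x 4 = 36 grid points and 100 trials each, this yields 3600 configurations, matching the number of tuning experiments reported in the training description.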


Appendix: hyper-parameters analysis

Input size

[Figure: one panel per training mode (TM-0, TM-1, TM-2), plotted against input size (12 to 84), with curves for 1, 2 and 3 layers.]


Encoding layer size

[Figure: one panel per training mode (TM-0, TM-1, TM-2), plotted against encoding layer size (0 to 900).]


Masking noise

[Figure: one panel per training mode (TM-0, TM-1, TM-2), plotted against masking noise (0.02 to 0.18).]


Learning rate of forecasting phase

[Figure: one panel per training mode (TM-0, TM-1, TM-2), plotted against learning rate (0 to 0.009).]


Appendix: results table

MAE*

        Validation (µ±σ)   Test (µ±σ)
ETS     0.3004             0.3254
TM-0    0.1289±0.0011      0.12482±0.0010
TM-1    0.1287±0.0033      0.1223±0.0033
TM-2    0.1374±0.0007      0.1279±0.0011

RMSE*

        Validation (µ±σ)   Test (µ±σ)
ETS     0.3648             0.3930
TM-0    0.1563±0.0011      0.1511±0.0012
TM-1    0.1565±0.0040      0.1473±0.0039
TM-2    0.1663±0.0009      0.1538±0.0013
