

[Figure: example raw network output for a washing machine. Panels, bottom to top: Network input (synthetic aggregate, watts), Target (appliance power, watts), Disaggregated vector, Network output (overlapping rectangle predictions).]

Neural NILM: Deep Neural Networks Applied to Energy Disaggregation

Jack Kelly & William Knottenbelt

jack.kelly@imperial.ac.uk, Department of Computing, Imperial College London

1) INTRODUCTION
Artificial Neural Nets have gone in and out of fashion at least twice since their birth in the 1950s. In the last few years, 'deep' neural nets have achieved state-of-the-art performance on many machine learning tasks, including image classification, automatic speech recognition, handwriting recognition, machine translation and learning to play Atari games (!). In many of these tasks, deep neural nets now have a substantial performance advantage over other approaches.

Automatic Feature Learning: 'Classical' machine learning often involves hand-engineering a number of feature detectors. Hand-engineered feature detectors are time-consuming to design and are often fragile in practice. Deep neural nets don't require hand-engineered feature detectors. Instead, they learn a hierarchy of feature detectors from the data. Given enough training data, the learnt feature detectors are often more robust and more informative than hand-engineered ones.

Feature Detectors for NILM: Many existing NILM algorithms extract a small number of features from the aggregate power signal. But if we train ourselves to identify appliances, we use a rich set of features. For example, a washing machine's drum motor regularly stops and starts during the rinse cycle. These rapid spikes in the power demand signal are a clear indication of a washing machine. But most NILM feature extractors ignore these spikes as 'noise'. We wanted to see if deep neural nets might be able to learn to detect these rich features.

ACKNOWLEDGEMENTS
This work was funded by an EPSRC DTA and an Intel PhD Fellowship.

REFERENCES
[1] Jack Kelly & William Knottenbelt. The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Scientific Data 2, Article number: 150007, 2015. DOI: 10.1038/sdata.2015.7
[2] Jack Kelly, Nipun Batra, Oliver Parson, Haimonti Dutta, William Knottenbelt, Alex Rogers, Amarjeet Singh & Mani Srivastava. Demo Abstract: NILMTK v0.2: A Non-intrusive Load Monitoring Toolkit for Large Scale Data Sets. In the first ACM Workshop On Embedded Systems For Energy-Efficient Buildings, 2014. DOI: 10.1145/2674061.2675024

[Figure: a single washing machine activation, power (watts) against time of day, 18:35 to 19:55.]

2) METHOD
Network input: We train 1 net per target appliance. The network input during training and inference is a window of raw aggregate time series data (after gaps have been filled). The window length is set to be a little longer than the maximum duration of an appliance activation (e.g. 2 hours for a washing machine or 5 minutes for a kettle).

Network output: The network tries to locate the target appliance and its mean power demand by drawing a rectangle over the target appliance in the aggregate input. The network outputs 3 positive real numbers: 1) the start time of the target appliance activation, 2) the end time and 3) the mean appliance power demand over the activation.

Network layers:
1 = Input layer (size varies per appliance)
2 = 1D convolutional layer (number of filters=16, filter size=4, stride=1)
3 = 1D convolutional layer (number of filters=16, filter size=4, stride=1)
4 = Dense layer (8192 rectified linear units)
5 = Dense layer (4096 rectified linear units)
6 = Dense layer (2048 rectified linear units)
7 = Dense layer (512 rectified linear units)
8 = Output layer (3 linear units)

We experimented with dropout and batch normalisation but neither appeared to add anything useful.
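To make the layer list above concrete, here is a minimal sketch of how the stack might be expressed in Lasagne, the library named below. Layer sizes follow the list; the variable names, input shape and the convolutional nonlinearity are assumptions for illustration, not the authors' exact code.

```python
import lasagne
from lasagne.layers import InputLayer, Conv1DLayer, DenseLayer
from lasagne.nonlinearities import rectify, linear

def build_net(seq_length):
    """Sketch of the 8-layer Neural NILM architecture (sizes from the poster)."""
    # 1) Input: one window of aggregate power; length varies per appliance
    net = InputLayer(shape=(None, 1, seq_length))
    # 2-3) Two 1D convolutional layers: 16 filters, filter size 4, stride 1
    #      (ReLU nonlinearity is an assumption; the poster does not say)
    net = Conv1DLayer(net, num_filters=16, filter_size=4, stride=1, nonlinearity=rectify)
    net = Conv1DLayer(net, num_filters=16, filter_size=4, stride=1, nonlinearity=rectify)
    # 4-7) Fully connected rectified linear layers
    for n_units in (8192, 4096, 2048, 512):
        net = DenseLayer(net, num_units=n_units, nonlinearity=rectify)
    # 8) Output: start time, end time, mean power (3 linear units)
    return DenseLayer(net, num_units=3, nonlinearity=linear)
```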

Training: Deep neural nets require a lot of training data. So we generate an effectively infinite amount of training data by randomly combining real appliance activations into synthetic aggregates. The training objective is to minimise MSE. We trained on an nVidia GTX780Ti GPU for ~24 hours per appliance. We use the Python library Lasagne (built on top of Theano).
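The synthetic-aggregate idea can be sketched in a few lines of numpy. The recipe below (number of distractor appliances, 50% target inclusion rate, target encoding) is an illustrative assumption, not the authors' published code; the inputs are lists of 1-D arrays of real appliance power readings.

```python
import numpy as np

def make_synthetic_window(target_activations, distractor_activations,
                          seq_length, rng=np.random):
    """Build one training window by pasting real appliance activations
    at random offsets into an initially empty aggregate."""
    aggregate = np.zeros(seq_length, dtype=np.float32)
    target_rectangle = np.zeros(3, dtype=np.float32)  # start, end, mean power

    def paste(activation):
        # Drop the activation at a random offset, clipping it to the window
        start = rng.randint(0, seq_length)
        chunk = activation[:seq_length - start]
        aggregate[start:start + len(chunk)] += chunk
        return start, start + len(chunk), float(chunk.mean())

    # Add a few 'distractor' appliances so the net learns to ignore them
    for _ in range(rng.randint(0, 4)):
        paste(distractor_activations[rng.randint(len(distractor_activations))])

    # Include the target appliance in roughly half the windows, so the net
    # also sees windows where it should predict an 'empty' rectangle
    if rng.rand() < 0.5:
        start, end, mean_power = paste(
            target_activations[rng.randint(len(target_activations))])
        # Start/end as sample offsets, mean power in watts (encoding assumed)
        target_rectangle[:] = (start, end, mean_power)

    return aggregate, target_rectangle
```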

Inference: How do we cope with arbitrarily long sequences given that each net has an input window duration of 2 hours or shorter? We slide the net along the input sequence (with stride = 16). We then layer every predicted 'appliance rectangle' on top of each other. We measure the overlap and normalise the overlap to [0, 1]. This gives a probabilistic output for each appliance's power demand. To convert this to a single vector per appliance, we threshold the power & probability.
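A sketch of this sliding-window inference is below. `predict_window` is a stand-in for a trained net that returns (start, end, mean power) for one window; the overlap-normalisation follows the description above, while the final thresholds are illustrative assumptions.

```python
import numpy as np

def disaggregate(predict_window, aggregate, window_length, stride=16):
    """Slide the net along the aggregate, layer the predicted rectangles,
    and threshold the result into a single power vector per appliance."""
    n = len(aggregate)
    rect_count = np.zeros(n)     # how many predicted rectangles cover each sample
    window_count = np.zeros(n)   # how many windows covered each sample at all
    power_sum = np.zeros(n)      # accumulated predicted power per sample

    for offset in range(0, n - window_length + 1, stride):
        start, end, power = predict_window(aggregate[offset:offset + window_length])
        window_count[offset:offset + window_length] += 1
        lo, hi = offset + int(round(start)), offset + int(round(end))
        if hi > lo and power > 0:
            rect_count[lo:hi] += 1
            power_sum[lo:hi] += power

    # Normalise the rectangle overlap to [0, 1]: a per-sample 'probability'
    prob = rect_count / np.maximum(window_count, 1)
    est_power = power_sum / np.maximum(rect_count, 1)

    # Threshold probability and power (thresholds are illustrative)
    on = (prob > 0.5) & (est_power > 10.0)
    return np.where(on, est_power, 0.0), prob
```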

[Results figure: one bar chart per metric comparing five algorithms on each appliance and across all appliances. Bar values are tabulated below. Algorithms, left to right: Always off, Mean, Combinatorial Optimisation (CO), Factorial HMM (FHMM), Neural NILM.]

F1 score
                        Always off   Mean   CO     FHMM   Neural NILM
Fridge freezer          0.00         0.10   0.44   0.37   0.77
Kettle                  0.00         0.01   0.10   0.06   0.90
HTPC                    0.00         0.33   0.68   0.68   0.87
Washer dryer            0.00         0.31   0.57   0.55   0.92
Dish washer             0.00         0.30   0.69   0.73   0.96
Across all appliances   0.00         0.21   0.50   0.48   0.88

Precision score
Fridge freezer          0.00         0.05   0.29   0.24   0.77
Kettle                  0.00         0.00   0.06   0.03   0.95
HTPC                    0.00         0.20   0.63   0.62   0.88
Washer dryer            0.00         0.19   0.72   0.71   0.90
Dish washer             0.00         0.17   0.70   0.76   0.93
Across all appliances   0.00         0.12   0.48   0.47   0.89

Recall score
Fridge freezer          0.00         1.00   0.89   0.79   0.76
Kettle                  0.00         1.00   0.47   0.70   0.85
HTPC                    0.00         1.00   0.73   0.74   0.86
Washer dryer            0.00         1.00   0.47   0.45   0.95
Dish washer             0.00         1.00   0.68   0.71   0.98
Across all appliances   0.00         1.00   0.65   0.68   0.88

Accuracy score
Fridge freezer          0.95         0.05   0.88   0.86   0.97
Kettle                  1.00         0.00   0.97   0.92   1.00
HTPC                    0.80         0.20   0.86   0.86   0.95
Washer dryer            0.81         0.19   0.87   0.86   0.97
Dish washer             0.83         0.17   0.89   0.91   0.99
Across all appliances   0.88         0.12   0.89   0.88   0.98

Relative error in total energy
Fridge freezer          -1.00        0.00   0.75   0.74   -0.13
Kettle                  -1.00        0.00   0.79   0.73   -0.14
HTPC                    -1.00        0.00   0.08   0.14   -0.08
Washer dryer            -1.00        0.00   0.28   0.26   -0.09
Dish washer             -1.00        0.00   -0.72  -0.60   0.02
Across all appliances   -1.00        0.00   0.24   0.25   -0.08

Proportion of total energy correctly assigned
Fridge freezer          0.99         0.98   0.97   0.97   1.00
Kettle                  0.98         0.97   0.92   0.93   1.00
HTPC                    0.97         0.96   0.98   0.98   0.99
Washer dryer            0.79         0.64   0.85   0.85   0.77
Dish washer             0.77         0.61   0.80   0.83   0.70
Across all appliances   0.50         0.15   0.52   0.56   0.46

Mean absolute error (watts)
Fridge freezer          5            9      16     16     2
Kettle                  8            17     44     37     2
HTPC                    14           22     10     11     4
Washer dryer            113          191    79     78     120
Dish washer             123          207    103    89     155
Across all appliances   53           89     50     46     57

3) RESULTS
We generated 9 days of synthetic aggregate data using appliance activations not seen by the nets during training.

Raw network output: The example plot shows, from the bottom panel up: 1) a window of synthetic aggregate; 2) the ground truth target (4 activations of a washing machine); 3) the net's vector output; 4) the probabilistic net output (overlapping rectangles).

Comparison with other NILM algorithms: we trained and tested two benchmark algorithms from NILMTK[2] on the same data that we used with Neural NILM. For comparison, we also included two 'dumb' algorithms: one which always outputs zeros (!) and one which always outputs the mean. Neural NILM out-performs the other algorithms on every metric but one. Neural NILM doesn't come first on the 'proportion of total energy correctly assigned across all appliances' because Neural NILM doesn't try to track state changes on multi-state appliances like the washer dryer and dish washer. But this should be fixable in a future version of Neural NILM!

Metric definitions:
recall = true positive rate = sensitivity = TP / P = TP / (TP + FN)
precision = positive predictive value = TP / (TP + FP)
accuracy = (TP + TN) / (P + N)
relative error in total energy = |total predicted energy - total actual energy| / max(predicted, actual)
Proportion of total energy correctly assigned
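For reference, these sample-level definitions can be computed directly from two power vectors. This is a generic sketch, not the authors' evaluation code; the on/off threshold is an illustrative assumption.

```python
import numpy as np

def nilm_metrics(pred, truth, on_threshold=10.0):
    """Compute the scores defined above from per-sample power vectors (watts).
    A sample counts as 'on' when its power exceeds `on_threshold`."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    pred_on, true_on = pred > on_threshold, truth > on_threshold
    tp = np.sum(pred_on & true_on)
    fp = np.sum(pred_on & ~true_on)
    fn = np.sum(~pred_on & true_on)
    tn = np.sum(~pred_on & ~true_on)

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(truth)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # As defined above (the results chart reports a signed version)
    rel_error = abs(pred.sum() - truth.sum()) / (max(pred.sum(), truth.sum()) or 1.0)
    mae = np.mean(np.abs(pred - truth))
    return dict(f1=f1, precision=precision, recall=recall, accuracy=accuracy,
                relative_error_in_total_energy=rel_error,
                mean_absolute_error=mae)
```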

4) CONCLUSIONS AND FUTURE WORK
We have demonstrated the first application of DNNs to NILM. This first version of Neural NILM already out-performs CO and FHMM algorithms. There are a large number of improvements we are planning to try.