Deep Learning for Human Activity Recognition: A Resource Efficient Implementation on Low-Power Devices
TRANSCRIPT
The Hamlyn Centre The Institute of Global Health Innovation
Daniele Ravì, Charence Wong, Benny Lo and Guang-Zhong Yang To appear in the proceedings of the IEEE EMBS 13th Annual International Body Sensor Networks (BSN) Conference, June 2016
Deep Learning for Human Activity Recognition: A Resource Efficient Implementation on Low-Power Devices
Activity Recognition
- Main Goal
  - Identify human activities (such as walking, jogging, cycling, etc.)
Applications
- Wellbeing
- Healthcare
- Sports
How is it recorded?
- Inertial sensors
  - Accelerometer
  - Gyroscope
  - Magnetometer
- Camera
- GPS
- Audio
- Proximity
- Barometer
Can we use a phone?
What about wearables?
Intel Edison
- Intel Atom dual-core CPU @ 500MHz
- 1GB DDR3 RAM
- 4GB eMMC Flash
- Dimensions: 35mm × 25mm
Deep Learning
- For many years, activity recognition approaches have been designed using shallow features
  - Those methods are task dependent and limited
- With Deep Learning, features are extracted from the training data instead of being handcrafted for a specific application

[Figure: shallow features pipeline]
Our approach
- We propose an approach that combines a descriptive input domain with a deep learning method
- The method must be efficient
- It must also be robust against transformations and variations in sensor properties
Our approach
Our approach
Spectrogram
- A spectrogram of an inertial signal is a new representation of the signal as a function of frequency and time
- The procedure for computing the spectrogram is to divide a longer time signal into shorter segments of equal length and then compute the Fourier transform separately on each shorter segment
- This results in a representation that describes the changing spectra as a function of time
- The representation captures the inertial input signal whilst providing a form of temporal and sample rate invariance
Our approach
Our approach
Temporal convolution layer
- Each filter w_i is applied to the spectrogram vertically and the weighted sum of the convolved signal is computed as follows:
- These temporal convolutions produce an output layer of learned features with a small size for real-time processing
- This provides orientation invariance to the input signal
- The last two layers are used to finally classify the features

Fig. 2. Proposed deep learning architecture: a filter w_i is applied to the pre-arranged spectrograms of the input, and the weighted sums of the convolved signal at each time t are computed in the temporal convolution layer. Finally, the fully-connected layer and soft-max layer are used for classification.
point in time and frequency. This can be expressed as:

STFT{x[n]}(m, ω) = X(m, ω) = Σ_{n=−∞}^{∞} x[n] w[n − m] e^{−jωn}   (1)

with signal x[n] and window w[n]. The magnitude squared of the STFT yields the spectrogram of the function:

spectrogram{x[n]}(m, ω) = |X(m, ω)|²   (2)

The resulting spectrogram is a matrix s_t × s_f, where s_t is the number of different short-term, time-localised points and s_f is the number of frequencies considered. Time increases across the columns while the frequency increases down the rows, starting at 0. The spectrogram representation provides interpretable features by capturing the intensity differences among nearest inertial data points. It also provides a form of temporal and sample rate invariance. This enables the classification of activities based on the variations of spectral density, which reduces classification complexity. We use the spectrogram representation as the input for the next stage of our deep learning activity recognition model.
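The STFT/spectrogram computation above can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the authors' code; the segment length, hop size, and Hann window below are assumptions for the example, not parameters from the paper.

```python
import numpy as np

def spectrogram(x, seg_len=32, hop=16):
    """Squared-magnitude STFT of a 1-D inertial signal x.

    Splits x into overlapping segments of equal length, windows each
    segment, and Fourier-transforms it separately, as in Eqs. (1)-(2).
    Returns a matrix with frequency down the rows, time across the columns.
    """
    n_seg = 1 + (len(x) - seg_len) // hop
    window = np.hanning(seg_len)
    frames = np.stack([x[i * hop : i * hop + seg_len] * window
                       for i in range(n_seg)])
    # Fourier transform of each segment; squared magnitude = spectrogram.
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return spec.T  # rows: frequencies (from 0), columns: time-localised points

# Example: 1 s of a single accelerometer axis, sampled at 100 Hz (hypothetical).
t = np.linspace(0, 1, 100, endpoint=False)
x = np.sin(2 * np.pi * 5 * t)   # a 5 Hz component, e.g. periodic limb motion
S = spectrogram(x)
print(S.shape)                  # (s_f, s_t) = (17, 5)
```

Each column of `S` is the spectrum of one time-localised segment, so the matrix directly describes the changing spectra as a function of time.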
B. Feature Extraction using Deep Learning
In a typical deep learning approach for HAR, the time-series data obtained by the accelerometer and gyroscope are forwarded directly into the deep learning infrastructure. The deep learning model contains hidden relations between pairs of input signals; these correlations are usually overlooked. Moreover, a large set of layers are built on top of each other for automating feature design by trying to capture all the possible permutations of the signal using a greater number of nodes. Each layer performs a non-linear transformation on the outputs of the previous layer, so that the deep learning models the data by a hierarchy of features from low-level to high-level. These approaches have high computation demands, which makes them unsuitable for low-power devices.

In our approach we reduce the computation cost by limiting the connections from the input nodes, finding features efficiently through few nodes and levels. Our explicit connections are implemented using sums of local temporal convolutions over the transformed input. Since each human activity has a discriminative distribution of frequencies, the sum is performed in correspondence to each frequency obtained at different time-localised values. The implementation details of the proposed approach are provided in Algorithm 1. Specifically, each filter w_i – with size k_w × s_t – is applied to the spectrogram vertically, as shown in Fig. 2, and the weighted sum of the convolved signal at time t is computed as follows:

o[t][i] = Σ_{j=1}^{s_t} Σ_{k=1}^{k_w} w[i][j][k] · input[d_w · (t − 1) + k][j]   (3)

where d_w is the stride of the temporal convolution. These temporal convolutions produce an output layer o having size w_p × OutputFrame, with OutputFrame = (InputFrame − k_w)/d_w + 1 and w_p the number of filters.

To introduce orientation invariance, temporal convolution results obtained from the different axes of an inertial sensor are integrated together without any discrimination, as shown in the for loop on Line 5 of Algorithm 1. Moreover, the filters applied to the three axes share the same weights, which is important for reducing the number of nodes in the layer and for better generalisation of the data. The last two layers of the proposed solution are a fully-connected layer and a soft-max layer, which are required since our deep model learns not only the informative features to recognise different activities from the data, but also the classification weights used to recognise the class activity.
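The temporal convolution of Eq. (3) and the axis integration can be sketched in NumPy. This is an illustrative sketch, not the paper's Algorithm 1; the spectrogram sizes, number of filters, and stride below are arbitrary example values.

```python
import numpy as np

def temporal_conv(inp, w, dw=1):
    """Eq. (3): o[t][i] = sum_j sum_k w[i][j][k] * inp[dw*t + k][j] (0-based t).

    inp : (InputFrame, s_t) pre-arranged spectrogram of one axis.
    w   : (n_filters, s_t, k_w) filter bank, applied vertically.
    dw  : stride of the temporal convolution.
    """
    n_filters, s_t, kw = w.shape
    out_frames = (inp.shape[0] - kw) // dw + 1  # OutputFrame = (InputFrame - k_w)/d_w + 1
    o = np.zeros((out_frames, n_filters))
    for t in range(out_frames):
        for i in range(n_filters):
            # weighted sum of the convolved signal at time t
            o[t, i] = np.sum(w[i] * inp[dw * t : dw * t + kw, :].T)
    return o

# The same filters are shared by all three axes; summing the per-axis
# outputs integrates them without discrimination (orientation invariance).
rng = np.random.default_rng(0)
w = rng.standard_normal((15, 24, 3))                       # 15 filters, s_t=24, k_w=3
axes = [rng.standard_normal((40, 24)) for _ in range(3)]   # x, y, z spectrogram frames
o = sum(temporal_conv(a, w, dw=1) for a in axes)
print(o.shape)   # (OutputFrame, n_filters) = (38, 15)
```

Sharing one small filter bank across axes keeps the node count low, which is what makes this layer cheap enough for real-time use on low-power devices.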
Training
- We use the error value in the backward propagation routine to update each weight of the network through the Stochastic Gradient Descent (SGD) approach
- To improve the training of the weights, we use three regularisations:
  - Weight decay – causes the weights to exponentially decay to zero if no other update is scheduled, to avoid over-fitting
  - Momentum – accelerates gradient descent towards the minimum of the function
  - Dropout – removes units randomly from the neural network to lower generalisation error
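The three regularisations above can be sketched as a single SGD update plus a dropout mask in NumPy. This is illustrative only; the learning rate, decay factor, momentum coefficient, and dropout probability are assumed example values, not the paper's hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_step(w, grad, velocity, lr=0.01, weight_decay=1e-4, momentum=0.9):
    # Weight decay: adds a term that shrinks the weights toward zero.
    grad = grad + weight_decay * w
    # Momentum: accumulate a velocity term to accelerate the descent.
    velocity[:] = momentum * velocity - lr * grad
    return w + velocity

def dropout(a, p=0.5):
    # Dropout: randomly zero units at training time (inverted scaling,
    # so no rescaling is needed at inference time).
    mask = rng.random(a.shape) > p
    return a * mask / (1.0 - p)

w = np.array([0.5, -0.3])            # toy weights
v = np.zeros_like(w)                 # momentum buffer
g = np.array([0.2, 0.1])             # gradient from back-propagation
w = sgd_step(w, g, v)
print(w)                             # ≈ [0.498, -0.301]
```

Each backward pass supplies `g`; repeated calls to `sgd_step` reuse `v`, so successive gradients in the same direction build up speed.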
Datasets
- Four datasets are considered in our analysis
- We also release a new dataset called ActiveMiles
  - It contains unconstrained real-world data from 10 subjects
  - It is one of the largest datasets (around 30 hours of labelled raw data)
  - It is the first database that contains data captured using different devices
Experimental setup
- Classification accuracy changes when the spectrogram generation parameters are modified

[Figure: accuracy (%) as a function of the number of frequencies and the number of time-localised points; best accuracy 97.16%]
Experimental setup
- The optimal size of the temporal convolution kernel is two or three, depending on the data being classified
- The proposed approach requires few levels in order to obtain good results

[Figure: accuracy vs. filter size (1–5) and accuracy vs. number of levels (1–3) for WISDM-v1, ActiveMiles, D-FoG and Skoda]
Experimental setup
- 15 filters and just 80 nodes in the fully-connected layer are sufficient for a good classification

[Figure: accuracy vs. number of filters, with 50 and 80 nodes in the fully-connected layer, for WISDM-v1, ActiveMiles, D-FoG and Skoda]
Results
- A comparison of HAR results on the considered datasets, using four baselines and existing methods, is shown in Table III
- The accuracy of the proposed method is typically better than that of the other methods
Performance
- A comparison of the computation times required to classify activities on different devices
  - All resulting times are within the requirements for real-time processing
Conclusion
- The proposed system generates discriminative features that are generally more powerful than handcrafted features
- The accuracy of the proposed approach is better than or comparable to existing state-of-the-art methods
- The ability of the proposed method to generalise across different classification tasks is demonstrated using a variety of human activity datasets
- The computation times obtained on low-power devices are consistent with real-time processing
Thank you for your attention!