Low Power Embedded Gesture Recognition Using Novel Short-Range Radar Sensors
Michele Magno – ETH Zurich
May 28, 2020
tinyML Talks Sponsors
Additional Sponsorships available – contact [email protected] for info
www.edgeimpulse.com
Next tinyML Talk
Date: Tuesday, June 9
- Igor Fedorov, Arm ML Research, ARM Machine Learning Lab: "SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers"
- Dominic Binks, VP Technology, Audio Analytic Ltd.: "tinyML doesn't need Big Data, it needs Great Data"
Webcast start time is 8 am Pacific time. Each presentation is approximately 30 minutes in length.
Please contact [email protected] if you are interested in presenting
Michele Magno
Dr. Magno is a senior scientist and head of the Project-based Learning Centre at ETH Zurich. He has been working at ETH since 2013 and has been a visiting lecturer or professor at several universities, namely the University of Nice Sophia Antipolis, France, Enssat Lannion, France, the University of Bologna, Italy, and Mid Sweden University. Dr. Magno is a Senior Member of IEEE, a finalist of the ETH Spark Award 2018, and a recipient of many other awards and grants. His background is in computer science and electrical engineering.
||D-ITET Center for Project-Based Learning, Integrated Systems Laboratory, ETH Zürich
Dr. Michele Magno
Credits: Jonas Erb, Moritz Scherer, Philipp Mayer, Manuel Eggimann, Prof. Dr. Luca Benini
29/05/2020 5
Low Power Embedded Gesture Recognition Using Novel
Short-Range Radar Sensors
Michele Magno
Introduction: Hand Gesture Recognition
Gesture recognition is the field of computer science that interprets human gestures via mathematical algorithms.
Human-machine interfaces
Cameras, inertial sensors, and other sensors
Soli: ubiquitous gesture sensing with
millimeter wave radar (SIGGRAPH
2016)
Radar
Classical airplane-detection radar: 1-12 GHz (wavelength = 2.5-30 cm)
Short-range radar: 60 GHz (wavelength = 5 mm)
Data from sensor:
Sweeps at 160 Hz or higher, each over the full range
Time and range discrete signal 𝑆[𝑡, 𝑟]
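As a sketch of how the sweep stream becomes the discrete signal S[t, r] (a minimal illustration; the 160 Hz rate, 32-sweep windows, and 492 range points are assumptions taken from later slides):

```python
import numpy as np

# Assumed parameters (taken from later slides in this deck)
SWEEP_RATE_HZ = 160       # sweeps per second
SWEEPS_PER_WINDOW = 32    # time steps per gesture window
RANGE_POINTS = 492        # discrete range bins per sweep

def frame_sweeps(stream):
    """Stack a flat stream of sweeps into windows S[t, r] with
    shape (n_windows, SWEEPS_PER_WINDOW, RANGE_POINTS)."""
    stream = np.asarray(stream)
    n_windows = len(stream) // SWEEPS_PER_WINDOW
    usable = stream[: n_windows * SWEEPS_PER_WINDOW]
    return usable.reshape(n_windows, SWEEPS_PER_WINDOW, RANGE_POINTS)

# One second of simulated sweeps -> five 32-sweep windows
sweeps = np.random.randn(SWEEP_RATE_HZ, RANGE_POINTS)
windows = frame_sweeps(sweeps)
print(windows.shape)  # (5, 32, 492)
```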
Promising Technology: Short Range Radar
[Figure: sweeps stacked over time and range; range resolution 0.483 mm]
Millimetre-wavelength pulsed coherent radar: it transmits radio signals in short pulses whose starting phase is well known.
Gesture Recognition Based on Radar: Google Soli Sensor
Increasing research on radar for gesture recognition1,2,3,4
Google developed micro-radar for gesture recognition
Good results on difficult hand-gestures: 87% accuracy on
11 gestures and 10 people
Why Google Soli is not tinyML?
Sensor consumes too much power (300mW)
Too much data to process (10’000 sweeps/s)
Prediction model too big for embedded systems (689MB)
CNN+RNN (LSTM) 4
The Pixel 4 algorithm (released 2020) may use a different model
1 Soli: Ubiquitous Gesture Sensing with Millimeter Wave Radar, 2016
2 Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum, 2016
3 Sparsity-based Dynamic Hand Gesture Recognition Using Micro-Doppler Signatures, 2017
4 A Hand Gesture Recognition Sensor Using Reflected Impulses, 2017
AI Workloads from Cloud to Edge (Extreme?)
[Figure: compute (GOP+) and memory (MB+) requirements of AI workloads from cloud to edge (culurciello18)]
Extreme edge AI challenge: AI capabilities in the power envelope of an MCU:
100 mW peak (1 mW avg)
512 KB RAM (best case 1 MB)
Implement radar-based hand-gesture recognition in an embedded system based on TCN+CNN, below 512 KiB
Create a dataset with fine-grained hand gestures:
at least 1000 samples per class and 20 users
Algorithm development suitable for embedded systems:
less than 1 MB, at least 700x smaller than I. w. Soli
Achieving similar accuracy to I. w. Soli:
85% (single-user), 87% (10 people) on 11 gestures1
Algorithm implementation on a milliwatt parallel RISC-V based processor and experimental evaluation of efficiency (power, run-time)
Main contributions of this work
1Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum. 2016
We want to bring radar-based gesture recognition to low-power embedded systems
But how?
Different sensor: Acconeer
Lower power
Less data:
lower sweep rate, fewer Tx, Rx
Toward TinyML and Low-Power Embedded Systems
Sensors: Soli | Acconeer
Carrier frequency: 60 GHz | 60 GHz
Sweep rate (up to): 10,000 Hz | 1,500 Hz
Power: 300 mW | 20 mW @ 100 Hz
Transmitters/receivers: Tx: 2, Rx: 4 | Tx: 1, Rx: 1
XR112
DATASET
2 Large datasets recorded:
26 people
1610 instances per gesture
Gesture Set:
Datasets & Gesture Set
Datasets: 5-Gesture | 11-Gesture
Sweep frequency: 256 Hz | 160 Hz
Sensors: 1 | 2
Sweep range: 10 cm to 30 cm | 6.1 cm to 29.9 cm
People: 1 | 26
Instances: 2500 | 17710
Tiny-ML Model
Proposed Tiny Gesture Recognition Embedded Model
Idea: Combine Information of multiple time steps
Combining TCN1+CNN
Memory below 512KB
High Accuracy
1-10 GOPS maximum
1C. Lea, M. D. Flynn, R. Vidal, A. Reiter, and G. D. Hager, “Temporal
convolutional networks for action segmentation and detection,” 2016
Range-Frequency Doppler Map:
Radar output: range- and time-discrete signal S[t, r]
Doppler map (DFT over the sweep/time axis):
S[f, r] = Σ_{t=0}^{L_doppler − 1} S[t, r] · e^(−2πi·f·t / L_doppler) = FFT_t(S[t, r])
Strong feature based on research1,2
Technical Background
1Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum. 2016
2Short-Range FMCW Monopulse Radar for Hand-Gesture Sensing, 2015
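As a rough NumPy sketch of this feature (shapes are assumptions taken from other slides: a 32-sweep window, 492 range points, L_doppler = 32), the Doppler map is just an FFT over the sweep (time) axis:

```python
import numpy as np

def range_doppler_map(window, l_doppler=32):
    """Range-Doppler feature |S[f, r]| from one window of sweeps.

    window: complex array S[t, r] of shape (l_doppler, n_range).
    The FFT along the time axis converts sweep-to-sweep phase
    changes (caused by hand motion) into Doppler frequency bins.
    """
    return np.abs(np.fft.fft(window, n=l_doppler, axis=0))

# Simulated complex radar window: 32 sweeps x 492 range points
rng = np.random.default_rng(0)
S = rng.standard_normal((32, 492)) + 1j * rng.standard_normal((32, 492))
dmap = range_doppler_map(S)
print(dmap.shape)  # (32, 492)
```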
Basic Feature Performance Evaluation (on 11 gestures)
Doppler feature performs best (also backed by research1,2)
To reduce model complexity and model size only Doppler feature used
1 Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum, 2016
2 Short-Range FMCW Monopulse Radar for Hand-Gesture Sensing, 2015
Algorithm Architecture: 2D CNN
Why CNN?
Condensed gesture
information
Representation ~100x
smaller than input
Extraction of semantic information in Doppler Map with CNN1,2. Time steps considered are 32 and the number of range points per sweep is 492, for 2 sensors.
With only CNN about 90% Accuracy on single user and 5 gestures
1Hand Gesture Recognition Using Micro-Doppler Signatures With Convolutional Neural Network. 2016
2Multi-sensor System for Driver’s Hand-Gesture Recognition, 2015
(492 × 32 × 2 + 98 × 10 × 16) values × 8 bit = 377,344 bit ≈ 368 kbit.
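The memory estimate above can be checked in a couple of lines (shapes as stated on the slide; 8-bit values assumed):

```python
# Input window: 492 range points x 32 time steps x 2 sensors;
# CNN feature map: 98 x 10 x 16 (values from the slide).
input_vals = 492 * 32 * 2
feature_vals = 98 * 10 * 16
total_bits = (input_vals + feature_vals) * 8  # 8-bit values
print(total_bits)         # 377344 bits
print(total_bits / 1024)  # ~368.5 Kibit
```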
Algorithm Architecture: Temporal Convolutional Neural Network (TCN1)
Idea: Combine information of
multiple time steps
98.9% accuracy with TCN, 2 sensors and 5 gestures!
1An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling, 2018
TCNs model time series with dilated causal convolutions: the architecture is causal, meaning there is no information "leakage" from the future to the past.
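A minimal sketch of a causal dilated convolution (illustrative only, not the actual TCN kernels used in this work): each output sample depends only on the current and past inputs, spaced by the dilation factor.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1D convolution.

    y[t] = sum_j w[j] * x[t - dilation * (k - 1 - j)], i.e. the
    output at time t only sees x[t], x[t - d], x[t - 2d], ...
    (never future samples).
    """
    x = np.asarray(x, dtype=float)
    k = len(w)
    pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(pad), x])  # left-pad: causality
    y = np.zeros(len(x))
    for t in range(len(x)):
        for j in range(k):
            y[t] += w[j] * xp[pad + t - dilation * (k - 1 - j)]
    return y

x = np.arange(8, dtype=float)
# Kernel [1, 0] with dilation 2: y[t] = x[t - 2] (a pure delay)
print(causal_dilated_conv1d(x, [1.0, 0.0], dilation=2))
# [0. 0. 0. 1. 2. 3. 4. 5.]
```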
Reducing memory requirements
Layer structure of the residual blocks in
TCNs 1
1 C. Lea, M. D. Flynn, R. Vidal, A. Reiter, and G. D. Hager, “Temporal
convolutional networks for action segmentation and detection,” 2016.
Reduce residual block to save memory space and
execution time
Results of Prediction Model
Final accuracies:
Sensors | Dataset | Accuracy
1 | 5 gestures, single user | 93.83%
1 | 11 gestures, multiple users, random shuffle | 78.25%
2 | 5 gestures, single user | 98.86%
2 | 11 gestures, single user | 89.52%
2 | 11 gestures, multiple users, random shuffle | 81.52%
2 | 11 gestures, multiple users, leave-one-out cross-validation | 73.66%
Gesture spotting: 99.84%
Optimal Architecture: 2D CNN: Demo
Edge AI Platform
Edge-AI Platforms
[Figure: throughput (GOPS) vs. power/thermal envelope, from 0.1 W battery-operated near-sensor devices to 10 W platforms, spanning MCUs, multi-core MCUs, Linux-capable CPUs, FPGAs, embedded GP-GPUs, and smartphone ASICs]
Representative data points:
0.6 GFLOPS @ 0.1 W, $4
9 GFLOPS @ 0.07 W, $5
6.2 GFLOPS @ 6 W, $50
100 GOPS @ 1 W, $70
1 TOPS @ 2.5 W, $80
4 TOPS @ 3 W, $150
1 TOPS @ 20 W, $1000
32 TOPS @ 30 W, $1500
Successful product development: GWT’s GAP8
TinyML Implementation on Novel Platform
Hardware limitations of GAP8 require adaptation
of network for optimal execution:
Change to symmetric convolutions
Use only 2D convolutions (also for 1D)
Dilated causal 1D convolutions modelled as 2D
Penalties due to adaptation:
Accuracy: <0.5% change
MACs: original: 16MMACs, adapted: 21MMACs
But runs faster than non-symmetric convolutions
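One way to see how a dilated 1D convolution maps onto standard (2D) convolution hardware (an illustrative sketch, not the actual GAP8 kernels): insert zeros between the kernel taps so that a dense convolution computes the same result.

```python
import numpy as np

def dilate_kernel(w, dilation):
    """Insert (dilation - 1) zeros between taps: [a, b, c] with
    dilation 2 becomes [a, 0, b, 0, c]."""
    w = np.asarray(w, dtype=float)
    k = np.zeros((len(w) - 1) * dilation + 1)
    k[::dilation] = w
    return k

x = np.arange(10, dtype=float)
w = np.array([1.0, -1.0, 2.0])
d = 2

# Direct dilated correlation: y[t] = w0*x[t] + w1*x[t+d] + w2*x[t+2d]
direct = np.array([w[0] * x[t] + w[1] * x[t + d] + w[2] * x[t + 2 * d]
                   for t in range(len(x) - 2 * d)])

# Same result from a dense convolution with the zero-inflated kernel
dense = np.convolve(x, dilate_kernel(w, d)[::-1], mode="valid")
print(np.allclose(direct, dense))  # True
```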
GAP8 Implementation: Adaptation of Network
[Figure: network kernels before and after adaptation: 3x5, 3x5, 1x7; dilation factors 2, 2, 2]
GAP8 Implementation: Run-Time, Power Consumption, FFT
Execution length @ 100 MHz | Cycles
Network + FFT | 5.877 million
2D CNN | 5.079 million
TCN | 0.458 million
Fully connected layers | 0.086 million
FFT | 0.177 million
Power consumption:
5Hz Prediction Rate: only 21mW!
Much margin: up to 50Hz possible
Comparison with Cortex-M7 (STM32F746ZG):
The same calculation consumes 147 mW-588 mW
Running at its limit (@ 216 MHz, needing 850 ms for 5 predictions) *Work in Progress
@100MHz and 8 cores running:
total energy per frame: 4.2mJ
5 frames/s: 21mW
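The 21 mW figure follows directly from the per-frame energy (a simple arithmetic check of the slide's numbers):

```python
# Average power = energy per frame x frame rate
energy_per_frame_j = 4.2e-3   # 4.2 mJ per frame (from the slide)
frame_rate_hz = 5             # predictions per second
avg_power_w = energy_per_frame_j * frame_rate_hz
print(round(avg_power_w * 1e3, 1))  # 21.0 (mW)
```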
Comparison to “Interacting with Soli …” Paper1
Per-frame accuracy on 11-gesture datasets | Soli | TCN+CNN
Single-user leave-one-out (session) cross-validation | 85.75% | 89.52%
Multiple users, randomly shuffled | 87.17% | 81.52%
Multiple users, leave-one-out (person) cross-validation | 79.06% | 73.66%
Other properties | Soli | TCN+CNN
Model size / whole system | 689 MB / ND | 91 kB / 460 KiB
Dataset: total instances per gesture | 500 | 1610
Dataset: people | 10 | 26
Embedded implementation | No | Yes
Network power consumption | - | 21 mW
Sensor power consumption | 300 mW | <190 mW (worst case at 160 Hz; more realistic: 2 sensors @ 100 Hz: 40 mW)
1Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum. 2016
Conclusion
Radar-based gesture recognition for Tiny embedded systems!
Very large hand-gesture dataset created (3x bigger than I. w. Soli)
11 gestures, 17710 instances (1610 per gesture) and 26 people
Very accurate and memory efficient TCN based prediction algorithm
90% accuracy (single-user) and 82% (26 people) on 11 gestures
Model size: 91kB (7500x smaller than I. w. Soli)
Fast and power efficient implementation in GAP8 processor
Power consumption of only 21 mW (average at 5 inferences per second)
7500x smaller model size, 3x bigger dataset, 21mW implementation
and similar accuracy as previous work on Soli
Reference on arXiv out in a few days
TinyRadarNN: Combining Spatial and Temporal Convolutional Neural Networks for Embedded
Gesture Recognition with Short Range Radars.
Moritz Scherer, Michele Magno, Jonas Erb, Philipp Mayer, Manuel Eggimann, Luca Benini
Previous work
Eggimann, M., Erb, J., Mayer, P., Magno, M. and Benini, L., 2019, October. Low Power
Embedded Gesture Recognition Using Novel Short-Range Radar Sensors. In 2019 IEEE
SENSORS (pp. 1-4). IEEE.
Biggest and OPEN dataset on short-range radar for gesture recognition: we are launching it at the beginning of June 2020. Contact me to get it earlier.
Thank you very much for your attention
Q&A backup slides.
CMSIS-NN: ARM paper1 on efficient Neural Net implementations:
Network: 12.3 MMACs, run-time: 99 ms @ 216 MHz, most efficient implementation possible
For a 5 Hz prediction rate: 105 MMACs, 845 ms
GAP8 Measurements:
21 MMACs: 4.2 mJ, i.e. 0.20 mJ/MMAC
GAP8 is 7x – 30x more energy efficient
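The 7x-30x claim can be reproduced from the per-MMAC energies on these slides (simple division over the stated numbers):

```python
# GAP8: 4.2 mJ for 21 MMACs
gap8_mj_per_mmac = 4.2 / 21          # 0.20 mJ/MMAC

# STM32F746ZG estimate (following slide): 1.39-5.94 mJ/MMAC
low = 1.39 / gap8_mj_per_mmac
high = 5.94 / gap8_mj_per_mmac
print(round(low), round(high))       # 7 30
```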
STM32F746ZG Power Consumption Estimation
1CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs. 2018
Chip power consumption @ 216 MHz | Min (periph. off) | Max (periph. on)
Supply current | 102 mA | 205 mA
Supply voltage | 1.7 V | 3.6 V
Power consumption | 173 mW | 738 mW
Energy per MMAC | 1.39 mJ/MMAC | 5.94 mJ/MMAC
GAP8 Implementation: Run-Time, Power Consumption, FFT
Algorithm adaptability to user:
Add half of test person data to training set: 5% increase in accuracy
Arm Cortex-M Implementation using CMSIS-NN
Gesture spotting capability:
Algorithm detects random noise (no hand) with 99.8% accuracy
Evaluation of continuous instructions vs. single-instance instructions:
No difference in accuracy measured
Using only 1 sensor on 11 Gestures:
78.25% instead of 81.52%, a drop of only 3%!
Further Evaluations
Copyright Notice
This presentation in this publication was presented as a tinyML® Talks webcast. The content reflects the opinion of the author(s) and their respective companies. The inclusion of presentations in this publication does not constitute an endorsement by tinyML Foundation or the sponsors.
There is no copyright protection claimed by this publication. However, each presentation is the work of the authors and their respective companies and may contain copyrighted material. As such, it is strongly encouraged that any use reflect proper acknowledgement to the appropriate source. Any questions regarding the use of any materials presented should be directed to the author(s) or their companies.
tinyML is a registered trademark of the tinyML Foundation.
www.tinyML.org