"deep-learning-based visual perception in mobile and embedded devices: opportunities and...
1
Jeff Gehlhaar, Vice President, Qualcomm Research
May 12, 2015
Deep-learning-based visual perception
in mobile and embedded devices:
Opportunities and challenges
4
Key elements of “Cognition”
[Diagram labels: Perception, Action, Reasoning; Hear, See, Classify, Anticipate, Plan, Autonomous, Concepts, Infer context, Relationships]
5
On the road to a “Cognitive Platform”
On-device capabilities
Rich Connectivity
• Integrated modem & AP
• Adaptive RF front end
• LTE broadcast & service-focused modem features
• Tightly integrated Wi-Fi/BT
• Leading location / GPS
Heterogeneous Computing
• Fully customized architecture
• Superior performance at low power consumption
• Highly optimized for cutting-edge cognitive capabilities
On-device Intelligence
• On-device machine learning
• Computer vision
• Behavioral analysis
• Sensor processing and classification algorithms
• Natural language processing
[Diagram labels: Visual Perception, Speech & Audio Understanding, Natural Interactions, Intelligent Connectivity, Immersive Multimedia, Intuitive Security, Always On Awareness]
6
On-device visual perception is key
• Democratizing robotics to assist us in daily lives
• Revolutionizing transportation with autonomous cars
• Contextualizing your environment through scene understanding
7
Why fully on-device matters
Process data closest to the source, complement cloud
• Reliability
• Efficient use of network bandwidth
• Low latency
• Security and user privacy
8
Deep learning solves visual perception
• Qualcomm Technologies, Inc. has been applying machine learning to mobile for many years
• Deep learning for visual perception
• Provides best-in-class solutions
• Traditionally a cloud-only solution, not available on mobile (until now)
• Presents many implementation challenges
• Our mobile-focused platform goes beyond deep learning to include RNNs and other strategies
• Applications: security, handwriting, natural language processing, etc.
[Figure: Deep Network. Blocks labeled C, followed by Pooling, Fully Connected, and Result]
10
Typical computing environment for deep learning
• Performance: teraflops
• Memory bandwidth: 100s of GB/s
• Storage: 10s of GB of RAM
• Power: 100s of watts
Best-in-class server-based visual perception models require roughly 2B MAC operations per image
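To give a feel for where a MAC count of this order comes from, the following is a minimal sketch of the standard per-layer MAC arithmetic. The function names and the example layer sizes are hypothetical and are not taken from the presentation.

```python
def conv_macs(out_h, out_w, out_channels, kernel_h, kernel_w, in_channels):
    """MACs for one convolutional layer (bias terms ignored).

    Each output value needs kernel_h * kernel_w * in_channels
    multiply-accumulates, and there are out_h * out_w * out_channels outputs.
    """
    return out_h * out_w * out_channels * kernel_h * kernel_w * in_channels


def fc_macs(in_features, out_features):
    """MACs for one fully connected layer: one per weight (bias ignored)."""
    return in_features * out_features


# Hypothetical layer sizes, chosen only for illustration.
layers = [
    conv_macs(56, 56, 128, 3, 3, 64),  # a mid-network 3x3 convolution
    fc_macs(4096, 4096),               # a large fully connected layer
]
total = sum(layers)
print(f"total MACs for these two layers: {total:,}")
```

In this sketch the single convolution accounts for far more MACs than the fully connected layer, even though the fully connected layer holds many more weights.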
11
Supporting deep learning on-device is a major challenge
Constrained mobile environment
• Power and thermal efficiency
• Storage and memory bandwidth limitations
• Battery powered
Visual perception workloads
• Compute intensive
• Large and complicated neural network models
12
Solving the challenge of on-device visual perception
Within the power and thermal constraints of mobile devices
14
Robot face tracking video
https://www.youtube.com/watch?v=0D9I0SBGAPY
15
Efficient execution on mobile SoCs
Key to deep learning on mobile is an efficient execution environment that considers all aspects of the SoC, combined with efficient library implementations
• Careful analysis of deep learning tradeoffs
  • Consider the impact of different network architectures
  • Focus on cache performance, data locality, DRAM utilization efficiency
• Focus on parallelism and heterogeneity
  • Take advantage of heterogeneous computing frameworks (e.g., Qualcomm MARE)
  • Span execution across Qualcomm® Snapdragon™ CPU, DSP, and GPU
• Focus on underlying optimizations
  • Convolutions implemented as highly efficient matrix multiply operations
  • Smart buffer management for GPU and fixed bit-width optimizations for DSP
  • Optimized matrix multiply for Snapdragon processors [1]
    • 6X faster than Eigen
1. Results are based on Snapdragon 805 processor and Eigen 3.2.2
Qualcomm Snapdragon and Qualcomm Multicore Asynchronous Runtime Environment are products of Qualcomm Technologies, Inc.
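The slide mentions implementing convolutions as matrix multiply operations. One common way to do this is the "im2col" rearrangement, sketched below in NumPy. This is an illustrative assumption about the general technique, not the Snapdragon library code; the function names, stride of 1, and lack of padding are choices made for the sketch.

```python
import numpy as np


def im2col(x, kh, kw):
    """Unroll input patches into columns.

    x has shape (C, H, W). Returns a matrix of shape
    (C * kh * kw, out_h * out_w) for stride 1 and no padding.
    """
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # All input values that meet kernel tap (ci, i, j).
                cols[row] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols, out_h, out_w


def conv2d_as_matmul(x, weights):
    """Convolution (cross-correlation, as used in CNNs) via one matrix multiply.

    weights has shape (K, C, kh, kw); the result has shape (K, out_h, out_w).
    """
    k, c, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw)
    w_mat = weights.reshape(k, c * kh * kw)  # one row per output channel
    return (w_mat @ cols).reshape(k, out_h, out_w)
```

The rearrangement trades extra memory for the unrolled patches against the ability to reuse a single, heavily tuned matrix multiply routine, which is where cache behavior and data locality become important.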
16
Reducing model size through compression
Goal
Reduce both physical size and number of MACs required at equivalent precision
• Use available memory bandwidth and computation effectively, which improves power efficiency
• Smaller size permits in-field model upgrades and improvements
[Figure: Deep Network. Blocks labeled C, followed by Pooling, Fully Convolution, and Result]
Qualcomm Technologies, Inc. approach
• Initial SVD approach based on a paper by Denton et al. of NYU [1]
• Qualcomm Technologies, Inc. approach involves replacing single layers with multiple layers
• Approach permits fine-tuning all layers, not just layers above compressed layers
Results
• Up to a 10X reduction in physical model size
• Up to a 35% reduction in the number of MAC operations with minimal loss of precision
1. “Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation”, arXiv:1404.0736 [cs.CV]
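As an illustration of how an SVD can replace a single layer with multiple smaller layers, here is a minimal NumPy sketch for one fully connected layer. It shows plain truncated SVD only and is not the Qualcomm Technologies, Inc. implementation; the function name, matrix shapes, and rank are hypothetical.

```python
import numpy as np


def compress_fc(weight, rank):
    """Split one fully connected layer into two thinner ones.

    weight has shape (out_features, in_features) and is applied as y = W @ x.
    Returns (first, second) such that second @ (first @ x) approximates W @ x.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    first = s[:rank, None] * vt[:rank, :]  # shape (rank, in_features)
    second = u[:, :rank]                   # shape (out_features, rank)
    return first, second


# Hypothetical sizes: a 1024 x 4096 layer approximated with rank 64.
rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 4096))
first, second = compress_fc(w, rank=64)

original_params = w.size
compressed_params = first.size + second.size
print(f"parameters: {original_params:,} -> {compressed_params:,}")
```

Both the parameter count and the MAC count drop from out_features * in_features to rank * (in_features + out_features). The random matrix here is only a stand-in for shapes; how much accuracy a given rank preserves depends on the trained weights, which is why fine-tuning after compression matters.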
17
Size compression and error rate impact
[Chart series: Original Network; FC Layer Compressed; FC and Conv Layer Compressed]
Fully connected layer compression significantly impacts physical network size
• 10X size reduction
• ~1% pt loss in top-5 error
18
MAC compression and error rate impact
[Chart labels: AlexNet; Original Network; FC Layer Compressed; FC and Conv Layer Compressed]
Convolutional layer compression significantly impacts MAC requirements
• Compression: ~35% MAC reduction, ~1.3% pt loss in top-5 error
• Fine-tuning: 2.5% pt improvement in top-5 error under max MAC constraints
19
Fixed point and reduced bit widths
Focus on reduction of precision for both weights (static values) and activations (dynamic values) versus traditional 32-bit floating-point approaches
• Physically smaller networks
• 2X improvement in memory access efficiency for network weights
16-bit values are used with no net increase in top-5 error
Rows: activation bit widths. Columns: neural network weight bit widths.
Activation   4       8      16     24     32     Float
8            20.0%   1.4%   0.1%   0.1%   0.1%   0.1%
16           20.1%   1.4%   0.0%   0.0%   0.0%   0.0%
24           20.1%   1.4%   0.0%   0.0%   0.0%   0.0%
32           20.1%   1.4%   0.0%   0.0%   0.0%   0.0%
Float        20.1%   1.4%   0.0%   0.0%   0.0%   0.0%
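To illustrate why narrower bit widths cost precision, the sketch below applies a simple symmetric fixed-point quantization to a set of values and measures the round-trip error. The scheme, the function names, and the test values are assumptions for illustration; the presentation does not specify the quantization method actually used.

```python
import numpy as np


def quantize_symmetric(values, bits):
    """Map floats to signed integers of the given bit width.

    Uses a single scale factor so that the largest magnitude maps to the
    largest representable integer. Returns the integer codes and the scale.
    """
    values = np.asarray(values, dtype=np.float64)
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(values / scale), -qmax, qmax).astype(np.int32)
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return codes.astype(np.float64) * scale


# Hypothetical weights spread evenly over [-1, 1].
weights = np.linspace(-1.0, 1.0, 1001)
for bits in (4, 8, 16):
    codes, scale = quantize_symmetric(weights, bits)
    error = np.max(np.abs(dequantize(codes, scale) - weights))
    print(f"{bits:2d}-bit: max round-trip error {error:.6f}")
```

The worst-case rounding error is half a quantization step, so it shrinks by roughly a factor of two for every added bit.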
21
What comes next?
Expanding the frontier of visual perception
• More complex models
• Video classification
• Scene parsing and object localization and tracking
Platform enhancements
• Evolution of the SoC
Working towards “Cognition”
• Qualcomm Research is experimenting with algorithms for “reasoning” to link perception to action
22
Additional resources
• Qualcomm Technologies, Inc. web sites:
  • Computer Vision: https://www.qualcomm.com/invention/research/projects/computer-vision
  • Cognitive Technologies: https://www.qualcomm.com/invention/cognitive-technologies
  • FastCV™ SDK: https://developer.qualcomm.com/mobile-development/add-advanced-features/computer-vision-fastcv/tools-and-resources
• Embedded Vision Alliance web sites:
  • Heterogeneous computing for CV: http://www.embedded-vision.com/platinum-members/qualcomm/embedded-vision-training/videos/pages/oct-2013-embedded-vision-summit-heterogeneous
  • CV acceleration: http://www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/videos/pages/september-2013-qualcomm-uplinq-conferenc
• Demo in Technology Showcase
  • Scene detect through on-device deep learning
FastCV is a product of Qualcomm Technologies, Inc.
Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries.
FastCV is a trademark of Qualcomm Incorporated. All Qualcomm Incorporated trademarks are used with permission.
Other products and brand names may be trademarks or registered trademarks of their respective owners.