Deep Learning on Mobile Phones - A Practitioner's Guide

Anirudh Koul, Siddha Ganju, Meher Kasam


Post on 27-Dec-2019


Anirudh Koul
@AnirudhKoul
Head of AI & Research, Aira
[Lastname]@aira.io

Siddha Ganju
@SiddhaGanju
Architect, Self-Driving Vehicles, NVIDIA
[FirstnameLastname]@gmail.com

Meher Anand Kasam
@MeherKasam
Software Engineer, Square
[FirstnameMiddlenameK]@gmail.com


Why Deep Learning On Mobile?

Latency Privacy


Response Time Limits – Powers of 10

0.1 second: Reacting instantly

1.0 second: User's flow of thought

10 seconds: Keeping the user's attention

[Miller 1968; Card et al. 1991; Jakob Nielsen 1993]


Mobile Deep Learning Recipe

(Efficient) Mobile Inference Engine + (Efficient) Pretrained Model = DL App


Building a DL App in _ time


Building a DL App in 1 hour


Use Cloud APIs for General Recognition Needs

• Microsoft Cognitive Services

• Clarifai

• Google Cloud Vision

• IBM Watson Services

• Amazon Rekognition


How to Choose a Computer Vision Based API?

Benchmark & Compare them

COCO-Text v2.0 for text reading in the wild
• ~2k random images
• Candidate text has at least 2 characters together
• Direct word match

COCO-Val 2017 for image tagging in the wild
• ~4k random images
• Tag similarity match instead of word match


Pricing


Recognize Text Benchmarks

Text API                       Accuracy
Amazon Rekognition             45.4%
Google Cloud Vision            33.4%
Microsoft Cognitive Services   55.4%

Evaluation criteria:
• Photos have candidate words with length >= 2
• Direct word match with ground truth
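The direct word match criterion could be scored roughly as in the sketch below. This is only an illustration: the function name, the exact (case-sensitive) comparison, and the per-image ratio are assumptions, since the slides do not say how the accuracy figures were aggregated.

```python
def word_match_accuracy(predicted_words, ground_truth_words, min_len=2):
    """Fraction of candidate ground-truth words found verbatim in the API output.

    Hypothetical helper: returns None when the image has no candidate words.
    """
    # Keep only candidate words with at least `min_len` characters.
    candidates = [w for w in ground_truth_words if len(w) >= min_len]
    if not candidates:
        return None
    predicted = set(predicted_words)
    # Direct word match: a hit only if the exact string was returned.
    hits = sum(1 for w in candidates if w in predicted)
    return hits / len(candidates)
```

For example, if an API returns "STOP" for an image whose ground truth is "STOP" and "EXIT", this sketch scores the image at 0.5.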



Image Tagging Benchmarks

Evaluation criteria:
• Concept similarity match instead of word match
• E.g. ‘military-officer’ tag matched with ground truth tag ‘person’

API                            Accuracy   Avg #Tags
Amazon Rekognition             65%        14
Google Cloud Vision            47.6%      14
Microsoft Cognitive Services   50.0%      8


Image Tagging Benchmarks

Precision and recall are hard to compute because the COCO ground-truth tags are not exhaustive

At a given accuracy, a lower number of returned tags indicates a higher F-measure

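A small sketch of why fewer tags at the same hit count points to a higher F-measure. The counts used in the note after the code are hypothetical and are not taken from the benchmark; the function name is likewise an assumption for illustration.

```python
def f_measure(correct, returned, relevant):
    """F1 score from counts: correct tags, tags returned by the API, ground-truth tags."""
    if returned == 0 or relevant == 0:
        return 0.0
    precision = correct / returned   # share of returned tags that are right
    recall = correct / relevant      # share of ground-truth tags that were found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With 4 correct tags out of 5 ground-truth tags, returning 8 tags gives a higher score than returning 14 tags, because precision drops as the number of extra tags grows.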


Tips for reducing network latency

• For text recognition
  • A compression setting of up to 90% has little effect on accuracy, but gives drastic savings in size
  • Resizing is dangerous: text recognition needs a minimum size for recognition
• For image recognition
  • Resize so that min(height, width) is 224, at 50% compression with bilinear interpolation
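The image recognition tip could look roughly like the sketch below. Treating "50% compression" as JPEG quality 50 is an assumption, as are the function names and the choice not to upscale images that are already small. The second function needs the third-party Pillow package.

```python
def target_size(width, height, min_side=224):
    """Scale (width, height) so the shorter side becomes min_side, keeping aspect ratio."""
    if min(width, height) <= min_side:
        return width, height  # assumption: never upscale small images
    scale = min_side / min(width, height)
    return round(width * scale), round(height * scale)


def shrink_for_upload(path, min_side=224, jpeg_quality=50):
    """Return JPEG bytes of the resized image (requires Pillow)."""
    import io
    from PIL import Image  # imported lazily so target_size works without Pillow

    with Image.open(path) as img:
        rgb = img.convert("RGB")
        resized = rgb.resize(target_size(rgb.width, rgb.height, min_side), Image.BILINEAR)
        buf = io.BytesIO()
        resized.save(buf, format="JPEG", quality=jpeg_quality)
        return buf.getvalue()
```

For text recognition the same compression step could be applied without the resize, in line with the warning about minimum text size.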


Building a DL App in 1 day


http://deeplearningkit.org/2015/12/28/deeplearningkit-deep-learning-for-ios-tested-on-iphone-6s-tvos-and-os-x-developed-in-metal-and-swift/

Energy to train Convolutional Neural Network

Energy to use Convolutional Neural Network


Base Pretrained Model

ImageNet – 1000 Object Categorizer

VGG16

Inception-v3

Resnet-50

MobileNet

SqueezeNet


Running pre-trained models on mobile

Core ML

TensorFlow Lite

Caffe2


Apple’s Ecosystem

2014: Metal
2016: BNNS + MPS
2017: CoreML
2018: CoreML 2


Apple’s Ecosystem

Metal

- Low-level, low-overhead, hardware-accelerated 3D graphics and compute shader application programming interface (API)

- Available since iOS 8


Apple’s Ecosystem

Fast low-level primitives:
• BNNS – Basic Neural Network Subroutines
  • Ideal case: fully connected NN
• MPS – Metal Performance Shaders
  • Ideal case: convolutions

Inconvenient for large networks:
• Inception-v3 inference consisted of 1.5K lines of hard-coded model definition
• Libraries like Forge by Matthijs Hollemans provide abstraction


Apple’s Ecosystem

Convert a Caffe/TensorFlow model to a CoreML model in 3 lines:

import coremltools
coreml_model = coremltools.converters.caffe.convert('my_caffe_model.caffemodel')
coreml_model.save('my_model.mlmodel')

Add model to iOS project and call for prediction.

Direct support for Keras, Caffe, scikit-learn, XGBoost, LibSVM

Automatically minimizes memory footprint and power consumption


Apple’s Ecosystem

CoreML 2:
• Model quantization support down to 1 bit
• Batch API for improved performance
• Conversion support for MXNet, ONNX
  • ONNX opens models from PyTorch, Cognitive Toolkit, Caffe2, Chainer
• Create ML for quick training
• tf-coreml for direct conversion from TensorFlow
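To give a feel for what quantizing weights to a small number of bits means, here is a minimal sketch of plain linear quantization. It illustrates the general idea only and is not a description of how Core ML implements it; the function names are hypothetical.

```python
def quantize_linear(weights, nbits):
    """Map float weights to integer codes in [0, 2**nbits - 1].

    Returns (codes, scale, offset) so that weight ~= code * scale + offset.
    """
    levels = (1 << nbits) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_linear(codes, scale, offset):
    """Recover approximate float weights from integer codes."""
    return [c * scale + offset for c in codes]
```

At 8 bits each weight is stored in one byte instead of four, and at 1 bit every weight collapses to one of two values, which is why accuracy should be re-checked after quantizing.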


CoreML Benchmark - Pick a DNN for your mobile architecture

Execution time in ms per device (release year in parentheses)

Model          Top-1 Accuracy (%)   Size of Model (MB)   iPhone 5S (2013)   iPhone 6 (2014)   iPhone 6S/SE (2015)   iPhone 7 (2016)   iPhone 8/X (2017)
VGG 16         71                   553                  7408               4556              235                   181               146
Inception v3   78                   95                   727                637               114                   90                78
Resnet 50      75                   103                  538                557               77                    74                71
MobileNet      71                   17                   129                109               44                    35                33
SqueezeNet     57                   5                    75                 78                36                    30                29

Huge improvement in GPU hardware in 2015
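One way to use a benchmark like this is to pick the most accurate model that fits a latency and size budget. The sketch below copies the iPhone 7 column from the table; the function name and the budget values are illustrative assumptions.

```python
# (top-1 accuracy %, model size MB, execution time ms on iPhone 7), copied from the table
BENCHMARK_IPHONE7 = {
    "VGG 16": (71, 553, 181),
    "Inception v3": (78, 95, 90),
    "Resnet 50": (75, 103, 74),
    "MobileNet": (71, 17, 35),
    "SqueezeNet": (57, 5, 30),
}


def pick_model(benchmark, max_ms, max_mb=float("inf")):
    """Return the most accurate model within the time and size budget, or None."""
    eligible = {
        name: stats for name, stats in benchmark.items()
        if stats[2] <= max_ms and stats[1] <= max_mb
    }
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name][0])
```

For instance, a 40 ms budget (roughly 25 frames per second) selects MobileNet on these numbers, while a 100 ms budget selects Inception v3.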


Putting out more frames than an art gallery


TensorFlow Ecosystem

2015: TensorFlow
2016: TensorFlow Mobile
2018: TensorFlow Lite


TensorFlow Ecosystem

TensorFlow: the full, bulky deal


TensorFlow Ecosystem

TensorFlow Mobile:

Easy pipeline to bring TensorFlow models to mobile

Excellent documentation

Optimizations to bring model to mobile


TensorFlow Ecosystem

TensorFlow Lite:
• Smaller
• Faster
• Minimal dependencies
  • Easier to package & deploy
• Allows running custom operators

1 line conversion from Keras to TensorFlow Lite:

• tflite_convert --keras_model_file=keras_model.h5 --output_file=foo.tflite


TensorFlow Lite is small

• ~75KB for core interpreter

• ~400KB for core interpreter + supported operations

• Compared to 1.5MB for TensorFlow Mobile


TensorFlow Lite is fast

• Takes advantage of on-device hardware acceleration
• Uses FlatBuffers
  • Reduces code footprint, memory usage
  • Reduces CPU cycles on serialization and deserialization
  • Improves startup time
• Pre-fused activations
  • Combining batch normalization layer with previous convolution
• Interpreter uses static memory and static execution plan
  • Decreases load time
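The idea of combining a batch normalization layer with the previous convolution can be checked numerically. The sketch below is a simplified illustration in plain Python, not TensorFlow Lite's actual code: the "convolution" is reduced to one dot product per output channel, and all function names are assumptions.

```python
import math


def apply_layer(weights, bias, x):
    """Per output channel: dot(kernel, x) + bias (a convolution at one position)."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b_c
            for w_c, b_c in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-3):
    """Inference-time batch normalization with fixed per-channel statistics."""
    return [g * (y_c - m) / math.sqrt(v + eps) + bt
            for y_c, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-3):
    """Fold batch-norm parameters into the preceding layer's weights and bias."""
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])   # w' = w * gamma / sqrt(var + eps)
        folded_b.append((b_c - m) * scale + bt)     # b' = (b - mean) * scale + beta
    return folded_w, folded_b
```

After folding, a single layer produces the same output as the original layer followed by batch normalization, so one whole operation disappears from the inference graph.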


TensorFlow Lite Architecture


TensorFlow Lite Benchmarks - http://alpha.lab.numericcal.com/


TensorFlow Lite Benchmarks - http://ai-benchmark.com/

• Crowdsourced benchmarking with the AI Benchmark Android app
  • By Andrey Ignatov from ETH Zurich
• 9 tests
  • E.g. Semantic Segmentation, Image Super Resolution, Face Recognition


TensorFlow Lite acceleration – GPU delegate (dev preview)


Caffe2

From Facebook

Under 1 MB of binary size

Built for speed:

For ARM CPU: uses NEON kernels, NNPACK

For iPhone GPU: uses Metal Performance Shaders and Metal

For Android GPU: uses Qualcomm Snapdragon NPE (4-5x speedup)

ONNX format support to import models from CNTK/PyTorch


Caffe2


MLKit

• Simple, easy to use

• Abstraction over TensorFlow Lite

• Built-in image labeling, OCR, face detection, barcode scanning, landmark detection, smart reply

• Model management with Firebase
  • Upload model on web interface to distribute

• A/B Testing


MLKit – Face Contours

By leveraging the GPU delegate:

~4x speedup on Pixel 3

~6x speedup on iPhone 7


Recommendation for production development

1. Train a model using Keras
2. Convert to TensorFlow Lite format
3. Upload to Firebase
4. Deploy to iOS/Android apps with MLKit

Keras → tflite_convert → .tflite file


Common Questions

“My app has become too big to download. What do I do?”

• iOS doesn’t allow apps over 150 MB to be downloaded over cellular data

• Solution: Download on demand, and compile on device

• 0 MB change to app size on first install


Common Questions

“Do I need to ship a new app update with every model improvement?”

• Making app updates involves a decent amount of overhead, plus ~2 days wait time
• Solution: Check for model updates, download and compile on device
• Easier solution: Use a framework for model management, e.g.
  • Google ML Kit
  • Fritz
  • Numericcal


Common Questions

“Why does my app not recognize objects at top/bottom of screen?”

• Solution: Check the cropping used; by default, it’s a center crop ☺
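A quick sketch of what a center crop does to a portrait camera frame, which is why objects near the top or bottom edge never reach the model. The function name and the frame size in the note are illustrative assumptions.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side
```

For a 1080 x 1920 portrait frame the box is (0, 420, 1080, 1500), so the top 420 rows and the bottom 420 rows are discarded before inference.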


Building a DL App in 1 week


Learn playing an accordion: 3 months

Knows piano, fine tune skills: 1 week


I got a dataset, Now What?

Step 1 : Find a pre-trained model

Step 2 : Fine tune a pre-trained model

Step 3 : Run using existing frameworks

“Don’t Be A Hero” - Andrej Karpathy


How to find pretrained models for my task?

Model Zoo

https://modelzoo.co

- 300+ models

Papers with Code

https://paperswithcode.com/sota


AlexNet, 2012 (simplified)

[Krizhevsky, Sutskever, Hinton ’12]

Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng, “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks”, 11

n-dimension feature representation


Deciding how to fine tune

Size of New Dataset   Similarity to Original Dataset   What to do?
Large                 High                             Fine tune.
Small                 High                             Don’t fine tune, it will overfit. Train a linear classifier on CNN features.
Small                 Low                              Train a classifier from activations in lower layers. Higher layers are specific to the original dataset.
Large                 Low                              Train CNN from scratch.

http://blog.revolutionanalytics.com/2016/08/deep-learning-part-2.html
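The decision table can be written as a simple lookup. This sketch only restates the four rows; the function name and the string labels are assumptions, and judging whether a dataset counts as "small" or "large" is left to the reader.

```python
# Rows of the table, keyed by (size of new dataset, similarity to original dataset)
FINE_TUNE_STRATEGY = {
    ("large", "high"): "Fine tune.",
    ("small", "high"): "Train a linear classifier on CNN features (fine tuning will overfit).",
    ("small", "low"): "Train a classifier from activations in lower layers.",
    ("large", "low"): "Train CNN from scratch.",
}


def choose_strategy(dataset_size, similarity):
    """Look up the recommended approach; both arguments are case-insensitive labels."""
    key = (dataset_size.lower(), similarity.lower())
    if key not in FINE_TUNE_STRATEGY:
        raise ValueError("expected size in {small, large} and similarity in {low, high}")
    return FINE_TUNE_STRATEGY[key]
```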



Could you train your own classifier ... without coding?

• Microsoft CustomVision.ai
  • Unique: Under a minute training, custom object detection (100x speedup)
• Google AutoML
  • Unique: Full CNN training, crowdsourced workers
• IBM Watson Visual Recognition
• Baidu EZDL
  • Unique: Custom sound recognition


Custom Vision Service (customvision.ai) – Drag and drop training

Tip: Upload 30 photos per class to make a prototype model

Upload 200 photos per class for a more robust production model

The more distinct the shape/type of object, the fewer images required.


Custom Vision Service (customvision.ai) – Drag and drop training

Tip: Use Fatkun Browser Extension to download images from Search Engine, or use Bing Image Search API to programmatically download photos with proper rights


CoreML exporter from customvision.ai – Drag and drop training

5 minute shortcut to training, fine-tuning and getting a model ready in CoreML format

Drag and drop interface


Building a Crowdsourced Data Collector in 1 month


Barcode recognition from Seeing AI

Aim: Help blind users identify products using barcode

Issue: Blind users don’t know where the barcode is

Live: Guide user in finding a barcode with audio cues
With Server: Decode barcode to identify product
Tech: MPSCNN running on mobile GPU + barcode library
Metrics: 40 FPS (~25 ms) on iPhone 7


Currency recognition from Seeing AI

Aim: Identify currency

Live: Identify denomination of paper currency instantly
With Server: -
Tech: Task specific CNN running on mobile GPU
Metrics: 40 FPS (~25 ms) on iPhone 7


Training Data Collection App

Request volunteers to take photos of objects in non-obvious settings

Sends photos to cloud, trains model nightly

Newsletter shows the best photos from volunteers

Let them compete for fame


Daily challenge - Collected by volunteers



Building a production DL App in 3 months


What you want: $200,000

What you can afford: $2000

Images: https://www.flickr.com/photos/kenjonbro/9075514760/ and http://www.newcars.com/land-rover/range-rover-sport/2016


Revolution of Depth

AlexNet, 8 layers (ILSVRC 2012): 11x11 conv, 96, /4, pool/2; 5x5 conv, 256, pool/2; 3x3 conv, 384; 3x3 conv, 384; 3x3 conv, 256, pool/2; fc, 4096; fc, 4096; fc, 1000

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”, 2015


Revolution of Depth

[Figure: layer-by-layer diagrams of three networks]

AlexNet, 8 layers (ILSVRC 2012)

VGG, 19 layers (ILSVRC 2014): 2 x (3x3 conv, 64); 2 x (3x3 conv, 128); 4 x (3x3 conv, 256); 4 x (3x3 conv, 512); 4 x (3x3 conv, 512), each group ending in pool/2; then fc, 4096; fc, 4096; fc, 1000

GoogleNet, 22 layers (ILSVRC 2014)

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”, 2015


Revolution of Depth

[Figure: layer-by-layer diagrams of three networks]

AlexNet, 8 layers (ILSVRC 2012)

VGG, 19 layers (ILSVRC 2014)

ResNet, 152 layers (ILSVRC 2015): ultra deep

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”, 2015

ResNet, 152 layers

[Figure: ResNet-152 layer diagram. Stem: 7x7 conv, 64, /2, pool/2. Bottleneck blocks of 1x1, 3x3 and 1x1 convolutions: 3 blocks at 64/64/256 channels, 8 blocks at 128/128/512 (first 1x1 conv with /2), 36 blocks at 256/256/1024 (first 1x1 conv with /2), 3 blocks at 512/512/2048 (first 1x1 conv with /2). Then average pool and fc 1000.]

Revolution of Depth

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”, 2015

Revolution of Depth vs Classification Accuracy

ImageNet Classification top-5 error (%)

Challenge (model)        Top-5 error (%)   Depth
ILSVRC'10                28.2              shallow
ILSVRC'11                25.8              shallow
ILSVRC'12 (AlexNet)      16.4              8 layers
ILSVRC'13                11.7
ILSVRC'14 (VGG)          7.3               19 layers
ILSVRC'14 (GoogLeNet)    6.7               22 layers
ILSVRC'15 (ResNet)       3.6               152 layers
ILSVRC'16 (Ensemble)     2.9

ILSVRC'16 entry: ensemble of ResNet, Inception-ResNet, Inception and Wide Residual Network

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”, 2015

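Accuracy vs Operations Per Image Inference

[Figure: accuracy plotted against operations per image inference for several networks; marker size is proportional to the number of parameters. Annotations: 552 MB, 240 MB, and "What we want".]

Alfredo Canziani, Adam Paszke, Eugenio Culurciello, “An Analysis of Deep Neural Network Models for Practical Applications”, 2016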

Your Budget - Smartphone Floating Point Operations Per Second (2015)

http://pages.experts-exchange.com/processing-power-compared/

iPhone X is more powerful than a Macbook Pro

https://thenextweb.com/apple/2017/09/12/apples-new-iphone-x-already-destroying-android-devices-g/

Strategies to get maximum efficiency from your CNN

Before training

• Pick an efficient architecture for your task

• Design efficient layers

After training

• Pruning

• Quantization

• Network binarization

CoreML Benchmark - Pick a DNN for your mobile architecture

                                                     Execution time (ms)
Model          Top-1          Size    Million      iPhone 5S   iPhone 6   iPhone 6S/SE   iPhone 7   iPhone 8/X
               accuracy (%)   (MB)    Multi-Adds   (2013)      (2014)     (2015)         (2016)     (2017)
VGG 16         71             553     15300        7408        4556       235            181        146
Inception v3   78             95      5000         727         637        114            90         78
Resnet 50      75             103     3900         538         557        77             74         71
MobileNet      71             17      569          129         109        44             35         33
SqueezeNet     57             5       800          75          78         36             30         29

Huge improvement in GPU hardware in 2015

MobileNet family

Splits the convolution into a 3x3 depthwise conv and a 1x1 pointwise conv

Tune with two parameters – Width Multiplier and resolution multiplier

Andrew G. Howard et al, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”, 2017
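
To make the saving concrete, here is a minimal sketch that counts multiply-adds for a standard convolution and for the depthwise + pointwise split, following the cost formulas in the MobileNets paper. The function names and the example layer (3x3 kernel, 512 to 512 channels, 14x14 feature map) are illustrative assumptions, not values from the slides.

```python
def standard_conv_mult_adds(k, c_in, c_out, feat):
    """Multiply-adds of a k x k standard convolution on a feat x feat map."""
    return k * k * c_in * c_out * feat * feat


def separable_conv_mult_adds(k, c_in, c_out, feat, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise (k x k) plus pointwise (1 x 1) convolution.

    alpha is the width multiplier (thins the channels) and rho is the
    resolution multiplier (shrinks the feature map).
    """
    c_in, c_out = int(alpha * c_in), int(alpha * c_out)
    feat = int(rho * feat)
    depthwise = k * k * c_in * feat * feat
    pointwise = c_in * c_out * feat * feat
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
standard = standard_conv_mult_adds(3, 512, 512, 14)
separable = separable_conv_mult_adds(3, 512, 512, 14)
thinner = separable_conv_mult_adds(3, 512, 512, 14, alpha=0.5)

print(standard / separable)  # roughly 8.8x fewer multiply-adds
print(separable / thinner)   # halving the width cuts the cost roughly 4x
```

The pointwise 1x1 convolution dominates the separable cost, which is why the width multiplier reduces compute almost quadratically.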

Efficient Classification Architectures

https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html

MobileNetV2 is the current favorite

Efficient Detection Architectures

Jonathan Huang et al, "Speed/accuracy trade-offs for modern convolutional object detectors”, 2017

Efficient Segmentation Architectures

ICNet - Image cascade network

Tricks while designing your own network

• Dilated convolutions
  • Great for segmentation, or when the target object occupies a large area of the image

• Replace NxN convolutions with Nx1 followed by 1xN

• Depthwise separable convolutions (e.g. MobileNet)

• Inverted residual block (e.g. MobileNetV2)

• Replace large filters with multiple small filters
  • A 5x5 convolution is slower than a 3x3 followed by another 3x3
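
Two of these tricks reduce to simple arithmetic. The sketch below counts weights for an NxN convolution against its Nx1 + 1xN factorization, and computes the window a dilated kernel spans. The helper names and the channel count of 64 are assumptions chosen for illustration.

```python
def conv_params(kh, kw, c_in, c_out):
    """Weights in a kh x kw convolution (bias ignored)."""
    return kh * kw * c_in * c_out


def effective_kernel(k, dilation):
    """Span covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


c = 64  # hypothetical channel count, same for input and output

# N x N versus N x 1 followed by 1 x N (here N = 7).
full = conv_params(7, 7, c, c)
factored = conv_params(7, 1, c, c) + conv_params(1, 7, c, c)
print(full, factored)          # 200704 57344

# A 3x3 kernel with dilation 2 spans a 5x5 window using the same 9 taps.
print(effective_kernel(3, 2))  # 5
```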

Design considerations for custom architectures – Small Filters

Three layers of 3x3 convolutions >> one layer of 7x7 convolution

Replace large 5x5 and 7x7 convolutions with stacks of 3x3 convolutions

Replace NxN convolutions with a stack of 1xN and Nx1

Fewer parameters ☺

Less compute ☺

More non-linearity ☺

Better

Faster

Stronger

Andrej Karpathy, CS-231n Notes, Lecture 11
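
The comparison between three 3x3 layers and one 7x7 layer can be checked with a few lines of arithmetic. This is a minimal sketch assuming stride-1 convolutions with the same number of input and output channels; the channel count of 256 is a made-up example.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


def stacked_params(kernel, layers, channels):
    """Weights in `layers` stacked kernel x kernel convolutions (bias ignored)."""
    return layers * kernel * kernel * channels * channels


c = 256  # hypothetical channel count
print(stacked_receptive_field(3, 3), stacked_receptive_field(7, 1))  # 7 7
print(stacked_params(3, 3, c), stacked_params(7, 1, c))  # 1769472 3211264
```

Both options see the same 7x7 window, but the stack uses 27·C² weights instead of 49·C² and applies three non-linearities instead of one.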

Selective training to keep networks shallow

Idea : Limit data augmentation to the conditions your network will actually see

Example : For a selfie app, there is no benefit in rotating training images beyond ±45 degrees. The phone rotates the image anyway.

Approach followed by WordLens / Google Translate

Example : Add blur if analyzing frames from a mobile phone camera
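
A rough sketch of what use-case-limited augmentation could look like, in plain Python on a toy grayscale image stored as a list of rows. The function names, the 3x3 box blur, and the ±45 degree default are illustrative assumptions; a real pipeline would use an image library.

```python
import random


def sample_rotation(max_degrees=45.0, rng=random):
    """Pick a rotation angle only within the range the app will see."""
    return rng.uniform(-max_degrees, max_degrees)


def box_blur(image):
    """3x3 box blur of a grayscale image given as a list of rows.

    Border pixels average over the neighbours that exist.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) / len(window)
    return out


frame = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]  # toy 3x3 "camera frame"
blurred = box_blur(frame)   # centre pixel becomes 1.0, corners 2.25
angle = sample_rotation()   # always within [-45, 45]
```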

Pruning

Aim : Remove all connections with absolute weights below a threshold

Song Han, Jeff Pool, John Tran, William J. Dally, "Learning both Weights and Connections for Efficient Neural Networks", 2015
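
A minimal sketch of threshold-based magnitude pruning on a toy weight matrix. The function names and the threshold of 0.1 are assumptions for illustration; in the cited paper, pruning is followed by retraining the remaining connections, which this sketch does not do.

```python
def prune_by_magnitude(weights, threshold):
    """Zero every weight whose absolute value is below `threshold`."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)


layer = [[0.8, -0.02, 0.05], [-0.6, 0.01, 0.3]]  # toy weight matrix
pruned = prune_by_magnitude(layer, threshold=0.1)
print(pruned)            # [[0.8, 0.0, 0.0], [-0.6, 0.0, 0.3]]
print(sparsity(pruned))  # 0.5
```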

Observation : Most parameters are in the fully connected layers

AlexNet (240 MB): 96% of all parameters

VGG-16 (552 MB): 90% of all parameters
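
The VGG-16 share can be sanity-checked by counting parameters from the published layer shapes. This is a sketch under stated assumptions: the helper names are illustrative, and the layer shapes (13 convolutional layers with 3x3 kernels, 3 fully connected layers, 224x224 input) come from the VGG paper rather than from the slides.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# VGG-16 layer shapes: (input channels, output channels) per conv layer.
conv_channels = [(3, 64), (64, 64), (64, 128), (128, 128),
                 (128, 256), (256, 256), (256, 256),
                 (256, 512), (512, 512), (512, 512),
                 (512, 512), (512, 512), (512, 512)]
fc_shapes = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv_params(i, o) for i, o in conv_channels)
fc_total = sum(fc_params(i, o) for i, o in fc_shapes)
total = conv_total + fc_total

print(total)             # 138357544 parameters
print(fc_total / total)  # about 0.89
print(total * 4 / 1e6)   # about 553 MB at 4 bytes per float32 weight
```

The fully connected share of roughly 89% is in line with the rounded 90% above, and the float32 size is in line with the 552 MB figure.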

Pruning gives the quickest model compression without accuracy loss

Models shown: AlexNet (240 MB), VGG-16 (552 MB)

The first layer, which interacts directly with the image, is sensitive and cannot be pruned much without hurting accuracy

Prune in Keras (Before)

import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1].
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Baseline (unpruned) classifier.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Prune in Keras (After)

import tensorflow as tf

# `prune` is the pruning module as named on the slide; its import is not shown there.
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Same classifier, with each Dense layer wrapped for pruning.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    prune.Prune(tf.keras.layers.Dense(512, activation=tf.nn.relu)),
    tf.keras.layers.Dropout(0.2),
    prune.Prune(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Weight Sharing

Idea : Cluster weights with similar values together, and store in a dictionary.

Codebook

Huffman coding

HashedNets

Cons: Need a special inference engine, doesn’t work for most applications
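
The codebook idea can be sketched with a tiny one-dimensional k-means: each weight is replaced by the index of its nearest centroid, and only the centroids plus the indices are stored. This is an illustrative sketch, not the procedure of any specific paper; the function names, the evenly spaced initialization, and the toy weights are assumptions, and `k` must be at least 2.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar weights into k centroids with plain Lloyd iterations."""
    lo, hi = min(values), max(values)
    # Start from centroids spread evenly over the weight range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            buckets[nearest].append(v)
        # Keep the old centroid if a bucket ends up empty.
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids


def encode(values, codebook):
    """Replace each weight with the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda c: abs(v - codebook[c]))
            for v in values]


weights = [-1.02, -0.98, 0.01, -0.01, 0.99, 1.03, 1.0, -1.0]
codebook = kmeans_1d(weights, k=3)
indices = encode(weights, codebook)          # [0, 0, 1, 1, 2, 2, 2, 0]
restored = [codebook[i] for i in indices]    # shared values used at inference
```

With three codebook entries, each weight needs a 2-bit index instead of a 32-bit float, at the cost of a lookup at inference time.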

Filter Pruning - ThiNet

Idea : Discard whole filter if not important to predictions

Advantage:

• No change in architecture, other than thinning of filters per layer

• Can be further compressed with other methods

As in feature selection, choose which filters to discard. Possible greedy criteria:

• Absolute weight sum of entire filter closest to 0

• Average percentage of ‘Zeros’ as outputs

• ThiNet – Collect statistics on the output of the next layer
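
A minimal sketch of the first greedy criterion (discard the filters whose absolute weight sum is closest to 0). It is not the ThiNet method, which relies on next-layer statistics. The function names and toy filters are assumptions for illustration.

```python
def filter_l1_norms(filters):
    """Sum of absolute weights for each filter (a filter is a flat list here)."""
    return [sum(abs(w) for w in f) for f in filters]


def prune_filters(filters, keep):
    """Keep the `keep` filters with the largest L1 norm, preserving order."""
    norms = filter_l1_norms(filters)
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [filters[i] for i in kept]


# Four toy filters, each flattened to four weights.
layer = [[0.9, -0.8, 0.7, 0.6],
         [0.01, -0.02, 0.0, 0.01],
         [0.5, 0.4, -0.3, 0.2],
         [0.03, 0.0, -0.01, 0.02]]
kept_indices, thinner_layer = prune_filters(layer, keep=2)
print(kept_indices)  # [0, 2]
```

When a filter is dropped, the matching input channel of the next layer has to be removed as well, which is why the kept indices are returned.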

Quantization

Reduce precision from 32 bits to 16 bits or fewer

Use stochastic rounding for best results

In Practice:

• Ristretto + Caffe
  • Automatic network quantization
  • Finds balance between compression rate and accuracy

• Apple Metal Performance Shaders automatically quantize to 16 bits

• TensorFlow has 8-bit quantization support
  • Gemmlowp – low-precision matrix multiplication library
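
A minimal sketch of linear quantization with stochastic rounding, in plain Python. It is not how any of the tools above implement quantization; the function names, the fixed seed, and the toy weights are assumptions, and the weights must not all be equal (the scale would be zero).

```python
import math
import random


def stochastic_round(x, rng):
    """Round down or up with probability given by the fractional part."""
    floor = math.floor(x)
    return floor + (1 if rng.random() < x - floor else 0)


def quantize_linear(weights, bits=8, rng=None):
    """Map floats to integers in [0, 2**bits - 1] with a linear scale."""
    rng = rng or random.Random(0)
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (2 ** bits - 1)
    codes = [min(2 ** bits - 1, stochastic_round((w - lo) / scale, rng))
             for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-0.51, -0.1, 0.0, 0.27, 0.49]
codes, scale, lo = quantize_linear(weights, bits=8)
restored = dequantize(codes, scale, lo)  # each value within one step of the original
```

Stochastic rounding is unbiased on average: a value 30% of the way between two levels rounds up about 30% of the time, so rounding errors tend to cancel rather than accumulate.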

Quantizing CNNs in Practice

Reducing CoreML models to half size

import coremltools

# Load a model, lower its precision, and then save the smaller model.
model_spec = coremltools.utils.load_spec('model.mlmodel')
model_fp16_spec = coremltools.utils.convert_neural_network_spec_weights_to_fp16(model_spec)
coremltools.utils.save_spec(model_fp16_spec, 'modelFP16.mlmodel')

Quantizing CNNs in Practice

Reducing CoreML models to even smaller size

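Choose bits and quantization mode

Bits from [1, 2, 4, 8]

Quantization mode from ["linear", "linear_lut", "kmeans_lut", "custom_lut"]

• LUT = lookup table

import coremltools
from coremltools.models.neural_network.quantization_utils import quantize_weights, compare_models

# Load the full-precision model, quantize its weights to 8 bits, and save the result.
model = coremltools.models.MLModel('model.mlmodel')
quantized_model = quantize_weights(model, 8, 'linear')
quantized_model.save('quantizedModel.mlmodel')

# Compare predictions of both models on a folder of sample inputs.
compare_models(model, quantized_model, './sample_data/')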

Binary weighted Networks

Idea : Reduce the weights to -1, +1

Speedup : Convolution operation can be approximated by only summation and subtraction

Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi, “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”
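
A minimal sketch of the binary-weight approximation on a single dot product: the weights are replaced by their signs times one scaling factor (the mean absolute weight, as in the cited paper), so the inner loop needs only additions and subtractions. The function names and toy values are illustrative assumptions.

```python
def binarize(weights):
    """Approximate weights as alpha * signs, with alpha = mean |w|."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def binary_dot(inputs, alpha, signs):
    """Dot product using only additions and subtractions, then one scale."""
    total = 0.0
    for x, s in zip(inputs, signs):
        total = total + x if s > 0 else total - x
    return alpha * total


weights = [0.5, -0.4, 0.6, 0.5]
inputs = [1.0, 2.0, 3.0, 1.0]
alpha, signs = binarize(weights)
exact = sum(x * w for x, w in zip(inputs, weights))  # full-precision result, about 2.0
approx = binary_dot(inputs, alpha, signs)            # binary-weight result, about 1.5
```

The gap between the two results is the approximation error that training with binary weights has to absorb.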

XNOR-Net

Idea : Reduce both weights and inputs to -1, +1

Speedup : Convolution operation can be approximated by XNOR and Bitcount operations

Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi, “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”
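
The XNOR and bitcount trick can be shown on two small ±1 vectors. This sketch packs each vector into an integer (bit set for +1) and recovers the dot product from the number of matching bits. It leaves out the scaling factors that XNOR-Net applies on top; the function names and vectors are illustrative assumptions.

```python
def pack(signs):
    """Pack a list of +1/-1 values into an integer, one bit per value (+1 -> 1)."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two +1/-1 vectors of length n from their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 where the signs match
    matches = bin(agree).count("1")    # bitcount
    return 2 * matches - n             # matches minus mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))  # ordinary dot product: 2
fast = xnor_dot(pack(a), pack(b), len(a))     # same value from XNOR + bitcount
```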

XNOR-Net on Mobile

Battery free, solar powered AI Device from XNOR.AI

Challenges

Off-the-shelf CNNs are not robust for video

Solutions:

• Collective confidence over several frames

• CortexNet
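
One simple reading of "collective confidence over several frames" is a moving average of the per-frame class probabilities. The sketch below assumes that interpretation; the class name, window size, and probabilities are made up for illustration.

```python
from collections import deque


class FrameSmoother:
    """Average class probabilities over the last `window` frames."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, probabilities):
        """Add one frame's probabilities and return (best class, its average)."""
        self.history.append(probabilities)
        n = len(self.history)
        averaged = [sum(frame[c] for frame in self.history) / n
                    for c in range(len(probabilities))]
        best = max(range(len(averaged)), key=lambda c: averaged[c])
        return best, averaged[best]


smoother = FrameSmoother(window=3)
frames = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]  # hypothetical per-frame outputs
labels = [smoother.update(p)[0] for p in frames]
print(labels)  # [0, 0, 0]: the single-frame flip to class 1 is smoothed out
```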

Build a DL app and get $10 million in funding

(or a PhD)

Competitions to follow

Winners = High accuracy + Low energy consumption

* LPIRC - Low-Power Image Recognition Challenge

* EDLDC - Embedded deep learning design contest

* System Design Contest at Design Automation Conference (DAC)

AutoML – Let AI design an efficient AI architecture

MnasNet: Platform-Aware Neural Architecture Search for Mobile

• An automated neural architecture search approach for designing mobile models using reinforcement learning

• Incorporates latency information into the reward objective function

• Measures real-world inference latency by executing on a particular platform

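[Figure: MnasNet search loop. The controller samples models from the search space; the trainer reports accuracy and mobile phones report latency; both feed a multi-objective reward that is returned to the controller.]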
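
A sketch of how latency can enter the reward, using the form described in the MnasNet paper: accuracy multiplied by (latency / target) raised to a negative exponent. The function name is illustrative, and the exponent of -0.07 and the example numbers are assumptions for this sketch, not values from the slides.

```python
def mnas_reward(accuracy, latency_ms, target_ms, w=-0.07):
    """Multi-objective reward: accuracy scaled by a soft latency penalty."""
    return accuracy * (latency_ms / target_ms) ** w


on_target = mnas_reward(0.75, latency_ms=80.0, target_ms=80.0)
too_slow = mnas_reward(0.76, latency_ms=160.0, target_ms=80.0)
print(on_target)  # 0.75
print(too_slow)   # about 0.724
```

In this example, one extra point of accuracy does not pay for doubling the latency, so the controller is steered toward the faster model.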

AutoML – Let AI design an efficient AI architecture

For the same accuracy:

• 1.5x faster than MobileNetV2

• ResNet-50 accuracy with 19x fewer parameters

• SSD300 mAP with 35x fewer FLOPs

Mr. Data Scientist PhD

One Last Question

How to access the slides in 1 second

http://bit.ly/ml-slides

@anirudhkoul
