infuse computer vision · 2019-09-12 · infuse computer vision into your apps balancing...

INFUSE COMPUTER VISION INTO YOUR APPS

BALANCING TECHNOLOGY, SKILLS AND INVESTMENT

SOFTWARE ARCHITECTURE CONFERENCE 2019
JAKARTA, 2-3 AUGUST 2019

Hello World!

My name is: Gito

Interests
• Core Banking Operations & Optimization
• Card Payment & EMV Standards
• UNIX System Programming
• Performance Engineering
• Deep Learning, Computer Vision

Favorite Tools
gcc, g++, python, dbx, gdb, valgrind, gprof, purify, make, tensorflow, darknet, vim, powerpoint

http://www.github.com/ngito
http://www.slideshare.net/ngito/

COMPUTER VISION

[Figure: image divided into a 3 x 4 grid of tiles numbered 1 to 12]

[Figure: crowd photo from the Jakarta Marathon 2018]
Source: https://www.runsociety.com/event/6th-edition-jakarta-marathon-2018/

Estimated: 68 PEOPLE

Computer Vision in Public Sector

Computer Vision

Making computers gain a high-level understanding of digital images or videos.

It tries to achieve what the human visual system can do.

Computer Vision Subsystems

We will discuss Image Classification & Object Detection further.

• Image enhancement
• Transformations
• Filtering
• Visual recognition
• Pose estimation
• Color vision
• Registration
• Feature extraction
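As an illustration of the Filtering subsystem listed above, here is a minimal sketch of sliding a small kernel over a grayscale image. The function name `filter2d`, the toy image and the choice of a Sobel-style kernel are assumptions made for this example; they do not come from the slides. As in most deep learning frameworks, the kernel is applied without flipping (cross-correlation).

```python
from typing import List

Image = List[List[float]]


def filter2d(image: Image, kernel: Image) -> Image:
    """Slide `kernel` over `image` (no padding) and return the filtered image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0.0
            # Weighted sum of the pixels currently covered by the kernel.
            for i in range(kh):
                for j in range(kw):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Sobel-style kernel that responds to left-to-right brightness changes.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Toy 4x4 grayscale image: dark left half, bright right half.
toy_image = [[0, 0, 10, 10],
             [0, 0, 10, 10],
             [0, 0, 10, 10],
             [0, 0, 10, 10]]

edges = filter2d(toy_image, SOBEL_X)
print(edges)
```

A flat image (all pixels equal) gives zero everywhere with this kernel, while the dark-to-bright boundary in the toy image gives a strong positive response.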

TERMINOLOGIES

AI vs Machine Learning vs Deep Learning

If it is written in Python, it’s probably machine learning.
If it is written in PowerPoint, it’s probably AI.

https://twitter.com/matvelloso/status/1065778379612282885?lang=en

Definitions

Machine Learning
1. Technique for realizing AI
2. Enables machines to learn from the provided data and make accurate predictions
3. Implemented in multiple algorithms to solve different problems

Deep Learning
1. Subset and next evolution of (supervised) machine learning
2. Inspired by the pattern processing found in the human brain
3. Implemented in neural networks to solve different problems
4. Has triggered new chip designs to handle deep learning workloads

AI
1. Science and engineering of making intelligent machines and computer programs
2. There are many definitions of AI today; as a philosophy, AI is defined as a future vision that remains unattainable, so that it keeps driving continuous improvement across multiple disciplines

https://plato.stanford.edu/entries/artificial-intelligence/
http://jmc.stanford.edu/artificial-intelligence/what-is-ai/index.html

AI Evolution

https://www.linkedin.com/pulse/ai-machine-learning-evolution-differences-connections-kapil-tandon

AI General Categories (non-exhaustive list)

• Machine Learning: Supervised Learning, Unsupervised Learning, Reinforcement Learning
• Deep Learning: Convolutional Neural Network, Generative Adversarial Network, Recurrent Neural Network, …
• Speech: Speech to Text, Text to Speech, …
• Vision: Classification, Object Detection, Instance Segmentation, Face Recognition, …
• Language: Classification, Extraction, Understanding, Translation, …
• Robotics: Mechanical, Electrical, Control, Kinematics, Motion
• Expert Systems: Knowledge Base, Inference Engine, Reasoning

Machine Learning & Deep Learning Characteristics (non-exhaustive list)

| Characteristic | Machine Learning | Deep Learning |
|---|---|---|
| Data Source | Tabular | Unstructured & complex structure (image, video, speech, text) |
| Number of Features | Dozens - Hundreds | Millions |
| Feature Engineering | By domain expert | Automated with feature extraction |
| Hardware | General purpose CPU | Custom chip (GPU, FPGA, ASIC) |
| Training Time | Minutes - Hours | Hours - Weeks |
| Technique | Multiple ML algorithms | Neural network architecture |

https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063

Major Shift to Deep Learning (non-exhaustive list)

| Domain | Machine Learning | Deep Learning |
|---|---|---|
| Vision | 1. Viola Jones 2. Scale Invariant Feature Transform (SIFT) 3. Histogram of Oriented Gradients (HOG) 4. … | 1. Region Proposals (R-CNN & Fast R-CNN) 2. Single Shot Multibox Detector (SSD) 3. You Only Look Once (YOLO) 4. … |
| Speech | 1. Speech Synthesis TTS 2. Hidden Markov Model based STT 3. … | 1. WaveNet TTS 2. DeepSpeech STT 3. … |
| Translate | 1. Statistical Machine Translation (SMT) 2. … | 1. Neural Machine Translation (NMT) 2. … |

Hardware advancements, in particular the use of GPUs as deep learning accelerators since 2012, have triggered and accelerated the conversion of some applications from machine learning to deep learning.

https://medium.com/finc-engineering/cnn-do-we-need-to-go-deeper-afe1041e263e

Deep Learning Influencers (non-exhaustive list)

• Yann LeCun, Chief Scientist, Facebook: LeNet, first neural network to
• Andrew Ng, Stanford, Baidu, Google: brain behind Google Brain
• Fei-Fei Li, Stanford, Google: ImageNet database
• Alex Krizhevsky, Google: AlexNet, a neural network trained on GPUs
• Joseph Redmon: YOLO real-time object detection

Deep Learning Revolution

2012, AlexNet: an early Convolutional Neural Network (CNN) implementation utilizing GPUs, which won the ImageNet object classification competition.

ImageNet (ILSVRC) Competition Winners (Object Classification)

| Winner (year) | Error rate | Number of layers |
|---|---|---|
| XRCE (2011) | 25.81% | 0 |
| AlexNet (2012) | 15.30% | 8 |
| ZFNet (2013) | 14.80% | 8 |
| GoogleNet (2014) | 6.70% | 22 |
| ResNet (2015) | 3.60% | 152 |

Deeper neural network layers, lower error rates:
1. More layers
2. More training data
3. More computing power
4. More accuracy
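To make the trend in the chart figures easier to read, the sketch below computes how much each winner reduced the error rate relative to the previous year's winner. The error rates are the ones shown on the slide; the helper name `relative_reduction` is an assumption made for this example.

```python
# (winner, year, top-5 error rate in percent) as shown on the slide.
results = [
    ("XRCE", 2011, 25.81),
    ("AlexNet", 2012, 15.30),
    ("ZFNet", 2013, 14.80),
    ("GoogleNet", 2014, 6.70),
    ("ResNet", 2015, 3.60),
]


def relative_reduction(previous: float, current: float) -> float:
    """Percentage by which `current` is lower than `previous`."""
    return (previous - current) / previous * 100.0


reductions = {}
# Pair each winner with the winner of the year before.
for (_, _, prev_err), (name, year, err) in zip(results, results[1:]):
    reductions[name] = relative_reduction(prev_err, err)
    print(f"{name} ({year}): {reductions[name]:.1f}% lower error than the year before")
```

Read this way, the 2012 and 2014 winners each removed a large share of the previous year's error, while the 2013 step was comparatively small.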

Deep Learning Triggers New Chip Design

Why CPU IS NOT ENOUGH

CPUs are designed for a wide variety of applications and to provide fast response times for a single task.

GPUs, by contrast, are built specifically for rendering and other graphics applications that have a large degree of data parallelism.

| | GPU, Nvidia Tesla V100 | CPU, Intel Xeon Gold |
|---|---|---|
| Cores | 5120 (FP32), 640 (Tensor Core) | 22 |
| Transistor size | 12 nm | 14 nm |
| RAM type | HBM2 | DDR4 |
| Strength | Thousands of simpler cores for parallel processing | Few complex cores for general processing |

INFUSING COMPUTER VISION INTO YOUR APPS

Computer Vision runs on top of Deep Learning Stack

Software stack illustration when adding deep learning capability to any general software, on premise & IaaS (non-exhaustive list)

| Layer | General Software Stack | Deep Learning Stack |
|---|---|---|
| Applications | Web Application | Computer Vision, Speech, Language |
| Runtime | Java, .NET, Python, PHP, NodeJS | Keras, TensorFlow, Caffe, PyTorch, CNTK + Model |
| Middleware | Application Server | Accelerator Library (NVIDIA CUDA, Intel OpenVINO) |
| OS | Linux / Windows | Accelerator Driver (NVIDIA Driver / Intel FPGA Driver) |
| Virtualization | Hypervisor / Container | Accelerator Virtualization (NVIDIA/AMD Virtual GPU, Intel Virtual FPGA) |
| Infrastructure | CPU, RAM, Storage, Network | Accelerator (NVIDIA/AMD GPU, Intel FPGA) |
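A quick way to see which parts of the runtime layer of the stack are present on a machine is to check whether the frameworks can be located by Python. This is a minimal sketch: the import names in `FRAMEWORK_MODULES` are assumptions about how each framework is usually installed, and the function name `detect_frameworks` is made up for this example.

```python
import importlib.util
from typing import Dict

# Assumed import names for the frameworks named on the slide.
FRAMEWORK_MODULES = {
    "Keras": "keras",
    "TensorFlow": "tensorflow",
    "Caffe": "caffe",
    "PyTorch": "torch",
    "CNTK": "cntk",
}


def detect_frameworks(modules: Dict[str, str] = FRAMEWORK_MODULES) -> Dict[str, bool]:
    """Report which frameworks can be located, without importing them."""
    found = {}
    for name, module in modules.items():
        try:
            # find_spec only locates a top-level module; it does not import it.
            found[name] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            found[name] = False
    return found


for framework, available in detect_frameworks().items():
    print(f"{framework}: {'installed' if available else 'not found'}")
```

This only covers the runtime layer; driver and accelerator library checks depend on the vendor tooling and are left out here.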

Computer Vision High Level Integration (highly simplified)

Enterprise Application (Your Applications)
• Owner: Development Team
• Process: Software Development Lifecycle (Waterfall, Agile, other)
• Type: Web App, Mobile App, other

Computer Vision System
• Owner: Data Science Team
• Process: Machine Learning Lifecycle (OSEMN, CRISP-DM, TDSP)
• Components: Training & Test Dataset, Training Engine, Trained Model, Inference Engine

Integration: Image / Video is passed between your applications and the Computer Vision System through an API.
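The API boundary between the enterprise application and the computer vision system can be sketched from the application side as follows. The JSON field names (`image`, `min_confidence`, `predictions`, `label`, `confidence`) and both function names are hypothetical; a real inference engine defines its own request and response format, and the network call itself is left out.

```python
import base64
import json
from typing import List, Tuple


def build_request_body(image_bytes: bytes, min_confidence: float = 0.5) -> str:
    """Encode an image as a JSON request body (hypothetical schema)."""
    payload = {
        # Binary image data is base64-encoded so it can travel inside JSON.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "min_confidence": min_confidence,
    }
    return json.dumps(payload)


def parse_predictions(response_body: str) -> List[Tuple[str, float]]:
    """Extract (label, confidence) pairs from a JSON response (hypothetical schema)."""
    data = json.loads(response_body)
    return [(p["label"], p["confidence"]) for p in data.get("predictions", [])]


# Example response, written by hand for illustration only.
sample_response = (
    '{"predictions": [{"label": "person", "confidence": 0.91},'
    ' {"label": "dog", "confidence": 0.42}]}'
)
print(parse_predictions(sample_response))
```

Keeping the request building and response parsing in small functions like these lets the development team test the integration without access to the data science team's inference engine.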

Computer Vision Life Cycle

Training at Server (Deep Learning Framework)
• Inputs: Neural Network Model, Initial Weight, Training Parameters, Training & Test Data
• Output: Trained Model

Neural Network Model
• Image Classification: GoogleNet, ResNet
• Object Detection: R-CNN based, SSD, YOLO
• Object Segmentation: Mask R-CNN

Training Parameters
• Learning Rate
• Epoch
• Batch size
• Intersection over Union
• Average Loss
• Mean Average Precision
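Two of the items above lend themselves to a short worked example: Intersection over Union (IoU), which measures how well a predicted box overlaps a labelled box, and the relationship between epoch and batch size. This is a minimal sketch; the box convention `(x_min, y_min, x_max, y_max)` and the function names are assumptions made for this example.

```python
import math
from typing import Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def intersection_over_union(box_a: Box, box_b: Box) -> float:
    """Area of overlap divided by area of union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def iterations_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of batches needed to pass over the whole training set once."""
    return math.ceil(num_images / batch_size)


print(intersection_over_union((0, 0, 10, 10), (5, 5, 15, 15)))
print(iterations_per_epoch(1000, 32))
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not overlap; detection benchmarks typically count a prediction as correct only above a chosen IoU threshold.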

Training & Test Data
• Public Data Set: COCO, ImageNet, Pascal VOC
• Research Dataset
• Custom Data Set: Image Collection, Privately Labelled
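Whichever data set is used, it has to be divided into a training part and a test part before training. The sketch below shows one simple way to do that reproducibly; the 80/20 ratio, the seed, the file names and the function name `split_dataset` are all assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], test_fraction: float = 0.2,
                  seed: int = 42) -> Tuple[List[str], List[str]]:
    """Shuffle the items with a fixed seed and split them into (train, test)."""
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for a labelled image collection.
image_files = [f"image_{i:03d}.jpg" for i in range(10)]
train_files, test_files = split_dataset(image_files)
print(len(train_files), "training images,", len(test_files), "test images")
```

Fixing the seed means the same images end up in the test set on every run, which keeps accuracy figures comparable between training runs.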

Inference at Server / Edge
• Inputs: Neural Network Model, Trained Model, Inference Parameter, Input Image / Video
• Output: Predictions
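A common inference parameter is a confidence threshold: predictions below it are discarded before the application uses them, for example to count people in a frame. This is a minimal sketch; the prediction format (dictionaries with `label` and `confidence`), the sample values and the function name `count_objects` are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def count_objects(predictions: List[Prediction],
                  min_confidence: float = 0.5) -> Counter:
    """Count predicted objects per label, ignoring low-confidence predictions."""
    kept = [p for p in predictions if p["confidence"] >= min_confidence]
    return Counter(p["label"] for p in kept)


# Hand-written predictions for illustration only.
sample_predictions = [
    {"label": "person", "confidence": 0.93},
    {"label": "person", "confidence": 0.71},
    {"label": "person", "confidence": 0.30},
    {"label": "bicycle", "confidence": 0.88},
]

counts = count_objects(sample_predictions, min_confidence=0.5)
print(f"Estimated: {counts['person']} people")
```

Raising the threshold trades missed objects for fewer false detections, so the value is usually tuned per use case.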

BALANCING TECHNOLOGY, SKILLS AND INVESTMENT

Diagram labels: Technology, Business Value, Investment, Skill

1. Understand the business impact in terms of economic value (example: cost saving, revenue increase)

2. Business investment must consider the key capabilities the company plans to invest in, such as IT infrastructure, IT skills and time to market

Identify Business Impact (highly simplified)

| Business Impact (example) | Economic Value (example) |
|---|---|
| Increase Customer Loyalty | Higher repeat order by 15% |
| Improve Customer Experience | Longer engagement |
| Improve Operational Effectiveness | Reduce operational cost by 17% |
| Drive Sales | Sales Increase 25% YoY |

Building Computer Vision Capability Considerations

Model & Dataset
• Custom
  1. Collect / create image data sets
  2. Create a custom model based on the custom data sets
• Existing
  1. Use existing image data sets from the community / cloud provider
  2. Use an existing model from the community / cloud provider

Skill
• Data Science & Developer Team
  1. Build a full data science team in house to support current or future data science capability
  2. The data science and developer teams work together to integrate deep learning software with enterprise applications
• Developer Team Only
  1. No plan to build a data science team, since current or future requirements don’t justify having one
  2. Train the developer team to use deep learning software up to API integration, or use deep learning software that has been pre-configured for users without prior data science knowledge

Infrastructure
• Buy
  1. Buy infrastructure for on-premise deployment
  2. Maintain the hardware infrastructure & deep learning software stack
• Rent
  1. Rent deep learning infrastructure from a cloud provider
  2. Choose between deep learning IaaS (full control over the deep learning infrastructure) or PaaS (partial control, up to model & datasets)
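The Buy versus Rent decision above can be framed as a simple break-even calculation. This is a rough sketch only: the cost figures are hypothetical placeholders, the function name `break_even_months` is made up for this example, and real comparisons would also account for depreciation, utilization and staffing.

```python
import math
from typing import Optional


def break_even_months(purchase_cost: float, monthly_upkeep: float,
                      monthly_rent: float) -> Optional[int]:
    """Months after which buying becomes cheaper than renting.

    Returns None when renting never costs more per month than the
    upkeep of owned hardware, so buying never pays off.
    """
    monthly_saving = monthly_rent - monthly_upkeep
    if monthly_saving <= 0:
        return None
    return math.ceil(purchase_cost / monthly_saving)


# Hypothetical figures, in any single currency.
months = break_even_months(purchase_cost=12000, monthly_upkeep=200, monthly_rent=1200)
print(f"Buying pays off after about {months} months")
```

If the expected lifetime of the use case is shorter than the break-even period, renting is the cheaper option under these assumptions.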

Computer Vision as a Service
Available options across technology stack, investment, skills and time to market (non-exhaustive list)

| Layer | Skills | Rent IaaS / Buy On Premise | Rent PaaS Training & Inference | Rent PaaS Inference |
|---|---|---|---|---|
| Deep Learning Runtime | Develop custom deep learning code | Keras, TensorFlow, Caffe, PyTorch, CNTK | | |
| Network Model | Design, build or choose the right neural network | R-CNN, SSD, YOLO | R-CNN, SSD, YOLO | R-CNN, SSD, YOLO |
| Training Data | Neural network training parameters | Custom Image Collection | | Provided |
| Label Tools + Training Tools | Image labeling, human or automated | labelImg, OpenLabeling | Web based | N/A |
| API | API integration, REST API | REST API / Stream API | | |

Legend: each layer is either Client Managed or Cloud Provider Managed.

Towards Rent PaaS Inference: lower investment, lower barrier to entry, lower CV skills, limited inference capability, no/less control over training data.

Towards Rent IaaS / Buy On Premise: higher investment cost, higher barrier to entry, higher CV skills, rich inference capability, total control over training data.


Example offerings (non-exhaustive list)
• AWS Rekognition, Google AutoML, Azure Custom Vision, IBM Watson Studio
• AWS Vision, Google Cloud Vision, Azure Vision, IBM Visual Recognition
• Deep Learning PC, Deep Learning Server

Build your own CV Enabled Apps

Find your cool use cases
1. Image Search
2. Video Analytics
3. Pet Counter
4. Rat Trap
5. Parking Detection
6. Match making
7. … and more

Build your dataset
1. Use public dataset
2. Build image collection
3. Label images
4. Augment your images
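The augmentation step can be illustrated with one of the simplest transformations, a horizontal flip, which doubles the number of labelled examples as long as the bounding boxes are flipped together with the pixels. This is a minimal sketch on a nested-list image; the box convention `(x_min, y_min, x_max, y_max)` in pixel-edge coordinates and the function name `flip_horizontal` are assumptions made for this example.

```python
from typing import List, Tuple

Image = List[List[int]]
# Assumed box convention: (x_min, y_min, x_max, y_max) in pixel-edge coordinates.
Box = Tuple[int, int, int, int]


def flip_horizontal(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Mirror an image left to right and move its bounding boxes accordingly."""
    width = len(image[0])
    flipped_image = [list(reversed(row)) for row in image]
    # A box spanning [x_min, x_max) ends up spanning [width - x_max, width - x_min).
    flipped_boxes = [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped_image, flipped_boxes


# Toy 2x4 image with one labelled box covering the leftmost column.
toy_image = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
toy_boxes = [(0, 0, 1, 2)]

augmented_image, augmented_boxes = flip_horizontal(toy_image, toy_boxes)
print(augmented_image, augmented_boxes)
```

After the flip the labelled box covers the rightmost column, matching where the original pixels moved to.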

Train and Integrate

Learn the Basics
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition

DEMO!
OBJECT DETECTION
POSE ESTIMATION

Q&A

THANK YOU
