infuse computer vision · 2019-09-12 · infuse computer vision into your apps balancing...
TRANSCRIPT
INFUSE COMPUTER VISION INTO YOUR APPS
BALANCING TECHNOLOGY, SKILLS AND INVESTMENT
SOFTWARE ARCHITECTURE CONFERENCE 2019
JAKARTA, 2-3 AUGUST 2019
Hello World!
Interests
• Core Banking Operations & Optimization
• Card Payment & EMV Standards
• UNIX System Programming
• Performance Engineering
• Deep Learning, Computer Vision
Favorite Tools
gcc, g++, python, dbx, gdb, valgrind, gprof, purify, make, tensorflow, darknet, vim, powerpoint
http://www.github.com/ngito
http://www.slideshare.net/ngito/
My name is: Gito
https://www.runsociety.com/event/6th-edition-jakarta-marathon-2018/
Computer Vision
Making computers gain a high-level understanding of digital images or videos.
It tries to achieve what the human visual system can do.
Computer Vision Applications
Machine Vision
Activity Recognition
Traffic Monitor & Enforcement
Computer Vision Applications
3D reconstruction from multiple images | 3D selfie | Artificial intelligence for video surveillance | Audio-visual speech recognition |
Augmented reality | Augmented reality-assisted surgery | Automated Lip Reading | Automated optical inspection | Automatic image annotation |
Automatic number-plate recognition | Automatic target recognition | Check weigher | Closed-circuit television | Computer stereo vision |
Content-based image retrieval | Contextual image classification | DARPA LAGR Program | Deepfake | Digital video fingerprinting |
Document mosaicing | Fingerprint recognition | Free viewpoint television | Fyuse | GazoPa | Geometric feature learning | Gesture recognition |
Image collection exploration | Image retrieval | Image-based modeling and rendering | Intelligent character recognition | Iris recognition |
Machine translation of sign languages | Machine vision | Mobile mapping | Morphing | Object Co-segmentation | Object detection |
Optical braille recognition | Optical character recognition | Pedestrian detection | People counter | Physical computing |
Positional tracking | Red light camera | Reverse image search | Scale-invariant feature operator | Smart camera |
This Person Does Not Exist | Traffic enforcement camera | Traffic-sign recognition | Vehicle infrastructure integration | Velocity Moments |
Video content analysis | View synthesis | Applications of virtual reality | Visual sensor network | Visual temporal attention |
Visual Word | Water remote sensing
Face Recognition
Self Driving Car
Computer Vision Subsystems
We will discuss Image Classification & Object Detection further
Image enhancement
Transformations
Filtering
Visual recognition
Pose estimation
Color vision
Registration
Feature extraction
TERMINOLOGIES
AI vs Machine Learning vs Deep Learning
If it is written in Python, it’s probably machine learning
If it is written in PowerPoint, it’s probably AI
https://twitter.com/matvelloso/status/1065778379612282885?lang=en
Definitions
Machine Learning
1. Technique for realizing AI
2. Enables machines to learn from the provided data and make accurate predictions
3. Implemented in multiple algorithms to solve different problems
Deep Learning
1. Subset and next evolution of (Supervised) Machine Learning
2. Inspired by the pattern processing found in the human brain
3. Implemented in Neural Networks to solve different problems
4. Triggers new chip designs to handle Deep Learning workloads
AI
1. Science and engineering of making intelligent machines and computer programs
2. There are many definitions of AI today; as a philosophy, AI is defined as a future vision that stays unattainable, which ensures continuous improvement in multiple disciplines
https://plato.stanford.edu/entries/artificial-intelligence/
http://jmc.stanford.edu/artificial-intelligence/what-is-ai/index.html
AI Evolution
https://www.linkedin.com/pulse/ai-machine-learning-evolution-differences-connections-kapil-tandon
AI General Categories
Machine Learning: Supervised Learning, Unsupervised Learning, Reinforcement Learning
Speech: Speech to Text, Text to Speech, …
Vision: Classification, Object Detection, Instance Segmentation, Face Recognition, …
Language: Classification, Extraction, Understanding, Translation, …
Robotics: Mechanical, Electrical, Control, Kinematics, Motion
Expert Systems: Knowledge Base, Inference Engine, Reasoning
Deep Learning: Convolutional Neural Network, Generative Adversarial Network, Recurrent Neural Network, …
Non-exhaustive list
Machine Learning & Deep Learning Characteristics
https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063
Characteristic | Machine Learning | Deep Learning
Data Source | Tabular | Unstructured & Complex Structure (Image, Video, Speech, Text)
Number of Features | Dozens - Hundreds | Millions
Feature Engineering | By domain expert | Automated with Feature Extraction
Hardware | General purpose CPU | Custom chip (GPU, FPGA, ASIC)
Training Time | Minutes - Hours | Hours - Weeks
Technique | Multiple ML Algorithm | Neural Network Architecture
Non-exhaustive list
Major Shift to Deep Learning
Domain | Machine Learning | Deep Learning
Vision | 1. Viola Jones 2. Scale Invariant Feature Transform (SIFT) 3. Histogram of Oriented Gradients (HOG) 4. … | 1. Region Proposals (R-CNN & Fast R-CNN) 2. Single Shot Multibox Detector (SSD) 3. You Only Look Once (YOLO) 4. …
Speech | 1. Speech Synthesis TTS 2. Hidden Markov Model based STT 3. … | 1. WaveNet TTS 2. DeepSpeech STT 3. …
Translate | 1. Statistical Machine Translation (SMT) 2. … | 1. Neural Machine Translation (NMT) 2. …
Non-exhaustive list
Hardware advancements, in particular the use of GPUs as Deep Learning accelerators since 2012, have triggered and accelerated the conversion of some applications from Machine Learning to Deep Learning.
https://medium.com/finc-engineering/cnn-do-we-need-to-go-deeper-afe1041e263e
Deep Learning Influencers
Non-exhaustive list
Yann LeCun: Chief Scientist, Facebook. LeNet, first neural network to
Andrew Ng: Stanford, Baidu, Google. Brain behind Google Brain
Fei-Fei Li: Stanford, Google. ImageNet Database
Alex Krizhevsky: Google. AlexNet, neural network trained on GPUs
Joseph Redmon: YOLO Real-time Object Detection
Deep Learning Revolution
2012, AlexNet
A Convolutional Neural Network (CNN) implementation that utilized GPUs and won the ImageNet object classification competition.
[Chart] ImageNet (ILSVRC) Competition Winner (Object Classification): Error Rate and Number of Layers
Winner (Year) | Error Rate | Number of Layers
XRCE (2011) | 25.81% | 0
AlexNet (2012) | 15.30% | 8
ZFNet (2013) | 14.80% | 8
GoogleNet (2014) | 6.70% | 22
ResNet (2015) | 3.60% | 152
1. More layers
2. More training data
3. More computing power
4. More accuracy
Neural Network
Deeper Neural Network Layers
Lower Error Rates
Why CPU IS NOT ENOUGH
CPUs are designed for a wide variety of applications and to provide fast response times for a single task.
GPUs, by contrast, are built specifically for rendering and other graphics applications that have a large degree of data parallelism.
Processor | GPU, Nvidia Tesla V100 | CPU, Intel Xeon Gold
Cores | 5120 (FP32), 640 (Tensor Core) | 22
Transistor size | 12 nm | 14 nm
RAM Type | HBM2 | DDR4
Strength | Thousands of simpler cores for parallel processing | Few complex cores for general processing
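To make "a large degree of data parallelism" concrete, here is a minimal sketch of the sliding-window operation at the heart of a convolutional layer, written in plain Python. The function name `convolve2d` and the sample values are illustrative assumptions, not taken from the slides. The point is that every output pixel is computed from its own input window only, so all of them could be computed at the same time on thousands of simple cores.

```python
def convolve2d(image, kernel):
    """Valid-mode sliding-window sum of products on nested lists of numbers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Each (y, x) result depends only on its own input window,
            # so these iterations are independent of one another.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[y + i][x + j] * kernel[i][j]
            output[y][x] = total
    return output


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box_kernel = [[1, 1], [1, 1]]  # sums each 2x2 window
result = convolve2d(image, box_kernel)
print(result)
```

A CPU runs these loops a few iterations at a time; a GPU can assign one output pixel to each of its cores, which is why the same workload fits the V100 column of the table better.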
Software stack illustration when adding Deep Learning capability to any general software, On Premise & IaaS
Layer | General Software Stack | Deep Learning Stack
Infrastructure | CPU, RAM, Storage, Network | Accelerator (NVIDIA/AMD GPU, Intel FPGA)
Virtualization | Hypervisor / Container | Accelerator Virtualization (NVIDIA/AMD Virtual GPU, Intel Virtual FPGA)
OS | Linux / Windows | Accelerator Driver (NVIDIA Driver/Intel FPGA Driver)
Middleware | Application Server | Accelerator Library (NVIDIA CUDA, Intel OpenVINO)
Runtime | Java, .NET, Python, PHP, NodeJS | Keras, TensorFlow, Caffe, PyTorch, CNTK + Model
Applications | Web Application | Computer Vision, Speech, Language
Non-exhaustive list
Computer Vision runs on top of Deep Learning Stack
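As a small illustration of the runtime layer in the stack above, the sketch below checks which of the listed deep learning frameworks are installed in the current Python environment. The import names in `FRAMEWORK_MODULES` are assumptions about how each framework is usually packaged, and the helper `installed_frameworks` is hypothetical. It uses only the standard library and does not import the frameworks themselves.

```python
import importlib.util

# Assumed import names for the frameworks named in the stack above.
FRAMEWORK_MODULES = {
    "Keras": "keras",
    "TensorFlow": "tensorflow",
    "Caffe": "caffe",
    "PyTorch": "torch",
    "CNTK": "cntk",
}


def installed_frameworks(modules=FRAMEWORK_MODULES):
    """Return {framework name: True/False} without importing the frameworks."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


status = installed_frameworks()
for name, available in status.items():
    print(f"{name}: {'installed' if available else 'not found'}")
```

A check like this only covers the runtime layer; the driver and accelerator library layers below it still have to be verified with the vendor's own tools.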
Computer Vision High Level Integration
Your Applications
Enterprise Application
Owner : Development Team
Process : Software Development Lifecycle (Waterfall, Agile, other)
Type : Web App, Mobile App, other
Computer Vision System
Owner : Data Science Team
Process : Machine Learning Lifecycle (OSEMN, CRISP-DM, TDSP)
Components : Training & Test Dataset, Training Engine, Trained Model, Inference Engine, API
Input : Image / Video
Highly simplified
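The integration above, where an enterprise application hands an image to the computer vision system through an API, can be sketched as follows. This is a minimal example under stated assumptions: the endpoint URL, the JSON field names `image` and `min_confidence`, and the helper `build_inference_request` are all hypothetical, since the slides do not define a concrete API. The request is only built here, not sent.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint; replace with the URL your inference engine exposes.
INFERENCE_URL = "http://localhost:8080/v1/detect"


def build_inference_request(image_bytes, url=INFERENCE_URL, min_confidence=0.5):
    """Wrap raw image bytes in a JSON POST request (the request is not sent)."""
    payload = {
        # Assumed field names; a real service documents its own schema.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "min_confidence": min_confidence,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_inference_request(b"\x89PNG placeholder bytes")
body = json.loads(request.data.decode("utf-8"))
print(request.get_method(), request.full_url, sorted(body))
```

Against a live endpoint, the development team would pass the request to `urllib.request.urlopen` and parse the predictions from the response; everything behind the API stays with the data science team.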
Computer Vision Life Cycle
1. Training at Server
Deep Learning Framework
Inputs : Initial Weight, Neural Network Model, Training Parameters, Training & Test Data
Output : Trained Model
Neural Network Model
Image Classification : GoogleNet, ResNet
Object Detection : R-CNN Based, SSD, YOLO
Object Segmentation : Mask R-CNN
Training Parameters
Learning Rate, Epoch, Batch size, Intersection over Union, Average Loss, Mean Avg Precision
Training & Test Data
Public Data Set : COCO, ImageNet, Pascal VOC
Research Dataset
Custom Data Set : Image Collection, Privately Labelled
2. Inference at Server / Edge
Inputs : Input Image / Video, Neural Network Model, Trained Model, Inference Parameter
Output : Predictions
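The Intersection over Union (IoU) metric named in the life cycle above measures how well a predicted bounding box overlaps a labelled one: the area of the overlap divided by the area covered by both boxes together. Below is a minimal sketch, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples; that box format and the function name are assumptions for illustration, since frameworks differ in how they encode boxes.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)
ground_truth = (5, 5, 15, 15)
iou = intersection_over_union(predicted, ground_truth)
print(f"IoU = {iou:.3f}")
```

In this example the overlap is 25 and the union is 175, so the IoU is about 0.143. Detection benchmarks typically count a prediction as correct only when its IoU with the label exceeds a chosen threshold, and Mean Average Precision is then computed on top of those matches.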
1. Understand the business impact in terms of economic dollar values (example: cost saving, revenue increase)
2. Business investment must consider the key capabilities the company is planning to invest in, such as IT Infrastructure, IT Skill and time to market
Identify Business Impact
Business Impact Example | Economic Value Example
Increase Customer Loyalty | Higher repeat order by 15%
Improve Customer Experience | Longer engagement
Improve Operational Effectiveness | Reduce operational cost by 17%
Drive Sales | Sales Increase 25% YoY
Highly simplified
Building Computer Vision Capability Considerations
Model & Dataset
Custom
1. Collect / create image data sets
2. Create custom model based on custom data sets
Existing
1. Use existing image data sets from community/cloud provider
2. Use existing model from community/cloud provider
Skill
Data Science & Developer Team
1. Build a full data science team in house to support current or future data science capability
2. Data science and developer teams work together to integrate deep learning software with enterprise applications
Developer Team Only
1. No plan to build a data science team, since current or future requirements don’t justify having one
2. Train the developer team to use deep learning software up to API integration, or use deep learning software that has been pre-configured specifically for users without prior data science knowledge
Infrastructure
Buy
1. Buy infrastructure for on-premise deployment
2. Maintain hardware infrastructure & deep learning software stack
Rent
1. Rent deep learning infrastructure from a Cloud Provider
2. Choose between deep learning IaaS (full control over deep learning infrastructure) or PaaS (partial control up to model & datasets)
Computer Vision as a Service
Available options between technology stack, investment, skills, time to market
Stack Layer | Skills | Rent IaaS / Buy On Premise | Rent PaaS Training & Inference | Rent PaaS Inference
Deep Learning Runtime | Develop custom deep learning code | Keras, TensorFlow, Caffe, PyTorch, CNTK | Keras, TensorFlow, Caffe, PyTorch, CNTK | Keras, TensorFlow, Caffe, PyTorch, CNTK
Network Model | Design, build or choose the right Neural Network | R-CNN, SSD, YOLO | R-CNN, SSD, YOLO | R-CNN, SSD, YOLO
Training Data | Neural Network Training Parameters | Custom Image Collection | Custom Image Collection | Provided
Label Tools + Training Tools | Image Labeling. Human or Automated | labelImg, OpenLabeling | Web based | N/A
API | API Integration, REST API | REST API / Stream API | REST API / Stream API | REST API / Stream API
Legend: Client Managed, Cloud Provider Managed
Toward Rent PaaS Inference: Lower investment, Lower barrier to entry, Lower CV skills, Limited Inference Capability, No/Less control over training data
Toward Rent IaaS / Buy On Premise: Higher investment cost, Higher barrier to entry, Higher CV skills, Rich Inference Capability, Total control over training data
Non-exhaustive list
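Example offerings
Rent IaaS / Buy On Premise: Deep Learning PC, Deep Learning Server
Rent PaaS Training & Inference: AWS Rekognition, Google AutoML, Azure Custom Vision, IBM Watson Studio
Rent PaaS Inference: AWS Vision, Google Cloud Vision, Azure Vision, IBM Visual Recognition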
Build your own CV Enabled Apps
Find your cool use cases
1. Image Search
2. Video Analytics
3. Pet Counter
4. Rat Trap
5. Parking Detection
6. Match making
7. … and more
Build your dataset
1. Use public dataset
2. Build image collection
3. Label image
4. Augment your image
Train and Integrate
Learn the Basics
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition
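The "Augment your image" step in the dataset list above means generating extra training samples by transforming the images you already have. Here is a minimal sketch in plain Python that treats an image as a nested list of pixel values; the function names are illustrative assumptions, and real projects would normally use an image library rather than lists.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in reversed(image)]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*reversed(image))]


def augment(image):
    """Return the original image plus three transformed variants."""
    return [
        image,
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
    ]


image = [
    [1, 2, 3],
    [4, 5, 6],
]
variants = augment(image)
for variant in variants:
    print(variant)
```

For object detection, the bounding-box labels have to be transformed together with the pixels, otherwise the augmented images carry wrong labels.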