
Putting the latest Computer Vision and Deep Learning algorithms to work

The Opportunities and Challenges

Albert Y. C. Chen, Ph.D., Vice President, R&D

Viscovery

Albert Y. C. Chen, Ph.D.

• Experience
2017-present: Vice President of R&D at Viscovery
2016-2017: Chief Scientist at Viscovery
2015: Principal Scientist at Nervve Technologies
2013-2014: Computer Vision Scientist at Tandent Vision
2011-2012: GE Global Research

• Education
Ph.D. in Computer Science, SUNY-Buffalo
M.S. in Computer Science, NTNU
B.S. in Computer Science, NTHU

• Some random things about me…
SUNY Excellence in Teaching Award, 2010.
Some rapid promotions, some failed startups, some patents, some papers…

1. W. Wu, A. Y. C. Chen, L. Zhao, and J. J. Corso. Brain tumor detection and segmentation in a CRF framework with pixel-wise affinity and superpixel-level features. International Journal of Computer Assisted Radiology and Surgery, 2015.

2. S. N. Lim, A. Y. C. Chen and X. Yang. Parameter Inference Engine (PIE) on the Pareto Front. In Proceedings of International Conference of Machine Learning, Auto ML Workshop, 2014.

3. A. Y. C. Chen, S. Whitt, C. Xu, and J. J. Corso. Hierarchical supervoxel fusion for robust pixel label propagation in videos. In Submission to ACM Multimedia, 2013.

4. A.Y.C. Chen and J.J. Corso. Temporally consistent multi-class video-object segmentation with the video graph-shifts algorithm. In Proceedings of IEEE Workshop on Applications of Computer Vision, 2011.

5. D.R. Schlegel, A.Y.C. Chen, C. Xiong, J.A. Delmerico, and J.J. Corso. Airtouch: Interacting with computer systems at a distance. In Proceedings of IEEE Workshop on Applications of Computer Vision, 2011.

6. A.Y.C. Chen and J.J. Corso. On the effects of normalization in adaptive MRF Hierarchies. In Proceedings of International Symposium CompIMAGE, 2010.

7. A.Y.C. Chen and J.J. Corso. Propagating multi-class pixel labels throughout video frames. In Proceedings of IEEE Western New York Image Processing Workshop, 2010.

8. A. Y. C. Chen and J. J. Corso. On the effects of normalization in adaptive MRF Hierarchies. Computational Modeling of Objects Represented in Images, pages 275–286, 2010.

9. Y. Tao, L. Lu, M. Dewan, A. Y. C. Chen, J. J. Corso, J. Xuan, M. Salganicoff, and A. Krishnan. Multi-level ground glass nodule detection and segmentation in ct lung images. Medical Image Computing and Computer-Assisted Intervention, 2009.

10. A.Y.C. Chen, J.J. Corso, and L. Wang. Hops: Efficient region labeling using higher order proxy neighborhoods. In Proceedings of IEEE International Conference on Pattern Recognition, 2008.

Some work done before I caught the startup fever

AirTouch HCI interface for Content-based Image Retrieval

[Figure labels: AirTouch waits in background for the initialization signal; Initialize; Freestyle Sketching Stage; Terminate; Output; CBIR query; image database; Start; Results.]

Interactive Segmentation & Classification

• Segmentation then classification:
  • computationally more efficient,
  • results in much higher classification accuracy.
• Pioneered the “pixel label propagation” field.
• First to utilize superpixels and supervoxels for the task.

[Figure labels: Label a subset of pixels (FG, BG); Pixel label map; Traditional Spatial Propagation; Spatio-temporal Propagation (time axis).]
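To make "label a subset of pixels, then propagate" concrete, here is a minimal sketch of spatial propagation within one frame. It is not the method from the talk or the papers listed earlier. It is a plain geodesic propagation, where each unlabeled pixel takes the label of the seed it can reach along the path with the least intensity change. The function name `propagate_labels` and the toy image are assumptions made for illustration.

```python
import heapq

def propagate_labels(image, seeds):
    """Spread seed labels to every pixel along the cheapest intensity path.

    image: 2-D list of numbers. seeds: dict mapping (row, col) -> label.
    Illustrative geodesic propagation, not the method from the talk.
    """
    h, w = len(image), len(image[0])
    cost = [[float("inf")] * w for _ in range(h)]
    labels = [[None] * w for _ in range(h)]
    heap = []
    for (r, c), label in seeds.items():
        cost[r][c] = 0.0
        labels[r][c] = label
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > cost[r][c]:
            continue  # stale queue entry, a cheaper path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                # Step cost is the intensity jump between neighbouring pixels.
                nd = d + abs(image[nr][nc] - image[r][c])
                if nd < cost[nr][nc]:
                    cost[nr][nc] = nd
                    labels[nr][nc] = labels[r][c]
                    heapq.heappush(heap, (nd, nr, nc))
    return labels

# Toy frame: dark left half, bright right half, one seed pixel per region.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]
label_map = propagate_labels(image, {(0, 0): "FG", (3, 5): "BG"})
for row in label_map:
    print(" ".join(row))
```

Two labeled pixels are enough to fill the whole label map here because the regions are uniform. Extending the neighbourhood to the previous and next frame would turn the same loop into a spatio-temporal propagation.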

Image/Video Object Recognition and Content Understanding

[Figure labels: Low-Level (detections marked x over Time); Mid-Level (activity: approach, carries, gives, receives); High-Level (Reasoning; Ontology relating Person 1, Person 2, and object via approaches, carries, gives, receives).]

Learning and Adapting Optimal Classifier Parameters

[Figure labels: Image-level feature space (subspace A, subspace B, subspace C); priors; Patch-level feature space; posterior probability; suggest optimal parameter configuration.]

Graphical Models and Stochastic Optimization

(a) The space-time volume of a video showing the objects (A-F) and their appearing time-span.

(b) The temporal relationship graph. An edge between two vertices means that the two objects overlap in time.

(c) The goal is: cover all objects with the smallest number of "ground truth key frames" (key 1 and key 2 in the figure).

(d) This translates to: iteratively solving the max clique problem until all vertices belong to a clique.
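Steps (b) to (d) can be sketched in a few lines. This is a hedged illustration, not the author's implementation. The time-spans for objects A to F are made up, the clique search is brute force (fine for a handful of objects only), and repeatedly removing the largest clique is a greedy heuristic that is not guaranteed to give the smallest cover on every graph. The key-frame time uses the fact that intervals which pairwise overlap all contain the latest of their start times.

```python
from itertools import combinations

def overlap_graph(spans):
    """Temporal relationship graph: edge when two objects overlap in time."""
    adj = {name: set() for name in spans}
    for a, b in combinations(spans, 2):
        (s1, e1), (s2, e2) = spans[a], spans[b]
        if s1 <= e2 and s2 <= e1:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def max_clique(adj, nodes):
    """Brute-force largest clique among `nodes`."""
    nodes = sorted(nodes)
    for size in range(len(nodes), 0, -1):
        for group in combinations(nodes, size):
            if all(v in adj[u] for u, v in combinations(group, 2)):
                return set(group)
    return set()

def key_frames(spans):
    """Greedily cover all objects: one key frame per extracted clique."""
    adj = overlap_graph(spans)
    remaining = set(spans)
    keys = []
    while remaining:
        clique = max_clique(adj, remaining)
        # Pairwise-overlapping intervals share a common time: the latest start.
        t = max(spans[name][0] for name in clique)
        keys.append((t, sorted(clique)))
        remaining -= clique
    return keys

# Hypothetical (start, end) time-spans for objects A to F.
spans = {"A": (0, 4), "B": (1, 5), "C": (2, 6),
         "D": (3, 9), "E": (7, 10), "F": (8, 11)}
keys = key_frames(spans)
print(keys)  # [(3, ['A', 'B', 'C', 'D']), (8, ['E', 'F'])]
```

With these spans, two key frames (at t = 3 and t = 8) cover all six objects.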

[Figure labels: frame t-1, frame t; layer n, layer n+1; Shift; Temporal Shift; µ.]

Medical Imaging and Geospatial Imaging

GGN (ground glass nodule) detection and segmentation in lung CT

Geospatial imaging: building detection

Brain tumor detection and segmentation in MR images

Why Risk to Innovate?

• A good business model NEVER lasts forever.

• Average “shelf life” of a company on the S&P 500: 20 years.

• 100-year-old companies reinvent themselves every 10-20 years.

• Startups contribute 20% of the USA’s GDP.

The Death of a Good Business Model

• Foxconn: 20-year revenue vs. net profit (now at 5%)

What do 100-year-old corporations do?

GE Schenectady, 1896

History of change at GE

• 1896: one of the 12 original companies on the Dow Jones Industrial Average (also the only one remaining).
• 1889: lightbulbs
• 1919: radios
• 1927: TV
• 1941: jet engine
• 1960: nuclear power
• 1971: room AC units
• 1995: MRI

History of change at IBM

• 1960s: mainframe computer
• 1980s: personal computer
• 2000s: integrated solutions
• 2020s: AI, Watson

How about the leading Semiconductor companies?

NVidia reinventing itself: 2 times in 20 years

“Bad money drives out good” in the desktop GPU market

The rise of mobile computing, and how NVidia missed the boat!

NVidia’s Tegra mobile processors never took off

then, the market saturated…

NVidia did not just survive. NVidia is thriving!

Meet the new NVidia: Deep Learning, Deep Learning, and still, Deep Learning

The king is dead, long live the king!

Now, again, do we want to do OEM/ODM forever?

Optimizing an old business model is just delaying its eventual death.

Startups

• A company, partnership, or temporary organization designed to search for a new, repeatable and scalable business model.

Your Idea

• Are you passionate about it?
• Is it disruptive enough?
• What is your business plan?
  • What is it?
  • Can it make money?
  • What is the future of the idea?
• What is your competitive advantage?
• How do you build up your entry barrier?

A minimal startup team

• A hacker

• A hustler

• A hipster

Startup Timeline

Prototype

• Hack out a prototype

• Spend 2-10 weeks max.

• Investors are much more likely to fund you if you have a minimal initial version of your idea.

• Hackathons are a good place to start.

• Iteratively improve the prototype

Money!

Build up your entry barrier!

• Market (users)

• Speed

• Team

• Technology

Building an entry barrier with Technology!!

Computer Vision, it can’t be that hard, right?

Brief History

Marvin Minsky

“In 1966, Minsky hired a first-year undergraduate student and assigned him a problem to solve over the summer: connect a television camera to a computer and get the machine to describe what it sees.”

Gerald Sussman

The student never worked on Computer Vision problems again.

Brief History

• 1960’s: interpretation of synthetic worlds
• 1970’s: some progress on interpreting selected images
• 1980’s: ANNs come and go; shift toward geometry and increased mathematical rigor
• 1990’s: face recognition; statistical analysis in vogue
• 2000’s: broader recognition; large annotated datasets available; video processing starts

[Figures: Guzman ‘68; Ohta Kanade ‘78; Turk and Pentland ‘91]

What’s in our arsenal?

• Image filters

• Feature descriptors

• Classifiers

Filters: blurring

Filters: sharpen

Filters: edge

Filters: straight lines
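The blurring, sharpening, and edge filters named above all come down to sliding a small kernel over the image. The sketch below uses pure Python and three textbook 3x3 kernels (box blur, a common sharpening kernel, and Sobel in x). The helper name `filter3x3` and the toy image are assumptions for illustration. Border pixels are simply left at zero, and straight-line detection (for example a Hough transform) is not covered here.

```python
def filter3x3(image, kernel):
    """Correlate a 2-D list `image` with a 3x3 `kernel`; borders stay 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3) for j in range(3)
            )
    return out

BLUR = [[1 / 9] * 3 for _ in range(3)]            # box blur: local average
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]   # boosts centre vs. neighbours
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to vertical edges

# Toy image: dark left half, bright right half.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
blurred = filter3x3(image, BLUR)
sharpened = filter3x3(image, SHARPEN)
edges = filter3x3(image, SOBEL_X)
print(edges[2])  # strong response only next to the step: [0.0, 0.0, 40, 40, 0, 0.0]
```

The blur spreads the step over neighbouring columns, the sharpening kernel overshoots on both sides of it, and the Sobel kernel is zero everywhere except at the step itself.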


Features: Harris Corners

Features: Laplacian of Gaussian (LoG; scale detection)

Features: Orientation

How to compute the rotation? Create edge orientation histogram and find peak.
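A minimal sketch of the histogram-and-peak idea, assuming central-difference gradients, 36 bins of 10 degrees, and magnitude-weighted votes (choices similar in spirit to SIFT, but the exact settings here are assumptions). It histograms gradient directions, which are perpendicular to the edges themselves. The function name `dominant_orientation` is hypothetical.

```python
import math

def dominant_orientation(patch, bins=36):
    """Return the centre (in degrees) of the peak gradient-orientation bin."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]   # horizontal gradient
            gy = patch[r + 1][c] - patch[r - 1][c]   # vertical gradient
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle / (360.0 / bins)) % bins] += mag  # weighted vote
    peak = max(range(bins), key=hist.__getitem__)
    return (peak + 0.5) * 360.0 / bins

# Toy patch whose intensity grows equally along rows and columns,
# so every gradient points along the diagonal.
patch = [[float(r + c) for c in range(5)] for r in range(5)]
angle = dominant_orientation(patch)
print(angle)  # 45.0
```

Rotating the patch by the negative of this angle before describing it is what makes a descriptor rotation invariant. The answer is only as fine as the bin width, so real systems usually interpolate around the peak.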

Features: SIFT

Features: Gabor

Classifiers: SVM

Classifiers: Ensemble

Classifiers: Random Fields

Classifiers: Deformable Parts Model (DPM)

Classifiers: Deep Neural Network

What algorithm should I use then?

• How much data do we have?
• What objects are we trying to detect?
• For example, Google’s DNN trained with 11k images over 20 classes in 2013 doesn’t always beat DPM.

[Chart: per-class scores (axis 0 to 0.6) for DNN vs. DPM on aero, bike, bird, boat, bottle, bus, car, cat, chair, cow, dog, horse, m-bike, person, plant, sheep, sofa, table, train, TV.]

ML algorithms and their Applications

• Deep Learning

• Markovian/Bayesian

• Feature Matching

• Other ML methods

Meta-Learning

• Different use cases call for different ML algorithms.

• Meta-Learning: learning how to learn.

• Requires plenty of domain-specific know-how.

Maturing Computer Vision Applications

1970s-now: Machine Vision for Industrial Inspection

• Final inspection cells
• Robot guidance and checking orientation of components
• Packaging Inspection
• Medical vial inspection
• Food pack checks
• Verifying engineered components
• Wafer Dicing
• Reading of Serial Numbers
• Inspection of Saw Blades
• Inspection of Ball Grid Arrays (BGAs)
• Surface Inspection
• Measuring of Spark Plugs
• Molding Flash Detection
• Inspection of Punched Sheets
• 3D Plane Reconstruction with Stereo
• Pose Verification of Resistors
• Classification of Non-Woven Fabrics
• Automated Train Examiner (ATEx) Systems
• Automatic PCB inspection
• Wood quality inspection
• Final inspection of sub-assemblies
• Engine part inspection
• Label inspection on products
• Checking medical devices for defects

Industrial Inspection: turbofan jet engine blade maintenance

• Some seemingly daunting machine vision tasks actually work with relatively simple image processing algorithms.

Industrial Inspection: Cognex Omniview

License Plate Recognition (1979-now)

License Plate Readers with Text Detection and Neural Networks

Biometrics

Automated Fingerprint Identification (1970s-now)

Face Recognition (1990s-now)

• Face Detection (Viola and Jones, 2001)

• Face Verification (1:1) vs. Identification (1:N)

Face Verification and Identification, Labeled Faces in the Wild (LFW)

Recognition Accuracy:
• 1 to 1: 99%+
• 1 to 100: 90%
• 1 to 10,000: 50%-70%.
• 1 to 1M: 30%.
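The difference between 1:1 verification and 1:N identification can be sketched with face embeddings compared by cosine similarity. Everything concrete below is an assumption for illustration: the 3-dimensional vectors stand in for real face embeddings, and the names, the gallery, and the 0.8 threshold are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(probe, claimed, threshold=0.8):
    """1:1 verification: does the probe match one claimed template?"""
    return cosine(probe, claimed) >= threshold

def identify(probe, gallery):
    """1:N identification: which enrolled template is closest to the probe?"""
    return max(gallery, key=lambda name: cosine(probe, gallery[name]))

# Hypothetical enrolled templates and a new probe image's embedding.
gallery = {
    "alice": [1.0, 0.0, 0.2],
    "bob": [0.0, 1.0, 0.1],
    "carol": [0.7, 0.7, 0.0],
}
probe = [0.9, 0.1, 0.2]

print(verify(probe, gallery["alice"]))  # True
print(verify(probe, gallery["bob"]))    # False
print(identify(probe, gallery))         # alice
```

Verification makes one comparison against one template, while identification has to beat every other entry in the gallery. Each added identity is another chance for a look-alike to score higher, which is one way to read the drop in accuracy as N grows.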

LFW dataset, common FN↑, FP↓

Sports—NFL first down line (1995-now)

[Figure: three images related by "minus" and "equals".]

3D Reconstruction (as old as CV; became practical since SIFT)

3D Reconstruction with Feature Matching, Structure from Motion

Image Panoramas (1980s - now)

Solving Panorama Problem with Markov Random Fields

Input:


ICM (Iterated Conditional Modes), 1986


Belief Propagation (1980-2000)


Graph-Cuts (alpha expansion), 2001

Photo Synthesis

Solving Photo Synthesis Problems with Alpha-matting (2000s-now)
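Alpha matting rests on the compositing equation C = αF + (1 − α)B, applied per pixel: each observed pixel C is a blend of a foreground colour F and a background colour B, weighted by its opacity α. Matting estimates α and F from C, and compositing then pastes the foreground onto a new background. A minimal single-channel sketch of the compositing step, with made-up pixel values and a hypothetical function name:

```python
def composite(foreground, background, alpha):
    """Blend per pixel: C = alpha * F + (1 - alpha) * B."""
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]

# One-row toy images: alpha 0 keeps the background, 1 keeps the foreground,
# 0.5 mixes them equally (as on a soft object boundary such as hair).
foreground = [[200.0, 200.0, 200.0]]
background = [[100.0, 100.0, 100.0]]
alpha = [[0.0, 0.5, 1.0]]

result = composite(foreground, background, alpha)
print(result)  # [[100.0, 150.0, 200.0]]
```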

Object Detection & Classification state-of-the-art

• ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
• 1000+ classes, 1.2M images.

[Chart: classification error and classification + localization error (axis 0 to 0.5), x-axis ticks 11, 12, 13, 14.]

Image Scene Classification

• MIT Places 401 dataset.

• top-5 accuracy rates >80%.

Self-driving cars (2000s-now)

DARPA Grand Challenge (2005)

2005 winner, Stanley (Stanford), 3mph through desert

DARPA Urban Challenge (2007)

2007 winner, Boss (CMU), 13mph through the city

Self Driving Cadillac, US congressman to airport, 2013

Google Self Driving Car, 2015

Google Self-Driving Car, 2016

NVidia Self Driving Car, 2016

How did we come this far? Race car drivers know the trick

Focus on Free Space / Drivable Area, not Obstacles!

Up-and-coming Computer Vision Applications

Structure from X, Floored

Structure from X, PIX4D

Object Recognition Blue River Technology

Augmented Reality Magic Leap

Retail Insights

Source: Prism Skylabs

Other Applications in Business Intelligence

• Measure brand exposure. • Measure sponsorship effectiveness. • Loss prevention and retail layout optimization.

How about Smart Surveillance?

Angel.co

My humble attempts at putting the latest Computer Vision algorithms to work

Intrinsic Imaging at Tandent Vision Science

Computer Vision would be half-solved without shadows!

[Figure labels: Original Image; Light; Surface.]

Tandent Lightbrush

Video Tutorial for Tandent Lightbrush: https://vimeo.com/47009123


Issues

• Highly anticipated, highly acclaimed, but small crowd at $500 a license.

• Adobe Photoshop monopoly and the “not invented here” syndrome.

• Adobe’s arch-rival, Corel (Corel Draw, Paint Shop Pro, Ulead PhotoImpact), was DYING and asked too much from the botched deal.

Have fun scribbling out your shadows in Photoshop!

Poor Bob from Adobe wasted 9 minutes removing just 1 shadow

Intrinsic Imaging for improving the RGB signal in autonomous driving

Intrinsic Imaging’s other applications

Retrospect

• 20 researchers burned 25 million in 8 years; investors got 50 patents in return, period.

• Overestimated the total addressable market size, in a market with existing monopoly.

• Many missed opportunities. Counterexample of the lean startup model.

Some SfM, SLAM startups

Satellite/Aerial Imagery Analysis

• 40cm resolution at 30fps for 90 sec for any location on Earth.
• One LEO satellite revisits any place on Earth every 3 days.
• Need 24 satellites to revisit any place on Earth every 3 hours.

Challenges for Single satellite depth estimation and 3D reconstruction

• At 30fps, a LEO satellite travels 250m between two consecutive frames → theoretically sufficient for cm-level depth estimation.

• Sources of Noise:
  • Camera distortions
  • Atmospheric Disturbance
  • Ground vegetation
  • Sub-pixel sampling noise
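The satellite numbers on these slides can be checked with a few lines of arithmetic. The sketch below uses only the slide's figures (30fps, 250m per frame, 90 sec clips, 3-day and 3-hour revisits). It also assumes that revisit time shrinks in proportion to the number of evenly spaced satellites, which is a simplification of real orbit design.

```python
import math

FPS = 30                 # frames per second (from the slide)
METERS_PER_FRAME = 250   # travel between consecutive frames (from the slide)
CLIP_SECONDS = 90        # length of one video clip (from the slide)

# Speed implied by the slide: 250 m every 1/30 s.
speed_m_s = FPS * METERS_PER_FRAME

# Distance covered over a full 90 sec clip, i.e. the widest available baseline.
clip_baseline_km = speed_m_s * CLIP_SECONDS / 1000

# Constellation size, assuming revisit time scales as 1 / number of satellites.
single_revisit_hours = 3 * 24
target_revisit_hours = 3
satellites = math.ceil(single_revisit_hours / target_revisit_hours)

print(speed_m_s)         # 7500 (m/s)
print(clip_baseline_km)  # 675.0 (km)
print(satellites)        # 24
```

The implied 7.5 km/s matches the usual orbital speed of a low Earth orbit satellite, and 72 hours divided by 3 hours gives the 24 satellites quoted above.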


What happened?

• B2B customers take too long to strike deals.

• Google ate us alive in just 3 months, while we were still pitching for VC-funding with our prototype.

Visual Search at Nervve

Retrospect

• Growth pains expanding from intelligence community clients to advertisement clients.
• Forming the right team of engineers and researchers and moving at the right pace.
• For any Computer Vision/Machine Learning company:
  • Researchers that cannot program → OUT
  • Engineers that don’t know math → OUT

Visual Search, Simply Smarter

Once in a lifetime opportunity in China’s video streaming market

What do we need?

Face, Motion, Image scene, Text, Audio, Object

Semantics

Viscovery VDS (Video Discovery Service)

Challenges Encountered Along the Way

• From Product Recognition in Images, to Face, Logo, Object, Scene recognition in Videos.
  • Number of Categories
  • Recognition Accuracy
  • Recognition Speed

• System Architecture

• Business Model

Viscovery’s Edge

• Market: first mover’s advantage in China’s video streaming market.
• Speed: we built the whole VDS thing in a few months!
• Team: You! Seriously!
• Technology:
  • Depth
  • Breadth
  • Cloud
  • Customizability
  • Self-Learning

Life is not all rosy at startups

• High Risk, High Pressure, High Uncertainty!

• Resources are scarce, but you MUST DELIVER!

• Forming your all-star team is not that easy…

• Focus, and persistence.

What can Taiwan’s academia do to help bridge the gap?

HMM….

A healthy cycle

[Figure: cycle linking Academia, Industry, and the General Public. Labels: reputation and policy support; improved living standards; students; opportunity; well-trained graduates; grants and collaborations.]

A vicious cycle

[Figure: cycle linking Academia, Industry, and the General Public. Labels: unsupportive policies; stagnant wages; useless education; unemployable graduates; no grants; no students.]

Where should we start? Maybe with a few more stories.

The Goldilocks zone of innovation

[Figure: axes Business Relevance and Academic Relevance. Labels: plentiful resources, hierarchical organization; lack of resources, responsive organization; traditional corporations talking “innovation”; corporate research; startups struggling to survive; academic spinoffs; MSR.]

Thank you!
