
Page 1

Computer Vision News - The magazine of the algorithm community

October 2016

Spotlight News

Women in Computer Vision: Dr. Imama Noor

Tool: LiveCV

Challenge: Data Science Bowl - Diagnosing Heart Disease

Application: Onfido - Identity Verification

Research Paper: Deformable Part Models are CNNs

Project: Fully Automated Surveillance with PTZ Cameras

Trick: New Matlab Version - R2016b

Guest: Prof. Jürgen Schmidhuber

Image: FAZ/Bieber

A publication by RSIP Vision

Page 2

Read This Month

Interview with our Guest: Jürgen Schmidhuber - 4
Trick of the Month: New Matlab Version - 6
Tool of the Month: LiveCV - 10
Women in Computer Vision: Imama Noor - 16
Spotlight News: Computer Vision headline stories - 25
Challenge of the Month: 2nd Data Science Bowl - 26

Page 3

Dear reader,

The month of October will be full of events. We will no doubt meet many of our readers at ECCV 2016 in Amsterdam, on October 11-14. The European Conference on Computer Vision has a special meaning for us this year, since we are partnering with ECCV to publish the daily magazine of the conference: ECCV Daily. It's the first time that the attendees of ECCV will enjoy a community tool of this kind: its mission will be to enhance communication between participants, promote their work and give them additional exposure within academia and the industry at large.

This daily magazine is not a new idea: we did it earlier this year at CVPR 2016, where we published the now famous CVPR Daily. We shall do it again later this month at MICCAI 2016, which will be held in Athens on Oct. 17-21: the International Conference on Medical Image Computing and Computer Assisted Intervention partners with us for the publication of a brand new MICCAI Daily.

Of course, we are very proud of being called on for these delightful projects, and we are grateful to all the conference chairs (and to the readers too).

Let's now turn our attention to this Computer Vision News of October. All the regular sections feature interesting and stimulating stories, starting at page 4 with the exclusive interview of our guest, Professor Jürgen Schmidhuber, one of the pioneers of Deep Learning Neural Networks since 1991. You will love this magazine: please keep sharing it with friends and colleagues.

Enjoy the reading!

Ralph Anzarouth
Marketing Manager, RSIP Vision
Editor, Computer Vision News


Computer Vision News
Editor: Ralph Anzarouth
Publisher: RSIP Vision

Click to contact us

Click to give us feedback

Click for free subscription

Copyright: RSIP Vision. All rights reserved. Unauthorized reproduction is strictly forbidden.

Editorial 3
Guest 4 - Prof. Jürgen Schmidhuber
Trick 6 - New Matlab - R2016b
Give us feedback! 7
Tool 10 - LiveCV
Project 15 - Fully Automated Surveillance
Women in Computer Vision 16 - Dr. Imama Noor
Application 20 - Onfido - Identity Verification
Spotlight News 25
Challenge 26 - 2nd Data Science Bowl
Research 28
Event 32 - EVIC - IEEE Summer School
Upcoming Events 33
Subscribe: it's free! 33
Project Management 34 - Full-kit of Requirements

Page 4

Prof. Jürgen Schmidhuber is Scientific Director of the Swiss AI Lab IDSIA and Professor of Artificial Intelligence at the University of Lugano (USI). He has pioneered Deep Learning Neural Networks since 1991 and his research group is a serial winner of international contests. His Recurrent Neural Networks (RNNs) are now available to billions of users through Google, Apple and others. For example, since 2015, Google's speech recognition on smartphones has been based on his team's "Long Short-Term Memory" (LSTM) RNNs. Google DeepMind is also heavily influenced by his former students. Computer Vision News was very lucky to have a fascinating discussion with Prof. Schmidhuber: his views on the future of humankind cannot leave anybody indifferent…

Computer Vision News: The field of Deep Learning has seen during the last 25 years (as your impressive research page testifies) a long list of successful technology breakthroughs. Isn't there anything which disappointed you, like a finding or an innovation which you expected but which never arrived?

Professor Schmidhuber: That's an interesting question. I often thought that some breakthrough would come earlier than it actually did, and I was sometimes surprised by a development arriving faster than I thought. But I cannot say that anything I thought was going to happen suddenly does not have a chance to happen anymore. It is a question of predicting the speed of technological evolution. Details of my predictions have been wrong, but the predictions were generally in line with what has happened during recent decades.

CVN: What area of study would you suggest to a young engineer entering the field now?

Prof. Schmidhuber: Machine Learning and Artificial Neural Networks.

CVN: Would it be the same answer if that engineer wanted to turn those studies into a business?

Prof. Schmidhuber: Sure. That's the technology used by many big and small companies to make people happier and healthier and even more addicted to their smartphones.

CVN: You have used techniques of computer vision to solve problems in so many fields: scientific, financial and even in the artistic domain. Where would you see its main contribution to improving our society at large?

Prof. Schmidhuber: First, let me point out that computer vision is just one special area where artificial Neural Networks are applicable.

Professor Jürgen Schmidhuber

"I am living at a time when the universe wants to make its next step towards greater complexity"

Image: Tomás Donoso on Vimeo

Page 5

In particular, recurrent neural networks, such as LSTM, are now widely applied not only to vision but also to speech recognition, machine translation, automatic email answering, natural language processing, etc.

Generally speaking, such networks learn by changing the strengths of the connections between their artificial neurons to solve practical tasks.

Such a task could be pretty much anything that you can imagine. One of the most important applications is in health care, which can be greatly improved by artificial Neural Networks: they can imitate doctors, for example, by learning to recognize cancer in images of breast tissue. In 2012, our deep neural nets were the first to win a contest of that kind, having learned from examples how to distinguish between harmless conditions and dangerous pathologies.

In the future it should be possible to provide reasonable diagnostics to many people who today cannot afford it, e.g. in developing countries where decent health care is not available. For example, take an image of a dark brown spot on your skin with a smartphone camera and send it to an AI medical doctor, who may decide whether it needs to be examined further by a human expert.

CVN: In conclusion, what would you like to tell our readers?

Prof. Schmidhuber: We are living in extremely exciting times, when great things are happening: in not so many decades, for the first time, intelligent machines may be better general problem solvers than humans.

Soon, the most important decision-makers will not be humans anymore: every aspect of civilization will be transformed by that. It will not stop there, since these smart AI beings will realize that most of the resources of the solar system are not within the thin film of the biosphere but out there in space. They will colonize the solar system and the entire Milky Way and beyond, in a way which is completely impossible for humans.

CVN: That sounds fascinating. Do you look forward to that civilization?

Prof. Schmidhuber: I do. Since the 1970s it has seemed clear to me that I am living at a time when the universe wants to make its next step towards greater complexity, a step comparable to the invention of life itself over 3 billion years ago. This is more than just another industrial revolution. This is something new that transcends humankind and even biology. It is a privilege to witness its beginnings, and to contribute something to it.

"For the first time intelligent machines may be better general problem solvers than humans"


Image: FAZ/Bieber

Page 6

Trick: New Matlab Version - R2016b

Every month, Computer Vision News shows you a nice trick that will make your work easier. This time, our engineers tell you about the new features in the new version of Matlab (R2016b). We will briefly highlight the new features from the computer vision toolbox and the image processing toolbox. For the complete list of new features, we invite the reader to check the MathWorks website.

(1) Image Browser App:
The first new feature we'll cover is the new and improved image browser app, which now supports quick viewing of multiple images and applying basic computer vision operations to them. The image browser app lets you quickly load a set of images and perform some basic image processing on them.

The Matlab image browser looks as follows:

Now, for instance, if you select a color image in the image browser app - by double clicking on the image - you can use the new features of the color threshold app on it, which we shall demonstrate next.


Page 7

(2) Color Thresholder App: view color data as a point cloud for segmentation

The color threshold app has also been upgraded with a new option which lets you view image color data as a point cloud in four different color spaces: RGB, HSV, YCbCr and L*a*b*. You can use these point clouds to evaluate which color space provides the clearest separation of colors, making it easy to select particular colors for segmentation. You use the point clouds in conjunction with the existing controls offered for each color space. For example:
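For readers who prefer scripting, here is a minimal hand-written sketch of the same idea (ours, not code generated by the app): it views the pixels of an RGB image as a point cloud in HSV space, then keeps a chosen hue range as a segmentation mask. The image name and the threshold values are arbitrary.

% View RGB pixel data as a point cloud in HSV space, then threshold it.
rgb = imread('peppers.png');              % demo image shipped with Matlab
hsv = rgb2hsv(rgb);
pts = reshape(hsv, [], 3);                % one row per pixel: [H S V]
col = reshape(im2double(rgb), [], 3);     % color each point like its pixel
scatter3(pts(:,1), pts(:,2), pts(:,3), 1, col);
xlabel('H'); ylabel('S'); zlabel('V');
% Reddish hues sit near both ends of the H axis; keep saturated ones only.
mask = (hsv(:,:,1) > 0.9 | hsv(:,:,1) < 0.05) & hsv(:,:,2) > 0.4;
figure, imshow(rgb .* repmat(uint8(mask), 1, 1, 3))   % show selected pixels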


Dear reader,

How do you like Computer Vision News? Did you enjoy reading it? Give us feedback here:

It will take you only 2 minutes to fill in, and it will help us give the computer vision community the great magazine it deserves!

Give us feedback, please (click here)

FEEDBACK

Page 8


(3) 3-D superpixels: use simple linear iterative clustering (SLIC) with volumetric images

Remember the SLIC superpixel algorithm we reviewed in our July issue? Matlab now comes with built-in support for 3-D superpixels, enabling you to preprocess 3-D images (like CT or MRI scans) with superpixel oversegmentation, as can be seen in the example below. Though we only include a 2-D slice here, all the computation is done in 3-D, making the superpixels more precise. Indeed, you can see that the main organs in the CT are recognized. This result could then be used as input to a more advanced segmentation algorithm and improve the latter's precision thanks to the superpixel prep work.

Axial slice from a CT scan, where the main organs are segmented with superpixels
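Below is a minimal sketch of how this might look in code, using the demo MRI volume that ships with Matlab in place of a CT scan (the slice index and the number of supervoxels are arbitrary):

% Oversegment a 3-D volume into supervoxels, then inspect one slice.
load mri;                                % demo volume: D is 128x128x1x27
V = squeeze(D);                          % drop the singleton dimension
[L, numLabels] = superpixels3(V, 500);   % request roughly 500 supervoxels
k = 14;                                  % an axial slice to look at
B = boundarymask(L(:,:,k));              % supervoxel boundaries on that slice
imshow(imoverlay(mat2gray(V(:,:,k)), B, 'cyan'))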

(4) imregdemons supports 3-D images on GPU

The imregdemons function now comes with a built-in fast implementation on GPU. The capabilities of the function can be seen in the example: two hand images in different positions are the input, while the demons algorithm registers them one on top of the other and aligns them together.
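A minimal sketch of the GPU path, assuming two roughly aligned grayscale images or volumes named moving and fixed are already in the workspace (the iteration count and smoothing value are arbitrary):

% Run demons registration on the GPU and fetch the result back.
fixedG  = gpuArray(fixed);
movingG = gpuArray(moving);
[D, movingReg] = imregdemons(movingG, fixedG, 100, ...
    'AccumulatedFieldSmoothing', 1.3);   % D is the displacement field
movingReg = gather(movingReg);           % bring the result back to CPU memory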

Page 9


New in Computer Vision Toolbox:

(1) Deep Learning for Object Detection: detect objects using region-based convolutional neural networks (R-CNN)

Matlab keeps adding deep learning features: this time, the R-CNN capability. The following example demonstrates the R-CNN's ability to identify a stop sign. After a short training period on an average laptop computer like mine (a Lenovo Y50 with a simple GPU), this is the result:
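A minimal sketch of the training and detection calls, assuming a table stopSigns of image file names and ground-truth boxes plus a small pretrained network net, as in the MathWorks stop sign example (all names and option values here are assumptions, not a definitive recipe):

% Train an R-CNN detector and run it on a test image.
options = trainingOptions('sgdm', 'MaxEpochs', 10, 'InitialLearnRate', 1e-3);
detector = trainRCNNObjectDetector(stopSigns, net, options, ...
    'NegativeOverlapRange', [0 0.3]);
img = imread('stopSignTest.jpg');        % test image from the toolbox
[bboxes, scores] = detect(detector, img);
imshow(insertObjectAnnotation(img, 'rectangle', bboxes, scores))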

(2) Structure from Motion: another new feature is built-in support for estimating the essential matrix, computing camera pose from 3-D to 2-D point correspondences, and projecting 3-D world points into the image, as sketched below.
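A minimal sketch of the structure-from-motion calls, assuming matched point sets points1 and points2 from two views of a calibrated camera (cameraParams), plus known worldPoints for the projection step:

% Relative pose of view 2 with respect to view 1 via the essential matrix.
E = estimateEssentialMatrix(points1, points2, cameraParams);
[orient, loc] = relativeCameraPose(E, cameraParams, points1, points2);
% Project known 3-D world points into the second image.
[R, t] = cameraPoseToExtrinsics(orient, loc);
projectedPoints = worldToImage(cameraParams, R, t, worldPoints);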

Page 10

Tool: LiveCV

LiveCV (developed by Dinu SV) is a computer vision coding tool enabling quick prototyping of computer vision algorithms, while seeing results update live as soon as the code changes. The aim of LiveCV is to simplify the way people learn, configure and interact with computer vision algorithms. LiveCV runs on both Linux and Windows. Portable stand-alone (i.e. no installation required) and source code versions are available. For download, installation and instructions see here.

At its core, LiveCV uses three main technologies: OpenCV for the algorithms, QML for the programming language and Qt for the display and the user interface (UI).

The Qt Meta Language (QML)

QML (Qt Meta Language) is a user interface markup language. It is a JSON-like declarative language aimed at designing UI applications. QML represents a tree of elements which can be combined to create UIs ranging from simple buttons and sliders to complete Internet-enabled programs. QML uses the V4 JavaScript engine and Qt Quick for its scene graph-based UI framework. In QML, objects are specified by their type, followed by a pair of braces. Object types always begin with a capital letter.

Programming in LiveCV is viewed as a pipeline: read an image -> process it -> … -> observe the results. Let's see a first example: the code snippet below reads an image and blurs it with a Gaussian filter.


In the code on the left, there are three objects: a Row and its two children, an ImRead and a GaussianBlur. Each object can have a special unique property called an id. The id enables the object to be referred to by other objects. Between the braces, information and properties about the object can be specified; properties are specified as pairs <property> : <value>. For example, ImRead has a property id and a property file specifying the file name to be read.

In addition, objects can be linked between themselves through their inputs and outputs. This is done by property binding. A property binding specifies the value of a property in a declarative way, and the property value is automatically updated if the other properties or data values change. For example, GaussianBlur has a property 'input' which is bound to the output of the ImRead. As mentioned, LiveCV wraps OpenCV functions, so most of the documentation about the functions, their usage and their parameters can be found in the original OpenCV documentation.

import lcvcore 1.0
import lcvimgproc 1.0

Row{
    ImRead{
        id : src
        file : '/.image.jpg'
    }
    GaussianBlur {
        input : src.output
        ksize : "21x21"
        sigmaX : 5
        sigmaY : 5
    }
}

Page 11

Here is how the GaussianBlur function is declared in the OpenCV C++ documentation:
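void GaussianBlur(InputArray src, OutputArray dst, Size ksize,
                  double sigmaX, double sigmaY = 0,
                  int borderType = BORDER_DEFAULT);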

It is evident that the parameters ksize, sigmaX and sigmaY are the same (borderType is also available in LiveCV).

LiveCV interface

We will demonstrate how this code looks in the LiveCV GUI. We will start with a simpler example which only reads the image.

The LiveCV interface: on the top, control buttons for saving, loading, 'log' (for debugging purposes) and '-/+' (increase/decrease font size); on the left, a panel for writing the code (the QML code); on the right, the results are displayed.

Now we will add the Gaussian blur filter:


Page 12


The image and the blurred image are displayed side by side, since we define those two under a "Row" object.

Now comes the fun part :) including any change that you've made to the code, whether to sigmaX, sigmaY or ksize, or even specifying a different file to load. The effect of those changes will automatically appear on screen, to let you quickly learn, configure and interact with the algorithms and their parameters.

Let's see another quick example: the Canny edge detector. As with the GaussianBlur, the parameters are the same as in OpenCV. The input to the Canny is bound to the ImRead. Note that here we define the file name as a property string and not inside the ImRead section.

As before, on changing any of the Canny parameters (i.e. the thresholds), the result will immediately be reflected on screen. This lets you quickly evaluate whether edges can be extracted from the image, which edges will be easy to extract, which are subtler, and so on.

More elaborate examples include object recognition through feature detection. We won't go into all the details but will mention the most important ones. In general, image feature matching is a process that includes (1) finding features in the image, (2) describing those features, and (3) comparing those descriptors between images to find similar images or to detect objects in images.

Each step in this pipeline may involve many variants of algorithms, each with its own parameters. Any of those can be evaluated with LiveCV and, as before, the result is visible on screen as soon as the change is made.

"Code less. Create more. Deploy everywhere."

Page 13

In the example above, the first code section (// Query Image) starts by reading the image (ImRead) and detecting features with the FAST method (FastFeatureDetector). Other methods such as STAR or BRISK (and more) can be quickly evaluated simply by renaming the object from "FastFeatureDetector" to "ORBFeatureDetector" etc. Then follows the description of the detected features, performed with BriefDescriptorExtractor; alternatives include BRISK, ORB and more.

The second block of code starts with the FlannBasedMatcher (based on FLANN, the Fast Library for Approximate Nearest Neighbors), which finds matching points. The next block, DescriptorMatchFilter, filters out some matches by removing unrelated ones, such as far-distance matches. The last block, DrawMatching (implemented by a Qt plugin), is responsible for displaying the matching points of the two images (the query and the training image).

Interactive interface

In the last example we will show you an interactive interface for investigating image features. Here the image is shown on the left, and the features are marked with circles on top of the image. In addition, the feature points are listed with their values on the right. The feature points were detected and described by one of the point detection methods (e.g. FastFeatureDetector and BriefDescriptorExtractor). Clicking on one of the points on the left marks it with a big circle on top of the image: its values are displayed below, along with its histogram values.


“It works quickly and efficiently”

Page 14


The interested reader can find the code in the LiveCV package itself (in the samples section). The LiveCV package comes with abundant additional samples, including interactive components which let you tweak parameters from the UI - we strongly recommend you try them out.

Lastly, it's worth mentioning that you can write your own wrapper in case there is an OpenCV function that does not already exist in LiveCV. It is quite intuitive, and the instructions on how to do that can be found on the LiveCV website.

To summarize:

LiveCV is a tool which helps you learn and understand computer vision algorithms, with the ability to combine algorithms and progressively achieve the desired results. It has interactive components used to tweak parameters, and it also includes cross-platform support, executable scripts, and libraries to link and extend existing C++ algorithms.

It works quickly and efficiently. As a portable stand-alone, it doesn't require any installation and can be launched even from a USB drive, which is great for demonstration and teaching.

Note that LiveCV is not meant to replace a development platform (Visual Studio, for example): its purpose is quick prototyping and familiarization with basic computer vision elements. It can be used for demonstrations in a teaching environment and for getting a quick sense of how various algorithms behave on your own dataset of images.


Page 15

Project: Fully Automated Surveillance with PTZ Cameras

Every month, Computer Vision News reviews a successful project. Our main purpose is to show how diverse image processing applications can be and how different techniques can help solve technical challenges and physical difficulties. This month we review software for fully automated surveillance with PTZ cameras, developed for a client by RSIP Vision engineers. Do you have a project in computer vision and image processing? Contact our consultants.

Computer vision technologies have contributed major improvements to the field of surveillance. Starting from fully manual systems, in which videos were monitored by humans, the first step towards modern surveillance methods happened when software became able to monitor the video feeds of cameras: using predefined rules, alarms were automatically activated when intruders were detected entering specific areas.

The second step occurred recently, when big data and deep learning were introduced into this field. Big data made it possible to process the huge amount of information coming from cameras and sensors and to integrate it into a usable database. Deep learning enabled the recognition of patterns (behavioral and others) which could not be learned before, nor set by simple rules.

Thanks to these breakthroughs, we can use computers to track events which previously could not be brought to our attention. RSIP Vision has participated in this progress along both of these steps. Regarding the first step, we have developed algorithms and complex mathematical transformations to allow tracking intruders using a PTZ camera: this is a camera providing pan, tilt and zoom functions, so that humans or objects can always be kept in the center of the frame. Starting from a process fully based on human intervention, RSIP Vision's algorithms were able to automatically track the target intruder or event (unattended baggage, suspicious behavior) from the moment it was detected. On any movement of the target, the algorithm detects its new location inside the frame and adjusts the camera to keep it in the center. The algorithm can keep track of the target even in case of partial occlusion or temporary complete disappearance. Thanks to our work, security personnel can continue to observe the target as long as needed.
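As an illustration only (this is our sketch, not RSIP Vision's actual code, and every name in it is assumed), the recentering step can be written as converting the tracked bounding box's offset from the frame center into pan/tilt commands:

% Convert a target's offset from the image center into pan/tilt deltas.
function [panDeg, tiltDeg] = recenterCommand(bbox, imgSize, hfovDeg, vfovDeg)
cx = bbox(1) + bbox(3)/2;                % bbox = [x y width height]
cy = bbox(2) + bbox(4)/2;
dx = cx - imgSize(2)/2;                  % horizontal offset in pixels
dy = cy - imgSize(1)/2;                  % vertical offset in pixels
panDeg  =  dx / imgSize(2) * hfovDeg;    % small-angle approximation
tiltDeg = -dy / imgSize(1) * vfovDeg;    % image y axis points downwards
end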

Regarding the second step, the breakthrough is only beginning: technologies to process big amounts of information already exist, and engineers are gaining expertise in the use of deep learning to make further progress in the surveillance area. We shall learn more about these new developments in next month's Computer Vision News.


Image: DVTEL ioIMAGE on YouTube

Page 16

Women in Computer Vision: Dr. Imama Noor

We continue our series of interviews with women in computer vision. This new section, which we started with the CVPR Daily at CVPR 2016, hopes to help mitigate the severe gender imbalance in the computer vision community by getting to know better some remarkable female scientists and their career paths. We think that some of what they did might serve as an example for other young women who wish to enter this field. This month, we interview Dr. Imama Noor, who earned her PhD from the University of Memphis.

"My dream is to understand the complexities of the universe"

Computer Vision News: Where do you come from, Imama?

Imama Noor: I was born in northern Pakistan, in Islamabad. I came to the United States in 2009.

CVN: What are you currently doing?

Imama: In my work, I collaborate with people who are trying to solve different problems which can leverage machine learning and computer vision. The project I'm working on right now is developing a social robot: it will be used as a telepresence robot or an appliance to monitor elderly people.

It involves a lot of tasks where you have to know the position of the person and their daily activities, recognize the person, and develop something which people can quickly adapt to and not be afraid of.


Page 17

CVN: Who is the group behind this project?

Imama: I work with E-Motion Robotics. They are a startup based in Houston, Texas.

CVN: Do you think your role is in the industry?

Imama: I'm working in industry right now because industry has more resources when it comes to developing applications for consumers. Academia is more focused on theoretical problems and things which do not have immediate practical implementation.

I graduated in 2013 with a PhD and joined industry right away. I didn't go for academia. Though, in my long-term plan, at some point I might steer my career towards academia, when I'd like to slow down.

CVN: Did you ever consider teaching or mentoring?

Imama: I currently mentor high school kids who are interested in robotics and machine learning. They try to prototype or follow a certain blog for implementing a face detector. They don't have in-depth knowledge of what's going on, but they can follow a certain guide to implement a particular application.

CVN: Do you enjoy it?

Imama: It's interesting to compare their experience with mine, when I was in high school in Pakistan. Back then we didn't have a lot of exposure to advanced technology. It's interesting that these kids have that kind of exposure, and that they are already building applications, so that they might be able to do more things going forward.

CVN: Did you ever experience any difficulty in being a woman scientist?

Imama: Scientists in general are very rare in Pakistan because it's mostly an industry-based economy. There are very few opportunities for research. In engineering, women were always fewer than their male counterparts. In the US, I think it's more open. You do get acceptance if you are given that opportunity, so I think it's easier than in Pakistan.

There is a stereotype that women struggle to get accepted in the community. Males have the advantage of being perceived as engineers and scientists. If you are a female, you have to prove your abilities in a concrete line of work. Other fellows were able to get accepted without very significant contributions, but women have to put forth concrete contributions to get accepted and move up in the field.


Page 18


Women could provide a valuable resource in computer vision or machine learning. They should be treated equally. This may take some time, but I think the trend is positive. It should happen in our lifetime, in this generation or the next.

CVN: If you hadn't become a scientist, what would you have done?

Imama: If I hadn't become a scientist, perhaps I would have become an airplane pilot. I wanted to be a pilot when I was young. I think it's a different kind of challenge from what I do now. The representation of women is lower there as well, but now it's getting more automated and access will become easier. I would think there are similarities between the two, but doing what I do now I feel much more involved, compared to just operating a flying device.

CVN: What is your dream?

Imama: I am very intrigued by the concepts of space and time, and how there is uncertainty in both. I would like to find out what the truth behind space and time is, and understand the complexities of the universe. I don't think that I can achieve it, but this is my dream.

CVN: What would be needed in order to achieve it?

Imama: I keep on following updates in that field. We are still very far from understanding how the universe was created and where we are going. If I had this information, I would time travel.

CVN: What time would you like to visit?

Imama: If I could, I'd probably go into the future.

CVN: Isn’t that scary?

Imama: I think it would be useful. It's something which is beyond our imagination. It could be scary, but I think that shouldn't prevent you from trying.

CVN: You have worked a lot with space and time. With which of them do you like to work the most?

Imama: I think that time has interesting properties, but my expertise is mostly in space. I would like to explore time more, but I have had more opportunity to explore space. I never had the opportunity to explore time, but time looks more interesting to me.

"Into the future"

"It could be scary, but I think that shouldn't prevent you from trying"

Page 19

CVN: Which teacher was particularly meaningful to you?

Imama: My supervisor, Dr. Eddie Jacobs from the University of Memphis, was very helpful during my dissertation. He basically gave me the freedom to take any problem and figure out any improvements or other findings we could demonstrate.

CVN: Do you plan to go back to Pakistan one day?

Imama: After working more in industry in the United States, I may go back to Pakistan. The work is more advanced in the United States, and there are a lot of problems in Pakistan which can be solved using this kind of technology. When I have the opportunity to spend time working on those problems then I will, but not right now.

CVN: What are you passionate about?

Imama: One thing I'm really passionate about in technology is understanding how the brain functions, to be able to model that kind of functionality, and to help people with different disorders recover. I think that would be really helpful.

CVN: What if this technology ends up in the wrong hands?

Imama: Everything has positive and negative forces. I think it's the role of the law to make policies that prevent this from happening.

"Understand how the brain functions, be able to model that kind of functionality, and help people with disorders to recover"

Page 20

Application: Onfido - Identity Verification

Traditionally, the identity of a person has always been verified face to face: when you start to work somewhere, on the first day you would often show your passport or ID to your new employer. Then, they would visually check whether the photograph on your passport matches your face. These too may become old-time memories thanks to Onfido, a British startup that has developed a technology using machine learning to bring this whole process online and make it automated. In addition, it also prevents counterfeiters and fraudsters from creating sophisticated fakes.

Basically, the Onfido platform comes with many features that provide identity verification online, remotely. For example, it provides identity verification if someone wants to sign up for an online bank, become an Uber driver, clean houses, and other things which are done with no face-to-face interaction.

The basis of the machine learning is in document verification. If you hold up a passport or driver's license, the first step the app takes is verifying the document: for example, it checks whether it seems to be a genuine passport or a fraudulent one.

Onfido does much more than comparing user data with an existing database: it actually compares images to verify the person and the document. As a user, you enter your name, date of birth, and address; then you hold up your driver's license or passport, and the app takes an image of it and verifies that it is genuine. Then you record your face in a new image, and Onfido's platform compares the photo of your face with the photograph on the identity document, in order to tell whether your face matches the face on the document.

We asked Husayn Kassai, Onfido's very young CEO and co-founder, about the application's approach: it primarily uses deep learning so that, based on all of the historic documents already checked, the app knows if there are any anomalies. If any fraudulent document is found during the verification, the system learns all of its patterns and traits.

"Comparing images to verify the person and the document"

Page 21

In that way, if similar patterns emerge on documents in the future, the app will give an additional alert to its clients. Then clients can double-check and focus more resources on the individuals that pose a higher risk.

Kassai told us that other identity verification providers usually work manually: each time an image comes up on the data processors' computer screens, they compare these templates by hand. Instead, Onfido verifies these documents using an automated process with six layers of verification.

The first layer is Optical Character Recognition. When a passport comes through the system, the latter uses OCR to extract all of the characters and verify whether they are correct.

A good thing with passports and many other identity documents is that you can do a parity check. Passports usually contain a standardized machine readable zone at the bottom, which includes a parity check, so you can tell whether the extracted passport number and expiration date are consistent with the passport's machine readable zone. That is like an internal consistency check.
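To make the parity check concrete, here is a minimal sketch (ours, not Onfido's code) of the standard ICAO 9303 check digit used in machine readable zones: characters map to values ('0'-'9' to 0-9, 'A'-'Z' to 10-35, the filler '<' to 0), are multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. A stored check digit that differs from the recomputed one reveals an internally inconsistent document.

function d = mrzCheckDigit(field)
% ICAO 9303 check digit over an MRZ field, e.g. a passport number.
vals = zeros(1, numel(field));
for k = 1:numel(field)
    c = field(k);
    if c >= '0' && c <= '9'
        vals(k) = c - '0';               % digits keep their value
    elseif c >= 'A' && c <= 'Z'
        vals(k) = c - 'A' + 10;          % letters map to 10..35
    end                                  % the filler '<' stays 0
end
weights = [7 3 1];
w = weights(mod(0:numel(field)-1, 3) + 1);   % repeat 7,3,1 along the field
d = mod(sum(vals .* w), 10);
end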

Whether the app checks with authority databases or with an autonomous system depends very much on the country. There are two types of databases you can check. The first type of database is one which allows you to check whether the government issued that document or not. In India, for example, there is a Unique Identification Authority (UIDAI), and Onfido checks the Indian identity card (called Aadhaar) by extracting its number and sending it to the government's API database. Then, they receive a confirmation as to whether the name, date of birth, and Aadhaar card number were issued by them or not.

The process is the same in the 15 states across the US which allow one to go through it via the Department of Motor Vehicles (DMV). They also do it in the UK and in Brazil (with the CPF numbers). On the other hand, Onfido does not need to check passports with anyone elsewhere in the world, because consistency and parity checks are built into the machine readable zone. This is one type of database search, which validates whether or not the issuing body had issued that document.

The other databases that they search are police databases, which depend on each different country where Onfido establishes relationships with law enforcement authorities. Every time the police raid a warehouse with thousands of fake passports, they extract the numbers and communicate them to Onfido. When the latter finds fake documents, they report them to the police as well.

Husayn Kassai, Onfido's CEO and co-founder

Page 22

This doesn't happen in the USA, however, since they don't share this information, which according to Kassai is a big shame. Such exchanges are much better accepted in Europe.

Of course, there are different levels of fake documents: poor quality fakes, which are regularly caught; moderate quality fakes, which are sometimes caught; and sophisticated fakes, which the client reports to Onfido, giving them precious feedback for the future. Essentially, the more checks they do, the more robust the app becomes, which is why technology and machine learning work better than humans. It is so much more robust.

The specific algorithms that help solve these challenges have come about after four years of research, during which they tried different techniques and found the ones that work best. The technology uses an ensemble of techniques to get the needed results, ranging from what would be considered more classical machine learning and computer vision techniques (SVMs, feature extraction and the like) to the use of deep learning, and combinations of these methods. The nature of their work, Kassai explains, often means that they are dealing with a considerable amount of uncertainty, and they are often balancing this risk and reward with the techniques that they are using.

Their goal in doing that is to create an end-to-end trainable system. However, to achieve that goal they had to break the problem into a number of smaller problems, which they iterate over before pulling the solutions together to create the end-to-end approach. This consolidation process may sometimes be quite painful: the team may throw away weeks or months of work and fine-tuning to replace it with a more generalised model or approach. However, Onfido feels that it is all part of the process, when the system is constantly evolving and growing. Onfido makes use of a huge variety of techniques, including convolutional networks, LSTMs, SVMs and many more.

Privacy is a very big concern with all identity verification systems. When asked what happens if an individual is concerned about exposing his or her own personal documents, Kassai replies:

"There are two things that happen. One is that we only check all persons with their consent. When you sign up with an online bank, a remittance platform, or with any client of ours, we first take your consent before we are able to verify you.

"The second thing is that most people understand that in order to open a bank account, you have to show up at the bank, queue up and show your details. It takes a long time. Whereas, if you want to do it online, then there needs to be a robust way to do it. Otherwise there would be so much fraud that it wouldn't happen.

Machine learning engineers at Onfido

Page 23


"Equally, if you have a tutor who comes to your house to tutor your child, you want to be sure that the tutor has been checked. Likewise, you want to check the person with whom your daughter wants to share a car on BlaBlaCar.

"Most people recognize that in order to be a part of the new economy, there is a need for verification. However, we always ensure that security is of the utmost importance. Penetration testing, ISO 27001: we adhere to all these standards to ensure security. In addition, everyone gives their full consent before they are checked. Finally, data is deleted after the check and it does not enter any database: we only keep a record of fraudulent data and documents. That is for fraud prevention."

Kassai admits that the system is exposed to false positives, but this happens nearly always because the quality of the document isn't good enough. Instead of giving a definitely positive or negative judgement on a document, they use a score. Usually, when the quality of the image isn't good enough, it may well be a false positive. In this case, the client asks the person being checked to submit another form of document, to re-upload or resend documents, or to go to the branch for face-to-face verification. This essentially means that only 2 or 3 out of every 100 people need to go through a more rigorous manual process, as opposed to all of them, which saves a lot of time.

When they suspect that it may be a compromised document, they proceed with double-checking: 90 to 95% of the documents that come through pass the verification successfully. Out of the other five, two may be problematic and three may be questionable, so they only need to double-check these ones.

Kassai remarks that in the primary elections in the US, people needed to stand in line for 6 to 9 hours in order to vote, while all this could now be done online. As soon as you are able to robustly verify people online, it opens up solutions to many problems, Kassai explains. This includes online voting, the gun control problem, or even the Syrian refugee crisis: the project is to work with governments so that they can verify someone in Jordan as a refugee. When they land in the US, it can be verified that they are the same person. It alleviates all the concerns that during the transition they may steal someone else's identity.

Onfido also gives much importance to the work they do with banks around the world. There are 2 billion people, representing almost 1/3 of the world's population, who don't have a bank account or any access to finance and the global economy. The reason for that is ridiculous: they are not able to verify their identities, so no one will give them credit or debit cards, cell phones, etc. But there is now a new wave of companies, such as PayJoy or Pockit, that are extending these kinds of services. Pockit, for example, will give anyone in the world a prepaid debit card. Onfido can capture somebody's face and passport in an emerging country, at a library or anywhere with a computer. That's all that is needed to verify location and identity in order to sign up and use this service.

"As soon as you are able to robustly verify people online, it opens up solutions to many problems"

Page 24


When migrants come to the US or the UK, they have to wait 6 months before they are on credit reference agencies, can be verified, get a house, and start their lives, because there is no history of them in that country. Whereas Onfido can instantly verify them, so they can start working for Uber or whoever else much faster.

Onfido claims to be the only provider that combines all the elements of this check to make it robust, and the only provider that uses machine learning and computer vision for document verification. It's the only way to make it robust so it can't be cheated.

Onfido is managed by a young co-founding team of three people, who left university and started the company four years ago. At the time, when they tried to sell to their first customers, Kassai was only 22 years old. It is said that one of those prospects wouldn't even believe that he was 18: "How can we trust you and buy anything from you, if we're not even sure that you are of legal age!" Joking or not, they later became Onfido clients.

Onfido is constantly growing and looking for people interested in joining a research project or having something to contribute (in Lisbon, London, and San Francisco).


The Onfido Dashboard. Onfido has also been supported with 1 M€ by the Eurostars 2020 program of the European Commission.

"The only provider that uses machine learning and computer vision for document verification"

Page 25

Spotlight News

Computer Vision News lists some of the great stories that we have just found somewhere else. We share them with you, adding a short comment. Enjoy!

3 OCR Libraries for Java Compared: Which is the Winner?
Gábor Vecsei did great work comparing Tesseract, Asprise and Google Cloud Vision, three Optical Character Recognition libraries for Java. Want to know which library is the winner? Read it here and find the code here. You might be tempted to read other articles in this blog…


The Extraordinary Link Between DNN and the Nature of the Universe
How do you explain that Deep Neural Networks are better than humans at tasks such as face recognition and object recognition? There seems to be no mathematical reason why networks arranged in layers should be so good at them. Or so it seemed until now: Henry Lin (Harvard) and Max Tegmark (MIT) found out that the answer comes from physics, the nature of the universe and the hierarchy of its structure. Read this fascinating work here!

Engineers Teach Machines to Recognize Tree Species
Did you know that data from satellite and street-level images (like Google Maps') is used to automatically create an inventory of street trees? The California Institute of Technology can help municipalities manage the urban stock of trees. Other practical applications of computer vision and image processing in forestry exist. Read…

Salesforce Offers CRM Artificial Intelligence for Business
Marc Benioff, chairman and CEO of Salesforce, thinks that the availability of Einstein - which embeds AI capabilities across every Salesforce Cloud - will make Salesforce the world's smartest CRM. Salesforce Einstein leverages all customer, activity, social and IoT data to train predictive models powered by advanced machine learning, deep learning, predictive analytics, natural language processing and smart data discovery. Read…

What's in the Photo? Google's Caption Tool now Open-source
Show and Tell is Google's automated captioning system, which learns to identify and describe photos. Now it is available for open-source use with TensorFlow, Google's open machine-learning framework. Read…

See also this lovely video from MIT CSAIL, consult this excellent list of TensorFlow resources and don't miss this report about NASA's 3D mapping of the Sun's edge.

Page 26

Challenge: Data Science Bowl - Transforming How We Diagnose Heart Disease

Every month, Computer Vision News reviews a challenge related to our field. If you can't find time to read challenges, but are interested in the new methods proposed by the scientific community to solve them, this section is for you. This month we have chosen to review the Second Annual Data Science Bowl, intended to catalyze a change in cardiac diagnostics: Transforming How We Diagnose Heart Disease. The website of the challenge, with all its related resources, is here; the Kaggle page is here.

Background

The Second Annual Data Science Bowl, created and sponsored by Booz Allen Hamilton with Kaggle, was designed to take action to transform how we diagnose heart disease. Thousands of people are diagnosed every day with heart failure, a life-threatening event. Data science applied to cardiology can help physicians save more lives.

Declining cardiac function is a key indicator of heart disease: assessing the heart's squeezing ability can therefore give clues about the heart's condition, enable an early diagnosis and improve the effectiveness of heart disease treatment.

Two properties need to be measured: the end-systolic and end-diastolic volumes (i.e., the size of one chamber of the heart at the middle and at the beginning of each heartbeat, respectively). From these two measures, it is possible to calculate the ejection fraction (EF), which is the percentage of blood ejected from the left ventricle at each heartbeat.
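In formula form, with EDV and ESV denoting the end-diastolic and end-systolic volumes:

EF = 100 × (EDV − ESV) / EDV

For example, a left ventricle that fills to 160 ml and contracts down to 60 ml has an EF of 100 × (160 − 60) / 160 = 62.5%, within the normal range mentioned in the results below.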

To learn more about the medical considerations behind these measures, see the study by RSIP Vision on cardiac left ventricle segmentation.


Page 27

Challenge

Magnetic Resonance Imaging (MRI) is the reference standard used to assess those measures, also due to the reproducibility of making these important measurements.

The challenge is to help replace the slow manual process used by the doctor to derive the ejection fraction with an automated procedure, which would arrive at the same conclusion in a more efficient way. Participants were requested to create an algorithm which would automatically measure end-systolic and end-diastolic volumes in MRIs, examining cardiac images from more than 1,000 patients.

Results

A relatively short development time was allotted for the challenge; nonetheless, the top teams obtained excellent results. This is particularly notable for those who hand-labeled patient data without being trained physicians. You can read here further medical perspectives on the results of this challenge by Andrew Arai, MD (NIH, Bethesda).

Coming to the analysis of the leading teams' submissions: an EF under 35% is a dire emergency, around 60% is normal, and above 73% is considered hyperactive. The good news is that the leading models reviewed keep the diagnosis categories quite tightly grouped together: even though they are not always perfectly precise, there is a very low probability that an emergency EF will be incorrectly categorized in the mild or the average categories. The normal to mild diagnoses are very likely to stay within their domain of the matrix.

Michael Hansen, co-PI (Principal Investigator) for this Data Science Bowl, noted that the best models can fail, but they should "fail loudly": in other words, they should be able to predict their own level of accuracy, so that when the system flags the confidence in a prediction as insufficient, it is possible to consider retaking the measurement. Results show low correlations between the model error and "confidence" distributions, suggesting that further improvement could be made in assessing the prediction value.

"The leading models reviewed keep the diagnosis categories quite tightly grouped together"


Data science applied to cardiology can help physicians save more lives

Page 28


Deformable Part Models are CNNs

Every month, Computer Vision News reviews a research paper from our field. This month we have chosen to review Deformable Part Models are Convolutional Neural Networks, a research paper showing a synthesis of these two widely used tools for visual recognition. In fact the paper, presented at CVPR 2015, shows that a deformable part model (which is a graphical model) can be formulated as a convolutional neural network. We are indebted to the authors (Ross Girshick, Forrest Iandola, Trevor Darrell and Jitendra Malik) for allowing us to use their images to illustrate this review. The full paper is here and the source code is here.

Background:
Deformable Part Models (DPMs) and Convolutional Neural Networks (CNNs) are two common tools for visual recognition. They are typically viewed as distinct approaches: DPMs are graphical models (Markov random fields), while CNNs are "black-box" nonlinear classifiers.

DPMs typically operate on a scale-space pyramid of gradient orientation feature maps (HOG). Nowadays, state-of-the-art object detection is done with deep convolutional networks. The authors propose a new method, DeepPyramid DPM, in which the HOG features are replaced with features learned by a fully-convolutional network.

Challenge:
The challenge is to replace the HOG features with their CNN counterpart and to find the optimal way of mapping each step in the DPM inference algorithm to an equivalent CNN layer.

Novelty:
The implementation details of CNNs are time-consuming and challenging to set up correctly. As a result, HOG-based detectors are still used in a wide range of systems, especially where region-based methods (e.g. poselets) are involved. The DeepPyramid DPM should therefore be of broad practical interest to the visual recognition community. In addition, DPMs are usually thought of as flat models, but DeepPyramid DPM suggests that DPMs actually have a second, implicit convolutional layer.

Method:
A DeepPyramid DPM is a convolutional network that takes as input a color image pyramid and produces as output a pyramid of object detection scores. We will start by defining the input and output for both training and testing, as well as the datasets used.

Page 29


The DeepPyramid DPM model overview (figure below): (1) an image pyramid is built from the color input image; (2) each pyramid level is fed through a truncated SuperVision/AlexNet that ends at convolutional layer 5 (conv5); (3) the depth of the conv5 feature map pyramid is 256, as in SuperVision/AlexNet; (4) each conv5 level is fed into a DPM-CNN (more details in the next paragraph), which (5) produces a pyramid of DPM detection scores.

A single-component DPM-CNN operates on a feature pyramid level: (1) the pyramid level is convolved with the root filter and the P part filters, generating P+1 convolution maps; (2) those are processed with a distance transform (DT) pooling layer and (3) a sparse object geometry filter. The output is a single-channel score map for the DPM component.

           Training                             Testing
Input      Set of images; a trained CNN        Color image pyramid
           (SuperVision/AlexNet)
Output     Optimized parameters: distance      DPM score map for the
           transform (DT), geometry filter     input pyramid level
Dataset    PASCAL VOC 2007                     PASCAL VOC 2010-2012


Distance transform (DT) pooling is a generalization of the familiar max-pooling operation, while the object geometry layer encodes the relative offsets of the DPM parts. More details about both can be found in the paper.
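To give a feel for the operation, here is a brute-force 1-D illustration (the deformation coefficients a and b are made up): each output position takes a max over all input positions of the score minus a deformation cost, and a very large cost suppresses any deformation. The classic generalized distance transform computes the same result in linear time; this O(n²) loop is only for clarity:

import numpy as np

def dt_pool_1d(scores, a=0.1, b=0.0):
    """DT pooling: out[i] = max_j (scores[j] - a*(i-j)**2 - b*|i-j|)."""
    positions = np.arange(len(scores))
    out = np.empty_like(scores, dtype=float)
    for i in range(len(scores)):
        d = np.abs(i - positions)
        out[i] = np.max(scores - a * d ** 2 - b * d)
    return out

scores = np.array([0.0, 2.0, 0.0, 0.0, 5.0, 0.0])
print(dt_pool_1d(scores))         # peaks "spread" to neighbouring positions
print(dt_pool_1d(scores, a=1e6))  # huge deformation cost: output ~ scores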

Lastly, a multi-component DPM-CNN is composed of one DPM-CNN per component and a maxout layer that takes a max over the component outputs at each location.
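The maxout step itself is simply a per-location max across the components' single-channel score maps; a small numpy sketch:

import numpy as np

def maxout_components(component_score_maps):
    """component_score_maps: list of (H, W) score maps, one per DPM component."""
    # At every location, keep the best-scoring component.
    return np.max(np.stack(component_score_maps), axis=0)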

The model is trained in two stages: first, the front-end CNN is fitted; second, the DPM-CNN is trained using latent SVM while keeping the front-end CNN fixed.
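As a rough sketch of the second stage only, the toy update below trains a root-only model with latent SVM, where the latent variable is the best-scoring location in a fixed conv5 feature map. The function name, constants and the 1×1 "root filter" are illustrative simplifications, and the authors' actual procedure is considerably more involved:

import numpy as np

def latent_svm_epoch(w, feature_maps, labels, C=0.01, lr=1e-3):
    """One subgradient pass. w: (D,) 1x1 root filter; phi: (H, W, D); y in {-1, +1}."""
    for phi, y in zip(feature_maps, labels):
        scores = phi @ w                                          # score every location
        i, j = np.unravel_index(np.argmax(scores), scores.shape)  # latent placement
        grad = w.copy()                      # gradient of the 0.5*||w||^2 regularizer
        if y * scores[i, j] < 1:             # hinge loss is active for this example
            grad -= C * y * phi[i, j]        # subgradient of the hinge term
        w = w - lr * grad
    return w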

Results: The authors start by comparing HOG with the conv5 features at different pyramid levels. Next, they evaluate different DPM-CNN settings and compare them to R-CNN.

HOG vs. conv5

The input is the same face image (left) for both rows. The first row shows a HOG feature pyramid, while the second shows the “face channel” of a conv5 pyramid. At every HOG pyramid level, an appropriately sized template could detect the face. In contrast, the conv5 face channel is scale selective: it is almost entirely zero (black) in the first pyramid level and peaks in level 6.

Evaluated DPM-CNN settings:

“Can any DPM be formulated as an equivalent CNN?”


Lines 1-3: The results start with a “root-only” model (i.e., no parts) and then show results after adding 4 or 8 parts to each component. With 4 parts, mAP increases by 0.9 percentage points.

Line 4: The hypothesis is that the convolution filters in conv5 already act as a set of shared “parts” on top of the conv5 features. This idea is implemented by applying a 3×3 max filter to conv5 (“max5”) and then training a root-only DeepPyramid DPM with three components, achieving 45.2% mAP. These are the best results of this section, supporting the hypothesis. The hypothesis is also evidenced in the face heat maps above (HOG vs. conv5), in that conv5 selects specific visual structures at their locations and scales (the face responding at a specific conv5 level).

Lines 5-6: The HOG-DPM baseline uses 6 components and 8 parts (33.7% mAP); removing the parts decreases its performance to 25.2%. Neither version performs as well as DeepPyramid DPM max5.

Line 7: Comparison with the recently proposed R-CNN [Girshick et al., CVPR 2014]. The R-CNN variant compared here uses pool5 features without fine-tuning.

Lines 8-9: Fine-tuned R-CNN clearly outperforms DeepPyramid DPM. However, DeepPyramid DPM runs about 20x faster.

The R-CNN results (lines 7-9) suggest that the gains from fine-tuning come in through the nonlinear classifier (layers fc6 and fc7) applied to the pool5 features. This suggests that similar levels of performance might be achievable with DeepPyramid DPM through the use of a more powerful nonlinear classifier than the SVM.


Method                               Components   Parts   mAP (train)   mAP (test)
DP-DPM conv5                         3            0       43.3          N/A
DP-DPM conv5                         3            4       44.2          -
DP-DPM conv5                         3            8       44.4          -
DP-DPM max5                          3            0       45.2          42.0
HOG-DPM                              6            0       25.2          N/A
HOG-DPM                              6            8       33.7          33.4
R-CNN pool5 [Girshick, CVPR 2014]    N/A          N/A     44.2          N/A
R-CNN FT fc7 [Girshick, CVPR 2014]   N/A          N/A     54.2          50.2
R-CNN FT fc7 BB                      N/A          N/A     58.5          53.7

Sum up: For decades, visual recognition models have made wide use of part-based representations, such as deformable part models (DPMs). In recent years, good performance on image classification, object detection and a wide variety of other vision tasks has made CNNs extremely popular among researchers. DPMs and CNNs are generally viewed as distinct approaches to visual recognition, the former being graphical models and the latter being nonlinear classifiers. The question asked by this paper is whether these models are actually distinct, or whether it can be shown that any DPM can in fact be formulated as an equivalent CNN. To do so, the authors introduce the notion of distance transform pooling and the object geometry layer. They find that DeepPyramid DPMs significantly outperform DPMs based on histograms of oriented gradients (HOG) features and slightly outperform a comparable version of the R-CNN detection system, while running significantly faster.

Upcoming Events

EVIC - IEEE Latin-American Summer School on Computational Intelligence

The School of Engineering and Applied Sciences of the Universidad de los Andes (Santiago de Chile), jointly with the Chilean Chapter of the IEEE Computational Intelligence Society, is organizing EVIC, the Latin-American Summer School on Computational Intelligence (December 14-16).

Thanks to José Delpiano, we have been able to ask Pablo Zegers, General Chair of the Summer School, to tell us more.

CVN: How many students do you expect at the Summer School, and what percentage of international students?

Pablo Zegers: We expect more than 100 people to attend EVIC (the Spanish acronym for the meeting: Escuela de Verano en Inteligencia Computacional). Most of them will be students, with a few attendees coming from industry and also faculty members. Most of the students are Chilean, both from Santiago and the regions; the usual ratio of international students is around 10%.

CVN: What is in this program that students will not find anywhere else?

Zegers: This event is unique in Latin America. Every edition of the Summer School has attracted world-renowned researchers in the field of computational intelligence as keynote and tutorial speakers. Chile is an appealing travel destination for researchers around the world, as the record-breaking attendance at the last International Conference on Computer Vision (ICCV) showed the organizers.

CVN: This is already the 12th edition of this program: what has changed since the first edition and what has remained the same?

Zegers: Since the first edition, the world-class level of keynote and tutorial speakers has remained. We continue to aim at covering current research and applications of neural networks, fuzzy logic, and evolutionary computation. Nevertheless, the selection of relevant topics has changed with the interests of computational intelligence researchers, providing a level of novelty to each edition. This year, the program focus will be on analytics and deep learning.



ECCV - European Conference on Computer Vision: Amsterdam, Netherlands, Oct. 8-16

RE•WORK Women in Machine Intelligence & Healthcare: London, UK, Oct. 12

MICCAI - Medical Image Computing and Computer Assisted Intervention: Athens, Greece, Oct. 17-21

RE•WORK Deep Learning Summit: Singapore, Oct. 20-21

International Conference on Computer Vision and Image Processing: Malaga, Spain, Oct. 20-21

ACIVS - Advanced Concepts for Intelligent Vision Systems: Lecce, Italy, Oct. 24-27

HCOMP 2016 - Human Computation: Austin TX, USA, Oct. 30-Nov. 3

RE•WORK Machine Intelligence Summit: New York NY, USA, Nov. 2-3

International Conference on Computer Vision and Image Processing: Paris, France, Nov. 21-22

ACCV'16 - The 13th Asian Conference on Computer Vision: Taipei, Taiwan, Nov. 20-24

International Conference on Pattern Recognition and Computer Vision: London, UK, Nov. 24-25

DICTA 2016 - Digital Image Computing: Techniques and Applications: Gold Coast, Australia, Nov. 30-Dec. 2

ICPR 2016 - International Conference on Pattern Recognition: Cancun, Mexico, Dec. 4-8

ISVC 2016 - International Symposium on Visual Computing: Las Vegas NV, USA, Dec. 12-14

IPTA 2016 - Image Processing Theory, Tools and Applications: Oulu, Finland, Dec. 12-15

ICCVISP 2016 - Computer Vision, Image and Signal Processing: London, UK, Dec. 15-16

ICVGIP 2016 - Computer Vision, Graphics and Image Processing: Guwahati, India, Dec. 18-22

Did we miss an event? Tell us: [email protected]

Dear reader,

Do you enjoy reading Computer Vision News? Would you like to receive it for free in your mailbox every month?

You will fill the Subscription Form in less than 1 minute. Join many other computer vision professionals and receive all issues of Computer Vision News as soon as we publish them. You can also read Computer Vision News on our website, where you will find new and old issues in our archive as well.

We hate SPAM and promise to keep your email address safe, always.

FREE SUBSCRIPTION

Subscription Form (click here, it’s free)


Project Management Tip

Full Kit of Requirements

Our CEO Ron Soferman has launched a series of lectures to provide a robust yet simple overview of how to ensure that computer vision projects respect goals, budget and deadlines. This month, we will learn about the Full Kit of Requirements.

One of the principles in project and software development is the need for a full kit of requirements before starting the actual development.

It may happen that the image processing team is given many tasks with tight deadlines in order to support the roadmap of product development. This often leads the process into a bottleneck.

If the project is not launched in a systematic way, it might take several development cycles until the desired product is complete. In many cases, the reason is that the full specifications of the product are not well defined from the beginning: the development team is eager to start its work, managers are determined to keep up with the schedule, and everyone wants to start as soon as possible. But without a full kit including all the detailed requirements needed for the development, deadlines can't be met.

Full kit for research: Sometimes, project managers sense that full requirements cannot be defined upfront because of technical and algorithmic unknowns. General product requirements are probably known, but they may keep fluctuating until the software development step.

Nevertheless, I recommend that the full kit in the research stage include everything that is required for research: complete and realistic data in sufficient quantity, without which the developed algorithm might behave differently when confronted with real data. The research dataset should include all scenarios, especially in the medical field, where an algorithm tested only on data from healthy people might be useless when used on patients with pathologies. Collecting relevant data and analysing its possible appearances is a difficult but necessary task.

The reason for the need of a full kit in research is that the algorithm must take all cases into account. With only limited examples, it will be a naïve algorithm that will not hold when used on the full variety of real data.

“Without a full kit including all the detailed requirements, it might take several development cycles until the product is complete”


“Include complete and realistic data in sufficient quantity”