Wits presentation 6_28072015

Object Recognition Tutorial
Beatrice van Eden
- Part-time PhD student at the University of the Witwatersrand.
- Full-time employee of the Council for Scientific and Industrial Research.


TRANSCRIPT

Page 1: Wits presentation 6_28072015

Object Recognition Tutorial
Beatrice van Eden

- Part-time PhD student at the University of the Witwatersrand.
- Full-time employee of the Council for Scientific and Industrial Research.

Page 2:

Research Problem

• Hierarchical concept formation

• This research will allow a robot to learn about its environment autonomously

• Build a concept of these environments, even if the robot has not seen that specific instance previously

Page 3:

Why Object Recognition

• Environments are built up of different objects
• RGB-D sensor for perception
• Concept formation needs some baseline to work from
• Exposure to ML techniques:
  • Cascading classifiers
  • Convolutional Neural Networks
  • Support Vector Machines

Page 4:

Index: Cascading Classifiers

• Cascading classifiers
• Haar-like features
• Local binary patterns
• Implementation
• Results

Page 5:

Cascading classifiers

• Cascading is a particular case of ensemble learning based on the concatenation of several classifiers: all information collected from the output of a given classifier is used as additional information for the next classifier in the cascade.
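The idea can be sketched in a few lines of Python. This is a toy illustration of the cascade structure, not OpenCV's implementation; the stage tests and feature names below are made up for the example:

```python
# Minimal sketch of a cascade of classifiers. Each stage is a cheap
# binary test; a window is accepted only if it passes every stage,
# so most negative windows are rejected early and cheaply.

def cascade_classify(window, stages):
    """Return True only if `window` passes every stage in order."""
    for stage in stages:
        if not stage(window):
            return False  # early rejection: later stages never run
    return True

# Hypothetical stages operating on a dict of precomputed features,
# ordered from cheapest to most expensive.
stages = [
    lambda w: w["mean_intensity"] > 0.2,   # very cheap first test
    lambda w: w["edge_energy"] > 0.5,      # slightly more expensive
    lambda w: w["template_score"] > 0.9,   # most expensive, runs rarely
]

print(cascade_classify(
    {"mean_intensity": 0.4, "edge_energy": 0.7, "template_score": 0.95},
    stages))  # True
print(cascade_classify(
    {"mean_intensity": 0.1, "edge_energy": 0.7, "template_score": 0.95},
    stages))  # False: rejected by the first stage
```

The ordering is the point of the design: the cheap early stages discard most sub-windows, so the expensive late stages only ever see promising candidates.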

Page 6:

Haar-like features

• A Haar-like feature is the difference of the sums of pixel values in adjacent rectangular areas of the image.
• These values indicate certain characteristics of a particular area of the image.

Page 7:

Haar-like features

• The Viola-Jones detector is a strong binary classifier built from several weak detectors.
• Each answers: does a certain sub-region of the original image contain an instance of the object of interest or not?

Page 8:

Local binary patterns

• Divide the examined window into cells (e.g. 16x16 pixels for each cell).
• For each pixel in a cell, compare the pixel to each of its 8 neighbours (on its left-top, left-middle, left-bottom, right-top, etc.). Follow the pixels along a circle, i.e. clockwise or counter-clockwise.
• Where the centre pixel's value is greater than the neighbour's value, write "1". Otherwise, write "0". This gives an 8-digit binary number.
• Compute the histogram, over the cell, of the frequency of each "number" occurring.
• Optionally normalize the histogram.
• Concatenate the (normalized) histograms of all cells. This gives the feature vector for the window.
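The steps above can be sketched in NumPy. This is a simplified illustration following the comparison rule as stated (centre greater than neighbour writes "1"); production LBP implementations vary in conventions and speed tricks:

```python
import numpy as np

def lbp_code(cell, y, x):
    """8-bit LBP code for pixel (y, x): compare the centre to its 8
    neighbours, clockwise from the top-left, writing 1 where the
    centre value is greater than the neighbour's."""
    c = cell[y, x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise circle
    bits = ["1" if c > cell[y + dy, x + dx] else "0" for dy, dx in offsets]
    return int("".join(bits), 2)

def lbp_histogram(cell):
    """Normalized 256-bin histogram of LBP codes over one cell's interior."""
    h, w = cell.shape
    codes = [lbp_code(cell, y, x)
             for y in range(1, h - 1) for x in range(1, w - 1)]
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()  # the optional normalization step

cell = np.arange(16.0).reshape(4, 4)  # toy 4x4 "cell"
print(lbp_histogram(cell).sum())  # 1.0 (normalized)
```

Concatenating such histograms over all cells of the window yields the final feature vector.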

Page 9:

Local binary patterns

• A powerful feature for texture classification.
• LBP is faster but less accurate than Haar.
• LBP does all its calculations in integers; Haar uses floats.
• LBP trains in a few hours; Haar can take a few days.

Page 10:

Implementation

• SAMPLES - How many images do we need?
• It depends on a variety of factors, including the quality of the images, the object you want to recognize, the method used to generate the samples, the CPU power you have and probably some magic.
• Positive images: 50 -> 1500, listed in a .txt file.
• Negative images: 1500, listed in a .txt file.

Page 11:

Implementation

• Create samples with OpenCV: generate a large number of positive samples from our positive images by applying transformations and distortions. A Perl script was used to combine each positive image with negative images.
• *.vec files are created; merge them into one.
• opencv_haartraining and opencv_traincascade: opencv_traincascade supports both Haar [Viola2001] and LBP [Liao2007] (Local Binary Patterns) features.
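On the command line, the workflow above might look roughly as follows. This is a sketch: the file names, window size and sample counts are placeholders, and the flags are those of OpenCV's bundled training tools:

```shell
# 1. Generate distorted positive samples over background images
#    (positive.jpg, negatives.txt and the counts are placeholders).
opencv_createsamples -img positive.jpg -bg negatives.txt \
    -vec samples.vec -num 1500 -w 24 -h 24

# 2. Train the cascade; -featureType LBP selects LBP instead of Haar.
opencv_traincascade -data classifier/ -vec samples.vec -bg negatives.txt \
    -numPos 1400 -numNeg 1500 -numStages 20 -featureType LBP -w 24 -h 24
```

Note that -numPos is usually set somewhat below the number of samples in the .vec file, since each stage consumes a few extra positives.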

Page 12:

Implementation

Page 13:

Implementation

• http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
• Video: LBP – Coke Can
• Video: Haar – Coke Can
• Video: LBP – Face recognition
• Choose the number of stages to train.

Page 14:

Results

• To be generated – working on confusion matrix.

Page 15:

Index: CNN

• Convolutional Neural Networks
• Example
• Overview and Intuition
• Implementation
• Results

Page 16:

Convolutional Neural Networks

• Neural network vs. convolutional neural network
• Layers used to build ConvNets:
• Convolutional Layer, Pooling Layer, and Fully-Connected Layer (exactly as seen in regular Neural Networks).

Page 17:

Example

• Input:
  • Image: width 32, height 32, three colour channels. [32x32x3]
• CONV layer:
  • Local filters over the previous layer.
  • Dot product between the weights and a sliding region in the input volume. With 12 filters: [32x32x12]
• RELU layer:
  • Apply an elementwise activation function, such as the max(0,x) thresholding at zero. This leaves the size of the volume unchanged.
• POOL layer:
  • Downsampling operation along the spatial dimensions (width, height). [16x16x12]
• FC layer:
  • Compute the class scores. As with ordinary Neural Networks, each neuron in this layer is connected to all the numbers in the previous volume.
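The shape bookkeeping in this example can be checked with a naive NumPy sketch. The weights are random and the 3x3 filter size is an assumption (the slide only fixes the depths); only the shapes matter here:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))          # input: 32x32 RGB image
filters = rng.standard_normal((12, 3, 3, 3))  # 12 filters of size 3x3x3

def conv(x, filters):
    """Zero-padded, stride-1 convolution: each output value is the dot
    product of one filter with a 3x3 sliding region of the input."""
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero-pad width and height
    out = np.empty((h, w, len(filters)))
    for i in range(h):
        for j in range(w):
            region = xp[i:i + 3, j:j + 3, :]
            out[i, j, :] = [(f * region).sum() for f in filters]
    return out

def relu(x):
    return np.maximum(0, x)  # elementwise max(0, x); shape unchanged

def max_pool(x):
    """2x2 max pooling: halves width and height, keeps depth."""
    h, w, d = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, d).max(axis=(1, 3))

y = max_pool(relu(conv(x, filters)))
print(conv(x, filters).shape)  # (32, 32, 12)
print(y.shape)                 # (16, 16, 12)
```

A fully-connected layer would then flatten the 16*16*12 = 3072 numbers and connect every one of them to each output neuron.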

Page 18:

Convolutional Neural Networks

• CNN is a type of feed-forward artificial neural network where the individual neurons are tiled in such a way that they respond to overlapping regions in the visual field.

Page 19:

Overview and Intuition

• A CONV layer's parameters consist of a set of learnable filters.
• Every filter is small spatially (along width and height), but extends through the full depth of the input volume.
• As we slide the filter across the input, we compute the dot product between the entries of the filter and the input.
• Intuitively, the network will learn filters that activate when they see some specific type of feature at some spatial position in the input.
• Stacking these activation maps for all filters along the depth dimension forms the full output volume.

Page 20:

Convolutional Neural Networks

• Three hyperparameters control the size of the output volume: the depth, stride and zero-padding.
• The depth of the output volume is a hyperparameter that we can pick. It controls the number of neurons in the Conv layer that connect to the same region of the input volume.
• We specify the stride with which we allocate depth columns around the spatial dimensions (width and height).
• Zero-padding allows us to control the spatial size of the output volumes.

Example filters learned
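These hyperparameters fix the spatial output size through the standard formula (W - F + 2P)/S + 1, where W is the input width, F the filter size, S the stride and P the zero-padding. A quick sanity check in Python:

```python
def conv_output_size(W, F, S, P):
    """Spatial output size of a conv layer: (W - F + 2P)/S + 1."""
    assert (W - F + 2 * P) % S == 0, "filter does not tile the input evenly"
    return (W - F + 2 * P) // S + 1

print(conv_output_size(W=32, F=3, S=1, P=1))  # 32: padding of 1 preserves size
print(conv_output_size(W=32, F=2, S=2, P=0))  # 16: stride 2 halves it
```

The assertion mirrors a real constraint: hyperparameter combinations that do not divide evenly are simply invalid layer configurations.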

Page 21:

Implementation

• http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/#the-data
• Lasagne, a library for building neural networks with Python and Theano.
• CPU vs. CUDA-capable GPU.
• Ran the MNIST example (recognise 0-9 digits).
• Facial key points:
  • Data available as *.csv files. Load training and test data.
• Video: CNN – Coke Can

The predictions of net1 on the left compared to the predictions of net2.

Page 22:

Results

• To be generated – working on confusion matrix.

Page 23:

Index: SVM

• Support Vector Machine
• Histogram of Oriented Gradients
• Implementation
• Results

Page 24:

Support Vector Machine

• Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other.

Page 25:

What is the goal of the Support Vector Machine (SVM)?

• The goal of a support vector machine is to find the optimal separating hyperplane which maximizes the margin of the training data.
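The quantity being maximized can be made concrete with a small NumPy illustration. The points and the hyperplane below are toy values, not a trained SVM: the margin of a hyperplane w·x + b = 0 is the smallest distance |w·x + b| / ||w|| of any training point to it.

```python
import numpy as np

def margin(w, b, X):
    """Geometric margin: smallest point-to-hyperplane distance."""
    return np.min(np.abs(X @ w + b) / np.linalg.norm(w))

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0]])  # toy training points
w = np.array([1.0, 1.0])  # hypothetical hyperplane normal
b = 0.0

print(margin(w, b, X))  # distance of the closest point: 2*sqrt(2)
```

SVM training searches over all (w, b) that separate the two classes for the pair that makes this number as large as possible.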

Page 26:

Histogram of Oriented Gradients

• The technique counts occurrences of gradient orientations in localized portions of an image.
• The descriptor is made up of M*N cells covering the image window in a grid.
• Each cell is represented by a histogram of edge orientations, where the number of discretized edge orientations is a parameter (usually 9).
• The cell histogram is visualized by a 'star' showing the strength of the edge orientations in the histogram: the stronger a specific orientation, the longer it is relative to the others.
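One cell's histogram can be sketched in NumPy. This is a simplified version of the descriptor (magnitude-weighted orientation histogram with 9 unsigned bins, no block normalization), meant only to show the counting step:

```python
import numpy as np

def cell_hog(cell, n_bins=9):
    """Orientation histogram for one cell: gradient angles are binned
    into n_bins unsigned orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(cell.astype(float))     # per-pixel gradients
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0 # unsigned angle in [0, 180)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    return hist

cell = np.tile(np.arange(8.0), (8, 1))  # horizontal intensity ramp
h = cell_hog(cell)
print(h.argmax())  # 0: all gradient energy lies in the 0-degree bin
```

A pure horizontal ramp has only horizontal gradients, so its entire histogram mass lands in the first orientation bin, which is exactly the "long star arm" the slide describes.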

Page 27:

Histogram of Oriented Gradients

• Note that there are various normalization schemes:
• Local schemes, in which the cell is normalized with respect to neighboring cells only [Dalal-Triggs].
• Global schemes, in which the orientation length is normalized by all the cells.
• Also note that some authors use multiple local normalizations per cell.

The example below shows a model of a bike (from Felzenszwalb et al.) with a HoG consisting of 7*11 cells, each with 8 orientations.

Page 28:

Histogram of Oriented Gradients

• (a) Test image
• (b) Gradient image of the test image
• (c) Orientation and magnitude of the gradient in each cell
• (d) HoG of cells
• (e) Average gradient image over the training examples
• (f) Weights of the positive SVM in the block
• (g) HoG descriptor weighted by the positive SVM weights

Page 30:

Implementation

Page 31:

Results

• To be generated – working on confusion matrix.

Page 32:

Conclusion

• Cascading classifiers
• Convolutional Neural Networks
• Support Vector Machine

Page 33:

Thank you