wits presentation 6_28072015
TRANSCRIPT
Object Recognition Tutorial
Beatrice van Eden
• Part-time PhD student at the University of the Witwatersrand
• Full-time employee of the Council for Scientific and Industrial Research
Research Problem
• Hierarchical concept formation
• This research will allow a robot to learn about its environment autonomously
• Build a concept about these environments
• Even if it has not seen that specific instance previously
Why Object Recognition
• Environments are built up from different objects
• RGB-D sensor for perception
• Concept formation needs a baseline to work from
• Exposure to ML techniques
• Cascading classifiers
• Convolutional Neural Networks
• Support Vector Machine
Index: Cascading Classifiers
• Cascading classifiers
• Haar-like features
• Local binary patterns
• Implementation
• Results
Cascading classifiers
• Cascading is a particular case of ensemble learning based on the concatenation of several classifiers: all the information collected from the output of a given classifier is used as additional information for the next classifier in the cascade.
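The chain described above can be sketched as a simple early-exit loop; the toy threshold stages below are purely illustrative, not a real detector:

```python
# Minimal sketch of cascade evaluation: a window is accepted only if
# every stage classifier in the chain accepts it.

def run_cascade(stages, window):
    """Apply each stage in order; reject on the first failure.

    `stages` is a list of functions returning True (pass) or False (reject).
    """
    for stage in stages:
        if not stage(window):
            return False  # early rejection: most windows exit here cheaply
    return True  # survived every stage -> detection

# Toy stages: each applies a progressively stricter threshold to a score.
stages = [lambda w: w > 0.1, lambda w: w > 0.5, lambda w: w > 0.9]
print(run_cascade(stages, 0.95))  # True
print(run_cascade(stages, 0.3))   # False (rejected at the second stage)
```

The early exit is the whole point of the cascade: cheap stages discard most negative windows before the expensive later stages ever run.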
Haar-like features
• The difference between the sums of pixels of rectangular areas inside the detection window
• These values indicate certain characteristics of a particular area of the image.
Haar-like features
• The Viola-Jones detector is a strong binary classifier built from several weak detectors
• Each weak detector answers: does a certain sub-region of the original image contain an instance of the object of interest or not?
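A two-rectangle feature of this kind can be computed in a few lookups with an integral image, which is how Viola-Jones makes these rectangle sums cheap; the bright-left/dark-dark test image below is an illustrative assumption:

```python
import numpy as np

# Sketch: a two-rectangle Haar-like feature computed via an integral image.
# The integral image lets any rectangle sum be read off in four lookups.

def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from the integral image (exclusive ends)."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

# Two-rectangle "edge" feature: sum of the left half minus the right half.
img = np.zeros((4, 4))
img[:, :2] = 1.0          # left half bright, right half dark
ii = integral_image(img)
feature = rect_sum(ii, 0, 0, 4, 2) - rect_sum(ii, 0, 2, 4, 4)
print(feature)  # 8.0 -> a strong response on this vertical edge
```

The same four-lookup trick works for any of the three-rectangle and four-rectangle feature shapes, which is why evaluating many thousands of features per window stays fast.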
Local binary patterns
• Divide the examined window into cells (e.g. 16x16 pixels for each cell).
• For each pixel in a cell, compare the pixel to each of its 8 neighbours (left-top, left-middle, left-bottom, right-top, etc.). Follow the pixels along a circle, i.e. clockwise or counter-clockwise.
• Where the centre pixel's value is greater than the neighbour's value, write "1"; otherwise, write "0". This gives an 8-digit binary number.
• Compute the histogram, over the cell, of the frequency of each "number" occurring.
• Optionally normalize the histogram.
• Concatenate the (normalized) histograms of all cells. This gives the feature vector for the window.
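The steps above can be sketched directly (using the slide's convention that a neighbour contributes a "1" bit when the centre pixel is greater than it):

```python
import numpy as np

# Sketch of the per-pixel LBP code, using the 8 immediate neighbours
# read clockwise starting from the top-left corner.

def lbp_code(patch):
    """LBP code of the centre pixel of a 3x3 patch."""
    centre = patch[1, 1]
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if centre > n:            # centre brighter -> write a "1" bit
            code |= 1 << bit
    return code

def lbp_histogram(cell):
    """Histogram of LBP codes over all interior pixels of a cell."""
    hist = np.zeros(256, dtype=int)
    for r in range(1, cell.shape[0] - 1):
        for c in range(1, cell.shape[1] - 1):
            hist[lbp_code(cell[r - 1:r + 2, c - 1:c + 2])] += 1
    return hist

patch = np.array([[5, 5, 5],
                  [5, 4, 5],
                  [5, 5, 5]])
print(lbp_code(patch))  # 0: the centre is darker than every neighbour
```

Concatenating `lbp_histogram` outputs over all cells gives the window's feature vector, as in the last step above.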
Local binary patterns
• A powerful feature for texture classification
• LBP is faster but less accurate than Haar
• LBP does all its calculations in integers; Haar uses floats
• LBP needs a few hours of training; Haar needs a few days
Implementation
• SAMPLES - how many images do we need?
• It depends on a variety of factors, including the quality of the images, the object you want to recognize, the method used to generate the samples, the CPU power you have and probably some magic.
• Positive images: 50 -> 1500, listed in a .txt file.
• Negative images: 1500, listed in a .txt file.
Implementation
• Create samples with OpenCV: generate a large number of positive samples from our positive images by applying transformations and distortions. A Perl script was used to combine each positive image with negative images.
• *.vec files are created; merge them into one.
• opencv_haartraining and opencv_traincascade. opencv_traincascade supports both Haar [Viola2001] and LBP [Liao2007] (Local Binary Patterns) features.
Implementation
Implementation
• http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
• Video: LBP - Coke can
• Video: Haar - Coke can
• Video: LBP - Face recognition
• Choose the number of stages to train
Results
• To be generated - working on the confusion matrix
Index: CNN
• Convolutional Neural Networks
• Example
• Overview and Intuition
• Implementation
• Results
Convolutional Neural Networks
• Neural network vs. convolutional neural network
• Layers used to build ConvNets: the Convolutional Layer, the Pooling Layer, and the Fully-Connected Layer (exactly as seen in regular Neural Networks).
Example
• Input:
• Image: width 32, height 32, three colour channels. [32x32x3]
• CONV layer:
• Local filter over the previous layer
• Dot product between the weights and a sliding region in the input volume. [32x32x12]
• RELU layer:
• Apply an elementwise activation function, such as the max(0,x) thresholding at zero. This leaves the size of the volume unchanged.
• POOL layer:
• Down-sampling operation along the spatial dimensions (width, height). [16x16x12]
• FC layer:
• Compute the class scores. As with ordinary Neural Networks, each neuron in this layer is connected to all the numbers in the previous volume.
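The layer sequence above can be sketched in plain numpy to check that the volumes flow as stated; the random filters, 3x3 kernel size, and 10-class FC layer are illustrative assumptions:

```python
import numpy as np

# Shape-level sketch of [32x32x3] -> CONV -> RELU -> POOL -> FC,
# with 12 filters, 3x3 kernels, zero-padding 1, and 2x2 max pooling.
rng = np.random.default_rng(0)

x = rng.standard_normal((32, 32, 3))          # input image
filters = rng.standard_normal((12, 3, 3, 3))  # 12 filters, 3x3, depth 3

# CONV: slide each filter over the zero-padded input (stride 1).
xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
conv = np.zeros((32, 32, 12))
for i in range(32):
    for j in range(32):
        region = xp[i:i + 3, j:j + 3, :]      # 3x3x3 sliding region
        conv[i, j] = np.tensordot(filters, region, axes=3)  # 12 dot products

relu = np.maximum(conv, 0)                    # elementwise max(0, x)

# POOL: 2x2 max pooling halves the width and height.
pool = relu.reshape(16, 2, 16, 2, 12).max(axis=(1, 3))

# FC: every class score connects to all 16*16*12 numbers in the volume.
W = rng.standard_normal((10, 16 * 16 * 12))   # e.g. 10 classes
scores = W @ pool.ravel()

print(conv.shape, pool.shape, scores.shape)   # (32, 32, 12) (16, 16, 12) (10,)
```

Real frameworks implement the same arithmetic with far faster kernels, but the volume shapes at each stage match the slide's numbers.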
Convolutional Neural Networks
• CNN is a type of feed-forward artificial neural network where the individual neurons are tiled in such a way that they respond to overlapping regions in the visual field.
Overview and Intuition
• The CONV layer's parameters consist of a set of learnable filters
• Every filter is small spatially (along width and height), but extends through the full depth of the input volume
• As we slide the filter across the input, we compute the dot product between the entries of the filter and the input
• Intuitively, the network will learn filters that activate when they see some specific type of feature at some spatial position in the input
• Stacking these activation maps for all filters along the depth dimension forms the full output volume
Convolutional Neural Networks
• Three hyperparameters control the size of the output volume: the depth, stride and zero-padding
• Depth of the output volume is a hyperparameter that we can pick. It controls the number of neurons in the Conv layer that connect to the same region of the input volume.
• We specify the stride with which we allocate depth columns around the spatial dimensions (width and height).
• Zero-padding allows us to control the spatial size of the output volumes.
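These three quantities determine the output's spatial size via the standard formula (W - F + 2P)/S + 1; a small helper (illustrative, with assumed parameter names) makes the arithmetic concrete:

```python
def conv_output_size(w, f, stride, pad):
    """Spatial output size of a conv layer: (W - F + 2*P) / S + 1."""
    assert (w - f + 2 * pad) % stride == 0, "filter does not tile the input"
    return (w - f + 2 * pad) // stride + 1

# 32-wide input, 5x5 filters, stride 1, zero-padding 2 -> size preserved.
print(conv_output_size(32, 5, 1, 2))  # 32
# 7-wide input, 3x3 filters, stride 2, zero-padding 1 -> (7-3+2)/2+1 = 4.
print(conv_output_size(7, 3, 2, 1))   # 4
```

The first case shows why padding is useful: choosing P = (F - 1)/2 with stride 1 keeps the output the same spatial size as the input, as in the [32x32x12] CONV example above.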
Example filters learned
Implementation
• http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/#the-data
• Lasagne, a library for building neural networks with Python and Theano.
• CPU vs. CUDA-capable GPU
• Ran the MNIST example (recognise the digits 0-9).
• Facial keypoints
• Data available as *.csv files. Load the training and test data.
• Video: CNN - Coke can
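The CSV loading step can be sketched with the standard library's csv module; the column names and layout below (keypoint columns followed by an 'Image' column of space-separated pixel values, as in the tutorial's data) are assumptions, not taken from the slides:

```python
import csv
import io
import numpy as np

# Hedged sketch of loading facial-keypoint training data from a CSV where
# keypoint coordinates come first and 'Image' holds space-separated pixels.
def load_rows(f):
    X, y = [], []
    for row in csv.DictReader(f):
        pixels = np.array(row['Image'].split(), dtype=float) / 255.0  # scale to [0, 1]
        X.append(pixels)
        y.append([float(row[k]) for k in row if k != 'Image'])
    return np.array(X), np.array(y)

# Tiny inline stand-in for train.csv with one 2x2 "image".
sample = io.StringIO(
    "left_eye_x,left_eye_y,Image\n"
    "30.0,35.5,0 128 255 64\n")
X, y = load_rows(sample)
print(X.shape, y.shape)  # (1, 4) (1, 2)
```

Scaling pixels to [0, 1] before training is the usual preprocessing for this kind of net; missing-value handling in the real data set is omitted here.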
The predictions of net1 on the left compared to the predictions of net2.
Results
• To be generated - working on the confusion matrix
Index: SVM
• Support Vector Machine
• Histogram of Oriented Gradients
• Implementation
• Results
Support Vector Machine
• Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other.
What is the goal of the Support Vector Machine (SVM)?
• The goal of a support vector machine is to find the optimal separating hyperplane which maximizes the margin of the training data.
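As a toy illustration of this objective (not the solver used in the slides), subgradient descent on the regularized hinge loss recovers a separating hyperplane on linearly separable data; the data, learning rate and regularization strength are all illustrative:

```python
import numpy as np

# Toy maximum-margin sketch: a linear classifier trained by subgradient
# descent on the regularized hinge loss (the SVM objective).
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Labels y must be in {-1, +1}. Returns weights w and bias b."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) < 1:       # inside the margin: hinge is active
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                           # outside the margin: only regularize
                w -= lr * lam * w
    return w, b

# Two linearly separable point clouds.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))  # [ 1.  1. -1. -1.]
```

The regularization term is what pushes the solution toward the maximum-margin hyperplane rather than just any separator; production SVMs solve the same objective with much better optimizers.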
Histogram of Oriented Gradients
• The technique counts occurrences of gradient orientations in localized portions of an image
• The descriptor is made up of M*N cells covering the image window in a grid.
• Each cell is represented by a histogram of edge orientations, where the number of discretized edge orientations is a parameter (usually 9).
• The cell histogram is visualized by a 'star' showing the strength of the edge orientations in the histogram: the stronger a specific orientation, the longer it is relative to the others.
• Note that there are various normalization schemes:
• Local schemes, in which the cell is normalized with respect to neighbouring cells only [Dalal-Triggs]
• Global schemes, in which the orientation length is normalized by all the cells
• Also note that some authors use multiple local normalizations per cell
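The per-cell histogram described above can be sketched as follows; the unsigned 0-180 degree binning and magnitude-weighted votes follow the usual Dalal-Triggs setup, while the hard (non-interpolated) binning and the absence of block normalization are simplifications:

```python
import numpy as np

# Sketch of one HoG cell: a gradient-orientation histogram with 9 bins,
# each pixel voting with its gradient magnitude.
def cell_histogram(cell, n_bins=9):
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # unsigned orientation in [0, 180) degrees
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = (orientation / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    for b, m in zip(bins.ravel(), magnitude.ravel()):
        hist[b] += m              # magnitude-weighted vote
    return hist

# A vertical step edge: all gradient energy is horizontal (0 degrees).
cell = np.tile(np.concatenate([np.zeros(4), np.ones(4)]), (8, 1))
hist = cell_histogram(cell)
print(hist.argmax())  # 0: every vote lands in the 0-degree bin
```

Running this over an M*N grid of cells and concatenating the (normalized) histograms yields the full HoG descriptor for the window.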
Histogram of Oriented Gradients
The example below shows a model of a bike (from Felzenszwalb et al.) with a HoG consisting of 7*11 cells, each with 8 orientations:
• (a) Test image
• (b) Gradient image of the test image
• (c) Orientation and magnitude of the gradient in each cell
• (d) HoG of cells
• (e) Average gradient image over the training examples
• (f) Weights of the positive SVM in the block
• (g) HoG descriptor weighted by the positive SVM weights
Histogram of Oriented Gradients
Implementation
• http://solvedstack.com/questions/svm-classifier-based-on-hog-features-for-object-detection-in-opencv
• http://thebrainiac1.blogspot.com/2012/07/v-behaviorurldefaultvmlo.html
• Video: HoG - Coke can
• Video: HoG - Face recognition
Implementation
Results
• To be generated - working on the confusion matrix
Conclusion
• Cascading classifiers
• Convolutional Neural Networks
• Support Vector Machine
Thank you