Wits presentation 6_28072015

Object Recognition Tutorial
Beatrice van Eden
- Part-time PhD student at the University of the Witwatersrand.
- Full-time employee of the Council for Scientific and Industrial Research.


TRANSCRIPT

Page 1: Wits presentation 6_28072015

Object Recognition Tutorial
Beatrice van Eden

- Part-time PhD student at the University of the Witwatersrand.
- Full-time employee of the Council for Scientific and Industrial Research.

Page 2:

Research Problem

• Hierarchical concept formation

• This research will allow a robot to learn about its environment autonomously

• Build a concept of these environments, even if the robot has not seen that specific instance previously

Page 3:

Why Object Recognition

• Environments are built up of different objects
• RGB-D sensor for perception
• Concept formation needs some baseline to work from
• Exposure to ML techniques:
  • Cascading classifiers
  • Convolutional Neural Networks
  • Support Vector Machines

Page 4:

Index: Cascading Classifiers

• Cascading classifiers
• Haar-like features
• Local binary patterns
• Implementation
• Results

Page 5:

Cascading classifiers

• Cascading is a particular case of ensemble learning based on the concatenation of several classifiers: all information collected from the output of a given classifier is used as additional information for the next classifier in the cascade.
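The idea can be sketched in a few lines of Python. This is a toy illustration of the cascade structure, not OpenCV's implementation; the stage tests and feature names below are made up for the example:

```python
# Minimal sketch of a cascade of classifiers. Each stage is a cheap
# binary test; a window is accepted only if it passes every stage,
# so most negative windows are rejected early and cheaply.

def cascade_classify(window, stages):
    """Return True only if `window` passes every stage in order."""
    for stage in stages:
        if not stage(window):
            return False  # early rejection: later stages never run
    return True

# Hypothetical stages operating on a dict of precomputed features,
# ordered from cheapest to most expensive.
stages = [
    lambda w: w["mean_intensity"] > 0.2,   # very cheap first test
    lambda w: w["edge_energy"] > 0.5,      # slightly more expensive
    lambda w: w["template_score"] > 0.9,   # most expensive, runs rarely
]

print(cascade_classify(
    {"mean_intensity": 0.4, "edge_energy": 0.7, "template_score": 0.95},
    stages))  # True
print(cascade_classify(
    {"mean_intensity": 0.1, "edge_energy": 0.7, "template_score": 0.95},
    stages))  # False: rejected by the first stage
```

The ordering is the point of the design: the cheap early stages discard most sub-windows, so the expensive late stages only ever see promising candidates.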

Page 6:

Haar-like features

• A Haar-like feature is the difference of the sums of pixel values in adjacent rectangular areas of the image.
• These values indicate certain characteristics of a particular area of the image.

Page 7:

Haar-like features

• The Viola-Jones detector is a strong binary classifier built from several weak detectors.
• Each answers: does a certain sub-region of the original image contain an instance of the object of interest or not?

Page 8:

Local binary patterns

• Divide the examined window into cells (e.g. 16x16 pixels for each cell).
• For each pixel in a cell, compare the pixel to each of its 8 neighbours (on its left-top, left-middle, left-bottom, right-top, etc.). Follow the pixels along a circle, i.e. clockwise or counter-clockwise.
• Where the centre pixel's value is greater than the neighbour's value, write "1". Otherwise, write "0". This gives an 8-digit binary number.
• Compute the histogram, over the cell, of the frequency of each "number" occurring.
• Optionally normalize the histogram.
• Concatenate the (normalized) histograms of all cells. This gives the feature vector for the window.
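The steps above can be sketched in NumPy. This is a simplified illustration following the comparison rule as stated (centre greater than neighbour writes "1"); production LBP implementations vary in conventions and speed tricks:

```python
import numpy as np

def lbp_code(cell, y, x):
    """8-bit LBP code for pixel (y, x): compare the centre to its 8
    neighbours, clockwise from the top-left, writing 1 where the
    centre value is greater than the neighbour's."""
    c = cell[y, x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise circle
    bits = ["1" if c > cell[y + dy, x + dx] else "0" for dy, dx in offsets]
    return int("".join(bits), 2)

def lbp_histogram(cell):
    """Normalized 256-bin histogram of LBP codes over one cell's interior."""
    h, w = cell.shape
    codes = [lbp_code(cell, y, x)
             for y in range(1, h - 1) for x in range(1, w - 1)]
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()  # the optional normalization step

cell = np.arange(16.0).reshape(4, 4)  # toy 4x4 "cell"
print(lbp_histogram(cell).sum())  # 1.0 (normalized)
```

Concatenating such histograms over all cells of the window yields the final feature vector.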

Page 9:

Local binary patterns

• A powerful feature for texture classification.
• LBP is faster but less accurate than Haar.
• LBP does all its calculations in integers; Haar uses floats.
• LBP trains in a few hours; Haar can take a few days.

Page 10:

Implementation

• SAMPLES - How many images do we need?
• It depends on a variety of factors, including the quality of the images, the object you want to recognize, the method used to generate the samples, the CPU power you have and probably some magic.
• Positive images: 50 -> 1500, listed in a .txt file.
• Negative images: 1500, listed in a .txt file.

Page 11:

Implementation

• Create samples with OpenCV: generate a large number of positive samples from our positive images by applying transformations and distortions. A Perl script was used to combine each positive image with negative images.
• *.vec files are created; merge them into one.
• opencv_haartraining and opencv_traincascade: opencv_traincascade supports both Haar [Viola2001] and LBP [Liao2007] (Local Binary Patterns) features.
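On the command line, the workflow above might look roughly as follows. This is a sketch: the file names, window size and sample counts are placeholders, and the flags are those of OpenCV's bundled training tools:

```shell
# 1. Generate distorted positive samples over background images
#    (positive.jpg, negatives.txt and the counts are placeholders).
opencv_createsamples -img positive.jpg -bg negatives.txt \
    -vec samples.vec -num 1500 -w 24 -h 24

# 2. Train the cascade; -featureType LBP selects LBP instead of Haar.
opencv_traincascade -data classifier/ -vec samples.vec -bg negatives.txt \
    -numPos 1400 -numNeg 1500 -numStages 20 -featureType LBP -w 24 -h 24
```

Note that -numPos is usually set somewhat below the number of samples in the .vec file, since each stage consumes a few extra positives.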

Page 12:

Implementation

Page 13:

Implementation

• http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
• Video: LBP – Coke Can
• Video: Haar – Coke Can
• Video: LBP – Face recognition
• Choose the number of stages to train.

Page 14:

Results

• To be generated – working on confusion matrix.

Page 15:

Index: CNN

• Convolutional Neural Networks
• Example
• Overview and Intuition
• Implementation
• Results

Page 16:

Convolutional Neural Networks

• Neural network vs. convolutional neural network
• Layers used to build ConvNets:
• Convolutional Layer, Pooling Layer, and Fully-Connected Layer (exactly as seen in regular Neural Networks).

Page 17:

Example

• Input:
  • Image: width 32, height 32, three colour channels. [32x32x3]
• CONV layer:
  • Local filters over the previous layer.
  • Dot product between the weights and a sliding region in the input volume. With 12 filters: [32x32x12]
• RELU layer:
  • Apply an elementwise activation function, such as the max(0,x) thresholding at zero. This leaves the size of the volume unchanged.
• POOL layer:
  • Downsampling operation along the spatial dimensions (width, height). [16x16x12]
• FC layer:
  • Compute the class scores. As with ordinary Neural Networks, each neuron in this layer is connected to all the numbers in the previous volume.
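The shape bookkeeping in this example can be checked with a naive NumPy sketch. The weights are random and the 3x3 filter size is an assumption (the slide only fixes the depths); only the shapes matter here:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))          # input: 32x32 RGB image
filters = rng.standard_normal((12, 3, 3, 3))  # 12 filters of size 3x3x3

def conv(x, filters):
    """Zero-padded, stride-1 convolution: each output value is the dot
    product of one filter with a 3x3 sliding region of the input."""
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero-pad width and height
    out = np.empty((h, w, len(filters)))
    for i in range(h):
        for j in range(w):
            region = xp[i:i + 3, j:j + 3, :]
            out[i, j, :] = [(f * region).sum() for f in filters]
    return out

def relu(x):
    return np.maximum(0, x)  # elementwise max(0, x); shape unchanged

def max_pool(x):
    """2x2 max pooling: halves width and height, keeps depth."""
    h, w, d = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, d).max(axis=(1, 3))

y = max_pool(relu(conv(x, filters)))
print(conv(x, filters).shape)  # (32, 32, 12)
print(y.shape)                 # (16, 16, 12)
```

A fully-connected layer would then flatten the 16*16*12 = 3072 numbers and connect every one of them to each output neuron.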

Page 18:

Convolutional Neural Networks

• CNN is a type of feed-forward artificial neural network where the individual neurons are tiled in such a way that they respond to overlapping regions in the visual field.

Page 19:

Overview and Intuition

• A CONV layer's parameters consist of a set of learnable filters.
• Every filter is small spatially (along width and height), but extends through the full depth of the input volume.
• As we slide the filter across the input, we compute the dot product between the entries of the filter and the input.
• Intuitively, the network will learn filters that activate when they see some specific type of feature at some spatial position in the input.
• Stacking these activation maps for all filters along the depth dimension forms the full output volume.

Page 20:

Convolutional Neural Networks

• Three hyperparameters control the size of the output volume: the depth, stride and zero-padding.
• The depth of the output volume is a hyperparameter that we can pick. It controls the number of neurons in the Conv layer that connect to the same region of the input volume.
• We specify the stride with which we allocate depth columns around the spatial dimensions (width and height).
• Zero-padding allows us to control the spatial size of the output volumes.

Example filters learned
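These hyperparameters fix the spatial output size through the standard formula (W - F + 2P)/S + 1, where W is the input width, F the filter size, S the stride and P the zero-padding. A quick sanity check in Python:

```python
def conv_output_size(W, F, S, P):
    """Spatial output size of a conv layer: (W - F + 2P)/S + 1."""
    assert (W - F + 2 * P) % S == 0, "filter does not tile the input evenly"
    return (W - F + 2 * P) // S + 1

print(conv_output_size(W=32, F=3, S=1, P=1))  # 32: padding of 1 preserves size
print(conv_output_size(W=32, F=2, S=2, P=0))  # 16: stride 2 halves it
```

The assertion mirrors a real constraint: hyperparameter combinations that do not divide evenly are simply invalid layer configurations.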

Page 21:

Implementation

• http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/#the-data
• Lasagne, a library for building neural networks with Python and Theano.
• CPU vs. CUDA-capable GPU.
• Ran the MNIST example (recognise 0-9 digits).
• Facial key points:
  • Data available as *.csv files. Load training and test data.
• Video: CNN – Coke Can

The predictions of net1 on the left compared to the predictions of net2.

Page 22:

Results

• To be generated – working on confusion matrix.

Page 23:

Index: SVM

• Support Vector Machine
• Histogram of Oriented Gradients
• Implementation
• Results

Page 24:

Support Vector Machine

• Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other.

Page 25:

What is the goal of the Support Vector Machine (SVM)?

• The goal of a support vector machine is to find the optimal separating hyperplane which maximizes the margin of the training data.
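The quantity being maximized can be made concrete with a small NumPy illustration. The points and the hyperplane below are toy values, not a trained SVM: the margin of a hyperplane w·x + b = 0 is the smallest distance |w·x + b| / ||w|| of any training point to it.

```python
import numpy as np

def margin(w, b, X):
    """Geometric margin: smallest point-to-hyperplane distance."""
    return np.min(np.abs(X @ w + b) / np.linalg.norm(w))

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0]])  # toy training points
w = np.array([1.0, 1.0])  # hypothetical hyperplane normal
b = 0.0

print(margin(w, b, X))  # distance of the closest point: 2*sqrt(2)
```

SVM training searches over all (w, b) that separate the two classes for the pair that makes this number as large as possible.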

Page 26:

Histogram of Oriented Gradients

• The technique counts occurrences of gradient orientations in localized portions of an image.
• The descriptor is made up of M*N cells covering the image window in a grid.
• Each cell is represented by a histogram of edge orientations, where the number of discretized edge orientations is a parameter (usually 9).
• The cell histogram is visualized by a 'star' showing the strength of the edge orientations in the histogram: the stronger a specific orientation, the longer it is relative to the others.
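One cell's histogram can be sketched in NumPy. This is a simplified version of the descriptor (magnitude-weighted orientation histogram with 9 unsigned bins, no block normalization), meant only to show the counting step:

```python
import numpy as np

def cell_hog(cell, n_bins=9):
    """Orientation histogram for one cell: gradient angles are binned
    into n_bins unsigned orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(cell.astype(float))     # per-pixel gradients
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0 # unsigned angle in [0, 180)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    return hist

cell = np.tile(np.arange(8.0), (8, 1))  # horizontal intensity ramp
h = cell_hog(cell)
print(h.argmax())  # 0: all gradient energy lies in the 0-degree bin
```

A pure horizontal ramp has only horizontal gradients, so its entire histogram mass lands in the first orientation bin, which is exactly the "long star arm" the slide describes.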

Page 27:

Histogram of Oriented Gradients

• Note that there are various normalization schemes:
• Local schemes, in which the cell is normalized with respect to neighboring cells only [Dalal-Triggs].
• Global schemes, in which the orientation length is normalized by all the cells.
• Also note that some authors use multiple local normalizations per cell.

The example below shows a model of a bike (from Felzenszwalb et al.) with a HoG consisting of 7*11 cells, each with 8 orientations.

Page 28:

Histogram of Oriented Gradients

• (a) Test image
• (b) Gradient image of the test image
• (c) Orientation and magnitude of the gradient in each cell
• (d) HoG of cells
• (e) Average gradient image over the training examples
• (f) Weights of the positive SVM in the block
• (g) HoG descriptor weighted by the positive SVM weights

Page 30:

Implementation

Page 31:

Results

• To be generated – working on confusion matrix.

Page 32:

Conclusion

• Cascading classifiers
• Convolutional Neural Networks
• Support Vector Machine

Page 33:

Thank you