
Object Tracking And SHOSLIF Tree Based Classification Using Shape And Color Features

Author: Lucio Marcenaro, Franco Oberti and Carlo S. Regazzoni

CIS – 750
Advisor – Longin Jan Latecki
Presented by – Venugopal Rajagopal


Page 1: CIS – 750 Advisor – Longin Jan Latecki Presented by – Venugopal Rajagopal

Object Tracking And SHOSLIF Tree Based Classification Using Shape And Color Features

Author: Lucio Marcenaro, Franco Oberti and Carlo S. Regazzoni

CIS – 750
Advisor – Longin Jan Latecki
Presented by – Venugopal Rajagopal

Page 2:

Introduction

► Main functionalities of a video-surveillance system:
  - Detection
  - Tracking of objects acting within the guarded environment
► Higher modules of the system are responsible for object and event classification.
► Shape, color, and motion are the features most frequently used to achieve the above tasks.
► Here we use shape- and color-related features for tracking and recognizing objects.
  - Shape – to classify among different postures; provides a finer discriminant feature, allowing objects within the same general class to be distinguished.
  - Histogram – the basis for classification between different objects.

Page 3:

Introduction (contd.)

► Novel approach for tracking and recognition.
► Corner groups and object histograms are used as basis features for a multilevel shape representation.
► Methods for representing models for the tracking and recognition phases are based on the Generalized Hough Transform and on SHOSLIF trees.

Page 4:

System Architecture

Figure: Surveillance system modules. The sensor (camera) feeds the low-level IP module (ROI and blob detection, corner-based representation); the high-level IP module extracts corners and histograms, which pass to the short-term memory and to the SHOSLIF-tree classification module.

Page 5:

Low Level Image Processing

► Performs the first stage of abstraction, from the sequence acquired from the sensor to the representation that is used for tracking and classification.
► From the acquired frame, mobile areas of the image (blobs) are detected by a frame-background difference and analyzed by extracting numerical characteristics (e.g., geometrical and shape properties).
► Blob analysis is performed by the following modules:
  - Change detection: by using statistical morphological operators, it identifies the mobile blobs present in the image that exhibit a remarkable difference with respect to the background.
  - Focus of attention: the minimum bounding rectangle (MBR) of each blob in the image is detected using a fast image segmentation algorithm.
► The history of detected ROIs and blobs is maintained in a temporal graph, which is used for further processing by the higher-level modules.
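The change-detection and focus-of-attention steps above can be sketched as follows. This is a minimal illustration using a plain absolute-difference threshold and a flood-fill labeling; the threshold value and the labeling scheme are assumptions, not the paper's statistical morphological operators:

```python
import numpy as np

def detect_blobs(frame, background, thresh=30):
    """Change detection: mark pixels that differ remarkably from the background."""
    mask = np.abs(frame.astype(int) - background.astype(int)) > thresh
    # Focus of attention: label 4-connected components and compute each
    # blob's minimum bounding rectangle (MBR).
    labels = np.zeros(mask.shape, dtype=int)
    blobs = []
    current = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                current += 1
                stack = [(i, j)]
                pixels = []
                while stack:
                    y, x = stack.pop()
                    if (0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]
                            and mask[y, x] and labels[y, x] == 0):
                        labels[y, x] = current
                        pixels.append((y, x))
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
                ys, xs = zip(*pixels)
                blobs.append((min(ys), min(xs), max(ys), max(xs)))  # MBR
    return blobs
```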

Page 6:

Detecting BLOBs and MBRs

Page 7:

Temporal Graph

► The temporal graph provides information on the current bounding boxes and their relations to the boxes detected in the previous frames.
► The nodes of each level are the blobs detected in each frame.
► Relationships among blobs belonging to adjacent levels are represented as arcs between the nodes.
► Arcs are inserted on the basis of the superposition of the blob areas on the image plane. If a blob at step (k-1) overlaps a blob at step k, then a link between them is created, so that the blob at step (k-1) is called the "father" of the blob at time step k (its "son").

Page 8:

Temporal Graph (contd.)

► Different events can occur:
 1) If a blob has only one "father", its type is set to "one-overlapping" (type o), and the father's label is assigned to it.
 2) If a blob has more than one "father", its type is set to "merge" (type m), and a new label is assigned.
 3) If a blob is not the only "son" of its father, its type is set to "split" (type s), and a new label is assigned.
 4) If a blob has no "father", its type is set to "new" (type n), and a new label is assigned.
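The four event types can be sketched as a labeling routine over two consecutive frames. The box representation (corner coordinates) and the label bookkeeping are assumptions for illustration:

```python
def link_blobs(prev_boxes, curr_boxes):
    """Assign each current blob a type (o/m/s/n) from its overlaps with
    the previous frame's blobs (its "fathers"). Boxes are (x0, y0, x1, y1),
    keyed by integer label."""
    def overlaps(a, b):
        return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

    fathers = {c: [p for p, pb in prev_boxes.items()
                   if overlaps(pb, curr_boxes[c])] for c in curr_boxes}
    # Count the sons of each father to detect splits.
    sons = {}
    for c, fs in fathers.items():
        for f in fs:
            sons.setdefault(f, []).append(c)

    types, labels = {}, {}
    next_label = max(list(prev_boxes) + [0]) + 1
    for c, fs in fathers.items():
        if not fs:
            types[c] = 'n'                    # new: no father
        elif len(fs) > 1:
            types[c] = 'm'                    # merge: more than one father
        elif len(sons[fs[0]]) > 1:
            types[c] = 's'                    # split: father has several sons
        else:
            types[c] = 'o'                    # one-overlapping
        if types[c] == 'o':
            labels[c] = fs[0]                 # inherit the father's label
        else:
            labels[c] = next_label            # otherwise a new label
            next_label += 1
    return types, labels
```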

Page 9:

Temporal Graph (contd.)

A sequence of images showing critical cases of blob splitting, merging and displacement. Each image contains the detected blobs with their numerical label and type.

Page 10:

Temporal Graph (contd.)

► Figure showing the bounding boxes and the temporal graph representing the correspondences between them.

Page 11:

High Level Image Processing (Corner Extraction)

► High-level image processing extracts high-curvature points (corners) and histograms from each detected object.
► General procedure to extract corners:
 1. The gradient of the input gray-level image is computed using the Sobel operator.
 2. Edges are extracted using the gradient magnitude. A pixel of the image is considered a point of an edge if its gradient magnitude is greater than a fixed threshold.
 3. If a large variation in the direction of the gradient is found in a neighborhood of edge points, then a corner is detected.
► Given an image, to extract corners: edges are extracted first using the Sobel filter. The maximum variation of the gradient direction of the edge points inside a square kernel is evaluated. If the maximum variation is greater than a threshold, then the pixel at the center of the kernel is selected as a corner and its gradient direction is fixed as the corner direction.
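The corner-extraction procedure above can be sketched in NumPy as follows; the kernel size and both threshold values are assumed for illustration, not taken from the paper:

```python
import numpy as np

def extract_corners(img, edge_thresh=100.0, angle_thresh=0.5, k=3):
    """Sobel gradient -> edge map -> corners where the gradient direction
    varies strongly inside a (2k+1) x (2k+1) kernel."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # Sobel x
    sy = sx.T                                                   # Sobel y
    H, W = img.shape
    gx, gy = np.zeros((H, W)), np.zeros((H, W))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * sx).sum()
            gy[i, j] = (patch * sy).sum()
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)
    edges = mag > edge_thresh                 # fixed magnitude threshold
    corners = []
    for i in range(k, H - k):
        for j in range(k, W - k):
            if not edges[i, j]:
                continue
            window = ang[i - k:i + k + 1, j - k:j + k + 1]
            dirs = window[edges[i - k:i + k + 1, j - k:j + k + 1]]
            # Large variation of gradient direction inside the kernel -> corner.
            if dirs.size and dirs.max() - dirs.min() > angle_thresh:
                corners.append((i, j, ang[i, j]))  # position + corner direction
    return corners
```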

Page 12:

Corner Extraction (contd.)

This figure shows the corner extraction steps: a) original image, b) edge image, c) extracted corners.

Page 13:

Tracking and Recognition Modules

► The system uses a short-term memory, associated with the tracking process, and a long-term memory, associated with the recognition process.
► This module performs tasks based on two working modalities: learning and matching.
► The tracking module enters the learning modality whenever the object is not overlapped, in order to update the short-term object model.
► The recognition module builds up a self-organizing tree during the learning modality.

Page 14:

Tracking and Recognition Modules (contd.)

► Recognition module (learning phase): a set of human-classified samples is presented to the tree, which automatically organizes them so as to maximize the inter-class distances while minimizing the intra-class variances.
► Recognition module (matching phase): the SHOSLIF tree is used for object classification. Each object detected by the lower levels of the system is presented to the classification tree, which outputs the estimated class for that object and the nearest training sample.

Page 15:

Generalized Hough Transform (GHT)

► A technique used to find arbitrary curves in an image without having a parametric equation for them.
► A look-up table called the R-table is used to model the template shape of the object.
► This R-table is used as the transform mechanism.
► To build the R-table, first a reference point and several feature points of the shape are selected.

Page 16:

GHT (contd.)

Given a shape we wish to localize, the first stage is to build a look-up table, known as the R-table, which replaces the need for a parametric equation in the transform stage.

Page 17:

GHT (contd.)

For each feature point, the orientation "omega" of the tangential line at that point, the length "r", and the orientation "beta" of the radial vector joining the reference point to the feature point can be calculated.

Page 18:

GHT (contd.)

► If "n" is the number of feature points, an indexed table of size n × 2 can be created using all "n" pairs (r, beta), with "omega" as the index.
► This table is the model of the shape, and it can be used with a transformation to find occurrences of the same object in other images.
► The shape is localized using a voting technique.

Page 19:

GHT (contd.)

► Given an unknown image, each edge point is segmented and its orientation "omega" is calculated. Using "omega" as an index into the R-table, each (r, beta) pair at that location is extracted.

Page 20:

GHT (contd.)

► Using each pair (r, beta), the possible position of the reference point is computed and the accumulator for that position is incremented; the maximum accumulator value will occur, with high probability, at the actual reference point.

Page 21:

Modified GHT

► In our approach the GHT is modified both to automatically extract the model of the object (the R-table) and to locate the position of the object (voting).
► Corners extracted from the object are used as feature points, and a different parameterization is used: instead of pairs (r, beta), pairs (dx, dy) are used, where dx and dy are the differences in "x" and "y" with respect to the reference point.
► Instead of a 2 × N indexed table, a 3 × N table is used.

Page 22:

Modified GHT (contd.)

► The first value is the angle direction "omega" of the gradient vector at the corner position with respect to the original image. The obtained triplet (omega, dx, dy) models the position and orientation of the corner with respect to the reference point. In this approach, for a given corner, not all "n" possible corners are voted for, but only those whose "omega" is similar to the one obtained, thus minimizing computational time and memory requirements.
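A minimal sketch of the modified GHT, assuming corners are given as (x, y, omega) triplets and using a simple binning of omega to realize the "similar omega" test (the bin count is an assumption):

```python
import numpy as np

def build_r_table(corners, ref, nbins=36):
    """Learning: index (dx, dy) displacements to the reference point
    by the quantized gradient direction omega."""
    table = {}
    for x, y, omega in corners:
        b = int((omega % (2 * np.pi)) / (2 * np.pi) * nbins)
        table.setdefault(b, []).append((ref[0] - x, ref[1] - y))
    return table

def vote(corners, table, shape, nbins=36):
    """Matching: each corner votes only for the displacements stored under
    a similar omega; the accumulator peak estimates the reference point."""
    acc = np.zeros(shape, dtype=int)
    for x, y, omega in corners:
        b = int((omega % (2 * np.pi)) / (2 * np.pi) * nbins)
        for dx, dy in table.get(b, []):
            rx, ry = x + dx, y + dy
            if 0 <= rx < shape[0] and 0 <= ry < shape[1]:
                acc[rx, ry] += 1
    return np.unravel_index(acc.argmax(), acc.shape)
```

Because only the matching omega-bin is consulted, each corner generates far fewer votes than the full "n"-entry table would, as the slide notes.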

Page 23:

Corner Based Tracker

► The output of the low-level image processing stage (MBRs and the correspondence graphs) is used as the input for the tracking stage, in order to detect the objects present in isolated boxes when they merge to form a group.
► The model-learning phase is applied to the isolated rectangles in the 2 or 3 frames before the union takes place.
► When the boxes are merged, the matching phase is used to find the position of the objects inside the merged rectangles.

Page 24:

Corner Based Tracker (Learning Phase)

► The input is the gray-level image of the desired object. The center of the gray-level image is selected as the reference point.
► A gradient operator (Sobel) is applied to extract edges, and for every edge point the direction of the gradient is calculated. Then the corners are extracted.
► For each corner, "dx" and "dy" are calculated and stored in the R-table, which represents the obtained model of the object.
► For robustness, the previous method is applied to different images of the object (frames of a sequence), and a unique R-table is constructed by selecting the corners that are present in most of the images at the same location and with the same orientation.

Page 25:

Corner Based Tracker (Matching Phase)

► The inputs are the R-table of the searched object and the gray-level image in which the object should be present.
► As in the learning phase, the gradient operator is applied to the input image and corners are extracted.
► For every extracted corner, "omega" is computed; if it is present in the R-table, then the possible position of the reference point is calculated using (dx, dy) and its accumulator is incremented. As in the GHT, the reference point is found at the maximum accumulator value.

Page 26:

Object Classification

► The long-term recognition module uses the corner representation and histogram features extracted by the image processing modules (previous steps) as a basis for object classification.
► SHOSLIF (Self-Organizing Hierarchical Optimal Subspace Learning and Inference Framework) is the tool used for object classification.
► The input to SHOSLIF is a set of labeled patterns X = {(xn, wn): n = 1..N}, i.e. the training set, where "xn" is a vector of dimensionality K representing the observed sample and "wn" is the class associated with "xn", chosen from a set of "C" classes.
► The SHOSLIF algorithm produces as output a tree whose nodes contain decreasing sets of samples, with the root node containing all samples in X.

Page 27:

SHOSLIF

► Uses the theory of optimal linear projection to generate a space defined by the training images.
► This space is generated using two projections:
  - Karhunen-Loeve projection, to produce a set of Most Expressive Features (MEFs)
  - A subsequent discriminant analysis projection, to produce a set of Most Discriminating Features (MDFs)
► The system builds a network that tessellates these MEF/MDF spaces for recognizing objects from images.

Page 28:

SHOSLIF (contd.)

Fig: Tree example
  a) sample partitioning in the feature space
  b) tree structure

Page 29:

SHOSLIF (contd.)
Most Expressive Features (MEF)

► Each input sub-image is treated as a high-dimensional feature vector by concatenating the rows of the sub-image.
► Principal component analysis is performed on the set of training images.
► PCA utilizes the eigenvectors of the sample scatter matrix associated with the largest eigenvalues. These vectors are in the direction of the major variations in the samples, and as such can be used as a basis set with which to describe the image samples. Using these eigenvectors, the image can be reconstructed close to the original.
► Since the features produced by this projection give the minimum square error for approximating an image and show good performance in image reconstruction, they are called the Most Expressive Features.
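The Karhunen-Loeve (MEF) projection can be sketched with NumPy; the eigendecomposition of the scatter matrix is standard, but the function interface here is a hypothetical one for illustration:

```python
import numpy as np

def mef_projection(X, k):
    """Karhunen-Loeve projection: rows of X are flattened image vectors.
    Returns the mean and the top-k eigenvectors (MEF basis) of the
    sample scatter matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    scatter = Xc.T @ Xc                     # sample scatter matrix
    vals, vecs = np.linalg.eigh(scatter)    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]      # keep the k largest
    V = vecs[:, order]                      # MEF basis (columns)
    return mean, V

# MEF features and reconstruction "close to the original":
#   y = V.T @ (x - mean);   x_hat = mean + V @ y
```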

Page 30:

SHOSLIF (contd.)
Most Discriminating Features (MDF)

► The features produced by the MEF projection are not good for discriminating among the classes defined by the set of samples (they fail, e.g., when two identical images with different light intensity are present).
► So, on the features obtained from the MEF projection, linear discriminant analysis (LDA) is performed.
► In LDA the between-class scatter is maximized while the within-class scatter is minimized.
► The features obtained from LDA optimally discriminate among the classes represented in the training set; because of this they are called the Most Discriminating Features.
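The MDF step can be sketched as standard LDA on the MEF features; using the pseudo-inverse of the within-class scatter is an assumption made here to keep the sketch robust to singular matrices:

```python
import numpy as np

def mdf_projection(Y, labels, k):
    """Discriminant analysis on MEF features Y (rows = samples):
    maximize between-class scatter Sb while minimizing within-class Sw."""
    classes = np.unique(labels)
    mean = Y.mean(axis=0)
    d = Y.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        Sw += (Yc - mc).T @ (Yc - mc)            # within-class scatter
        diff = (mc - mean)[:, None]
        Sb += len(Yc) * diff @ diff.T            # between-class scatter
    # Solve the generalized eigenproblem Sb w = lambda Sw w.
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1][:k]
    return vecs[:, order].real                   # MDF basis (columns)
```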

Page 31:

SHOSLIF (contd.)
Tree Construction

► Each level of the tree has an expected radius r(l) of the space it covers, where "l" is the level.
► d(X, A) is the distance measure between a node "N" with center vector "A" and a sample vector "X".
► The root node contains all the images from the training set.
► Every node which contains more than a single training image computes a projection matrix "V", whose features are obtained by projecting the samples into the MEF space.
► If the training samples contained in a node are drawn from multiple classes (indicated by the labels), then the MEF vectors are used to compute a projection matrix "W", whose features are obtained by projecting the MEF features into the MDF space.

Page 32:

SHOSLIF (contd.)
Tree Construction (contd.)

► If the training samples in a node are from a single class, the node is left as it is.
► Each node contains the feature vectors which are within the radius covered by one of its children.
► To add a training sample "X" to node "N" at level "l", a check is first made whether the feature vector of "X" is within the radius covered by one of the children of node "N". If so, "X" is added as a descendant of that child; if the feature vector of "X" is outside the radius, then "X" is added as a new child of "N".
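The insertion rule can be sketched as below. The radius schedule r(l) and the omission of the per-node MEF/MDF projections are simplifying assumptions:

```python
import numpy as np

class Node:
    def __init__(self, center, level):
        self.center = np.asarray(center, float)
        self.level = level
        self.children = []

def radius(level):
    # Assumed schedule: the expected radius r(l) shrinks with depth.
    return 8.0 / (2 ** level)

def insert(node, x):
    """Add sample x: descend into a child whose radius covers x,
    otherwise attach x as a new child of this node."""
    x = np.asarray(x, float)
    for child in node.children:
        if np.linalg.norm(x - child.center) <= radius(child.level):
            insert(child, x)
            return
    node.children.append(Node(x, node.level + 1))

root = Node(center=[0.0, 0.0], level=0)
for sample in [[1, 1], [1.2, 1.1], [6, 6], [1.1, 0.9]]:
    insert(root, sample)
```

Samples near [1, 1] fall inside the first child's radius and descend, while [6, 6] lies outside it and becomes a new child of the root.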

Page 33:

SHOSLIF (contd.)
Image Retrieval

General flow of each SHOSLIF processing element.

Page 34:

Object Classification (contd.)

► The SHOSLIF setup is used to organize the corners extracted from the blobs associated during a learning phase.
► A training set "X" is represented by a set of pairs (corners, class).
► One problem is that the dimension of the input vectors is fixed in a SHOSLIF tree.
► Feature selection is performed by partitioning the corner set C(t) into "M" regions, where "M" is the desired cardinality of the pattern "x" to be given to the SHOSLIF.

Page 35:

Object Classification (contd.)

Corner partitioning process example:
  a) first division along the x-axis
  b) second division along the y-axis
  c) third division along the x-axis
  d) final areas with M = 16

[Figure: panels (a)-(d) illustrating the successive divisions of a blob's corner set.]

Page 36:

Object Classification (contd.)Object Classification (contd.)► The corner set is chosen by iteratively partitioning The corner set is chosen by iteratively partitioning

the blob into two areas, each characterized by the the blob into two areas, each characterized by the same number of corners.same number of corners.

► For each region the vector median corner in the For each region the vector median corner in the survived local population is chosen as survived local population is chosen as representative sample.representative sample.

► The next figure shows the survived corners as two The next figure shows the survived corners as two set of connected points:set of connected points:

a) external closed lines connect median corner a) external closed lines connect median corner points in outer regions.points in outer regions.

b) internal lines connect corner points in inner b) internal lines connect corner points in inner regions.regions.
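The equal-population partitioning with one representative corner per region can be sketched as follows; taking the corner closest to the region mean is a stand-in for the vector median, and the alternating split axis follows the figure on the previous slide:

```python
import numpy as np

def partition_corners(corners, M):
    """Recursively split the corner set into M regions of (roughly) equal
    corner counts, alternating the split axis, and return one
    representative corner per region."""
    def split(points, depth, parts):
        if parts == 1 or len(points) <= 1:
            # Representative: the corner closest to the region's mean
            # (a stand-in for the vector median).
            mean = points.mean(axis=0)
            idx = np.argmin(np.linalg.norm(points - mean, axis=1))
            return [points[idx]]
        axis = depth % 2                      # alternate x / y divisions
        order = points[:, axis].argsort()
        half = len(points) // 2               # equal-population split
        left, right = points[order[:half]], points[order[half:]]
        return (split(left, depth + 1, parts // 2) +
                split(right, depth + 1, parts - parts // 2))
    return split(np.asarray(corners, float), 0, M)
```

The M representatives, concatenated, form the fixed-dimensionality pattern "x" that the SHOSLIF tree requires.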

Page 37:

Object Classification (contd.)

Examples of surviving corners.

Page 38:

Object Classification (contd.)Object Classification (contd.)► In this way the vector “xn” is In this way the vector “xn” is

computed for each sample and a class computed for each sample and a class label is associated with it, which is label is associated with it, which is given as input to the SHOSLIF tree.given as input to the SHOSLIF tree.

Page 39:

Results

► Training set: 328 samples distributed over the classes.
► Test set: 30 samples.
► The misdetection probability over the test set was 15%.
► A second test was done using histograms for object identification.
► The misdetection probability over that test set was 8%.

Page 40:

Results (contd.)

The example figure shows the probe image on the left-hand side and the retrieved image on the right-hand side.

Page 41:

Conclusion

► A method for tracking and classifying objects in a video-surveillance system has been presented.
► A corner-based shape model is used for tracking and for recognizing an object.
► Classification is performed using SHOSLIF trees.
► The computed misdetection probabilities confirm the correctness of the proposed approach.

Page 42:

References

1. A. Tesei, A. Teschioni, C.S. Regazzoni and G. Vernazza, "Long Memory Matching of Interacting Complex Objects from Real Image Sequences"
2. F. Oberti and C.S. Regazzoni, "Real-Time Robust Detection of Moving Objects in Cluttered Scenes"
3. D.L. Swets and J. Weng, "Hierarchical Discriminant Analysis for Image Retrieval"