Multimodal Sensing Enabled Real-time
Intelligent Wireless Camera Networks for
Secure Spaces
Development and implementation of consensus development and data fusion algorithms
Al-Khawarizmi Institute of Computer Science (KICS), University of Engineering and Technology Lahore
Contents
1 Report Summary
2 Distributive Data Fusion and Consensus Development
2.1 Purpose
2.2 Data Fusion Entities and Consensus Development
2.2.1 Layer 1 Entities
2.2.1.1 Local Fusion Unit
2.3 Layer 2 Entities
2.3.1 Optimal Data Fusion Entity
3 Layer 1 Consensus Development Algorithms
3.1 Purpose
3.1.1 k-Nearest Neighbor (k-NN) Classifier
3.1.2 Naive Bayesian Classifier
3.1.3 Decision Tree
3.1.4 Gaussian Mixture Model
3.2 Effect of Audio Features Selection on Classification Problem
3.2.1 Temporal Features
3.2.1.1 Zero Crossing Rate
3.2.1.2 Peaks Count
3.2.1.3 Ratio of Peaks-Count to Zero-Crossings-Rate
3.2.1.4 Short Time Signal Energy
3.2.2 Comparison of Acoustic Features
4 Layer 2 Consensus Development Algorithms
4.1 Purpose
4.2 Object Detection
4.2.1 Histogram of Oriented Gradients
4.2.2 Deformable Parts Model
4.3 Object Tracking
4.3.1 Kalman Filter
5 Optimal Data Fusion (ODF) Unit Performance Evaluation
5.1 Purpose
5.2 Test Case 1
5.2.1 Object Detection
5.2.2 Object Tracking
5.3 Test Case 2
5.3.1 Object Detection
5.3.2 Object Tracking
5.4 Test Case 3
5.4.1 Object Detection
5.4.2 Object Tracking
5.5 Conclusions
A Appendix A
A.1 Peak Counts
A.2 Feature Extraction Functions
A.3 Compute Feature Statistics
A.4 Compute Features of All Classes
A.5 Tracking and Kalman Filter
Report Summary
This report covers the consensus development and data fusion algorithms used in the
proposed hierarchical model. The objective is to perform consensus development and data
fusion at the system level. The visual and acoustic domains have been brought into focus
for object classification, visual detection and visual tracking.
Chapter 2 presents the proposed architecture, whose modular design supports system-level
data fusion and consensus development, and explains the design of the proposed fusion
model. Chapter 3 covers acoustic consensus development in the context of data fusion at
layer 1 of the proposed architecture. Different data fusion techniques have been explored
to arrive at an accurate system-level implementation of object detection and
classification. Chapter 4 describes layer-2 consensus development, which directly targets
object detection and tracking on the proposed system entity. Various approaches are
included together with results and implementation. For system-level data fusion, the need
for a suitable system platform is justified, and its performance is evaluated and compared
through the tests discussed in Chapter 5. The report also includes the algorithmic details
for feature extraction, object classification, detection and tracking.
Distributive Data Fusion and Consensus Development
2.1 Purpose
To implement seamless tracking on this distributed network while meeting the processing
requirements, consensus has to be developed on fusion-processed data. The development is
bounded by the low-cost, low-processing embedded resources available in our distributive
network. We have therefore divided the resources into multiple stages and connected each
stage to the next according to the designed network architecture. Because we target object
detection and tracking, consensus development must be robust despite the limited processing
resources of the distributed network. To achieve this, we have further divided the network
into multiple processing layers, each of which performs a dedicated task. The main fusion
processing components for consensus development are described below.
2.2 Data Fusion Entities and Consensus Development
The distributed data fusion architecture proposed in Report 6 has been used for consensus
development and its implementation. The fusion model follows a modular approach, as shown
in Figure 2.1.
Figure 2.1: Fusion Architecture (blocks shown: ODF, CDF, DFU Level 1, Local Fusion Unit)
The layered approach shows the components that lead toward object tracking in the visual
domain. The initial local fusion unit consists of an acoustic sensor node and is
categorized as a layer-1 fusion entity. DFU level 1 is the next hierarchical step toward
detection and localization of the acoustic event. The fusion model that carries out object
detection and localization is divided into two layers, ODF and CDF: event-based video
detection is performed on the ODF only, while object tracking across single and multiple
cameras is possible by involving the CDF in parallel with the ODF.
2.2.1 Layer 1 Entities
2.2.1.1 Local Fusion Unit
The local fusion unit and its components are shown in Figure 2.2. The algorithmic model
for consensus development, covering feature extraction and classification, is handled at
this layer.
Figure 2.2: Local Fusion Unit (blocks shown: S1 ... Sn, Data Acquisition, Feature Extraction, Classification)
Various techniques have been used for detection and classification in consensus
development and its measurement. These techniques are discussed in later chapters of this
report.
2.3 Layer 2 Entities
2.3.1 Optimal Data Fusion Entity
Object detection and tracking using the visual sensing modality is handled by the fusion
layers named ODF and CDF, as shown in Figure 2.3.
Figure 2.3: Optimal Data Fusion Unit (blocks shown: Object Detection, Classification, Continuous Tracking)
The complete modular diagram of the layer-2 ODF is shown in Figure 2.3. This layer contains
the working model and the approaches developed for detection and tracking based on the
built consensus. Consensus development for detection has been investigated using the
techniques discussed in later chapters of this report.
Layer 1 Consensus Development Algorithms
3.1 Purpose
This chapter discusses the classification of acoustic features into gunshot and non-gunshot
categories using inference-based methods, classification techniques and artificial
intelligence. The selection of the audio feature vector was briefly discussed in Report 6.
Here we show the experimental results on the optimal feature set for an audio signal
database consisting of gunshot acoustic signals and non-gunshot signals such as street
noise and bird chirping.
We then discuss the classification methods used to declare an audio signature a gunshot in
a noisy environment. The feasibility of these methods on a constrained embedded platform is
briefly discussed. An extensive real-time evaluation of these methods in real-world (noisy)
settings will be the subject of a future report.
The classification methods that we have considered for our embedded setup are k-nearest
neighbor classification, the naive Bayes approach, the decision tree classifier and the
Gaussian mixture model.
3.1.1 k-Nearest Neighbor (k-NN) Classifier
Nearest neighbor classification can be explained by the analogy of learning by example. It
compares a to-be-classified test tuple with the training tuples. In our case a training
tuple is the vector of acoustic features:
$$x = (\mathrm{zcr}, \mathrm{ste}, \mathrm{spbw}, \mathrm{spro}, \mathrm{ber6}, \mathrm{ber7}, \mathrm{cep}, \mathrm{sprc}, \mathrm{sprf})^T$$
where
zcr: Zero Crossing Rate
ste: Short Time Energy
spbw: Spectral bandwidth
spro: Spectral roll-off
ber6: Band energy ratio of the 6th subband
ber7: Band energy ratio of the 7th subband
Chapter 4. Data Fusion Algorithms 7
cep: Cepstral coefficients
sprc: Spectral Centroid
sprf: Spectral flux
Attributes of the training tuple are the optimal features selected for the audio classification.
Each tuple represents a point in the n-dimensional space, with n being the number of features.
All of the training tuples are stored in an n-dimensional pattern space. When an unknown tuple
is given for classification, a k-nearest neighbor (k-NN) classifier searches the pattern space for
the k training tuples which are closest to the unknown tuple. These k training tuples are the
k-nearest neighbors of the unknown tuple.
Closeness is defined in terms of a distance metric, such as Euclidean distance. The
Euclidean distance between two points or tuples $X_1 = (x_{11}, x_{12}, \ldots, x_{1n})$ and
$X_2 = (x_{21}, x_{22}, \ldots, x_{2n})$ is given by the following equation.
$$\mathrm{dist}(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$
The basic steps of the k-NN algorithm are:
• Compute the distances between the new sample and all previous samples that have already
been classified into clusters
• Sort the distances in increasing order and select the k samples with the smallest distance
values
• Apply the voting principle. A new sample will be added (classified) to the largest cluster
out of k selected samples.
Figure 3.1: k-NN classification example. The test tuple (green circle) is to be classified either
to the class of blue squares or to the class of red triangles. If k = 3 (solid circle) it is assigned
to the red triangle class because there are 2 triangles and only 1 square inside the inner circle.
If k = 5 (dashed circle) it is assigned to the blue square class.
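The three steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-feature training set, its values and the function names are hypothetical and are not taken from the report's data or its appendix code.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(train, query, k=3):
    """train is a list of (feature_vector, label) pairs.
    Returns the majority label among the k training tuples closest to query."""
    # Step 1: distance from the query to every stored training tuple
    dists = [(euclidean(x, query), label) for x, label in train]
    # Step 2: sort by increasing distance and keep the k nearest
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # Step 3: majority vote among the k labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training set (values are illustrative only)
train = [
    ((0.20, 0.90), "gunshot"),
    ((0.25, 0.80), "gunshot"),
    ((0.30, 0.85), "gunshot"),
    ((0.80, 0.20), "noise"),
    ((0.75, 0.30), "noise"),
]
print(knn_classify(train, (0.28, 0.82), k=3))
```

Note that every call scans the whole training set, which is the cost discussed in the paragraphs that follow.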
The k-nearest neighbor approach is a non-parametric classification method, as it works
without knowledge of the underlying probability distributions of the member features. Its
performance is tuned by choosing an appropriate k (the number of nearest neighbors
considered). The choice of k usually depends on the specific application, and several
heuristics exist for selecting a suitable value. The main distance metrics are the
Minkowski distance (Lm norm), of which the Euclidean distance is a special case, and the
Mahalanobis distance. In contrast to the Minkowski distance, the Mahalanobis distance
additionally uses the inverse covariance matrix of each class as a weight matrix. The
computational complexity of the Mahalanobis distance is therefore higher than that of
Minkowski distances, so a Minkowski distance such as the Euclidean distance is preferred
for embedded and real-time processing. The k-nearest neighbor classification cannot be
divided into separate training and classification phases.
A major drawback of this algorithm under embedded and real-time constraints is therefore
that it cannot be applied effectively to large data sets. Classifying each sample requires
the complete training data set. If the data set is large, many distances have to be
calculated, and hence it is in general not feasible to apply k-nearest neighbor to embedded
real-time fusion.
Initially we plan to detect gun sounds from a limited set of possible guns. Because our
sensor nodes will mainly be deployed outdoors, reverberation disturbances are minimized. In
this controlled experiment the Minkowski-distance-based classifier can be used effectively.
Another distance-based classifier, the Mahalanobis distance classifier, is more popular for
embedded real-time classification. Its advantage over the k-nearest neighbor approach is
that it separates the training and classification tasks, so an on-line implementation is
feasible. It also reduces memory requirements, since only the class statistics computed
during training are kept.
3.1.2 Naive Bayesian Classifier
Bayesian classifiers are statistical classifiers that work by predicting class membership
probabilities. Naive Bayes (NB) classifiers are probabilistic classifiers based on Bayes'
theorem with a strong (naive) independence assumption between features. The basic idea in
NB approaches is to use the joint probabilities of the feature set of an audio signal for
each category. The category of the audio signal is then estimated from the feature
probability distributions.
The naive part of NB methods is the assumption of feature independence, i.e. the
conditional probability of a feature given an audio signal class is assumed to be
independent of the conditional probabilities of the other features given that same class.
Due to this assumption, NB classifiers are far cheaper to compute than non-naive Bayes
approaches, whose complexity is exponential, because NB does not use feature combinations
as predictors. The technique is particularly attractive for embedded implementation because
of the computing power constraints.
With independent feature values $x_j$ of the feature vector $x$, the conditional
probability of $x$ given a class $c_i$ is the product of the probabilities of each feature
$x_j$ given the class $c_i$, that is, the product of the likelihood functions of class $i$.
The joint conditional probability of $x$ given class $c_i$ is:
$$p(x \mid c_i) = \prod_j p(x_j \mid c_i)$$
To classify the feature vector $x$, the posterior class probabilities are computed as
$$p(c_i \mid x) \propto p(c_i) \prod_j p(x_j \mid c_i)$$
Finally, the vector $x$ is classified using the maximum a posteriori (MAP) estimate of the class label.
$$\hat{c} = D(x) = \arg\max_i \, p(c_i \mid x) = \arg\max_i \, p(c_i) \prod_j p(x_j \mid c_i)$$
The computational complexity is low, as it is just a multiplication of density function
values (or a summation, if the logarithms of the probabilities are used).
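The MAP rule in its logarithmic (summation) form can be sketched as follows. The report does not state which density is used for each feature, so the per-feature Gaussian below is an assumption, and the priors, means and variances in `model` are hypothetical illustration values rather than trained parameters.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log of a 1-D Gaussian density (assumed per-feature likelihood)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def nb_classify(x, model):
    """model maps label -> (prior, [(mean, var) for each feature]).
    Returns the label maximising log p(c) + sum_j log p(x_j | c)."""
    best_label, best_score = None, -math.inf
    for label, (prior, params) in model.items():
        score = math.log(prior)
        for xj, (mean, var) in zip(x, params):
            # Summation of log-likelihoods replaces the product of densities
            score += gaussian_log_pdf(xj, mean, var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical two-feature model (illustrative numbers only)
model = {
    "gunshot": (0.5, [(300.0, 100.0 ** 2), (0.8, 0.1 ** 2)]),
    "noise": (0.5, [(60.0, 30.0 ** 2), (0.2, 0.1 ** 2)]),
}
print(nb_classify((280.0, 0.75), model))
```

Working in the log domain avoids numerical underflow when many small probabilities are multiplied, and it needs only additions at classification time.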
3.1.3 Decision Tree
A decision tree (DT) is a flowchart-like tree structure, where each internal node denotes a test
on a feature, each branch represents an outcome of the feature bounds test, and each leaf holds
a class label. The topmost node in a tree is the root node. During tree construction, attribute
selection measures are used to select the attribute that best partitions the tuples into distinct
classes. When decision trees are built, many of the branches may reflect noise or outliers in the
training data. Tree pruning (cutting) attempts to identify and remove these branches so that
classification accuracy on unseen data is improved.
After the decision tree has been built from training data, the classification of a given
feature vector moves from the root node to a leaf (the class label) by following Boolean
tests. This approach is therefore simple to implement in embedded hardware.
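The root-to-leaf walk can be illustrated with a small hand-written tree. This is a sketch only: the tree structure, the feature names and the thresholds are hypothetical, and no tree induction or pruning is shown.

```python
def classify(node, features):
    """Walk from the root to a leaf.
    A leaf is a label string; an internal node is a dict holding the feature
    name, a threshold, and the subtrees for values below / not below it."""
    while isinstance(node, dict):
        below = features[node["feature"]] < node["threshold"]
        node = node["below"] if below else node["above"]
    return node

# Hypothetical two-level tree (thresholds are illustrative only)
tree = {
    "feature": "zcr",
    "threshold": 150,
    "below": "non-gunshot",
    "above": {
        "feature": "ste",
        "threshold": 0.5,
        "below": "non-gunshot",
        "above": "gunshot",
    },
}
print(classify(tree, {"zcr": 300, "ste": 0.9}))
```

Each classification costs one comparison per tree level, which is why the method maps well onto small microcontrollers.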
3.1.4 Gaussian Mixture Model
For each sound class, the statistical behavior of the features (Probability Density Functions,
pdf) can be modeled with a mixture of Gaussians. This model is characterized by the number
of Gaussians, their relative weights, and their mean / covariance parameters. During a training
process, the system learns the GMM parameters, by analyzing a subset of the sound database.
To find the best model for each class of sounds, the likelihood is maximized using 20 iterations
of the Expectation Maximization (EM) algorithm. In the recognition process, the signal to be
classified is compared to the models of each class to find the most probable one.
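The recognition step, comparing a signal with the mixture model of each class, can be sketched as below. The sketch assumes one-dimensional features and already-trained mixtures; the EM training stage is not shown, and the weights, means and variances are hypothetical.

```python
import math

def gmm_log_likelihood(samples, components):
    """Log-likelihood of 1-D samples under a Gaussian mixture.
    components is a list of (weight, mean, variance) triples."""
    total = 0.0
    for x in samples:
        # Log of each weighted Gaussian density evaluated at x
        logs = [
            math.log(w) - 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
            for w, mu, var in components
        ]
        # Log-sum-exp keeps the mixture sum numerically stable
        m = max(logs)
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total

def classify(samples, class_models):
    """Return the class whose mixture gives the highest log-likelihood."""
    return max(class_models, key=lambda c: gmm_log_likelihood(samples, class_models[c]))

# Hypothetical trained models (illustrative parameters only)
models = {
    "gunshot": [(0.6, 300.0, 2500.0), (0.4, 500.0, 2500.0)],
    "noise": [(1.0, 60.0, 900.0)],
}
print(classify([310.0, 480.0, 290.0], models))
```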
3.2 Effect of Audio Features Selection on Classification Problem
Threshold-based gunshot detection, though effective in calm indoor scenarios, has been
found to be impractical in noisy environments such as busy roads. We observed that the node
falsely identifies sounds such as bus horns, engine noise and loud bird song near the
sensor node as gunshots. One approach to resolving such false positive alarms is to use the
fact that gunshot noise is by its very nature not local: it can be heard up to hundreds of
meters away in the daytime and up to several miles away on silent nights. Other disturbing
noises that cross the noise floor threshold are inherently local. For example, engine noise
is a highly local sonic activity with an area of influence of a few meters. Therefore, if a
particular event of interest is detected at the same time at several nodes distributed at
distances of up to a few hundred meters, the gunshot sound can be distinguished from rival
noise events. This is spatial classification, and it does not require conventional
classification methods.
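A minimal sketch of such a coincidence test is given below. The one-second window and the three-node minimum are assumptions chosen for illustration: sound travels at roughly 343 m/s in air, so nodes a few hundred meters apart hear the same shot within about a second. The function and parameter names are hypothetical.

```python
def spatial_consensus(detections, window_s=1.0, min_nodes=3):
    """detections is a list of (node_id, timestamp_in_seconds) pairs.
    Returns True if at least min_nodes distinct nodes reported an event
    within some interval of length window_s."""
    events = sorted(detections, key=lambda d: d[1])
    for i, (_, t0) in enumerate(events):
        # Distinct nodes that fired within window_s after this detection
        nodes = {nid for nid, t in events[i:] if t - t0 <= window_s}
        if len(nodes) >= min_nodes:
            return True
    return False

print(spatial_consensus([("a", 10.0), ("b", 10.3), ("c", 10.6)]))
```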
Spatial filtering is one of the earliest techniques for classifying gunshots in urban
environments. However, this solution is possible only if the acoustic sensor node density
is high enough to ensure that multiple sensors register a gunshot. Another problem arises
after the event has been detected: the nodes need to share the sound signatures with the
whole network so that a consensus can be reached that the same source was recorded on the
other nodes as well. This in turn poses problems for sensor nodes with limited power,
limited RAM and narrow communication bandwidth. To optimize node power consumption,
bandwidth utilization and memory requirements, a number of algorithms have been proposed
that maintain a minimum spanning tree of the network at each node to communicate data and
audio signatures efficiently. This also limits network flexibility, because changing
environmental conditions and node movement disturb the optimum routing tables.
To avoid flooding the network with data and to classify gunshots efficiently, it is
sufficient to select the audio features that best represent the gunshot audio class. The
number of features should also be the minimum needed for good classification. After
extensive analysis of about 36 audio features, we have selected the following as the best
in terms of computational and power constraints on embedded sensor nodes.
3.2.1 Temporal Features
The temporal features are the easiest to calculate on embedded platforms. Low-end
microcontrollers with only summing and simple arithmetic operations can easily compute the
time features. The most important time features are Zero Crossing Rate, Short Time Energy
and Peaks Count. We also experimented with a new feature, the ratio of ZCR count to Peaks
Count, which gave better classification results than using the two parameters
independently.
3.2.1.1 Zero Crossing Rate
Figure 3.2: Zero Crossing Rate for 5000 sample window as function of time.
Zero Crossing Rate (ZCR) is one of the most widely used temporal measures in audio feature
collection. It measures how often a signal changes direction across the mean signal value.
The scheme is particularly attractive for embedded implementation because it is cheaper to
compute than spectrum (FFT) based techniques. The zero crossing rate detector works by
counting the number of signal transitions in a time window of fixed size. It essentially
provides information about the most dominant frequency present in the signal. ZCR counting
starts at the moment a thud or threshold-crossing sound is captured by the sensor node.
From that moment the signal is also recorded for local classification purposes. The signal
is low-pass filtered using a weighted average filter, and the number of times the output
changes sign is counted.
The ZCR count after 500 samples is observed. The number of zero crossings is high for
outdoor ambient noise near busy roads. The ambient noise signal often crosses the threshold
and has been a major cause of false alarms in field tests of the sensor nodes. Similarly,
in strong wind the microphones capture noise that has a large amplitude as well as a large
ZCR count. With zero crossing detectors these ambient noises can easily be rejected at the
root node of the decision tree for gunshot classification. The mean ZCR of a normal gunshot
is much lower than the mean ZCR of ambient noise.
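The counting procedure described above (remove the mean, smooth with a weighted average, count sign changes) might look like the following sketch. The three-tap weights are an assumption, since the report does not give the filter coefficients, and the function name is hypothetical.

```python
def zero_crossings(samples, weights=(0.25, 0.5, 0.25)):
    """Count sign changes of the mean-removed, low-pass filtered signal.
    weights are the taps of an assumed weighted moving-average filter."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    n = len(weights)
    # Weighted moving average acts as a simple low-pass filter
    filtered = [
        sum(w * centered[i + j] for j, w in enumerate(weights))
        for i in range(len(centered) - n + 1)
    ]
    count = 0
    prev = 0.0
    for v in filtered:
        if v == 0.0:
            continue  # exact zeros carry no sign information
        if prev != 0.0 and (v > 0) != (prev > 0):
            count += 1
        prev = v
    return count

# Square-like test frame: four periods of three high and three low samples
frame = [1, 1, 1, -1, -1, -1] * 4
print(zero_crossings(frame))
```

Only additions, multiplications by fixed weights and comparisons are needed, which matches the low-end microcontroller constraint mentioned in Section 3.2.1.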
Figure 3.2 shows the ZCR count as a function of time (sample number) for various signals.
The top left signal was recorded in the NWN Lab, simulating a high-wind environment by
placing the sensor node near a fan. The next five windows show the ZCR graphs for gunshots
in indoor and outdoor environments. The gunshot signal has its maximum signature in the
first 1000 samples. The bottom three windows are other stray signals: the bottom left and
bottom right are bird songs, while the middle one is city traffic near a busy road. The
maximum ZCR value remains below 100 for all three signals. It can be observed from the
gunshot graphs that the ZCR lies in the middle range, from 150 to 600, while the gunshot
sound is active.
3.2.1.2 Peaks Count
Figure 3.3: Peaks Count in a 5000-sample frame for the audio files in the test pool. On the embedded
platform it is implemented in real time using a counter that is updated whenever an audio sample's
amplitude is higher than both the previous and the next sample.
3.2.1.3 Ratio of Peaks-Count to Zero-Crossings-Rate
Figure 3.4: Ratio of Peaks-Counts to Zero-Crossings-Rate per 5000 samples of audio
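The peak counter described in the caption of Figure 3.3 and the ratio feature can be sketched as follows. The crossing counter here is a simplified, unfiltered one written only to keep the example self-contained; the function names and the test frame are hypothetical.

```python
def peaks_count(samples):
    """Count samples strictly greater than both neighbouring samples."""
    return sum(
        1
        for i in range(1, len(samples) - 1)
        if samples[i] > samples[i - 1] and samples[i] > samples[i + 1]
    )

def mean_crossings(samples):
    """Count sign changes of the signal around its mean (no filtering)."""
    mean = sum(samples) / len(samples)
    count = 0
    prev_sign = 0
    for s in samples:
        d = s - mean
        sign = (d > 0) - (d < 0)
        if sign == 0:
            continue  # samples equal to the mean are skipped
        if prev_sign != 0 and sign != prev_sign:
            count += 1
        prev_sign = sign
    return count

def peaks_to_zcr_ratio(samples):
    """Ratio of peaks count to crossing count; None if there are no crossings."""
    crossings = mean_crossings(samples)
    return peaks_count(samples) / crossings if crossings else None

frame = [0, 2, 0, -2, 0, 2, 0, -2, 0]
print(peaks_count(frame), mean_crossings(frame), peaks_to_zcr_ratio(frame))
```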
3.2.1.4 Short Time Signal Energy
The results for Short Time Signal Energy are shown in Figure 3.5.
Figure 3.5: Short Time Signal Energy (STE) as function of time
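The report does not spell out the energy formula, so the sketch below assumes the common definition of short time energy as the sum of squared samples in each frame. The frame length and hop are free parameters, and the function name is hypothetical.

```python
def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples for each frame of frame_len samples,
    advancing by hop samples between frames."""
    return [
        sum(s * s for s in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]

print(short_time_energy([1, 2, 3, 4], frame_len=2, hop=2))
```

Some definitions divide each sum by the frame length; that changes only the scale, not the shape of the curve.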
3.2.2 Comparison of Acoustic Features
The effectiveness of the acoustic features was compared for classifying gunshots against
all other signals. There were 11 gunshot sounds, recorded from weapons of different
calibers in a multitude of environments. For the noise signals we used eight media files
containing a large number of sounds that can interfere with gunshots. These include
thunderstorms, clapping, white noise, night-time sounds of crickets and other insects, and
various other kinds of loud noise. Each parameter was observed over a window of 50
milliseconds, sliding by 50 milliseconds each time. Next, the histogram over all the
windows is captured, and the standard deviation of the acoustic parameter is plotted.
Figure 3.6: Acoustic feature comparison for gunshot and ambient noise signals. The standard
deviation of the parameter among windows of 50 ms duration is taken as the quantity of interest.
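The windowing scheme can be sketched as follows: split the signal into consecutive 50 ms windows, evaluate one feature per window, and take the standard deviation of those values. The sample rate, the use of the population standard deviation and the function names are assumptions made for this example.

```python
import statistics

def feature_std(samples, sample_rate_hz, feature_fn, window_s=0.050):
    """Standard deviation of feature_fn evaluated on consecutive,
    non-overlapping windows of window_s seconds."""
    win = int(round(sample_rate_hz * window_s))
    values = [
        feature_fn(samples[i:i + win])
        for i in range(0, len(samples) - win + 1, win)
    ]
    # Population standard deviation is an assumption; the report does not say which
    return statistics.pstdev(values)

# Hypothetical signal: 50 ms at level 1 followed by 50 ms at level 3, sampled at 1 kHz
signal = [1.0] * 50 + [3.0] * 50
mean_level = lambda w: sum(w) / len(w)
print(feature_std(signal, 1000, mean_level))
```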
Layer 2 Consensus Development Algorithms
4.1 Purpose
Object detection and object tracking become possible once an initial event has been
generated by the DFU. This event is taken as the initial information for performing fusion
for visual detection. The corresponding frame is processed, and fusion is performed for the
initial detection. Accurate detection of the required object is possible only when
computation is kept to a minimum and a suitable fusion mechanism is designed. A suitable
algorithmic model therefore has to be implemented, and consensus has to be developed so
that the system detects reliably. Object tracking is one of the processes to be executed on
the layer-2 modality. Frame-by-frame object tracking is possible with the implemented
algorithmic model, but complexity arises when accuracy is needed in continuous tracking.
Various techniques have therefore been introduced to build the consensus, which helps in
performing object tracking in a controlled way.
4.2 Object Detection
4.2.1 Histogram of Oriented Gradients
Human detection in a video frame is performed using a classification methodology. The Open
Computer Vision (OpenCV) library has been used for this classification. The method used in
[3] and [1] is the HOG descriptor with an SVM-based model, integrated for multiple
detection libraries.
Figure 4.1: Human Detection Using HOG (blocks shown: Input Image, Normalization, Gradient, Weighted
voting, Contrast Normalization, HOG Collection at detection, Linear SVM, Person / no-Person)
Figure 4.1 shows the approach adopted for finding and classifying humans in any given video
frame. The human detection method uses normalized histograms of image gradients. An object
is characterized by the gradient directions found over small groups of pixels, or cells.
The normalized block is referred to as a Histogram of Oriented Gradients. Human detection
is done by tiling the detection window and feeding the combined feature vector into a
conventional SVM.
For classification, a set of images has been used as positive training examples. Variations
of the images, including right and left reflections, have also been included. In the same
way, a set of negative images has been used to train the classifier on negative samples.
The method is then applied iteratively until it arrives at a final detector.
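The gradient and weighted-voting stages of the pipeline can be illustrated for a single cell. This is a simplified sketch and not the OpenCV implementation used in the project: it uses centered differences, nine unsigned orientation bins and hard bin assignment, and it omits the bin interpolation and block contrast normalization of a full HOG descriptor. All names are hypothetical.

```python
import math

def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0 to 180 degrees) over the interior pixels of a grayscale cell."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Centered differences approximate the image gradient
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes for its orientation bin, weighted by magnitude
            hist[int(angle // bin_width) % bins] += mag
    return hist

# Hypothetical 4x4 cell containing a vertical edge
cell = [[0, 0, 10, 10] for _ in range(4)]
print(cell_histogram(cell))
```

For the vertical edge in the example all gradient energy falls in the first bin, since the gradient points horizontally.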
4.2.2 Deformable Parts Model
Deformable models [2] provide an elegant framework for object detection and recognition and
are considered state of the art among efficient algorithms for matching models to images.
The Deformable Part Model is a discriminatively trained, multiscale model that aims to make
effective use of more latent information, such as hierarchical (grammar) models and models
involving latent three-dimensional pose. The deformable model includes both a coarse global
template covering an entire object and higher-resolution part templates. The templates
represent the histogram of gradient features discussed above. Figure 4.3 illustrates a
placement of such a model in a HOG pyramid.
Figure 4.2: Deformable Part Model
The root filter location defines the detection window (the pixels inside the cells covered
by the filter). The part filters are placed several levels down in the pyramid, so the HOG
cells at that level have half the size of the cells in the root filter level. The score of
a placement is given by the scores of each filter (the data term) plus a score of the
placement of each part relative to the root (the spatial term),
$$\sum_{i=0}^{n} F_i \cdot \phi(H, p_i) + \sum_{i=1}^{n} \left[ a_i \cdot (\tilde{x}_i, \tilde{y}_i) + b_i \cdot (\tilde{x}_i^2, \tilde{y}_i^2) \right] \qquad (4.1)$$
where $F_i$ is the $w \times h \times 9 \times 4$ weight vector of the $i$th filter and
$\phi(H, p_i)$ are the features in a $w \times h$ subwindow of a HOG pyramid.
$(\tilde{x}_i, \tilde{y}_i) = ((x_i, y_i) - 2(x, y) + v_i)/s_i$ gives the location of the
$i$th part relative to the root location $(x, y)$. $a_i$ and $b_i$ are two-dimensional
coefficient vectors for measuring a score for each possible placement of the $i$th part.
Figure 4.3: Pyramids of Deformable Part Model
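A small numeric sketch shows how the data term and the spatial term of equation (4.1) combine. The filter responses are taken as precomputed numbers standing in for the filter and feature dot products, and all values, including the negative quadratic coefficients that penalize displacement, are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def relative_displacement(part_xy, root_xy, v, s):
    """((x_i, y_i) - 2 (x, y) + v_i) / s_i, as in the text's formula."""
    return (
        (part_xy[0] - 2 * root_xy[0] + v[0]) / s,
        (part_xy[1] - 2 * root_xy[1] + v[1]) / s,
    )

def placement_score(filter_responses, displacements, a, b):
    """filter_responses: values for i = 0..n (root first).
    displacements, a, b: one entry per part, i = 1..n."""
    data_term = sum(filter_responses)
    spatial_term = 0.0
    for (dx, dy), ai, bi in zip(displacements, a, b):
        # Linear and quadratic terms in the relative part position
        spatial_term += dot(ai, (dx, dy)) + dot(bi, (dx * dx, dy * dy))
    return data_term + spatial_term

# Hypothetical root response plus two part responses
responses = [2.0, 1.0, 0.5]
disps = [(1.0, 0.0), (0.0, 2.0)]
a = [(0.0, 0.0), (0.0, 0.0)]
b = [(-1.0, -1.0), (-1.0, -1.0)]
print(placement_score(responses, disps, a, b))
```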
4.3 Object Tracking
4.3.1 Kalman Filter
During the prediction phase, we use what we know to figure out where we expect the system to
be before we attempt to integrate a new measurement. In practice, the prediction phase is done
immediately after a new measurement is made, but before the new measurement is incorporated
into our estimation of the state of the system. An example of this might be when we measure
the position of a car at time t, then again at time t + dt. If the car has some velocity v, then we
do not just incorporate the second measurement directly. We first fast-forward our model based
on what we knew at time t so that we have a model not only of the system at time t but also of
the system at time t + dt, the instant before the new information is incorporated. In this way,
the new information, acquired at time t + dt, is fused not with the old model of the system,
but with the old model of the system projected forward to time t + dt. This is the meaning
of the cycle depicted in Figure 10-18.
In the context of Kalman filters, there are three kinds of motion that we would like to
consider. The first is dynamical motion. This is motion that we expect as a direct result of
the state of the system when last we measured it. If we measured the system to be at position
x with some velocity v at time t, then at time t + dt we would expect the system to be located
at position x + v dt, possibly still with velocity v.
The second form of motion is called control motion. Control motion is motion that we expect
because of some external influence applied to the system of which, for whatever reason, we
happen to be aware. As the name implies, the most common example of control motion is when we
are estimating the state of a system that we ourselves have some control over, and we know
what we did to bring about the motion. This is particularly the case for robotic systems where
the control is the system telling the robot to (for example) accelerate or go forward. Clearly,
in this case, if the robot was at x and moving with velocity v at time t, then at time t + dt
we expect it to have moved not only to x + v dt (as it would have done without the control),
but also a little farther, since we did tell it to accelerate.
The final important class of motion is random motion. Even in our simple one-dimensional
example, if whatever we were looking at had a possibility of moving on its own for whatever
reason, we would want to include random motion in our prediction step. The effect of such
random motion will be to simply increase the variance of our state estimate with the passage
of time. Random motion includes any motions that are not known or under our control. As with
everything else in the Kalman filter framework, however, there is an assumption that this
random motion is either Gaussian (i.e., a kind of random walk) or that it can at least be
modeled effectively as Gaussian.
Thus, to include dynamics in our simulation model, we would first do an update step before
including a new measurement. This update step would include first applying any knowledge we
have about the motion of the object according to its prior state, applying any additional
information resulting from actions that we ourselves have taken or that we know to have been
taken on the system from another outside agent, and, finally, incorporating our notion of
random events that might have changed the state of the system since we last measured it. Once
those factors have been applied, we can then incorporate our next new measurement.
In practice, the dynamical motion is particularly important when the state of the system is
more complex than our simulation model. Often when an object is moving, there are multiple
components to the state such as the position as well as the velocity. In this case, of course,
the state evolves according to the velocity that we believe it to have. Handling systems with
multiple components to the state is the topic of the next section. We will develop a little
more sophisticated notation as well to handle these new aspects of the situation.
Consider a particular realistic situation of taking measurements on a car driving in a parking
lot. We might imagine that the state of the car could be summarized by two position variables,
x and y, and two velocities, vx and vy. These four variables would be the elements of the state
vector xk. This suggests that the correct form for F is:
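$$
x_k = \begin{bmatrix} x \\ y \\ v_x \\ v_y \end{bmatrix}, \quad
F = \begin{bmatrix} 1 & 0 & dt & 0 \\ 0 & 1 & 0 & dt \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\tag{4.2}
$$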
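As a rough illustration of equation (4.2), the sketch below builds F for a chosen time step and applies it to a state vector in plain Python. The helper names (`make_F`, `mat_vec`) and the numeric values are assumptions chosen for the example; they are not part of the implementation described in this report.

```python
def make_F(dt):
    """Constant-velocity transition matrix for the state [x, y, vx, vy]."""
    return [
        [1.0, 0.0, dt, 0.0],
        [0.0, 1.0, 0.0, dt],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Hypothetical car state: position (2, 3), velocity (1, -0.5) per time unit.
state = [2.0, 3.0, 1.0, -0.5]
predicted = mat_vec(make_F(dt=0.5), state)
print(predicted)  # [2.5, 2.75, 1.0, -0.5]
```

Each position moves by its velocity times dt, while the velocities are carried over unchanged, which is the dynamical motion described above.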
However, when using a camera to make measurements of the car's state, we probably measure
only the position variables:
$$
z_k = \begin{bmatrix} z_x \\ z_y \end{bmatrix}_k
\tag{4.3}
$$
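This implies that H must select the two position components of the four-element state, so
that $z_k = H x_k$. The structure of H is therefore:
$$
H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
\tag{4.4}
$$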
In this case, we might not really believe that the velocity of the car is constant and so would
assign a value of Qk to reflect this. We would choose Rk based on our estimate of how accurately
we have measured the car's position using (for example) our image analysis techniques on a
video stream. All that remains now is to plug these expressions into the generalized forms of
the update equations. The basic idea is the same, however. First we compute the a priori
estimate $x_k^-$ of the state. It is relatively common (though not universal) in the literature to
use the superscript minus sign to mean "at the time immediately prior to the new measurement";
we adopt that convention here as well. This a priori estimate is given by:
$$
x_k^- = F x_{k-1} + B u_{k-1} + w_k
\tag{4.5}
$$
Using $P_k^-$ to denote the error covariance, the a priori estimate for this covariance at time k is
obtained from the value at time k - 1 by:
$$
P_k^- = F P_{k-1} F^T + Q_{k-1}
\tag{4.6}
$$
This equation forms the basis of the predictive part of the estimator, and it tells us what we expect
based on what we have already seen. From here we state (without derivation) what is often
called the Kalman gain or the blending factor, which tells us how to weight new information
against what we think we already know:
$$
K_k = P_k^- H_k^T \left( H_k P_k^- H_k^T + R_k \right)^{-1}
\tag{4.7}
$$
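The prediction and gain computations in equations (4.5) to (4.7) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: there is no control input (the $B u_{k-1}$ term is dropped), the zero-mean noise $w_k$ is left out of the point estimate and enters only through Q, H is taken as the 2x4 matrix that selects the two position components, and the values of P, Q and R are arbitrary illustrative choices. All function names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inv2(M):
    """Inverse of a 2x2 matrix (enough for a two-component measurement)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def predict(x, P, F, Q):
    """A priori state and covariance, eqs. (4.5)-(4.6), assuming no control input."""
    x_prior = [row[0] for row in matmul(F, [[v] for v in x])]
    P_prior = add(matmul(matmul(F, P), transpose(F)), Q)
    return x_prior, P_prior


def kalman_gain(P_prior, H, R):
    """Kalman gain, eq. (4.7)."""
    Ht = transpose(H)
    S = add(matmul(matmul(H, P_prior), Ht), R)
    return matmul(matmul(P_prior, Ht), inv2(S))


def diag(n, value):
    return [[value if i == j else 0.0 for j in range(n)] for i in range(n)]


dt = 1.0
F = [[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]]
H = [[1, 0, 0, 0], [0, 1, 0, 0]]
P = diag(4, 1.0)    # assumed initial uncertainty
Q = diag(4, 0.01)   # assumed process noise
R = diag(2, 1.0)    # assumed measurement noise

x_prior, P_prior = predict([0.0, 0.0, 1.0, 2.0], P, F, Q)
K = kalman_gain(P_prior, H, R)
print(x_prior)                # [1.0, 2.0, 1.0, 2.0]
print(round(K[0][0], 4))      # 0.6678
```

With these assumed values the position gain is about 0.67, so the filter would weight a new position measurement roughly twice as heavily as its own prediction. A larger R would pull the gain toward zero.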
The figure below shows the results of different object tracking algorithms applied in a
university environment. The object is marked with two colored boxes. The red box is the
Kalman filter tracking result, while the blue one is the result of optical flow tracking.
Figure 4.4: Object Tracking results
Optimal Data Fusion (ODF) Unit Performance Evaluation
Real-time object detection and tracking is challenging in our distributed model because of its
computational cost and its timing constraints. The ODF is responsible for object detection and
tracking in the visual domain. The initial data fusion is done at the ODF entity, where object
detection is triggered by the acoustic event generated by the Level-1 DFU. Post-event detection
is then performed on the video frame using the processing entity chosen for the ODF.
Information fusion is then performed for object tracking, classification and detection using
the limited resources of that processing entity. The question is whether accurate object
detection and continuous object tracking at a fixed resolution can be performed while meeting
the real-time requirement and achieving a suitable frame rate.
5.1 Purpose
To finalize the processing entity for the Optimal Data Fusion (ODF) unit, we followed a
performance evaluation procedure made up of multiple tests that measure the quality and
computational power provided by each candidate processing entity. The measurements were
performed on the platforms described in Table 5.1. A set of experiments was carried out to
analyze the real-time behavior and computational cost of each candidate ODF entity.
Table 5.1: Specifications of the evaluated processing boards
                   Test Case 1            Test Case 2          Test Case 3
Processing Board   Beagle Board xM        Beagle Bone Black    Odroid U3
RAM                256 MB                 256 MB               2 GB
Processor          1 GHz ARM Cortex A8    Sitara, 1 GHz        1.7 GHz quad-core processor
Cost               $250                   $70                  $80
5.2 Test Case 1
The evaluation was done on an embedded Linux platform with the specifications listed in
Table 5.1. The output generated by the DFU was provided as input to the ODF over a USB
interface. FOV mapping was performed using the initial information received from the DFU,
which was then used as the trigger and initial input for the video frame captured at the ODF.
This test was performed on the Beagle Board xM with the specifications described in Table 5.1.
The camera was connected over a USB interface, and the input video resolution was set to
320 x 240.
Figure 5.1: Beagle Board xM
5.2.1 Object Detection
• RAM utilization was 73% of the total available 235 MB.
• CPU utilization was observed at 75% of the total available.
• Total delay in object detection for the FOV was more than 4 seconds.
5.2.2 Object Tracking
• RAM utilization was 70% of the total available 235 MB.
• CPU utilization was observed at 100% of the total available.
• The achievable frame rate was 0.25 to 1 frame per second.
5.3 Test Case 2
This test was performed on the Beagle Bone Black with the specifications described in Table 5.1.
The camera was connected over a USB interface, and the input video resolution was set to
320 x 240.
Figure 5.2: Beagle Bone Black
5.3.1 Object Detection
• RAM utilization was 69% of the total available 235 MB.
• CPU utilization was observed at 75% of the total available.
• Total delay in object detection for the FOV was approximately 4 seconds.
5.3.2 Object Tracking
• RAM utilization was 80% of the total available 235 MB.
• CPU utilization was observed at 100% of the total available.
• The achievable frame rate was 1 frame per second.
5.4 Test Case 3
This test was performed on the Odroid-U3 with the specifications described in Table 5.1. The
camera was connected over a USB interface, and the input video resolution was set to
640 x 480.
Figure 5.3: Odroid-U3
5.4.1 Object Detection
• RAM utilization was 30% of the total available 1.7 GB.
• CPU utilization was observed at 35% of the total available on a dedicated single core.
• Total delay in object detection was approximately 1 second.
5.4.2 Object Tracking
• RAM utilization was 40% of the total available 1.7 GB.
• CPU utilization varied between 75% and 79% of the total available.
• The achievable frame rate was 6 frames per second.
• Tracking was performed at a video resolution of 640 x 480.
5.5 Conclusions
The test cases were performed to finalize the processing module, so that it can perform object
detection with minimum delay and object tracking with the highest possible frame rate and
quality. The results show the performance and resource utilization of each processing module.
Object detection and tracking with the lowest delay, the highest quality and the best frame
rate were achieved with the low-cost Odroid-U3. The decision was therefore taken to replace
the initially proposed processor (Beagle Board xM) with the Odroid-U3.
Appendix A
A.1 Peak Counts
function main
clear all;
path = 'J:\abubakar\Project_Multinodal_Hammad\wav_gunshotsounds\'; % Directory path here

gun1 = 'gunshot_handgun_firing_range.wav';
gun2 = 'gunshot_rifle_exterior_004.wav';
gun3 = 'gunshot_rifle_exterior_006.wav';
gun4 = 'gunshot_x3_handgun_on_firing_range.wav';
gun5 = 'handgun_22_caliber_single_shot_interior_shooting_range_pistol.wav';
gun6 = 'handgun_40_caliber_single_shot_distant_50_feet_away_springfield_xdm40.wav';
bird1 = 'bird_chirping.wav';
bird2 = 'bird_chirping2.wav';
city1 = 'city_or_town_street_ambience_pedestrians_walking_with_some_traffic_noise_in_background.wav';
lab1 = 'HelloHelloHelloOK_STM.wav';

a = [1 -3 2 0 1 -7 -6 6 3]';

[T1, Fs1] = audioread([path lab1]); x = getPeaks2(T1);
subplot(331); plot(x);
title('NWN Lab Hello Hello.wav');

[T1, Fs1] = audioread([path gun2]); T1 = T1(:,2);
x = getPeaks2(T1);
subplot(332); plot(x);
title('gunshot rifle exterior 004.wav');

[T1, Fs1] = audioread([path gun3]); T1 = T1(:,2); x = getPeaks2(T1);
subplot(333); plot(x);
title('gunshot rifle exterior 006.wav');

[T1, Fs1] = audioread([path gun4]); T1 = T1(:,2); x = getPeaks2(T1);
subplot(334); plot(x);
title('gunshot x3 handgun on firing range.wav');

[T1, Fs1] = audioread([path gun5]); T1 = T1(:,2); x = getPeaks2(T1);
subplot(335); plot(x);
title('handgun 22 caliber single shot interior.wav');

[T1, Fs1] = audioread([path gun6]); T1 = T1(:,2); x = getPeaks2(T1);
subplot(336); plot(x);
title('handgun 40 caliber single shot 50 feet.wav');

[T1, Fs1] = audioread([path bird2]); T1 = T1(:,2); x = getPeaks2(T1);
subplot(337); plot(x);
title('bird chirping2.wav');

[T1, Fs1] = audioread([path city1]); T1 = T1(:,2); x = getPeaks2(T1);
subplot(338); plot(x);
title('city pedestrians walking with traffic noise.wav');

[T1, Fs1] = audioread([path bird1]); T1 = T1(:,2); x = getPeaks2(T1);
subplot(339); plot(x);
title('bird chirping.wav');

end

function y = getPeaks2(T1)
WindLen = 5000;
lt1 = length(T1);
np = [];
for i = 1:lt1 - WindLen - 1
    a = T1(i:i+WindLen);
    a1 = [0; 0; a];
    a2 = [0; a; 0];
    a3 = [a; 0; 0];
    numPeaks = sum(a2 > a1 & a2 > a3);

    np = [np; numPeaks];
end
y = np;
end

function y = getPeaks(T1)
WindLen = 5000;
lt1 = length(T1);
zcrA = [];
for i = 1:lt1 - WindLen - 1
    NumZeroCross = numel(findpeaks(T1(i:i+WindLen)));
    zcrA = [zcrA; NumZeroCross];
end
y = zcrA;
end
A.2 Feature Extraction Functions
function y = peak2zcrRatio(T1, WindLen)
x1 = getPeaks2(T1, WindLen);
x2 = getZcr2(T1, WindLen);
a = length(x1);
b = length(x2);
% Truncate the longer vector so both have the same length
if (a < b) y = x1 ./ x2(1:a);
else y = x1(1:b) ./ x2;
end
% Note: this inversion means the returned value is zero crossings per peak
y = 1 ./ y;
end


function y = getPeaks2(T1, WindLen)
% WindLen = 500;
lt1 = length(T1);
np = [];
for i = 1:lt1 - WindLen - 1
    a = T1(i:i+WindLen);
    % Compare each sample with its left and right neighbours (zero padded)
    a1 = [0; 0; a];
    a2 = [0; a; 0];
    a3 = [a; 0; 0];
    numPeaks = sum(a2 > a1 & a2 > a3);

    np = [np; numPeaks];
end
y = np;
end

function y = getZcr2(T1, WindLen)
% WindLen = 500;
lt1 = length(T1);
zcrA = [];
for i = 1:lt1 - WindLen - 1
    NumZeroCross = vectorZcr(T1(i:i+WindLen));
    zcrA = [zcrA; NumZeroCross];
end
y = zcrA;
end


function y = vectorZcr(T1)
% T1 = T1 / max(T1);
% minPeak = 0.01; % Minimum signal level threshold
%                 % to consider for zero crossings
% T1 = T1 .* (abs(T1) > minPeak);
% A negative product of adjacent samples marks a sign change
a = [0; T1];
b = [T1; 0];
c = a .* b;
y = sum(c < 0);
end
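The counting logic of getPeaks2 and vectorZcr can be checked on a short signal with the following Python sketch. It is an illustrative re-expression, not the code used in the project: the function names are hypothetical, and the test signal is the nine-sample vector `a` defined in listing A.1.

```python
def count_peaks(samples):
    """Count strict local maxima, padding both ends with zero as getPeaks2 does."""
    padded = [0.0] + list(samples) + [0.0]
    return sum(1 for prev, cur, nxt in zip(padded, padded[1:], padded[2:])
               if cur > prev and cur > nxt)


def count_zero_crossings(samples):
    """Count adjacent sample pairs whose product is negative, as vectorZcr does."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)


def sliding_counts(samples, wind_len, count_fn):
    """Apply count_fn to every window of wind_len + 1 samples (same bounds as the MATLAB loops)."""
    n = len(samples)
    return [count_fn(samples[i:i + wind_len + 1]) for i in range(n - wind_len - 1)]


signal = [1, -3, 2, 0, 1, -7, -6, 6, 3]
print(count_peaks(signal))            # 4
print(count_zero_crossings(signal))   # 4
print(sliding_counts(signal, 4, count_peaks))           # [3, 2, 2, 2]
print(sliding_counts(signal, 4, count_zero_crossings))  # [2, 2, 1, 2]
```

Samples equal to zero never produce a negative product, so a signal that touches zero without changing sign is not counted as a crossing.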
A.3 Compute Feature Statistics
function FF = computeAllStatistics(fileName, win, step)

% This function computes statistics (std, or std divided by mean) of the
% following audio features:
% - energy entropy
% - zero crossing rate
% - spectral rolloff
% - spectral centroid
% - spectral flux
% - short time energy
%
% ARGUMENTS:
% fileName: the name of the .wav file in which the signal is stored
% win: the processing window (in seconds)
% step: the processing step (in seconds)
%
% RETURN VALUE:
% FF: a 1x6 array containing the 6 feature statistics
%

% audioread is used here (as in A.1); wavread is not available in current MATLAB releases
[x, fs] = audioread(fileName);

EE = Energy_Entropy_Block(x, win*fs, step*fs, 10);
E = ShortTimeEnergy(x, win*fs, step*fs);
Z = zcr(x, win*fs, step*fs, fs);
R = SpectralRollOff(x, win*fs, step*fs, 0.80, fs);
C = SpectralCentroid(x, win*fs, step*fs, fs);
F = SpectralFlux(x, win*fs, step*fs, fs);

FF(1) = statistic(EE, 1, length(EE), 'std');
FF(2) = statistic(Z, 1, length(Z), 'stdbymean');
FF(3) = statistic(R, 1, length(R), 'std');
FF(4) = statistic(C, 1, length(C), 'std');
FF(5) = statistic(F, 1, length(F), 'std');
FF(6) = statistic(E, 1, length(E), 'stdbymean');

% X = 1:length(EE);
% plot(X,EE/max(EE),'r',X,E/max(E),'b',X,Z/max(Z),X,R/max(R),X,C/max(C),X,F/max(F))
% plot(X,EE,'r',X,E,'b',X,Z,X,R,X,C,X,F)
% legend('Energy Entropy Block','ShortTimeEnergy','zcr','SpectralRollOff','SpectralCentroid','SpectralFlux')
%
A.4 Compute Features of All Classes
function main

classNames = ({'J:\AbuBakar\Project_Multinodal_Hammad\wav_gunshotsounds\audioFeatureExtraction\GunShot\', ...
               'J:\AbuBakar\Project_Multinodal_Hammad\wav_gunshotsounds\audioFeatureExtraction\Noise\'});


% function Features = computeFeaturesDirectory(classNames)
%
% This function computes the audio features (6-D vector) for each .wav
% file in all directories (given in classNames)
% classNames = ({'GunShot', 'Noise'});

close all
clc;
fprintf('--------------------------------------------------------------------------\n')
fprintf('Real Time Microphone and Camera acquisition and audio-video processing.\n\n');
fprintf('Theodoros Giannakopoulos\n');
fprintf('http://www.di.uoa.gr/~tyiannak\n');
fprintf('Dep. of Informatics and Telecommunications,\n');
fprintf('University of Athens, Greece\n');
fprintf('--------------------------------------------------------------------------\n')


Dim = 6;

win = 0.050; step = 0.050;
win = 0.20; step = 0.050;
% win = 0.20; step = 0.020;


FeaturesNames = {'Std Energy Entropy', 'Std/mean ZCR', 'Std Rolloff', 'Std Spectral Centroid', 'Std Spectral Flux', 'Std/mean Energy'};


%
% STEP A: Feature Calculation:
%


for (c = 1:length(classNames)) % for each class (and for respective directory):
    fprintf('Computing features for class %s...\n', classNames{c});
    D = dir([classNames{c} '//*.wav'])

    tempF = zeros(length(D), Dim);
    for (i = 1:length(D)) % for each .wav file in the current directory:
        % compute statistics (6-D array)
        F = computeAllStatistics([classNames{c} '//' D(i).name], win, step);
        [classNames{c} '//' D(i).name]
        % store statistics in the current row:
        tempF(i,:) = F';
    end
    % keep a different cell element for each feature matrix:
    Features{c} = tempF
end
Features{c}
%
% STEP B:
% calculate and plot histograms:
%

Colors = [0 0 0;
          0 0 1;
          0 1 0;
          0 1 1;
          1 0 0;
          1 0 1;
          1 1 0;
          0 0.25 1;
          0.25 0 1;
          0 1 0.25;
          0.25 1 0];

figure;
for (f = 1:Dim)
    subplot(3, 2, f);
    hold on;
    for (c = 1:length(classNames))
        tempF = Features{c}(:, f);
        [H, X] = hist(tempF, length(tempF));
        p = plot(X, H, '.-');
        set(p, 'Color', Colors(c,:));

        % get the 'others':
        tempFOthers = [];
        for (cc = 1:length(classNames))
            if (cc ~= c)
                tempFOthers = [tempFOthers; Features{cc}(:, f)];
            end
        end
        [E1, E2] = computeHistError(tempF, tempFOthers);
        Errors(f, c) = 100 * (E1 + E2) / 2;
        hM(c) = max(H);
    end
    [EMin, MMin] = min(Errors(f,:));
    [EMax, MMax] = max(Errors(f,:));
    EMean = mean(Errors(f,:));
    str = ['legend(' '''' 'GunShot' ''''];
    for (c = 2:length(classNames))
        str = [str ',' '''' 'All Other Noise' ''''];
    end
    str = [str ');'];
    eval(str);
    text(0, max(hM)*0.80, FeaturesNames{f});
end
A.5 Tracking and Kalman Filter
#include <opencv2/opencv.hpp>
#include <iostream>
#include <stdio.h>
#include <math.h>
#include <stdlib.h>


using namespace cv;
using namespace std;


int main(int argc, char *argv[]) {

    VideoCapture capture;
    capture.open(0);
    namedWindow("Output", CV_WINDOW_AUTOSIZE);
    Mat frame;
    HOGDescriptor hog;
    hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());

    size_t i, j;

    if (!capture.isOpened()) // Init camera
    {
        cout << "capture device failed to open!" << endl;
        return 1;
    }

    capture.set(CV_CAP_PROP_FRAME_WIDTH, 320);
    capture.set(CV_CAP_PROP_FRAME_HEIGHT, 240);

    while (1)
    {
        capture >> frame;
        if (frame.empty()) // stop if the camera returns no frame
            break;

        vector<Rect> found, found_filtered;
        hog.detectMultiScale(frame, found, 0, Size(8,8), Size(32,32), 1.05, 2);

        // Discard detections that lie entirely inside another detection
        for (i = 0; i < found.size(); i++)
        {
            Rect r = found[i];
            for (j = 0; j < found.size(); j++)
                if (j != i && (r & found[j]) == r)
                    break;
            if (j == found.size())
                found_filtered.push_back(r);
        }
        // Shrink each box slightly (the HOG detector returns padded boxes) and draw it
        for (i = 0; i < found_filtered.size(); i++)
        {
            Rect r = found_filtered[i];
            r.x += cvRound(r.width*0.1);
            r.width = cvRound(r.width*0.8);
            r.y += cvRound(r.height*0.06);
            r.height = cvRound(r.height*0.9);
            rectangle(frame, r.tl(), r.br(), cv::Scalar(0,255,0), 2);
        }
        imshow("Output", frame);
        if (cvWaitKey(10) == 'q')
            return 0;
    }

    return 0;
}
Bibliography
[1] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Con-
ference on, volume 1, pages 886–893. IEEE, 2005.
[2] Pedro Felzenszwalb, David McAllester, and Deva Ramanan. A discriminatively trained,
multiscale, deformable part model. In Computer Vision and Pattern Recognition, 2008.
CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
[3] SD Lin, Y Liu, and Y Jhu. A robust image descriptor for human detection based on hog
and webers law. International Journal of Innovative Computing, Information and Control,
9(10):3887–3901, 2013.
31