Computation Elimination Algorithms for Correlation Based
Fast Template Matching
By
Arif Mahmood
Dissertation
Presented to the
Department of Computer Science,
School of Science and Engineering
Lahore University of Management Sciences
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Submission Date: 28 January 2011
© Copyright 2011
By Arif Mahmood
Final Defense Committee Members
Dr. Sohaib Khan, Associate Professor, Department of Computer Science, LUMS
Dr. Arif Zaman, Professor, Department of Computer Science, LUMS
Dr. Nadeem Ahmad Khan, Associate Professor, Department of Computer Science, LUMS
Dr. Mian Muhammad Awais, Associate Professor, Department of Computer Science, LUMS
Dr. Ashfaq Khokhar, Professor, University of Illinois at Chicago
Dr. Shahab Munir Baqai, Associate Professor, Department of Computer Science, LUMS
Acknowledgements
I gratefully acknowledge the contributions and assistance of my teachers, friends and family members who enabled me to complete a good quality PhD. I wish to convey my gratitude to all of them.
I am especially thankful to my supervisor, Dr. Sohaib A. Khan, whose encouragement, guidance and support were essential to the completion of this thesis.
I would also like to thank Dr. Mansoor Sarwar for his generous support and kindness, which made my life much easier.
I am thankful to Dr. Shahab Munir Baqai for managing my thesis review process and thesis defense. I convey my gratitude to the Final Defense Committee (FDC) members, including Dr. Mian Muhammad Awais, Dr. Nadeem Khan and Dr. Arif Zaman. I am also obliged for the support of my external FDC member, Dr. Ashfaq Khokhar, and my reviewers, Dr. Nasir Rajpoot and Dr. Mubarak Shah. Thank you so much to the FDC members and the reviewers for sparing time and making sincere efforts to improve the quality of my PhD thesis.
I also wish to convey my gratitude to Dr. Javed Saim for strengthening my beliefs for success and for providing me with emotional reinforcement.
I am also thankful to Dr. Murtaza Taj for his efforts in improving the quality of my PhD thesis defense presentation. I am thankful to my fellow students, Ijaz Akhtar, Aamer Zaheer and others, for their support during my PhD program. I am especially thankful to Mr. Numan Sheikh for sharing the LaTeX files which I have used to write this thesis.
In the end, I would like to acknowledge the sacrifices made by my family and by my parents. I am thankful to my wife for managing the home and the children's education while I was busy with my studies.
Despite all of my hard work and efforts and the help of a number of people, I am fully convinced that the successful completion of such a good quality PhD is a blessing and favor of God.
My Lord! Grant me the power and ability that I may be grateful for your favors which you have bestowed on me and on my parents, and that I may do righteous good deeds that will please you, and admit me by your mercy among your righteous slaves.
Arif Mahmood
Computation Elimination Algorithms for Correlation Based
Fast Template Matching
Arif Mahmood
Department of Computer Science,
School of Science and Engineering
Ph.D. Dissertation, Submission Date: 28 January 2011
ABSTRACT
Template matching is frequently used in Digital Image Processing, Machine Vision, Remote Sensing and Pattern Recognition, and a large number of template matching algorithms have been proposed in the literature. The performance of these algorithms may be evaluated from the perspective of accuracy as well as computational complexity. Algorithm designers face a tradeoff between these two desirable characteristics; often, fast algorithms lack robustness and robust algorithms are computationally expensive.
The basic problem addressed in this thesis is to develop template matching algorithms that are both fast and robust. From the accuracy perspective, we choose correlation coefficient as the match measure because it is robust to the linear intensity variations often encountered in practical problems. To ensure computational efficiency, we choose bound based computation elimination approaches because they allow high speedups without compromising accuracy. Most existing elimination algorithms are based on simple match metrics such as Sum of Squared Differences and Sum of Absolute Differences. For correlation coefficient, which is a more robust match measure, very limited efforts have been made to develop efficient elimination schemes.
The main contribution of this thesis is the development of two different categories of bound based computation elimination algorithms for correlation coefficient based fast template matching. We have named the algorithms in the first category Transitive Elimination Algorithms (Mahmood and Khan, 2007b, 2008, 2010), because they are based on transitive bounds on correlation coefficient. In these algorithms, before computing the correlation coefficient, we compute bounds from neighboring search locations based on transitivity. The locations where the upper bounds are less than the current known maximum are skipped, as they can never become the best-match location. As the percentage of skipped search locations increases, the template matching process becomes faster. Empirically, we have demonstrated speedups of up to an order of magnitude compared to existing fast algorithms without compromising accuracy. The overall speedup depends on the tightness of the transitive bounds, which in turn depends on the strength of autocorrelation between nearby locations.
Although the high autocorrelation required for the efficiency of transitive algorithms is present in many template matching applications, it may not be guaranteed in general. We have therefore developed a second category of bound based computation elimination algorithms, which are more generic and do not require specific image statistics such as high autocorrelation. We have named this category Partial Correlation Elimination (PCE) algorithms (Mahmood and Khan, 2007a, 2011). These algorithms are based on a monotonic formulation of correlation coefficient, in which, at a particular search location, the correlation coefficient monotonically decreases as consecutive pixels are processed. As soon as the value of partial correlation becomes less than the current known maximum, the remaining computations are skipped. If a high magnitude maximum is found at the start of the search process, the amount of skipped computation increases significantly, resulting in a high speedup of the template matching process. In order to locate a high maximum at the start of the search process, we have developed novel initialization schemes which are effective for small and medium sized templates. For commonly used template sizes, we have demonstrated that PCE algorithms outperform existing algorithms by a significant margin.
Beyond the main contribution of developing elimination algorithms for correlation, two extensions of the basic theme of this thesis have also been explored. The first direction extends elimination schemes to object detection. To this end, we have shown that the detection phase of an AdaBoost based edge corner detector (Mahmood, 2007; Mahmood and Khan, 2009) can be significantly sped up by adapting elimination strategies to this problem. In the second direction, we prove that in video encoders, if motion estimation is done by maximization of correlation coefficient and motion compensation by first order linear estimation, the variance of the residue signal will always be less than that of existing motion compensation schemes (Mahmood et al., 2007). This result may potentially be used to achieve greater compression of the video signal than current techniques. The fast correlation strategies proposed in this thesis may be coupled with this result to develop correlation based video encoders with low computational cost.
Contents
1 Introduction
1.1 Our Contributions
1.1.1 Transitive Bounds on Correlation Based Image Match Measures
1.1.2 Transitive Elimination Algorithms
1.1.3 Basic Mode Partial Correlation Elimination Algorithm
1.1.4 Extended Mode Partial Correlation Elimination Algorithm
1.1.5 Elimination Algorithms for Fast Object Detection
1.1.6 Video Coding with Linear Compensation
1.2 Organization of Rest of the Thesis
2 A Review of the Commonly Used Image Match Measures
2.1 City Block Distance Measure
2.2 Euclidean Distance Measure
2.3 Minkowski Distance Measure
2.4 Angular Distance Measure
2.4.1 Relationship between Standardized Angular Distance and Standardized Euclidean Distance
2.5 Correlation Based Similarity Measures
2.5.1 Relationship between Correlation and Angular Distance Measure
2.5.2 Relationship between Correlation and Euclidean Distance Measure
2.5.3 Correlation Coefficient as a Measure of Strength of Linear Relationship
2.6 Correlation Ratio
2.6.1 Derivation of Correlation Ratio Formulation from Functional Regression
2.6.2 Relationship between Correlation Ratio and Correlation Coefficient
2.7 Entropy and Mutual Information
2.7.1 Relationship between Mutual Information and Correlation Coefficient
2.8 Conclusion
3 Computational Aspects of Commonly Used Image Match Measures
3.1 Fast Approximate Image Matching Techniques
3.1.1 Search Space Approximation Techniques
3.1.2 Algorithms Using Approximate Image Representations
3.2 Fast Exhaustive Accuracy Image Matching in Frequency Domain
3.2.1 Fast Fourier Transform (FFT) Algorithms
3.2.2 Image Matching by Correlation Theorem
3.2.3 Image Matching by Phase Only Correlation
3.3 Fast Exhaustive Spatial Domain Techniques
3.3.1 Efficient Rearrangement of Match Measure Formulation
3.3.2 Integral Image Approach
3.3.3 Running Sum Approach
3.4 Bound Based Computation Elimination Algorithms
3.4.1 Successive Similarity Detection Algorithms
3.4.2 Partial Correlation Elimination Algorithms
3.4.3 Successive Elimination Algorithms
3.4.4 Enhanced Bounded Partial Correlation Elimination Algorithm
3.4.5 Transitive Elimination Algorithms
3.4.6 Chapter Summary
4 Transitive Bounds on the Correlation Based Measures
4.1 Derivation of Angular Distance Based Transitive Bounds
4.2 Derivation of Euclidean Distance Based Transitive Bounds
4.3 Visualization of Transitive Bounds on Correlation
4.3.1 Visualization of Angular Distance Based Transitive Bounds
4.3.2 Visualization of Euclidean Distance Based Bounds
4.4 Tightness of Euclidean and Angular Distance Based Transitive Bounds
4.4.1 Comparison of Upper Transitive Bounds
4.4.2 Comparison of Lower Transitive Bounds
4.5 Tightness Analysis of Angular Distance Based Transitive Bounds
4.6 Conclusion
5 Transitive Elimination Algorithms for Correlation Based Measures
5.1 Exploiting Strong Intra-Reference Autocorrelation
5.2 Exploiting Strong Inter-Reference Auto-Correlation
5.3 Exploiting Strong Inter-Template Auto-Correlation
5.4 Experiments with Transitive Elimination Algorithms
5.4.1 Experiments with Intra-Reference Auto-correlation
5.4.2 Experiments with Inter-Reference Auto-correlation: Fast Feature Tracking
5.4.3 Experiments with Inter-Reference Auto-correlation: Fast Component Tracking
5.4.4 Experiments with Inter-Template Auto-correlation: Video Geo-registration
5.4.5 Experiments with Inter-Template Auto-correlation: Rotation / Scale Invariant Template Matching
5.4.6 Performance Comparison of Different Correlation Based Measures
5.5 Conclusion
6 Partial Correlation Elimination Algorithms
6.1 Monotonic Formulation of Correlation Coefficient
6.2 Basic Mode Partial Correlation Elimination Algorithm
6.3 Two-Stage Basic Mode PCE Algorithm
6.4 Overheads of Basic Mode PCE Algorithm
6.5 Experiments with Basic Mode PCE Algorithms
6.5.1 Block Motion Estimation Experiments Using Basic Mode PCE
6.5.2 Feature Matching Experiments Using Basic Mode PCE Algorithm
6.5.3 Feature Tracking Experiments Using Two-stage Basic Mode PCE Algorithm
6.6 Conclusion
7 Extended Mode Partial Correlation Elimination Algorithms
7.1 Extended Mode PCE Algorithm
7.2 PCE Mode Selection and Finding Efficient Testing Scheme
7.3 Initialization Schemes for Extended Mode PCE Algorithm
7.3.1 Extended Mode Multi-Stage PCE Algorithm
7.3.2 Initialization of Extended Mode PCE with Coarse-to-Fine Scheme
7.4 Experiments with Extended Mode PCE Algorithm
7.4.1 Feature Tracking with Extended Mode Two-stage PCE Algorithm
7.4.2 Template Matching with Extended Mode Two-Stage PCE Algorithm
7.4.3 Coarse-to-Fine Initialization of Extended Mode PCE Algorithm
7.5 Conclusion
8 Computation Elimination Algorithms for AdaBoost Based Detectors
8.1 Introduction
8.2 Related Work
8.3 AdaBoost Global Threshold Based Early Termination Algorithm
8.4 Early Non-Maxima Suppression Algorithm
8.5 Experiments and Results
8.6 Conclusion
9 Use of Correlation Coefficient for Video Encoding
9.1 Block Based Motion Compensation in Video Encoders
9.2 Problem Definition
9.3 Maximization of Gain Guaranteed by Maximization of Correlation Coefficient
9.4 Video Coding with Linear Compensation (VCLC)
9.4.1 Motion Compensation using Linear Estimator
9.4.2 Motion Estimation with Correlation Coefficient
9.5 Video Coding With Linear Compensation: System Overview
9.6 Experiments and Results
9.7 Conclusion
10 Conclusions and Future Directions
10.1 Transitive Algorithms
10.2 PCE Algorithms
10.3 Limitations of TEA and PCE Algorithms
10.4 Elimination Algorithms for Object Detection
10.5 Using Correlation Coefficient in Video Coding
10.6 Future Directions
Chapter 1
INTRODUCTION
Image comparison is a fundamental operation in visual information processing systems. In daily activities, we frequently compare newly observed images with those we have already observed and stored in our memories. The way humans perceive images is quite different from the way computers process digital image data. Human image perception is based on patterns as a whole and the combined effect of colors, while computers can only see images as large arrays of numbers. Therefore, the bulk of image comparisons done on computers are based on pixel-by-pixel matching, in which some image match measure is computed to establish the proximity between the two images. Image matching has often been used for image alignment and registration in numerous machine vision applications.
Image matching is frequently used in the areas of Image Processing, Machine Vision, Remote Sensing, Pattern Recognition and Digital Signal Processing. Typical applications include image registration and alignment, object detection and identification, content based image retrieval, image and video compression, object tracking, and computing 3D structure. In all of these applications, similarity computation is an important part of the overall problem, often termed template matching. Typically, a small template image is compared against multiple windows in a larger reference image by evaluating an image match measure, and the window which yields the best similarity score is selected as the match location. Although typical template matching applications may require finding a particular target in a larger image, template matching has often been used in many other scenarios, for example, finding point correspondences between multiple images to estimate the fundamental matrix in the stereo problem, or correspondence matching to compute geometric transformations between two images. In video encoding, block motion estimation may also be considered a form of template matching.
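The sliding-window search just described can be sketched in a few lines. The following is an illustrative NumPy implementation (function names are ours, not from the thesis), using correlation coefficient as the match measure:

```python
import numpy as np

def corr_coeff(t, w):
    """Pearson correlation coefficient between two equal-sized blocks."""
    t = t - t.mean()
    w = w - w.mean()
    denom = np.linalg.norm(t) * np.linalg.norm(w)
    return float(t.ravel() @ w.ravel() / denom) if denom else 0.0

def exhaustive_match(template, reference):
    """Slide the template over every window; return the best-match corner."""
    th, tw = template.shape
    best_score, best_loc = -2.0, None
    for y in range(reference.shape[0] - th + 1):    # every search location
        for x in range(reference.shape[1] - tw + 1):
            score = corr_coeff(template, reference[y:y+th, x:x+tw])
            if score > best_score:
                best_score, best_loc = score, (y, x)
    return best_loc, best_score
```

Every window is scored here, which is exactly the exhaustive search whose cost the rest of this thesis is concerned with reducing.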
Image match measures may be broadly divided into two categories: geometric measures computed from pixel intensity values and information theoretic measures based on some image statistic. Basic geometric measures include city block distance and Euclidean distance. These distance measures are defined by considering an image as a point in a high dimensional space: the larger the distance between two points in that space, the more dissimilar the corresponding images are. Correlation based measures are also geometric measures, defined by considering an image as a vector in a high dimensional space. Correlation between two images is the inner product of the two image vectors, and higher correlation means a stronger match between the corresponding images. Geometric measures are generally used for matching uni-modal images acquired by an optical still camera or digital video camera, and are also frequently used for matching remotely sensed satellite images. In the second category of image match measures, known as information theoretic measures, entropy and mutual information are the most commonly used. These are computed from the image histograms and are used for matching multi-modal medical images, for example, matching MRI and PET images. The main focus of this thesis is fast computation of some of the geometric measures, while information theoretic measures will only be discussed for completeness.
In image matching applications, if the search for the best-match location is done exhaustively over the entire search space, the image matching process becomes computationally expensive. Therefore, template matching applications often emphasize reducing the image matching cost by using approximate algorithms, which either approximate the search space with a smaller one, or approximate the image and the match measure with simpler versions. A large volume of research on approximate image matching schemes, with high speedups and varying levels of accuracy, may be found in the literature. Since approximate schemes cannot guarantee that the global maximum will be found, they are not suited for mission critical applications which require high accuracy along with low computational complexity.
Bound based computation elimination schemes are a viable option for mission critical applications, because these schemes guarantee the same accuracy as exhaustive template matching along with high speedups. In computation elimination algorithms, instead of performing the actual match measure computations, an alternate bounding statistic is computed. Actual computations may be skipped partially or even completely as a result of comparing the bounding statistic with the partial computation result or with a previously known result. Elimination algorithms offering the opportunity of skipping all of the direct computations at a search location may be termed complete elimination algorithms, while algorithms in which only a part of the direct computations may be skipped may be termed partial elimination algorithms. If a partial elimination algorithm is used for image matching, computations at a particular search location may be prematurely terminated as soon as it is determined that the current location cannot compete with the already known best-match location. The decision to skip the remaining computations is based on a comparison of the current known maximum and the bounding statistic. In the case of complete elimination algorithms, this comparison is performed before starting the actual match measure computations and, based on the comparison result, the complete computations may be skipped.
A variety of elimination schemes have been developed for simple image match measures including SAD and SSD. Complete elimination algorithms developed for these measures include the Successive Elimination Algorithm (SEA) and its variants by Li and Salari (1995) and Wang and Mersereau (1999). Triangular inequality based techniques have been proposed by Kawanishi et al. (2004) and Brunig and Niehsen (2001). Partial elimination algorithms, also known as Partial Distortion Elimination (PDE) and Successive Similarity Detection Algorithms (SSDA), have been proposed by Barnea and Silverman (1972); Montrucchio and Quaglia (2005); Cheung and Po (2003); Hel-Or and Hel-Or (2005). In each case, by skipping computations, elimination algorithms reduce the computational complexity while guaranteeing that the best-match result will not be compromised.
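Both families of elimination schemes are easy to sketch for SAD. The SEA-style test rests on the inequality SAD(t, w) >= |sum(t) - sum(w)|, so a window whose sum differs from the template sum by at least the best SAD so far can be skipped outright, while PDE stops accumulating the SAD as soon as it reaches the best value. A minimal illustration (our own sketch, not code from the cited papers):

```python
import numpy as np

def sad_match(template, reference):
    """SAD search with SEA complete elimination and PDE partial elimination."""
    th, tw = template.shape
    t_sum = template.sum()
    best_sad, best_loc = np.inf, None
    for y in range(reference.shape[0] - th + 1):
        for x in range(reference.shape[1] - tw + 1):
            window = reference[y:y+th, x:x+tw]
            # SEA (complete elimination): SAD >= |sum(template) - sum(window)|,
            # so if even this lower bound meets the best SAD, skip the window.
            if abs(t_sum - window.sum()) >= best_sad:
                continue
            # PDE (partial elimination): the partial SAD only grows, so stop
            # accumulating as soon as it reaches the current best.
            sad = 0.0
            for row in range(th):
                sad += np.abs(template[row] - window[row]).sum()
                if sad >= best_sad:
                    break
            if sad < best_sad:
                best_sad, best_loc = sad, (y, x)
    return best_loc, best_sad
```

Neither test can discard the true minimum, because both only ever skip windows whose SAD is provably at least the best value already found.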
Simple image match measures such as SAD and SSD are not robust to the image brightness and contrast variations which occur in many practical situations. Correlation coefficient is a more robust similarity measure: it denotes the strength of the linear relationship between two image blocks and is therefore invariant to linear intensity distortions. Correlation coefficient is preferred when significant intensity distortions are present across the images to be matched, as indicated by several researchers, for example, see (Brown, 1992; Zitova and Flusser, 2003; Svedlow et al., 1976, 1978; Leese et al., 1971; Dew and Holmlund, 2000; Pratt, 2007; Chalermwat, 1999). Thus, in most applications in which image matching is a challenging problem, correlation coefficient has been the preferred similarity measure. Examples include change detection (Townshend et al., 1992; N. Bryant, 2003; Dierking and Skriver, 2002), motion estimation (Foster, 2005), glacier surface movement detection (Evans, 2000; Scambos et al., 1992; Strozzi et al., 2002), ice motion estimation (Ninnis et al., 1986; Emery et al., 1991), earthquake damage assessment (Avouac et al., 2006; Puymbroeck et al., 2000), flood erosion and sedimentation analysis (Crowell et al., 2003), cloud motion vector estimation (Mukherjee and Acton, 2002; Wu, 1995a; Leese et al., 1971; Dew and Holmlund, 2000; Eumetsat, 1998, 2000; Feind and Welch, 1995), sea current tracking from sea surface temperature images (Wu and Pairman, 1995; Emery et al., 1986; Garcia and Robinson, 1989b; Kamachi, 1989; Wu et al., 1992; Bowen et al., 2002; Emery et al., 2004), ocean wave propagation direction estimation from split-look images (Ouchi et al., 1999; Vachon and Raney, 1991; Vachon and West, 1992), sea surface velocity estimation from sequential satellite images (Pope and Emery, 1994; Tokmakian et al., 1990; Garcia and Robinson, 1989a), satellite image mosaics (Du et al., 2001; Coorg and Teller, 2000; Shum and Szeliski, 2000), image and video geo-registration (Sheikh et al., 2004; Sheikh and Shah, 2004; Shah and Kumar, 2003a; Irani and Anandan, 1998), crop identification in remotely sensed images (Svedlow et al., 1976), automatic image navigation (Emery et al., 2003), feature matching under variable viewing conditions (Vincent and Laganière, 2001; Jin et al., 2001), automatic control point extraction (Kim and Im, 2003; Oller et al., 2003; Caves et al., 1992), underground target detection in cross-borehole configuration (Ellis and Peden, 1997), topographic data estimation from SAR (Li and Goldstein, 1990), rectangular building extraction from SAR airborne systems (Simonetto et al., 2005), and linear infrastructure mapping using airborne video imagery (Dare and Fraser, 2000). Note that this is only a partial list of the applications using correlation coefficient as the preferred similarity measure.
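The invariance that motivates this choice is easy to verify numerically: if a window is a linear intensity distortion w = a*t + b of the template, with gain a > 0, the correlation coefficient remains exactly 1, while a measure such as SAD grows with the distortion. A small illustrative check (our own code, not from the thesis):

```python
import numpy as np

def corr_coeff(t, w):
    """Pearson correlation coefficient between two equal-sized blocks."""
    t = t - t.mean()
    w = w - w.mean()
    return float(t.ravel() @ w.ravel() /
                 (np.linalg.norm(t) * np.linalg.norm(w)))

rng = np.random.default_rng(0)
t = rng.random((8, 8))
w = 1.7 * t + 25.0                  # brightness/contrast change: a*t + b, a > 0

rho = corr_coeff(t, w)              # centering removes b, normalization removes a
sad = float(np.abs(t - w).sum())    # grows with the size of the distortion
```

Here rho evaluates to 1.0 up to rounding, while sad is on the order of the offset times the number of pixels: mean-centering cancels the offset b, and dividing by the norms cancels the gain a.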
Although correlation coefficient is a more robust match measure than SAD and SSD and has been used extensively in numerous image matching applications, it has been criticized for its high computational complexity (Brown, 1992; Zitova and Flusser, 2003; Pratt, 2007; Barnea and Silverman, 1972; Wu, 1995a; Chalermwat, 1999). Traditionally, correlation coefficient implementations are based on the Fast Fourier Transform (FFT), and significant efforts have been made to reduce the time complexity of the FFT, for example, see (Frigo and Johnson, 1998; Frigo, 1999; Frigo and Johnson, 2005). However, as the template size decreases, the computational advantage of the frequency domain over the spatial domain shrinks, and for small template sizes, spatial domain implementations become faster. Another scenario in which FFT based implementations may not be efficient is finding point correspondences between two images: each feature from one image has to be correlated at only a few locations in the second image, often selected by a corner detection algorithm. This may easily be done in the spatial domain, while in the frequency domain, complete computations at all search locations have to be performed. Frequency domain implementations have also been criticized from other perspectives (Barnea and Silverman, 1972). For example, in most template matching applications, only the final best-match location is of interest, so the computations done at all remaining search locations are redundant. While this redundancy can be exploited in the spatial domain, no computation elimination scheme can be devised to reduce it in the frequency domain. FFT based implementations of correlation may also be criticized for being unable to exploit, for computation reduction, a guess about the value or location of the maximum which is available in many template matching applications.
Partial elimination algorithms as applied to SAD and SSD cannot be extended in a straightforward manner to correlation coefficient based image matching. This is because the growth of the value of correlation coefficient, as corresponding pixels of the two images are processed, is non-monotonic, while the values of SAD and SSD increase monotonically, ensuring that the final value of SAD or SSD is equal to or larger than the intermediate values. Since the best-match location for SAD or SSD is defined as the minimum over the entire search space, the remaining computations may be eliminated as soon as the current value of distortion exceeds the previously known minimum. In contrast, due to the non-monotonic growth pattern of correlation coefficient, no intermediate value is guaranteed to be larger or smaller than the final value of correlation coefficient. Secondly, in the case of correlation coefficient, the best-match location over the entire search space is defined as the location exhibiting the maximum value of correlation coefficient; therefore, a previously known maximum cannot be exploited to discard the remaining computations at an intermediate stage. Hence, partial elimination algorithms have broadly been considered inapplicable to correlation coefficient based template matching, as indicated by Brown (1992), Zitova and Flusser (2003), Pratt (2007), Barnea and Silverman (1972), and Wu (1995a).
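The contrast is visible directly in the running partial sums: the partial SAD can only grow as pixels are added, while the running inner product behind correlation can rise and fall. A toy numeric illustration (values are our own):

```python
import numpy as np

t = np.array([1.0, -1.0, 1.0, -1.0])     # toy "template" pixels
w = np.array([1.0,  2.0, 1.0,  2.0])     # toy "window" pixels

partial_sad = np.cumsum(np.abs(t - w))   # [0, 3, 3, 6]: never decreases
partial_dot = np.cumsum(t * w)           # [1, -1, 0, -2]: rises and falls
```

A threshold test against the final SAD is therefore safe at any intermediate step, whereas no such guarantee holds for the running correlation sum.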
Complete elimination algorithms as applied to SAD and SSD also may not be extended to correlation coefficient based template matching. This is because complete elimination algorithms require a tight upper bound on correlation coefficient which is also computable at low cost; otherwise the benefit of computation elimination may be eroded by the overhead of computing the bound. No such bound previously existed for correlation coefficient. The well known bound on correlation is based on the Cauchy–Schwarz inequality, which is too loose to yield any computation elimination. Therefore, due to the absence of tight upper bounds on correlation, only very limited efforts may be found in the literature on the development of complete elimination algorithms. These efforts include the category of algorithms proposed by Mattoccia et al. (2008b), which try to tighten the Cauchy–Schwarz based bound using different schemes. The bound proposed by Mattoccia et al. (2008b) is tight enough to yield elimination, but requires a large number of square root operations, which are computationally expensive and cause significant bound-computation overhead. The algorithm proposed by Mattoccia et al. (2008b) will be discussed in more detail in Chapter 3.
The main contribution of this thesis is that we have extended the idea of computation
elimination beyond simple image match measures like SAD and SSD: we
have extended these algorithms to correlation coefficient based fast template matching.
To this end, we have developed two different categories of elimination algorithms,
which we call 'Transitive Elimination Algorithms' and 'Partial Correlation
Elimination' algorithms. To obtain high speed up, transitive elimination
algorithms (Mahmood and Khan, 2007b, 2008, 2010) exploit the auto-correlation present
at nearby search locations by using the transitivity property of correlation. Therefore,
the speed up of these algorithms depends strongly on the magnitude
of auto-correlation present in the image matching system. Transitive algorithms have
exhaustive-equivalent accuracy and have been found to be faster than existing
fast algorithms by an order of magnitude.
Although the high autocorrelation required for the speed up of transitive algorithms is
present in many image matching systems, it cannot be guaranteed in general. Partial
Correlation Elimination algorithms may be used efficiently in such situations,
because these algorithms do not require high autocorrelation. However, we find that
in image matching systems exhibiting strong autocorrelation, transitive algorithms
are more efficient.
Partial Correlation Elimination (PCE) algorithms (Mahmood and Khan, 2007a, 2011)
are based on monotonic formulations of the correlation coefficient, and allow computations
to be partially eliminated at each search location. When computed by the monotonic
formulations, the correlation coefficient decreases monotonically from +1 towards -1 as
consecutive pixels are processed. At a particular search location, as soon as the current
value of the partial correlation falls below the previously known maximum, the remaining
computations become redundant and may be skipped without any loss of accuracy.
The two main categories of Partial Correlation Elimination algorithms are 'Basic Mode
PCE' and 'Extended Mode PCE'. Basic Mode PCE is more efficient for small
templates, while Extended Mode is more efficient for medium and large sized templates.
For small and medium sized templates, initialization using a two-stage template
matching approach has been found effective, while for larger templates, initialization
using a coarse-to-fine scheme has been found faster. PCE algorithms have
been compared with existing fast exhaustive techniques, including an FFTW3 based
frequency domain implementation and the techniques of Mattoccia et al. (2008b). For
commonly used template sizes, partial correlation elimination algorithms have
outperformed all existing algorithms by a significant margin.
Beyond these contributions, two additional research directions have also been
explored. The first is the use of elimination strategies to speed up
applications other than fast image match measure computation. To this end, computation
elimination algorithms have been developed to speed up the detection phase of an
AdaBoost based edge-corner detector (Mahmood, 2007; Mahmood and Khan, 2009).
In the second research direction, we have explored correlation coefficient based block
motion estimation and motion compensation by first order linear estimation in video
encoders. We show that if block motion estimation is performed by maximization of the
correlation coefficient and motion compensation is done by linear parameter estimation,
then the ratio of the variance of the original signal to the variance of the residue
signal, often known as the gain, is maximized, which in turn minimizes the entropy of
the residue signal (Mahmood et al., 2007). No such guarantee exists for
traditionally used schemes, which use minimization of SAD as the motion estimation
criterion and simple difference based motion compensation. The proposed encoding scheme has
also been verified by experiments and comparison with the traditional encoding
scheme.
The contributions of this thesis are introduced in more detail in the following sections:
1.1 Our Contributions
The main contribution of this thesis is extending computation elimination algorithms
to correlation based fast template matching. Previously, these algorithms were
well known only for simple image match measures such as SAD and SSD. The two different
types of elimination algorithms are bound based and monotonic growth based
algorithms. In this thesis, we have extended both types of computation elimination
algorithms to correlation based fast template matching.
For the implementation of bound based computation elimination algorithms for
correlation based template matching, we have derived novel bounds on correlation based
measures, which we call transitive bounds. Using transitive bounds, we
have developed different types of fast template matching algorithms, which we call
'Transitive Elimination Algorithms'.
For the implementation of growth based computation elimination algorithms for
correlation coefficient based fast template matching, we have proposed a monotonically
decreasing formulation of the correlation coefficient. While computing the correlation
coefficient by this monotonic formulation, as soon as the partial value of correlation
decreases below an already known maximum, the remaining computations may be skipped without
any loss of accuracy.
We have also extended the idea of monotonic growth based computation elimination
to AdaBoost based object detectors. We reduce computations in AdaBoost based
object detectors in two ways: first, by making the computations monotonic and
terminating them early; second, by early non-maxima suppression. Both of these
techniques are generic and may be applied to other object detectors as well.
In the template matching problems, we have focused on the correlation coefficient as the
match measure. Our choice is motivated by the fact that it is more robust than
other match measures, including SAD, SSD, cross-correlation and Normalized Cross
Correlation (NCC). As a result of our proposed algorithms, correlation coefficient
based template matching has become significantly faster.
In video encoders, block matching for temporal redundancy reduction is also
essentially a template matching problem. Due to its high computational cost, the
correlation coefficient has not been considered a viable option for block motion estimation.
As a result of our algorithms, the correlation coefficient may now be used in video encoders.
We have also explored the benefits of using the correlation coefficient as a similarity
measure in block matching algorithms. We find that if block matching is done by
maximization of the correlation coefficient and motion compensation by first order
linear estimation, the variance of the residue signal is guaranteed to be less than that
of the residue generated by minimization of SAD and simple difference. In most
cases, smaller variance means smaller entropy and fewer bits to encode a
video signal. The benefit of using the correlation coefficient in video encoding becomes
even more pronounced if the intensity and contrast variations between consecutive video
frames are significant.
A visual outline of the complete thesis is shown in the block diagram in Figure 1.1. The
contributions regarding fast template matching are arranged in four chapters, from
Chapter 4 to Chapter 7. The use of early termination in AdaBoost based object detectors
is discussed in Chapter 8, and video encoding using the correlation coefficient as a match
measure is discussed in Chapter 9. In the following subsections, we introduce each
contribution separately. Each of these subsections corresponds to a full chapter in
Figure 1.1: Organization of the full thesis: Related work is organized in Chapters 2 and 3. Chapters 4 and 5 are regarding Transitive Elimination Algorithms (TEA). Chapters 6 and 7 are about Partial Correlation Elimination (PCE) algorithms. Chapters 8 and 9 contain the additional contributions.
the thesis.
1.1.1 Transitive Bounds on Correlation Based Image Match
Measures
Complete elimination algorithms for SAD and SSD based image match measures have
been well investigated. However, we find that these algorithms may not be easily
extended to correlation based image matching. This is because complete
elimination requires that effective bounds on correlation be known.
The effectiveness of a bound may be defined in terms of tightness and computational
cost: the bound must be tight enough to yield computation elimination and must
have low computational cost. The existing bounds on correlation derived from the
Cauchy-Schwarz inequality are not tight, while the bounds derived by Mattoccia
et al. (2008b) have high computational cost.
In this thesis we have derived transitive bounds on correlation based measures, and
we show that these bounds may be computed efficiently with low computational
overhead. We also identify operating conditions under which the transitive bounds become
tight enough to produce significant computation elimination.
Transitive bounds are exact, involving no approximation. We have
derived two different types of transitive bounds, one using Euclidean distance and
one using angular distance. We theoretically compared the tightness of the two types
and found that the angular distance based transitive bounds are always
tighter than the Euclidean distance based bounds.
We have also analyzed the tightness characteristics of both types of transitive bounds.
We observed that these bounds become sufficiently tight under specific conditions.
We mapped these conditions onto the template matching problem and successfully
used the transitive bounds for computation elimination.
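The angular distance intuition above can be sketched in a few lines. Treating the mean-removed, unit-norm blocks as vectors, the correlation coefficient is the cosine of the angle between them, and the triangle inequality on angles bounds ρ(r1, r3) given ρ(r1, r2) and ρ(r2, r3). This is an illustrative sketch under that standard geometric identity, not the thesis implementation; the block sizes and data below are arbitrary:

```python
import numpy as np

def corr(a, b):
    """Correlation coefficient between two equal-size image blocks."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a.ravel() @ b.ravel() / (np.linalg.norm(a) * np.linalg.norm(b)))

def transitive_bounds(r12, r23):
    """Angular-distance transitive bounds on rho(r1, r3), given rho(r1, r2)
    and rho(r2, r3).  Since rho = cos(angle) for mean-removed unit vectors,
    the triangle inequality on angles gives
    cos(a12 + a23) <= rho13 <= cos(a12 - a23)."""
    s = np.sqrt((1.0 - r12 ** 2) * (1.0 - r23 ** 2))
    return r12 * r23 - s, r12 * r23 + s

rng = np.random.default_rng(0)
t = rng.standard_normal((8, 8))              # template (r1)
c = t + 0.1 * rng.standard_normal((8, 8))    # central block (r2)
o = c + 0.1 * rng.standard_normal((8, 8))    # outside block (r3)

lo, hi = transitive_bounds(corr(t, c), corr(c, o))
assert lo <= corr(t, o) <= hi                # the true rho lies in the bounds
```

Note how the bounds tighten as either known correlation approaches ±1, which is exactly the operating condition exploited by the transitive elimination algorithms.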
1.1.2 Transitive Elimination Algorithms
In Transitive Elimination Algorithms, transitive bounds are first computed at most of
the search locations. Search locations whose upper transitive bound is less than the
currently known maximum are skipped from the search space; correlation is computed
only at the non-skipped search locations. If a large number of
search locations are skipped, the template matching process becomes efficient.
Transitive Elimination Algorithms require a mapping of the transitive bounds onto the
template matching problem. In this mapping, we have addressed two main challenges:
the overhead of bound computation must be insignificant compared to the overall
computation, and the bounds must be tight enough to produce significant computation
elimination. The bound computation overhead is reduced by developing efficient
algorithms, and the tightness of the bounds is guaranteed by ensuring certain operating
conditions, as briefly discussed in the following paragraph.
At a particular search location, computation of the transitive bounds requires that two
bounding correlations be known. We show that a tight upper transitive bound may be
guaranteed if at least one of the two known correlations has large magnitude. We
ensure this condition by mapping one of the two known correlations onto the auto-
correlation present in the template matching problem. Auto-correlation may be
present in many different forms, resulting in different mappings of the transitive bounds
onto the template matching problem. Each mapping results in a different Transitive
Elimination Algorithm. In this thesis, we have developed three different Transitive
Elimination Algorithms.
1. Exploiting strong intra-reference autocorrelation (Mahmood and Khan, 2008)
The typical template matching scenario is to match one or more small template
images at all valid search locations in a large reference image. Transitive bounds
may be used to speed up the template matching process if the reference image
has high local auto-correlation, which means the spatially contiguous search
locations are highly correlated with each other. We divide the reference image
into non-overlapping rectangular windows; correlating the central block of each window
with the outside blocks in the same window yields the computation
of local auto-correlation.
The mapping of image blocks r1, r2 and r3 to this problem is as follows: r1
maps to the template image, r2 to the central block of each window,
and r3 to an outside block in the same window. The template image is only
correlated with the central block, yielding one of the two known correlations. By
using this correlation and the auto-correlation as the two known correlations,
we compute transitive bounds on the correlation between the outside image blocks
and the template image. If the transitive bounds are sufficiently tight, many of the
outside blocks may be eliminated from the search space.
The minimum possible size of a reference image window is 3 × 3 blocks, which
means one central block (r2) has eight surrounding neighboring blocks (r3). The
template image (r1) is matched only with the central block, and transitive
bounds are computed for the eight outside blocks. If the auto-correlation between
the central block and each of the outside blocks is sufficiently high, each of the
eight outside blocks may be eliminated. That means the ratio of work done to
total work may be as low as 1:9, ignoring the overhead required to compute the
auto-correlation. To compute the local auto-correlation, we have formulated a very
efficient algorithm which reduces this overhead to a negligibly small value.
As the window size increases, the total number of central blocks in the reference
image decreases, causing a decrease in the number of matches with the central
blocks. For example, for a window size of 5 × 5, the ratio of work done to
total work may reduce to 1:25, and if the window size is increased to 7 × 7,
the ratio may reduce to 1:49. The speed up over the exhaustive spatial domain
implementation may thus be 9, 25, or 49 times for 3 × 3, 5 × 5, and 7 × 7 window sizes
respectively.
Although larger window sizes yield more speed up, the window size may not be
increased arbitrarily. As the window grows, the auto-correlation
between the central block and the outside blocks decreases, which may result in loose
transitive bounds. If the upper transitive bound turns out to be larger than
the currently known maximum, the corresponding block is not eliminated
and correlation is computed on it. Increasing the window size may therefore
increase the number of un-eliminated blocks, resulting in an increase in the
computational cost. Thus the window size parameter is critical to obtaining the
maximum benefit of transitive bounds. We have investigated this parameter in
detail and proposed a formulation for automatic computation of the window size
parameter.
2. Exploiting strong inter-reference auto-correlation (Mahmood and Khan, 2010)
Many template matching applications, such as tracking an object in a surveillance
video, checking for missing components on a PCB production line or object
inspection over conveyor belts, require one template image to be correlated with
a set of reference frames. In such applications, the reference frames are often
highly correlated with each other. Therefore, inter-reference auto-correlation
may be used to speed up the template matching process.
We have developed a highly efficient algorithm for the computation of inter-
reference auto-correlation, which reduces its computational cost to a negligibly
small value. Based on that algorithm, we compute the auto-correlation of
all reference frames in the set with a specific frame, which may be the temporally
central frame. The template image is also correlated at all search locations
in the central frame. The two known correlations in this case are the correlation
between the central frame and the other frames in the set, and the correlation
between the template image and the central frame. Using these two known
correlations, transitive bounds on each of the other reference frames in the set
may be computed and used to speed up the template matching process.
3. Exploiting strong inter-template autocorrelation (Mahmood and Khan, 2007b)
Certain applications require a set of template images to be correlated with a
single reference image, for example, matching an aerial video with a satellite
image or exhaustive rotation-scale invariant template matching. In such cases,
if the set of templates has high autocorrelation, correlation of one template
with the reference image yields tight bounds upon the correlation of all other
templates within the set.
Correlating a central template with the other templates in the set yields the inter-
template auto-correlation, and correlating the central template with the reference
image yields one of the two known correlations. Correlation of the other templates
with the same reference image may then be made faster by computing the transitive
bounds. The computational cost of matching the other templates against the reference
image may be significantly reduced, depending on the inter-template autocorrelation.
The transitive elimination algorithms are implemented in C++ and compared with
currently known efficient algorithms, including Bounded Partial Correlation (Di Stefano
et al., 2005), Enhanced Bounded Correlation (Mattoccia et al., 2008b), fast algorithms
for SAD (Li and Salari, 1995; Montrucchio and Quaglia, 2005), an FFT based frequency
domain implementation (William et al., 2007) and an efficient spatial domain
implementation as described in Lewis (1995). Experiments are performed on a variety of
real image datasets. While the exact speed up of the proposed algorithms varies from
experiment to experiment, we have observed speed ups ranging from several times
to more than an order of magnitude.
1.1.3 Basic Mode Partial Correlation Elimination Algorithm
The performance of transitive elimination algorithms, as discussed in the previous
subsections, depends strongly on the magnitude of auto-correlation found in an
image matching application. High auto-correlation is present in many applications
but cannot be guaranteed in every scenario. The need for
more generic algorithms is satisfied by the development of partial correlation
elimination algorithms. These algorithms may be used for correlation coefficient based
fast template matching in applications that do not exhibit high auto-correlation, while
transitive elimination algorithms remain faster when high magnitude auto-correlation
is present.
In partial elimination algorithms, a portion of the computation must be done
at each search location before that location may be skipped. Partial
elimination algorithms have been well investigated for SAD and SSD based distance
measures, while for correlation coefficient based image matching, only the algorithms
presented by Di Stefano et al. (2005) and Mattoccia et al. (2008b) may be found in the
literature. This is because when SAD or SSD is computed between two
24
0 12 24 36 48 60 72 84 96 108 120 132 144−0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Number of Processed Pixels
Gro
wth
of P
artia
l Sim
ilarit
y
Threshold
Non−Monotonic
Monotonic
Figure 1.2: Growth of correlation coefficient in its traditional form (blue) and monotonic form (red). The curves show the intermediate value at each of 144 pixel locations for a pair of 12 × 12 pixel blocks. Both formulations reach the same final value of ρ = -0.0647. Computations may be skipped when the partial sum in the monotonic form becomes lower than the threshold.
image blocks and corresponding pixels are processed, the partial value of the distance
increases monotonically. Since the best match is defined as the location
exhibiting the minimum distance, at a particular search location, as soon as the
current value of the distance exceeds the previously known minimum, further
computations become redundant and may be skipped without any loss of accuracy. Thus,
partial elimination requires two basic properties of the match measure:
the first is a monotonic growth pattern, and the second is a best match defined by the
minimum distance location. Unfortunately, both of these properties are missing for
the correlation coefficient when computed by traditional formulations: the correlation
coefficient does not grow monotonically, and the best match is defined as the
location exhibiting the maximum value of the correlation coefficient.
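The monotonic growth property of SAD, and the early exit it permits, can be sketched as follows. This is an illustrative sketch only; the fast SAD algorithms cited above are considerably more sophisticated, and the data and threshold are made up:

```python
import numpy as np

def sad_with_early_exit(f, g, best_so_far):
    """Partial SAD: the running sum grows monotonically, so computation
    can stop as soon as it exceeds the best (minimum) SAD seen so far.
    Returns (sad_value, was_eliminated)."""
    total = 0.0
    for a, b in zip(f.ravel(), g.ravel()):
        total += abs(float(a) - float(b))
        if total > best_so_far:          # further pixels can only raise SAD
            return total, True
    return total, False

rng = np.random.default_rng(1)
template = rng.integers(0, 256, (16, 16))
good = template.copy()                   # perfect match: SAD = 0
bad = 255 - template                     # inverted image: poor match

sad_good, skipped_good = sad_with_early_exit(template, good, best_so_far=100.0)
sad_bad, skipped_bad = sad_with_early_exit(template, bad, best_so_far=100.0)
```

The good candidate completes its full sum, while the bad one is abandoned after only a few pixels; the correlation coefficient in its traditional form offers no such exit, which is precisely the gap the monotonic formulation below closes.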
Due to these two unfavorable properties, correlation has been criticized by many
researchers as not being amenable to partial computation elimination (Brown, 1992;
Zitova and Flusser, 2003; Pratt, 2007; Barnea and Silverman, 1972; Wu, 1995a).
However, we observe that if the correlation coefficient is computed using the normalized
Euclidean distance formulation, it turns out to be a monotonically decreasing measure.
Although the relationship between correlation and the normalized Euclidean distance has
long been known (Rodgers and Nicewander, 1988), it had never been exploited as a
means of fast computation of the correlation coefficient. In (Mahmood and Khan, 2007a)
we proposed, for the first time, a partial elimination algorithm for the correlation
coefficient that outperformed all of the existing fast exhaustive-equivalent algorithms by
a significant margin for small and medium sized templates.
If the correlation coefficient is computed using our proposed formulation, the similarity
starts from +1 at the first pixel of a block and monotonically decreases to the final
value of the correlation coefficient by the end of the computation (Figure 1.2). Any
intermediate value of the similarity is always larger than (or equal to) the final value. The
speed up occurs because, at any point during the computation, if the similarity falls
below the previously known maximum (or an initial threshold), the remaining
computations become redundant and may be skipped without any loss of accuracy. As
the total amount of skipped computation increases, the template matching process
accelerates accordingly.
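A minimal sketch of this monotonic formulation, using the identity rho = 1 - 0.5 * ||f_hat - g_hat||^2 for mean-removed, unit-norm blocks f_hat and g_hat (the block size and random data are illustrative, not from the thesis experiments):

```python
import numpy as np

def monotonic_partial_correlation(f, g):
    """Correlation coefficient via normalized Euclidean distance.
    Because ||f_hat - g_hat||^2 = 2 - 2*rho for mean-removed, unit-norm
    vectors, the running partial value 1 - 0.5 * cumsum((f_hat - g_hat)^2)
    starts at +1 and decreases monotonically to the final rho."""
    fh = (f - f.mean()).ravel()
    fh = fh / np.linalg.norm(fh)
    gh = (g - g.mean()).ravel()
    gh = gh / np.linalg.norm(gh)
    return 1.0 - 0.5 * np.cumsum((fh - gh) ** 2)   # last entry == rho(f, g)

rng = np.random.default_rng(2)
f = rng.standard_normal((12, 12))
g = rng.standard_normal((12, 12))

curve = monotonic_partial_correlation(f, g)
rho = np.corrcoef(f.ravel(), g.ravel())[0, 1]
```

Here `curve` never increases and its final entry equals the exact correlation coefficient, so comparing any intermediate entry against the current maximum is a safe elimination test.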
In the PCE algorithm, the amount of elimination depends on the magnitude and the
location of the currently known maximum. A high magnitude maximum found at the
start of the search process may significantly increase computation elimination and
hence reduce the execution time. For this purpose, we have developed an intelligent
re-arrangement of the PCE computations, conceptually similar to the two-stage template
matching proposed by Vanderbrug and Rosenfeld (1977). In the first stage, only a
small portion of the template is matched at all search locations using the PCE algorithm.
Based on this partial result, the complete correlation coefficient is computed at the best
match location only, and is used as the initial threshold in the second stage. By using
this strategy, we may quickly find a high threshold at no additional computational
cost. This scheme is effective for small and medium sized templates, since for these sizes
coarse-to-fine schemes often fail to provide a high initialization threshold. The two-stage
PCE algorithm is exact, having exhaustive-equivalent accuracy. In contrast,
the existing two-stage algorithm for normalized cross-correlation (NCC) proposed
by Goshtasby et al. (1984) is approximate, with a non-zero probability of missing the
NCC maximum.
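The two-stage strategy can be sketched as follows. This is a simplified illustration, not the optimized implementation: the number of first-stage pixels, the naive per-location normalization, and the planted-template example are all assumptions for demonstration.

```python
import numpy as np

def _hat(block):
    """Mean-removed, unit-norm version of a block, flattened."""
    v = (block - block.mean()).ravel().astype(float)
    return v / np.linalg.norm(v)

def two_stage_pce(template, reference, first_stage_pixels=16):
    """Two-stage PCE sketch using rho = 1 - 0.5 * ||f_hat - g_hat||^2.
    Stage 1 matches only the first few pixels everywhere; the exact
    correlation at the stage-1 winner seeds the elimination threshold."""
    th, tw = template.shape
    H, W = reference.shape
    fh = _hat(template)
    locs = [(y, x) for y in range(H - th + 1) for x in range(W - tw + 1)]

    def partial_sim(y, x, n):
        gh = _hat(reference[y:y + th, x:x + tw])
        return 1.0 - 0.5 * float(np.sum((fh[:n] - gh[:n]) ** 2))

    # Stage 1: cheap partial similarity, then one full evaluation.
    best_loc = max(locs, key=lambda p: partial_sim(*p, first_stage_pixels))
    best_rho = partial_sim(*best_loc, th * tw)   # exact rho at the winner

    # Stage 2: full PCE scan seeded with the stage-1 threshold.
    for y, x in locs:
        gh = _hat(reference[y:y + th, x:x + tw])
        partial, eliminated = 1.0, False
        for d in (fh - gh) ** 2:
            partial -= 0.5 * d
            if partial < best_rho:               # cannot beat the current best
                eliminated = True
                break
        if not eliminated and partial > best_rho:
            best_loc, best_rho = (y, x), partial
    return best_loc, best_rho

rng = np.random.default_rng(3)
reference = rng.standard_normal((24, 24))
# Plant the template at (5, 7) under a gain/offset change (rho stays 1).
template = 2.0 * reference[5:13, 7:15] + 3.0
loc, rho = two_stage_pce(template, reference)
```

Because the stage-1 winner already carries a threshold near the global maximum, most stage-2 locations are abandoned within a few pixels, while the result remains exhaustive-equivalent.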
1.1.4 Extended Mode Partial Correlation Elimination Algo-
rithm
The Basic Mode PCE algorithm, as discussed in the last section, is based on a monotonic
formulation of the correlation coefficient. We have analyzed the different overheads
associated with this monotonic form and found that it is efficient for speeding
up matching of small and medium sized templates. For large templates, the overhead of
the monotonic form may erode some of the computational advantage obtained by
elimination.
To further improve the computational efficiency, we have expanded the simple
monotonic form of the correlation coefficient and separated the pre-computable terms from
the run-time computable terms. The resulting form of the correlation coefficient is
still monotonic but more complex than the original simple form. However, we find
that full evaluation of the complex form is only required when the elimination test is
performed; otherwise the computation may proceed by accumulating only the cross-
correlation. The elimination test consists of a comparison of the current value of the
similarity with a previously known maximum, and therefore requires that the current
value of the similarity be known.
The PCE algorithm based on this complex monotonic form is named the Extended
Mode PCE algorithm; it is much faster on large templates. To
reduce the total number of full evaluations of the complex form, we have developed a
strategy to determine the number of elimination tests to be performed while matching
two image blocks, as well as suitable test indices. This strategy is based on
the currently known maximum and the downward slope of the monotonically decreasing
curve. We observe that the downward slope of this
curve is on average linear, and a safe value of the currently known maximum is used
to compute the total number of tests to be performed and their locations.
For large templates, a coarse-to-fine scheme is used to find a high initial threshold at
the start of the search process, at a small computational overhead.
Extended Mode PCE with the two-stage approach has also been implemented; however,
we observe that in many cases the coarse-to-fine scheme together with the PCE algorithm
provides the fastest image matching for large templates. We therefore use both
schemes side by side: if the coarse-to-fine scheme successfully finds a high maximum,
Extended Mode PCE follows; if the coarse-to-fine scheme fails, two-stage
Extended Mode PCE is used for fast match measure computation.
The PCE algorithms are compared with the currently known fast exhaustive-
equivalent algorithms, including a sequential frequency domain implementation
of the FFT (William et al., 2007), an optimized, adaptive and parallel implementation,
FFTW3 (Frigo and Johnson, 2005), a very fast spatial domain implementation,
ZEBC (Mattoccia et al., 2008b), and an efficient exhaustive spatial domain
implementation (Pratt, 2007). The comparisons are done over a wide variety of datasets
and on template sizes from 4 × 4 to 128 × 128 pixels. Although the exact speed up is data
dependent, in many cases the PCE algorithms have been found to be faster by more than
an order of magnitude than the other algorithms under consideration.
1.1.5 Elimination Algorithms for Fast Object Detection
Bound based computation elimination algorithms have traditionally been used only to
speed up image matching applications. However, we observe that similar schemes
may also be devised to speed up other applications in the fields of Image Processing
and Computer Vision. Obvious candidates that may benefit from
elimination schemes are object detectors and edge-corner detectors. As an example,
we consider the AdaBoost based object detector (Viola and Jones, 2001, 2004), in
which the detector response is the sum of the positive weights of a subset of the weak
learners in an ensemble. Using the ideas of elimination algorithms, the computation of
this summation may be terminated well before completion if it is established that the current
location cannot exceed a detection threshold.
In the face detector proposed by Viola and Jones (2001, 2004), an early rejection
scheme has also been implemented in the form of a cascade of ensembles. The
ensemble at the start of the cascade checks whether essential facial features,
for example the eyes, are missing, in which case the current location may be discarded as a
non-face. However, such schemes are only possible if the geometric pattern of the
object under consideration remains fixed. In the case of edge-corner detectors, no
aligned geometric patterns may be expected, so cascade based schemes
are no longer applicable. The computation elimination schemes proposed in
this thesis, for fast edge-corner detection by an AdaBoost based detector, are generic
and may be applied to any type of object. In this regard, we have developed two
types of elimination algorithms: the basic early termination algorithm and the early non-
maxima suppression algorithm. Both of these algorithms are briefly introduced in
the following paragraphs.
In the basic form of the early termination algorithm for fast edge-corner detection, each
candidate location is initialized with the total weight of the trained ensemble. If
a weak learner classifies the current location as a non-object, the weight of that
learner is subtracted from the current total weight. As more learners are processed,
the weight of the candidate location monotonically decreases, and as soon as the
current weight falls below the detection threshold, further computations may
be skipped without any loss of accuracy.
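The basic early termination idea can be sketched as follows; the weights, votes and threshold are made up for illustration, and real weak learners would of course evaluate features rather than receive precomputed votes:

```python
import numpy as np

def detect_with_early_termination(weak_votes, weights, threshold):
    """Early termination sketch for an AdaBoost detector.  The score
    starts at the total ensemble weight; each weak learner that votes
    'non-object' subtracts its weight.  The score is monotonically
    decreasing, so evaluation stops once it falls below the threshold.
    weak_votes: booleans, True where a learner votes 'object'.
    Returns (score, terminated_early)."""
    score = float(np.sum(weights))
    for vote, w in zip(weak_votes, weights):
        if not vote:
            score -= w
            if score < threshold:        # cannot recover: reject early
                return score, True
    return score, False

# Dyadic weights keep the arithmetic exact for this demonstration.
weights = np.array([0.5, 0.5, 0.25, 0.25, 0.125])
threshold = 1.0

# Rejected by the two heaviest learners: drops below the threshold
# after only two of the five evaluations.
score, early = detect_with_early_termination(
    [False, False, True, True, True], weights, threshold)
score_all, early_all = detect_with_early_termination(
    [True] * 5, weights, threshold)
```

The first candidate is abandoned after two learners, while a candidate accepted by every learner completes the full sum; the detection result is identical to evaluating the whole ensemble.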
In order to suppress multiple responses to the same object, only the local maximum in
each locality has to be retained, while the local non-maxima candidates have to be
suppressed to zero through the process of Non-Maxima Suppression (NMS). We
reduce the computations at local non-maxima candidate locations by developing the Early
Non-Maxima Suppression (ENMS) algorithm. In ENMS, we partially compute
the detector response at all candidate locations. In each local NMS window, we
choose the candidate location with the best partial result, and compute the final
detector response at that location. If this final response is larger than the detection
threshold, then for the remaining candidate locations in that NMS window, the early
termination threshold is raised to the final value of the local maximum. That is, in
a specific NMS window, a candidate location is discarded as soon as its detector
response falls below the local maximum or the global detection threshold, whichever
is larger. The ENMS algorithm thus reduces redundant computation at non-
maxima candidate locations.
The proposed partial computation elimination algorithms are incorporated into our
previous implementation of the AdaBoost based edge-corner detector (Mahmood, 2007).
The quality of the detected edge-corners remains exactly the same, while the
speed up over the original algorithm is more than an order of magnitude. We have also
compared the quality and speed of the edge-corners detected by the AdaBoost detector
with three other detectors: the KLT detector (Shi and Tomasi, 1994), the Harris
detector (Harris and Stephens, 1988) and Xiao's detector (Xiao and Shah, 2003). We
find that the edge-corners detected by the AdaBoost detector are of comparable quality
to those of the KLT, Harris and Xiao detectors, while its execution time is up to 4.00
times faster than KLT, 17.13 times faster than Harris and 79.79 times faster than Xiao's detector.
1.1.6 Video Coding with Linear Compensation
In traditional video encoders, block based motion compensation techniques are often
used to reduce the temporal redundancy of the video signal. In these
techniques, the current video frame is divided into non-overlapping blocks. Each block
from the current frame is searched in a previous frame by minimization
of SAD. At the best match location, the simple difference between the current block and the best
match block is computed for further processing. Since SAD is not robust to intensity
and contrast variations, in the presence of such variations it yields incorrect
match locations, resulting in a large residue variance and poor compression.
We observe that the correlation coefficient represents the goodness of linear fit between
two image blocks (this will be discussed in more detail in Chapter 2). We have explored
the benefit of this property for motion compensation in video encoders.
If block matching is done by maximization of the correlation coefficient, then the best
matching block is also the best linear-fitting block. If we get the maximum
correlation of 1.0, the template block and the matched block are in a perfect linear
30
relationship resulting in zero residues if motion compensation is done with first or-
der linear estimation. We theoretically find that if motion estimation is done by
maximization of correlation coefficient and motion compensation is done by linear
estimation, the variance of the motion compensated signal will always be less than
the variance of simple difference signal used in traditional encoding schemes. The
overhead of this approach is one extra parameter, per block, to be encoded which
may require a customized decoder.
The use of the correlation coefficient for block motion estimation may be criticized for its
high computational complexity. However, we have implemented the partial correlation
elimination algorithm for block motion estimation and compared its execution
time with that of SAD based motion estimation using the Successive Elimination Algorithm (Li and
Salari, 1995) and Partial Distortion Elimination (Montrucchio and Quaglia, 2005) optimizations.
We find that the execution time of correlation based block matching is
comparable to that of optimized SAD. This shows that the use of the correlation
coefficient as a block motion estimator is a viable option and may be considered
when the resulting compression is larger than that obtained by traditional SAD based
encoding schemes.
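The variance claim above can be checked numerically. The following is a minimal sketch with toy one-dimensional "blocks"; the pixel values are illustrative, not from the thesis experiments:

```python
# Compensating a block with the best first-order linear fit (alpha + beta*t)
# never leaves more residual variance than the simple difference r - t.

def mean(v):
    return sum(v) / len(v)

def linear_residual_variance(r, t):
    mu_r, mu_t = mean(r), mean(t)
    cov = mean([(a - mu_r) * (b - mu_t) for a, b in zip(r, t)])
    beta = cov / mean([(b - mu_t) ** 2 for b in t])   # slope of the linear fit
    alpha = mu_r - beta * mu_t                        # intercept
    residual = [a - (alpha + beta * b) for a, b in zip(r, t)]
    return mean([e * e for e in residual])  # residual mean is zero by construction

def simple_difference_variance(r, t):
    d = [a - b for a, b in zip(r, t)]
    mu = mean(d)
    return mean([(e - mu) ** 2 for e in d])

# t is roughly linear in r, with a brightness/contrast change:
r = [10.0, 14.0, 9.0, 20.0, 17.0]
t = [4.0, 6.0, 3.5, 9.0, 8.0]
assert linear_residual_variance(r, t) <= simple_difference_variance(r, t)
```

The inequality holds for any block pair with non-zero variance, since the linear residual variance is the minimum achievable by any affine compensation, of which the simple difference is one instance.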
1.2 Organization of the Rest of the Thesis
A visual organization of the thesis is shown in Figure 1.1. In this figure, the flow of
concepts and the dependencies between chapters are shown by arrows. A brief
overview of the rest of the thesis is as follows:
• Theoretical properties of various commonly used image match measures are
discussed in Chapter 2. Theoretical relationships between different match measures
are also explored.
• Chapter 3 contains a review of the existing state of the art image matching
algorithms. Positioning of our core contributions within the existing work is
also explained.
• Chapter 4 contains theoretical aspects of transitive bounds on correlation.
• Chapter 5 contains Transitive Elimination Algorithms, along with experiments
and results.
• Chapter 6 is about Basic Mode Partial Correlation Elimination algorithm.
• Chapter 7 is about Extended Mode Partial Correlation Elimination algorithm.
• Chapter 8 contains applications of elimination schemes for fast object detection.
Early termination and Early Non-Maxima Suppression (ENMS) algorithms are
discussed in the perspective of AdaBoost based edge-corner detector.
• Chapter 9 is about video coding with linear compensation, which is a new
correlation coefficient based video coding scheme.
• Finally, the thesis is concluded in Chapter 10.
Chapter 2
A REVIEW OF THE COMMONLY USED IMAGE
MATCH MEASURES
An image match measure is a function that accepts two images, I1 and I2, as input
and maps them to a single point on the line of real numbers:
M : I1 × I2 → R, (2.1)
where M is the match measure and R is the set of real numbers.
Image match measures may compute distance or dissimilarity, as well as similarity,
closeness and proximity between the input images. Image match measures which
compute dissimilarity between the input images are known as distortion measures
or distance measures, D(·, ·). Commonly used distance measures include the city block
distance measure and the Euclidean distance measure. A comprehensive list of distance
measures may be found in Deza and Deza (2006). For a distance measure to be
a metric, three necessary conditions must be satisfied (Bryant, 1985; Burago et al.,
2001):

1. Non-negativity: for two given images, r and t, each of size m × n pixels: D(r, t) ≥ 0, and D(r, t) = 0 if and only if r = t.

2. Symmetry: the distance from r to t must equal the distance from t to r: D(r, t) = D(t, r).

3. Triangular Inequality: for three given images, r, s and t, each of size m × n
pixels: D(r, t) ≤ D(r, s) + D(s, t).
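As a quick numeric illustration, the three conditions can be checked for the city block (SAD) distance on tiny flattened "images"; the pixel values below are arbitrary:

```python
# Numeric check of the three metric conditions for the city block distance.

def sad(r, t):
    return sum(abs(a - b) for a, b in zip(r, t))

# Three tiny 2x2 images, flattened to tuples:
imgs = [(0, 3, 7, 2), (1, 1, 5, 2), (9, 0, 4, 4)]
r, s, t = imgs
assert sad(r, t) >= 0 and sad(r, r) == 0          # non-negativity
assert sad(r, t) == sad(t, r)                      # symmetry
assert sad(r, t) <= sad(r, s) + sad(s, t)          # triangular inequality
```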
Other image match measures compute similarity or proximity between the two input
images. These image match measures are sometimes referred to as image similarity
measures, S(·, ·). Common examples of image similarity measures include correlation
based measures and mutual information based measures. For an image similarity
measure to be valid, only a subset of the above conditions applies:

1. Maximum similarity: for two given images, r and t, each of size m × n pixels,
S(r, t) should be maximum if r(i, j) = t(i, j), where (i, j) is a pixel location;
that is, the intensity values of the two images match exactly. In some cases,
similarity may also approach its maximum even if pixel intensities are not exactly
equal, but the intensities of both images are related by a perfect relationship.
For example, r(i, j) and t(i, j) may be related by t(i, j) = α + βr(i, j), where α and β are constants.
In this case, the similarity measured by the correlation coefficient will approach its maximum value.

2. Symmetry: similarity should be the same from r to t as from t to r: S(r, t) = S(t, r).
A commonly used image similarity measure, the correlation coefficient, may map two
input images to any point on the real line from -1.00 to +1.00:

\rho : I_1 \times I_2 \rightarrow R, \quad -1.00 \le R \le +1.00, \qquad (2.2)

therefore the similarity score may be positive or negative. The correlation coefficient does
not follow the triangular inequality; however, as we will show later, it follows a different
type of transitive inequality.
Many similarity measures may also be mapped to a distance measure using some
inverse function:

N(S(r, t)) \rightarrow D(r, t), \qquad (2.3)

where N(·) is an inverse mapping function, which maps a similarity score S(r, t) to
a distance score D(r, t). For example, correlation based similarity measures may be
mapped to Euclidean distance as well as angular distance based measures, as we will
discuss later in this chapter.
Besides the classification of image match measures as distance measures or similarity
measures, other classification schemes also exist. For example, image match measures
may also be classified by the fields in which they were originally defined. The City
Block and Euclidean distance measures were originally defined in geometry,
therefore they are sometimes referred to as geometric measures (Cha, 2007). Measures
initially defined in statistics, such as the correlation coefficient, may be called
statistical measures, and those founded in probability theory, such as Mutual Information,
are known as probabilistic measures. Some measures have been defined simultaneously
in more than one field; for example, the correlation coefficient has been defined in
Euclidean geometry as the inner product of two zero mean and unit magnitude vectors,
and in statistics as the covariance of two random variables normalized by their
individual standard deviations. Therefore, in such a classification, one match measure
may lie in more than one class.
Another classification of image match measures is based on the assumed relationship
between the two images to be compared (Roche et al., 1998, 1999, 2000). Some
measures assume brightness constancy for a perfect match, that is, the brightness of a real
world object remains the same in the two images to be compared. Examples of such
measures include the City Block distance measure, the Euclidean distance measure and
cross-correlation. Other measures assume a linear relationship between the two images to be
matched, for example the correlation coefficient, and some assume a non-linear functional
relationship between the images to be compared, for example the correlation ratio. Image
match measures that assume some probabilistic properties or appearance statistics of
an object remain the same are probabilistic similarity measures; these measures
include Entropy, Mutual Information and the Mahalanobis distance.

Some of the most commonly used image match measures, and their relationships with
each other, are discussed in more detail in the following sections.
2.1 City Block Distance Measure
The City Block distance measure is also known as the Manhattan distance, the
L1 norm, and, in the image processing literature, the Sum of Absolute Differences (SAD).
SAD assumes brightness constancy between the images to be compared.
SAD is the most frequently used measure for block motion
estimation in video encoders. The validity of SAD as a motion estimator may be justified
because consecutive video frames are captured by the same sensor at very small time
gaps, so the brightness constancy assumption often remains valid.
Given two image blocks r and t, each of size m × n pixels, SAD is the sum of absolute
differences at all m × n pixel locations:

\Phi(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} \left| r(x, y) - t(x, y) \right|, \qquad (2.4)
where | · | represents the absolute value function. If the brightness of t and r vary by a
multiplicative factor, or change of contrast, both images may be normalized to unit
magnitude before computing SAD:

\Phi_u(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} \left| \frac{r(x, y)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)^2}} - \frac{t(x, y)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2}} \right|, \qquad (2.5)

and if t and r vary by the addition of some constant, or change of brightness, both
images may be normalized to zero mean before computing SAD:

\Phi_z(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} \left| (r(x, y) - \mu_r) - (t(x, y) - \mu_t) \right|, \qquad (2.6)

where \mu_t and \mu_r are the mean intensity values of t and r. Combining both
normalizations results in the Normalized SAD (NSAD):

\Phi_{zu}(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} \left| \frac{r(x, y) - \mu_r}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2}} - \frac{t(x, y) - \mu_t}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}} \right|. \qquad (2.7)
Although a few authors have mentioned the normalized forms of SAD given by Equations
2.5, 2.6 and 2.7, for example (Roma et al., 2000), the use of normalized SAD
as an image match measure is infrequent. In video encoders, where SAD is often used
for block motion estimation, brightness and contrast changes are not expected, while
in image alignment and registration applications, where brightness and contrast variations
are expected, the correlation coefficient has often been used.
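The four SAD variants of Equations 2.4 to 2.7 can be sketched on flattened pixel lists as follows (illustrative, not the thesis implementation); the final assertions demonstrate NSAD's invariance to a combined brightness and contrast change, t = 2r + 5:

```python
import math

def sad(r, t):  # Eq. 2.4
    return sum(abs(a - b) for a, b in zip(r, t))

def sad_unit(r, t):  # Eq. 2.5: unit magnitude normalized
    nr = math.sqrt(sum(a * a for a in r))
    nt = math.sqrt(sum(b * b for b in t))
    return sum(abs(a / nr - b / nt) for a, b in zip(r, t))

def sad_zero_mean(r, t):  # Eq. 2.6: zero mean normalized
    mr, mt = sum(r) / len(r), sum(t) / len(t)
    return sum(abs((a - mr) - (b - mt)) for a, b in zip(r, t))

def sad_normalized(r, t):  # Eq. 2.7: NSAD (zero mean, then unit magnitude)
    mr, mt = sum(r) / len(r), sum(t) / len(t)
    nr = math.sqrt(sum((a - mr) ** 2 for a in r))
    nt = math.sqrt(sum((b - mt) ** 2 for b in t))
    return sum(abs((a - mr) / nr - (b - mt) / nt) for a, b in zip(r, t))

r = [10.0, 20.0, 30.0, 25.0]
t = [2.0 * a + 5.0 for a in r]                 # brightness + contrast change
assert sad(r, t) > 0                           # plain SAD is fooled
assert abs(sad_normalized(r, t)) < 1e-9        # NSAD sees a perfect match
assert sad_zero_mean(r, [a + 5.0 for a in r]) == 0  # Eq. 2.6 cancels brightness
```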
The computational cost of normalization renders the normalized versions of SAD, as given
by Equation 2.7, significantly more expensive than the simple version given by Equation 2.4.
This is because the absolute value function cannot be expanded
or simplified, which results in a higher normalization cost compared to the Euclidean
distance or the correlation coefficient, which may be rearranged to reduce computational
cost. A review of efficient image match measure computation techniques is given in
Chapter 3.
2.2 Euclidean Distance Measure
The Euclidean distance is based on the fact that the shortest distance between two points
in Euclidean space is a straight line, whose length may be computed using the
Pythagorean theorem. The basic concept of Euclidean distance extends seamlessly from two
dimensional Euclidean space to higher dimensional Euclidean spaces.
In order to define the Euclidean distance between two image blocks r and t, both images
must be considered as points in \mathbb{R}^{m \times n}. The Euclidean distance, \Delta(r, t), between r and t
is given by:

\Delta(r, t) = \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} \left( r(x, y) - t(x, y) \right)^2}. \qquad (2.8)
\Delta(r, t) is also known as the Euclidean norm, or L2 norm, of the difference. During image matching, we
search for the minimum value of \Delta(r, t). Since the square root function in Equation 2.8
does not affect the relative order of values, it may be removed to reduce
the computational cost. The resulting measure, which is \Delta^2, is often called the Sum of
Squared Differences (SSD):

SSD = \sum_{x=1}^{m} \sum_{y=1}^{n} \left( r(x, y) - t(x, y) \right)^2. \qquad (2.9)

In image processing, SSD has often been used instead of the Euclidean distance because
of its reduced complexity.
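Because the square root is monotonic, ranking candidate blocks by SSD is identical to ranking them by Euclidean distance, as the following toy example checks (arbitrary 4-pixel blocks, flattened to tuples):

```python
import math

def ssd(r, t):  # Eq. 2.9
    return sum((a - b) ** 2 for a, b in zip(r, t))

def euclid(r, t):  # Eq. 2.8
    return math.sqrt(ssd(r, t))

template = (5, 9, 2, 7)
candidates = [(5, 8, 2, 7), (0, 0, 0, 0), (5, 9, 2, 5), (9, 9, 9, 9)]
by_ssd = sorted(candidates, key=lambda c: ssd(template, c))
by_euclid = sorted(candidates, key=lambda c: euclid(template, c))
assert by_ssd == by_euclid  # identical ranking, cheaper computation
```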
The magnitude of \Delta(r, t) between two points may vary depending on the units of
measurement. In order to make \Delta(r, t) independent of the measurement units, it
may be normalized. In image processing, one image sensor may map the real
world intensities to a wider range of image intensities than another sensor.
In order to cater for the distortion effects produced by such variations, both images
must be normalized to unit magnitude. The unit magnitude normalized Euclidean
distance, \Delta_u(r, t), is given by:

\Delta_u(r, t) = \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} \left( \frac{r(x, y)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)^2}} - \frac{t(x, y)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2}} \right)^2}. \qquad (2.10)
The terms \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)^2} and \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2} are the Euclidean norms of r and
t, and represent the distance of each of the points r and t from the origin. Dividing
each dimension of r and t by this distance from the origin transforms r and
t into points at unit distance from the origin, that is, on the surface of a unit sphere.
Often, the two images to be matched are captured under different lighting conditions; one
image may be overall brighter than the other. In order to cancel the effect of light
intensity variations, the images must be zero mean normalized. The zero mean normalized
Euclidean distance, \Delta_z(r, t), is given by:

\Delta_z(r, t) = \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} \left( (r(x, y) - \mu_r) - (t(x, y) - \mu_t) \right)^2}. \qquad (2.11)
The Euclidean distance upon zero mean and then unit magnitude normalized images is
given by:

\Delta_{zu}(r, t) = \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} \left( \frac{r(x, y) - \mu_r}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2}} - \frac{t(x, y) - \mu_t}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}} \right)^2}. \qquad (2.12)

\Delta_{zu} is also called the Standardized Euclidean Distance. In the following sections, we will
see that image matching by minimization of \Delta_{zu}(r, t) is equivalent to image
matching by maximization of the correlation coefficient.
Figure 2.1: Angular distance measure follows the triangular inequality.
2.3 Minkowski Distance Measure
In the previous sections we have seen the L1 norm and L2 norm distance measures, named
the City Block distance measure and the Euclidean distance measure. In Euclidean space,
higher order norms, such as L3, L4 and so on, are also valid distance measures. In
general, the Lp norm is called the Minkowski distance of order p, for p ≥ 1.

Considering images r and t, each of size m × n pixels, as two points in the m × n
dimensional Euclidean space \mathbb{R}^{m \times n}, the Minkowski distance of order p may be defined
as:

L_p = \left( \sum_{x=1}^{m} \sum_{y=1}^{n} \left| r(x, y) - t(x, y) \right|^p \right)^{1/p}, \quad \text{for } p \ge 1. \qquad (2.13)

Equation 2.13 defines a metric only for p ≥ 1; for p < 1, L_p no longer satisfies the
triangular inequality and so is not a metric. In Equation 2.13, as p → ∞, we obtain the
L∞ norm, called the Chebyshev distance measure:

L_\infty = \lim_{p \to \infty} \left( \sum_{x=1}^{m} \sum_{y=1}^{n} \left| r(x, y) - t(x, y) \right|^p \right)^{1/p}. \qquad (2.14)

The Chebyshev distance between r and t is the greatest of their differences along any
dimension:

L_\infty = \max_{1 \le x \le m,\; 1 \le y \le n} \left| r(x, y) - t(x, y) \right|. \qquad (2.15)
The Chebyshev distance is also known as the Chessboard distance because, in chess, the
minimum number of moves the king needs to go from one square to another equals
the Chebyshev distance between the centers of the squares, assuming that the squares
have unit side length and the coordinate frame is aligned with the board edges. The
Chebyshev distance has also been used in image processing, for example see (Li et al., 2006;
Jedrasiak and Nawrat, 2009).
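A minimal sketch of the Minkowski distance of Equation 2.13 and its limiting Chebyshev form of Equation 2.15, on toy pixel vectors:

```python
def minkowski(r, t, p):  # Eq. 2.13, p >= 1
    return sum(abs(a - b) ** p for a, b in zip(r, t)) ** (1.0 / p)

def chebyshev(r, t):  # Eq. 2.15: greatest per-dimension difference
    return max(abs(a - b) for a, b in zip(r, t))

r, t = (3, 10, 4), (1, 2, 4)
assert abs(minkowski(r, t, 1) - 10) < 1e-9   # city block: 2 + 8 + 0
assert chebyshev(r, t) == 8
assert abs(minkowski(r, t, 50) - 8) < 0.5    # large p approaches Chebyshev
```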
2.4 Angular Distance Measure
The angular distance measure may be considered as the dissimilarity of the directions of
two vectors. If the angle between two vectors is zero, both vectors have the same direction
and therefore maximum positive association. If the angle is 180°, their directions
are exactly opposite, which shows maximum negative association. When
two vectors are orthogonal to each other, there is no association in their directions,
which may be considered as zero similarity.
The angular distance between two points in \mathbb{R}^{m \times n} is the angle between the vectors joining
these points with the origin. The images r and t may be considered as points in \mathbb{R}^{m \times n},
and the angular distance between the vectors joining these points with the origin shows the
dissimilarity between the images. Let πrt be the plane defined by the vectors r and t.
The angular distance, θ(r, t), between r and t is the angle measured in the plane πrt.
There are two possible angles between r and t in the plane πrt, namely θ(r, t) and
360° − θ(r, t). For consistency, and without loss of generality, the smaller
of the two angles may always be chosen as θ(r, t), so that 0° ≤ θ(r, t) ≤ 180°.
The angular distance measure is a valid metric because it satisfies the three conditions
of non-negativity, symmetry and the triangular inequality (Mahmood and Khan, 2007b):

1. Angular distance is non-negative: θ(r, t) ≥ 0, and θ(r, t) = 0 if and only if r = t.

2. Angular distance is symmetric: θ(r, t) = θ(t, r).

3. Angular distance follows the triangular inequality of distance measures. For
three images, r, s, and t: θ(r, t) ≤ θ(r, s) + θ(s, t). A simple proof of this fact
is given in the following paragraph.
Let πrs and πst be the planes uniquely defined by the vectors r, s and s, t, as shown in
Figure 2.1. Let θ(πrs, πst) and 360° − θ(πrs, πst) be the magnitudes of the two angles
between the planes πrs and πst. Without loss of generality, we may always select
the smaller of these two angles as θ(πrs, πst), which is bounded between 0° and 180°:
0° ≤ θ(πrs, πst) ≤ 180°.

The value of θ(r, t) depends on θ(πrs, πst). If the magnitude of θ(πrs, πst) is 180°, then
θ(r, t) is equal to θ(r, s) + θ(s, t). For all values of θ(πrs, πst) < 180°, the magnitude
of θ(r, t) remains less than θ(r, s) + θ(s, t). Therefore

\theta(r, t) \le \theta(r, s) + \theta(s, t), \qquad (2.16)

which shows that the angular distance follows the triangular inequality of distance
measures. □
If r · t represents the inner product of the two vectors, and ||r||₂ and ||t||₂ are the magnitudes
of the vectors r and t, then using the definition of the inner product, the angular
distance θ(r, t) may be given by:

\theta(r, t) = \cos^{-1}\left( \frac{r \cdot t}{\|r\|_2 \|t\|_2} \right), \qquad (2.17)

and the angular distance between two unit magnitude normalized vectors is:

\theta_u(r, t) = \cos^{-1}\left( \frac{r}{\|r\|_2} \cdot \frac{t}{\|t\|_2} \right). \qquad (2.18)

Magnitude normalization of a vector only changes the length of the vector while the
direction remains unchanged. Therefore the angular distance between two unit length
normalized vectors remains the same as for their un-normalized versions: θ(r, t) = θu(r, t).
The angular distance between zero mean normalized vectors is given by:

\theta_z(r, t) = \cos^{-1}\left( \frac{(r - \mu_r) \cdot (t - \mu_t)}{\|r - \mu_r\|_2 \|t - \mu_t\|_2} \right). \qquad (2.19)
Subtracting the mean from each dimension of a vector is equivalent to translating the end
point of the vector. Since both vectors are translated in different directions, the angular
distance between them may change as a result of zero mean normalization. In general,
θz(r, t) ≠ θ(r, t) and θz(r, t) ≠ θu(r, t). The angular distance between zero mean and unit
magnitude normalized images is given by:

\theta_{zu}(r, t) = \cos^{-1}\left( \frac{r - \mu_r}{\|r - \mu_r\|_2} \cdot \frac{t - \mu_t}{\|t - \mu_t\|_2} \right). \qquad (2.20)

Since length normalization does not change the angle, θz(r, t) = θzu(r, t).
θzu(r, t) may also be called the Standardized Angular Distance. The angular distance
measure is directly related to correlation based similarity measures and also to
Euclidean distance measures.
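These properties can be checked numerically on flat toy vectors: unit normalization leaves the angle unchanged (θ = θu), while mean subtraction generally changes it (θz ≠ θ):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def theta(r, t):  # Eq. 2.17
    return math.acos(dot(r, t) / (norm(r) * norm(t)))

def theta_z(r, t):  # Eq. 2.19: zero mean normalized angular distance
    mr, mt = sum(r) / len(r), sum(t) / len(t)
    return theta([x - mr for x in r], [x - mt for x in t])

r, t = [2.0, 4.0, 6.0], [1.0, 5.0, 3.0]
r_u = [x / norm(r) for x in r]   # unit magnitude normalization
t_u = [x / norm(t) for x in t]
assert abs(theta(r, t) - theta(r_u, t_u)) < 1e-9   # theta == theta_u
assert abs(theta(r, t) - theta_z(r, t)) > 1e-3     # theta_z generally differs
```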
2.4.1 Relationship between Standardized Angular Distance and Standardized Euclidean Distance
Considering two points on the surface of a unit sphere, the standardized Euclidean
distance is the length of the straight line joining them, whereas the standardized angular
distance is the angle between the vectors joining those points with the center of the
sphere, which is at the origin. The standardized Euclidean distance will be zero if both
points lie at the same position, and it will assume its maximum value of 2 if the two
points lie exactly opposite to each other along a diameter. For the minimum standardized
Euclidean distance, the standardized angular distance will also be at its minimum of 0°,
and for the case of maximum standardized Euclidean distance, the standardized angular
distance will also be at its maximum of 180°. The function relating the standardized Euclidean
distance to the standardized angular distance may be derived from Equation 2.20:

\cos\{\theta_{zu}(r, t)\} = \frac{r - \mu_r}{\|r - \mu_r\|_2} \cdot \frac{t - \mu_t}{\|t - \mu_t\|_2}, \qquad (2.21)
and from Equation 2.12, squaring both sides and simplifying, we get:

\Delta_{zu}^2(r, t) = 2 - 2 \sum_{x=1}^{m} \sum_{y=1}^{n} \frac{(r(x, y) - \mu_r)(t(x, y) - \mu_t)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2} \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}}. \qquad (2.22)

Substituting the value of \cos\theta_{zu}(r, t), the relating function is given by:

\Delta_{zu}(r, t) = \sqrt{2(1 - \cos\{\theta_{zu}(r, t)\})}, \qquad (2.23)

which shows that if θzu(r, t) = 0°, then ∆zu(r, t) = 0, and if θzu(r, t) = 180°, then ∆zu(r, t) = 2.
2.5 Correlation Based Similarity Measures
In signal processing, cross-correlation, or the closely related method known as the Matched
Spatial Filter (MSF), has often been used to search for a short duration signal within a
longer one. The main reason for using cross-correlation for signal detection is that,
in the presence of white Gaussian noise, it is the optimal linear
operator for signal detection (Turin, 1960). Cross-correlation is computed by taking the
inner product of two one dimensional signals. The image blocks r and t may also be
considered as two dimensional signals; the cross-correlation between them is given
by their inner product:

\psi(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y). \qquad (2.24)
In image processing, cross-correlation has often been used in its normalized form
to remove its bias towards brighter regions. The Normalized Cross-Correlation (NCC)
between image blocks r and t is often defined as:

\psi_u(r, t) = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r^2(x, y)} \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} t^2(x, y)}}. \qquad (2.25)
NCC is robust to contrast variations, but it is not robust to brightness variations.
A more robust measure, invariant to any linear change in the signal, is the correlation
coefficient, ρ, which is the cross-correlation between zero mean and unit magnitude
normalized images:

\rho(r, t) = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)(t(x, y) - \mu_t)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2} \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}}, \qquad (2.26)
where µr and µt are the means of r and t respectively. The formulation of the correlation
coefficient may also be viewed as the covariance normalized by the individual standard deviations:

\rho(r, t) = \frac{\sigma_{rt}^2}{\sigma_r \sigma_t}. \qquad (2.27)

A rearrangement of Equation 2.26 is given by:

\rho(r, t) = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)(t(x, y) - \mu_t)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2} \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}}. \qquad (2.28)

Further rearrangement yields the computationally efficient form:

\rho(r, t) = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y) - mn\,\mu_r \mu_t}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r^2(x, y) - mn\,\mu_r^2} \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} t^2(x, y) - mn\,\mu_t^2}}. \qquad (2.29)

The formulations given by Equations 2.26, 2.27, 2.28 and 2.29 are equivalent and
yield the same value of the correlation coefficient. However, they may vary in their
computational complexity, a topic discussed in detail in the next chapter.
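A quick numeric check that the definition of Equation 2.26 and the efficient form of Equation 2.29 agree, on a toy block pair (flattened; mn = 6 pixels):

```python
import math

def rho_definition(r, t):  # Eq. 2.26
    mr, mt = sum(r) / len(r), sum(t) / len(t)
    num = sum((a - mr) * (b - mt) for a, b in zip(r, t))
    den = math.sqrt(sum((a - mr) ** 2 for a in r)) * \
          math.sqrt(sum((b - mt) ** 2 for b in t))
    return num / den

def rho_efficient(r, t):  # Eq. 2.29: one pass over raw sums
    n = len(r)
    mr, mt = sum(r) / n, sum(t) / n
    num = sum(a * b for a, b in zip(r, t)) - n * mr * mt
    den = math.sqrt(sum(a * a for a in r) - n * mr * mr) * \
          math.sqrt(sum(b * b for b in t) - n * mt * mt)
    return num / den

r = [3.0, 8.0, 1.0, 6.0, 4.0, 9.0]
t = [2.0, 7.0, 2.5, 5.0, 3.0, 8.0]
assert abs(rho_definition(r, t) - rho_efficient(r, t)) < 1e-9
```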
2.5.1 Relationship between Correlation and Angular Distance
Measure
Correlation based measures are inversely related to angular distance measures.
The inverse relationship is realized by the cosine function, which maps a
larger angle to a smaller value and a smaller angle to a larger value within the 0° to
180° range. The maximum value of the cosine function is 1.00, for an angular distance of 0°,
and its minimum value is -1.00, for a distance of 180°. For a distance of 90°, the value of
the cosine function is 0.00.

The relationship between the unit normalized angular distance, θu, and the normalized
cross-correlation, ψu, may be written by comparing Equations 2.18 and 2.25:

\theta_u(r, t) = \cos^{-1}(\psi_u(r, t)). \qquad (2.30)

Similarly, the zero mean and unit variance normalized angular distance, θzu, may also be
related to the correlation coefficient ρ by using Equations 2.20 and 2.26:

\theta_{zu}(r, t) = \cos^{-1}(\rho(r, t)). \qquad (2.31)
Equation 2.31 may be simplified by using the relationship between cos⁻¹ and sin⁻¹:

\cos^{-1}(\rho(r, t)) = \frac{\pi}{2} - \sin^{-1}(\rho(r, t)), \qquad (2.32)

and expanding sin⁻¹(ρ(r, t)) using the Maclaurin series:

\cos^{-1}(\rho(r, t)) = \frac{\pi}{2} - \rho(r, t) - \frac{1}{6}\rho^3(r, t) - \frac{3}{40}\rho^5(r, t) - \frac{5}{112}\rho^7(r, t) - \cdots \qquad (2.33)

For small magnitudes, higher powers of ρ result in significantly smaller values.
Moreover, the coefficients of the higher powers of ρ are significantly small, therefore
the ρ⁵(r, t) and higher power terms may be ignored without causing a significant difference
in the value of the estimated angular distance. The relationship between θzu and ρ
may then be written as:

\theta_{zu}(r, t) \approx \frac{\pi}{2} - \rho(r, t)\left(1 + \frac{1}{6}\rho^2(r, t)\right). \qquad (2.34)
Using this relationship, for a given value of the correlation coefficient, we may estimate
the corresponding angular distance in radians.
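The accuracy of the truncated series in Equation 2.34 can be checked against the exact θzu = cos⁻¹(ρ) for a few correlation values:

```python
import math

def theta_approx(rho):  # Eq. 2.34: truncated series estimate of theta_zu
    return math.pi / 2 - rho * (1 + rho * rho / 6)

# The error stays small for moderate |rho|, growing with the magnitude:
for rho in (-0.5, -0.2, 0.0, 0.3, 0.6):
    assert abs(theta_approx(rho) - math.acos(rho)) < 0.02
```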
2.5.2 Relationship between Correlation and Euclidean Distance Measure
Cross-correlation is also inversely related to the Euclidean distance measure. To obtain
the relationship between ∆ and ψ, we expand Equation 2.8:

\Delta(r, t) = \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r^2(x, y) + \sum_{x=1}^{m} \sum_{y=1}^{n} t^2(x, y) - 2 \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y)}. \qquad (2.35)
Substituting the value of ψ from Equation 2.24 and writing the magnitude of
each image as its Euclidean norm, Equation 2.35 simplifies to the following form:

\Delta(r, t) = \sqrt{\|r\|_2^2 + \|t\|_2^2 - 2\psi(r, t)}, \qquad (2.36)

which gives the relationship between cross-correlation and the Euclidean distance measure.
We may extend this relationship to the case of normalized cross-correlation
(Equation 2.25) and the unit normalized Euclidean distance, ∆u (Equation 2.10). For
unit magnitude normalized images, the Euclidean norms become 1.00:

\|r\|_2 = \|t\|_2 = 1.00, \qquad (2.37)

therefore, from Equation 2.36:

\Delta_u(r, t) = \sqrt{2(1 - \psi_u(r, t))}. \qquad (2.38)
Equation 2.38 relates NCC with the unit normalized Euclidean distance.

The standardized Euclidean distance, ∆zu, as given by Equation 2.12, may also be related
to the correlation coefficient given by Equation 2.26, by using Equation 2.38:

\Delta_{zu}(r, t) = \sqrt{2(1 - \rho(r, t))}, \qquad (2.39)

or, simply rearranging Equation 2.39:

\rho(r, t) = 1 - \frac{1}{2}\Delta_{zu}^2(r, t), \qquad (2.40)

which gives an alternate understanding of the correlation coefficient (Rodgers and Nicewander, 1988).
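A numeric check of Equation 2.40 on toy blocks, with the standardization (zero mean, unit magnitude) performed explicitly:

```python
import math

def standardize(v):  # zero mean, then unit magnitude
    mu = sum(v) / len(v)
    mag = math.sqrt(sum((x - mu) ** 2 for x in v))
    return [(x - mu) / mag for x in v]

def rho(r, t):  # inner product of standardized images (Eq. 2.26)
    return sum(a * b for a, b in zip(standardize(r), standardize(t)))

def delta_zu(r, t):  # standardized Euclidean distance (Eq. 2.12)
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(standardize(r), standardize(t))))

r = [4.0, 9.0, 2.0, 7.0]
t = [3.0, 8.0, 4.0, 6.0]
assert abs(rho(r, t) - (1 - 0.5 * delta_zu(r, t) ** 2)) < 1e-9  # Eq. 2.40
```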
2.5.3 Correlation Coefficient as a Measure of Strength of Linear Relationship
In statistical analysis, the correlation coefficient has often been used to estimate the
strength of the linear relationship between two random variables (Harnett, 1982; Snedecor
and Cochran, 1968; Montgomery and Peck, 1982; Spigel and Stephens, 1990). This
understanding may be extended to image processing, to investigate the image
matching capabilities of the correlation coefficient. We may regard the two image blocks
to be matched, r and t, as two linearly associated random variables. We
may further assume that t is the independent random variable and r is the dependent
random variable. Since r and t are linearly associated, we may estimate r from a
given value of t:

\hat{r}(x, y) = \alpha_{rt} + \beta_{rt}\, t(x, y), \qquad (2.41)

where \hat{r}(x, y) is the estimate of r(x, y), \alpha_{rt} is the y-intercept and \beta_{rt} is the slope of the
regression line between r and t.

The regression analysis used to derive the correlation coefficient formulation is based on
three commonly used estimation error terms: the Sum of Squared Error (SSE), the Sum of
Squared Total Deviation (SSTD) and the Sum of Squared Regression (SSR). The first
term, SSE, is the sum of the squared estimation error over all pixels of r:

SSE = \sum_{x=1}^{m} \sum_{y=1}^{n} (\hat{r}(x, y) - r(x, y))^2 \qquad (2.42)
    = \sum_{x=1}^{m} \sum_{y=1}^{n} (\alpha_{rt} + \beta_{rt}\, t(x, y) - r(x, y))^2. \qquad (2.43)
The second term, SSTD, is the sum of the squared total deviation of r(x, y) from its
mean \mu_r, over all pixels of r:

SSTD = \sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2 \qquad (2.44)
     = mn\,\sigma_r^2. \qquad (2.45)

The third term, SSR, is the sum of the squared differences between the estimated values of r
and the mean of r, over all pixels:

SSR = \sum_{x=1}^{m} \sum_{y=1}^{n} (\hat{r}(x, y) - \mu_r)^2 \qquad (2.46)
    = \sum_{x=1}^{m} \sum_{y=1}^{n} (\alpha_{rt} + \beta_{rt}\, t(x, y) - \mu_r)^2. \qquad (2.47)
The relationship between these three error terms is given by:

SSTD = SSE + SSR. \qquad (2.48)

The proof of this relationship is nontrivial and is presented at the end of the current
section.

Using the three estimation error terms, the coefficient of correlation has been defined as
the square root of the SSR to SSTD ratio (Montgomery and Peck, 1982):

\rho = \pm\sqrt{\frac{SSR}{SSTD}}. \qquad (2.49)

By using Equation 2.48:

\rho = \pm\sqrt{\frac{SSR}{SSE + SSR}}. \qquad (2.50)
This definition may be elaborated by considering a perfect linear relationship between
r and t, that is, zero estimation error: \hat{r}(x, y) - r(x, y) = 0, or SSE = 0.
Then, from Equation 2.48, SSTD = SSR, which means the correlation coefficient will
evaluate to ±1. On the other hand, if the random variables r and t are independent
of each other, the slope of the regression line will be zero: \beta_{rt} = 0. In this case, the
y-intercept turns out to be equal to the mean of r: \alpha_{rt} = \mu_r, which results in
\hat{r}(x, y) - \mu_r = 0. Therefore, in this case, SSR = 0 and the total deviation of r is due to
the estimation error: SSTD = SSE. The correlation coefficient will then evaluate
to zero.
In the following subsections, we will first derive the expressions for the optimal linear
regression parameters, \alpha_{rt} and \beta_{rt}, and then use these parameters to derive
the formulation of the correlation coefficient. In this formulation, the correlation coefficient
is defined as the ratio of the covariance of r and t to the product of the individual standard deviations
of r and t:

\rho_{rt} = \frac{\sigma_{rt}^2}{\sigma_r \sigma_t}. \qquad (2.51)

At the end of this subsection, we will present a proof of the fact that SSTD = SSE + SSR.
Derivation of Optimal Linear Regression Parameters
For a given pair of random variables, r and t, the optimal regression line parameters
\alpha_{rt} and \beta_{rt} may be defined as those which minimize the sum of squared estimation error
given by Equation 2.43. These optimal parameters may be computed by taking the partial
derivatives of SSE with respect to \alpha_{rt} and with respect to \beta_{rt}:

\frac{\partial}{\partial \alpha_{rt}} SSE = 2 \sum_{x=1}^{m} \sum_{y=1}^{n} (\alpha_{rt} + \beta_{rt}\, t(x, y) - r(x, y)), \qquad (2.52)

and

\frac{\partial}{\partial \beta_{rt}} SSE = 2 \sum_{x=1}^{m} \sum_{y=1}^{n} \left[ (\alpha_{rt} + \beta_{rt}\, t(x, y) - r(x, y))\, t(x, y) \right]. \qquad (2.53)

In order to minimize the error, both of the partial derivatives given by Equations 2.52
and 2.53 must be set to zero:

mn\,\alpha_{rt} + \beta_{rt} \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y) = \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y) \qquad (2.54)
and

\alpha_{rt} \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y) + \beta_{rt} \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2 = \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y). \qquad (2.55)

Equations 2.54 and 2.55 are also known as the normal equations (Montgomery and Peck,
1982). We may solve these equations simultaneously to get a closed form solution
for the optimal parameters \alpha_{rt} and \beta_{rt}. Equation 2.54 may be simplified by dividing
both sides by the total number of pixels in each image, mn:
αrt + βrtµt = µr. (2.56)
Substituting the value of \alpha_{rt} from Equation 2.56 into Equation 2.55:

(\mu_r - \beta_{rt}\mu_t) \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y) + \beta_{rt} \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2 = \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y). \qquad (2.57)
Rearranging the terms, we get:

\beta_{rt} = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y) - \mu_r \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)}{\sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2 - \mu_t \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)}, \qquad (2.58)

which may be further simplified to the following form:

\beta_{rt} = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)(t(x, y) - \mu_t)}{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}. \qquad (2.59)
In terms of variance and covariance, \beta_{rt} may be written as:

\beta_{rt} = \frac{\sigma_{rt}^2}{\sigma_t^2}, \qquad (2.60)
where σ²rt is the covariance of the random variables r and t, and σ²t is the variance of
the independent random variable t. Equation 2.60 shows that the slope of the regression
line between the random variables r and t is given by the ratio of the covariance, σ²rt, to
the variance of the independent random variable, σ²t. If the two random variables are
independent, their covariance will be zero and hence the slope of the regression line
will also be zero.

Figure 2.2: Relationships of the zero mean and unit variance normalized Euclidean distance ∆zu, the zero mean and unit variance normalized angular distance θzu, the correlation ratio η, and mutual information I with the correlation coefficient ρ.
The value of the second regression parameter, α_rt, may be found by substituting the
value of the slope from Equation 2.60 into Equation 2.56:

α_rt = µ_r − µ_t σ_rt^2 / σ_t^2. (2.61)

Equation 2.61 shows that the y-intercept of the regression line is given by µ_r − β_rt µ_t. The
value of β_rt as given by Equation 2.60 and the value of α_rt as given by Equation
2.61 minimize the sum of squared estimation error (SSE) for a given pair of random
variables r and t. Note that if r and t are independent random variables, the slope
of the regression line becomes zero, β_rt = 0, and the y-intercept becomes equal
to the mean of the dependent variable, α_rt = µ_r. In this case, the correlation
coefficient between r and t will also become zero.
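The closed-form parameters of Equations 2.60 and 2.61 may be checked numerically. The following Python sketch is illustrative only; the synthetic images, the noise level and the variable names are our own choices, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.random((8, 8))                                   # template image (independent variable)
r = 2.0 * t + 3.0 + 0.01 * rng.standard_normal((8, 8))   # r is nearly linear in t

# Closed-form MMSE parameters: beta = cov(r, t)/var(t) (Equation 2.60),
# alpha = mu_r - beta*mu_t (Equation 2.61)
cov_rt = np.mean((r - r.mean()) * (t - t.mean()))
beta = cov_rt / t.var()
alpha = r.mean() - beta * t.mean()

# Cross-check against a generic least-squares line fit over the flattened pixels
b_ls, a_ls = np.polyfit(t.ravel(), r.ravel(), 1)
assert np.allclose([beta, alpha], [b_ls, a_ls])
print(alpha, beta)   # close to 3.0 and 2.0
```

Both routes recover the planted slope and intercept, because `np.polyfit` with degree one minimizes exactly the SSE criterion of Equations 2.52 and 2.53.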
Deriving Correlation Coefficient Formulation
The estimation error terms SSTD and SSR may be simplified under the assumption
of a linear association between r and t. In the last subsection, the Minimum Mean Squared
Error (MMSE) linear regression parameters were derived. By using Equations 2.60
and 2.61, the formulation of SSR given by Equation 2.47 may be further simplified
as follows:
SSR = Σ_{x=1}^{m} Σ_{y=1}^{n} (µ_r − µ_t σ_rt^2/σ_t^2 + (σ_rt^2/σ_t^2) t(x,y) − µ_r)^2 (2.62)
    = (σ_rt^4/σ_t^4) Σ_{x=1}^{m} Σ_{y=1}^{n} (t(x,y) − µ_t)^2 (2.63)
    = mn σ_rt^4/σ_t^2. (2.64)

Substituting the value of SSTD from Equation 2.45 and the value of SSR from Equation 2.64 in the correlation coefficient definition given by Equation 2.49,

ρ_rt = √( mn σ_rt^4 / (mn σ_r^2 σ_t^2) ), (2.65)

which simplifies to

ρ_rt = σ_rt^2 / (σ_r σ_t). (2.66)
The formulation of correlation coefficient as given by Equation 2.66 is its basic
formulation. All other formulations of correlation coefficient
found in the image processing literature may be derived from this basic formulation.

The formulation of the linear regression parameters used to simplify the SSR term
is based on minimization of the least-squares error criterion. However, in the presence of
outliers in the data, a least-squares fit may not be the best way to estimate these
parameters. Therefore, estimation of linear association by using correlation coefficient
may suffer in the presence of outliers, such as salt-and-pepper noise. In such cases,
an appropriate noise-removal procedure, such as median filtering, is recommended
before starting the image matching process.
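As a concrete illustration, Equation 2.66 may be evaluated directly from pixel statistics. The short Python sketch below uses synthetic images and a function name of our own choosing, and compares the result against a library implementation:

```python
import numpy as np

def correlation_coefficient(r, t):
    """rho = cov(r, t) / (sigma_r * sigma_t), as in Equation 2.66."""
    r = r.astype(float).ravel()
    t = t.astype(float).ravel()
    cov_rt = np.mean((r - r.mean()) * (t - t.mean()))
    return cov_rt / (r.std() * t.std())

rng = np.random.default_rng(1)
t = rng.random((16, 16))
r = -0.5 * t + 4.0                      # perfectly linear relationship
print(correlation_coefficient(r, t))    # -1.0 up to floating point error
assert np.isclose(correlation_coefficient(r, t),
                  np.corrcoef(r.ravel(), t.ravel())[0, 1])
```

A perfectly linear relationship with negative slope yields ρ = −1, the extreme of the measure's range.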
Proving that SSTD = SSE + SSR

We have used this relationship in the derivation of the correlation coefficient formulation
without actually proving it. In this subsection, we will present a proof of this
relationship. Expanding the SSTD definition given by Equation 2.44:
SSTD = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y) + r̂(x,y) − µ_r)^2
     = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y))^2 + Σ_{x=1}^{m} Σ_{y=1}^{n} (r̂(x,y) − µ_r)^2
     + 2 Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y))(r̂(x,y) − µ_r).

By using the definitions of SSE and SSR:

SSTD = SSE + SSR + 2 SPT, (2.67)

where the Sum of Product Terms (SPT) is given by

SPT = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y))(r̂(x,y) − µ_r). (2.68)

The desired relationship may be proved if the SPT term is equal to zero. For this purpose,
we now expand the SPT term:

SPT = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y))(α_rt + β_rt t(x,y) − µ_r). (2.69)
Expanding and taking constant terms out of the summations:

SPT = α_rt Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y)) + β_rt Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y)) t(x,y) − µ_r Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y)). (2.70)
From Equation 2.52, we get:

Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y)) = 0. (2.71)

Therefore the first and third terms in Equation 2.70 become zero. From Equation 2.53:

Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y)) t(x,y) = 0, (2.72)

which makes the second term in Equation 2.70 zero as well, and proves that the full expression of SPT is zero:

SPT = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y))(r̂(x,y) − µ_r) = 0, (2.73)

which completes the proof that SSTD is the sum of the SSE and SSR terms:

SSTD = SSE + SSR. (2.74)
Thus we may conclude that the similarity score produced by correlation coefficient
is actually an estimate of the strength of the linear relationship between the two images
to be matched. Therefore, a perfect score will only be produced if the relationship
between the two images is perfectly linear. As the image-to-image relationship
deviates from linearity, correlation coefficient may no longer remain the best measure
of the strength of association. In such cases, other measures, such as correlation ratio,
may be used as the preferred similarity measure.
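The decomposition just proved, and its link to ρ through Equation 2.49, can be verified numerically. The following sketch uses synthetic data of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.random((10, 10))                     # template image
r = 1.5 * t + rng.standard_normal((10, 10))  # reference block, noisy linear relation

# MMSE linear regression of r on t (Equations 2.60 and 2.61)
beta = np.mean((r - r.mean()) * (t - t.mean())) / t.var()
alpha = r.mean() - beta * t.mean()
r_hat = alpha + beta * t

sse = np.sum((r - r_hat) ** 2)          # Sum of Squared Estimation Error
ssr = np.sum((r_hat - r.mean()) ** 2)   # Sum of Squared Regression
sstd = np.sum((r - r.mean()) ** 2)      # Sum of Squared Total Deviation

assert np.isclose(sstd, sse + ssr)      # Equation 2.74
# rho^2 = SSR / SSTD (Equation 2.49)
assert np.isclose(ssr / sstd, np.corrcoef(r.ravel(), t.ravel())[0, 1] ** 2)
```

The cross terms cancel for the MMSE parameters, which is exactly why SPT vanishes in the proof above.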
2.6 Correlation Ratio

As we have discussed in the last section, the similarity score produced by correlation
coefficient represents the strength of linear association between two images. If
the association between two images is not linear, then the score produced by correlation
coefficient may actually be far less than the actual strength of association. For
example, consider that the association between two signals is given by the cosine
function, a(x) = cos(b(x)), where x is the sample index and there are n samples:
1 ≤ x ≤ n. Considering values of b(x) from 0 to 2π at discrete intervals, the corresponding
values of a(x) = cos(b(x)) vary from +1 to −1, and the means of the two signals
are µ_a = 0, µ_b = π. From Equation 2.26:
ρ(a, b) = Σ_{x=1}^{n} cos(b(x)) (b(x) − π) / [ √(Σ_{x=1}^{n} cos^2(b(x))) √(Σ_{x=1}^{n} (b(x) − π)^2) ] = 0, (2.75)
which shows that there is no linear relationship between the signals. We may also
observe that the standardized Euclidean distance Δ_zu and the angular distance θ_zu
measure the strength of linear relationship as well. For the case of the functional
relationship a(x) = cos(b(x)), the standardized Euclidean distance given by Equation 2.12
takes the value corresponding to zero correlation:

Δ_zu(a, b) = √2 = 1.414, (2.76)

while the angular distance θ_zu(a, b) is 90°, which means the two vectors are found to be
orthogonal to each other and no similarity in the direction of the two vectors has been
found. In general, any functional association whose area under the curve is zero
will yield ρ(a, b) = 0.00, Δ_zu(a, b) = √2 and θ_zu(a, b) = 90°, despite the existence
of a perfect functional relationship (Rietz, 1919). Thus angular distance, Euclidean
distance and correlation based measures expect a linear association between the two
images to be matched. In the case of a strong non-linear functional relationship, all of
these measures fail to yield the correct similarity score.
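The cosine example can be reproduced numerically. The sketch below samples the two signals and evaluates the three measures; the sample count is an arbitrary choice of ours:

```python
import numpy as np

n = 1000
b = np.linspace(0.0, 2.0 * np.pi, n)    # b(x) sampled from 0 to 2*pi
a = np.cos(b)                            # perfect functional relation a = cos(b)

# correlation coefficient (Equation 2.26)
rho = np.mean((a - a.mean()) * (b - b.mean())) / (a.std() * b.std())

# zero mean, unit variance normalized vectors
az = (a - a.mean()) / a.std()
bz = (b - b.mean()) / b.std()
delta_zu = np.sqrt(np.mean((az - bz) ** 2))            # standardized Euclidean distance
theta_zu = np.degrees(np.arccos(np.clip(rho, -1, 1)))  # angular distance

print(round(abs(rho), 2), round(delta_zu, 2), round(theta_zu, 0))
```

Despite the perfect functional dependence, ρ comes out essentially zero, Δ_zu ≈ √2 and θ_zu ≈ 90°, matching Equations 2.75 and 2.76.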
If similarity between two images is defined as the strength of association between
them, then for the case of functional associations, correlation ratio (η) is a preferred
similarity measure (Roche et al., 1998, 1999, 2000). Originally, correlation ratio was
used as a tool of variance analysis (Fisher, 1925). It is a measure of the association
between the dispersion within individual categories and the dispersion across the
whole sample of observations. Suppose a random variable r has a set of n observations,
with variance σ_r^2 and mean µ_r. Suppose the set of observations may be divided
into k categories. Let µ_i, i ∈ {1, 2, ..., k}, be the mean of each category. The variance
of the category means is given by:

σ_µ^2 = Σ_{i=1}^{k} n_i (µ_i − µ_r)^2 / n, (2.77)

where n_i is the number of observations in the ith category. Correlation ratio may be
defined as the ratio of the standard deviation of the k category means to the overall
standard deviation of the n observations (Rietz, 1919):

η^2(r|t) = σ_µ^2 / σ_r^2, (2.78)

and alternatively, in the notation of expectation, we may write (Roche et al., 2000):

η^2(r|t) = Var(E[r|k]) / Var(r), (2.79)

where Var(·) is variance and E[r|k] is the conditional expectation that represents the
category means.
In order to apply correlation ratio to image matching, we may assume that each image
pixel may contain any of k total intensity levels. The image r may be considered
as one set of observations and the image t as another set of observations. We may
divide r into k categories depending on the corresponding values in t. For example,
all pixel locations in r(x, y) which correspond to a category value of t(x, y) = 0 form one
category in r. Similarly, all pixel positions in r(x, y) which correspond to the category
value t(x, y) = 1 form the second category of r. Since the pixel values in a log2(k)-bit
image t may vary from 0 to k − 1, there may be k possible categories in r. The category
mean is given by:

µ_i(r|t) = Σ_{x,y : t(x,y)=i} r(x,y) / n_i, i ∈ {0, 1, 2, ..., k − 1}, (2.80)

where n_i = count(t(x,y) = i), and correlation ratio is given by

η^2(r|t) = Σ_{i=0}^{k−1} n_i (µ_i(r|t) − µ_r)^2 / Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − µ_r)^2. (2.81)

Similarly, we may also compute η(t|r), which requires categories to be made in t based
on the corresponding values from r, with r(x,y) = i and n_i = count(r(x,y) = i):

µ_i(t|r) = Σ_{x,y : r(x,y)=i} t(x,y) / n_i, i ∈ {0, 1, 2, ..., k − 1}, (2.82)

and

η^2(t|r) = Σ_{i=0}^{k−1} n_i (µ_i(t|r) − µ_t)^2 / Σ_{x=1}^{m} Σ_{y=1}^{n} (t(x,y) − µ_t)^2. (2.83)
Since the strength of the functional regression from r to t may be quite different from
that of the regression from t to r, correlation ratio is not a
symmetric similarity measure; that is, in general η(t|r) ≠ η(r|t). In the following
subsection, we will show how the definition of correlation ratio as given by Equations
2.78 and 2.81 is related to the strength of functional association.
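Equations 2.80 and 2.81 translate almost directly into code. The following sketch (function name, synthetic images and the particular non-linear mapping are our own illustrative choices) shows η reaching its perfect score where ρ does not:

```python
import numpy as np

def correlation_ratio(r, t, k=256):
    """eta(r|t): category means of r are taken over equal-intensity pixels of t
    (Equations 2.80 and 2.81). Images are integer-valued with k levels."""
    r = r.astype(float).ravel()
    t = t.ravel()
    mu_r = r.mean()
    num = 0.0
    for i in range(k):
        mask = (t == i)
        n_i = mask.sum()
        if n_i > 0:
            num += n_i * (r[mask].mean() - mu_r) ** 2
    sstd = np.sum((r - mu_r) ** 2)
    return np.sqrt(num / sstd)

rng = np.random.default_rng(3)
t = rng.integers(0, 256, size=(32, 32))
r = (t ** 2) % 256              # deterministic but strongly non-linear function of t
eta = correlation_ratio(r, t)
rho = np.corrcoef(r.ravel().astype(float), t.ravel().astype(float))[0, 1]
print(round(eta, 3), round(abs(rho), 3))   # eta is 1.0; |rho| is near zero
```

Because each t-category maps to a single r value, η(r|t) is exactly one, while the correlation coefficient reports almost no association.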
2.6.1 Derivation of Correlation Ratio Formulation from Functional Regression

Correlation ratio may be considered as a generalization of correlation coefficient,
because correlation ratio measures the strength of functional association, while
correlation coefficient measures only the strength of linear association.

Assume t to be the independent random variable and r to be the dependent random
variable, and assume the relationship between r(x, y) and t(x, y) to be a deterministic
function f(·). For a given value of t(x, y), we may estimate the value of r(x, y)
by using the function f(·):

r̂(x, y) = f(t(x, y)), (2.84)

where r̂(x, y) is the estimate of r(x, y). The estimation error terms, Sum of Squared
Estimation Error (SSE), Sum of Squared Total Deviation (SSTD), and Sum of
Squared Regression (SSR), may also be defined for the case of functional regression.
Since we have to estimate values of r from given values of t, the direction of regression
may be represented in the error terms by using the notation SSE(r|t) and SSR(r|t).
Since SSTD has no dependence on t, it will be denoted by SSTD(r).
SSE(r|t) may now be defined as

SSE(r|t) = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y))^2 (2.85)
         = Σ_{x=1}^{m} Σ_{y=1}^{n} (f(t(x,y)) − r(x,y))^2. (2.86)

The term SSTD(r) is the sum of squared total deviation of r(x, y) from its mean µ_r,
over all pixels of r:

SSTD(r) = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − µ_r)^2 (2.87)
        = mn σ_r^2. (2.88)

The term SSR(r|t) is the sum of squared differences between the estimated values of r
and the mean of r, over all pixels:

SSR(r|t) = Σ_{x=1}^{m} Σ_{y=1}^{n} (r̂(x,y) − µ_r)^2 (2.89)
         = Σ_{x=1}^{m} Σ_{y=1}^{n} (f(t(x,y)) − µ_r)^2. (2.90)

If each of the images r and t has log2(k) bits per pixel, the number of
discrete intensity levels any pixel in these images may take is k. In the template
image t, suppose each intensity level i occurs n_i times. The SSR(r|t) formulation given
by Equation 2.90 may be written in terms of intensity levels as follows:

SSR(r|t) = Σ_{i=0}^{k−1} n_i (f(i) − µ_r)^2. (2.91)
Since correlation ratio measures the strength of functional regression, it may
be defined parallel to the definition of correlation coefficient as a measure of the
strength of linear regression:

η(r|t) = ± √( SSR(r|t) / SSTD(r) ). (2.92)

Squaring both sides we get:

η^2(r|t) = SSR(r|t) / SSTD(r). (2.93)

Substituting the value of SSR from Equation 2.91 and SSTD from Equation 2.87:

η^2(r|t) = Σ_{i=0}^{k−1} n_i (f(i) − µ_r)^2 / Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − µ_r)^2. (2.94)

Assuming a perfect functional association between r and t, for the ith category in r,
corresponding to the intensity level i in t, the category mean will be f(i):

µ_i(r|t) = Σ_{x,y : t(x,y)=i} f(i) / n_i, i ∈ {0, 1, 2, ..., k − 1}, (2.95)

or

µ_i(r|t) = f(i), i ∈ {0, 1, 2, ..., k − 1}, (2.96)

which may be substituted in the correlation ratio formulation given by Equation 2.94:

η^2(r|t) = Σ_{i=0}^{k−1} n_i (µ_i − µ_r)^2 / Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − µ_r)^2. (2.97)

By using Equation 2.77, the numerator is the variance of the category means and the
denominator is the variance of r:

η^2(r|t) = σ_µ^2 / σ_r^2. (2.98)
Hence we conclude that correlation ratio measures the strength of the functional
relationship between r and t. For a perfect functional relationship, correlation ratio turns
out to be exactly 1.00, and for a weak functional relationship, it is close to zero. For
the case of a perfect functional relationship, each category will have only one value,
equal to its mean. Therefore, the variance of the category means will become equal to
the overall variance, or SSR(r|t) = SSTD(r). In the case of no functional association
between the two images, all categories will have the same mean, therefore the variance of
the category means will become zero, or alternatively SSR(r|t) = 0, and correlation
ratio will also become zero.
2.6.2 Relationship between Correlation Ratio and Correlation Coefficient

If the functional relationship between two images is linear, the formulation of correlation
ratio between these images converges to the formulation of correlation coefficient.
Let the linear relationship between images r and t be given by r(x, y) = α + βt(x, y).
The formulation of the category means in r, µ_i(r|t), simplifies to the following expression:

µ_i(r|t) = Σ_{x,y : t(x,y)=i} (α + βi) / n_i, ∀i ∈ {0, 1, 2, ..., 255}, (2.99)

or

µ_i(r|t) = α + βi, ∀i ∈ {0, 1, 2, ..., 255}. (2.100)

Substituting it in the formulation of η^2(r|t):

η^2(r|t) = Σ_{i=0}^{255} n_i (α + βi − µ_r)^2 / (mn σ_r^2). (2.101)

Using the fact that α + βµ_t = µ_r,

η^2(r|t) = Σ_{i=0}^{255} n_i (α + βi − α − βµ_t)^2 / (mn σ_r^2), (2.102)

or

η^2(r|t) = (β^2/σ_r^2) Σ_{i=0}^{255} n_i (i − µ_t)^2 / (mn). (2.103)
Since i is an intensity in image t and n_i is the count of that intensity,

σ_t^2 = Σ_{i=0}^{255} n_i (i − µ_t)^2 / (mn), (2.104)

therefore

η^2(r|t) = β^2 σ_t^2 / σ_r^2. (2.105)

Substituting the value of β from Equation 2.60, it follows that

η(r|t) = ρ(r, t), (2.106)

which proves that correlation ratio is equal to correlation coefficient if the association
between r and t is linear.
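This convergence is easy to check numerically. The sketch below (synthetic 8-bit images of our own choosing) computes η through the category means and ρ through a library routine:

```python
import numpy as np

rng = np.random.default_rng(4)
t = rng.integers(0, 256, size=(64, 64))
r = 3 * t + 7                          # exactly linear: r = alpha + beta*t

rf, tf = r.ravel().astype(float), t.ravel()
mu_r = rf.mean()
num = 0.0
for i in range(256):                   # category means of r over equal-intensity pixels of t
    mask = (tf == i)
    if mask.any():
        num += mask.sum() * (rf[mask].mean() - mu_r) ** 2
eta = np.sqrt(num / np.sum((rf - mu_r) ** 2))   # Equation 2.81
rho = np.corrcoef(rf, tf.astype(float))[0, 1]
print(round(eta, 6), round(rho, 6))    # both 1.0 for an exactly linear relation
```

For a linear relation with positive slope the two measures coincide at the perfect score of 1.0, as Equation 2.106 predicts.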
Thus we may conclude that correlation ratio is the more general form of correlation
coefficient, because correlation ratio measures the strength of functional relationships,
whereas correlation coefficient measures the strength of only linear relationships. In
many image matching applications, the relationship between the images to be matched
is a multi-valued or stochastic function. In the presence of multi-valued
functional relationships, correlation ratio no longer remains the best choice because
it cannot measure the strength of multi-valued functional relationships. In such cases,
joint entropy and mutual information are the preferred similarity measures. These
measures are discussed in the following section.
2.7 Entropy and Mutual Information

As we have discussed in the last section, Euclidean distance, angular distance and
correlation based measures assume a linear association between the two images to
be matched. In the case of a non-linear functional association, these measures fail to
produce a high similarity score. For such cases, correlation ratio has been proposed to
measure the strength of functional relationships. Correlation ratio assumes that the
function is deterministic, or single valued. In case the association between two images
is multi-valued or non-deterministic, the similarity score generated by correlation
Figure 2.3: An application hierarchy of the commonly used image match measures.
The most general match measures are in the outermost circles while the most restrictive
match measures are in the innermost circles. From innermost to outermost: Brightness
Constancy Assumption (city block distance, Euclidean distance, cross-correlation);
Only Brightness Variations (zero mean normalized city block distance, zero mean
normalized Euclidean distance, zero mean cross-correlation); Only Contrast Variations
(unit variance normalized city block distance, unit variance normalized Euclidean
distance, normalized cross correlation); Linear Relationship (zero mean unit variance
normalized city block distance, zero mean unit variance normalized Euclidean distance,
correlation coefficient ρ); Functional Relationship (correlation ratio); Probabilistic
Relationship (joint entropy, mutual information).
ratio will be smaller than the actual similarity. As an example, suppose a random
variable a may assume three values, a ∈ {2, 4, 6}, with equal probability and another
variable b may assume six values with equal probability, b ∈ {0, 5, 10, 20, 25, 30}. The
association between a and b is such that each value from a maps to two different
values from b with equal probability: pr(a = 2, b = 0) = 1/6, pr(a = 2, b = 30) = 1/6,
pr(a = 4, b = 5) = 1/6, pr(a = 4, b = 25) = 1/6, pr(a = 6, b = 10) = 1/6,
pr(a = 6, b = 20) = 1/6, and all other joint probabilities are zero. The mean of each
of the three categories of b|a is 15: µ_i(b|a) = 15. The variance of the category means is
therefore zero, σ_µ^2 = 0, and as a result η(b|a) = 0. In this specific example, although there
is a perfect multi-valued functional relationship, correlation ratio is exactly zero,
which shows that in the case of multi-valued functional associations, the score generated
by correlation ratio may be smaller than the actual strength of association.
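The failure can be reproduced with a few lines of Python, enumerating the six equally likely outcomes from the example above:

```python
import numpy as np

# The six equally likely (a, b) outcomes from the example above
pairs = [(2, 0), (2, 30), (4, 5), (4, 25), (6, 10), (6, 20)]
a = np.array([p[0] for p in pairs], dtype=float)
b = np.array([p[1] for p in pairs], dtype=float)

# eta(b|a): categories of b are formed by equal values of a (Equation 2.83)
mu_b = b.mean()
num = sum((a == v).sum() * (b[a == v].mean() - mu_b) ** 2 for v in np.unique(a))
eta = np.sqrt(num / np.sum((b - mu_b) ** 2))
print(eta)   # 0.0: every category mean equals mu_b = 15
```

Every category of b averages to 15, so the numerator of Equation 2.83 collapses to zero even though b is fully determined (up to a two-way choice) by a.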
In order to measure the strength of association between images having non-deterministic
associations, entropy and mutual information based measures have been proposed.
Entropy is a measure of the dispersion of the image histogram. The entropy of an image
r, having a total of 256 intensity levels and size m × n pixels, is defined as

H(r) = − Σ_{i=0}^{255} p_r(i) log p_r(i), (2.107)

where p_r(i) is the probability of intensity i, which is computed from the image histogram.
If the intensity i occurs n_i times, the probability p_r(i) is given by:

p_r(i) = n_i / (mn). (2.108)

Similarly, the entropy of the image t is given by:

H(t) = − Σ_{i=0}^{255} p_t(i) log p_t(i). (2.109)

The entropies of the individual images, H(r) and H(t), are also known as marginal entropies.
A distribution having probability concentrated in small regions has low entropy, while
a dispersed distribution has high entropy. As an example, if an image has only one
intensity value at all pixels, the probability of that intensity will be 1.00 and the entropy
of that image will be 0.00 bits. In contrast, if an image has all intensity values in
exactly equal numbers of pixels, then for 256 intensity levels the probability of each
level will be 1/256, p_r(i) log_2 p_r(i) = −8/256 for i ∈ {0, 1, 2, ..., 255}, and the entropy of
this image will be maximum, 8.00 bits.
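The two extreme cases may be checked with a direct implementation of Equation 2.107 (the function name and test images below are our own illustrative choices):

```python
import numpy as np

def entropy_bits(img, levels=256):
    """Marginal entropy H (Equation 2.107), in bits, from the image histogram."""
    counts = np.bincount(img.ravel(), minlength=levels)
    p = counts / counts.sum()
    p = p[p > 0]                      # 0*log(0) terms contribute nothing
    return -np.sum(p * np.log2(p))

flat = np.full((16, 16), 128, dtype=np.uint8)          # single intensity everywhere
ramp = np.arange(256, dtype=np.uint8).reshape(16, 16)  # each level exactly once
print(entropy_bits(flat), entropy_bits(ramp))
```

The constant image yields 0 bits and the equalized image yields the maximum of 8 bits, as argued above.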
The joint entropy of two images is defined as:

H(r, t) = − Σ_{i=0}^{255} Σ_{j=0}^{255} p_rt(i, j) log p_rt(i, j), (2.110)

where H(r, t) is the joint entropy and p_rt(i, j) is the joint probability of the intensity
pair (i, j) such that r(x, y) = i and t(x, y) = j. We may estimate p_rt(i, j) by counting
the frequency of occurrence of each intensity pair and dividing the count of each pair
by the total pixel count:

p_rt(i, j) ≈ count(r(x, y) = i, t(x, y) = j) / (mn). (2.111)
If r and t have an association, the joint probability plot will have only a few
high probability concentrations along each row, whereas in the case of unrelated images,
the probability will be distributed over all outcomes. The dispersion or concentration
of the joint probability density function may be measured by using joint entropy.
A low value of H(r, t) shows a strong association between r and t, while a high value
of H(r, t) corresponds to a weak association. The relationship between joint entropy and
the marginal entropies is given by:

H(r, t) = H(t) + H(r|t) = H(r) + H(t|r). (2.112)

In some image regions, the joint entropy may be low merely because of low marginal
entropies; therefore a normalization with the marginal entropies is required, which is
provided by the mutual information of the two images.
Mutual information may also be used to measure the association between two images
when the association is in the form of a non-deterministic function. Mutual information
is a more robust measure than joint entropy because it has a normalizing effect.
Mutual information may be defined as (T. M. Cover, 1991):

M(r, t) = H(r) + H(t) − H(r, t). (2.113)

A rearrangement of terms may yield the following formulation of mutual information:

M(r, t) = Σ_{i=0}^{255} Σ_{j=0}^{255} p_rt(i, j) log_2 [ p_rt(i, j) / (p_r(i) p_t(j)) ], (2.114)

which is also known as the Kullback-Leibler distance between the joint and the marginal
distributions.
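Equation 2.114 can be estimated from a joint histogram, as in Equation 2.111. The sketch below is illustrative (synthetic images, our own function name); note that histogram-based MI estimates on small samples carry a positive bias, so only the relative comparison is meaningful:

```python
import numpy as np

def mutual_information(r, t, levels=256):
    """M(r, t) = sum_ij p(i,j) log2[p(i,j)/(p(i)p(j))] (Equation 2.114), in bits."""
    joint = np.histogram2d(r.ravel(), t.ravel(),
                           bins=levels, range=[[0, levels], [0, levels]])[0]
    p_rt = joint / joint.sum()
    p_r = p_rt.sum(axis=1, keepdims=True)    # marginal of r
    p_t = p_rt.sum(axis=0, keepdims=True)    # marginal of t
    nz = p_rt > 0
    return np.sum(p_rt[nz] * np.log2(p_rt[nz] / (p_r @ p_t)[nz]))

rng = np.random.default_rng(5)
t = rng.integers(0, 256, size=(64, 64))
related = 255 - t                              # deterministic (even multi-valued would do)
unrelated = rng.integers(0, 256, size=(64, 64))
print(mutual_information(related, t) > mutual_information(unrelated, t))  # True
```

The functionally related pair scores markedly higher mutual information than the independent pair, which is precisely the behavior that makes MI usable as a match measure.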
2.7.1 Relationship between Mutual Information and Correlation Coefficient

For random variables having a Gaussian probability distribution function, a relationship
between mutual information and correlation coefficient may be derived. If r and t are
Gaussian distributed random variables, then the probability of a particular intensity
i in r is given by:

p(r(x, y) = i) = (1/(√(2π) σ_r)) e^{−(i − µ_r)^2 / (2σ_r^2)}, (2.115)

and the probability of a particular intensity j in t is given by:

p(t(x, y) = j) = (1/(√(2π) σ_t)) e^{−(j − µ_t)^2 / (2σ_t^2)}. (2.116)

Using the identity

∫ r^2 e^{−a r^2} dr = (1/(2a)) √(π/a), (2.117)

the marginal entropies of r and t may be computed:

H(r) = (1/2) log_2(2πe σ_r^2), (2.118)

and

H(t) = (1/2) log_2(2πe σ_t^2). (2.119)

The joint probability density function of two Gaussian distributed random variables
is given by:

p(r(x, y) = i, t(x, y) = j) = (1/(2π√|Σ|)) e^{−(1/2) [i − µ_r, j − µ_t] Σ^{−1} [i − µ_r, j − µ_t]^T}, (2.120)

where Σ is defined as:

Σ = [ σ_r^2    σ_rt^2
      σ_rt^2   σ_t^2 ].

Using the identity 2.117, the joint entropy may be computed as follows:

H(r, t) = (1/2) log_2((2πe)^2 |Σ|). (2.121)

Using the marginal and joint entropy expressions, the mutual information of two
Gaussian distributed random variables turns out to be:

I(r, t) = (1/2) log_2 [ σ_r^2 σ_t^2 / |Σ| ]. (2.122)

This expression may further be simplified, by using the fact that |Σ| = σ_r^2 σ_t^2 − σ_rt^4, to:

I(r, t) = −(1/2) log_2(1 − ρ_rt^2). (2.123)

Since the logarithm is a monotonic function, one may easily conclude that for
two images which have Gaussian distributions, maximization of mutual information is
equivalent to maximization of the magnitude of the correlation coefficient. Image intensities,
however, are often not Gaussian distributed, and therefore the relationship between
correlation coefficient and mutual information does not hold in general.
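The identity can be verified by plain arithmetic on a hypothetical covariance matrix (the numeric values below are arbitrary; the standard bivariate Gaussian entropy H(r,t) = (1/2) log₂((2πe)²|Σ|) is assumed):

```python
import numpy as np

# Hypothetical Gaussian parameters; sigma_rt2 denotes the covariance
# (written sigma_rt^2 in the text)
sigma_r2, sigma_t2, sigma_rt2 = 4.0, 9.0, 3.0
Sigma = np.array([[sigma_r2, sigma_rt2],
                  [sigma_rt2, sigma_t2]])
det = np.linalg.det(Sigma)                      # = sigma_r2*sigma_t2 - sigma_rt2**2

H_r = 0.5 * np.log2(2 * np.pi * np.e * sigma_r2)      # marginal entropy of r
H_t = 0.5 * np.log2(2 * np.pi * np.e * sigma_t2)      # marginal entropy of t
H_rt = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * det)   # joint entropy of (r, t)

rho = sigma_rt2 / np.sqrt(sigma_r2 * sigma_t2)        # Equation 2.66
I_def = H_r + H_t - H_rt                              # Equation 2.113
I_rho = -0.5 * np.log2(1 - rho ** 2)                  # Gaussian MI identity
assert np.isclose(I_def, I_rho)
```

The 2πe factors cancel in the difference, leaving mutual information as a monotonically increasing function of ρ².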
2.8 Conclusion

In this chapter we have discussed some commonly used image match measures, including
city block distance, Euclidean distance, angular distance, correlation based
Table 2.1: Perfect Score (√) and Not-Perfect Score (×) produced by different Image
Match Measures (IMM), if the images to be matched exhibit constant intensity,
additive intensity variations, multiplicative intensity variations, linear associations,
functional associations and probabilistic associations. The image match measures are
SAD, SSD, Cross-Correlation (CC), NCC, correlation coefficient (ρ), correlation ratio
(η), Joint Entropy (JE), and Mutual Information (MI).

IMM  Constant  Additive  Multipl.  Linear  Functional  Probabilistic
SAD     √         ×         ×        ×         ×            ×
SSD     √         ×         ×        ×         ×            ×
CC      √         ×         ×        ×         ×            ×
NCC     √         ×         √        ×         ×            ×
ρ       √         √         √        √         ×            ×
η       √         √         √        √         √            ×
JE      √         √         √        √         √            √
MI      √         √         √        √         √            √
measures, correlation ratio, joint entropy and mutual information. Correlation coefficient
has been discussed in detail due to its larger significance as an image match
measure. Relationships of correlation coefficient with other match measures have also
been elaborated and are summarized in Figure 2.2. The relationship between correlation
coefficient and linear regression has also been discussed in significant detail.

For each image match measure, the underlying assumptions about the relationship
between the images to be matched are summarized in Figure 2.3. City block distance,
Euclidean distance, angular distance and cross correlation assume that the intensity
values of the images to be matched remain exactly the same for a perfect score (Table
2.1). Zero mean and unit variance normalized city block distance, zero mean and
unit variance normalized Euclidean distance and correlation coefficient all give a
perfect score if there is a linear relationship between the images to be matched.
Correlation ratio assumes that there is a single-valued functional relationship, which
may be linear or non-linear, but must be deterministic and single valued to yield a
perfect score. Joint entropy and mutual information based measures assume that
the relationship between the images to be matched may be a multi-valued functional
relationship, which may also be called a probabilistic relationship. Thus, the use of a
particular match measure in an image matching application strongly depends on the
type of association between the images to be matched.
Chapter 3
COMPUTATIONAL ASPECTS OF COMMONLY USED
IMAGE MATCH MEASURES
The most common template matching process consists of comparing a small template
image against multiple search locations within a relatively larger reference image
and evaluating an image match measure. The search location which yields the best
similarity score may be selected as the best match location. Suppose the template
image t, of size m × n pixels, has to be matched at all valid search locations within a
relatively large reference image r of size p × q pixels. For the purpose of matching,
the reference image r is considered to be divided into overlapping rectangular blocks
r_{io,jo}, each of size m × n pixels, where (io, jo) are the coordinates of the first pixel of
the reference block. Each reference block r_{io,jo} is a candidate search location.
If the match measure evaluated during each comparison is a distance measure, then
the best match location may be defined as the search location which yields the minimum
distance over the entire search space:

(i_min, j_min) = arg min_{io,jo} D(r_{io,jo}, t), (3.1)

where D(·, ·) is the function computing the distance between r_{io,jo} and t. Alternatively,
if the image match measure computed during each comparison is a similarity measure,
then the best match location will be defined as the search location exhibiting
maximum similarity over the entire search space:

(i_max, j_max) = arg max_{io,jo} S(r_{io,jo}, t), (3.2)

where S(·, ·) is a function computing the similarity score between r_{io,jo} and t.
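Equation 3.1 with the SSD distance measure can be realized as a direct exhaustive search. The sketch below uses synthetic data of our own choosing and is written for clarity rather than speed:

```python
import numpy as np

def full_search(reference, template):
    """Exhaustive template matching: evaluate the SSD distance at every valid
    (io, jo) and return the location of minimum distance (Equation 3.1)."""
    p, q = reference.shape
    m, n = template.shape
    best, best_loc = np.inf, (0, 0)
    for io in range(p - m + 1):
        for jo in range(q - n + 1):
            block = reference[io:io + m, jo:jo + n]     # reference block r_{io,jo}
            d = np.sum((block.astype(float) - template) ** 2)
            if d < best:
                best, best_loc = d, (io, jo)
    return best_loc

rng = np.random.default_rng(6)
ref = rng.integers(0, 256, size=(64, 64)).astype(float)
tpl = ref[20:28, 33:41].copy()       # plant the template at a known location
print(full_search(ref, tpl))         # (20, 33)
```

The (p − m + 1)(q − n + 1) block comparisons of mn pixels each are exactly the cost that the fast techniques surveyed in this chapter try to reduce.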
During the template matching process, the search for the best match location is done
over the two translational parameters (io, jo). Therefore, in the context of the general
image registration problem, the template matching process is sometimes referred to as
translation-only image registration. In the generic image registration problem, the
search for the best match location is done over the entire set of geometric transformation
parameters. For example, if the assumed transformation between r_{io,jo} and t is affine,
the search for the best match location has to be done over four affine parameters in
addition to the two translational parameters (io, jo):
T_A = [ a1  a2  io
        a3  a4  jo
        0   0   1 ]. (3.3)

If a projective transformation is assumed between r_{io,jo} and t, the search for the best
match location has to be done over six projective parameters in addition to the two
translational parameters (io, jo):

T_P = [ a1  a2  io
        a3  a4  jo
        c1  c2  1 ]. (3.4)

The search for the best match in the eight dimensional search space may be written as:

r_min ≜ min_{io,jo} ( min_{a1,a2,a3,a4,c1,c2} D(T_P(r_{io,jo}), t) ), (3.5)

where T_P(·) is the projective transformation function, which geometrically transforms
the input image.
As the dimensionality of the search space increases, the computational cost of the match
measure increases exponentially, which makes the process of image registration
practically intractable. Keeping in view the importance of image registration, significant
efforts have been made to reduce the computational cost of match measures in
the form of fast computational techniques.

Existing techniques for fast computation of match measures may be divided into two
main categories: fast approximate techniques and fast exhaustive techniques. Fast
approximate techniques are based on, as the name implies, some approximation,
which reduces the computational cost of the match measure but may incur an
associated reduction in the accuracy of finding the global maximum. Most of the fast
approximate techniques are implemented in the spatial domain, whereas fast exhaustive
techniques may be implemented in either the frequency domain or the spatial
domain. Fast exhaustive techniques guarantee that the global maximum over the entire
search space will be found.
A lot of research effort has been dedicated to the development of fast approximate
techniques, with emphasis on relatively less deterioration in accuracy and more computation
reduction. Fast approximate techniques may further be divided into two
categories, based upon the strategy used for computation reduction: 'Approximate
Search Space' techniques and 'Approximate Image Representation' techniques.
The most commonly used fast approximate techniques employ search space reduction by
approximating the actual search space with a smaller one, and thus reducing the
number of times the match measure has to be evaluated. In these techniques, the cost
of a one-time computation of the match measure remains the same. In the second class of
approximate techniques, the cost of one-time match measure computation is reduced by
approximating the template or the reference image with a simpler representation. In
the approximate image representation techniques, the actual match measure has also
been approximated with a match measure which is simpler to compute.

The exhaustive accuracy spatial domain techniques may also be divided into two
categories: 'Complete Computation' techniques and 'Computation Elimination'
techniques. Complete computation techniques employ efficient rearrangement
of match measure formulations to reduce the computational complexity by separating
the pre-computable terms from those which have to be computed at run time. By doing
so, the order of computational complexity may remain the same; however, the cost
of the operations with the highest order of complexity may reduce. The second category
of the exhaustive accuracy techniques is the bound based computation elimination
techniques. In these techniques a significant amount of computation is skipped by
comparing a theoretical bound on the match measure with the current known
maximum. This comparison discloses the fact that a specific search location
cannot compete with the already known best match location. These
techniques focus on skipping most of the computations while ensuring no change in
accuracy. Bound based computation elimination techniques may further be grouped
into two categories: 'Complete Elimination' techniques, which discard the entire
computation at a search location, and 'Partial Elimination' techniques, which discard
a portion of the computation at a particular search location when the unsuitability
of that search location is established.
3.1 Fast Approximate Image Matching Techniques

Fast approximate techniques reduce the computational complexity of image matching
by making different types of approximations, which may be divided into two categories.
The first category includes approximations of the search space with a smaller
search space, and the second category includes image approximations with simpler
representations, along with match measure approximations with simpler match measures.
All these approximations cause a reduction in the accuracy of the image matching
process.

3.1.1 Search Space Approximation Techniques

Search space approximation techniques include most of the commonly used fast
approximate algorithms. For better comprehension, these techniques are further
subdivided into small search space and large search space techniques.
Small Search Space Techniques
Most of the research on small search space approximate image matching techniques
has been done in the perspective of block motion estimation for temporal redundancy
reduction in the video encoders. In software-based video encoders, the computa-
tional cost of block motion estimation comprises almost 50% to 70% of the total
cost (Shanableh and Ghanbari, 2000). Therefore many fast search methods have
been proposed to reduce the computational complexity of the block motion estima-
tion (Huang et al., 2006a). The emphasis of all of these methods is to reduce the
number of search points by selectively checking the match measure at only a few
positions. Search space approximation is based upon the assumption that the image
match measure monotonically varies towards the global maximum which is close to
the starting position. The most frequently cited techniques in this category are the
following:
i- Two Dimensional Logarithmic (TDL) search was used by Jain and Jain (1981)
to track the direction of the minimum of the Sum of Squared Differences (SSD)
match measure. In this technique, in the first step, the match measure is computed
at only five initial positions: one at the center of the search space and four
positions to the left, right, above and below the center, each at half the maximum
displacement.
In the second step, three more positions are searched in the direction of the min-
imum SSD as found in the first step. The step size is then halved and the above
procedure is repeated until the step size becomes unity. In the last step, all the
nine positions are searched. As an example, for a search window size of 11× 11
pixels, ±5 pixels from the center, only 13 to 21 positions need to be searched, as
opposed to 121 positions required in the full search approach.
ii- Cross Search Algorithm (CSA) has been proposed by Ghanbari (1990). In CSA,
in the first step, match measure is computed at five search locations, including the
central location in the search window and the four locations in the four diagonal
directions, in the form of a cross (×), at half-way from center to the corner of the
search window. In the following step, four new evaluations are done, each at a
half step-size distance, around the position with minimum distortion value. The
same process continues until the step size reduces to one pixel. For a maximum
step size of w pixels, the total number of evaluated locations is 5 + 4 log2 w, for
w ≥ 1.
iii- Three Step Search (TSS), proposed by Koga et al. (1981), has been used to
estimate motion displacements of up to 6 pixels per frame. In this technique, in
the first step, match measure is computed at nine positions including the central
position and the eight surrounding positions at half way in each of the eight
principal directions. At the position with minimum distortion, the search step
size is halved and the next eight new positions are searched. As an example,
consider a search window of size 23 × 23 pixels, or ±11 pixels in the vertical
and horizontal directions. In the first step, match measure will be computed at
nine positions including the central position (0, 0) and eight positions at (±6,±6)
pixels, in the eight directions. In the second step, eight more evaluations are done
at (±3,±3) around the minimum distortion position found in the step one. In the
third step, eight more evaluations follow at (±1,±1) pixels around the minimum
distortion position found in step two. Thus, in the TSS technique, only 25
evaluations are done instead of the 529 required by full search. Improvements over
the basic algorithm have been proposed in the form of the New Three Step Search
(NTSS) (Li et al., 1994) and the Four Step Search (FSS) (Po and Ma, 1996).
iv- Orthogonal Search Algorithm (OSA) has been proposed by Puri et al. (1987). In
OSA, each step consists of two stages, a horizontal stage followed by a vertical
stage. In the first step, the match measure is evaluated at the center of the
search window and at two points in the horizontal direction at half-way from
the center to the end of the search window. Two more evaluations are done in
vertical direction around the position of minimum distortion in the horizontal
direction. In the following step, the same procedure is repeated with the step size
reduced by half. Since in each new step, the match measure is evaluated at only
four new locations, the total number of evaluations is 1 + 4 log2w, where w is the
initial step size. Therefore, OSA may be considered the fastest algorithm in
the category of small search space approximate techniques.
v- Modified Motion Estimation Algorithm (MMEA) has been proposed by
Kappagantula and Rao (1985). In MMEA, if the distortion value at the center of
the search space, (i, j), is less than a minimum threshold, the search is stopped
and the block is marked as unchanged. Otherwise, the match measure is evaluated
at four new locations: (i−4, j), (i, j+4), (i+4, j), and (i, j−4), assuming a search
window of ±7 pixels. If the minimum from these four locations is smaller than
the distortion at the central location, the algorithm proceeds to the next step;
otherwise the value at the central location is taken as the best available, and the
search process is stopped. Assuming the position with minimum mismatch in the
previous step was (i−4, j), the further positions to be searched are (i− 4, j − 4)
and (i− 4, j + 4). Therefore, during the first step, match measure
is evaluated at seven locations. In the following steps, the same procedure is
repeated with half the step size. Therefore, with this method for w = 7 pixels,
only 19 evaluations are required.
vi- Conjugate Direction Search (CDS) (Srinivasan and Rao, 1985): All small search
space approximate algorithms discussed in this subsection try to find a line in the
search space along which the minimum value of the distortion function may be
found. In the case of a two parameter search, the minimum along one parameter
may be found first and then the minimum along the second parameter may be
searched. This type of approach is called ‘One at a Time Search’ (OTS).
Conjugate Direction Search (CDS) is an extension of the OTS technique. For a
two variable function to be minimized, CDS obtains two conjugate direction
vectors. The search is done along each direction using the OTS approach. An
improvement over the basic algorithm, the Fast One Step Search (FOSS)
algorithm, has been proposed by Ramachandran and Srinivasan (2001).
If (i, j) is the center of the search space, the first step of the CDS algorithm
proceeds by evaluating the match measure at three positions, (i, j), (i, j + l), and
(i, j − l), and finding the position with minimum distortion. Suppose the position
(i, j + l) yields minimum distortion; then (i, j + 2l) is computed and the minimum
of (i, j), (i, j + l), and (i, j + 2l) is searched. If the minimum is found to lie
between two higher values, the search in this direction stops; otherwise further
positions are checked in the same direction. In the next step of the CDS algorithm,
the search continues in the i-direction, in the same manner as the first step.
vii- Diamond Search (DS) algorithm has been proposed by Zhu and Ma (2000). In
this algorithm, the match measure is evaluated at the center of the search space
and at eight locations around it using the Large Diamond Search Pattern (LDSP).
If the minimum value of distortion is observed at the central position, then the
match measure is evaluated at four more locations around the center in the form
of the Small Diamond Search Pattern (SDSP). Otherwise, if the minimum of the
LDSP occurs at some outer point, the LDSP is shifted so that it is centered at the
minimum distortion position, and the process is repeated.
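The step-search family above can be sketched in a few lines; TSS is shown here with the SAD measure, and the image sizes, search window and exact-match template in the test below are illustrative assumptions, not parameters from the cited papers.

```python
import numpy as np

def sad(ref, tpl, i, j):
    """Sum of absolute differences between tpl and the block of ref
    whose top-left corner is at (i, j)."""
    m, n = tpl.shape
    block = ref[i:i + m, j:j + n].astype(np.int64)
    return int(np.abs(block - tpl.astype(np.int64)).sum())

def three_step_search(ref, tpl, center, step=4):
    """TSS sketch: probe the current center and its eight neighbours at
    the current step size, recenter on the minimum, then halve the step."""
    m, n = tpl.shape
    best = center
    while step >= 1:
        cands = [(best[0] + di * step, best[1] + dj * step)
                 for di in (-1, 0, 1) for dj in (-1, 0, 1)]
        cands = [(i, j) for i, j in cands          # stay inside the image
                 if 0 <= i <= ref.shape[0] - m and 0 <= j <= ref.shape[1] - n]
        best = min(cands, key=lambda p: sad(ref, tpl, *p))
        step //= 2
    return best
```

With a starting step of half the search range, only a few dozen locations are probed instead of every location in the window.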
Thus, in all of the small search space approximation techniques, the search space is
assumed to be quite smooth and the global maximum is also assumed to be close
enough to the initial starting point. If the search space is not smooth, or the
global maximum lies far from the initial starting point, then these techniques
may get stuck in intermediate local maxima and fail to reach the actual global
maximum. In such cases, the large search space approximate techniques are
preferred over the small search space techniques. The large search space techniques
are discussed in the following section.
Large Search Space Techniques
The small search space techniques as discussed in the last subsection are based on the
assumption of monotonic match measure variation in the proximity of a maximum.
This assumption often causes false estimates in the presence of large displacements
and larger search regions. Large search space techniques have been developed to
handle large displacements, larger search regions and non-smooth variation of match
measure due to the presence of local maxima. Some of the commonly used large
search space techniques are discussed in the following paragraphs:
1. In order to detect large object motion, Hierarchical Block Matching (Bierling,
1988), which is also commonly known as coarse-to-fine template matching
(A. Rosenfeld, 1977; Burt and Adelson, 1983), has often been used. In coarse-to-fine
matching, both the template and the reference images are low-pass
filtered and sub-sampled multiple times. The resulting sequences of images with
reducing sizes are known as image pyramids. The smallest images, which form
the coarsest representation, are at the top levels of the pyramids, and the largest
images, having maximum detail, are at the lowest levels. At higher levels of the
pyramids, the apparent motion also reduces in accordance with the total amount
of sub-sampling done up to that level.
Image matching is initially done across the top level images in the template
image and the reference image pyramids. The best match position found at
the top level is propagated to the next lower level. At the lower level, the match
measure is evaluated at only a few locations around the expected location of the
maximum. The same procedure is repeated for each subsequent level, until
the lowest pyramid level is reached. In some implementations, the intermediate
levels are skipped and the best match found at the top pyramid level is directly
propagated to the original image.
2. To further speed up the coarse-to-fine template matching scheme, an approximate
algorithm using the Walsh transform has been proposed (Nillius and Eklundh,
2002). In this technique, the template image and each of the search locations
is efficiently projected onto the Walsh basis using a binary tree of filters. The
image match measure is computed using only a part of the Walsh basis. The
performance of the coarse-to-fine scheme using the Walsh basis has been studied,
and it has been reported that for only a 1% loss of accuracy, a speed up of 9% to
23% may be obtained.
3. Two-stage template matching has been proposed by Vanderburg and Rosenfeld
(1977) using Sum of Absolute Differences (SAD) as the match measure. In
the first stage of this algorithm, only a part of the template image, called the
sub-template, is matched at all search locations. In the second stage, the remaining
portion of the template is matched only at the selected search locations that
exhibited an SAD value less than a specific threshold in the first stage. The
algorithm may fail to detect the presence of an object or may also make false
detections.
The two-stage template matching algorithm has been extended for normalized
cross-correlation (NCC) by Goshtasby et al. (1984). The basic algorithm remains
the same, with NCC used in place of SAD. The NCC based two-stage
template matching algorithm is also an approximate algorithm, with a non-zero
probability of missing the NCC maximum.
4. Sun et al. (2003) have proposed Correlation-based Adaptive Predictive Search
(CAPS) algorithm for fast template matching in the large search space. In
CAPS algorithm, the search space is sub-sampled based on the width of the
auto-correlation function of the template image. The horizontal and vertical
widths of the autocorrelation function are the distances, in the horizontal and
vertical directions, over which the autocorrelation remains higher than a predefined
threshold value. The search space is sub-sampled in both directions at half the
horizontal and vertical widths. Once a position with correlation higher than a specific
threshold is found, full search is carried out in the neighborhood of this position.
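The hierarchical (coarse-to-fine) matching of Bierling (1988) described above may be sketched as follows; the 2 × 2 box-filter pyramid, the SSD measure and the refinement radius are illustrative choices of this sketch, not those of the cited papers.

```python
import numpy as np

def downsample(img):
    """Low-pass with a 2x2 box filter and subsample by two."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def ssd(ref, tpl, i, j):
    m, n = tpl.shape
    return ((ref[i:i + m, j:j + n] - tpl) ** 2).sum()

def coarse_to_fine(ref, tpl, levels=2, radius=2):
    """Match exhaustively at the coarsest pyramid level, then refine the
    propagated position within a small window at each finer level."""
    pyr = [(ref.astype(np.float64), tpl.astype(np.float64))]
    for _ in range(levels):
        pyr.append((downsample(pyr[-1][0]), downsample(pyr[-1][1])))
    r, t = pyr[-1]                      # exhaustive search at the top level
    m, n = t.shape
    scores = [(ssd(r, t, i, j), (i, j))
              for i in range(r.shape[0] - m + 1)
              for j in range(r.shape[1] - n + 1)]
    pos = min(scores)[1]
    for r, t in reversed(pyr[:-1]):     # propagate and refine
        ci, cj = 2 * pos[0], 2 * pos[1]
        m, n = t.shape
        cands = [(i, j)
                 for i in range(max(0, ci - radius),
                                min(r.shape[0] - m, ci + radius) + 1)
                 for j in range(max(0, cj - radius),
                                min(r.shape[1] - n, cj + radius) + 1)]
        pos = min(cands, key=lambda p: ssd(r, t, *p))
    return pos
```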
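The two-stage scheme of Vanderburg and Rosenfeld (1977) described above may be sketched as follows; the choice of sub-template (the top rows of the template) and the percentile-based stage-one threshold are illustrative assumptions of this sketch.

```python
import numpy as np

def two_stage_match(ref, tpl, sub_rows=4, keep_percent=5):
    """Stage one: match only the top `sub_rows` rows of the template at
    every location. Stage two: evaluate the full template only at the
    locations whose stage-one SAD fell in the best `keep_percent` percent."""
    t = tpl.astype(np.int64)
    m, n = t.shape
    sub = t[:sub_rows]
    H, W = ref.shape[0] - m + 1, ref.shape[1] - n + 1
    stage1 = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            stage1[i, j] = np.abs(
                ref[i:i + sub_rows, j:j + n].astype(np.int64) - sub).sum()
    thresh = np.percentile(stage1, keep_percent)
    best, best_val = None, None
    for i, j in zip(*np.nonzero(stage1 <= thresh)):   # surviving locations
        v = np.abs(ref[i:i + m, j:j + n].astype(np.int64) - t).sum()
        if best_val is None or v < best_val:
            best, best_val = (int(i), int(j)), v
    return best
```

As the text notes, a location whose sub-template SAD exceeds the threshold is never examined in full, which is where both the speed up and the possibility of a miss come from.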
The large search space techniques as discussed in this subsection reduce the computa-
tional cost of image matching by approximating the actual search space with a smaller
search space. The next category of fast approximate techniques reduces the
computational cost by approximating the image representation with a simpler one,
along with an approximate formulation of the match measure.
3.1.2 Algorithms Using Approximate Image Representations
In approximate image representation algorithms, either the template image or the
reference image or both are approximated with simpler representations. In order to
efficiently compute the image match measure, approximate formulations of the match
measures have also been proposed along with each technique.
1. Briechle and Hanebeck (2001) have approximated the template image as a sum
of rectangular basis functions. The correlation is computed for each of the
basis functions instead of the original images. The final value of the correlation
is computed as the weighted sum of the correlations of the individual basis
functions. The execution time speed up is obtained by reducing the number of
basis functions, which increases the approximation error. Moreover, instead of
using the actual definition, an approximate correlation formulation has also been
proposed.
2. Yoshimura and Kanade (1994) have used the Karhunen-Loeve transform to obtain
eigen images of a set of rotated templates. If the set of eigen images is smaller
than the set of templates, computations may be saved by using this alternate
representation. In order to compute the normalized correlation between the eigen
images and the reference image, an approximate formulation of the normalized
correlation has also been proposed. Further computation reduction has been
obtained by employing a coarse-to-fine strategy.
3. Schweitzer et al. (2002) have efficiently computed the least squares approxima-
tion polynomials for each search location in the reference image, by using the
integral images proposed by Viola and Jones (2001). The computational cost of
estimating the best fit polynomial increases with the order of the polynomial.
It has been experimentally shown that second-order polynomials provide a
sufficient approximation for image matching. An approximate formulation of
normalized correlation has also been proposed to compute the match measure
efficiently with the newly proposed reference image representation. In this
algorithm, the template image has been used without approximation.
4. In order to reduce the computational cost of the image matching, both the tem-
plate and the reference images may be approximated by one bit per pixel binary
representations. The computational cost is reduced because the computations
are done on one bit data instead of eight bits. Conversion from an 8 bit per
pixel gray scale image to a one bit per pixel binary image may be done using
a global thresholding scheme as proposed by Otsu (1979). However, with a
global threshold, important details in some image regions may be lost; therefore
an adaptive local thresholding scheme has been proposed by Niblack (1986).
The image match measure used for the binary images is based on bitwise XNOR:
\gamma_b(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} r_b(x, y) \oplus t_b(x, y), \qquad (3.6)
where rb and tb are the binary images each of size m× n pixels, converted from
the gray scale images r and t. The operator ⊕ represents the binary function
XNOR. In the case of a translational parameter search, the best match location
is the one at which γb(r, t) is maximum.
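As an illustration, the binary matching of Equation 3.6 may be sketched as follows; the global mean threshold used here is a simple stand-in for Otsu's method, an assumption of this sketch.

```python
import numpy as np

def binary_match(ref, tpl):
    """Binarise both images with one global threshold (the reference mean,
    a stand-in for Otsu's method), then maximise the XNOR agreement count
    gamma_b of Eq. 3.6 over all valid search locations."""
    thr = ref.mean()
    rb, tb = ref > thr, tpl > thr
    m, n = tb.shape
    best, best_val = None, -1
    for i in range(rb.shape[0] - m + 1):
        for j in range(rb.shape[1] - n + 1):
            # XNOR count: number of pixels where the binary images agree
            v = int((rb[i:i + m, j:j + n] == tb).sum())
            if v > best_val:
                best, best_val = (i, j), v
    return best, best_val
```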
Thus we may conclude that approximate image matching techniques obtain speed
up at the expense of a loss in accuracy. If exhaustive equivalent accuracy is required,
approximate techniques may not be used. Fast exhaustive accuracy algorithms
have been proposed in both the frequency domain and the spatial domain. In the
following section, fast exhaustive algorithms in the frequency domain are discussed.
3.2 Fast Exhaustive Accuracy Image Matching in
Frequency Domain
In many cases, exhaustive computation of cross-correlation, NCC, and ρ has been
most efficiently done by using frequency domain transformation of the template and
the reference image. Once correlation based measures are efficiently computed,
Euclidean distance based measures may also be found by using the relationships
discussed in Chapter 2. In order to transform images from the spatial domain to the
frequency domain, the Discrete Fourier Transform (DFT) has often been used.
Consider the template image t of size m × n pixels to be matched at all search
locations in the larger reference image r of size p × q pixels. The two dimensional
DFT of the template image may be defined as:

T(u, v) = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} t(x, y) \, e^{-j 2\pi (\frac{ux}{M} + \frac{vy}{N})}, \qquad (3.7)
where T (u, v) is the transformed template image and (u, v) are the index locations
in frequency domain. The parameters M and N are defined as M = m + p − 1 and
N = n+ q − 1. Since m < M and n < N , the extra image space is filled with zeros,
commonly known as zero padding.
The 2-D Discrete Fourier Transform may also be computed as two 1-D transforms,
owing to the separability property of the Fourier Transform. The first 1-D
transformation may be done along the rows only:

T(x, v) = \frac{1}{N} \sum_{y=1}^{N} t(x, y) \, e^{-j 2\pi \frac{vy}{N}}, \qquad (3.8)

and the second transformation may then be done along the columns only:

T(u, v) = \frac{1}{M} \sum_{x=1}^{M} T(x, v) \, e^{-j 2\pi \frac{ux}{M}}. \qquad (3.9)
The computational cost of the 2-D transformation is significantly reduced if computed
by using two 1-D transforms. Similarly, the 2-D transformation of the reference
image may be computed in two steps: the first transformation along the rows only
and the second along the columns only.
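The separability property is easy to verify numerically; the sketch below uses NumPy, whose FFT omits the 1/MN normalization of Equations 3.7–3.9 (a constant factor that does not affect separability):

```python
import numpy as np

# A small test image; the size is arbitrary.
rng = np.random.default_rng(0)
t = rng.standard_normal((8, 6))

# 1-D transforms along the rows first (cf. Eq. 3.8), then along the
# columns (cf. Eq. 3.9); the result equals the direct 2-D transform.
T_rows = np.fft.fft(t, axis=1)
T_sep = np.fft.fft(T_rows, axis=0)

assert np.allclose(T_sep, np.fft.fft2(t))
```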
3.2.1 Fast Fourier Transform (FFT) Algorithms
The straightforward computation of the 2-D DFT has computational complexity of
the order of O(M²N²); however, a large number of fast algorithms have been
developed. The following are often used categories of Fast Fourier Transform
(FFT) algorithms:
1. Radix-2 Algorithms: In these algorithms, the original problem has been suc-
cessively broken down into problems of smaller sizes and then bottom up com-
putation schemes have been used to speed up the computations. The basic
implementations of these algorithms were developed by Danielson and Lanczos
(1942) and later by Cooley and Tukey (1965). These implementations require
each image dimension, the rows M and the columns N, to be an exact power of 2.
Therefore, M and N are selected as M = 2^⌈log2(m+p−1)⌉ and N = 2^⌈log2(n+q−1)⌉,
where ⌈·⌉ represents the ceiling function. The computational cost of the DFT, if
computed by the FFT algorithms, reduces to O(MN log2(MN)). Using the
separability property of the Fourier transform, the complexity of the 2-D transform
further reduces to O(max(MN log2 M, MN log2 N)).
2. Mixed Radix algorithms (Singleton, 1969) do not require the image size to be
in exact powers of 2. These algorithms successively divide the problem by the
smallest prime factor along the respective image dimension. Mixed radix
algorithms may be considered a generalization of the radix-2 algorithms, because
they can break down any composite size into its factors. If the smallest prime
factor is small, these algorithms compute the transform efficiently, while if the
smallest prime factor of N (or M) is large, the performance deteriorates
accordingly. In the worst case, if N (or M) is a prime number, no subdivision of
the problem is possible. Therefore the complexity of the transformation increases
to the original complexity of O(max(MN², NM²)) instead of the reduced FFT
complexity, O(max(MN log2 N, MN log2 M)).
3. The Prime Factor Algorithms (Good, 1960; Thomas, 1963; Chan and Ho, 1991):
Another related class of FFT algorithms is the prime factor FFT algorithms,
which perform the transformation efficiently if the radix is a prime number.
However, these algorithms do not perform well for even numbered radices.
4. The Split Radix Algorithms: These algorithms rearrange computations in the
basic FFT implementation of Cooley and Tukey (1965) by blending radix-2 and
radix-4 to achieve a further speed up. Some of the well known split-radix algorithms
have been proposed by (Yavne, 1968; Duhamel and Hollmann, 1984; Vetterli
and Nussbaumer, 1984; Sorensen et al., 1986; Duhamel and Vetterli, 1990). A
new radix-2/8 split radix FFT algorithm has been proposed by Bouguezel et al.
(2004).
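The radix-2 decimation-in-time idea may be sketched recursively as follows; this is a textbook formulation for power-of-two lengths, not any of the cited implementations:

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])          # transform of even-indexed samples
    odd = fft_radix2(x[1::2])           # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):             # combine with twiddle factors
        w = cmath.exp(-2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out
```

The recursion halves the problem at each level, which is exactly where the O(N log2 N) cost comes from.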
Numerous FFT implementations are available publicly, both commercially as well as
freely, over various hardware platforms. Most of the well known implementations have
been benchmarked by M. Frigo and S. G. Johnson, from the accuracy and speed
point of view, by using their software named benchFFT. The benchmark results of
all these implementations over different types of commonly used hardware platforms
are available at their web site: http://www.fftw.org/benchfft/. In the following
paragraphs, we will briefly discuss only two freely available FFT implementations,
which we have used for execution time comparisons in the later chapters.
A simple FFT routine with minimal interface has been provided by William et al.
(2007). This is a simple sequential implementation of the radix-2 FFT as proposed
by Cooley and Tukey (1965). This routine was originally written by Rader
and Brenner (1976). This FFT routine may be considered a baseline of the
FFT algorithms, as it does not exploit optimizations such as split radix or
parallelism.
A comprehensive collection of fast C routines for computation of Discrete Fourier
Transform is freely available from www.fftw.org, as Fastest Fourier Transform in
the West (FFTW) (Frigo and Johnson, 1998, 2005; Johnson and Frigo, 2007). The
recent version of FFTW is named FFTW3, which is based upon the Cooley-Tukey
algorithm and also uses the prime factor algorithm, Rader's algorithm for prime
sizes, and a split-radix algorithm. The input data to FFTW3 may have any
arbitrary length, including prime numbers. FFTW3 adapts the DFT algorithm to
the underlying hardware in order to maximize performance. FFTW3 also utilizes
SIMD instructions, which perform the same operation on all elements of a data
array in parallel. The computation of FFTW3 is split into two phases: a planner
learns the fastest way to compute the transform on a given hardware and makes a
plan, which is then executed in the next phase to transform the input. The FFTW3
interface is organized into three levels of increasing complexity: the basic interface,
the advanced interface, and the guru interface for expert users.
We have compared the execution time of FFTW3-based image matching, which uses
the correlation theorem as discussed in the following section, with the partial
correlation elimination (PCE) algorithms discussed in Chapters 6 and 7. Note that
the implementation of the PCE algorithms is sequential, without any parallelism or
hardware specific optimization. Therefore, the comparison of PCE with FFTW3
may appear unfair; however, we observe that even then, PCE has remained faster
than FFTW3 in many cases. The details of these comparisons and the results are
discussed in Chapters 6 and 7.
3.2.2 Image Matching by Correlation Theorem
During the process of cross correlation computation, the template image is matched
with each valid search location, where a valid search location is a block in the
reference image having the same size as the template image. For a particular
position, cross correlation is computed by multiplying corresponding pixels and
then adding the results. The equation of cross correlation, given in Chapter 2, is
repeated here for easy reference:

\psi(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y) \, t(x, y). \qquad (3.10)
Convolution is a process very similar to the computation of cross correlation. The
convolution between the reference image r and the template image t may be written
as:

C_{r,t}(i_o, j_o) = \frac{1}{mn} \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y) \, r(i_o - x, j_o - y). \qquad (3.11)
In this equation, (io, jo) denotes a particular displacement, and the minus signs show
that the signal r is flipped about the origin. This flipping is inherent in the definition
of convolution, due to its interpretation as a way to compute the output of an LTI
system via its impulse response. The convolution process, as described
by Equation 3.11, may be summarized as flipping one image about the origin, then
shifting that image with respect to the other by changing the values of (io, jo) and
computing sum of products over all values of (x, y). For more details about convo-
lution, any digital image processing text, for example (Gonzalez and Woods, 2002),
may be consulted.
The convolution theorem states that convolution in spatial domain is equivalent to
the point by point multiplication in the frequency domain
r(io, jo) ∗ t(io, jo)⇐⇒ R(u, v)T (u, v), (3.12)
and convolution in frequency domain is equivalent to point by point multiplication in
spatial domain:
R(u, v) ∗ T (u, v)⇐⇒ r(io, jo)t(io, jo). (3.13)
To establish the link between correlation in the spatial and frequency domains, we
observe that the general formulation of correlation is given by:

\rho_{r,t}(i_o, j_o) = \frac{1}{mn} \sum_{x=1}^{m} \sum_{y=1}^{n} t^{*}(x, y) \, r(i_o + x, j_o + y), \qquad (3.14)
where t∗ denotes the complex conjugate of t. The positive signs in the indices of
r(io + x, jo + y) show that r is not mirrored about the origin. Using the similarity
of the correlation and convolution formulations, the correlation theorem has been
defined as
r(io, jo) ◦ t(io, jo)⇐⇒ R∗(u, v)T (u, v), (3.15)
where R∗(u, v) denotes the complex conjugate of the frequency domain representation
of the image r. The equivalent dual form of the correlation theorem follows from
the duality property of the Fourier Transform:
R(u, v) ◦ T (u, v)⇐⇒ r∗(io, jo)t(io, jo). (3.16)
Using correlation theorem, cross correlation may be computed in frequency domain
as follows:
1. Take Fourier Transform of the images r and t
R(u, v) = F{r(x, y)} (3.17)
T (u, v) = F{t(x, y)} (3.18)
2. Compute the complex conjugate of either of the two images in the frequency domain
R∗(u, v) = conj(R(u, v)) (3.19)
3. Compute the point by point complex multiplication of T (u, v) and R∗(u, v)
P (u, v) = T (u, v)R∗(u, v) (3.20)
4. Take the inverse Fourier transform of the product
r(io, jo) ◦ t(io, jo) = F−1{P (u, v)} (3.21)
Note that to perform point-by-point multiplication both R and T must be of the
same size. Therefore, it is necessary to zero pad both r and t before taking their
Fourier transforms. Zero padding also helps in avoiding undesirable overlap of one
image with the next period of the other image, because the Fourier transform
considers both images as two dimensional periodic signals. Therefore both the
template and the reference image must be zero padded to a common size: for a
template image of size m × n pixels and a reference image of size p × q pixels, the
zero padded images will be of size (p + m − 1) × (q + n − 1).
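The four steps above may be sketched with NumPy as follows. Two details are assumptions of this sketch rather than part of the thesis: `fft2` performs the zero padding through its `s` argument, and conjugating T rather than R (the mirror image of Equation 3.15) simply places the valid correlation values at non-negative displacements, which is convenient for indexing.

```python
import numpy as np

def cross_correlation_fft(r, t):
    """Cross-correlation of template t against reference r via the
    correlation theorem, with zero padding to (p+m-1) x (q+n-1).
    Returns the (p-m+1) x (q-n+1) map of valid displacements."""
    p, q = r.shape
    m, n = t.shape
    M, N = p + m - 1, q + n - 1
    R = np.fft.fft2(r, s=(M, N))        # step 1: forward transforms
    T = np.fft.fft2(t, s=(M, N))        # (fft2 zero-pads to M x N)
    P = R * np.conj(T)                  # steps 2-3: conjugate and multiply
    cc = np.fft.ifft2(P).real           # step 4: inverse transform
    return cc[:p - m + 1, :q - n + 1]
```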
The computation of cross-correlation is straightforward using the correlation
theorem. However, the computation of normalized cross correlation (NCC), given by
Equation 2.25, and the correlation coefficient (ρ), given by Equation 2.29, needs
separate computation of the normalization parameters. At each search location, the
computed value of cross correlation is normalized by these separately computed
parameters. The cost of the correlation coefficient may be reduced by rearranging
the formulation given by Equation 2.28. Since the FFT operates on real numbers,
converting one or both of the images from integer to real values does not change the
computational complexity. The template image t may be normalized to zero mean
and unit variance by subtracting the mean µt and dividing by the standard
deviation term:
t_{zu}(x, y) = \frac{t(x, y) - \mu_t}{\sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} (t(i, j) - \mu_t)^2}}, \qquad (3.22)
and Equation 2.28 may be written as

\rho(r, t) = \frac{1}{\sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} (r(i, j) - \mu_r)^2}} \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y) \, t_{zu}(x, y), \qquad (3.23)

which shows that the cross correlation between r and tzu may be computed using
the correlation theorem, and the resulting value, normalized by the reference image
standard deviation term, will give the correlation coefficient.
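Equations 3.22 and 3.23 may be combined into the following sketch: the numerator is obtained for all locations at once via the correlation theorem, while the reference deviation term is computed per window (here with plain loops for clarity; computing it efficiently is the subject of Section 3.3).

```python
import numpy as np

def correlation_coefficient_map(r, t):
    """rho at every valid location: FFT cross-correlation with the
    zero-mean, unit-norm template t_zu (Eq. 3.22), normalised by each
    reference window's own deviation term (Eq. 3.23)."""
    p, q = r.shape
    m, n = t.shape
    tzu = t - t.mean()
    tzu = tzu / np.sqrt((tzu ** 2).sum())   # Eq. 3.22 normalisation
    M, N = p + m - 1, q + n - 1
    num = np.fft.ifft2(np.fft.fft2(r, s=(M, N)) *
                       np.conj(np.fft.fft2(tzu, s=(M, N)))
                       ).real[:p - m + 1, :q - n + 1]
    rho = np.empty_like(num)
    for i in range(rho.shape[0]):
        for j in range(rho.shape[1]):       # per-window deviation term
            w = r[i:i + m, j:j + n]
            rho[i, j] = num[i, j] / np.sqrt(((w - w.mean()) ** 2).sum())
    return rho
```

Note that Σ r·tzu equals Σ (r − µr)·tzu because tzu is zero mean, which is why no mean subtraction of the reference windows is needed in the numerator.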
3.2.3 Image Matching by Phase Only Correlation
Phase only correlation method has been investigated by many researchers (Kuglin
and Hines, 1975; Reddy and Chatterji, 1996; Foroosh et al., 2002) and may also be
found in digital image processing text books, for example see (Pratt, 2007). Phase
only correlation method is applicable only to translation-only image registration
applications. If the images t and r are just translated versions of each other, such
that the shift is (xo, yo),
r(x, y) = t(x− xo, y − yo) (3.24)
then by the shift property of Fourier transform, frequency domain representations of
the images will be related by a complex exponential term
R(u, v) = T (u, v)e−i(uxo+vyo), (3.25)
where R is Fourier transform of r and T is Fourier transform of t. The phase shift
term may be computed by using the cross power spectrum of the two images:

G(u, v) = \frac{R(u, v) \, T^{*}(u, v)}{|R(u, v) \, T(u, v)|} = e^{-i(u x_o + v y_o)}. \qquad (3.26)
The amount of shift may be found by taking the inverse Fourier transform of the
function G(u, v)
F−1{G(u, v)} = δ(x− xo, y − yo). (3.27)
After taking the inverse Fourier transform, a peak in the spatial domain indicates
the position of the shift (xo, yo).
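The procedure above may be sketched as follows; the small constant added to the denominator guards against division by zero and is an implementation detail of this sketch, not part of Equation 3.26.

```python
import numpy as np

def phase_correlation_shift(r, t):
    """Recover the (circular) translation between two equal-sized images
    from the peak of the inverse transform of the cross power spectrum."""
    R, T = np.fft.fft2(r), np.fft.fft2(t)
    G = R * np.conj(T)
    G = G / (np.abs(G) + 1e-12)         # keep the phase only (cf. Eq. 3.26)
    peak = np.abs(np.fft.ifft2(G))      # ideally a delta at (x_o, y_o)
    return np.unravel_index(np.argmax(peak), peak.shape)
```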
Accuracy studies of phase only correlation have been done by Manduchi and Mian
(1993), in comparison with cross correlation, for input images corrupted with
additive white Gaussian noise. It has been reported that the phase correlation
technique is more sensitive to noise than the direct cross correlation technique, for
both low pass and high pass input signals. Lower accuracy of the phase only
correlation technique has also been reported by others, for example (Caelli and Liu,
1988).
The computational cost of correlation computation by the correlation theorem, as
well as by phase only correlation, involves two Fourier transforms in the forward
direction. For the correlation theorem, each transform requires complex computations
of the order of (p+m−1)(q+n−1) log2(p+m−1) + (p+m−1)(q+n−1) log2(q+n−1) =
(p+m−1)(q+n−1) log2((p+m−1)(q+n−1)), while for phase only correlation each
requires pq log2 p + pq log2 q = pq log2(pq). The domain transformations are followed
by complex multiplications of the order of O((p+m−1)(q+n−1)) for the correlation
theorem and O(pq) for phase only correlation. Phase only correlation also requires
computation of the magnitude |R(u, v)T(u, v)| and then a pixel by pixel division of
R(u, v)T∗(u, v) by this magnitude; each of these operations has a computational
complexity of the order of O(pq). In both methods, one domain transformation in
the inverse direction, having the same complexity as a forward transformation, is
also required. The computational cost of computing the complex conjugate may be
considered negligible. Thus the dominant computational complexity of correlation
computation by the correlation theorem remains
O((p+m−1)(q+n−1) log2((p+m−1)(q+n−1))), and for phase only correlation it
remains O(pq log2(pq)). Both of these complexities are significantly smaller than the
complexity of correlation computation in the spatial domain, O(pq × mn). Therefore
correlation computation has often been done in the frequency domain, using the
FFT for the domain transformation.
Often correlation coefficient implementations are based on Fast Fourier Transform
(FFT) and significant efforts have been made to reduce the time complexity of FFT.
However, as the template size reduces, the computational advantage of frequency
domain over spatial domain decreases and for small template sizes, spatial domain
implementations become faster (see Pratt (2007) and Lewis (1995)). This is because,
for small template sizes, the overheads involved in frequency domain computations
become significantly larger than the direct computational cost in the spatial domain.
Another scenario in which an FFT based implementation may not be
efficient, is finding point correspondences between two images. Each feature from one
image has to be correlated at only a few locations in the second image, often selected
by a corner detection algorithm. This may be efficiently computed in spatial domain
while in frequency domain, complete computations at all search locations have to be
performed.
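The crossover between the two domains can be seen by comparing the dominant operation counts directly. The following sketch (illustrative plain Python; the constant factors of complex arithmetic are ignored, so it is an order-of-magnitude comparison only) pits O(pqmn) spatial computation against the three padded FFTs plus a pointwise product of the convolution theorem route:

```python
import math

def spatial_ops(p, q, m, n):
    """Dominant cost of direct spatial correlation: one multiply-add
    per template pixel at each of roughly p*q search locations."""
    return p * q * m * n

def frequency_ops(p, q, m, n):
    """Dominant cost via the convolution theorem: two forward FFTs
    and one inverse FFT of the padded size, plus one pointwise
    product. Constant factors are ignored."""
    P, Q = p + m - 1, q + n - 1
    fft = P * Q * math.log2(P * Q)   # one (p+m-1) x (q+n-1) FFT
    return 3 * fft + P * Q           # 3 transforms + pointwise product
```

For a 512 × 512 reference image, a 64 × 64 template gives roughly 1.1e9 spatial operations against 1.9e7 frequency domain operations, whereas an 8 × 8 template gives roughly 1.7e7 against 1.5e7; once the substantial constant factors of complex arithmetic are included, the small template favors the spatial domain, consistent with Pratt (2007) and Lewis (1995).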
Thus, despite the availability of efficient FFT routines, spatial domain computation of correlation is still of significant practical importance. The spatial domain exhaustive accuracy algorithms discussed in the following section are also important because they provide a base for the development of more efficient spatial domain algorithms, the bound based computation elimination algorithms. The main contributions of this thesis also fall within the category of bound based computation elimination algorithms. The discussion on bound based computation elimination algorithms will follow the discussion on complete computation fast spatial domain algorithms in the next section.
3.3 Fast Exhaustive Spatial Domain Techniques
The straightforward way of computing the image match measures discussed in Chapter 2 is to perform complete computations in the spatial domain, achieving exhaustive accuracy. Different techniques have been used to speed up these exhaustive implementations. Frequently used techniques include efficient rearrangement of the match measure formulation and efficient pre-computation of the normalization parameters, using either the integral image approach or the running sum approach. These techniques are discussed in more detail in the following subsections.
3.3.1 Efficient Rearrangement of Match Measure Formulation
In many cases, the spatial domain formulation of a match measure may be rearranged such that the number of operations with the highest order of complexity is reduced. Different terms in the match measure formulation may have different orders of computational complexity. An effective rearrangement separates the lower complexity terms from the higher complexity terms such that the number of higher complexity operations is reduced to as few as possible. As an example, consider the formulation of the correlation coefficient as given in Chapter 2, repeated here for ease of reference:
\[
\rho(r,t) = \frac{\sum_{x=1}^{m}\sum_{y=1}^{n}\big(r(x,y)-\mu_r\big)\big(t(x,y)-\mu_t\big)}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n}\big(r(x,y)-\mu_r\big)^2}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n}\big(t(x,y)-\mu_t\big)^2}}. \tag{3.28}
\]
In this formulation, the mean and the variance terms for the template image need to be computed only once for a specific template image, matched over p × q search locations. Repeated computation of these two terms is easily avoided by computing them once and storing them for repeated usage.

For the reference image, the mean and the variance terms related to the search locations have to be computed once for each search location; therefore these terms are computed p × q times if the size of the reference image is p × q pixels. If multiple templates are to be matched with one reference image, repeated computation of these terms is easily avoided by computing them only once and storing them in memory for repeated usage. The reference image related terms may also be efficiently computed by using the integral image approach or the running sum approach, discussed in the following subsections.
If the mean and the variance terms are available from pre-computation, then in the numerator of the correlation coefficient formulation given by Equation 3.28, four operations have computational complexity of the order of O(mnpq): two real number subtractions, one real number multiplication and one real number addition. A rearrangement of the numerator term in Equation 3.28 yields a more computationally efficient form:
\[
\rho(r,t) = \frac{\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) - mn\,\mu_r\mu_t}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y) - mn\,\mu_r^2}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y) - mn\,\mu_t^2}}. \tag{3.29}
\]
In this formulation only one integer multiplication and one integer addition have computational complexity of the order of O(mnpq). Ignoring the computational cost of the mean and variance terms, all remaining operations in Equation 3.29 have computational complexity of the order of O(pq), which is significantly smaller than O(mnpq).
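As a concrete illustration, the rearranged form of Equation 3.29 can be sketched in a few lines (a minimal plain-Python sketch; the function name is ours, not from the thesis). Only the cross term Σ r(x,y)t(x,y) must be accumulated afresh at every search location; the remaining sums are exactly the pre-computable mean and variance terms discussed above:

```python
import math

def corr_coeff_rearranged(r, t):
    """Correlation coefficient of two equally sized m x n patches,
    computed via the rearranged formulation of Equation 3.29."""
    m, n = len(r), len(r[0])
    mn = m * n
    # The only term that must be accumulated at every search location:
    sum_rt = sum(r[x][y] * t[x][y] for x in range(m) for y in range(n))
    # Pre-computable sums (template sums once; reference-window sums
    # via integral images or running sums, as in the next subsections):
    sum_r = sum(map(sum, r))
    sum_t = sum(map(sum, t))
    sum_r2 = sum(v * v for row in r for v in row)
    sum_t2 = sum(v * v for row in t for v in row)
    mu_r, mu_t = sum_r / mn, sum_t / mn
    num = sum_rt - mn * mu_r * mu_t
    den = math.sqrt(sum_r2 - mn * mu_r ** 2) * math.sqrt(sum_t2 - mn * mu_t ** 2)
    return num / den
```

Here all five sums are computed in place for clarity; in a full template matcher only sum_rt would be computed per location, the rest being looked up from pre-computed tables.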
Thus, efficient rearrangement of the match measure formulation may reduce the computational cost significantly. The other technique for speeding up complete computation methods is efficient pre-computation of reusable terms. For example, in the correlation coefficient formulation given by Equation 3.29, the mean and variance terms of the reference image may be pre-computed and stored in memory for repeated use. In the following subsections we discuss efficient pre-computation techniques often used to speed up image match measure computations.
3.3.2 Integral Image Approach
The concept of the integral image has been exploited by Viola and Jones (2001, 2004) for efficient computation of the rectangular features used in real time object detection. Integral images have also been used by Schweitzer et al. (2002) for the estimation of polynomial parameters used for fast approximate template matching. In these applications, integral images have been used to find the summation over an arbitrary-sized image patch at a very low computational cost. The normalization parameters used in different match measure formulations, including the mean and variance related terms, may also be efficiently pre-computed by using the integral image approach.
The integral image I of the reference image r is defined as an image of the same size as r, in which each location contains the sum over all preceding locations of r:
\[
I(x,y) = \sum_{i=1}^{x}\sum_{j=1}^{y} r(i,j). \tag{3.30}
\]
Thus the integral image I contains the sums over all rectangular regions in the image r that have their sides parallel to the horizontal and vertical axes, their top left corner at the origin, and their bottom right corner at location (x, y).

Once the integral image has been computed, the sum over any arbitrary sized rectangular region of r may be computed very efficiently, in only four operations. Suppose we want to compute the sum of a rectangular patch of r, with (x1, y1) as its top left corner and (x2, y2) as its bottom right corner. The sum of all pixels included in this patch, r(x1 : x2, y1 : y2), is given by:
\[
\sum_{i=x_1}^{x_2}\sum_{j=y_1}^{y_2} r(i,j) = I(x_2,y_2) - I(x_2,y_1-1) - I(x_1-1,y_2) + I(x_1-1,y_1-1). \tag{3.31}
\]
The integral image itself may be computed efficiently in only one pass over the reference image r, by using a temporary row sum array s(x, y) and the following recursive formulation:

\[
s(x,y) = s(x,y-1) + r(x,y), \tag{3.32}
\]
\[
I(x,y) = I(x-1,y) + s(x,y), \tag{3.33}
\]

with the boundary conditions s(x, 0) = 0 and I(0, y) = 0. Thus, two additions per location are required to build the integral image itself, a total of 2pq additions, followed by 4pq operations required to compute the sums of all patches of size m × n pixels. Therefore, the total cost of computing the sums of all patches in r of the same size as the template image is 6pq operations.
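The construction above can be summarized in a short sketch (plain Python; 0-based lists with a zero-padded first row and column standing in for the 1-based boundary terms; names are illustrative):

```python
def integral_image(r):
    """Build the integral image I of r (Equation 3.30) in one pass,
    using the per-row running sum s of Equations 3.32-3.33. The
    zero-padded border row/column absorbs the x1-1 / y1-1 terms."""
    p, q = len(r), len(r[0])
    I = [[0] * (q + 1) for _ in range(p + 1)]
    for x in range(1, p + 1):
        s = 0                          # running sum of the current row
        for y in range(1, q + 1):
            s += r[x - 1][y - 1]       # Equation 3.32
            I[x][y] = I[x - 1][y] + s  # Equation 3.33
    return I

def patch_sum(I, x1, y1, x2, y2):
    """Sum of r(x1:x2, y1:y2) in four table look-ups (Equation 3.31)."""
    return I[x2][y2] - I[x2][y1 - 1] - I[x1 - 1][y2] + I[x1 - 1][y1 - 1]
```

With the zero border, I[x][y] corresponds to I(x, y) of Equation 3.30 for 1-based x and y, and patch_sum evaluates Equation 3.31 without special cases at the image border.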
In the following subsection, a more efficient summation method is discussed, which can compute the sums of all blocks of the same size m × n pixels in the reference image r in only 4pq summation operations. However, the integral image approach is more generic and may be used to compute summations over blocks of varying sizes, each in just 4 operations, assuming that the integral image is available pre-computed.
3.3.3 Running Sum Approach
In most template matching problems, all patches in the reference image over which the summation has to be computed have the same size, m × n pixels. In this case, the sums may be computed even more efficiently, using the running sum approach.

In the running sum approach, the summation of a block is separated into two steps: the summation along each row is computed first, and then the summation along each column. Considering the first step only, the summation process proceeds as follows:
1. Allocate a temporary array S, of the same size as the reference image r.

2. For each row in r, copy the value from the first cell to the corresponding position in S: S(x, 1) = r(x, 1).

3. For each row, compute the sum of the first two cells and place it at the position of the second cell in S. Then compute the sum of the first three cells and place it in the third cell of S. Repeat the process until the summation over the first n cells is computed in each row. This can be done efficiently by adding S(x, y − 1) + r(x, y) and placing the result in S(x, y): S(x, y) = S(x, y − 1) + r(x, y).

4. For each row, for cell numbers larger than n, add the current cell value to the previous sum and subtract one value from the trailing edge. Since the previous sum is available in S(x, y − 1), add r(x, y) to it and subtract r(x, y − n). Place the final value at S(x, y): S(x, y) = S(x, y − 1) + r(x, y) − r(x, y − n).

5. Continue the same process for each row, until the row end is reached.
The summation process during the first step may also be written in the form of an equation:

\[
S(x,y) =
\begin{cases}
r(x,y) & \text{if } y = 1;\\
S(x,y-1) + r(x,y) & \text{if } 1 < y \le n;\\
S(x,y-1) - r(x,y-n) + r(x,y) & \text{if } y > n.
\end{cases} \tag{3.34}
\]
In the second step, the summation along each column of the array S is computed. The second step may be written as follows:

1. Allocate a temporary array C, of the same size as the reference image r.

2. For each column in S, copy the value from the first cell to the corresponding position in C: C(1, y) = S(1, y).

3. For each column in S, compute the sum of the first two cells and place it at the second position in C. Then compute the sum of the first three cells and place it at the third position in C. Repeat the process until the summation over the first m cells is computed in each column: C(x, y) = C(x − 1, y) + S(x, y).

4. For each column in S, for cell numbers larger than m, add the current cell value to the previous sum and subtract one value from the trailing edge: C(x, y) = C(x − 1, y) + S(x, y) − S(x − m, y).
Figure 3.1: The template image of size 101 × 101 pixels and the reference image of size 736 × 1129 pixels (shown in reduced size) are used to generate the correlation coefficient based similarity surface shown in Figure 3.2. The images are taken from www.earth.google.com.
The second step process may also be written in equation form:

\[
C(x,y) =
\begin{cases}
S(x,y) & \text{if } x = 1;\\
C(x-1,y) + S(x,y) & \text{if } 1 < x \le m;\\
C(x-1,y) - S(x-m,y) + S(x,y) & \text{if } x > m.
\end{cases} \tag{3.35}
\]
In the running sum approach, each summation requires two operations in the first step and two operations in the second step. Therefore, we obtain each summation over a same-size block in just 4 operations.
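The two steps may be sketched as follows (a plain-Python sketch with 0-based indices; the trailing-edge subtractions correspond to the y > n and x > m cases of Equations 3.34 and 3.35):

```python
def box_sums(r, m, n):
    """All m x n block sums of r via the two-step running sum:
    rows first into S (Equation 3.34), then columns into C
    (Equation 3.35). C[x][y] holds the sum of the m x n block with
    bottom-right corner (x, y); smaller x or y give partial sums."""
    p, q = len(r), len(r[0])
    # Step 1: running sums of length n along each row.
    S = [[0] * q for _ in range(p)]
    for x in range(p):
        for y in range(q):
            S[x][y] = r[x][y]
            if y > 0:
                S[x][y] += S[x][y - 1]      # extend the previous sum
            if y >= n:
                S[x][y] -= r[x][y - n]      # drop the trailing edge
    # Step 2: running sums of length m along each column of S.
    C = [[0] * q for _ in range(p)]
    for y in range(q):
        for x in range(p):
            C[x][y] = S[x][y]
            if x > 0:
                C[x][y] += C[x - 1][y]
            if x >= m:
                C[x][y] -= S[x - m][y]
    return C
```

Each entry costs at most two additions/subtractions per step, which is where the 4pq total for all same-size block sums comes from.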
Another advantage of the running sum approach over the integral image approach is the avoidance of overflow errors. The values in the integral image may soon become larger than the maximum integer size, causing overflow errors. This error may be avoided by increasing the integer size, for example by using long or double data types in the C language. However, this causes more memory overhead and also increases the computational cost. Overflow problems do not typically appear in the running sum approach.

Figure 3.2: Correlation coefficient based similarity surface generated by matching the template and the reference images shown in Figure 3.1. The similarity surface is computed by using the correlation theorem based fast exhaustive frequency domain technique.
3.4 Bound Based Computation Elimination Algorithms
In Sections 3.2 and 3.3, different fast exhaustive image matching techniques were discussed. Some of these techniques are implemented in the frequency domain while others are implemented in the spatial domain. In all of these techniques, the template image is matched at all valid search locations in the reference image and complete computations are performed. Since the match measure values are computed at all search locations, a plot of these values over the entire search space may be visualized as the match measure surface. For the case of similarity measures, the match surface may also be called a similarity surface. A similarity surface, for the case of correlation coefficient based template matching, is shown in Figure 3.2. On this surface, multiple peaks and valleys may be seen, while the best match location is visible in the form of the highest peak.
In most image matching applications, computation of the complete match surface is redundant, because the interest is only in finding the best match location, which requires complete computations only in a small region around the peak location. If the important region around the peak is found by some alternate technique, then the redundant computations at all other locations may be skipped. This is the key idea behind bound based computation elimination algorithms, in which a bound is used to classify the search locations as falling in the redundant region or in the important peak region. The core contributions of this thesis, as discussed in Chapter 1, also fall in the category of bound based computation elimination algorithms.
In bound based computation elimination algorithms, instead of actually computing the match measure, an alternate statistic is computed which is essentially a bound upon the match measure under consideration. In the case of distance measures, such as SAD and SSD, the best match is defined by the minimum value of the match measure over the entire search space; therefore a lower bound is required for classification. At a particular search location, if the value of the lower bound is found to be larger than the already known minimum, then that search location may be labeled as redundant and skipped, because at that location the actual value of the match measure is guaranteed to be larger than the previously known minimum.
The same idea may also be applied to similarity measures, for example cross-correlation, NCC and the correlation coefficient. For similarity measures, the best match location is defined by the maximum value of the match measure over the entire search space; therefore in this case an upper bound is required for elimination. At a particular search location, if the upper bound is found to be smaller than a previously known maximum, then that location may be labeled as redundant and skipped.
The bounds used in the elimination algorithms are ensured to be exact, without any approximation. Therefore the skipped locations are guaranteed to fall in the redundant region; it is impossible for a search location in the peak region to get skipped. Thus, the computation elimination algorithms reduce computational cost without any compromise on match accuracy. These techniques guarantee the same accuracy as the exhaustive techniques performing complete computations. Therefore, bound based computation elimination techniques are also called 'Exhaustive Equivalent Accuracy' techniques.
In elimination algorithms, the execution time speed up strongly depends upon the ratio of the search locations labeled as redundant, and hence skipped, to the total number of search locations. As the amount of skipped computation increases, the template matching process accelerates accordingly. The amount of skipped computation strongly depends upon the position of the maximum found in the search process. A maximum found close to the start of the search process generates significantly more elimination than a maximum found near the end. Similarly, a maximum of high magnitude causes significantly more elimination in the subsequent region than a maximum of small magnitude.
In several template matching applications, a guess about the location of the maximum may be known from the context of the problem. This guess may define a region of the search space in which the probability of finding the peak is highest. In the elimination algorithms, the search process may start from the most probable region; that is why spiral search is popular in block matching applications. In the absence of any guess about the position of the maximum, approximate image matching techniques, as discussed in Section 3.1, have been used to find the approximate position of the maximum. The computational cost of the approximate search is justified by the increase in computation elimination resulting from a higher maximum found at the start of the search process.
The amount of skipped computation also depends upon the tightness of the bounds. A tighter bound may produce a significantly larger amount of eliminated computation than a loose bound. In the case of similarity computing image match measures, a tight upper bound is required, that is, one which remains close to the actual value of the similarity from above. For example, the bound given by the Cauchy-Schwarz inequality on cross-correlation, NCC or the correlation coefficient is a loose upper bound, because it always remains at the maximum possible value, no matter how far above the actual similarity that is. In the case of distortion computing image match measures, a tight lower bound is required, one which approaches the actual value of the distortion from below. If the bounds are tighter, the elimination test will succeed more often, causing increased computation elimination.
The main overhead of bound based computation elimination algorithms is the computational cost of the bound itself. High speed ups can only be obtained if the overheads are significantly smaller than the benefits obtained from the skipped search locations. If a low cost bound is not used, the overall cost may approach the cost of the exhaustive techniques, resulting in no speed up. If the computational cost of the bound becomes larger than the benefit obtained from skipped computations, the overall cost may even exceed the cost of the corresponding exhaustive algorithms. We have observed that for small template sizes, the computational cost of the bound used in the ZEBC algorithm (Mattoccia et al., 2008b) significantly exceeded the benefit of skipped computations (Mahmood and Khan, 2010, 2011), making the ZEBC algorithm slower than the corresponding fast exhaustive algorithm.
Based upon the type of computation elimination strategy, the elimination algorithms may be broadly divided into two categories, 'Partial Elimination Algorithms' and 'Complete Elimination Algorithms'. In Partial Elimination Algorithms, at each search location a portion of the match measure is computed and, using the result of that portion, a bound is computed, which is then used to perform the elimination test. In most cases, the elimination test consists of just comparing the bound value with the previously known maximum. If the bound value is found to be less than the previously known maximum, the elimination test is successful; in that case, at that particular search location, the remaining computations of the match measure may be skipped without any loss of accuracy. On the other hand, if the elimination test is not successful, then some more computations are performed and the elimination test is re-evaluated with the newly computed bound. The same process is repeated until the elimination test becomes successful or the computations at that location are completed. In this type of elimination algorithm, some computations are mandatory at each search location; therefore these algorithms are named 'Partial Elimination Algorithms'. Previously known partial elimination algorithms include Partial Distortion Elimination (PDE) algorithms and Bounded Partial Correlation (BPC) elimination algorithms. A significant part of the core contributions of this thesis are the Partial Correlation Elimination (PCE) algorithms, which are partial elimination algorithms for fast image matching by maximization of the correlation coefficient. We have proposed two main categories of PCE algorithms, Basic Mode PCE, discussed in Chapter 6, and Extended Mode PCE, discussed in Chapter 7.
In 'Complete Elimination Algorithms', at a particular search location, the alternate statistic is computed before the start of the image match measure computations. The elimination test consists of a comparison of the bounding statistic with the previously known maximum. If the bound evaluates to less than the previously known maximum, the computations of the match measure are completely skipped at that particular search location. Otherwise, if the bounding statistic is found to be larger than the previously known maximum, the elimination test is unsuccessful; in this case, complete computations of the match measure have to be performed at that particular search location. Well known complete elimination algorithms include the 'Successive Elimination Algorithm' and the 'Enhanced Bounded Partial Correlation' algorithm. A significant part of the core contributions of this thesis are the Transitive Elimination Algorithms (TEA), which are complete elimination algorithms. We discuss the theoretical aspects of TEA in Chapter 4, and different TEA algorithms are discussed in Chapter 5.
In the following subsections, the previously known computation elimination algorithms are discussed in more detail. For the purpose of completeness, our proposed algorithms, PCE and TEA, are also briefly described. TEA will be discussed in significant detail in Chapters 4 and 5, Basic Mode PCE in Chapter 6, and Extended Mode PCE in Chapter 7.
3.4.1 Successive Similarity Detection Algorithms
Successive Similarity Detection Algorithms (SSDA) were the first computation elimination algorithms, developed by Barnea and Silverman (1972). Later, SSDA algorithms were extensively studied in the context of block motion estimation in video encoders, where SSDA has also been renamed Partial Distortion Elimination (PDE). For example, Eckart and Fogg (1995); Quaglia and Montrucchio (2001); Kim and Choi (1999, 2000); Montrucchio and Quaglia (2005); Huang et al. (2006b) may be seen as important references on PDE algorithms.
SSDA (or PDE) algorithms exploit the monotonic growth pattern of those image match measures that evaluate the distortion or distance between two images. One such measure is the city block distance, commonly known as the Sum of Absolute Differences (SAD), as given by Equation 2.4:
\[
\Phi(r,t) = \sum_{x=1}^{m}\sum_{y=1}^{n} \big|r(x,y) - t(x,y)\big|, \tag{3.36}
\]
where | · | represents the absolute value function. In the formulation of SAD, the distortion is the sum of the absolute values of the differences between corresponding pixels. While computing SAD, at each pixel location, the current distortion is added to the previous sum. For example, if the SAD computation has been done up to u rows and v − 1 columns, then the SAD at position (u, v) is given by the sum of the previous SAD value and the absolute value of the current difference:
\[
\mathrm{SAD}_{u,v}(r,t) = \mathrm{SAD}_{u,v-1}(r,t) + \big|r(u,v) - t(u,v)\big|. \tag{3.37}
\]

Since |r(u, v) − t(u, v)| ≥ 0, therefore

\[
\mathrm{SAD}_{u,v}(r,t) \ge \mathrm{SAD}_{u,v-1}(r,t). \tag{3.38}
\]

In general we may write

\[
\mathrm{SAD}_{u,v}(r,t) \le \mathrm{SAD}_{m,n}(r,t), \quad \forall\, u \le m \text{ and } v \le n, \tag{3.39}
\]

where SAD_{m,n}(r, t) is the complete value of SAD computed over m × n pixels. Inequality 3.39 states that the partial value of SAD is always a lower bound upon the final value of SAD. If the previously known minimum of SAD is SAD_min, then if

\[
\mathrm{SAD}_{u,v}(r,t) > \mathrm{SAD}_{\min}, \tag{3.40}
\]

then it is guaranteed that

\[
\mathrm{SAD}_{m,n}(r,t) > \mathrm{SAD}_{\min}. \tag{3.41}
\]
Therefore, Inequality 3.40 may be considered a sufficient condition for elimination. If this condition is satisfied for a specific value of (u, v), computation beyond position (u, v) is redundant and may be skipped without any loss of accuracy. A more intuitive way to phrase the same result is that, as the computation of SAD proceeds in a particular block, processing more pixels can never decrease its partial sum. Hence, once the current partial sum exceeds a previously known minimum, the location is no longer a viable best match location; the remaining computations may therefore be skipped.
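A minimal sketch of this elimination loop (illustrative plain Python, not the thesis implementation) makes the control flow explicit: the partial SAD grows monotonically, and the test of Inequality 3.40 is applied after every pixel:

```python
def sad_with_pde(r_patch, t, sad_min):
    """Partial Distortion Elimination for SAD at one search location.

    Accumulates |r - t| pixel by pixel (Equation 3.37); as soon as the
    partial sum exceeds sad_min, the previously known minimum, the
    location cannot be the best match (Inequality 3.40) and the rest
    of the block is skipped. Returns (value, completed)."""
    partial = 0
    for row_r, row_t in zip(r_patch, t):
        for a, b in zip(row_r, row_t):
            partial += abs(a - b)
            if partial > sad_min:       # sufficient condition 3.40
                return partial, False   # eliminated: skip remainder
    return partial, True                # full SAD; may update sad_min
```

In a full matcher the caller updates sad_min whenever a completed location yields a smaller SAD, which is why finding a low minimum early (e.g., via spiral search) increases the amount of elimination.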
Another image match measure exhibiting the monotonic growth property is the squared Euclidean distance, also known as the Sum of Squared Differences (SSD), as given by Equation 2.9. In the formulation of SSD, the overall distortion is the sum of the squared differences between corresponding pixel values. If SSD has been computed up to u rows and v − 1 columns, then the SSD at position (u, v) is given by:

\[
\mathrm{SSD}_{u,v}(r,t) = \mathrm{SSD}_{u,v-1}(r,t) + \big[r(u,v) - t(u,v)\big]^2. \tag{3.42}
\]
Since [r(u, v) − t(u, v)]² ≥ 0, therefore

\[
\mathrm{SSD}_{u,v-1}(r,t) \le \mathrm{SSD}_{u,v}(r,t), \tag{3.43}
\]

which may be generalized as

\[
\mathrm{SSD}_{u,v}(r,t) \le \mathrm{SSD}_{m,n}(r,t), \quad \forall\, u \le m \text{ and } v \le n, \tag{3.44}
\]

where SSD_{m,n}(r, t) is the sum of squared differences between images r and t over m × n pixels. Inequality 3.44 states that the partial value of SSD is a lower bound upon its final value; therefore, as for SAD, if the partial value SSD_{u,v}(r, t) exceeds the previously known minimum, the complete value of SSD is guaranteed to be larger than that minimum. In that case, computation beyond (u, v) is redundant and may be skipped without any loss of accuracy.
In PDE algorithms, if a sufficiently low minimum is found at the start of the search process, the amount of computation elimination increases significantly. To exploit this fact in block matching applications, the search is started from the center of the search space and proceeds outwards in spiral form, known as Spiral PDE in the H.263 software implementation by ITU-T (1995).
Bounded Partial Correlation (BPC) Elimination Technique
The cross-correlation (Equation 3.10) and normalized cross-correlation (NCC) (Equation 2.25) increase monotonically as consecutive pixels are processed, because only positive values are added after processing each pixel. However, the concept of SSDA or PDE may not be extended in a straightforward manner to these measures, because the correlation based measures are similarity measures: the best match location is defined by the maximum value of cross-correlation (or NCC). A previously known maximum may not be utilized to skip computations of cross-correlation (or NCC) at the remaining search locations in the way a known minimum is used by PDE for the SAD and SSD match measures.
In order to skip computation in correlation based match measures, a theoretical upper bound on the final correlation value must be known in advance. If, at a particular search location, the upper bound on correlation is found to be lower than the previously known maximum, the remaining computations at that location may be skipped without any loss of accuracy.
Since cross-correlation is equivalent to computing the dot product, or inner product, of two images, a well known upper bound upon the inner product is given by the Cauchy-Schwarz inequality:

\[
\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \le \sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}. \tag{3.45}
\]
It turns out that for NCC, the Cauchy-Schwarz inequality yields +1 as the upper bound:

\[
\frac{\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y)}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}} \le 1.00. \tag{3.46}
\]
Hence the bound given by the Cauchy-Schwarz inequality may not be directly used for computation elimination, because it always remains fixed at the maximum possible value of cross-correlation or NCC. Such a bound is called a loose bound: no matter how small the actual value of cross-correlation is, the bound yielded by the Cauchy-Schwarz inequality stays at the maximum value, so it is not possible to find a maximum which exceeds this upper bound. No other useful bound upon the inner product is known which might be used instead of the Cauchy-Schwarz inequality.
An indirect way to exploit the Cauchy-Schwarz inequality for computation elimination, the Bounded Partial Correlation (BPC) algorithm, has been proposed by di Stefano et al. (2003); Stefano and Mattoccia (2003); di Stefano and Mattoccia (2003). They observed that if the Cauchy-Schwarz bound is computed on one portion of the images to be matched and cross-correlation is computed on the remaining portion, then the sum of the partial bound and the partial correlation is also a bound on the final value of the cross-correlation between those images. Suppose cross-correlation is computed on a small portion of the image, of size u × v pixels, and the Cauchy-Schwarz bound is computed on the remaining image, i.e., from rows u + 1 to m and columns v + 1 to n; then the BPC bound is given by
\[
\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \le \sum_{x=1}^{u}\sum_{y=1}^{v} r(x,y)\,t(x,y) + \sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} t^2(x,y)}. \tag{3.47}
\]
The image portion on which cross-correlation is computed may be called the correlation-area, and the remaining image portion, on which the bound is computed, the bound-area. If the correlation-area is reduced to zero, the BPC bound reduces to the Cauchy-Schwarz inequality. As the correlation-area increases, the BPC bound moves towards the actual cross-correlation, and when the bound-area reduces to zero, the BPC bound exactly matches the cross-correlation between the two images.
The BPC bound may also be computed for normalized cross-correlation by dividing both sides of Equation 3.47 by the L2 norms of the images:

\[
\frac{\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y)}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}} \le
\frac{\sum_{x=1}^{u}\sum_{y=1}^{v} r(x,y)\,t(x,y) + \sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} t^2(x,y)}}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}}. \tag{3.48}
\]
The BPC bound has also been extended to the correlation coefficient, named Zero mean Normalized Cross Correlation (ZNCC) based image matching by Di Stefano et al. (2005). Subtracting the image means in Equation 3.48 yields one formulation of the BPC bound for the correlation coefficient (Di Stefano et al., 2005):
\[
\frac{\sum_{x=1}^{m}\sum_{y=1}^{n} (r(x,y)-\mu_r)(t(x,y)-\mu_t)}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (r(x,y)-\mu_r)^2}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (t(x,y)-\mu_t)^2}} \le
\frac{\sum_{x=1}^{u}\sum_{y=1}^{v} (r(x,y)-\mu_r)(t(x,y)-\mu_t) + \sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} (r(x,y)-\mu_r)^2}\;\sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} (t(x,y)-\mu_t)^2}}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (r(x,y)-\mu_r)^2}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (t(x,y)-\mu_t)^2}}. \tag{3.49}
\]
An alternate form of the BPC bound for the correlation coefficient may be obtained by substituting the bound upon cross-correlation given by Equation 3.47 into the correlation coefficient formulation given by Equation 3.29:

\[
\rho(r,t) \le
\frac{\sum_{x=1}^{u}\sum_{y=1}^{v} r(x,y)\,t(x,y) + \sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} t^2(x,y)} - mn\,\mu_r\mu_t}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y) - mn\,\mu_r^2}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y) - mn\,\mu_t^2}}. \tag{3.50}
\]
At a particular search location, after processing u × v pixels, either BPC bound may be compared with the currently known correlation maximum. If the BPC bound is found to be less than the currently known maximum, the elimination condition is satisfied, because the final value of the correlation coefficient is then guaranteed to be less than that maximum. Therefore, the remaining computations at the current search location become redundant and may be skipped without any loss of accuracy. If the comparison shows that the BPC bound is higher than the maximum, no decision can be made; the computation of correlation then proceeds for a few more pixels, after which the BPC bound is computed again and compared against the currently known maximum. The same process is repeated until the elimination condition succeeds or the computations at the current location are completed.
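A sketch of the BPC bound itself, as written in Equation 3.47 (plain Python with 0-based indices; the names are ours): exact correlation over the first u rows and v columns, plus a Cauchy-Schwarz term over the remaining bottom-right block:

```python
import math

def bpc_bound(r_patch, t, u, v):
    """Upper bound of Equation 3.47 for two m x n patches: partial
    cross-correlation over rows 0..u-1, columns 0..v-1, plus the
    Cauchy-Schwarz bound over rows u..m-1, columns v..n-1."""
    m, n = len(t), len(t[0])
    partial = sum(r_patch[x][y] * t[x][y]
                  for x in range(u) for y in range(v))
    rem_r2 = sum(r_patch[x][y] ** 2
                 for x in range(u, m) for y in range(v, n))
    rem_t2 = sum(t[x][y] ** 2
                 for x in range(u, m) for y in range(v, n))
    return partial + math.sqrt(rem_r2) * math.sqrt(rem_t2)
```

At u = v = 0 this degenerates to the loose Cauchy-Schwarz bound of Equation 3.45, and at u = m, v = n it equals the exact cross-correlation; in the elimination loop, u and v are advanced until either the bound falls below the currently known maximum or the correlation is complete.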
As the bound-area reduces, the BPC bound approaches the actual correlation value and becomes tighter; the probability of the elimination condition being satisfied increases accordingly. In order to reduce the chances of elimination condition failure, a sufficiently large correlation-area may be selected. However, selecting a large correlation-area incurs a large overhead of direct correlation computation, while selecting a small correlation-area makes the BPC bound significantly loose; this tradeoff is fundamental to the BPC strategy.
The suitable size of the correlation-area depends on the magnitude of the currently known maximum, or the initial threshold. In order to find a high initial threshold, any prior information about the location of the maximum may be utilized by starting the search process from the expected best match location. In the absence of any initial guess, the threshold has been found automatically by using a coarse-to-fine scheme (Di Stefano et al., 2005).
3.4.2 Partial Correlation Elimination Algorithms
Partial Correlation Elimination (PCE) algorithms are one of the important contribu-
tions of this thesis. These algorithms are in the category of partial elimination algo-
rithms and extend the concept of Partial Distortion Elimination (PDE) to correlation-
coefficient based fast template matching. PCE algorithms will be discussed in signif-
icant detail in Chapters 6 and 7.
3.4.3 Successive Elimination Algorithms
Successive Elimination Algorithms fall in the category of complete elimination algo-
rithms, because in these algorithms the bound on the match measure is computed
before starting the actual match measure computations. The elimination test consists
of comparison of the bound statistic with the previous known minimum or maximum.
If elimination condition is satisfied, the search location is eliminated from the search
space; otherwise complete computations are done on that search location. That is,
the elimination test is performed only once, and in case of unsuccessful elimination
test, no subsequent test is done. The basic successive elimination algorithm was
compatible with the definition of complete elimination algorithms, however various
extensions of this algorithm deviate from that definition. For example, in multilevel
successive elimination algorithm, the elimination test is performed multiple times
with increasing bound tightness and after execution of additional computations.
The original Successive Elimination Algorithm (SEA) was developed by Li and Salari (1995) for the Sum of Absolute Differences image match measure. The algorithm is based on the following lower bound on the SAD between two images r and t, from Equation 3.36:

$$\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl|r(x,y) - t(x,y)\bigr| \;\ge\; \Bigl|\sum_{x=1}^{m}\sum_{y=1}^{n} |r(x,y)| - \sum_{x=1}^{m}\sum_{y=1}^{n} |t(x,y)|\Bigr|. \quad (3.51)$$
Since image values are always non-negative, r(x, y) ≥ 0 and t(x, y) ≥ 0, Inequality 3.51 simplifies to:
$$\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl|r(x,y) - t(x,y)\bigr| \;\ge\; \Bigl|\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y) - \sum_{x=1}^{m}\sum_{y=1}^{n} t(x,y)\Bigr|. \quad (3.52)$$
The sums of all m × n blocks of r may be found efficiently using the running-sum approach. Search locations for which the SEA lower bound on SAD exceeds the previously known minimum may be skipped without any loss of accuracy. Since no approximation is involved in the SEA bound given by Equation 3.52, this computation skipping incurs no loss of accuracy.
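Under the stated assumptions, the SEA test of Equation 3.52 can be sketched with a summed-area table, which yields every window sum in constant time; the helper names below are illustrative, not taken from the cited papers:

```python
def integral_image(img):
    """Summed-area table: S[i][j] = sum of img[0:i][0:j]."""
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            S[i + 1][j + 1] = img[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]
    return S

def window_sum(S, x, y, m, n):
    """Sum of the m-by-n window whose top-left pixel is (x, y)."""
    return S[x + m][y + n] - S[x][y + n] - S[x + m][y] + S[x][y]

def sea_candidates(img, template, best_sad):
    """Yield locations surviving the SEA test |sum_r - sum_t| <= best_sad."""
    m, n = len(template), len(template[0])
    t_sum = sum(map(sum, template))
    S = integral_image(img)
    for x in range(len(img) - m + 1):
        for y in range(len(img[0]) - n + 1):
            if abs(window_sum(S, x, y, m, n) - t_sum) <= best_sad:
                yield (x, y)  # the actual SAD must still be computed here
```

Locations pruned by the test can never hold a SAD below `best_sad`, so the survivors are exactly the locations where the full SAD computation is still required.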
The Successive Elimination Algorithm has also been extended to the Euclidean distance based image match measure by Wang and Mersereau (1999). The lower bound on the Euclidean distance is based on the relationship between SAD and Euclidean distance:
$$\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl|r(x,y) - t(x,y)\bigr| \;\le\; \sqrt{mn}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl(r(x,y) - t(x,y)\bigr)^2}. \quad (3.53)$$
Therefore, combining this with Equation 3.52, the Euclidean distance admits the same sum-difference lower bound as SAD:
$$\Bigl|\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y) - \sum_{x=1}^{m}\sum_{y=1}^{n} t(x,y)\Bigr| \;\le\; \sqrt{mn}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl(r(x,y) - t(x,y)\bigr)^2}. \quad (3.54)$$
Any search location where the absolute difference between the search-location sum and the template sum exceeds the previously known minimum may be skipped without any loss of accuracy.
The basic SEA algorithm has been extended by several researchers. For example, Gao et al. (2000) developed a Multilevel SEA algorithm (MSEA). Since the number of eliminated search locations depends upon the tightness of the lower bound, MSEA makes the bound tighter by computing the norm values on smaller sub-blocks. As an example, if the images to be matched, r and t, are divided into four sub-blocks and norms are computed over each block independently, then the sum of these partial bounds is tighter than the original bound:
$$\Bigl|\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y) - \sum_{x=1}^{m}\sum_{y=1}^{n} t(x,y)\Bigr| \;\le\; \Bigl|\sum_{x=1}^{m/2}\sum_{y=1}^{n/2} r(x,y) - \sum_{x=1}^{m/2}\sum_{y=1}^{n/2} t(x,y)\Bigr| + \Bigl|\sum_{x=m/2+1}^{m}\sum_{y=1}^{n/2} r(x,y) - \sum_{x=m/2+1}^{m}\sum_{y=1}^{n/2} t(x,y)\Bigr| + \Bigl|\sum_{x=1}^{m/2}\sum_{y=n/2+1}^{n} r(x,y) - \sum_{x=1}^{m/2}\sum_{y=n/2+1}^{n} t(x,y)\Bigr| + \Bigl|\sum_{x=m/2+1}^{m}\sum_{y=n/2+1}^{n} r(x,y) - \sum_{x=m/2+1}^{m}\sum_{y=n/2+1}^{n} t(x,y)\Bigr|. \quad (3.55)$$
If the block size is made even smaller, the bound becomes tighter still; in the limiting case where each block consists of a single pixel, the bound approaches the actual value of SAD. In the MSEA algorithm, the minimum block size used is 2 × 2 pixels. The first elimination test is performed with full-block norms. If the test is successful, the search location is skipped; if it is unsuccessful, the block width and height are halved and norms are computed over the four resulting blocks. If the elimination test is still unsuccessful, further subdivision proceeds in the same way until the blocks reach 2 × 2 pixels. An earlier paper by Lee and Chen (1997) had already formulated a very similar algorithm under the name Block Sum Pyramid, which is essentially the same formulation as MSEA. This algorithm was further improved by Ahn et al. (2004), who developed more effective schemes for detecting unsuitable blocks.
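A rough sketch of this multilevel test follows, with recursive quadrant splitting down to a minimum block side. The names and the stopping rule are illustrative simplifications of MSEA, and the per-level block sums would in practice come from precomputed running sums:

```python
def block_sum(img, x0, y0, x1, y1):
    """Sum of img over rows x0..x1-1 and columns y0..y1-1."""
    return sum(img[i][j] for i in range(x0, x1) for j in range(y0, y1))

def msea_eliminates(r, t, best_sad, min_side=2):
    """Multilevel SEA test: return True if window r can be pruned, i.e. some
    level's lower bound on SAD(r, t) already exceeds best_sad."""
    m, n = len(r), len(r[0])
    blocks = [(0, 0, m, n)]            # level 0: the whole block
    while blocks:
        lower = sum(abs(block_sum(r, *b) - block_sum(t, *b)) for b in blocks)
        if lower > best_sad:
            return True                # bound exceeds best known SAD: prune
        next_blocks = []
        for (x0, y0, x1, y1) in blocks:
            if x1 - x0 < 2 * min_side or y1 - y0 < 2 * min_side:
                return False           # cannot refine further; compute full SAD
            xm, ym = (x0 + x1) // 2, (y0 + y1) // 2
            next_blocks += [(x0, y0, xm, ym), (xm, y0, x1, ym),
                            (x0, ym, xm, y1), (xm, ym, x1, y1)]
        blocks = next_blocks
    return False
```

Note how two windows with equal total sums but different quadrant sums survive the level-0 test yet are pruned at level 1, which is exactly the tightening MSEA exploits.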
Zhu et al. (2005) reduced the granularity of MSEA by developing the Fine Granularity Successive Elimination (FGSE) algorithm. In FGSE, the gap between two MSEA levels is reduced by introducing intermediate levels. Moreover, the starting level is decided based on the elimination level of the neighboring blocks: if most of the neighboring blocks were eliminated at level 3, then matching of the current block also starts from the third level, which reduces the computational cost of unsuccessful elimination tests at the coarser levels.
Another related way to speed up image matching by minimization of Sum of Squared Differences (SSD) is the 'Projection Kernels' algorithm proposed by Hel-Or and Hel-Or (2003, 2005). The Projection Kernels algorithm was motivated by the real-time object detection scheme of Viola and Jones (2001, 2004), in which summations over image sub-blocks were computed very efficiently using the integral-image approach. In the Projection Kernels algorithm, the sums of different image partitions are obtained by projecting the images onto Walsh Hadamard basis vectors. Most of the computation elimination is obtained by the full-image sum comparison, which is very similar to the SEA algorithms of Li and Salari (1995) and Wang and Mersereau (1999). If a search location is not eliminated by the image sum comparison, more projections are computed to make the lower bound on SSD tighter. This concept is quite similar to MSEA (Gao et al., 2000; Lee and Chen, 1997), where tight lower bounds were obtained by partitioning the images into smaller blocks. Pattern matching with the Projection Kernels algorithm was compared with a naive FFT implementation and a speedup of two orders of magnitude was claimed, but no comparison was made with the more relevant MSEA implementations.
3.4.4 Enhanced Bounded Partial Correlation Elimination Algorithm
The Enhanced Bounded Partial Correlation (EBC) elimination algorithm was developed by Mattoccia et al. (2008a,b) for cross-correlation, NCC and correlation-coefficient based image match measures. The EBC algorithm extends the Bounded Partial Correlation (BPC) elimination algorithm in much the same way that Multilevel SEA (MSEA) extends the basic SEA algorithm. It is based on an increasingly tight upper bound on correlation, obtained by computing the bound on smaller image partitions. It falls in the category of complete elimination algorithms because the bound statistic is compared with the previously known correlation maximum before the actual correlation computations start. If the bound is found to be less than the previously known maximum, all computations at that search location are skipped. However, if the correlation maximum is found to be less than the bound, partial elimination tests follow. The EBC algorithm may therefore also be considered a cascade of complete and partial elimination algorithms.
The cross-correlation between two images is equivalent to the inner product of two vectors, and the inner product is bounded from above by the Cauchy Schwartz (CS) inequality. The CS inequality yields the maximum possible value of cross-correlation, so a correlation maximum higher than the CS bound can never be found. As a result, the bound computed by the CS inequality alone cannot be used for computation elimination. However, just like the bound-tightening process used in the MSEA algorithm, the CS bound may be tightened by dividing the two images to be matched into smaller partitions, computing the CS bound on each pair of corresponding partitions, and taking the final bound as the sum of the partition bounds. The final bound may hence be termed the Multi-level Cauchy Schwartz (MCS) bound, in parallel with the Multilevel SEA (MSEA) bound. The MSEA bound is actually more generic than the MCS bound currently used in the EBC and ZEBC algorithms, because the MCS bound is computed at only one level, while the MSEA bound is evaluated at multiple levels. This is because the MSEA bound requires only summations, while the MCS bound requires square-root operations, which are very costly and make the MCS bound computationally expensive.
To understand the formulation of the MCS bound, consider the problem of matching two images r and t, each of size m × n pixels and each divided into small non-overlapping partitions of size ∆x × ∆y pixels, such that 1 ≤ ∆x ≤ m and 1 ≤ ∆y ≤ n. The total number of partitions in each image is (mn)/(∆x∆y). The MCS inequality may be given by:

$$\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;\le\; \sum_{j=0}^{m/\Delta x - 1}\sum_{k=0}^{n/\Delta y - 1} \sqrt{\sum_{x=j\Delta x+1}^{(j+1)\Delta x}\sum_{y=k\Delta y+1}^{(k+1)\Delta y} r^2(x,y)}\,\sqrt{\sum_{x=j\Delta x+1}^{(j+1)\Delta x}\sum_{y=k\Delta y+1}^{(k+1)\Delta y} t^2(x,y)} \;\le\; \sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}. \quad (3.56)$$
One may easily observe from this equation that if ∆x = 1 and ∆y = 1, then MCS
bound will exactly match cross-correlation value:
$$\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;=\; \sum_{j=0}^{m-1}\sum_{k=0}^{n-1} \sqrt{\sum_{x=j+1}^{j+1}\sum_{y=k+1}^{k+1} r^2(x,y)}\,\sqrt{\sum_{x=j+1}^{j+1}\sum_{y=k+1}^{k+1} t^2(x,y)}, \quad (3.57)$$
which shows that if each partition has just one pixel, the MCS bound becomes equal to the actual inner product value. As the partition size increases, the MCS bound moves towards the CS bound, and in the limiting case of ∆x = m and ∆y = n, the MCS bound exactly matches the CS inequality bound:
$$\sum_{j=0}^{0}\sum_{k=0}^{0} \sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)} \;=\; \sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}. \quad (3.58)$$
Hence there is an inherent tradeoff to be balanced: selecting a large number of partitions increases the number of square-root operations and thus the cost of computing the MCS bound, while selecting very large partition sizes renders the MCS bound too loose to generate any elimination. A suitable partition size is therefore critical, and Mattoccia et al. (2008a) proposed an algorithm to select the number of partitions automatically.
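A direct transcription of the MCS bound of Equation 3.56 might look as follows; the names are illustrative, and a real implementation would precompute the partition energies with running sums rather than loop over pixels at each search location:

```python
import math

def mcs_bound(r, t, dx, dy):
    """Multi-level Cauchy Schwartz upper bound on cross-correlation, with
    non-overlapping dx-by-dy partitions (dx and dy must divide m and n)."""
    m, n = len(r), len(r[0])
    assert m % dx == 0 and n % dy == 0
    bound = 0.0
    for j in range(0, m, dx):
        for k in range(0, n, dy):
            # Energy of the corresponding partitions of both images
            e_r = sum(r[x][y] ** 2 for x in range(j, j + dx) for y in range(k, k + dy))
            e_t = sum(t[x][y] ** 2 for x in range(j, j + dx) for y in range(k, k + dy))
            bound += math.sqrt(e_r) * math.sqrt(e_t)
    return bound
```

With ∆x = ∆y = 1 (and non-negative pixel values) the bound equals the cross-correlation itself, while with ∆x = m and ∆y = n it reduces to the plain CS bound, reflecting the tradeoff just described.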
In the EBC algorithm, the MCS bound is computed at each search location for suitable values of ∆x and ∆y. To reduce the computation cost, Mattoccia et al. (2008a) recommend ∆x = m/8 and ∆y = n. That is, each image is divided into 8 partitions along the rows, with no partitioning along the columns. For these settings, the MCS bound reduces to
$$\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;\le\; \sum_{j=0}^{7} \sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} t^2(x,y)}. \quad (3.59)$$
One limitation of this approach is the assumption that the number of rows in the template image is divisible by 8. For template sizes not divisible by 8, one may choose a suitable number of partitions such that all partitions are of equal size. However, if the number of rows is prime, the only available factor is 1; that is, each partition consists of a single image row.
For very small partition sizes, the MCS bound becomes very tight, and the complete elimination test may succeed at a very large number of search locations. However, the cost of computing the MCS bound may become significant for small partition sizes. For small to medium sized templates, the cost of computing the MCS bound with ∆x = 1 and ∆y = n exceeds the direct computational cost of cross-correlation; the EBC algorithm then becomes slower than exhaustive spatial domain implementations of correlation. In our experiments, we have observed that the EBC algorithm performs best for template sizes in the range of 64 × 64, 72 × 72 and 80 × 80 pixels (Mahmood and Khan, 2011).
At a particular search location, if the MCS bound given by Equation 3.59 is found to be smaller than the current known correlation maximum, complete computations at that location are skipped without any loss of accuracy. On the other hand, if the bound is larger than the known maximum, the bound is tightened by removing the first partition from the bounded area and computing cross-correlation over that partition. The resulting enhanced BPC bound may be written as:
$$\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;\le\; \sum_{x=1}^{m/8}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;+\; \sum_{j=1}^{7} \sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} t^2(x,y)}. \quad (3.60)$$
The bound is again compared with the current known maximum and, if found to be less, the remaining computations may be skipped. Alternatively, if the bound is still larger than the known maximum, more partitions are included in the correlation area and excluded from the bound area. In general, if cross-correlation has been computed over the first p partitions, the enhanced BPC bound may be written as
$$\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;\le\; \sum_{x=1}^{pm/8}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;+\; \sum_{j=p}^{7} \sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} t^2(x,y)}. \quad (3.61)$$
The same bound may easily be extended to NCC and to the correlation coefficient, as mentioned in the discussion of the BPC algorithm.
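A sketch of the progressively tightened bound of Equation 3.61, assuming the template height is divisible by the number of row partitions; the function and parameter names are hypothetical:

```python
import math

def ebc_bound(r, t, p, parts=8):
    """Enhanced BPC upper bound in the style of Equation 3.61: exact
    cross-correlation on the first p row-partitions, plus Cauchy Schwartz
    bounds on the remaining ones."""
    m, n = len(r), len(r[0])
    h = m // parts  # rows per partition; assumes parts divides m
    total = 0.0
    for j in range(parts):
        rows = range(j * h, (j + 1) * h)
        if j < p:   # partition already replaced by exact correlation
            total += sum(r[x][y] * t[x][y] for x in rows for y in range(n))
        else:       # partition still bounded via Cauchy Schwartz
            e_r = sum(r[x][y] ** 2 for x in rows for y in range(n))
            e_t = sum(t[x][y] ** 2 for x in rows for y in range(n))
            total += math.sqrt(e_r) * math.sqrt(e_t)
    return total
```

The bound is non-increasing in p and equals the exact cross-correlation when p reaches the number of partitions, which is what allows the elimination test to be retried after each tightening step.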
As already mentioned, the MCS bound suffers from the high overhead of the square-root operation. As the partition size decreases, the number of square-root operations increases, with a corresponding increase in the bound computation cost. For templates whose number of rows is prime, the cost of computing the MCS bound is significant, which makes the EBC algorithm quite slow. For small templates, with sizes in the range of 4 × 4 to 15 × 15 pixels, the number of partitions must equal the number of rows, which incurs significant computational cost and makes these algorithms slower than exhaustive spatial domain implementations; the EBC and ZEBC algorithms are therefore no longer a reasonable choice at these sizes. For templates with an even number of rows such as 16, 18, 20 or 22, a partition size of 2 may be selected, whereas for a prime number of rows such as 17, 19 or 23, a partition size of 1 again has to be selected. For m = 21, a partition size of 7 may reduce the computational cost. Similarly, for m = 24, 25, 26, 27, 28, 29, 30, 31 and 32, suitable partition sizes are 8, 5, 13, 9, 7, 1, 6, 1 and 8 respectively. For very large partition sizes, for example m = 26 with a partition size of 13, there are only two partitions and the MCS bound moves closer to the CS bound, reducing the amount of eliminated computation.
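The partition sizes quoted above for m = 21 and m = 24 to 32 are consistent with one plausible rule: pick the proper divisor of m closest to 8. The helper below is a hypothetical illustration of that heuristic, not an algorithm from the cited papers (other choices in the text, such as partition size 2 for m = 16, follow different considerations):

```python
def partition_size(m, target=8):
    """Pick a row-partition size for EBC-style bounds: the proper divisor of
    m closest to `target`, with ties broken toward the smaller divisor."""
    divisors = [d for d in range(1, m) if m % d == 0]
    return min(divisors, key=lambda d: (abs(d - target), d))
```

For a prime m the only proper divisor is 1, reproducing the costly single-row partitioning discussed above.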
3.4.5 Transitive Elimination Algorithms
A major contribution of this thesis is the development of Transitive Elimination Algorithms (Mahmood and Khan, 2007b, 2008, 2010), which fall in the category of complete elimination algorithms for cross-correlation, NCC and correlation-coefficient based image match measures. Transitive Elimination Algorithms are discussed in significant detail in Chapters 4 and 5.
3.4.6 Chapter Summary
In this chapter we have discussed the organization of image match measure computation techniques. These techniques are broadly divided into two categories: approximate accuracy and exhaustive accuracy techniques. Approximate accuracy algorithms are further divided into those approximating the search space with a smaller search space and those approximating the match measure and images with simpler versions. Approximate search space algorithms were further divided into large search space and small search space algorithms. Large search space approximate algorithms include the coarse-to-fine approach and two-stage template matching, while small search space approximate algorithms include three step search, two dimensional logarithmic search, four step search, diamond search and other algorithms. The second category of approximate algorithms includes polynomial image approximation, binary image approximation, approximation with a rectangular filter basis, and approximation with Walsh Transform basis functions (Figure 3.3).
The exhaustive equivalent accuracy techniques have two main approaches: frequency domain and spatial domain techniques. Frequency domain techniques include FFT based correlation computation and phase-only correlation. Spatial domain algorithms are further subdivided into two classes: complete computation algorithms and bound based computation elimination algorithms. Complete computation algorithms have been made efficient by reformulating the match measure and by pre-computing the normalization parameters with the running-sum or integral-image approach. Bound based computation elimination techniques, the main topic of this thesis, are divided into two types: complete elimination and partial elimination algorithms. Partial elimination algorithms include Partial Distortion Elimination (PDE), Bounded Partial Correlation (BPC) elimination and Partial Correlation Elimination (PCE) algorithms. Complete elimination algorithms include the Successive Elimination Algorithm (SEA), Multilevel SEA (MSEA), Enhanced
[Figure 3.3 shows a taxonomy tree of image matching algorithms:]

Image Matching Algorithms
- Fast Approximate Accuracy Algorithms
  - Approximate Search Space Algorithms
    - Large Search Space Algorithms: Coarse-to-Fine / Hierarchical Block Matching; Two-Stage Template Matching
    - Small Search Space Algorithms: Three Step Search; New Three Step Search; Two Dimensional Logarithmic Search; Cross Search; Four Step Search; Orthogonal Search; Conjugate Direction Search; Diamond Search; Modified Motion Estimation; Correlation Adaptive Predictive Search; Block Matching using Walsh Transform
  - Approximate Image Representation Algorithms: Polynomial Image Approximation; Eigen Image Approximation; Binary Image Approximation; Sum of Rectangular Basis Functions
- Fast Exhaustive Accuracy Algorithms
  - Frequency Domain Algorithms: Convolution Theorem; Phase-Only Correlation
  - Spatial Domain Algorithms
    - Complete Computation Algorithms: Efficient Rearrangement; Efficient Pre-computation
    - Computation Elimination Algorithms
      - Complete Elimination Algorithms: Successive Elimination Algorithms; Multi-scale SEA; Fine Granularity SEA; Enhanced Bounded Correlation; Pattern Matching with Projection Kernels; Transitive Elimination Algorithms
      - Partial Elimination Algorithms: SSDA or Partial Distortion Elimination; Bounded Partial Correlation Elimination; Partial Correlation Elimination Algorithm

Figure 3.3: An Organization of Image Match Measure Computation Algorithms.
Bounded Correlation (EBC) elimination algorithm and Transitive Elimination Algorithms (TEA). Since the Partial Correlation Elimination algorithms and the Transitive Elimination Algorithms constitute major contributions of this thesis, both are discussed in significant detail in Chapters 4 to 7.
Chapter 4
TRANSITIVE BOUNDS ON THE CORRELATION
BASED MEASURES
Due to their guaranteed exhaustive equivalent accuracy, bound based computation elimination algorithms constitute an important part of image matching techniques and one of the main topics of this thesis. In Chapter 3, two categories of elimination algorithms were discussed: partial elimination algorithms and complete elimination algorithms. In this chapter we focus on developing complete elimination algorithms for correlation based similarity measures. In this category, the elimination test is performed before the match measure computation starts, and if the test is successful, complete computations at the current search location are skipped without any loss of accuracy.
Complete elimination algorithms have been well investigated for match measures such as Sum of Squared Differences (SSD) and Sum of Absolute Differences (SAD) (Li and Salari, 1995; Gao et al., 2000; Lee and Chen, 1997; Zhu et al., 2005; Ahn et al., 2004; Wang and Mersereau, 1999; Kawanishi et al., 2004; Brunig and Niehsen, 2001; Cheung and Po, 2003). However, for correlation based measures, which include cross-correlation, Normalized Cross Correlation (NCC) and the correlation coefficient, only limited effort in this regard is found in the literature. This is because complete elimination algorithms require a tight upper bound on correlation that is also computable at low cost; otherwise the benefit of computation elimination may be eroded by the overhead of the bound computation. The well known bound on correlation based on the Cauchy Schwartz inequality is too loose to yield any computation elimination. Therefore, to the best of our knowledge, only one algorithm, proposed by Mattoccia et al. (2008b), is found in the literature; it tightens the Cauchy Schwartz bound by using a partitioning technique. In this technique, the Cauchy Schwartz inequality is computed over smaller partitions of the two images to be matched and the final bound is computed as the sum of the bounds for all partitions (see Chapter 3 for more details of this algorithm). The bound computed by this partitioning technique may become tight enough to yield elimination, but it requires a large number of square-root operations, whose high computational complexity causes a significant bound-computation overhead. Therefore, this bound provides limited speedup for small and medium sized templates, as well as for templates whose number of rows is a multiple of a large prime number. In contrast, in this chapter we present transitive bounds on correlation based image match measures which have low computational complexity, and we develop methods to make them tight enough to produce significant computation elimination.
The best known direct bounds on the correlation coefficient are either too loose to generate any computation elimination or have very high computational cost. While searching for direct bounds on correlation based measures, we discovered a special type of bound, which we named transitive bounds. To the best of our knowledge, transitive bounds had not previously been used to speed up correlation based template matching. Their use is motivated by the fact that we were unable to find any direct bounds on correlation that are tight enough to yield computation elimination while having low computational overhead. We explored the transitive bounds in detail and discovered conditions under which they remain tight enough to yield significant elimination. Moreover, we developed fast and efficient algorithms for bound computation, which significantly reduce the computational overhead of these bounds.
Since correlation based image match measures are geometric similarity measures, they can be related to geometric distance measures, including the Euclidean and angular distance measures. Both Euclidean distance and angular distance, being metrics, are non-negative and symmetric and follow the triangular inequality of distance measures. The relationship between correlation based measures and Euclidean distance may be used to transform the triangular inequality for Euclidean distance into transitive bounds on the correlation based measures. Similarly, exploiting the relationship between the angular distance measure and correlation based measures, the triangular inequality for angular distance may be transformed into another formulation of transitive bounds on the correlation based measures. Tight transitive bounds are more useful from the computation elimination perspective. The transitive bounds derived from the two different triangular inequalities vary in tightness; we show theoretically that the bounds based on the angular distance measure are tighter than those based on the Euclidean distance measure.
In this chapter, we analyze the tightness characteristics of the angular distance based transitive bounds in detail and define the conditions under which both the upper and the lower transitive bounds become tight, the conditions under which only the upper bound becomes tight while the lower bound remains loose, and the conditions under which both bounds remain loose. The angular distance based transitive bounds and the tightness conditions are exploited for the development of transitive elimination algorithms in the following chapter.
4.1 Derivation of Angular Distance Based Transitive Bounds
Let r1 and r2 be two image blocks, each of size m × n pixels, and let ψ1,2 be the cross-correlation between them:

$$\psi_{1,2} = \sum_{i=0}^{m-1}\sum_{j=0}^{n-1} r_1(i,j)\,r_2(i,j). \quad (4.1)$$
r1 and r2 may also be considered as vectors in R^{m×n}. Let θ1,2 be the angular distance between these vectors. Using the definition of the scalar product, θ1,2 can be related to the cross-correlation ψ1,2:

$$\theta_{1,2} = \cos^{-1}\frac{\psi_{1,2}}{\|r_1\|_2\,\|r_2\|_2}, \quad (4.2)$$
where ‖·‖2 denotes the L2 norm. The angular distance is symmetric, θ1,2 = θ2,1, and bounded between 0° and 180°. In addition, the angular distance follows the triangular inequality of distance measures (Mahmood and Khan, 2007b); that is, for three
image blocks r1, r2 and r3 (Figure 4.1):
θ1,2 + θ2,3 ≥ θ1,3 ≥ |θ1,2 − θ2,3| (4.3)
where θ1,3 is the angular distance between r1 and r3, and θ2,3 is the angular distance between r2 and r3. The minimum and maximum angular distances between r1 and r3 occur when r3 lies in the same plane as r1 and r2 (see Figure 4.1). Therefore the upper and lower triangular bounds are also bounded between 0° and 180°, and the triangular inequality in Equation 4.3 may be written as:
min{360 ◦ − (θ1,2 + θ2,3), (θ1,2 + θ2,3)} ≥ θ1,3 ≥ |θ1,2 − θ2,3| (4.4)
To link this inequality to correlation, we observe that the cosine function decreases monotonically from +1 to −1 as θ varies from 0° to 180°. Taking the cosine of the triangular inequality, we get the basic form of the transitive inequality:
cos(θ1,2 + θ2,3) ≤ cos(θ1,3) ≤ cos(θ1,2 − θ2,3). (4.5)
This may be rearranged using trigonometric identities to:
$$\cos\theta_{1,2}\cos\theta_{2,3} - \sqrt{1-\cos^2\theta_{1,2}}\,\sqrt{1-\cos^2\theta_{2,3}} \;\le\; \cos\theta_{1,3} \;\le\; \cos\theta_{1,2}\cos\theta_{2,3} + \sqrt{1-\cos^2\theta_{1,2}}\,\sqrt{1-\cos^2\theta_{2,3}}. \quad (4.6)$$
Multiplying this inequality by (‖r1‖2‖r2‖2)(‖r2‖2‖r3‖2) and simplifying using Equation 4.2, we get the transitive inequality for cross-correlation:

$$\frac{\psi_{1,2}\psi_{2,3} - \sqrt{(\|r_1\|_2\|r_2\|_2)^2 - \psi_{1,2}^2}\,\sqrt{(\|r_2\|_2\|r_3\|_2)^2 - \psi_{2,3}^2}}{(\|r_2\|_2)^2} \;\le\; \psi_{1,3} \;\le\; \frac{\psi_{1,2}\psi_{2,3} + \sqrt{(\|r_1\|_2\|r_2\|_2)^2 - \psi_{1,2}^2}\,\sqrt{(\|r_2\|_2\|r_3\|_2)^2 - \psi_{2,3}^2}}{(\|r_2\|_2)^2}. \quad (4.7)$$
This inequality provides transitive bounds on cross-correlation between r1 and r3, if
cross-correlation between r1 and r2 and that between r2 and r3 is already known.
Cross-correlation is often used in its normalized form to remove its bias towards brighter regions. Normalized Cross-Correlation (NCC) between image blocks r1 and r2 is defined as:

$$\phi_{1,2} = \frac{\psi_{1,2}}{\|r_1\|_2\,\|r_2\|_2}. \quad (4.8)$$
The angular distance between two image blocks may also be written in terms of NCC as θ1,2 = cos−1(φ1,2). The transitive inequality given by Equation 4.7 becomes, for NCC:

$$\phi_{1,2}\phi_{2,3} - \sqrt{1-\phi_{1,2}^2}\,\sqrt{1-\phi_{2,3}^2} \;\le\; \phi_{1,3} \;\le\; \phi_{1,2}\phi_{2,3} + \sqrt{1-\phi_{1,2}^2}\,\sqrt{1-\phi_{2,3}^2}. \quad (4.9)$$
This inequality yields transitive bounds on NCC between image blocks r1 and r3, if
NCC between r1 and r2 and that between r2 and r3 is already known.
NCC is robust to contrast variations, but not to brightness variations. A more robust measure, invariant to all linear changes in the signal, is the correlation coefficient, defined as:

$$\rho_{1,2} = \frac{\psi_{1,2} - mn\mu_1\mu_2}{\|r_1-\mu_1\|_2\,\|r_2-\mu_2\|_2}, \quad (4.10)$$
where µ1 and µ2 are the means of r1 and r2 respectively. The correlation coefficient can also be written in terms of an angular distance as ρ1,2 = cos(θ̄1,2), where θ̄1,2 is the angular distance between r1 − µ1 and r2 − µ2. The transitive inequality in terms of θ̄ can be derived by following the same steps as for θ, and yields:

$$\cos(\bar\theta_{1,2} + \bar\theta_{2,3}) \;\le\; \cos(\bar\theta_{1,3}) \;\le\; \cos(\bar\theta_{1,2} - \bar\theta_{2,3}). \quad (4.11)$$
This can be expanded into the transitive inequality for the correlation coefficient:

$$\rho_{1,2}\rho_{2,3} - \sqrt{1-\rho_{1,2}^2}\,\sqrt{1-\rho_{2,3}^2} \;\le\; \rho_{1,3} \;\le\; \rho_{1,2}\rho_{2,3} + \sqrt{1-\rho_{1,2}^2}\,\sqrt{1-\rho_{2,3}^2}. \quad (4.12)$$

This inequality gives bounds on the correlation coefficient between image blocks r1 and r3, if the values of ρ1,2 and ρ2,3 are known.
Figure 4.1: Triangular inequality for the angular distance measure: (a) Image blocks r1, r2 and r3 represented as vertices, with the angular distances between them shown as the edges of a triangle. (b) θ1,3 depends on the angle between planes π and π′. (c)-(d) θ1,3 attains its maximum θ1,2 + θ2,3 when φπ,π′ = 180° and its minimum |θ1,2 − θ2,3| when φπ,π′ = 0°.
In the statistics literature, angular distance based transitive bounds on the correlation coefficient have been very briefly mentioned by Sigley and Stratton (1942) and Langford et al. (2001). However, a comprehensive mathematical treatment, analysis, and demonstration of their practical utility for speeding up template matching had not been carried out before this work. The emphasis of most researchers in statistics has been on the fact that a positive correlation coefficient is not transitive; see, for example, Sotos et al. (2007, 2009). The notion of transitivity assumed by these authors is that if r1 and r2 are positively correlated and r2 and r3 are positively correlated, it does not follow that r1 and r3 are also positively correlated. This result may also be seen from Equation 4.12 by substituting, for example, ρ1,2 = ρ2,3 = 0.50, which gives −0.50 ≤ ρ1,3 ≤ 1.00; that is, ρ1,3 may turn out to be negative. This shows that we need to identify ranges of ρ1,2 and ρ2,3 in which the upper and lower bounds remain close to each other, i.e., in which the bounds remain tight. The tightness of the transitive bounds is discussed in detail in Sections 4.3 and 4.4.
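The bounds of Equation 4.12 are easy to verify numerically. The following sketch checks them on random vectors and reproduces the ρ1,2 = ρ2,3 = 0.50 example; the helper names are illustrative:

```python
import math
import random

def corrcoef(a, b):
    """Correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (math.sqrt(sum((x - ma) ** 2 for x in a))
           * math.sqrt(sum((y - mb) ** 2 for y in b)))
    return num / den

def transitive_bounds(r12, r23):
    """Transitive (lower, upper) bounds on rho_{1,3} given rho_{1,2}, rho_{2,3}."""
    s = math.sqrt(1 - r12 ** 2) * math.sqrt(1 - r23 ** 2)
    return r12 * r23 - s, r12 * r23 + s

# The bounds must contain the true rho_{1,3} for any three signals.
random.seed(0)
for _ in range(100):
    v1 = [random.gauss(0, 1) for _ in range(16)]
    v2 = [random.gauss(0, 1) for _ in range(16)]
    v3 = [random.gauss(0, 1) for _ in range(16)]
    lo, hi = transitive_bounds(corrcoef(v1, v2), corrcoef(v2, v3))
    assert lo - 1e-9 <= corrcoef(v1, v3) <= hi + 1e-9
```

Calling `transitive_bounds(0.5, 0.5)` gives approximately (−0.5, 1.0), matching the non-transitivity example above.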
4.2 Derivation of Euclidean Distance Based Transitive Bounds
In the previous section, we exploited the link between correlation based match measures and angular distance to derive transitive inequalities for correlation. A different set of transitive inequalities may also be derived by exploiting the relationship between correlation and Euclidean distance based measures. For Euclidean distance based image match measures, the image blocks r1, r2 and r3 may be considered as points in R^{m×n}. Let ∆1,2 be the Euclidean distance between r1 and r2; from Equation 2.8,

$$\Delta_{1,2} = \sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl(r_1(x,y) - r_2(x,y)\bigr)^2}. \quad (4.13)$$

Similarly, let ∆1,3 be the Euclidean distance between r1 and r3, and ∆2,3 the Euclidean distance between r2 and r3.
The Euclidean distance, being a metric, satisfies the three properties of distance measures. Non-negativity states that all distances are non-negative: ∆1,2 ≥ 0, ∆1,3 ≥ 0, ∆2,3 ≥ 0. Symmetry requires that ∆1,2 = ∆2,1, ∆1,3 = ∆3,1, ∆2,3 = ∆3,2. The triangular inequality requires that:
|∆1,2 −∆2,3| ≤ ∆1,3 ≤ ∆1,2 + ∆2,3. (4.14)
Squaring all sides of the inequality,

$$(\Delta_{1,2} - \Delta_{2,3})^2 \;\le\; \Delta_{1,3}^2 \;\le\; (\Delta_{1,2} + \Delta_{2,3})^2, \quad (4.15)$$

which may be written as

$$\Delta_{1,2}^2 + \Delta_{2,3}^2 - 2\Delta_{1,2}\Delta_{2,3} \;\le\; \Delta_{1,3}^2 \;\le\; \Delta_{1,2}^2 + \Delta_{2,3}^2 + 2\Delta_{1,2}\Delta_{2,3}. \quad (4.16)$$
To relate Euclidean distance to cross-correlation, we note from Equation 2.36 that

$$\Delta_{1,2}^2 = \Delta_{1,1}^2 + \Delta_{2,2}^2 - 2\psi_{1,2}, \quad (4.17)$$

where ∆²1,1 and ∆²2,2 denote the squared Euclidean norms, or magnitudes, of the respective images.
Substituting the Euclidean distances, expressed in terms of Euclidean norms and cross-correlations, into the triangular inequality of Equation 4.16 yields

$$\Delta_{1,1}^2 + 2\Delta_{2,2}^2 + \Delta_{3,3}^2 - 2\psi_{1,2} - 2\psi_{2,3} - 2\sqrt{(\Delta_{1,1}^2 + \Delta_{2,2}^2 - 2\psi_{1,2})(\Delta_{2,2}^2 + \Delta_{3,3}^2 - 2\psi_{2,3})} \;\le\; \Delta_{1,1}^2 + \Delta_{3,3}^2 - 2\psi_{1,3} \;\le\; \Delta_{1,1}^2 + 2\Delta_{2,2}^2 + \Delta_{3,3}^2 - 2\psi_{1,2} - 2\psi_{2,3} + 2\sqrt{(\Delta_{1,1}^2 + \Delta_{2,2}^2 - 2\psi_{1,2})(\Delta_{2,2}^2 + \Delta_{3,3}^2 - 2\psi_{2,3})}. \quad (4.18)$$
The first part of the inequality yields the upper bound on cross correlation:
(ψ2,3 + ψ1,2 − ∆2,2²) + √[(∆1,1² + ∆2,2² − 2ψ1,2)(∆2,2² + ∆3,3² − 2ψ2,3)] ≥ ψ1,3,    (4.19)
and the second part of the inequality yields the lower bound on cross correlation
ψ1,3 ≥ (ψ2,3 + ψ1,2 − ∆2,2²) − √[(∆1,1² + ∆2,2² − 2ψ1,2)(∆2,2² + ∆3,3² − 2ψ2,3)].    (4.20)
Similar inequalities may also be derived for Normalized Cross-Correlation (NCC), as given
by Equation 4.8. NCC is the cross-correlation between two unit-magnitude normalized
images. Since the Euclidean norm of a unit-magnitude normalized image is 1.00, the upper
bound on NCC may be obtained from Equation 4.19:

(φ2,3 + φ1,2 − 1) + 2√[(1 − φ1,2)(1 − φ2,3)] ≥ φ1,3,    (4.21)

and the lower bound on NCC may be obtained from Equation 4.20:

φ1,3 ≥ (φ2,3 + φ1,2 − 1) − 2√[(1 − φ1,2)(1 − φ2,3)].    (4.22)
Sometimes the images to be matched also contain additive intensity variations in addition
to the multiplicative, or contrast, changes. Robustness to both additive and multiplicative
changes requires the image match measure to be computed on zero-mean and unit-variance
normalized images. The Euclidean distance between two zero-mean and unit-variance
normalized images, denoted ∆̄1,2, is given by Equation 2.12:

∆̄1,2 = √[ Σ_{x=1..n} Σ_{y=1..m} ( (r1(x, y) − µ1)/σ1 − (r2(x, y) − µ2)/σ2 )² ].    (4.23)
The triangle inequality for this case is given by:

|∆̄1,2 − ∆̄2,3| ≤ ∆̄1,3 ≤ ∆̄1,2 + ∆̄2,3.    (4.24)

Squaring all sides, we get:

(∆̄1,2 − ∆̄2,3)² ≤ ∆̄1,3² ≤ (∆̄1,2 + ∆̄2,3)².    (4.25)
In order to relate the normalized Euclidean distance to the correlation coefficient,
Figure 4.2: Angular distance based transitive bounds for a = {0.00, 0.20, 0.40, 0.60, 0.80,
0.95, 1.00}, where a is ρ1,2 and ρ2,3 varies along the x-axis. The upper transitive bounds
are shown by solid lines and the lower bounds by dotted lines. The bound curves change
from circle to ellipse, and finally the upper and lower bounds merge together in the diagonal
line for a = 1.00.
Equation 2.39 may be used:
ρ1,2 = 1 − (1/2) ∆̄1,2².    (4.26)
Simplifying the resulting expression, we get normalized Euclidean distance based
bounds on correlation coefficient:
(ρ1,2 + ρ2,3 − 1) + 2√[(1 − ρ1,2)(1 − ρ2,3)] ≥ ρ1,3
≥ (ρ1,2 + ρ2,3 − 1) − 2√[(1 − ρ1,2)(1 − ρ2,3)].    (4.27)
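These bounds are easy to verify numerically. The sketch below (Python, illustrative only and not part of the thesis implementation) draws random zero-mean, unit-variance normalized patches and confirms that the true correlation coefficient ρ1,3 always falls between the lower and upper bounds of Equation 4.27.

```python
import numpy as np

def euclidean_bounds(r12, r23):
    """Lower and upper transitive bounds on rho_{1,3} from Equation 4.27."""
    s = 2.0 * np.sqrt(max(0.0, (1.0 - r12) * (1.0 - r23)))
    base = r12 + r23 - 1.0
    return base - s, base + s

def corr(a, b):
    """Correlation coefficient of two zero-mean, unit-variance vectors."""
    return float(np.dot(a, b)) / a.size

rng = np.random.default_rng(0)
for _ in range(1000):
    # Three random patches, flattened and zero-mean / unit-variance normalized.
    r1, r2, r3 = [(v - v.mean()) / v.std() for v in rng.standard_normal((3, 64))]
    lo, hi = euclidean_bounds(corr(r1, r2), corr(r2, r3))
    assert lo - 1e-9 <= corr(r1, r3) <= hi + 1e-9
```

The `max(0.0, ...)` guard only protects the square root against tiny negative values caused by floating-point rounding when one of the bounding correlations is very close to 1.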
The transitive bounds on cross-correlation given by Equations 4.19 and 4.20, on NCC by
Equations 4.21 and 4.22, and on the correlation coefficient by Equation 4.27 are parallel
to the bounds based on angular distance, given by Equations 4.7, 4.9 and 4.12, as derived
in the last section. In Section 4.4, we will compare the first type of transitive bounds on
the correlation coefficient, given by Equation 4.12, with the second type, given by Equation
4.27, and find that the bounds based on angular distance may be preferred over the bounds
based on Euclidean distance because they are tighter. In the following section, different
visualizations of the transitive bounds are presented, which are helpful for a better
comprehension of both types of bounds.
4.3 Visualization of Transitive Bounds on Correlation
Tight bounds on the correlation coefficient are necessary to obtain computation elimination.
In order to understand the tightness characteristics of the transitive bounds, we visualize
them by plotting the bound surfaces. The angular distance based and the Euclidean distance
based bounds are visualized separately in the following subsections.
4.3.1 Visualization of Angular Distance Based Transitive Bounds
The angular distance based transitive bounds, as given by Equation 4.12, may be visualized
by fixing one of the two bounding correlations, ρ1,2 and ρ2,3, to a constant value and varying
the other over its full range of −1 to +1. We may fix ρ1,2 = a, where a is a constant, and
study the variation of the bounds with the variation of ρ2,3. Putting ρ1,2 = a in Equation 4.12:

aρ2,3 + √(1 − a²) √(1 − ρ2,3²) ≥ ρ1,3,    (4.28)

aρ2,3 − √(1 − a²) √(1 − ρ2,3²) ≤ ρ1,3.    (4.29)

In both of these inequalities, taking the term aρ2,3 to the other side:

√(1 − a²) √(1 − ρ2,3²) ≥ ρ1,3 − aρ2,3,    (4.30)

−√(1 − a²) √(1 − ρ2,3²) ≤ ρ1,3 − aρ2,3.    (4.31)

Squaring both sides and rearranging, we find that Equations 4.30 and 4.31 reduce to

ρ1,3² + ρ2,3² − 2aρ1,3ρ2,3 ≤ 1 − a²,    (4.32)

ρ1,3² + ρ2,3² − 2aρ1,3ρ2,3 ≥ 1 − a²,    (4.33)

which means that, on the bound curves, where equality holds, we obtain the equation of
an ellipse:

ρ1,3² + ρ2,3² − 2aρ1,3ρ2,3 = 1 − a².    (4.34)
For a = 0.00, we get the equation of the unit circle:

ρ1,3² + ρ2,3² = 1.    (4.35)

For a = 1.00, Equation 4.34 reduces to

ρ1,3 = ρ2,3,    (4.36)

which is the equation of a straight line passing through the origin at 45°.
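The fact that both bound curves of Equation 4.12 lie on the ellipse of Equation 4.34 can be checked numerically. The following sketch (Python, illustrative only) evaluates the upper and lower bound at several values of a and ρ2,3 and substitutes each into the ellipse equation.

```python
import math

def angular_bounds(a, r23):
    """Lower and upper transitive bounds on rho_{1,3} (Equation 4.12), with rho_{1,2} = a."""
    s = math.sqrt(1.0 - a * a) * math.sqrt(1.0 - r23 * r23)
    return a * r23 - s, a * r23 + s

def on_ellipse(a, r23, r13, tol=1e-9):
    """Check Equation 4.34: rho13^2 + rho23^2 - 2*a*rho13*rho23 = 1 - a^2."""
    return abs(r13 * r13 + r23 * r23 - 2.0 * a * r13 * r23 - (1.0 - a * a)) < tol

# Every point of both bound curves satisfies the same ellipse equation.
for a in (0.0, 0.2, 0.6, 0.95):
    for r23 in (-0.9, -0.3, 0.1, 0.7):
        lo, hi = angular_bounds(a, r23)
        assert on_ellipse(a, r23, lo) and on_ellipse(a, r23, hi)
```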
In Figure 4.2, we have plotted Equation 4.34 for different values of a, including a = {0.00,
0.20, 0.40, 0.60, 0.80, 0.95, 1.00}. We observe that, as the value of a increases from 0.00
to 1.00, the unit circle transforms into an ellipse whose minor axis continuously shrinks.
Ultimately the minor axis becomes zero for a = 1.00, where the ellipse degenerates into the
single diagonal line ρ2,3 = ρ1,3.
For any value of ρ2,3, the vertical distance between the upper and the lower bounds in
Figure 4.2 shows the range within which ρ1,3 is constrained. We observe that for larger
magnitudes of a, for example a = 0.95, the range containing ρ1,3 remains small for all
values of ρ2,3. However, for smaller magnitudes of a, for example a = 0.00, the range of
ρ1,3 is very close to its maximum, and shrinks only when ρ2,3 approaches −1.00 or +1.00.
The maximum range of ρ1,3 occurs only when both correlations ρ1,2 and ρ2,3 are 0.00.
This analysis shows that if at least one of ρ1,2 or ρ2,3 has a high magnitude, the transitive
bounds become tight.
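Subtracting the lower bound of Equation 4.12 from the upper bound gives the closed form 2√[(1 − a²)(1 − ρ2,3²)] for this range. The short sketch below (Python, illustrative only) confirms that, at a fixed ρ2,3, the range shrinks monotonically as |a| grows and collapses to zero at |a| = 1.

```python
import math

def bound_range(a, r23):
    """Width (upper minus lower) of the angular transitive bounds of Equation 4.12."""
    return 2.0 * math.sqrt((1.0 - a * a) * (1.0 - r23 * r23))

# For fixed rho_{2,3}, the range shrinks monotonically as |rho_{1,2}| grows and
# becomes zero at |rho_{1,2}| = 1, where the two bounds merge into one line.
widths = [bound_range(a, 0.3) for a in (0.0, 0.2, 0.6, 0.95, 1.0)]
assert all(w1 > w2 for w1, w2 in zip(widths, widths[1:]))
assert widths[-1] == 0.0
```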
If both correlations ρ1,2 and ρ2,3 are varied from −1 to +1, rather than keeping one of them
fixed, a bounding surface is generated. Such bounding surfaces, plotted for the upper and
lower transitive bounds using Equation 4.12, are shown in Figures 4.3 and 4.4. The upper
bound surface approaches its lowest value when one of ρ1,2 and ρ2,3 approaches the highest
value of +1.00 and the other approaches the lowest value of −1.00. The upper bound surface
remains at the maximum value of +1 if both correlations are equal, ρ1,2 = ρ2,3. The lower
bound surface approaches its maximum value if both ρ1,2 and ρ2,3 are +1 or both are −1.
When one of ρ1,2 and ρ2,3 is +1 and the other is −1, the lower bound approaches its minimum.
Thus the behavior of the upper and the lower bounds is quite different.
The upper and lower bound surfaces, if combined, form the space containing ρ1,3, shown in
Figure 4.5. No value of ρ1,3 can occur outside this space. The shape of this space is very
similar to a special type of tetrahedron having each 2D section an ellipse and four corners
at (ρ1,2, ρ2,3, ρ1,3) = (1,−1,−1), (−1,1,−1), (−1,−1,1), (1,1,1). At each of the corners, the
upper and lower bound surfaces meet, reducing the range of ρ1,3 to a single value. If both
ρ1,2 and ρ2,3 have the same value of +1 or −1, ρ1,3 is +1. If one of these correlations is −1
and the other is +1, then ρ1,3 can have only one value, which is −1. The portions of the
space close to the corners of the tetrahedron are of special interest, because in these regions
both surfaces are sufficiently close to each other, yielding tight upper and lower transitive
bounds.
The range of ρ1,3 is important because a small range is more useful than a large one. A
small range results when both bounds are tight, and a large range results when one or both
bounds are loose. The range of ρ1,3 is plotted in Figure 4.6, with the smaller ranges shown
in blue and the larger ranges in red.
4.3.2 Visualization of Euclidean Distance Based Bounds
In order to visualize the Euclidean distance based transitive bounds, we fix one of the
two bounding correlations, ρ1,2 and ρ2,3, and observe the bound variation as the other
varies. If we fix the value of ρ1,2 in Equation 4.27 to a constant value a, and
Figure 4.3: Angular distance based upper transitive bound surface shown in pseudo colors.
Figure 4.4: Angular distance based lower transitive bound surface shown in pseudo colors.
Figure 4.5: Space of ρ1,3 based on the upper and lower transitive bounds, computed from
Equation 4.12.
Figure 4.6: Range of ρ1,3 computed from the (upper − lower) transitive bounds based on
angular distance.
Figure 4.7: Euclidean distance based bounds for ρ1,2 = {0.00, 0.20, 0.40, 0.60, 0.80, 0.95,
1.00}. The margin between the upper and lower bounds reduces as the value of ρ1,2 increases,
and ultimately becomes zero when ρ1,2 approaches 1.00.
simplify both sides of the equation to get the following form:
(ρ1,3 − ρ2,3)² + 2(1 − a)(ρ1,3 + ρ2,3) = 3 − 2a − a².    (4.37)
We plot this equation for a = {0.00, 0.40, 0.80, 0.95, 1.00} in Figure 4.7. As the value of
a increases, the quadratic curves converge towards the center and, for a = 1, become the
diagonal line ρ2,3 = ρ1,3. In Figure 4.7, even for very high values of ρ1,2, for example
ρ1,2 = 0.95, the range of ρ1,3 increases as the value of ρ2,3 decreases. Also, as the value of
ρ1,2 decreases, the range of these bounds increases rapidly. Tight bounds can only be
obtained if both ρ1,2 and ρ2,3 have high values.
In Figure 4.8, we have plotted the upper bound surface, and in Figure 4.9 the lower bound
surface. The upper bound surface approaches ρ1,3 = −1 only if one of ρ1,2 and ρ2,3 is +1
and the other is −1. The lower bound surface shows that the lower bound is significantly
loose for most values of ρ1,2 and ρ2,3. When both ρ1,2 and ρ2,3 are −1, the lower bound
approaches −7.00, which is far below the least possible value of the correlation coefficient.
The ρ1,3 space is shown in Figure 4.10 by plotting both upper and lower bound surfaces
Figure 4.8: Euclidean distance based upper transitive bound surface shown in pseudo colors.
simultaneously. We observe that the combined bound surface resembles a cone with its tip
at (1,1,1), which descends as the values of ρ1,2 and ρ2,3 reduce from 1.00. The space of ρ1,3
is significantly larger than the space computed from the angular distance based transitive
bounds shown in Figure 4.5. Also, the space shown in Figure 4.10 is open, while the space
shown in Figure 4.5 is closed. Thus we observe that the angular distance based transitive
bound ranges are significantly smaller than the Euclidean distance based ones. In the
following section, we theoretically compare the tightness of the angular distance based
bounds with the Euclidean distance based bounds and find that the angular distance based
bounds are tighter.
4.4 Tightness of Euclidean and Angular Distance Based Transitive Bounds
In bound based computation elimination algorithms, the tightness of the bound is an
important parameter from the algorithm performance point of view. Tight bounds
Figure 4.9: Euclidean distance based lower transitive bound surface shown in pseudo colors.
Figure 4.10: The upper and lower bound surfaces merge to form a cone with tip at (1,1,1).
The difference between the upper and lower Euclidean bounds shows the bound tightness;
Euclidean bounds become tight only if both of the bounding correlations are high.
produce more elimination than loose bounds. Comparing the two types of transitive bounds,
we find that the angular distance based bounds are theoretically tighter than the Euclidean
distance based bounds. The upper and the lower transitive bounds are compared separately
in the following subsections.
4.4.1 Comparison of Upper Transitive Bounds
A tight upper bound is one which approaches the actual measure from above. If multiple
upper bounds are available for the same measure, the smallest upper bound is the tightest.
The upper transitive bound based on Euclidean distance is found to be greater than or equal
to the upper transitive bound based on angular distance; therefore the angular distance
based bounds are tighter than the Euclidean distance based bounds. In this comparison we
consider the upper bounds on the correlation coefficient only; the analysis for
cross-correlation and NCC follows in a similar way.
The correlation coefficient is bounded between −1.00 and +1.00, i.e., −1 ≤ ρi,j ≤ +1;
therefore the following inequality is always true:

(1/2)√[(1 − ρ1,2)(1 − ρ2,3)] + (1/2)√[(1 + ρ1,2)(1 + ρ2,3)] ≤ 1.    (4.38)

The left hand side of Inequality 4.38 attains its maximum value of +1 when both correlations
are equal, ρ1,2 = ρ2,3; for all other combinations, ρ1,2 ≠ ρ2,3, it remains less than +1.
In order to bring Inequality 4.38 into a form which can be related to the transitive bound
formulation, we need to include some additional terms. For this purpose, we use the
non-negativity property of distance measures, that is, a product of two normalized Euclidean
distance terms is always non-negative:

∆̄1,2 ∆̄2,3 ≥ 0.    (4.39)

Therefore both sides of Inequality 4.38 may be multiplied by the term ∆̄1,2 ∆̄2,3 without
changing its direction:

∆̄1,2 ∆̄2,3 [ (1/2)√[(1 − ρ1,2)(1 − ρ2,3)] + (1/2)√[(1 + ρ1,2)(1 + ρ2,3)] ] ≤ ∆̄1,2 ∆̄2,3.    (4.40)

We may convert Inequality 4.40 to pure correlation coefficient terms by using the following
relationship between the correlation coefficient and the normalized Euclidean distance,
given by Equation 2.39:

∆̄1,2 = √[2(1 − ρ1,2)],    (4.41)

which may be used to derive the following equation:

∆̄1,2 ∆̄2,3 = 2√[(1 − ρ1,2)(1 − ρ2,3)].    (4.42)

Therefore, Inequality 4.40 may be converted to correlation coefficient terms by using
Equation 4.42:

√[(1 − ρ1,2)(1 − ρ2,3)] [ (1/2)√[(1 − ρ1,2)(1 − ρ2,3)] + (1/2)√[(1 + ρ1,2)(1 + ρ2,3)] ]
≤ √[(1 − ρ1,2)(1 − ρ2,3)],    (4.43)

which may be rearranged into the following final form:

ρ1,2ρ2,3 + √[(1 − ρ1,2²)(1 − ρ2,3²)] ≤ (ρ1,2 + ρ2,3 − 1) + 2√[(1 − ρ1,2)(1 − ρ2,3)].    (4.44)
The left hand side of Inequality 4.44 is the upper transitive bound on ρ1,3 based on the
angular distance measure, as given by Equation 4.12, while the right hand side is the upper
transitive bound based on the Euclidean distance measure, given by Inequality 4.27. Since
an upper bound with the smaller value is tighter, the angular distance based transitive
bound is tighter than the Euclidean distance based bound.
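Inequality 4.44 can also be checked numerically. The sketch below (Python, illustrative only) evaluates both upper bounds over a grid of bounding correlations and confirms that the angular bound never exceeds the Euclidean one.

```python
import math

def upper_angular(r12, r23):
    """Upper transitive bound based on angular distance (Equation 4.12)."""
    return r12 * r23 + math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

def upper_euclidean(r12, r23):
    """Upper transitive bound based on Euclidean distance (Equation 4.27)."""
    return (r12 + r23 - 1) + 2 * math.sqrt((1 - r12) * (1 - r23))

# Inequality 4.44: on the whole grid, the angular upper bound is at least as tight.
grid = [i / 10.0 for i in range(-10, 11)]
for r12 in grid:
    for r23 in grid:
        assert upper_angular(r12, r23) <= upper_euclidean(r12, r23) + 1e-9
```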
In order to illustrate the result given by Inequality 4.44, both types of transitive bounds
are plotted simultaneously for the same values of ρ1,2 and ρ2,3. We observe that the upper
and lower angular distance based transitive bounds are contained within the Euclidean
distance based transitive bounds, as shown in Figure 4.11.
4.4.2 Comparison of Lower Transitive Bounds
If multiple lower bounds are available, then the best lower bound is one which is the
maximum of all bounds. In this subsection, we show that angular distance based
Figure 4.11: Comparison of the upper transitive bounds based on angular distance (UA)
with the upper transitive bounds based on Euclidean distance (UE). The angular distance
based lower transitive bounds (LA) are also compared with the Euclidean distance based
lower transitive bounds (LE). In each panel, ρ2,3 is plotted on the x-axis and the bounds on
the y-axis, for (a) ρ1,2 = 0.40, (b) ρ1,2 = 0.50, (c) ρ1,2 = 0.60, (d) ρ1,2 = 0.70.
lower transitive bound is always larger than (or equal to) the Euclidean distance based
lower transitive bound.

Using the facts that the normalized Euclidean distance is bounded between 0.00 and 2.00
and the correlation coefficient is bounded between +1.00 and −1.00, the proofs of the
following inequalities are trivial:
(1/2)√[(1 + ρ1,2)(1 + ρ2,3)] ≤ 1,    (4.45)

∆̄1,2 ∆̄2,3 ≥ 0.    (4.46)
Multiplying both sides of Inequality 4.45 by ∆̄1,2 ∆̄2,3 will not change the direction of the
inequality:

(1/2) ∆̄1,2 ∆̄2,3 √[(1 + ρ1,2)(1 + ρ2,3)] ≤ ∆̄1,2 ∆̄2,3.    (4.47)
The proof of the following inequality is also trivial:

ρ1,2ρ2,3 ≥ ρ1,2 + ρ2,3 − 1.    (4.48)
Multiplying Inequality 4.47 by −1 inverts the direction of the inequality:

−(1/2) ∆̄1,2 ∆̄2,3 √[(1 + ρ1,2)(1 + ρ2,3)] ≥ −∆̄1,2 ∆̄2,3.    (4.49)

Adding Inequalities 4.49 and 4.48:

ρ1,2ρ2,3 − (1/2) ∆̄1,2 ∆̄2,3 √[(1 + ρ1,2)(1 + ρ2,3)] ≥ (ρ1,2 + ρ2,3 − 1) − ∆̄1,2 ∆̄2,3,    (4.50)

and substituting the value of ∆̄1,2 ∆̄2,3 from Equation 4.42:

ρ1,2ρ2,3 − √[(1 − ρ1,2²)(1 − ρ2,3²)] ≥ (ρ1,2 + ρ2,3 − 1) − 2√[(1 − ρ1,2)(1 − ρ2,3)].    (4.51)
The left hand side of Inequality 4.51 is the lower transitive bound on ρ1,3 based on the
angular distance measure, as given by Equation 4.12, while the right hand side is the lower
transitive bound based on the Euclidean distance measure, given by Inequality 4.27.
Inequality 4.51 shows that the Euclidean distance based lower transitive bound is always
less than or equal to the angular distance based lower transitive bound. This proves that
the lower transitive bound derived from angular distance is tighter than the formulation
based on Euclidean distance.
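As with the upper bounds, Inequality 4.51 can be checked numerically. The sketch below (Python, illustrative only) confirms that the angular lower bound dominates the Euclidean one over a grid of bounding correlations, and reproduces the −7.00 value noted in Section 4.3.2.

```python
import math

def lower_angular(r12, r23):
    """Lower transitive bound based on angular distance (Equation 4.12)."""
    return r12 * r23 - math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

def lower_euclidean(r12, r23):
    """Lower transitive bound based on Euclidean distance (Equation 4.27)."""
    return (r12 + r23 - 1) - 2 * math.sqrt((1 - r12) * (1 - r23))

# Inequality 4.51: the angular lower bound is never below the Euclidean one.
grid = [i / 10.0 for i in range(-10, 11)]
for r12 in grid:
    for r23 in grid:
        assert lower_angular(r12, r23) >= lower_euclidean(r12, r23) - 1e-9

# At rho_{1,2} = rho_{2,3} = -1 the Euclidean bound drops to -7.00 (Figure 4.9),
# while the angular bound correctly forces rho_{1,3} = +1.
assert lower_euclidean(-1.0, -1.0) == -7.0 and lower_angular(-1.0, -1.0) == 1.0
```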
Inequality 4.51 may also be observed by plotting the angular distance based lower transitive
bound (Inequality 4.12) and the Euclidean distance based lower transitive bound (Inequality
4.27), as shown in Figure 4.11. In this figure, we observe that the angular distance based
transitive bounds are contained within the Euclidean distance based transitive bounds.
In the following section, we therefore further explore the tightness characteristics of the
angular distance based transitive bounds, which will be used for the development of
transitive elimination algorithms in Chapter 5.
Figure 4.12: The tightness of transitive bounds: (a) Case 1: both angles are small; (b) Case
2: one angle is small and the other is large; (c) Case 3: both angles are large.
4.5 Tightness Analysis of Angular Distance Based Transitive Bounds
For a particular search location, transitive bounds indicate the maximum and the min-
imum limits on correlation, which can be used to discard unsuitable search locations.
For example, at a specific location, if the maximum limit is less than the correlation
value at some previous location, correlation computation becomes redundant and
may be skipped without any loss of accuracy. As the percentage of skipped search
locations increases, the template matching process accelerates accordingly. In order
to compute angular distance based transitive bounds, three transitive inequalities
were presented in Section 4.1. In each of these inequalities, there are two Bounding
Correlations which must be known in order to find bounds on the third Bounded
Correlation. For example, in Equation 4.12, ρ1,2 and ρ2,3 are the two bounding cor-
relations which constrain the upper and the lower limits on the bounded correlation
ρ1,3.
The tightness of the transitive bounds depends on the magnitude of the two bounding
correlations, and requires that the upper bound to be low and the lower bound to
be high. This dependency may be more clearly understood by considering transitive
inequalities in terms of angular distances as given by Equations 4.5 or 4.11. In these
equations, a tight upper bound means cos(θ1,2 − θ2,3) resulting a value significantly
lesser than +1, which implies |θ1,2 − θ2,3| has a value significantly larger than 0 ◦.
Similarly, lower bound will be tight if cos(θ1,2 + θ2,3) results a higher value, which
implies that θ1,2 +θ2,3 should have a value close to 0 ◦. Considering different ranges of
values which θ1,2 and θ2,3 may assume, three possible cases are shown in Figure 4.12:
1. Case I: If both angles are small (Figure 4.12a), their difference will be even
smaller and their sum will also be a relatively small number. Therefore both
upper and lower transitive bounds will approach +1. This ensures tight upper
and lower bounds because in this case, the bounded correlation will also be very
high.
2. Case II: If one angle is small while the other is large (Figure 4.12b), then their
difference will be large, resulting in a tight upper bound, and their sum will also
be a relatively large number, resulting in a loose lower bound.
3. Case III: If both of the angles are large (Figure 4.12c), then their difference will
be a small number, resulting in a very loose upper bound while their sum will
be a significantly larger number, resulting in a very loose lower bound.
Of these three cases, Case I yields tight upper and lower bounds and can potentially be
exploited for computation elimination. In practice, however, this case may occur quite
infrequently, because it is unlikely for all three image patches to be highly correlated.
Case III yields loose upper and lower bounds and therefore cannot be exploited for
computation elimination. Case II yields a tight upper bound and requires only that one of
the two bounding correlations has a high magnitude; it may therefore be exploited for
computation elimination.
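The elimination test implied by Case II can be sketched as follows (Python, illustrative only; the function names and threshold values are ours, not the thesis notation). A high bounding correlation combined with the upper bound of Equation 4.12 lets a search location be skipped whenever its bound cannot beat the current known maximum.

```python
import math

def upper_bound(r12, r23):
    """Angular distance based upper transitive bound on rho_{1,3} (Equation 4.12)."""
    return r12 * r23 + math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

def can_skip(bounding_corr, central_corr, current_max):
    """Skip a bounded location when its upper bound cannot beat the current max."""
    return upper_bound(bounding_corr, central_corr) < current_max

# Case II: one bounding correlation is very high (e.g. autocorrelation 0.98),
# so even a moderate central correlation gives a bound well below a good max.
assert can_skip(0.98, 0.30, current_max=0.80)
# Case III: both bounding correlations are low; the bound is ~1.0, so no skip.
assert not can_skip(0.30, 0.30, current_max=0.80)
```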
We have experimentally studied the characteristics of upper and lower transitive
bounds based on angular distance. Figure 4.13 shows the variation of these bounds
with the variation of ρ1,2 and ρ2,3 on a real image dataset. In Figure 4.13, each pair
of upper and lower bounds corresponds to a fixed value of ρ1,2, while the variation
along the x-axis is due to the variation in ρ2,3 on consecutive pixel positions in the
bigger image. From Figure 4.13, it can be observed that if both of the correlations,
ρ1,2 and ρ2,3, are large, then both of the upper and lower bounds become tight. If
Figure 4.13: Variation of the upper and lower bounds on the correlation coefficient with the
variation of the bounding correlations ρ1,2 and ρ2,3. In this figure, ρ1,2 varies across the
curves while ρ2,3 varies with the pixel position along a row in the reference image. Curves
1 to 5 show the upper bounds for ρ1,2 = 0.306, 0.441, 0.571, 0.722, and 0.896 respectively.
Curve 6 is the actual value of the correlation coefficient. Curves 7 to 11 show the lower
bounds for ρ1,2 = 0.896, 0.722, 0.571, 0.441, and 0.306 respectively. The Cauchy-Schwarz
inequality based upper bound is always +1 and the Cauchy-Schwarz lower bound is always −1.
one of the two correlations is high and the other is low then the upper bound remains
tight while the lower bound becomes loose.
4.6 Conclusion
In this chapter we have presented the derivation of transitive bounds on correlation based
measures using two different approaches. The resulting bounds were compared, and it was
proved that for the correlation coefficient the angular distance based bounds are tighter
than the Euclidean distance based bounds. The angular distance based bounds were further
studied from the tightness perspective, and a practically useful case, Case II, was identified.
In the following chapter, Case II will be exploited for the development of transitive
elimination algorithms.
Chapter 5
TRANSITIVE ELIMINATION ALGORITHMS FOR
CORRELATION BASED MEASURES
Tight transitive bounds are essential for good elimination performance. In Chapter 4, we
showed that angular distance based transitive bounds are tighter than Euclidean distance
based bounds. Moreover, the tightness characteristics of the angular distance based bounds
were also studied, and an important case was identified in which a tight upper transitive
bound may be obtained. Building upon the results of Chapter 4, in the current chapter we
develop transitive elimination algorithms.
We obtain a tight upper transitive bound by ensuring that at least one of the two bounding
correlations has a large magnitude. This is achieved by using, as one of the bounding
correlations, the different forms of autocorrelation found in the images to be matched. Most
template matching applications exhibit strong autocorrelation in one of three forms: strong
intra-reference autocorrelation, strong inter-reference autocorrelation, or strong
inter-template autocorrelation. To exploit each of these types, we propose three different
transitive elimination algorithms.
In the three transitive elimination algorithms proposed in this chapter, we use
autocorrelation as one of the two bounding correlations. In order to get the second bounding
correlation, we divide the search locations into two categories, bounding locations and
bounded locations. We ensure that the bounding category is only a small fraction of the
total search locations, while the bounded category makes up the bulk of the locations.
Correlation at the bounding search locations must be computed, because it is used as the
second bounding correlation, while the computations at the bounded search locations may
be skipped by using the transitive bounds.
Following is an overview of the three transitive elimination algorithms:
1. Exploiting strong intra-reference autocorrelation (Mahmood and Khan, 2008):
Most natural images are low-frequency signals and hence exhibit high local spatial
autocorrelation. We divide the reference image into non-overlapping windows of equal size
and compute the local autocorrelation of the central block in each window with its neighbors.
This autocorrelation is used as the first bounding correlation. The template image is
matched only with the central block in each window to get the second bounding correlation.
The correlation of the template with the other blocks in each window is the bounded
correlation and may be skipped by using the transitive bounds. This concept is illustrated
in Figure 5.1.

The computation of the local autocorrelation is an algorithmic overhead; therefore we also
present an efficient algorithm for its computation. As a result, this overhead turns out to
be insignificant compared to the amount of computation elimination achieved in this
algorithm.
2. Exploiting strong inter-reference autocorrelation (Mahmood and Khan, 2010):
Tracking an object in a surveillance video, checking for missing components on a PCB
production line, or inspecting objects on conveyor belts requires one template image to be
correlated across multiple reference frames. In such applications, the reference images are
often highly correlated with each other, because the camera is often static, a fact which can
be exploited for high elimination. The temporal autocorrelation between consecutive frames
is used as one bounding correlation. The object template is fully correlated with a
temporally central frame, and the resulting correlations are used as the second bounding
correlations. The correlation of the object template with the other frames is the bounded
correlation and may be skipped by using the transitive bounds. This concept is illustrated
in Figure 5.2. The computation of autocorrelation between different frames is an overhead
of this algorithm; we have formulated an efficient algorithm for the computation of
inter-frame autocorrelation, which reduces this overhead to a small amount.
3. Exploiting strong inter-template autocorrelation (Mahmood and Khan, 2007b):
Certain applications require a set of template images to be correlated with a single reference
image, for example matching an aerial video with a satellite image, or exhaustive
rotation- and scale-invariant template matching. In such cases, if the set of templates has
high autocorrelation, the correlation of one template with the reference image yields tight
bounds on the correlations of all other templates in the set with the same reference image.
This concept is illustrated in Figure 5.3. The correlation between templates is an overhead,
but it requires quite a small amount of computation and may therefore be ignored. The
correlation of one template with the full reference image is a small part of the overall
required computation.
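The shared structure of these three algorithms, shown here for the intra-reference case, can be sketched as follows. This is a minimal illustration, not the thesis C++ implementation: the helper names are ours, and the per-group autocorrelation is computed naively rather than with the efficient algorithm described in Section 5.1.

```python
import numpy as np

def ncorr(a, b):
    """Correlation coefficient between two equally sized image patches."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float((a * b).mean())

def upper_bound(r12, r23):
    """Angular distance based upper transitive bound (Equation 4.12)."""
    return r12 * r23 + np.sqrt(max(0.0, (1 - r12 ** 2) * (1 - r23 ** 2)))

def match_group(template, center_patch, neighbor_patches, current_max):
    """Process one group: correlate the center, then bound-test each neighbor.

    The center-neighbor autocorrelation is the first bounding correlation and
    the template-center correlation the second; a neighbor whose upper bound
    cannot beat the best value so far is skipped without computing it.
    Returns (best_corr, computed, skipped)."""
    cc = ncorr(template, center_patch)      # central correlation (always computed)
    best = max(current_max, cc)
    computed, skipped = 1, 0
    for nb in neighbor_patches:
        a_s = ncorr(center_patch, nb)       # local spatial autocorrelation AS
        if upper_bound(a_s, cc) < best:
            skipped += 1                    # elimination: this location cannot win
        else:
            best = max(best, ncorr(template, nb))
            computed += 1
    return best, computed, skipped
```

A location is skipped exactly when its upper transitive bound falls below the best correlation seen so far, so skipping never changes the final best match.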
The transitive elimination algorithms are implemented in C++ and compared with currently
known efficient algorithms, including Enhanced Bounded Correlation (Mattoccia et al.,
2008b), Bounded Partial Correlation (Di Stefano et al., 2005), SAD with the SEA algorithm
(Li and Salari, 1995), the PDE algorithm (Montrucchio and Quaglia, 2005), an FFT based
frequency domain implementation (William et al., 2007), and the fast exhaustive spatial
domain implementation discussed in Chapter 3. Experiments are performed on a variety of
real image datasets. While the exact speedup of the proposed algorithms varies from
experiment to experiment, we have observed speedups ranging from several times to more
than an order of magnitude.
5.1 Exploiting Strong Intra-Reference Autocorrelation
The most common case of template matching requires a single template to be correlated
with a single reference image. In such applications, the local spatial autocorrelation of the
reference image may be exploited for fast template matching. For this purpose, we divide
the search locations within the reference image into non-overlapping rectangular groups and
compute the local autocorrelation (AS) of the central location with the neighboring locations
of each group (Figure 5.1).
In each group, the template image is correlated with the central search location to yield
the Central Correlation (CC), and the correlation of the template with the remaining
locations is delayed until the evaluation of the elimination test. As shown in
Figure 5.1: Groups of Search Locations: A 'search location' is the central pixel of a possible matching location of the template within the reference image. Small squares show 81 search locations divided into non-overlapping 3×3 groups. Each group has a central search location, shown in red, and neighboring search locations, shown in blue. The template always has to be correlated with the central locations, while its correlation with the neighboring locations may be eliminated based upon the transitive bounds.
Figure 5.1, both the local autocorrelation and the central correlation are used as bounding
correlations to compute transitive bounds for the remaining locations, and those with
upper bounds less than a current known maximum (or less than a conservative initial
threshold) may be skipped without any loss of accuracy. Since the spatial autocorrelation
with close neighbors is often high for natural images, this results in a tight
upper bound and hence high elimination at most locations. Complete pseudo-code
for this algorithm is shown as Intra-Ref-TEA.
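As an illustration, the transitive elimination test evaluated inside Intra-Ref-TEA may be sketched in C++ as follows. This is a minimal sketch: the function names are ours, while the bound expression is the one used in Algorithm 1.

```cpp
#include <cmath>

// Transitive upper bound on the correlation of the template with a
// neighboring search location, given the two bounding correlations:
//   as: local spatial autocorrelation A_S between the central and the
//       neighboring search location,
//   cc: central correlation C_C of the template with the central location.
// Both inputs are correlation coefficients in [-1, 1].
double transitive_upper_bound(double as, double cc) {
    return as * cc + std::sqrt((1.0 - as * as) * (1.0 - cc * cc));
}

// The neighboring location may be skipped, without any loss of accuracy,
// when its upper bound cannot exceed the current known maximum.
bool may_skip(double as, double cc, double c_max) {
    return transitive_upper_bound(as, cc) < c_max;
}
```

For example, with AS = 0.95 and CC = 0.60 the upper bound is about 0.82, so the neighboring location is skipped whenever the current known maximum already exceeds that value.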
In Algorithm 1 the speedup is obtained from the bounded correlations, shown as dotted
arrows in Figure 5.1, whereas the bounding correlations constitute an overhead for
the algorithm. There are two types of overheads: the computation of the local spatial
autocorrelation of the reference image and the computation of the central correlation
in each group. For the first type, the standard implementation has computational
complexity of the order of O(mnpq) Mahmood and Khan (2008), where m × n is the
template size and p × q is the reference image size. However, redundant computations
can be eliminated by using a more efficient algorithm, which reduces the computational
complexity to O(shswpq) (as discussed later in this section), where sh × sw is
the size of the group of locations.
Input: Template Image, Reference Image, Size of Group of Locations, Initial Correlation Threshold
begin
    AS ⇐ Local Spatial Auto-correlation;
    Cmax ⇐ Initial Correlation Threshold;
    foreach Group of Search Locations do
        CC ⇐ correlate(Template, Central Search Location);
        if CC > Cmax then
            (Cmax, imax, jmax) ⇐ (CC, Central Location Indices);
        end
        foreach Remaining Search Location Within the Current Group do
            UpperBound ⇐ AS·CC + √((1 − AS²)(1 − CC²));
            if UpperBound < Cmax then
                Skip Current Location;
            else
                C ⇐ correlate(Template, Current Search Location);
                if C > Cmax then
                    (Cmax, imax, jmax) ⇐ (C, Current Location Indices);
                end
            end
        end
    end
    return imax, jmax, Cmax;
end
Algorithm 1: Intra-Ref-TEA
For the overhead due to central correlation, we observe that at least one correlation
is required for each group. Since the number of groups is pq/(shsw), and one correlation
of the template of size m × n must be performed for each group, the overhead cost is
O(mnpq/(shsw)).
The total overhead for both types can be written as the summation of the two overheads:

$$\eta = \xi\left(s_h s_w\,pq + \frac{mnpq}{s_h s_w}\right), \qquad (5.1)$$
where ξ is a machine dependent constant. If k templates are to be matched with the
same reference image, the local autocorrelation overhead is further amortized to yield
a total overhead of

$$\eta = \xi\left(\frac{s_h s_w}{k} + \frac{mn}{s_h s_w}\right)pq. \qquad (5.2)$$
Assuming the cost of spatial domain template matching to be ξmnpq, a theoretical
upper bound upon the speedup of Intra-Ref-TEA may be written as:

$$\mathrm{SpeedUp} \le \frac{mn}{\dfrac{s_h s_w}{k} + \dfrac{mn}{s_h s_w}}. \qquad (5.3)$$
As an illustration, if 10 templates, each of size 64×64 pixels, are to be matched with a
reference image (of any size) and the group size is 5×5, the upper bound upon the maximum
achievable speedup over the spatial domain is 24.624.
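The bound of Equation (5.3) and the worked example above can be checked with a few lines of C++ (the function name is ours):

```cpp
#include <cmath>

// Theoretical upper bound on the speedup of Intra-Ref-TEA over the
// exhaustive spatial domain implementation, Equation (5.3):
//   SpeedUp <= mn / (sh*sw/k + mn/(sh*sw))
// m x n: template size, sh x sw: group size, k: number of templates.
double speedup_bound(double m, double n, double sh, double sw, double k) {
    double group = sh * sw;
    return (m * n) / (group / k + (m * n) / group);
}
```

For 10 templates of size 64×64 and a 5×5 group, `speedup_bound(64, 64, 5, 5, 10)` evaluates to approximately 24.624, matching the figure quoted above.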
Equation (5.3) indicates that more speedup is possible with larger group sizes. However,
at larger sizes the local autocorrelation may decay to a small value,
reducing the tightness of the transitive bounds and therefore the
percentage elimination. The proper choice of the group-size parameter
therefore depends upon the spread of the local autocorrelation function in the
reference image and the magnitude of the known correlation maximum. The smallest
size of a symmetrical group is 3 × 3 search locations, which means that the central
search location will be correlated with its eight neighbors only. In practice, one
may adapt to the proper group size by observing the computation elimination. For
an sh × sw group size, the computations at one location are mandatory, so the maximum
number of skipped locations is shsw − 1. If the percentage of eliminated computations
approaches the maximum limit (shsw − 1)/(shsw) × 100, the group size may be increased to
(sh + 1) × (sw + 1). This is because approaching the maximum limit of elimination
indicates that the reference image may have a wider autocorrelation, which may allow
an even larger group size and hence more speedup. On the other hand, if the computation
elimination falls below the maximum limit of the smaller group, given by
((sh − 1)(sw − 1) − 1)/((sh − 1)(sw − 1)) × 100, then the size may be reduced to the
smaller group size, (sh − 1) × (sw − 1). We experimentally observed that for images
in our datasets, a group size of 5 × 5 yields good computation elimination; therefore
we have used the size of 5 × 5 in all of our experiments.
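The adaptation rule just described may be sketched as follows. The function name and the closeness tolerance `tol` are our assumptions, since the text does not fix how close the observed elimination must be to the maximum limit before the group is grown.

```cpp
#include <utility>

// Adapt the group size from the observed percentage of eliminated
// computations. Grow to (sh+1) x (sw+1) when elimination approaches the
// maximum limit of the current group; shrink to (sh-1) x (sw-1) when it
// falls below the maximum limit of the smaller group.
std::pair<int, int> adapt_group_size(int sh, int sw,
                                     double observed_elim_percent,
                                     double tol = 1.0) {
    double upper_limit = 100.0 * (sh * sw - 1) / (sh * sw);
    double lower_limit =
        100.0 * ((sh - 1) * (sw - 1) - 1) / ((sh - 1) * (sw - 1));
    if (observed_elim_percent >= upper_limit - tol)
        return {sh + 1, sw + 1};  // autocorrelation likely wider than assumed
    if (observed_elim_percent < lower_limit && sh > 3 && sw > 3)
        return {sh - 1, sw - 1};  // bounds not tight enough for this size
    return {sh, sw};
}
```

For a 5×5 group the two limits are 96% and 93.75%, so, for example, an observed elimination of 95.5% grows the group to 6×6, while 90% shrinks it to 4×4.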
As mentioned earlier, the computation of the local autocorrelation can be made much
more efficient than its standard implementation by exploiting the redundancy in its
computation. We propose an algorithm in which the correlation between the central
location rc and a nearby location rn is computed simultaneously over all groups,
through pixel-by-pixel multiplication of the reference image with its (wr, wc)-translated
version, where (wr, wc) is the row and column difference between rc and rn. Then, using
the running-sum approach, we compute the sum of all m × n blocks in the product
array, in just four operations per block. This yields the correlation of each search
location with the location translated by (wr, wc) pixels. We copy only the required values
into a final LA-Array, as shown in the LA-Algorithm. The same process is repeated shsw
times, and each time pq integer multiplications and 4pq additions are performed. Therefore,
the overall complexity of the proposed algorithm for local spatial autocorrelation
computation is O(shswpq). The additional memory required by the LA-Algorithm consists
of three arrays, Pr, Sf and LA, each of size equal to that of the reference image, p × q.
In the LA-Algorithm, an efficient running-sum algorithm is used to compute the summation
over all m × n blocks of the products in the Pr array. In this algorithm, summation
along the rows is computed first, and then summation along the columns is computed
over these row-sums. For the computation of row-sums, in each row the first n
columns are summed up, and each subsequent sum is computed by adding the leading
column and subtracting the trailing column. Once the row-sums are complete, column-sums
are computed by applying the same strategy to the row-sums. The pseudo-code for the
Running-Sum-Algorithm is given as Algorithm 3. In this algorithm, each internal
m × n block sum requires only 4 operations. If there are p × q blocks to be
summed, the overall complexity of the Running-Sum-Algorithm is O(pq).
Input: Reference Image, Size of Group of Locations, Size of Template Image
begin
    Iref ⇐ Reference Image;
    (m, n) ⇐ Template Image Rows and Columns;
    (sh, sw) ⇐ Size of Group of Locations;
    for wr = 1 to sh do
        for wc = 1 to sw do
            foreach pixel (i, j) in the Reference Image do
                Pr(i, j) ⇐ Iref(i, j) Iref(i + wr, j + wc);
            end
            Sf ⇐ Running sum of all m × n patches in Pr;
            Comment: Copy only the required values from Sf to the LA-array;
            foreach (i, j) in the final LA-array do
                LA(i + wr, j + wc) ⇐ Sf(i + m, j + n);
                i ⇐ i + sh;
                j ⇐ j + sw;
            end
        end
    end
end
Algorithm 2: Local Autocorrelation (LA) Algorithm
Figure 5.2: Exploiting strong inter-frame autocorrelation for fast template matching. The template is fully correlated with only one frame (shown in dark red), while for the remaining frames transitive bounds are computed.
Input: Reference Image, Template Image Size
begin
    (p, q) ⇐ Reference-Image Rows and Columns;
    (m, n) ⇐ Template-Image Rows and Columns;
    Comment: One pass through all reference image rows;
    for i ⇐ 1 to p do
        sum ⇐ 0;
        Comment: Compute the sum over the first n columns, where n < q;
        for j ⇐ 1 to n do
            sum ⇐ sum + Pr(i, j);
        end
        Sr(i, n) ⇐ sum;
        Comment: Sr is a temporary array which contains row-sums;
        Comment: Onward, use the running sum;
        for j ⇐ n + 1 to q do
            Sr(i, j) ⇐ Sr(i, j − 1) + Pr(i, j) − Pr(i, j − n);
        end
    end
    Comment: One pass through all reference image columns;
    for j ⇐ 1 to q do
        sum ⇐ 0;
        Comment: Compute the sum over the first m row-sums, where m < p;
        for i ⇐ 1 to m do
            sum ⇐ sum + Sr(i, j);
        end
        Sf(m, j) ⇐ sum;
        Comment: Sf is the final array holding the summation values;
        Comment: Onward, use the running sum;
        for i ⇐ m + 1 to p do
            Sf(i, j) ⇐ Sf(i − 1, j) + Sr(i, j) − Sr(i − m, j);
        end
    end
end
Algorithm 3: Efficient Running-Sum-Algorithm
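A C++ sketch of the two-pass running-sum computation follows. The array names follow Algorithm 3, while the zero-based indexing and the top-left-corner convention for the result are ours.

```cpp
#include <vector>

// Sum of every m x n block of Pr via running sums, O(p*q) total, about
// four operations per interior block, as in Algorithm 3. Pr is p x q,
// row-major, with p >= m and q >= n. The result has (p-m+1) x (q-n+1)
// entries; entry (i, j) holds the sum of the block whose top-left
// corner is (i, j).
std::vector<std::vector<double>>
block_sums(const std::vector<std::vector<double>>& Pr, int m, int n) {
    int p = Pr.size(), q = Pr[0].size();
    // Pass 1: running sums of n consecutive columns along each row (Sr).
    std::vector<std::vector<double>> Sr(p, std::vector<double>(q - n + 1));
    for (int i = 0; i < p; ++i) {
        double sum = 0;
        for (int j = 0; j < n; ++j) sum += Pr[i][j];
        Sr[i][0] = sum;
        for (int j = 1; j + n - 1 < q; ++j)
            Sr[i][j] = Sr[i][j - 1] + Pr[i][j + n - 1] - Pr[i][j - 1];
    }
    // Pass 2: running sums of m consecutive row-sums down each column (Sf).
    std::vector<std::vector<double>> Sf(p - m + 1,
                                        std::vector<double>(q - n + 1));
    for (int j = 0; j + n - 1 < q; ++j) {
        double sum = 0;
        for (int i = 0; i < m; ++i) sum += Sr[i][j];
        Sf[0][j] = sum;
        for (int i = 1; i + m - 1 < p; ++i)
            Sf[i][j] = Sf[i - 1][j] + Sr[i + m - 1][j] - Sr[i - 1][j];
    }
    return Sf;
}
```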
5.2 Exploiting Strong Inter-Reference Autocorrelation
In some template matching applications, for example tracking objects across a video
sequence, one template image has to be correlated with multiple reference frames.
If the reference frames are correlated temporally, such as in the case of a static
surveillance camera, we can exploit their temporal autocorrelation (AT ) to get tight
transitive bounds. The concept is illustrated in Figure 5.2. In this scenario, the
central correlation (CC) is obtained by completely correlating the template image
with a specific reference frame. The correlation with the remaining frames is delayed
until evaluation of the transitive elimination test.
Using AT and CC as bounding correlations, we compute transitive upper and lower
bounds at all search locations in the remaining frames, and those match locations
with upper bounds less than the current known maximum (or an initial correlation
threshold) may be discarded without any loss of accuracy.
In some applications, for example automatically checking for missing components in
a circuit board manufacturing facility, the three image patches may happen to be
very similar. In that case, both the upper and the lower bounds are tight, as given
by Case I. In such applications, all search locations where the upper bound is less than
the maximum of the lower bound may also be skipped without any loss of accuracy. The
pseudo-code for the complete algorithm is given as Inter-Ref-TEA.
This algorithm also carries an overhead, but this time it is the temporal autocorrelation
of the sequence of reference frames. We employ a strategy similar to the
previous case and compute this overhead in O(pq), where p × q is the size of the reference
image. This is done by multiplying the two reference frames pixel by pixel and
then using the running-sum approach to compute the summation over all patches of
size m × n in the product array. This summation of products is the cross-correlation
between corresponding blocks of the two frames. Since the complexity of the running-sum
algorithm is O(pq), and pq integer multiplications are carried out beforehand,
the overall complexity of the inter-frame autocorrelation computation is of the order
of O(pq), which is significantly smaller than the O(mnpq) complexity of correlating one
template with one reference frame. Hence the computational cost of inter-frame
autocorrelation computation is insignificant compared to the overall cost of
template matching.
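The inter-frame overhead computation can be sketched as follows. Here we use a 2D prefix sum over the product image, an equivalent O(pq), constant-operations-per-block alternative to the running-sum pass of Algorithm 3; the names and zero-based index convention are ours.

```cpp
#include <vector>

// Sum of products over every m x n block of two aligned p x q frames,
// computed in O(p*q): pixelwise products followed by a 2D prefix sum,
// after which each block sum costs four operations.
std::vector<std::vector<double>>
interframe_block_correlation(const std::vector<std::vector<double>>& f1,
                             const std::vector<std::vector<double>>& f2,
                             int m, int n) {
    int p = f1.size(), q = f1[0].size();
    // Prefix sums of the pixelwise product image, with a zero border.
    std::vector<std::vector<double>> S(p + 1, std::vector<double>(q + 1, 0.0));
    for (int i = 0; i < p; ++i)
        for (int j = 0; j < q; ++j)
            S[i + 1][j + 1] = f1[i][j] * f2[i][j]
                            + S[i][j + 1] + S[i + 1][j] - S[i][j];
    // Each m x n block sum from four prefix-sum lookups.
    std::vector<std::vector<double>> out(p - m + 1,
                                         std::vector<double>(q - n + 1));
    for (int i = 0; i + m <= p; ++i)
        for (int j = 0; j + n <= q; ++j)
            out[i][j] = S[i + m][j + n] - S[i][j + n]
                      - S[i + m][j] + S[i][j];
    return out;
}
```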
Input: Sequence of Reference Images, Template Image, Initial Correlation Threshold
begin
    fc ⇐ Fully Correlated Reference Frame;
    CC ⇐ correlate(Template Image, fc);
    Output fc, imax, jmax, max(CC);
    foreach Remaining Reference Frame fk do
        AT ⇐ Autocorrelate fc with fk;
        Cmax ⇐ Initial Correlation Threshold;
        Lmax ⇐ Maximum of the transitive lower bound over fk;
        if Lmax > Cmax then
            Cmax ⇐ Lmax;
        end
        foreach Search Location in fk do
            UpperBound ⇐ AT·CC + √((1 − AT²)(1 − CC²));
            if UpperBound < Cmax then
                Skip Current Location;
            else
                C ⇐ correlate(Template, Current Search Location);
                if C > Cmax then
                    (Cmax, imax, jmax) ⇐ (C, Current Location Indices);
                end
            end
        end
        Output fk, imax, jmax, Cmax;
    end
end
Algorithm 4: Inter-Ref-TEA
5.3 Exploiting Strong Inter-Template Autocorrelation
In some template matching applications, for example registration of an aerial video
with a satellite image (Shah and Kumar, 2003b), a sequence of template frames is
to be correlated with the same reference image. In such applications, if consecutive
template frames exhibit strong inter-template autocorrelation, the transitive bounds
may be used to speed up the template matching process. For this purpose, we divide
the sequence of template frames into groups such that all templates within each
group exhibit strong autocorrelation A′T with the temporally central frame.

Figure 5.3: Inter-Template-TEA: Exploiting strong inter-template autocorrelation for fast template matching.

One such group of templates is shown in Figure 5.3, in which the central template is shown in
red, and the central correlation CC is obtained by correlating the central template with
the complete reference image. Then, using A′T and CC as bounding correlations, we
compute the transitive bounds upon the correlation of each remaining template in
the group. All match locations with transitive upper bounds less than the current
known maximum or the initial correlation threshold may be discarded without any
loss of accuracy.
Input: A Sequence of Template Images, Reference Image, Size of Group of Templates, Initial Threshold
begin
    foreach Group of Templates do
        tc ⇐ Central Template;
        CT ⇐ correlate(tc, Reference Image);
        foreach Non-central Template tn in the Current Group do
            CA ⇐ correlate(tc, tn);
            foreach Search Location in the Reference Image do
                UB ⇐ CT·CA + √((1 − CT²)(1 − CA²));
                if UB < Corrmax then
                    Skip Current Location;
                else
                    cl ⇐ Current Location Values;
                    Corr ⇐ correlate(cl, tn);
                    if Corr > Corrmax then
                        (Corrmax, imax, jmax) ⇐ (Corr, Current Location Indices);
                    end
                end
            end
            return tn, imax, jmax, Corrmax;
        end
    end
end
Algorithm 5: Inter-Template (IT) TEA
In long template video sequences, the temporal autocorrelation may vary significantly
over time, requiring different group lengths. To find the appropriate group length at
runtime, we have developed a simple algorithm which adapts the length of the current
group using the percentage computation elimination of the previous group.
Let the actual elimination obtained in the (k−1)-th group be $e_{act}^{k-1}$, and the maximum
possible elimination be $e_{max}^{k-1}$:

$$e_{max}^{k-1} = \frac{L[k-1]-1}{L[k-1]}, \qquad (5.4)$$

where L[·] denotes the length of a group. Equation (5.4) is based on the fact that
one central correlation must be performed. If these two eliminations are close to
each other, the autocorrelation may be under-utilized and the group length may be
increased, while if $e_{act}^{k-1}$ is significantly less than $e_{max}^{k-1}$, the autocorrelation is less than
expected, and therefore the group length, L[k − 1], must be decreased for the next group:

$$L[k] = \begin{cases} L[k-1]+2, & \text{if } e_{max}^{k-1}-e_{act}^{k-1} < \delta_l \\ L[k-1]-2, & \text{if } e_{max}^{k-1}-e_{act}^{k-1} > \delta_h \\ L[k-1], & \text{otherwise,} \end{cases} \qquad (5.5)$$
where δl and δh are low and high thresholds on the elimination. Keeping a very low value
of δl will result in an increase in the number of groups, and hence in the overhead of the
number of fully correlated templates, while keeping a high value of δh may cause an
increase in computational cost due to reduced elimination.
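Equations (5.4) and (5.5) translate directly into code; a small sketch with illustrative names, taking eliminations and thresholds as percentages:

```cpp
// Maximum possible elimination for a group of length L, Equation (5.4):
// one central correlation per group is mandatory.
double max_elimination_percent(int L) {
    return 100.0 * (L - 1) / L;
}

// Group-length adaptation of Equation (5.5), with low/high thresholds
// delta_l and delta_h on the elimination gap.
int adapt_group_length(int L_prev, double actual_elim_percent,
                       double delta_l, double delta_h) {
    double gap = max_elimination_percent(L_prev) - actual_elim_percent;
    if (gap < delta_l) return L_prev + 2;  // autocorrelation under-utilized
    if (gap > delta_h) return L_prev - 2;  // elimination lower than expected
    return L_prev;
}
```

With the thresholds used later in the experiments (δl = 3%, δh = 10%), a group of length 7 whose observed elimination is 84% (against a maximum of about 85.7%) grows to length 9, while one at 70% shrinks to length 5.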
The only overhead in this algorithm is the computation of the inter-template autocorrelation,
which is of the order of O(mn), where m × n is the template size. Therefore,
the computational cost of this overhead is negligibly small compared to the overall
computations.
5.4 Experiments with Transitive Elimination Algorithms
We have performed extensive empirical evaluation of the three different types of
template matching problems described in the last three sections. Our experiments
are performed on ten different datasets, consisting of 424 reference images and 8465
template images. The size of the reference images ranges from 240 × 320 to 1394 × 2194
pixels, while the smallest template is of size 16 × 8 and the largest is 128 × 128
pixels. No template image is cropped from within a reference image, and the templates
contain various types of distortions, described in each subsection.
The proposed algorithms were implemented in C++ and compared with the currently
known fast exhaustive template matching techniques, including the FFT based
frequency domain implementation Lewis (1995), Zero-mean Bounded Partial Correlation
(ZBPC) Di Stefano et al. (2005), Zero-mean Enhanced Bounded Correlation
(ZNccEbc) Mattoccia et al. (2008b) and an exhaustive spatial domain implementation
(Spat) Haralick and Shapiro (1992). We have implemented the ZBPC algorithm,
and all experiments are carried out with a correlation area of 20% and a bound area
of 80%, as recommended in Di Stefano et al. (2005). The implementation of the ZNccEbc
algorithm provided by the original authors Mattoccia et al. (2008b) has been used,
and the parameter representing the number of partitions, r, has been set to 8
where possible, as recommended in Mattoccia et al. (2008b). However, for template sizes
that are not divisible by 8, a suitable value of r has been selected as described
later.
Other than correlation based measures, we have also implemented the Sum of Absolute
Differences (SAD) with the Partial Distortion Elimination Montrucchio and Quaglia
(2005) and Successive Elimination Algorithm (Li and Salari, 1995) optimizations. In
order to ensure a realistic comparison, we have used only sequential implementations
of all algorithms. The execution times are measured on an IBM machine with an Intel
Core 2 CPU (2.13 GHz) and 1 GB RAM.
The experiments are divided into six subsections. The first five subsections correspond to
the three proposed elimination algorithms using the correlation coefficient match
measure, and in the sixth, the elimination performance of different correlation
based measures is compared. The datasets used in each group,
implementation codes, and experimental setup details, along with complete results,
are available on our web site: http://cvlab.lums.edu.pk/tea.
Figure 5.4: Satellite Image (SI) dataset used for experiments on exploiting intra-reference autocorrelation.
Figure 5.5: Two Circuit Board (TCB) and Circuit Board (CB) datasets used for experiments on exploiting intra-reference autocorrelation.
Figure 5.6: Aerial Image (AI) dataset used for experiments on exploiting intra-reference autocorrelation.
5.4.1 Experiments with Intra-Reference Auto-correlation
These experiments are performed on four datasets: the Satellite Images (SI) dataset,
Aerial Images (AI) dataset, Circuit Board (CB) dataset and Two Circuit Boards
(TCB) dataset (see Table 5.1 and Figures 5.4, 5.5 and 5.6). The images to be matched
have projective distortions due to differences in viewing geometry. In addition, the
reference image of the SI dataset has high brightness while the templates have low
brightness and contrast; these brightness and contrast variations were synthetically
introduced into the dataset. In the CB and TCB datasets, templates and reference
images are taken from different boards. In the AI dataset, available from flickr.com
under a 'Creative Commons' license, the templates and the reference are aerial images
of the same scene, taken from two different aircraft positions.
For the Intra-Ref-TEA algorithm, the results reported in Table 5.2 are for a group
size of 5 × 5 search locations for all datasets. For the ZNccEbc algorithm, when the
number of rows of the template was not divisible by 8, we picked the factor which was
perceived to generate the higher speedup. We selected r = {8, 8, 8, 8, 8, 8, 17, 17, 17, 5, 9}
for SI(a,b,c), CB(a,b,c), TCB(a,b,c) and AI(a,c) respectively. The TCB(a,b,c) datasets
were also experimented with r = {2, 3, 4}, which yielded execution times of {329.51, 403.17,
455.63} seconds. These timings are significantly larger than the timings for r = 17,
as given in Table 5.2. The templates in the AI.b dataset have 97 rows, which, being a
prime number, cannot be factorized; therefore one may select r = 1 or r = 97. We
experimentally compared the two choices and found r = 97 to be more efficient. In
Table 5.2, the ZNccEbc results on the AI.b dataset are reported for r = 97.
Instead of using a coarse-to-fine strategy to initialize ZNccEbc, ZBPC and Intra-Ref-TEA,
a fixed initial correlation threshold of ρ = 0.80 has been used. Table 5.2 shows the
total execution time taken by each algorithm on each dataset. The execution time
reported for Intra-Ref-TEA includes the local auto-correlation computation overhead,
which is {1.463 s, 0.270 s, 0.505 s, 0.963 s} for the AI, CB, SI and TCB datasets respectively.
The execution time speedup of Intra-Ref-TEA over the other algorithms is dataset
dependent. The maximum observed speedup over ZBPC is 15.549 times, over ZNccEbc
4.464 times, over FFT 24.626 times and over Spat 22.680 times. Intra-Ref-TEA
Table 5.1: Dataset description for experiments with Intra-Ref-TEA

Dataset   Template Size a   Template Size b   Template Size c   Total Frames   Reference Size
SI        64×64             112×112           128×128           711            800×1000
TCB       34×34             51×51             68×68             579            807×1716
CB        16×8              24×12             32×16             328            762×1000
AI        95×95             97×97             99×99             171            1453×1548
has remained faster than the other correlation coefficient based algorithms, while for the CB
and SI datasets SAD has exhibited the highest speed. However, SAD suffers badly from
lack of accuracy on these datasets, due to the brightness and contrast variations. For
SI, the accuracy of SAD is zero percent, and for the CB dataset, out of 328 templates
only 25 were correctly matched. In contrast, the accuracy of the correlation coefficient
based algorithms has remained 100% over all datasets.
Over a portion of the four datasets {AI.a, CB.c, TCB.c, SI.c}, the variation of percentage
computation elimination and average execution time per template has been studied
by varying the group size parameter over {3×3, 5×5, 7×7, 9×9} (see Table 5.6).
The datasets AI.a and TCB.c have shown the best performance at a group size of 5×5,
while CB.c and SI.c performed best at 3×3 and 7×7, respectively. Thus, by tuning
the group size parameter, the speedups reported in Table 5.2 may be further improved
for the CB and SI datasets, even though all experiments reported in Table 5.2 are for
the 5×5 group size.
The maximum, minimum, and average speedups of TEA over the other algorithms, with
confidence intervals for a confidence level of 0.95, are reported in Table 5.3. The average
speedup along with the confidence intervals is plotted in Figure 5.7.
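The confidence intervals reported in the speedup tables (e.g., Table 5.3) use $z_\alpha\sigma/\sqrt{N}$ with $z_\alpha = 1.645$. A small sketch; the helper names are ours, and σ is taken as the population standard deviation of the speedup samples:

```cpp
#include <cmath>
#include <vector>

// Mean of the speedup samples.
double mean(const std::vector<double>& x) {
    double s = 0;
    for (double v : x) s += v;
    return s / x.size();
}

// Half-width of the confidence interval, z_alpha * sigma / sqrt(N),
// with z_alpha = 1.645 as used in the thesis tables.
double confidence_halfwidth(const std::vector<double>& x,
                            double z_alpha = 1.645) {
    double mu = mean(x), ss = 0;
    for (double v : x) ss += (v - mu) * (v - mu);
    double sigma = std::sqrt(ss / x.size());
    return z_alpha * sigma / std::sqrt(static_cast<double>(x.size()));
}
```

A table entry of the form "9.72±2.25" is then simply `mean(x)` and `confidence_halfwidth(x)` over that row's per-dataset speedups.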
5.4.2 Experiments with Inter-Reference Auto-correlation: Fast Feature Tracking
In this experiment, manually extracted features are tracked across the Pedestrian (PED)
and Cyclist (CYC) video datasets. Both videos were acquired in a typical surveillance
scenario (see Table 5.7 and Figure 5.8). Both datasets contain dissimilarities produced
Table 5.2: Total execution time in seconds taken by Intra-Ref-TEA and other algorithms on the datasets described in Table 5.1

Dataset  IR-TEA  ZBPC    ZNccEbc  FFT     SAD     Spat
AI.a     108.89  1176.6  306.91   368.05  460.89  2375.9
AI.b     141.61  1604.3  632.19   474.49  625.14  3099.4
AI.c     185.53  2194.7  413.00   639.87  812.38  4207.8
CB.a     9.71    24.14   29.34    193.07  1.45    35.62
CB.b     17.14   51.86   28.76    188.03  2.91    70.66
CB.c     26.58   81.72   31.94    191.25  5.87    118.58
TCB.a    63.80   675.70  249.03   880.89  50.23   870.78
TCB.b    103.10  1426.0  263.64   848.97  130.38  1827.0
TCB.c    160.75  2499.5  278.67   838.24  267.31  3260.90
SI.a     352.68  2332.0  460.55   1307.5  5.13    2557.0
SI.b     449.84  5717.9  831.12   1152.7  12.13   6356.6
SI.c     465.31  6882.4  961.84   1108.9  15.76   7667.3
Table 5.3: Maximum, minimum and average speedup of Intra-Ref-TEA (from Table 5.2). Speedup is computed by dividing the execution time of each algorithm by the execution time of TEA. Confidence intervals $z_\alpha\sigma/\sqrt{N}$ are also computed for α = 0.05 (confidence level of 0.95) and zα = 1.645, where σ is the standard deviation of the speedup for N = 7 datasets.

              ZBPC       ZNccEbc    FFT        SAD         Spat         PCE
MaxSpeedup    15.55      4.46       19.88      4.41        22.68        1
MinSpeedup    2.49       1.20       2.38       0.01        3.67         1
AvgSpeedup    9.72       2.40       7.01       1.45        14.01        1
ConfInterval  9.72±2.25  2.40±0.48  7.01±2.59  1.45±0.87   14.01±3.52   1±0
Table 5.4: Fast sequence to reference image alignment: computation elimination (%) comparison between different algorithms

Dataset  TEA     ZBPC    ZNccEbc  FFT   SAD     Spatial
CB.1     87.278  2.832   96.201   0.00  63.517  0.00
CB.2     90.528  2.524   94.426   0.00  52.131  0.00
CB.3     92.137  2.407   95.310   0.00  50.683  0.00
CB.4     93.045  2.211   96.275   0.00  49.365  0.00
SI.1     86.368  9.035   93.563   0.00  99.841  0.00
SI.2     89.851  10.487  93.521   0.00  99.807  0.00
SI.3     91.767  10.365  93.687   0.00  99.806  0.00
SI.4     93.030  10.403  94.220   0.00  99.826  0.00
SI.5     94.044  10.778  94.165   0.00  99.810  0.00
Figure 5.7: Plot of the average execution time speedup of Intra-Ref-TEA over the other algorithms, with confidence intervals for a confidence level of 0.95. The corresponding values are given in Table 5.3.
Table 5.5: Local autocorrelation computation time in seconds for the LA-Algorithm (LAF) and for the previous algorithm (CEA) Mahmood and Khan (2008), for a group size of 5×5 search locations.

SI Dataset  SI.1   SI.2    SI.3    SI.4    SI.5
LAF Time    0.499  0.499   0.499   0.484   0.484
CEA Time    8.937  13.780  18.639  24.216  30.184

CB Dataset  CB.1   CB.2    CB.3    CB.4
LAF Time    1.671  1.671   1.671   1.656
CEA Time    8.390  18.013  32.730  49.260
Table 5.6: Intra-Ref-TEA: variation of percent computation elimination (%E) and average execution time in seconds per template (T) with the group size parameter (GrSz).

GrSz     3×3           5×5           7×7           9×9
DSet     T     %E      T     %E      T     %E      T     %E
AI.a     7.01  87.2    2.74  95.1    2.99  94.6    4.69  91.4
CB.c     0.18  85.3    0.24  80.3    0.25  79.2    0.27  77.3
TCB.c    2.04  88.8    0.86  95.4    0.96  94.8    1.79  90.1
SI.c     3.23  88.9    1.24  95.8    1.13  96.1    1.60  94.4
Figure 5.8: (a) Pedestrian dataset: four reference frames and 21 feature templates. (b) Cyclist dataset: four reference frames and 5 feature templates.
Figure 5.9: Fast component tracking dataset: 6 reference frames and 5 component templates. Original images were taken from (Mattoccia et al., 2008a).
Table 5.7: Dataset description for the fast feature tracking and fast component tracking experiments

Dataset  # of Feat.  Feat. Size  # of Frames  Frame Size
PED      21          23×11       325          240×320
CYC      5           17×17       38           240×320
CT.a     6           63×63       16           479×640
CT.b     1           178×62      16           479×640
CT.c     1           136×104     16           479×640
CT.d     1           147×63      16           479×640
AT       20          95×95       25           1453×1548
by human motion as well as frame-to-frame illumination variations. The initial correlation
threshold is set to 0.70 for each of the ZNccEbc, ZBPC and Inter-Ref-TEA algorithms.
The partition parameter r in ZNccEbc has been set to 23 for PED and 17 for
CYC. The total execution times for Inter-Ref-TEA in Table 5.8 include the
time of the central correlation and inter-frame temporal autocorrelation overheads.
In these experiments Inter-Ref-TEA has remained significantly faster than all the other
algorithms. The maximum execution time speedup over ZBPC is 7.147 times, over
ZNccEbc 9.410 times, over FFT 15.073 times, over SAD 2.020 times and over Spat 7.400 times.
The slow execution times of the ZNccEbc algorithm are due to unfavorable template
sizes, which increased the bound computation overhead. The percentage of eliminated
computations is also reported in Table 5.9. For the PED dataset the ZNccEbc algorithm
has obtained the maximum elimination, while for the CYC dataset Inter-Ref-TEA
has obtained the maximum computation elimination. Despite the high computation elimination
obtained by the ZNccEbc algorithm on the PED dataset, it has remained significantly
slower than all the other algorithms, including the exhaustive spatial domain implementation,
Spat. This is because, in the ZNccEbc algorithm, the overhead cost
of bound computation is significantly larger than the elimination benefit obtained from
the skipped computations.
Table 5.8: Total execution time in seconds for the datasets described in Table 5.7 for Inter-Ref-TEA and other algorithms.

Data  IRTEA   ZNccEbc  ZBPC    FFT     SAD     Spat
PED   58.30   548.61   340.38  268.91  110.27  374.51
CYC   1.50    14.04    10.72   22.61   3.03    11.10
CT.a  12.76   65.90    198.50  166.08  86.27   263.45
CT.b  4.06    27.92    82.05   27.59   49.53   88.53
CT.c  3.31    29.88    101.64  27.69   73.47   125.72
CT.d  2.05    8.62     57.70   27.72   38.08   81.68
AT    223.43  5235.1   22668   4171.3  6105.7  27099
Table 5.9: Percentage computation elimination for Inter-Ref-TEA and other elimination algorithms.

Dataset  IR-TEA  ZNccEbc  ZBPC    SAD
PED      80.496  93.839   12.571  75.583
CYC      93.749  89.607   8.451   77.259
CT.a     92.250  97.691   24.957  69.192
CT.b     88.150  93.047   8.103   49.063
CT.c     91.029  98.631   19.751  43.889
CT.d     93.162  99.560   29.585  57.397
AT       95.755  98.947   17.733  78.318
5.4.3 Experiments with Inter-Reference Auto-correlation: Fast Component Tracking
In this dataset there is no local motion, and the component templates are significantly
larger than the feature templates. Two datasets are used: Component Tracking (CT)
and Aerial Tracking (AT) (see Figure 5.9 and Table 5.7). The original images in CT
were taken from Mattoccia et al. (2008a), and the AT dataset is a portion of the AI
dataset used in Subsection 5.4.1. The following frame-to-frame variations were
synthetically produced: affine photometric variations, non-linear photometric variations,
contrast complementing, sharpening by edge enhancement, and geometric transformation
of the original images.
An initial correlation threshold of 0.70 has been used for Inter-Ref-TEA, ZNccEbc and
ZBPC. The central correlations in Inter-Ref-TEA have been computed using the
FFT based implementation. For ZNccEbc, the r parameter has been set to
{7, 89, 8, 7, 5} for CT(a-d) and AT. For CT.b, we experimented with r = 2 as well;
however, we found that the performance of ZNccEbc is better with r = 89, which
is reported in Table 5.8. From the total execution times reported in Table 5.8, the
maximum speedup observed for Inter-Ref-TEA over the ZNccEbc algorithm is 23.431
times, over ZBPC 101.46 times, over FFT 18.669 times, over SAD 27.327 times and over Spat
121.290 times.
The maximum, minimum, and average speedups of TEA compared to the other algorithms,
with confidence intervals for a confidence level of 0.95, are reported in Table 5.10. The average
speedup of TEA and the confidence intervals are also plotted in Figure 5.10.
5.4.4 Experiments with Inter-Template Auto-correlation: Video Geo-registration
These experiments are performed on two datasets, DS1 and DS2 (see Table 5.11 and
Figure 5.11). The two reference images are 800K-pixel and 3000K-pixel satellite
images taken from Google Earth, earth.google.com. The video frames are acquired
by modeling a flight simulation on satellite images of the same area, but captured
Table 5.10: Maximum, minimum and average speedup of TEA for the fast component tracking experiments (Table 5.8). Speedup is computed by dividing the execution time of each algorithm by the execution time of TEA. Confidence intervals $z_\alpha\sigma/\sqrt{N}$ are also computed for α = 0.05 (confidence level of 0.95) and zα = 1.645, where σ is the standard deviation of the speedup for N = 7 datasets.

              ZNccEbc     ZBPC         FFT         SAD         Spat         PCE
MaxSpeedup    23.43       101.45       18.67       27.33       121.28       1
MinSpeedup    4.20        5.84         4.61        1.89        6.42         1
AvgSpeedup    10.57       35.15        11.48       13.35       42.57        1
ConfInterval  10.57±4.74  35.15±24.18  11.48±3.49  13.35±6.75  42.57±28.97  1±0
Figure 5.10: Plot of the average execution time speedup of TEA for the fast component tracking experiments, with confidence intervals for a confidence level of 0.95. The corresponding values are given in Table 5.10.
at a different time of the year, provided by Microsoft TerraServer, now available at
www.terraserver.com. In the simulation, the scale and orientation are assumed to
be approximately the same as those of the reference images. The images to be matched
contain dissimilarities due to differences in imaging sensor and viewing geometry.
Additional dissimilarities were generated by reducing the dynamic range of the templates in
DS1 to one third of the original range, while the templates in DS2 were contrast
reversed. Contrast reversals are frequently observed in practical situations when matching
is to be done across infra-red and optical imagery.
For ZNccEbc, ZBPC and Inter-Template TEA (IT-TEA), the initial correlation threshold is
0.80 for DS1 and -0.85 for DS2. In the ZNccEbc algorithm, r = 8 has been used for
both datasets. In IT-TEA, the correlation of the central templates with the reference
images is computed using the FFT based implementation, and the length of the
group of templates is initialized to 7 for DS1 and 5 for DS2. For the remaining groups,
the length was automatically adapted by using δl = 3% and δh = 10% in Equation (5.5).
The average group length remained {8.6, 10.9, 11.6, 12.1, 12.4, 7.8, 7.2, 7.7, 8.2, 7.7} for the DS1(a-e) and DS2(a-e) datasets respectively.
Execution time comparison of IT-TEA and other algorithms is given in Table 5.12.
For DS1, maximum execution time speedup of IT-TEA over ZBPC is 9.772 times, over
ZNccEbc is 1.685, over FFT is 3.610 and over Spat is 15.101 times. For DS2, maximum
observed speedup of IT-TEA over ZBPC is 10.218, over ZNccEbc is 6.376, over FFT is
3.057 and over Spat is 10.264 times. The low performance of ZBPC and ZNccEbc
on DS2 can be attributed to the fact that these algorithms have been developed
to find only the positive maximum of the correlation coefficient, whereas in the case of DS2
negative peaks have to be searched. The transitive elimination algorithm does not require
any modification to search for negative peaks.
Maximum speedup, minimum speedup, and average speedup of TEA, along with confidence
intervals for a confidence level of 0.95, are reported in Table 5.13. The average speedup
of TEA, along with confidence intervals, is plotted in Figure 5.12.
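For reference, the speedup statistics and the z_α σ/√N confidence intervals used in Tables 5.10 and 5.13 can be computed as follows; a minimal Python sketch with hypothetical per-dataset speedup values, not the thesis code:

```python
import math

def speedup_stats(speedups, z_alpha=1.645):
    """Max/min/mean speedup with a z-based confidence interval
    z_alpha * sigma / sqrt(N), as used for Tables 5.10 and 5.13
    (z_alpha = 1.645 for a 0.95 confidence level)."""
    n = len(speedups)
    mean = sum(speedups) / n
    # sigma: standard deviation of the per-dataset speedups
    sigma = math.sqrt(sum((s - mean) ** 2 for s in speedups) / n)
    half_width = z_alpha * sigma / math.sqrt(n)
    return max(speedups), min(speedups), mean, half_width

# Hypothetical per-dataset speedups, for illustration only
max_s, min_s, avg, ci = speedup_stats([4.2, 23.4, 10.5, 8.0, 12.1, 6.3, 9.5])
```

The reported interval is then mean ± half_width, one row per competing algorithm.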
Table 5.11: Dataset details used for video geo-registration experiments

Dataset  # of Frames  Frame Size  Ref. Size   Avg. ρmax
DS1.a    734          64×64       736×1129    0.939
DS1.b    744          80×80       736×1129    0.961
DS1.c    694          96×96       736×1129    0.963
DS1.d    641          112×112     736×1129    0.961
DS1.e    594          128×128     736×1129    0.958
DS2.a    659          64×64       1394×2152   -0.935
DS2.b    645          80×80       1394×2152   -0.921
DS2.c    648          96×96       1394×2152   -0.874
DS2.d    632          112×112     1394×2152   -0.924
DS2.e    616          128×128     1394×2152   -0.794
Table 5.12: Video geo-registration: average execution time in seconds per template frame.

Dataset  IT-TEA  ZNccEbc  ZBPC     FFT     SAD     Spat
DS1.a    1.217   1.366    6.415    4.223   0.107   8.455
DS1.b    1.156   1.675    8.575    4.173   0.156   13.587
DS1.c    1.413   2.314    12.736   4.161   0.258   18.553
DS1.d    1.669   2.787    16.310   4.261   0.436   24.018
DS1.e    1.977   3.333    16.715   4.266   0.610   29.855
DS2.a    6.394   16.725   32.163   19.547  2.969   32.848
DS2.b    8.614   28.378   53.303   19.552  4.976   53.760
DS2.c    12.290  42.751   74.399   19.606  7.030   74.933
DS2.d    15.534  58.995   98.432   19.458  9.374   99.027
DS2.e    12.250  78.110   125.170  19.563  11.959  125.740
Table 5.13: Maximum, minimum and average speedup of TEA for the Video Geo-registration experiment (Table 5.12). Speedup is computed by dividing the execution time of each algorithm by the execution time of TEA. Confidence intervals z_α σ/√N are also computed for α = 0.05 (confidence level of 0.95), z_α = 1.645, where σ is the standard deviation of the speedup over N = 10 datasets.

              ZNccEbc    ZEBC       FFT        SAD        Spat      PCE
MaxSpeedup    6.376      10.218     3.60       0.976      15.10     1
MinSpeedup    1.12       5.03       1.25       0.088      5.14      1
AvgSpeedup    2.71       7.37       2.45       0.42       9.54      1
ConfInterval  2.71±0.83  7.37±0.98  2.45±0.43  0.42±0.14  9.54±2.0  1±0
Figure 5.11: Video geo-registration dataset: (a) DS1 (b) DS2. In both datasets, reference images are taken from earth.google.com and templates from terraserver.microsoft.com.
Figure 5.12: Plot of average execution time speedup of TEA on the Video Geo-registration dataset. Confidence intervals for a confidence level of 0.95 are also plotted. Corresponding values may be seen in Table 5.13.
Figure 5.13: Rotation and Scale invariant template matching: Nine reference imagesand 14 templates.
5.4.5 Experiments with Inter-Template Auto-correlation: Ro-
tation / Scale Invariant Template Matching
Consecutive rotated and scaled versions of an object are generally highly correlated.
We have used this correlation to speed up exhaustive rotation/scale invariant
template matching by using IT-TEA. These experiments are performed on an optical
character recognition dataset using scanned pages from multiple books. The template
images consist of 14 letters: {a, c, e, g, i, k, m, o, p, s, v, w, x, z}, which were
extracted from one of the scanned images (see Table 5.14 and Figure 5.13). Each
template is rotated from -5° to +5° and scaled from -8% to +8% at a step size of
2%, resulting in 99 rotated/scaled versions. All of these rotated/scaled versions are
exhaustively correlated with each of the 14 reference images, which have varying background
colors, arbitrary rotations, arbitrary scaling, aliasing effects due to poor scanner
resolution, and broken and irregular character boundaries.
Out of 99 rotated/scaled versions of each template, only one template (with zero
rotation and unit scaling) is fully correlated with the complete reference image while
for all of the remaining templates, transitive bounds are computed. In these experiments,
the initial correlation threshold is set to 0.80 for ZBPC, ZNccEbc and IT-TEA. In
ZNccEbc, the partition parameter r is set to {19, 19, 17, 13, 5, 5, 9, 9, 5, 9, 9, 19, 9, 19} respectively for the 14 templates given in Table 5.14.
For each algorithm, total execution time including all overheads is shown in Table
5.15. The maximum execution time speedup obtained by IT-TEA is 28.292 times
over ZBPC, 30.322 times over ZNccEbc, 126.70 times over FFT, 12.674 times over SAD
and 29.261 times over Spat. On this dataset, the speedup obtained by IT-TEA over the other
algorithms is enhanced because of the small template sizes and high autocorrelation
between consecutive rotated/scaled template versions.
Table 5.14: Rotation and Scale invariant template matching: dataset for character recognition

Letter  Tmp. Size  Ref. Size   Letter  Tmp. Size  Ref. Size
a       19×14      679×889     o       18×17      671×1215
c       19×15      755×977     p       25×17      702×1206
e       17×15      552×1005    s       18×12      711×1224
g       26×16      593×1209    v       18×17      681×1271
i       25×8       907×1263    w       19×23      756×1341
k       25×17      684×1031    x       18×16      475×1463
m       18×24      647×1046    z       19×15      291×758
Table 5.15: Rotation and Scale invariant template matching: total execution time (in seconds) for IT-TEA and other algorithms.

Dataset  IT-TEA  ZNccEbc  ZBPC    FFT     SAD     Spat
a        43.06   1164.7   849.77  4975.6  322.97  836.80
c        41.20   1187.2   880.01  4769.8  372.29  891.00
e        40.12   1091.7   784.39  4808.9  308.02  807.84
g        50.37   974.53   1230.8  4761.0  516.43  1245.2
i        47.64   445.77   682.37  4804.7  253.40  679.14
k        45.45   643.08   1285.9  4756.3  578.13  1289.0
m        63.67   760.49   1250.7  5132.5  467.56  1311.3
o        42.67   699.70   921.67  4803.1  370.72  955.35
p        46.43   559.51   1286.8  4845.2  546.72  1288.7
s        38.01   680.66   712.88  4815.8  260.81  706.61
v        39.75   682.33   927.68  4829.8  387.66  954.36
w        45.21   1311.5   1264.9  5046.7  571.79  1322.9
x        41.85   733.22   878.91  4886.5  399.99  888.03
z        40.22   1219.6   900.72  4816.2  457.56  893.48
Table 5.16: Total execution time (T) (sec) and average percent elimination (E) for cross-correlation (ψ), NCC (φ) and the correlation-coefficient (ρ).

Dataset  Tψ      Tφ      Tρ      Eψ     Eφ     Eρ
PED      23.614  45.75   58.318  99.36  82.24  80.5
DS1.a    213.66  967.32  732.12  95.13  75.4   83.21
DS1.b    316.42  1321.6  861.47  94.66  70.5   85.32
DS1.c    548.14  1387.9  980.37  92.56  71.42  85.79
DS1.d    777.27  1462.1  1069.9  89.13  72.34  86.29
DS1.e    931.09  1572.1  1174.4  85.26  72.6   86.49
5.4.6 Performance Comparison of Different Correlation Based
Measures
We compared the execution times and the computation elimination performance of the
three correlation based similarity measures: cross-correlation, NCC and correlation-
coefficient, on six datasets: DS1 (a, b, c, d, e) and PED. For DS1, IT-TEA has been
used for comparison, and for PED, Inter-Ref-TEA. The total execution time and
the average computation elimination per frame are reported in Table 5.16.
In these experiments we observe that cross-correlation is the fastest of the three
measures. Maximum speedup obtained by cross-correlation over NCC is 4.527 times
and over correlation coefficient is 3.427 times. NCC was found to be faster than
correlation coefficient on the PED dataset, while slower on the DS1 datasets. This may
be because NCC is not robust to additive intensity variations; in the presence of
such variations the magnitude of the NCC maximum may decrease, causing a reduction
in elimination and an increase in execution time. However, it should be noted
that the relative speedups are data dependent and may vary for other datasets.
5.5 Conclusion
In Chapter 5 we have demonstrated that the transitive property of correlation
based match measures may be exploited for fast template matching by developing
different elimination algorithms. Three variations of transitive elimination algorithms
are presented, which cater to different types of template matching problems. The
proposed algorithms have exhaustive-equivalent accuracy and are compared with
currently known fast exhaustive techniques on a wide variety of real image datasets.
Our empirical results, based on the correlation of 8465 templates with 424
reference images, demonstrate that the proposed algorithms outperform the
currently known algorithms by a significant margin.
Chapter 6
PARTIAL CORRELATION ELIMINATION
ALGORITHMS
Bound based computation elimination algorithms are of special interest in mission
critical applications because these algorithms guarantee exhaustive-equivalent
accuracy despite a large amount of skipped computations. In Chapter 3, elimination
algorithms were broadly divided into two categories: complete elimination algorithms
and partial elimination algorithms. The category of complete elimination algorithms
contains the transitive elimination algorithms discussed in the last chapter. Transitive
elimination algorithms exploit strong autocorrelation present in a template matching
system to skip computations and to obtain high speedup. Strong autocorrelation may
be found in many template matching systems; however, it cannot be guaranteed in
general. As discussed in the last two chapters, in the absence of strong autocorrelation,
the speedup performance of transitive elimination algorithms may degrade.
Therefore, in such cases, a more generic elimination scheme is required which does
not depend on the autocorrelation function. In the current chapter, we propose
partial correlation elimination algorithms for correlation coefficient based fast template
matching. These algorithms are generic, and their performance is independent
of the autocorrelation function of the template matching system.
Most of the existing partial elimination algorithms have been developed for sim-
ple image match measures including Sum of Absolute Difference (SAD) and Sum of
Squared Differences (SSD). However, these measures are not invariant to brightness
and contrast variations, which frequently occur in practical problems. Compared
to SAD and SSD, correlation coefficient is more robust: it is invariant to linear
intensity distortions and is therefore preferred when such distortions are present. A
wide variety of applications using correlation coefficient as a preferred match measure
have been listed in Chapter 1. Therefore an efficient partial elimination scheme for
correlation coefficient based template matching is of significant practical importance.
The partial elimination algorithms for SAD and SSD, for example Partial Distor-
tion Elimination (PDE) algorithms and Sequential Similarity Detection Algorithms
(SSDA), exploit the fact that these measures grow monotonically as consecutive pix-
els are processed within a block at a particular search location. The final value of
distortion is always equal to or larger than the intermediate values. Therefore, the
basic underlying principle of these algorithms is to skip the remaining computations
as soon as the current value of distortion exceeds the previously known minimum.
This is because a location whose partial distortion is larger than the currently known
minimum distortion cannot beat the currently known best match location. Therefore,
all such locations are skipped without any loss of accuracy.
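The PDE principle described above can be sketched as follows; an illustrative Python sketch, with hypothetical function names and data, not taken from the PDE literature:

```python
def sad_with_pde(block, candidate, best_so_far):
    """Partial Distortion Elimination for SAD: accumulate absolute
    differences row by row and abandon the candidate as soon as the
    running (monotonically non-decreasing) sum exceeds the best SAD
    seen so far. Returns the final SAD, or None if eliminated."""
    partial = 0
    for row_b, row_c in zip(block, candidate):
        partial += sum(abs(b - c) for b, c in zip(row_b, row_c))
        if partial > best_so_far:   # elimination test
            return None             # cannot beat the current best match
    return partial

def best_match(block, candidates):
    """Scan all candidate locations, keeping the minimum SAD."""
    best_sad, best_idx = float("inf"), -1
    for idx, cand in enumerate(candidates):
        sad = sad_with_pde(block, cand, best_sad)
        if sad is not None and sad < best_sad:
            best_sad, best_idx = sad, idx
    return best_idx, best_sad

idx, sad = best_match([[1, 2], [3, 4]],
                      [[[9, 9], [9, 9]], [[1, 2], [3, 5]], [[1, 2], [3, 4]]])
```

Because the running distortion can only grow, skipping a location once the test fires provably loses no accuracy.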
Partial elimination techniques as applied to SAD or SSD cannot be extended in a
straightforward manner to speed up correlation coefficient based image matching,
because of two unfavorable properties. Firstly, the growth of correlation coefficient
is non-monotonic as consecutive pixels within a block are processed; therefore no
intermediate value is guaranteed to be larger than the final value. Secondly, the
best match location over the entire search space is often defined as the location
exhibiting the maximum value of correlation coefficient. Hence a previously known
maximum may not be exploited to discard the remaining computations of a block at an
intermediate stage. This is why partial elimination algorithms have largely been
considered inapplicable to correlation coefficient based template matching (Brown, 1992;
Zitova and Flusser, 2003; Pratt, 2007; Barnea and Silverman, 1972; Wu, 1995a), with
the exception of a recently proposed technique (Mattoccia et al., 2008b), which we have
discussed in detail in Chapter 3.
One of the major contributions of this thesis is the development of partial elimination
algorithms for correlation coefficient based fast template matching. In these
techniques, we extend the concept of PDE and SSDA algorithms to correlation coefficient
based template matching; therefore, by analogy, we have named these techniques
Partial Correlation Elimination (PCE) algorithms. PCE algorithms are based on
a monotonic formulation of correlation coefficient. To the best of our knowledge, this
formulation has not previously been used to speed up the template matching process. If
correlation coefficient is computed using this formulation, the similarity starts from
+1 at the first pixel of a block and monotonically decreases to the final value till
Figure 6.1: If computations are done with the traditional correlation-coefficient formulation, the partial value of similarity grows non-monotonically. Such growth patterns are shown for four different pairs of 8 × 8 pixels image blocks, yielding correlation-coefficients of {0.464, 0.680, 0.108, -0.210}.
the end of the computations (Figure 6.2). Any intermediate value of similarity is
always larger than (or equal to) the final value. The speedup occurs because, at any
point during the computation, if the similarity happens to be less than a previously
known maximum (or an initial threshold), the remaining computations become redundant
and may be skipped without any loss of accuracy. As the total amount of skipped
computations increases, the template matching process accelerates accordingly. In
this chapter we present only the Basic Mode PCE algorithm, while further extensions
of the PCE algorithm will be discussed in Chapter 7.
In the PCE algorithm, the amount of eliminated computations depends on the location
and value of the currently known maximum. A high maximum found at the start
of the search process may significantly increase computation elimination and hence
reduce the execution time. If an approximate position of the maximum is known
from the context of the problem, such as in block motion estimation, the search process
may start from that location. If no such guess is available, we propose an intelligent
re-arrangement of PCE computations, motivated by the two-stage template matching
technique (Vanderbrug and Rosenfeld, 1977), as a means of finding a high threshold
Figure 6.2: If computations are done with our proposed monotonic formulation, the partial value of similarity monotonically decreases from +1.00 to the correlation-coefficient between the two image blocks. The monotonic growth pattern is shown for the same four pairs of 8 × 8 pixels image blocks as used in Figure 6.1.
early in the search process. In the first stage of the proposed technique, only a small
portion of the template is matched at all search locations. Based on the partial
results, the complete correlation coefficient is computed at the best match location, and
this value is used as the initial threshold in the second stage. By using this strategy, we
may quickly find a high threshold at no additional computational cost, and the speedup
is obtained with no loss of accuracy. This initialization scheme is effective for small to
medium sized templates, while for larger template sizes initialization of PCE with a
coarse-to-fine scheme is more efficient. Two-stage PCE is exact, having exhaustive-equivalent
accuracy. In contrast, the existing two-stage algorithm for normalized
cross-correlation (NCC), developed by Goshtasby et al. (1984), is approximate, with a
non-zero probability of missing the NCC maximum.
6.1 Monotonic Formulation of Correlation Coefficient

The correlation coefficient between a template image t and any search location in the
reference image r_i, i ∈ {1, 2, 3, ..., p}, each of size m × n pixels, is defined as (Haralick
and Shapiro, 1992)

\rho_{t,i} = \frac{\sum_{x=1}^{m}\sum_{y=1}^{n} (t(x,y)-\mu_t)\,(r_i(x,y)-\mu_i)}{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (t(x,y)-\mu_t)^2}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (r_i(x,y)-\mu_i)^2}}. \quad (6.1)
This may be written as the normalized dot product of two vectors,

\rho_{t,i} = \sum_{x=1}^{m}\sum_{y=1}^{n} \frac{\delta_t(x,y)}{\sigma_t}\,\frac{\delta_i(x,y)}{\sigma_i}, \quad (6.2)

where δ_t and δ_i are the mean-subtracted versions of the template and the reference
location respectively, and σ_t and σ_i are proportional to the standard deviations of the
respective signals.
Partial elimination algorithms require a monotonic behavior of the partial similarity
value, which is the summation in Equation 6.2. We observe that in the currently used
form of correlation coefficient, no monotonic behavior exists. This is because
δ_t(x, y) evaluates to a positive number if t(x, y) > μ_t, a negative number if
t(x, y) < μ_t, and zero if t(x, y) = μ_t. Similarly, δ_i(x, y) may also evaluate to be
positive, negative or zero, depending upon the value of μ_i. After processing each location
(x, y), the summation in Equation 6.2 may increase if δ_t(x, y) and δ_i(x, y) have the
same sign, may decrease if they have opposite signs, or may remain the
same if either of them is zero. Therefore, the partial similarity value
varies non-monotonically and no direct relationship may be established between
any intermediate value and the final value (Figure 6.1).
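This non-monotonic behavior is easy to observe numerically. A small sketch with hypothetical 2 × 2 blocks tracks the running value of the summation in Equation 6.2:

```python
import math

def unit_vector(block):
    """Flatten a block, subtract its mean, and scale to unit norm
    (the delta/sigma terms of Equation 6.2)."""
    vals = [v for row in block for v in row]
    mu = sum(vals) / len(vals)
    delta = [v - mu for v in vals]
    sigma = math.sqrt(sum(d * d for d in delta))
    return [d / sigma for d in delta]

def partial_sums(t_block, r_block):
    """Running value of the summation in Equation 6.2 after each pixel."""
    sums, acc = [], 0.0
    for a, b in zip(unit_vector(t_block), unit_vector(r_block)):
        acc += a * b   # each term may be positive, negative or zero
        sums.append(acc)
    return sums

# Hypothetical 2 x 2 blocks: the running sum rises, then falls to the
# final correlation coefficient of 0.28
s = partial_sums([[1, 9], [2, 8]], [[1, 9], [8, 2]])
```

Here the partial value overshoots the final coefficient before falling back, so no intermediate value bounds the final one.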
To derive a form of correlation coefficient which exhibits monotonic growth, we observe
that the norm of each of the vectors in Equation 6.2 is unity. This implies that

\sum_{x=1}^{m}\sum_{y=1}^{n} \frac{\delta_t^2(x,y)}{\sigma_t^2} + \sum_{x=1}^{m}\sum_{y=1}^{n} \frac{\delta_i^2(x,y)}{\sigma_i^2} = 2. \quad (6.3)
From 6.2 and 6.3:

\rho_{t,i} = 2 + \sum_{x=1}^{m}\sum_{y=1}^{n} \left( \frac{\delta_t(x,y)\,\delta_i(x,y)}{\sigma_t\,\sigma_i} - \frac{\delta_t^2(x,y)}{\sigma_t^2} - \frac{\delta_i^2(x,y)}{\sigma_i^2} \right). \quad (6.4)
Rearranging and simplifying:

\rho_{t,i} = 1 - \frac{1}{2}\sum_{x=1}^{m}\sum_{y=1}^{n} \left( \frac{\delta_t(x,y)}{\sigma_t} - \frac{\delta_i(x,y)}{\sigma_i} \right)^{2}. \quad (6.5)
The summation in 6.5 may be viewed as the square of the normalized Euclidean
distance, and each term in this summation is the squared distance, or dissimilarity,
contributed by the corresponding pixel. Therefore, as consecutive pixels are processed,
only positive values (or zeros) are subtracted from the previous value of partial
similarity.

We find that the formulation of correlation coefficient given by Equation 6.5
has already been reported in the statistics literature, for example see Rodgers and
Nicewander (1988). However, its implications for the template matching problem and
its use for computational speedup had not been identified before our work (Mahmood
and Khan, 2007a).
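As a numerical sanity check, the monotonic form of Equation 6.5 can be compared against the dot-product form of Equation 6.2; a minimal sketch with hypothetical 2 × 2 blocks, not the thesis code:

```python
import math

def unit_vector(block):
    """delta/sigma: flatten, mean-subtract and scale the block to unit norm."""
    vals = [v for row in block for v in row]
    mu = sum(vals) / len(vals)
    delta = [v - mu for v in vals]
    sigma = math.sqrt(sum(d * d for d in delta))
    return [d / sigma for d in delta]

def rho_dot(t_block, r_block):
    """Correlation coefficient as the dot product of Equation 6.2."""
    return sum(a * b for a, b in zip(unit_vector(t_block), unit_vector(r_block)))

def rho_monotonic_trace(t_block, r_block):
    """Partial similarity of Equation 6.5: starts at +1 and only decreases,
    since each pixel subtracts a non-negative squared distance."""
    trace, sim = [], 1.0
    for a, b in zip(unit_vector(t_block), unit_vector(r_block)):
        sim -= 0.5 * (a - b) ** 2
        trace.append(sim)
    return trace

t, r = [[1, 9], [2, 8]], [[1, 9], [8, 2]]
trace = rho_monotonic_trace(t, r)
```

The trace decreases monotonically from +1 and its last entry equals the correlation coefficient computed the standard way.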
6.2 Basic Mode Partial Correlation Elimination Algorithm

The best match of a template image t over p search locations may be defined as the
search location maximizing the magnitude of the correlation coefficient ρ_{t,i}:

i_{\max} = \arg\max_{i} |\rho_{t,i}| \quad \forall\, 1 \le i \le p. \quad (6.6)
Let λ_{t,i}(u, v) be the value of partial similarity between t and r_i, computed over u rows
and v columns, such that 0 ≤ u ≤ m and 0 ≤ v ≤ n. From 6.5, it follows that

\lambda_{t,i}(u,v) = 1 - \frac{1}{2}\sum_{x=1}^{u}\sum_{y=1}^{v} \left( \frac{\delta_t(x,y)}{\sigma_t} - \frac{\delta_i(x,y)}{\sigma_i} \right)^{2}. \quad (6.7)

λ_{t,i}(u, v) monotonically decreases from +1 to ρ_{t,i} as (u, v) increases from (0, 0) to
(m, n). Due to this monotonically decreasing pattern, λ_{t,i}(u, v) is an upper bound on
ρ_{t,i}:

\lambda_{t,i}(u,v) \ge \rho_{t,i} \quad \forall\, (0 \le u \le m,\ 0 \le v \le n). \quad (6.8)
The key idea of the PCE algorithm is that, after processing some initial number of pixels
(u, v) = (u_0, v_0) at r_i, if λ_{t,i}(u_0, v_0) is found to be less than a previously known
correlation coefficient maximum, or correlation threshold ρ_th, then the final value of the
correlation coefficient ρ_{t,i} is also guaranteed to be less than ρ_th. Therefore further
computations between t and r_i become redundant and may be skipped without affecting
the search for the best match location. The comparison of λ_{t,i}(u, v) with ρ_th is called
the elimination test. If λ_{t,i}(u, v) < ρ_th, the elimination test is true and, consequently,
the remaining computations may be skipped. If λ_{t,i}(u, v) ≥ ρ_th, the elimination test
is false and computations must be continued. After processing more pixels, the elimination
test needs to be re-evaluated, as the partial similarity value may have reduced further.
Thus, to correlate a block, the elimination test may have to be evaluated
multiple times, until either the test is true or the computations are completed.
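The Basic Mode loop can be sketched as follows. This is a simplified illustration, not the thesis C++ implementation: the elimination test runs after every row, and for brevity the sketch searches for the positive maximum only (Equation 6.6 uses |ρ|):

```python
import math

def normalized_rows(block):
    """Rows of delta/sigma: mean-subtract the block and scale to unit norm."""
    w = len(block[0])
    vals = [v for row in block for v in row]
    mu = sum(vals) / len(vals)
    delta = [v - mu for v in vals]
    sigma = math.sqrt(sum(d * d for d in delta))
    return [[d / sigma for d in delta[r * w:(r + 1) * w]]
            for r in range(len(block))]

def pce_correlate(t_block, r_block, rho_th):
    """Basic Mode PCE at one location: accumulate Equation 6.5 row by row,
    running the elimination test after each row. Returns rho, or None
    if the location is eliminated."""
    sim = 1.0
    for t_row, r_row in zip(normalized_rows(t_block), normalized_rows(r_block)):
        sim -= 0.5 * sum((a - b) ** 2 for a, b in zip(t_row, r_row))
        if sim < rho_th:   # elimination test: final rho <= sim < rho_th
            return None
    return sim

def pce_search(t_block, locations, rho_th=-1.0):
    """Scan all search locations, raising the threshold at each new maximum."""
    best_i = -1
    for i, r_block in enumerate(locations):
        rho = pce_correlate(t_block, r_block, rho_th)
        if rho is not None and rho > rho_th:
            rho_th, best_i = rho, i
    return best_i, rho_th

best_i, best_rho = pce_search([[1, 2], [3, 4]],
                              [[[4, 3], [2, 1]], [[1, 2], [3, 4]]])
```

In practice the normalized template rows would be precomputed once and the reference statistics obtained from running sums, rather than renormalizing every block as done here for clarity.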
6.3 Two-Stage Basic Mode PCE Algorithm
Like other elimination algorithms, in the Basic Mode PCE algorithm the amount
of eliminated computations strongly depends upon the position of the maximum in
the search process. A high maximum found at the start of the search process may
enhance the elimination performance significantly, as compared to a maximum found
near the end of the search process. For small templates, we find that the coarse-to-fine
scheme (Mattoccia et al., 2008a) fails to find an effective initial threshold. This
is because the coarser representation of a small template becomes too small to
remain unique and may match at any arbitrary location. As an example, if a 20 × 20
pixels template is low-pass filtered by a mask of size 3 × 3 and sub-sampled once,
it reduces to 9 × 9 pixels. One more low-pass filtering and sub-sampling reduces
its size to 3 × 3 pixels, with all values close to the average intensity. The remaining
information in the template image is too little to yield a correct match.
Due to the inefficiency of the coarse-to-fine scheme, we have developed another
initialization scheme for small templates. We find the concept of two-stage template
matching (Vanderbrug and Rosenfeld, 1977) to be quite helpful in this regard. For
small template sizes, we rearrange the computations in the PCE algorithm by dividing
the template into two portions: a smaller portion to be matched in the first stage and
a larger portion to be matched in the second stage. At the end of the first stage, we
choose the search location with the maximum value of partial correlation and perform
the complete computations at this location. The final value of the correlation coefficient
found at this search location is used as the initial threshold in the second matching stage.

In the two-stage Basic Mode PCE algorithm, we use only one elimination test covering
all rows in the first stage, and one test per row in the second stage. If there
are n rows in the template and k rows are to be matched in the first
stage, then the total number of elimination tests is n − k + 1. The first stage consists
of one scan of the complete search space. In this scan, at each search location, only
k template rows are matched using the basic monotonic formulation, and the partial
results are preserved. The search location with the best partial result over k rows is
taken as the guess of the best match location. At this location, the complete
correlation coefficient, ρ_th, is computed, which is used as the threshold in the following
stage.
The second stage consists of one more scan of the search space. During this
scan, at each valid search location, the elimination test is executed using the threshold
found at the end of the first stage. Search locations where the partial correlation result
over k rows, λ_{t,i}(k, n), is found to be less than ρ_th are eliminated from the search
space. At each non-skipped location, computations start from the (k + 1)-st
row and continue until that location gets eliminated or the computations are completed. If the
final value of correlation is larger than ρ_th, then ρ_th is immediately updated.
The overheads of two-stage PCE include one more scan of the search space and one 2D
memory array required to store the temporary results. The results of two-stage PCE
are the same as those of exhaustive template matching, without any loss of accuracy.
Our proposed two-stage technique improves on the technique proposed by Goshtasby
et al. (1984), which is approximate, with no guarantee of always finding the
correlation maximum.
6.4 Overheads of Basic Mode PCE Algorithm

The overhead of Basic Mode PCE may be found by comparing it with the traditional
fast spatial domain implementation (Haralick and Shapiro, 1992) of correlation
coefficient:

\rho_{t,i} = \frac{1}{\sigma_t\,\sigma_i}\,\psi_{t,i} - mn\,\frac{\mu_t}{\sigma_t}\,\frac{\mu_i}{\sigma_i}, \quad (6.9)

where ψ_{t,i} is the cross-correlation term:

\psi_{t,i} = \sum_{x=1}^{m}\sum_{y=1}^{n} t(x,y)\,r_i(x,y). \quad (6.10)
The speed up in this form comes from the efficient computation of the first and second
order statistics, µt, µi, σt, and σi, which can be computed at any location in a few
operations via the running sum approach.
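The running-sum approach mentioned above is commonly realized with summed-area (integral) tables; a generic sketch, not the thesis code:

```python
def integral_image(img):
    """Summed-area table S with an extra zero row and column, so that
    S[y][x] holds the sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S

def window_sum(S, top, left, m, n):
    """Sum over the m x n window with top-left corner (top, left), in O(1)."""
    return (S[top + m][left + n] - S[top][left + n]
            - S[top + m][left] + S[top][left])

S = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

With one such table over the reference image and another over its squared intensities, μ_i and σ_i at every search location cost only a handful of additions.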
Computationally, the traditional implementation via Equation 6.9 is more efficient
than computing correlation coefficient via Equation 6.5 if all computations
are completed without any elimination. This is because the dominant cost
of computing ρ_{t,i} via Equation 6.9 is that of ψ_{t,i}, which requires O(2mn) operations
at one search location. In contrast, Equation 6.5, if implemented efficiently,
requires at least O(5mn) operations at each search location. Although both
implementations have the same growth rate, O(mn), the constant factor is 2.5 times
larger for the monotonic formulation. However, we experimentally observe that this
factor is actually smaller than 2.5 and depends on the template size. For very
small template sizes, 4 × 4 pixels, if no elimination is done with
the monotonic formulation, both formulations take the same amount of execution time.
However, as the template size increases, this factor grows beyond 1.00. For templates
of size 32 × 32, we observe that the monotonic formulation with no elimination is
slower than the traditional form by a factor of almost 1.5.

Figure 6.3: Two frames from each of the movies: 'Fast and Furious', 'Batman Begins', 'King Kong', 'Under World', 'Spider Man' and 'Pink Floyd' are shown from top to bottom respectively. Horror movies like 'Under World' contain significant frame-to-frame brightness variations.
For smaller templates, the monotonic overhead is small and is easily offset by
the amount of eliminated computations. For larger templates, the overhead
of Basic Mode PCE may erode some of the computational advantage realized by
computation elimination. Therefore, another version of the PCE algorithm has been
developed which is faster on larger templates. We have named this version
Extended Mode PCE; it is discussed in Chapter 7.
6.5 Experiments with Basic Mode PCE Algorithms
We have performed extensive empirical evaluation of the proposed algorithms on
commonly used small template sizes ranging from 4 × 4 pixels to 21 × 21 pixels. For
larger template sizes, Extended Mode PCE is used, which is discussed in Chapter
7. In the datasets used in this chapter, each template is an independently captured
image containing natural, and in some cases synthetically generated, distortions.
The Basic Mode partial correlation elimination algorithm is implemented in C++
and compared with the currently known fast exhaustive template matching techniques,
including a highly optimized FFT implementation known as FFTW3 (Frigo
and Johnson, 2005), Zero-mean Enhanced Bounded Correlation (ZNccEbc) (Mattoccia
et al., 2008b) and an exhaustive spatial domain implementation (Spat) (Haralick and
Shapiro, 1992) based on 6.9. The implementation of the ZEBC algorithm was provided
by the original authors (Mattoccia et al., 2008b). Besides correlation coefficient, we
have also implemented Sum of Absolute Differences (SAD) with the Partial Distortion
Elimination (PDE) (Montrucchio and Quaglia, 2005) and Successive Elimination
Algorithm (SEA) (Li and Salari, 1995) optimizations.
The execution times are measured on a Dell Inspiron 6400 with an Intel Core 2 CPU
2.13 GHz processor and 2 GB physical memory. The datasets, executable scripts and
Table 6.1: Dataset description for the block motion estimation experiments

MovieName    Dataset  FrameSize  BlockSize  #Blocks
FastFurios   FF4      256×608    4×4        18151
BatmnBegns   BB8      288×704    8×8        10432
KingKong     KK8      240×640    8×4        16610
UnderWorld   UW12     272×640    12×12      4498
SpiderMan    SM12     224×512    12×8       4515
PinkFloyd    PF12     287×346    12×4       7804
Metallica    MT16     240×352    16×16      1320
Blade-2      BL16     336×608    16×12      4117
ReturnKing   RK16     259×640    16×8       4948
MissionImp   MI16     218×516    16×4       6621
PiratCaribb  PC8      368×720    8×12       9761
detailed results are available on our web site: http://cvlab.lums.edu.pk/pce.
6.5.1 Block Motion Estimation Experiments Using Basic Mode
PCE
These experiments are performed in the scenario of block matching for motion estimation.
The use of correlation coefficient for block motion estimation has been
motivated by Mahmood et al. (2007). These experiments are performed on 11 different
datasets (Mahmood and Khan, 2007a), taken from different commercial movies
(Table 6.1 and Figures 6.3 and 6.4). In these experiments, the current video frame
is divided into non-overlapping blocks and each block is matched with the temporally
previous frame using a full frame search technique. In Basic Mode PCE, elimination
tests are performed at the end of each row. For the PCE and ZEBC algorithms an
initial correlation threshold of 0.90 has been used. In the ZEBC algorithm, the partition
parameter has been selected as {4, 8, 8, 6, 6, 6, 8, 8, 8, 8, 8} for the 11 datasets respectively.
Total execution time for the block matching experiment, over five frames in each
dataset, is shown in Table 6.2, which includes all computational overheads including
file I/O. In these experiments, the PCE algorithm has been found to be 28.03 times
faster than FFTW3, 29.93 times faster than ZEBC, and 19.92 times faster than Spat.
Average computation elimination over all experiments in this section is 91.2% for ZEBC,
189
Figure 6.4: A pair of selected frames from each of the movies: ‘Metallica’, ‘Blade 2’,‘Mission Impossible’ and ‘Pirates of the Caribbean’. In the scene taken from ‘Blade2’, only light intensity varies over a static scene.
190
86.3% for PCE, and 94.6% for SAD. ZEBC has higher elimination than PCE, however
due to small template sizes, the bound computation cost has increased than the com-
putation elimination benefit, therefore ZEBC has remained significantly slower than
PCE. In these experiments SAD has remained faster than all correlation coefficient
based algorithms however the margin between SAD and PCE is significantly smaller
as compared to other algorithms.
6.5.2 Feature Matching Experiments Using Basic Mode PCE Algorithm
These experiments are performed in the scenario of feature matching for point correspondence, on a video dataset obtained from a small UAV. The UAV dataset consists of 74 frames, each of size 240 × 320 pixels (see Figure 6.5). Due to the instability of the vehicle, the viewing geometry continuously changes, resulting in projective distortions. In each video frame, the 1000 best features are marked using the KLT feature tracker (http://www.ces.clemson.edu). Each feature from the current frame is matched against only the 1000 features in the next frame, using the correlation coefficient to find the best match. In the FFTW3 implementation, the FFT of the full reference frame is computed only once for all features. The FFT of each feature template is computed and multiplied point-wise with the full-frame FFT of the reference frame; the inverse transform then yields the correlation surface in the spatial domain. In this implementation, there are 1001 transforms of size 240 × 320 for each frame. A second approach, which we have not used, is to take the FFT of a small portion around each feature pair to be matched. In that case, the number of transforms would be 1,000,000 per frame, each roughly double the size of the feature, and the overall cost would further increase due to the large number of transforms. Total execution time for the FFTW3, ZEBC, Spat and PCE algorithms is shown in Table 6.4. The maximum speedup of PCE over FFTW3 is 168.68, over ZEBC is 13.60, and over Spat is 3.30 times. Due to a large number of redundant computations and the small template sizes, the performance of FFTW3 is significantly degraded.
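The frequency-domain scheme described above can be sketched as follows (an illustrative prototype, not the thesis implementation; it computes plain cross-correlation, without the mean/variance normalisation of the correlation coefficient):

```python
import numpy as np

def fft_match_scores(reference, templates):
    """One forward FFT of the reference frame is shared by all templates;
    each template then costs one full-frame FFT of its zero-padded copy,
    a point-wise product, and one inverse FFT giving the correlation
    surface over the whole frame."""
    H, W = reference.shape
    R = np.fft.rfft2(reference)              # one transform for the frame
    surfaces = []
    for t in templates:                      # one transform per template
        T = np.fft.rfft2(t, s=(H, W))        # zero-padded to frame size
        # conjugation turns circular convolution into correlation
        surfaces.append(np.fft.irfft2(np.conj(T) * R, s=(H, W)))
    return surfaces
```

With 1000 feature templates this yields the 1001 full-frame transforms per frame counted in the text.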
Figure 6.5: Selected frames from a video obtained by a camera mounted on an Unmanned Aerial Vehicle (UAV). Images obtained from the UAV manufacturing company SATUMA.
Table 6.2: Total execution time (sec) for the block motion estimation experiments

  Dataset   FFTW3    ZEBC      SPAT     PCE      SAD
  FF4       1141.1   1218.2    593.2    40.7     127.2
  BB8       1141.2   1081.7    1608.0   80.7     21.4
  KK8       1102.4   1084.9    355.1    78.83    22.0
  UW12      364.5    300.0     270.3    64.3     18.3
  SM12      201.5    193.097   152.6    30.9     6.2
  PF12      363.7    277.8     167.48   37.3     6.9
  MT16      54.4     49.2      58.7     11.6     5.8
  BL16      551.0    404.16    397.4    66.394   15.8
  RK16      445.4    311.9     280.4    57.3     14.9
  MI16      329.5    288.6     189.3    41.1     5.3
  PC8       1116.9   1450.1    630.6    125.6    40.6
Table 6.3: Percent computation elimination comparison for the block motion estimation experiments

  Dataset   FFTW3   ZEBC     SPAT   PCE      SAD
  FF4       -       50.371   -      86.882   74.037
  BB8       -       89.277   -      85.934   98.142
  KK8       -       91.509   -      84.671   98.114
  UW12      -       96.996   -      85.433   95.050
  SM12      -       98.187   -      86.727   97.534
  PF12      -       98.321   -      85.53    98.021
  MT16      -       98.381   -      87.64    91.166
  BL16      -       97.466   -      89.115   97.062
  RK16      -       96.999   -      86.671   96.377
  MI16      -       99.052   -      84.539   98.904
  PC8       -       86.645   -      85.990   96.254
Figure 6.6: Selected frames from the Night-time Highway (NH) video. Dataset obtained with a hand-held SONY Handycam video camera.
Table 6.4: Total execution time (sec) for the feature tracking experiments to get point correspondences

  FeatSize   FFTW3       ZEBC      SPAT     PCE
  5 × 5      15560.607   1579.17   124.06   112.70
  7 × 7      13345.641   1590.65   130.43   107.00
  9 × 9      13238.55    1593.81   145.88   117.20
  11 × 11    17822.29    1597.99   161.57   122.93
  13 × 13    12695.35    1596.72   172.41   129.22
  15 × 15    22454.87    1554.33   187.31   133.12
  17 × 17    15650.54    1619.38   212.43   139.00
  19 × 19    11821.91    1614.19   218.92   147.53
  21 × 21    16288.27    1557.50   246.00   151.05
6.5.3 Feature Tracking Experiments Using Two-stage Basic Mode PCE Algorithm
The two-stage PCE experiments have been performed on three datasets: the night-time highway video, the UAV video, and time-lapse cloud images from a still camera. In each of these datasets, manually extracted feature templates from a single frame are tracked in the remaining frames.

In the Night-time Highway (NH) video dataset (Figure 6.6), the only illumination sources are the headlights and rear lights of the vehicles, which results in uneven scene illumination. The UAV dataset is the same as shown in Figure 6.5 and used in the previous subsection. The Cloud tracking (CL) dataset (Figure 6.7) consists of cloud images acquired by a still camera with a 60-second interval between consecutive images. The cloud structure is non-rigid and the illumination conditions also vary over time, which makes the tracking process quite hard. See Table 6.5 for further dataset details.
In these experiments, two-stage Basic Mode PCE has been used. The first stage consists of only 1 row for templates of size 4×4 to 18×18, and 2 rows for 19×19 to 21×21. In the second stage, an elimination test is evaluated at the end of each row. An initial threshold of 0.90 has been used for both the PCE and ZEBC algorithms. If a maximum higher than 0.90 is found in the first stage, that maximum is used as the threshold in the second stage; otherwise the threshold remains 0.90. In ZEBC, the
Table 6.5: Dataset description for the experiments on feature tracking across video frames

  Dataset   # of Feat   Feat Size   # of Frames   Frame Size
  NH04      60          4 × 4       82            288 × 360
  NH05      60          5 × 5       82            288 × 360
  NH06      60          6 × 6       82            288 × 360
  NH07      60          7 × 7       82            288 × 360
  NH08      60          8 × 8       82            288 × 360
  NH09      60          9 × 9       82            288 × 360
  UA10      73          10 × 10     35            240 × 320
  UA11      73          11 × 11     35            240 × 320
  UA12      73          12 × 12     35            240 × 320
  UA13      73          13 × 13     35            240 × 320
  UA14      73          14 × 14     35            240 × 320
  UA15      73          15 × 15     35            240 × 320
  CL16      58          16 × 16     15            360 × 648
  CL17      58          17 × 17     15            360 × 648
  CL18      58          18 × 18     15            360 × 648
  CL19      58          19 × 19     15            360 × 648
  CL20      58          20 × 20     15            360 × 648
  CL21      58          21 × 21     15            360 × 648
Figure 6.7: Selected frames from the Cloud Tracking (CL) dataset. The dataset is obtained by a still image camera in a fixed position, taking cloud images at 60-second intervals.
Figure 6.8: Plot of execution time for the two-stage basic mode PCE experiments (Table 6.6), normalized to 100 templates and 100 reference frames for each dataset. The horizontal axis lists the datasets in order of increasing feature size (NH, UA and CL groups); the vertical axis shows execution time in seconds for Spat, FFTW3, ZNccEbc (ZEBC) and PCE.
partition parameter has been selected to be {4, 5, 6, 7, 8, 9, 5, 11, 6, 13, 7, 5, 8, 17,
9, 19, 10, 7} respectively. The total execution time for FFT, FFTW3, ZEBC, Spat and PCE is given in Table 6.6. In this experiment, the PCE algorithm remained faster than the other algorithms over all template sizes. The maximum speedup of PCE over FFT is 44.74, over FFTW3 is 9.16, over SPAT is 9.04, and over ZEBC is 7.12 times.
The maximum, minimum, and average speedup of the PCE algorithm compared to the other algorithms, together with confidence intervals at the 0.95 confidence level, are reported in Table 6.7. The average speedup of PCE along with the confidence intervals is also plotted in Figure 6.9.
Table 6.6: Total execution time (sec) comparison in two-stage basic mode PCE experiments for template sizes ≤ 21 × 21 pixels

  Dataset   FFT       FFTW3     ZEBC     Spat      PCE
  NH04      943.07    281.81    175.07   82.16     33.49
  NH05      1015.32   288.33    266.91   136.37    38.60
  NH06      998.74    417.40    237.24   144.16    45.72
  NH07      1057.75   279.49    236.46   205.98    55.38
  NH08      982.42    497.47    386.53   185.16    54.31
  NH09      1030.78   350.89    327.01   258.14    75.27
  UA10      4870.46   352.71    383.74   524.30    108.87
  UA11      4851.96   819.82    529.14   529.38    110.99
  UA12      4911.31   894.62    346.98   822.15    115.13
  UA13      4986.54   565.11    575.46   1071.88   120.09
  UA14      5237.24   892.43    369.44   956.55    122.41
  UA15      5221.95   1007.37   349.35   1179.15   130.42
  CL16      727.75    106.47    112.44   213.15    35.23
  CL17      723.05    176.21    200.74   269.55    41.21
  CL18      737.98    132.43    118.04   250.05    42.64
  CL19      739.53    113.02    211.56   307.95    52.09
  CL20      721.40    169.88    140.55   255.15    60.45
  CL21      727.02    148.37    105.74   351.30    63.80
Table 6.7: Maximum, minimum and average speedup of Two-Stage Basic Mode PCE for the feature tracking experiment (Table 6.6). Speedup is computed by dividing the execution time of each algorithm by the execution time of PCE. Confidence intervals z_α σ/√N are also computed for α = .05 (confidence level of 0.95) and z_α = 1.645, where σ is the standard deviation of the speedup for N = 12 datasets.

                        FFT          FFTW3        ZNccEbc      Spat         PCE
  Max Speedup           44.74        9.16         7.12         9.04         1
  Min Speedup           11.40        2.17         1.66         2.45         1
  Average Speedup       26.43        5.54         4.10         5.35         1
  Confidence Interval   26.43±4.88   5.54±0.965   4.10±0.579   5.35±0.772   1±0
Figure 6.9: Plot of the average execution-time speedup of Two-stage Basic Mode PCE for the feature tracking experiments, over FFT, FFTW3, ZNccEbc and Spat. Confidence intervals for a confidence level of 0.95 are also plotted. Corresponding values may be seen in Table 6.7.
6.6 Conclusion

In this chapter we have presented the basic formulation of the Partial Correlation Elimination (PCE) algorithm for correlation coefficient based fast template matching. An effective initialization strategy has also been developed, which we have named the 'Two-stage Basic Mode PCE' algorithm. For small template sizes, the coarse-to-fine scheme often fails to yield effective initialization, whereas the two-stage approach has often been found effective. The Basic Mode PCE and Two-stage Basic Mode PCE algorithms are exact, having exhaustive-equivalent accuracy. These algorithms are compared with existing fast exhaustive techniques, including ZEBC and FFTW3-based implementations of the correlation coefficient. On small templates ranging from 4 × 4 to 21 × 21 pixels, the PCE algorithms have outperformed the other algorithms, by up to two orders of magnitude in some cases.

For template sizes larger than 21 × 21 pixels, the overhead of Basic Mode PCE increases, reducing the speedup margin. Therefore, for medium and larger sized templates, we have developed an extension of the PCE algorithm, which is discussed in the following chapter.
Chapter 7
EXTENDED MODE PARTIAL CORRELATION
ELIMINATION ALGORITHMS
In Chapter 6, Basic Mode Partial Correlation Elimination (PCE) algorithms were discussed, which are based on the monotonic formulation of the correlation coefficient. When the correlation coefficient is computed using this formulation, the similarity starts from +1 at the first pixel of a block and monotonically decreases to its final value at the last pixel. Any intermediate value of the similarity is therefore always larger than (or equal to) the final value. The speedup arises because, at any point during the computation, if the similarity happens to be less than a previously known maximum, the remaining computations become redundant and may be skipped without any loss of accuracy.
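The elimination principle described above can be sketched as follows (an illustrative Python prototype with hypothetical helper names, not the thesis code; here σ denotes the root of the unnormalized sum of squared deviations, as in Chapter 6, so the final value of the partial similarity equals the correlation coefficient):

```python
import numpy as np

def pce_basic_match(template, candidates, rho_init=-1.0):
    """Basic-mode PCE sketch: lam starts at 1 and monotonically decreases
    as rows of squared differences between the normalised template and
    candidate are accumulated; once lam falls below the best value seen
    so far, the remaining rows of that candidate are skipped."""
    tz = template - template.mean()
    t = tz / np.sqrt((tz ** 2).sum())
    best_val, best_idx = rho_init, -1
    for idx, r in enumerate(candidates):
        rz = r - r.mean()
        rn = rz / np.sqrt((rz ** 2).sum())
        lam = 1.0
        for row in range(t.shape[0]):
            lam -= 0.5 * ((t[row] - rn[row]) ** 2).sum()
            if lam < best_val:          # elimination test at a row boundary
                break
        else:
            best_val, best_idx = lam, idx
    return best_idx, best_val
```

Because the partial value can only decrease, skipping a candidate once it drops below the best-so-far never changes the final answer: the result is exhaustive-equivalent.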
The computational overhead of the Basic Mode PCE algorithm was discussed in Chapter 6. In the basic monotonic formulation of the correlation coefficient, the per-pixel cost is larger than that of the efficient spatial domain formulations. This overhead grows with template size, because as the template size increases, the number of pixels that must be processed before elimination can take place also increases, which raises the direct computational cost. Due to this overhead, for larger templates, some of the speedup obtained by computation elimination may be eroded. Therefore, Basic Mode PCE as discussed in Chapter 6 is more suitable for only the small template sizes, while the algorithm presented in the current chapter is more efficient on medium and large templates. Although we successfully reduce one type of overhead cost, another overhead increases in the extended mode, which may make it less efficient on small templates. Therefore, on small templates, Basic Mode PCE remains more efficient.
In this chapter, we derive another monotonic formulation of the correlation coefficient, which reduces the computational cost of processed pixels to the minimum level, equivalent to the other efficient spatial domain formulations. We have named the algorithm based on this formulation the 'Extended Mode PCE' algorithm. Due to the lower cost of processed pixels, Extended Mode PCE is faster than Basic Mode PCE on medium and large templates. In Extended Mode PCE, although the cost of processed pixels is reduced, another overhead, the cost of each elimination test, is higher than in the Basic Mode formulation. To keep this cost low, the number of elimination tests should be kept as small as possible. For this purpose, we have developed an algorithm that determines both the number of elimination tests and efficient test locations. In addition, we have developed a criterion for selecting between Basic Mode PCE and Extended Mode PCE, based on a comparison of the overheads associated with the two algorithms.
In Extended Mode PCE, the amount of eliminated computations may significantly increase if a maximum of large magnitude is found near the start of the search process. For medium-sized templates, Extended Mode PCE may be initialized using the two-stage approach discussed in Chapter 6 in the context of Basic Mode PCE. For larger templates, Extended Mode PCE may also be initialized by the coarse-to-fine scheme (A. Rosenfeld, 1977). Extended Mode PCE with either of these initialization schemes has exhaustive-equivalent accuracy. The proposed algorithms are compared with currently known fast exhaustive-equivalent-accuracy algorithms, including a frequency domain sequential implementation of the FFT (William et al., 2007), an optimized, adaptive and parallel implementation, FFTW3 (Frigo and Johnson, 2005), a very fast spatial domain implementation, ZEBC (Mattoccia et al., 2008b), and an efficient exhaustive spatial domain implementation (Pratt, 2007). The comparisons are done over a wide variety of datasets and on template sizes from 22 × 22 to 128 × 128 pixels. For medium-sized templates, 22 × 22 to 48 × 48, Extended Mode PCE is found to be faster than all other techniques, including the FFTW3-based implementation. For larger templates, 64 × 64 to 128 × 128, the performance of Extended Mode PCE was sometimes better than ZEBC and FFTW3 and in other cases equivalent to these techniques.
7.1 Extended Mode PCE Algorithm
Basic Mode PCE algorithm, discussed in Chapter 6, is based on the following monotonic formulation of the correlation coefficient:
\[
\lambda_{t,i}(u,v) = 1 - \frac{1}{2}\sum_{x=1}^{u}\sum_{y=1}^{v}\left(\frac{t(x,y)-\mu_t}{\sigma_t} - \frac{r_i(x,y)-\mu_i}{\sigma_i}\right)^2, \tag{7.1}
\]
where t(x, y) is the template image intensity at position (x, y) and r_i is the i-th search location in the reference image of size p × q pixels. In this formulation, the template-related term, (t(x, y) − µ_t)/σ_t, may be pre-computed once, because the template image has only one mean term, µ_t, and one variance term, σ_t; there are only m × n normalized template terms, which are easily stored. However, the reference image term (r_i(x, y) − µ_i)/σ_i cannot be pre-computed, because storing m × n terms at each search location would explode the size of the reference image to p × q × m × n. Therefore, the reference image terms have to be computed for each pixel at each search location, resulting in a total of 5 operations per processed pixel in Equation 7.1. We want to reduce this computational cost to 2 operations per pixel, as in the following non-monotonic but efficient formulation:
\[
\rho_{t,i} = \frac{1}{\sigma_t\sigma_i}\,\psi_{t,i} - mn\,\frac{\mu_t}{\sigma_t}\,\frac{\mu_i}{\sigma_i}, \tag{7.2}
\]
where ψ_{t,i} is the cross-correlation term:
\[
\psi_{t,i} = \sum_{x=1}^{m}\sum_{y=1}^{n} t(x,y)\,r_i(x,y). \tag{7.3}
\]
The dominant computational cost of ρ_{t,i} in Equation 7.2 is the computation of ψ_{t,i} by Equation 7.3, which requires two operations per processed pixel: one multiplication and one addition.
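Equation 7.2 can be checked against a direct Pearson computation with a short sketch (illustrative only; here σ denotes the root of the unnormalized sum of squared deviations, consistent with the surrounding derivation):

```python
import numpy as np

def corr_coeff_fast(t, r):
    """Correlation coefficient via Eq. 7.2: the only run-time term is the
    cross-correlation psi (Eq. 7.3); mean and sigma of the template are
    pre-computable, and those of the search location come from running
    sums in a full implementation."""
    m, n = t.shape
    mu_t, mu_r = t.mean(), r.mean()
    sig_t = np.sqrt(((t - mu_t) ** 2).sum())
    sig_r = np.sqrt(((r - mu_r) ** 2).sum())
    psi = (t * r).sum()                      # Eq. 7.3
    return psi / (sig_t * sig_r) - m * n * mu_t * mu_r / (sig_t * sig_r)
```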
Correlation coefficient computation by Equation 7.2 is efficient because the pre-computable terms are separated from the run-time computable terms. A similar strategy can reduce the computational cost of Equation 7.1. For this purpose, we expand Equation 7.1:
\[
\begin{aligned}
\lambda_{t,i}(u,v) ={}& 1 - \frac{1}{2\sigma_t^2}\sum_{x=1}^{u}\sum_{y=1}^{v} t^2(x,y) - \frac{1}{2\sigma_i^2}\sum_{x=1}^{u}\sum_{y=1}^{v} r_i^2(x,y) \\
&+ \frac{1}{\sigma_t}\left(\frac{\mu_t}{\sigma_t}-\frac{\mu_i}{\sigma_i}\right)\sum_{x=1}^{u}\sum_{y=1}^{v} t(x,y) - \frac{1}{\sigma_i}\left(\frac{\mu_t}{\sigma_t}-\frac{\mu_i}{\sigma_i}\right)\sum_{x=1}^{u}\sum_{y=1}^{v} r_i(x,y) \\
&- \frac{uv}{2}\left(\frac{\mu_t}{\sigma_t}-\frac{\mu_i}{\sigma_i}\right)^2 + \frac{1}{\sigma_t\sigma_i}\,\psi_{t,i}(u,v). 
\end{aligned}
\tag{7.4}
\]
Thus we have separated the run-time-computable cross-correlation term ψ_{t,i}(u, v) from the pre-computable terms. We simplify and regroup the pre-computable terms, while maintaining the monotonic growth property:
\[
\lambda_{t,i}(u,v) = 1 - \frac{\sigma_t(u,v) + \sigma_i(u,v)}{2} + \frac{\psi_{t,i}(u,v) - \mu_{t,i}(u,v)}{\sigma_t\sigma_i}. \tag{7.5}
\]
The term σ_t(u, v) in Equation 7.5 is a template image statistic and may be pre-computed at the specific (u, v) locations, once for each template image:
\[
\sigma_t(u,v) = \frac{\sum_{x=1}^{u}\sum_{y=1}^{v} t^2(x,y) - 2\mu_t\sum_{x=1}^{u}\sum_{y=1}^{v} t(x,y) + uv\,\mu_t^2}{\sigma_t^2}. \tag{7.6}
\]
The term σ_i(u, v) in Equation 7.5 is a search location statistic and may be pre-computed at the specific (u, v) locations, once for a given set of search locations:
\[
\sigma_i(u,v) = \frac{\sum_{x=1}^{u}\sum_{y=1}^{v} r_i^2(x,y) - 2\mu_i\sum_{x=1}^{u}\sum_{y=1}^{v} r_i(x,y) + uv\,\mu_i^2}{\sigma_i^2}. \tag{7.7}
\]
If partial summations are available, the computation of σi(u, v) requires 9 operations.
The term µ_{t,i}(u, v) in Equation 7.5 is a hybrid statistic, computed from both the search location and the template; therefore this term cannot be pre-computed:
\[
\mu_{t,i}(u,v) = \mu_i\sum_{x=1}^{u}\sum_{y=1}^{v} t(x,y) - uv\,\mu_i\mu_t + \mu_t\sum_{x=1}^{u}\sum_{y=1}^{v} r_i(x,y). \tag{7.8}
\]
If partial summations are available, the computation of µ_{t,i}(u, v) requires 7 operations.
A complete evaluation of Equation 7.5 is required only when an elimination test is to be executed; otherwise the computation proceeds by updating only ψ_{t,i}(u, v). If no elimination test is executed and the computation runs to completion, then putting (u, v) = (m, n), both σ_t(m, n) and σ_i(m, n) evaluate to 1 and µ_{t,i}(m, n) evaluates to mnµ_tµ_i. Substituting these values into Equation 7.5 reduces it to Equation 7.2. Therefore, if no elimination test is executed, the cost of Extended Mode PCE is exactly the same as that of the efficient spatial domain form given by Equation 7.2. Although in Extended Mode PCE the cost of processed pixels is reduced to its minimum, the cost of an elimination test increases from one simple comparison in Basic Mode to about 22 operations in Extended Mode. This high per-test overhead is balanced by the ability to exploit pre-computation, which is not possible in Basic Mode PCE. Since for larger template sizes the ratio of the number of tests to the total number of pixels is significantly smaller, Extended Mode PCE becomes more efficient for larger templates.
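The partial form of Equation 7.5, with tests aligned to row boundaries (v = n), may be prototyped as follows (an illustrative sketch with hypothetical helper names; σ again denotes the root of the unnormalized sum of squared deviations):

```python
import numpy as np

def lambda_partial(t, r, u):
    """Partial similarity after u rows, per Eq. 7.5. Eqs. 7.6-7.8 supply
    the statistics; in a full implementation sigma_t(u,v) is pre-computed
    per template and sigma_i(u,v) per search location."""
    m, n = t.shape
    mu_t, mu_i = t.mean(), r.mean()
    s_t = np.sqrt(((t - mu_t) ** 2).sum())
    s_i = np.sqrt(((r - mu_i) ** 2).sum())
    tp, rp = t[:u], r[:u]
    uv = u * n
    sig_t_uv = ((tp**2).sum() - 2*mu_t*tp.sum() + uv*mu_t**2) / s_t**2   # Eq. 7.6
    sig_i_uv = ((rp**2).sum() - 2*mu_i*rp.sum() + uv*mu_i**2) / s_i**2   # Eq. 7.7
    mu_ti = mu_i*tp.sum() - uv*mu_i*mu_t + mu_t*rp.sum()                 # Eq. 7.8
    psi = (tp * rp).sum()                                                # run-time term
    return 1 - (sig_t_uv + sig_i_uv) / 2 + (psi - mu_ti) / (s_t * s_i)   # Eq. 7.5
```

The sketch makes the two key properties directly checkable: the value is non-increasing in u, and at u = m it equals the correlation coefficient.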
For Extended Mode PCE, it is therefore important to design an effective testing scheme in which the cost of the elimination tests is minimized without a significant reduction in computation elimination. This means that the pixel locations at which the elimination test is executed have to be carefully selected. Having many test locations within a block has the advantage of detecting, almost as soon as it happens, that the partial value has dropped below the threshold; however, the overhead of conducting the tests increases. If the number of test locations is reduced, the penalty of not detecting until the next test that the partial value has dropped below the threshold may be large.

In the following section we present criteria for PCE algorithm mode selection, together with an algorithm that determines the total number of elimination tests and efficient test locations for achieving high elimination.
7.2 PCE Mode Selection and Finding an Efficient Testing Scheme
For a specific dataset, the selection of Basic Mode or Extended Mode PCE may be done by comparing the overheads associated with each algorithm. For Basic Mode PCE the cost of processed pixels is larger, while in Extended Mode the cost of elimination tests dominates. Therefore, mode selection requires estimating the total computations to be performed and the total number of elimination test executions. The total computations consist of the number of pixels processed at each search location until the growth curve intersects the correlation threshold. These computations depend on two factors: the slope of the monotonically decreasing curve and the magnitude of the current known maximum used as the threshold (Fig. 7.2). If the slope of the growth curve is large and the known maximum is high, the growth curve intersects the threshold after only a few pixels have been processed, and the remaining pixels constitute the eliminated computations.
At a particular search location, the slope of the monotonic growth curve depends on two factors: the final value of the correlation coefficient, ρ_{t,i}, and the distribution of dissimilarity over the set of pixels to be processed. If the final value of the correlation coefficient is significantly low, the slope of the growth curve is large and, consequently, the number of processed pixels is small, because the elimination threshold is reached faster. If most of the search locations produce a very low correlation coefficient, the total performed computations decrease and the eliminated computations increase. Therefore, the amount of performed computations strongly depends upon the probability distribution of ρ_{t,i} over the range −1.00 to +1.00. A distribution skewed towards the negative side reduces the performed computations, on average, when a positive maximum is sought. Fig. 7.1 shows correlation coefficient histograms for four different datasets. We observe that the shape of the histogram varies with the size of the template and the content of the images to be matched. Due to large variations in image content, a generic parametric form of the correlation coefficient distribution may not be useful in practice.
Figure 7.1: Correlation coefficient histograms plotted for templates from four different datasets: CL dataset, template 16 × 16; IP dataset, template 24 × 24; IT dataset, template 32 × 32; and VG dataset, template 48 × 48. The horizontal axis spans correlation coefficient values from −1 to +1; the vertical axis shows relative frequencies. Dataset details are given in Section 5.4.
The amount of performed computations also depends upon the number of elimination tests and the locations at which these tests are executed. If only one elimination test is to be executed, its location may be determined by the intersection of the average growth curve and the correlation threshold. If more than one elimination test is to be performed, we divide the correlation range, −1.00 to +1.00, into multiple intervals and compute an average growth curve for the search locations falling in each interval. The test locations are then found as the intersections of each average growth curve with the correlation threshold. For example, in Fig. 7.2 the range of the correlation coefficient is divided into 12 intervals (or bins), each of size 0.1667. Based upon the final value of the correlation coefficient, each search location is assigned to a specific interval, and average growth curves are computed for all search locations within the same interval. The intersection of each average curve with the threshold yields the index at which the respective test should be executed.
We observe that the average curves in Fig. 7.2 are close to straight lines, therefore the
average distribution of dissimilarity may be assumed to be approximately uniform over
all pixels. For a specific search location, the average dissimilarity per pixel, E[∆2t,i],
may be defined as the total dissimilarity divided by the number of pixels:
\[
E[\Delta^2_{t,i}] = \frac{1}{mn}\sum_{x=1}^{m}\sum_{y=1}^{n}\left(\frac{\delta_t(x,y)}{\sigma_t} - \frac{\delta_i(x,y)}{\sigma_i}\right)^2. \tag{7.9}
\]
Using Equation 6.5, Equation 7.9 may be written in the form
\[
E[\Delta^2_{t,i}] = \frac{2(1-\rho_{t,i})}{mn}. \tag{7.10}
\]
If the value of the current known maximum, or correlation threshold, is ρ_th, then the number of pixels to be processed before the partial similarity value falls to ρ_th is given by
\[
rc = \frac{2(1-\rho_{th})}{E[\Delta^2_{t,i}]}, \tag{7.11}
\]
where r is the number of rows and c the number of columns to be processed. From Equations 7.10 and 7.11,
\[
rc = \frac{1-\rho_{th}}{1-\rho_{t,i}}\,mn. \tag{7.12}
\]
In general, if ν tests are to be performed, then to find the test locations the range of the correlation coefficient is divided into ν intervals. Any search location within the k-th interval, 1 ≤ k ≤ ν, yields ρ_{t,i} such that −1 + 2(k−1)/ν < ρ_{t,i} ≤ −1 + 2k/ν. Let r_k be the number of rows and c_k the number of columns to be processed to eliminate all search locations within the k-th bin. Setting ρ_{t,i} in Equation 7.12 equal to the upper boundary of the interval, −1 + 2k/ν, gives the k-th test location:
\[
r_k c_k = \frac{\nu(1-\rho_{th})}{2(\nu-k)}\,mn. \tag{7.13}
\]
The association between a particular search location and the interval it will finally map to is not known in advance. Therefore, the elimination test designed to eliminate the search locations in the k-th interval has to be performed on all surviving (non-eliminated) search locations.
If the number of elimination tests is already known, Equation 7.13 may be used
to compute efficient test locations. The maximum number of elimination tests in
Figure 7.2: Average growth curves of the monotonic correlation coefficient, plotted for 106 randomly selected 48 × 48 pixel templates from the VG dataset. The horizontal axis is the number of processed rows (elimination test indexes 1 to 9); the vertical axis is the partial correlation value. The correlation coefficient range is divided into 12 bins. For a threshold of 0.837, 9 elimination tests are found, executed after processing 4, 5, 6, 7, 9, 12, 16, 26, and 48 rows respectively. Elimination tests are aligned with row boundaries.
Extended Mode PCE is constrained by the test overhead cost. A comparison between the elimination test overhead in Extended Mode PCE and the direct computation overhead in Basic Mode PCE may be used to select a specific mode, as well as the maximum number of elimination tests if Extended Mode is selected.
The overhead of Basic Mode PCE may be estimated by computing the total amount of performed computations, which requires the correlation coefficient histogram. Let n_k be the count of search locations in the k-th bin, and Pr{k} = n_k/p the probability that a search location maps to the k-th bin. Since the number of processed pixels per location is c_k r_k, the computations done in the k-th bin are p c_k r_k Pr{k}, and the total computations to be performed, w_t, are given by the summation over all bins. Substituting the value of c_k r_k from Equation 7.13,
\[
w_t = mnp\,\frac{\nu(1-\rho_{th})}{2}\sum_{k=1}^{\nu}\frac{\Pr\{k\}}{\nu-k}. \tag{7.14}
\]
The overhead of Extended Mode PCE may be estimated by computing the total number of elimination test executions. Each elimination test is performed on a different number of search locations: the first elimination test is performed on all search locations, the second test on all except those eliminated in the first bin, and so on. Let l_k be the number of search locations on which the k-th elimination test is executed:
\[
l_k = p\left(1 - \sum_{i=1}^{k-1}\Pr\{i\}\right). \tag{7.15}
\]
The total number of elimination test executions, l_t, is given by the summation over all tests,
\[
l_t = \sum_{k=1}^{\nu} p\left(1 - \sum_{i=1}^{k-1}\Pr\{i\}\right), \tag{7.16}
\]
which may be simplified to
\[
l_t = p\sum_{k=1}^{\nu} k\,\Pr\{k\}. \tag{7.17}
\]
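Equations 7.14 and 7.17 may be evaluated from an empirical bin histogram as follows (a sketch; skipping the k = ν term of Equation 7.14, whose denominator is zero, is an implementation assumption on our part, since locations in the top bin are never eliminated):

```python
def overhead_estimates(pr, m, n, p, rho_th):
    """Estimate total performed computations w_t (Eq. 7.14) and total
    elimination-test executions l_t (Eq. 7.17). pr[k-1] = Pr{k} is the
    fraction of the p search locations whose final correlation falls in
    bin k, for nu = len(pr) bins."""
    nu = len(pr)
    w_t = m * n * p * nu * (1 - rho_th) / 2 * sum(
        pr[k - 1] / (nu - k) for k in range(1, nu))
    l_t = p * sum(k * pr[k - 1] for k in range(1, nu + 1))
    return w_t, l_t
```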
After estimating the total performed computations and the total number of elimination test executions, we may compare the Basic Mode and Extended Mode PCE overheads. If the computational cost of one elimination test in Extended Mode is c_t operations, and the ratio of the per-pixel computational cost of Basic Mode to Extended Mode is c_m, then Extended Mode is preferred only if c_t l_t + w_t < c_m w_t, or, from Equations 7.14 and 7.17,
\[
\sum_{k=1}^{\nu} k\,\Pr\{k\} \le \frac{\nu\,(c_m-1)(1-\rho_{th})\,mn}{2c_t}\sum_{k=1}^{\nu}\frac{\Pr\{k\}}{\nu-k}. \tag{7.18}
\]
For a given dataset, the correlation coefficient histogram may be estimated empirically using a representative sample. For known values of m × n, ρ_th, c_m, and c_t, the maximum value of ν ≥ 1 which satisfies this inequality is an upper bound on the number of elimination tests that may be performed with Extended Mode PCE. If no value of ν ≥ 1 satisfies the inequality, Basic Mode is selected.
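Inequality 7.18 can be checked directly for a measured histogram (illustrative sketch, with the same k = ν simplification as above, skipping the zero-denominator term):

```python
def prefer_extended(pr, mn, rho_th, c_m, c_t):
    """True if inequality 7.18 holds for histogram pr (pr[k-1] = Pr{k},
    nu = len(pr) bins), i.e. the elimination-test overhead of Extended
    Mode is outweighed by its cheaper per-pixel cost."""
    nu = len(pr)
    lhs = sum(k * pr[k - 1] for k in range(1, nu + 1))
    rhs = nu * (c_m - 1) * (1 - rho_th) * mn / (2 * c_t) * sum(
        pr[k - 1] / (nu - k) for k in range(1, nu))
    return lhs <= rhs
```

With a uniform histogram, the check reproduces the trend described in the text: Extended Mode is favoured for 32 × 32 templates but not for 20 × 20 ones.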
As an illustration, if we assume the distribution of the correlation coefficient is uniform, then substituting Pr{k} = 1/ν in Equation 7.18 and simplifying, we get
\[
\nu + 1 \le \frac{mn\,(c_m-1)(1-\rho_{th})}{c_t}\,\ln\nu. \tag{7.19}
\]
For a given template size, initial correlation threshold, elimination test cost, monotonic formulation overhead, and correlation coefficient distribution, Equation 7.19 may be used to find the total number of elimination tests. For m × n = 32 × 32, c_t = 22, c_m = 2.5, ρ_th = 0.90, we get ν + 1 ≤ 6.98 ln ν, which is satisfied for all ν ≤ 32. For smaller template sizes, for example 20 × 20, with the same values for the other parameters, Equation 7.19 gives ν + 1 ≤ 2 ln ν, which cannot be satisfied for any ν > 1. Therefore, for this size, Basic Mode PCE is preferred over Extended Mode PCE.
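For the uniform case, the largest admissible ν can be found by direct search (illustrative sketch; `max_tests` is our name, not part of the thesis code):

```python
import math

def max_tests(mn, c_m, c_t, rho_th, nu_limit=10000):
    """Largest nu >= 2 satisfying Eq. 7.19 for a uniform histogram:
    nu + 1 <= mn*(c_m - 1)*(1 - rho_th)/c_t * ln(nu).
    Returns 0 if no nu satisfies it, i.e. Basic Mode is preferred."""
    coef = mn * (c_m - 1) * (1 - rho_th) / c_t
    best = 0
    for nu in range(2, nu_limit):
        if nu + 1 <= coef * math.log(nu):
            best = nu
    return best
```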
Another constraint on the maximum number of tests in Extended Mode is that the testing cost should be significantly less than the elimination benefit: e_t ≥ c_t f_t l_t, where e_t is the total elimination and f_t is a factor which should be significantly larger than 1.00. From Equation 7.14, e_t = pmn − w_t, or
\[
e_t = mnp\left(1 - \frac{\nu(1-\rho_{th})}{2}\sum_{k=1}^{\nu}\frac{\Pr\{k\}}{\nu-k}\right). \tag{7.20}
\]
From Equations 7.20 and 7.17,
\[
mn\left(1 - \nu(1-\rho_{th})\sum_{k=1}^{\nu}\frac{\Pr\{k\}}{\nu-k}\right) \ge c_t f_t \sum_{k=1}^{\nu} k\,\Pr\{k\}. \tag{7.21}
\]
If all other parameters are known, this constraint gives an upper bound on the number of elimination tests, ν ≥ 1, that may be performed with Extended Mode PCE.
Again, as an illustration, assuming a uniform distribution of the correlation coefficient and substituting Pr{k} = 1/ν in Equation 7.21, simplification yields
\[
\frac{2mn\left(1 - \ln\nu\,(1-\rho_{th})\right)}{c_t(\nu+1)} \ge f_t. \tag{7.22}
\]
For m × n = 32 × 32, ρ_th = 0.90, f_t = 5, c_t = 20, the constraint on ν is 17.6 ≥ 1.86 ln ν + ν, which is satisfied for all ν ≤ 11.
The constraints given by inequalities 7.18 and 7.21 may be used for the selection of the Basic Mode or Extended Mode PCE algorithm; if Extended Mode is selected, the maximum number of elimination tests is also known. The location of each test in Extended Mode may be computed using Equation 7.13. In Basic Mode, elimination tests may be executed more frequently; in our implementation of Basic Mode PCE we use one elimination test at the end of each row. In Extended Mode as well, for ease of implementation, we align the elimination tests with the row boundaries, which may be done by applying a rounding function to Equation 7.13. Since each test is executed at a unique row index, the number of tests in Extended Mode may further decrease if more than one test maps to the same row index.

From the analysis given in this section, as well as from a large number of experiments, we find that Basic Mode is more efficient on small template sizes, while Extended Mode is more efficient on medium to large templates. In our experiments, we observe that Basic Mode is more efficient for template sizes 4 × 4 to 21 × 21, while Extended Mode is more efficient on all sizes ≥ 22 × 22 pixels.
7.3 Initialization Schemes for Extended Mode PCE Algorithm
The amount of eliminated computations in the Extended Mode PCE algorithm strongly depends upon when a maximum is found in the search process. A maximum found at the start of the search may enhance the elimination performance significantly compared to a maximum found near the end. For larger template sizes, we use the coarse-to-fine initialization scheme (Mattoccia et al., 2008a) to find a high correlation maximum before the actual search begins. For medium-sized templates, we find that the coarse-to-fine scheme often fails to yield an effective initial threshold. This is because, due to low-pass filtering and sub-sampling, the coarse representation of a medium-sized template loses uniqueness and may match at arbitrary locations (Robinson and Milanfar, 2004); when the full-size template is then matched at the corresponding location, no correlation maximum is found.
7.3.1 Extended Mode Multi-Stage PCE Algorithm
The Extended Mode Multi-Stage PCE algorithm is an enhanced version of the basic
mode two-stage algorithm discussed in Chapter 6. Due to the increased cost of the
elimination tests, the number of tests is significantly smaller than in the basic
mode. Efficient test locations may be computed by using Equation 7.13. The rows
between two consecutive test locations may be considered as one partition of the
template image: the rows before the first elimination test comprise the first
partition, the rows between the first and second elimination tests comprise the
second partition, and so on.
For ease of implementation, we assume the partition boundaries are aligned with
the row boundaries. That is, no partition can have a size of less than one row,
and all partition sizes are in terms of complete rows. The number of rows in the
first partition is given by ε1 = Round(r1c1), as given by Equation 7.13. ε1 is the
number of rows to be processed before the first elimination test is executed, and
these rows constitute the first partition of the template image. Similarly, the
number of rows between the first and second elimination tests constitutes the
second partition: ε2 = Round(r2c2) − Round(r1c1). Note that in some cases, due to
the round operation, the size of a partition may evaluate to zero. This happens
when the size of a partition is less than one row; on rounding to the nearest row
boundary, that partition merges into a neighboring partition. In such cases, the
corresponding elimination test is also skipped.
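The row-aligned partitioning described above can be sketched as follows. The fractional test locations would come from Equation 7.13 (not reproduced here), so they are treated as an assumed input, and the function name is ours:

```python
def partition_rows(test_rows_fractional, total_rows):
    """Convert fractional elimination-test locations into whole-row partitions.

    test_rows_fractional: increasing cumulative (fractional) row counts at
    which each test should ideally run -- a stand-in for the values produced
    by Equation 7.13. Partitions that round to zero rows merge into their
    neighbor, and the corresponding elimination test is skipped.
    """
    boundaries = [round(t) for t in test_rows_fractional]
    sizes, prev = [], 0
    for b in boundaries:
        size = b - prev
        if size > 0:          # zero-size partition -> its test is skipped
            sizes.append(size)
            prev = b
    if total_rows > prev:     # rows after the last test form the final partition
        sizes.append(total_rows - prev)
    return sizes
```

For instance, tests ideally placed at rows 2.3, 2.9, and 5.6 of a 10-row template give partitions of 2, 1, 3, and 4 rows, while a test at row 0.4 rounds to zero rows and is dropped.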
In the multi-stage algorithm, we initially correlate the first partition of the
template, consisting of ε1 rows, at all search locations. The partial correlation
value, λt,i(ε1, n), computed at each search location is stored, and the maximum
partial correlation value is tracked. After completing a scan of the full search
space, the complete correlation value is computed at the location exhibiting the
maximum partial correlation value. This complete correlation value serves as the
first initial threshold, ρth1, for the following partitions.
In the second stage, at each search location the first elimination test is
executed: λt,i(ε1, n) < ρth1. Search locations where this comparison evaluates to
true are marked as skipped from the search space. At the non-skipped search
locations, only the rows of partition two, ε2, are matched. The partial
correlation result obtained from the second partition is accumulated with the
result of the first partition, λt,i(ε1 + ε2, n). Therefore, the partial
correlation results after matching partition two cover all rows included in
partitions one and two. Once again, the complete correlation is computed at the
location with the maximum partial correlation over the two partitions. This
correlation value is used as the second initial threshold, ρth2, for partition
three. Note that the value of the threshold increases as more and more partitions
are processed, ρth2 ≥ ρth1, while the partial correlation values decrease:
λt,i(ε1 + ε2, n) ≤ λt,i(ε1, n).
In the third stage, the elimination test is performed at all non-skipped
locations: λt,i(ε1 + ε2, n) < ρth2. Search locations where the test succeeds are
again skipped from the search space. At the remaining locations, the rows in
partition three are matched. The same process is repeated for the following
partitions, until the value and location of the partial maximum remain fixed at
the same search location for multiple iterations: ρth(i−1) = ρth(i). This means
the maximum has converged to the correct position, so further iterations become
redundant. All remaining partitions in the template image are then matched at the
non-skipped search locations by using the Extended Mode PCE algorithm in only one
scan of the search space.
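The staged search can be sketched as below. The `partial_corr` and `full_corr` callables are hypothetical stand-ins for the monotonically decreasing partial correlation λt,i and the complete correlation coefficient; the final single-scan completion performed by the full algorithm is omitted from this sketch:

```python
def multistage_pce(locations, partial_corr, full_corr, partitions,
                   switch_fraction=0.05):
    """Sketch of the Extended Mode multi-stage PCE search.

    locations:    list of candidate search locations
    partial_corr: partial_corr(loc, rows) -> upper bound on the correlation
                  using the first `rows` template rows (monotonically
                  decreasing in `rows`); stand-in for lambda_{t,i}
    full_corr:    full_corr(loc) -> exact correlation coefficient at loc
    partitions:   partition sizes in rows (e.g. derived from Eq. 7.13)
    Returns (best_location, best_correlation). Hypothetical interface.
    """
    alive = {loc: 0.0 for loc in locations}
    threshold, best_loc = -1.0, None
    rows_done = 0
    for eps in partitions:
        rows_done += eps
        # Elimination test: drop locations whose bound fell below threshold.
        for loc in list(alive):
            bound = partial_corr(loc, rows_done)
            alive[loc] = bound
            if bound < threshold:
                del alive[loc]
        # Promote the best partial result to a full correlation -> new threshold.
        cand = max(alive, key=alive.get)
        value = full_corr(cand)
        if value > threshold:
            threshold, best_loc = value, cand
        elif best_loc == cand:
            break  # maximum has converged; remaining stages are redundant
        # Few locations left: the full algorithm would finish the survivors
        # with one Extended Mode PCE scan (omitted in this sketch).
        if len(alive) <= switch_fraction * len(locations):
            break
    return best_loc, threshold
```

The early `break` on convergence mirrors the ρth(i−1) = ρth(i) stopping rule, and the `switch_fraction` check mirrors the switch to the last stage once few locations remain.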
The execution time performance of the multi-stage algorithm depends on how
quickly the maximum found at the end of each stage converges to the global
maximum. If convergence is fast and obtained in only one or two stages, the
remaining computations are significantly reduced. If convergence is slow,
requiring a large number of stages, the execution time increases. We observe that
the convergence of the multi-stage algorithm depends on the downward slope of the
monotonically decreasing growth curve. In Figure 7.2, average growth curves are
shown for different final values of the correlation coefficient. Curves with
final values at the lower end have larger slopes, while curves with final values
at the higher end have relatively smaller slopes. For search locations with a
perfect correlation coefficient score of +1, the slope of the monotonically
decreasing curve is zero.
For a particular template image, the growth curve at each search location has a
different average slope. The growth curve at the best match location has the
minimum average slope, while for locations with larger dissimilarities, the
growth curves have larger slopes. Convergence of the multi-stage algorithm
depends on estimating the average slope of the growth curves in as few stages as
possible. Once the growth curve with the minimum average slope is identified,
convergence is complete, because the global maximum has been found.
The overhead of the multi-stage algorithm is the multiple scans of the search
space: if there are k stages, there are k scans of the search space. A fast
converging multi-stage algorithm may have only two stages, and correspondingly
only two scans of the search space. A slowly converging algorithm may have many
stages and therefore an increased cost of multiple scans. In the case of slow
convergence, as the stage number increases, the number of search locations still
to be processed decreases. Therefore, when the percentage of remaining locations
falls below a certain threshold, for example ≤ 5%, the algorithm may switch to
the last stage.
7.3.2 Initialization of Extended Mode PCE with Coarse-to-Fine Scheme
The coarse-to-fine scheme is a fast technique for finding the approximate
location of the maximum in a large search space. We have discussed this technique
in Chapter 3 among the approximate techniques for large search spaces. Although
this technique is not effective for initializing small and medium sized
templates, ≤ 48 × 48, it is quite effective for larger templates, ≥ 80 × 80
pixels. We observe that for larger templates, initialization of Extended Mode PCE
with the coarse-to-fine scheme is more efficient than initialization by the
multi-stage algorithm discussed in the last section.
As discussed in the last section, the efficiency of the multi-stage algorithm
depends on the speed at which the approximate maximum converges to the global
maximum. This convergence depends on the accuracy of predicting the average slope
of the growth curve by processing as few pixels as possible. The multi-stage
algorithm converges quickly if the growth curve with the minimum average slope is
identified by processing only a few pixels, and slowly if a large number of
pixels must be processed to identify it.
As the template size increases, the number of pixels required to estimate the
average slope of the monotonically decreasing growth curve also increases. This
is because, as the template size grows, the average contribution of each pixel to
the slope of the growth curve decreases.
It may also be observed that the average amount of normalized distortion per
pixel decreases as the template size increases. This is because the total amount
of distortion is 1.00, which is shared by all pixels. If the template size is
4 × 4 pixels, the average contribution per pixel is 1/16, and if the template
size is 80 × 80, the average contribution per pixel is 1/6400. The average
contribution of one pixel in a 4 × 4 template is the same as the contribution of
400 pixels in an 80 × 80 template. Therefore, for larger template sizes, using
the coarse-to-fine scheme to find the initial correlation threshold becomes more
efficient than the multi-stage approach.
Multiple implementations of the coarse-to-fine scheme are possible for finding a
high initial threshold for the Extended Mode PCE algorithm. In our
implementation, we have down-sampled both the template and the reference image by
1/4 in each dimension, which reduces the number of pixels in both images by a
factor of 16. This is equivalent to reducing the images to the second pyramid
level. The reduced template is matched at all search locations in the reduced
reference image. The best match location found at the coarser level is projected
to the actual images, and the full sized template is matched in a 5 × 5 block
around the expected maximum location in the actual reference image. The maximum
value of the correlation coefficient found at these 25 locations is selected as
the initial threshold for the Extended Mode PCE algorithm.
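A minimal sketch of this initialization is given below, using 4 × 4 block averaging as a stand-in for the level-2 Gaussian pyramid (the actual implementation filters with a 5-tap Gaussian before each subsampling). All function names are our own, and the zero-mean NCC is the standard correlation coefficient:

```python
import numpy as np

def ncc(template, window):
    """Zero-mean normalized cross-correlation (correlation coefficient)."""
    t = template - template.mean()
    w = window - window.mean()
    denom = np.sqrt((t * t).sum() * (w * w).sum())
    return float((t * w).sum() / denom) if denom > 0 else 0.0

def downsample4(img):
    """Crude stand-in for a level-2 Gaussian pyramid: 4x4 block averaging."""
    h, w = img.shape[0] // 4 * 4, img.shape[1] // 4 * 4
    return img[:h, :w].reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))

def coarse_to_fine_threshold(template, reference, fallback=0.90):
    """Find an initial PCE threshold with the coarse-to-fine scheme.

    Matches the 1/4-downsampled template exhaustively in the downsampled
    reference, projects the best coarse location back to full resolution,
    refines over a 5x5 block, and returns the best full-resolution
    correlation (or `fallback` if it is not higher). Sketch only.
    """
    tc, rc = downsample4(template), downsample4(reference)
    th, tw = tc.shape
    best, best_ij = -1.0, (0, 0)
    for i in range(rc.shape[0] - th + 1):          # exhaustive coarse search
        for j in range(rc.shape[1] - tw + 1):
            v = ncc(tc, rc[i:i + th, j:j + tw])
            if v > best:
                best, best_ij = v, (i, j)
    ci, cj = 4 * best_ij[0], 4 * best_ij[1]        # project to full resolution
    m, n = template.shape
    best_full = -1.0
    for i in range(max(0, ci - 2), min(reference.shape[0] - m, ci + 2) + 1):
        for j in range(max(0, cj - 2), min(reference.shape[1] - n, cj + 2) + 1):
            best_full = max(best_full, ncc(template, reference[i:i + m, j:j + n]))
    return max(best_full, fallback)
```

When the template is an exact crop of the reference, the scheme recovers a threshold near 1.0; when the coarse search mislocates, the `fallback` of 0.90 is used, matching the behavior described in the text.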
The computational overhead of our implementation of the coarse-to-fine scheme is
negligibly small. Assuming the size of the template to be m × n pixels and the
size of the reference image to be p × q pixels, the spatial domain template
matching complexity is O(mnpq). The overhead of matching the reduced template
with the reduced reference image is O(mnpq/256), which turns out to be only 0.39%
of the total spatial domain computations. Therefore, the computational overhead
of the coarse-to-fine scheme may easily be ignored.
7.4 Experiments with Extended Mode PCE Algorithm
We have performed extensive empirical evaluation of the Extended Mode PCE
algorithms on template sizes ranging from 22 × 22 to 128 × 128 pixels. In our
datasets, each template is an independently captured image containing natural,
and in some cases synthetically generated, distortions. The Extended Mode PCE
algorithms are implemented in C++ and compared with the currently known fast
exhaustive template matching techniques, including a sequential implementation of
FFT (William et al., 2007), the highly optimized parallel implementation FFTW3
(Frigo and Johnson, 2005), the ZNccEbc algorithm (Mattoccia et al., 2008b), and
an exhaustive spatial domain implementation (Haralick and Shapiro, 1992). The
implementation of the ZNccEbc algorithm was provided by the original authors.
Besides the correlation coefficient, we have also implemented Sum of Absolute
Differences (SAD) with the optimizations proposed by Montrucchio and Quaglia
(2005) and Li and Salari (1995). The execution times are measured on a Dell
Inspiron 6400 with an Intel Core 2 CPU 2.13 GHz processor and 2 GB physical
memory. The datasets, executable scripts, and detailed results are available on
our web site: http://cvlab.lums.edu.pk/pce.
7.4.1 Feature Tracking with Extended Mode Two-stage PCE
Algorithm
These experiments have been performed for feature tracking in Infra Red (IR)
video datasets. Two IR video datasets acquired in two different scenarios have
been used: an IR Pedestrian video (IP dataset) used for tracking humans, and an
IR Traffic video (IT dataset) used for tracking vehicles. Due to very low
incident energy, both IR videos suffer from significant background noise, which
has been removed by a simple averaging technique. See Table 7.1 and Fig. 7.3 for
dataset details.
In the Extended Mode Two-stage PCE algorithm, the number of elimination tests has
been evaluated to be {7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10} respectively, by
using Equations 7.18 and 7.21. For all sizes, 2 rows are required to be processed
before executing the
Table 7.1: Dataset description for two-stage Extended Mode PCE experiments on feature tracking across video frames

Dataset   # of Feat   Feat Size   # of Frames   Frame Size
IP22      27          22 × 22     56            240 × 360
IP23      27          23 × 23     56            240 × 360
IP24      27          24 × 24     56            240 × 360
IP25      27          25 × 25     56            240 × 360
IP26      27          26 × 26     56            240 × 360
IP27      27          27 × 27     56            240 × 360
IT28      33          28 × 28     74            240 × 360
IT29      33          29 × 29     74            240 × 360
IT30      33          30 × 30     74            240 × 360
IT31      33          31 × 31     74            240 × 360
IT32      33          32 × 32     74            240 × 360
IT33      33          33 × 33     74            240 × 360
first elimination test. For simplicity, the first stage consists of only one
elimination test, which covers the first two rows of each template. An initial
threshold of 0.90 has been used for both the PCE and ZEBC algorithms. If a
maximum higher than 0.90 is found in the first stage, that maximum is used as the
threshold in the second stage; otherwise the threshold remains 0.90. In the ZEBC
algorithm, the partition parameter has been selected to be {11, 23, 12, 5, 13, 9,
14, 29, 10, 31, 8, 11} respectively.
The average computation elimination for ZEBC, PCE, and SAD is 99.57%, 90.56%, and
88.88% respectively. ZEBC has achieved the maximum computation elimination over
all template sizes; however, the cost of the elimination test in ZEBC is still
larger than the benefit obtained by computation elimination. Therefore ZEBC has
remained slower than PCE. The total execution time for FFTW3, ZEBC, PCE, and SAD
is given in Table 7.3. The PCE algorithm has remained faster than all the other
algorithms over all template sizes. The maximum speedup of PCE over FFTW3 is 4.94
times, over SPAT is 12.60, and over ZEBC is 5.53. PCE has been found to be faster
than SAD only on the IP27 dataset, while for the remaining datasets SAD is
faster. However, SAD is not robust to intensity and contrast variations; due to
the presence of these variations in these datasets, the accuracy of SAD has
remained less than 5%.
Table 7.2: Percent computation elimination comparison for the feature tracking experiment for template sizes ≥ 22 × 22 pixels

Dataset   Elim ZEBC   Elim TPCE   Elim SAD
IP22      99.35       87.97       84.21
IP23      99.04       88.31       83.09
IP24      99.36       88.67       81.94
IP25      99.37       89.04       80.74
IP26      99.34       89.32       79.56
IP27      99.43       89.60       78.40
IT28      99.85       91.69       96.76
IT29      99.76       91.94       96.60
IT30      99.88       92.22       96.51
IT31      99.76       92.45       96.38
IT32      99.89       92.67       96.24
IT33      99.88       92.85       96.11
The maximum, minimum, and average speedups of PCE over the other algorithms,
together with confidence intervals at the 0.95 confidence level, are reported in
Table 7.4. The average speedup of PCE along with confidence intervals is plotted
in Figure 7.5.
7.4.2 Template Matching with Extended Mode Two-Stage
PCE Algorithm
These experiments are performed on the Video Geo-registration (VG) dataset taken
from (Mahmood and Khan, 2010). With the addition of some new template sizes, it
consists of 300 square templates of each size: {47, 48, 63, 64, 79, 80, 95, 96,
111, 112, 127, 128}. The reference image is 736 × 1129 pixels. Further dataset
details may be found in Chapter 5.
In the Extended Mode PCE algorithm, the number of elimination tests evaluates to
{10, 10, 12, 12, 14, 14, 16, 16, 19, 19, 22, 22} respectively, by using Equations
7.18 and 7.21, for ρth = 0.90. The partition parameter in ZEBC has been selected
to be {47, 8, 9, 8, 79, 8, 5, 8, 37, 8, 127, 8} respectively. The Transitive
Elimination Algorithm (TEA) (Mahmood and Khan, 2010) for correlated templates has
also been run on the VG dataset. The GOP parameter was initialized to 7, while
the actual GOP length
Figure 7.3: (a) Four frames from the IR camera pedestrian video dataset. (b) Four frames from the IR traffic video dataset
Table 7.3: Total execution time (sec) comparison for two-stage extended mode PCE experiments, for template sizes ≥ 22 × 22

Dataset   FFTW3    ZEBC     SPAT     TPCE    SAD
IP22      221.03   211.22   400.68   70.26   49.42
IP23      247.24   365.91   529.76   66.21   59.02
IP24      296.13   279.19   540.40   73.30   64.42
IP25      105.67   134.24   565.04   93.26   75.79
IP26      208.77   197.33   610.40   81.40   80.03
IP27      247.24   189.76   552.72   81.71   94.96
IT28      213.56   195.38   520.96   59.83   15.05
IT29      215.86   308.35   614.20   58.26   17.27
IT30      281.64   168.53   568.32   56.95   17.58
IT31      99.23    278.20   495.06   59.77   19.99
IT32      236.65   138.86   700.04   55.54   20.26
IT33      136.23   160.07   666.00   58.38   23.19
Figure 7.4: Plot of execution time for two-stage extended mode PCE experiments (Table 7.3), normalized to 100 templates and 100 reference frames for each dataset.
Table 7.4: Maximum, minimum and average speedup of Two-Stage Extended Mode PCE for the feature tracking experiment (Table 7.3). Speedup is computed by dividing the execution time of each algorithm by the execution time of PCE. Confidence intervals zα σ/√N are also computed for α = .05 (confidence level of 0.95), zα = 1.645, where σ is the standard deviation of the speedup over N = 12 datasets.

                      FFTW3       ZEBC         SPAT        PCE   SAD
Max Speedup           4.95        5.52         12.60       1     1.16
Min Speedup           1.13        1.44         5.70        1     0.25
Average Speedup       3.17        3.33         8.58        1     0.615
Confidence Interval   3.17±0.52   3.33±0.596   8.58±1.02   0     0.615±0.153
Figure 7.5: Plot of average execution time speedup of Two-stage Extended Mode PCE, along with confidence intervals for a confidence level of 0.95. Corresponding values may be seen in Table 7.4.
Figure 7.6: Video Geo-registration (VG) dataset: reference image from earth.google.com and templates from terraserver.microsoft.com.
was computed at run time.
In these experiments, the total execution time of the FFT, ZEBC, SPAT, TEA, and
PCE algorithms is given in Table 7.7, which includes all computational overheads.
The maximum speedup of PCE over FFT is 11.32, over ZEBC 5.06, over SPAT 22.49,
and over TEA 2.17 times. For the VG80 dataset, ZEBC has remained 1.44 times
faster than PCE, while on the VG63, VG64, VG95, and VG128 datasets both
algorithms have performed quite similarly. This is because ZEBC performs quite
well on template sizes that factor into small primes, while its performance
deteriorates on sizes containing large prime factors; for example, for dataset
VG047 the execution time of ZEBC is 609.44 seconds, while for VG048 its execution
time is only 180.86 seconds. In contrast, the PCE algorithm has no issue with any
particular size and has performed equally well on prime and non-prime sizes.
Comparing PCE with TEA, TEA has remained faster than the PCE algorithm on 4
datasets, with a maximum speedup of 1.32 times. The speedup of TEA depends upon
inter-template auto-correlation: if the inter-template auto-correlation is high,
TEA will be faster, and if low, TEA will be slower. A high auto-correlation
cannot be guaranteed in all template matching scenarios. In contrast, the
performance of the PCE algorithm has no dependence upon auto-correlation;
therefore the PCE algorithm has a significantly broader scope than TEA.
7.4.3 Coarse-to-Fine Initialization of Extended Mode PCE
Algorithm
In the coarse-to-fine scheme, we have used a 5-tap Gaussian low-pass filter to
filter the templates and the reference images, which are then down-sampled by
1/2. The down-sampled images are again low-pass filtered and again down-sampled
by 1/2. Each coarse image representation has 1/16 the size of the original image.
The computational cost of the coarse-to-fine initialization scheme is 1/256 of
the spatial domain template matching, which is a negligibly small cost.
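The pyramid construction just described can be sketched with a separable 5-tap binomial kernel standing in for the Gaussian filter (the exact filter coefficients are not specified here, so the [1 4 6 4 1]/16 kernel is an assumption):

```python
import numpy as np

# 5-tap binomial approximation to a Gaussian low-pass filter (assumed taps)
KERNEL_5TAP = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0

def pyr_down(img):
    """One pyramid level: separable 5-tap low-pass filtering, then dropping
    every other row and column (down-sampling by 1/2 in each dimension)."""
    blurred = np.apply_along_axis(
        lambda r: np.convolve(r, KERNEL_5TAP, mode="same"), 1, img)
    blurred = np.apply_along_axis(
        lambda c: np.convolve(c, KERNEL_5TAP, mode="same"), 0, blurred)
    return blurred[::2, ::2]

def level2(img):
    """Level-2 pyramid: two smooth-and-subsample passes, 1/16 the pixels."""
    return pyr_down(pyr_down(img))
```

Two passes of `pyr_down` reduce a p × q image to roughly p/4 × q/4, i.e. 1/16 of the pixels, matching the cost analysis above.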
On the VG dataset, for template sizes ≥ 79 × 79 pixels, the coarse-to-fine
initialization scheme succeeds in finding a high initial threshold, while for the
smaller templates,
Figure 7.7: The correlation coefficient range from -1 to +1 is divided into 12 bins of equal size, and the average growth curves of search locations within each bin are computed. The number of locations in each bin is found to be: 0, 4025, 25093, 56970, 113044, 187363, 167027, 113314, 52598, 20953, 3294, and 47 respectively, and the average value of the correlation coefficient in each non-empty bin is: -0.7059, -0.5693, -0.4082, -0.2406, -0.0812, 0.0796, 0.2431, 0.4036, 0.5683, 0.7047, 0.8925. This experiment is done for the first template in the VG48 dataset.
Table 7.6: Video Geo-Registration Dataset for Experiments on Larger Sized Templates (Mahmood and Khan, 2010)

Dataset   # Frames   Frame Size   # of Ref.   Ref. Size
VG047     300        47 × 47      1           736 × 1129
VG048     300        48 × 48      1           736 × 1129
VG063     300        63 × 63      1           736 × 1129
VG064     300        64 × 64      1           736 × 1129
VG079     200        79 × 79      1           736 × 1129
VG080     200        80 × 80      1           736 × 1129
VG095     200        95 × 95      1           736 × 1129
VG096     200        96 × 96      1           736 × 1129
VG111     150        111 × 111    1           736 × 1129
VG112     150        112 × 112    1           736 × 1129
VG127     100        127 × 127    1           736 × 1129
VG128     100        128 × 128    1           736 × 1129
Table 7.7: Total execution time in seconds for the Video Geo-Registration dataset with Two-Stage PCE implementation

Dataset   FFT       ZEBC      Spat      TPCE     TEA
VG047     1254.38   609.44    1175.50   120.36   247.96
VG048     1289.64   180.86    1194.20   113.88   247.92
VG063     1256.11   222.92    2256.00   224.97   250.86
VG064     1230.05   194.29    2185.20   211.43   251.00
VG079     1242.3    965.41    3488.7    230.31   261.42
VG080     1249.8    202.97    3195.1    292.49   261.75
VG095     1234.6    311.26    4518.1    325.59   265.47
VG096     1265.9    247.77    4531.4    239.99   265.64
VG111     1296.3    444.6     6205.6    293.16   267.56
VG112     1292.3    288.26    6049.6    268.94   267.3
VG127     1247.7    1277.9    7154.4    348.51   267.99
VG128     1267.8    352.83    7052.4    353.34   268.29
we observe that this scheme has high failure rates. If the coarse-to-fine scheme
fails to produce an initial threshold larger than 0.90, we use 0.90 as the
initial threshold in the ZEBC and PCE algorithms. If the coarse-to-fine scheme
successfully finds a high initial threshold, we use the Extended Mode PCE
algorithm without the two-stage optimization. The total number of elimination
tests and the test locations are computed for an initial correlation threshold of
0.90. In ZEBC, the partition parameter has been selected to be the same as in the
experiments of the last subsection.
Both the ZEBC and PCE algorithms are initialized with the coarse-to-fine scheme.
Gaussian level-2 pyramids are used to find an approximate matching location. The
best match location found at level 2 is projected to the actual size, and around
the approximate location, correlations over a 5 × 5 block are computed for the
fine search. The maximum found by this scheme is used as ρth if it is larger than
0.90; otherwise ρth = 0.90 has been used.
The execution times of the different algorithms in the PCE experiments with the
coarse-to-fine scheme are shown in Table 7.8. In these experiments, PCE has
remained faster than ZEBC over 8 datasets, with a maximum speedup of 6.15 times,
while on the 4 remaining datasets the performance of PCE is close to ZEBC. In
comparison with FFTW3, the PCE algorithm has remained faster over 6 datasets,
with a maximum speedup of 1.60, while FFTW3 has remained faster than PCE on the 6
remaining datasets. FFTW3 adapts to the hardware to maximize performance and also
utilizes SIMD instructions, which perform the same operation on all elements in a
data array in parallel. The PCE algorithm has been sequentially implemented,
without hardware specific optimizations similar to those used in FFTW3 (Frigo and
Johnson, 2005). Despite the hardware specific optimizations in FFTW3, in our
experiments the PCE algorithms have remained faster than FFTW3 for all template
sizes from 4 × 4 to 48 × 48, and then for 64 × 64, 79 × 79, 80 × 80, and
95 × 95, while no other spatial domain algorithm has remained faster than FFTW3
for so many sizes.
The maximum, minimum, and average speedups of PCE over the other algorithms,
along with confidence intervals at the 0.95 confidence level, are reported in
Table 7.9. The average speedup of PCE along with confidence intervals is plotted
in Figure 7.10.
Figure 7.8: Plot of execution time on the VG dataset with the coarse-to-fine initialization scheme.
Table 7.8: Total execution time in seconds for Video Geo-registration with the coarse-to-fine initialization scheme used for the PCE and ZEBC algorithms

Dataset   FFT       FFTW3    ZEBC      Spat      PCE
VG047     1254.38   193.13   609.44    1175.50   120.36
VG048     1289.64   169.10   180.86    1194.20   113.88
VG063     1256.11   184.39   222.92    2256.00   224.97
VG064     1230.05   221.21   194.29    2185.20   211.43
VG079     1242.3    172.92   941.29    3488.7    162.4
VG080     1249.8    217.86   189.24    3195.1    164.81
VG095     1234.6    246.41   258.6     4518.1    221.22
VG096     1265.9    180.65   212.69    4531.4    222.12
VG111     1296.3    239.32   489.1     6205.6    260.42
VG112     1292.3    169.74   267.76    6049.6    267.98
VG127     1247.7    227.43   1645.1    7154.4    267.36
VG128     1267.8    250.86   280.68    7052.4    266.58
Figure 7.9: Plot of execution time of PCE with the coarse-to-fine initialization scheme, compared with FFTW3 and TEA on the VG dataset.
Table 7.9: Maximum, minimum and average speedup of Extended Mode PCE with coarse-to-fine initialization for the Video Geo-registration experiment (Table 7.8). Speedup is computed by dividing the execution time of each algorithm by the execution time of PCE. Confidence intervals zα σ/√N are also computed for α = .05 (confidence level of 0.95), zα = 1.645, where σ is the standard deviation of the speedup over N = 12 datasets.

                      FFT          FFTW3        ZNccEbc      Spat         PCE
Max Speedup           11.32        1.60         6.15         26.76        1
Min Speedup           4.66         0.633        0.918        9.76         1
Average Speedup       6.57         1.05         2.31         18.49        1
Confidence Interval   6.57±1.071   1.05±0.138   2.31±0.983   18.49±3.13   1±0
Figure 7.10: Plot of average execution time speedup of Two-stage Extended Mode PCE with Coarse-to-Fine initialization on the Video Geo-registration dataset. Confidence intervals for a confidence level of 0.95 are also plotted. Corresponding values may be seen in Table 7.9.
7.5 Conclusion
In this chapter we have discussed the Extended Mode Partial Correlation
Elimination (PCE) algorithm, which is more efficient on medium and large
templates than the Basic Mode PCE discussed in Chapter 6. For a given dataset, a
scheme for selecting a particular mode has also been proposed, along with two
effective initialization strategies: a coarse-to-fine scheme for large template
sizes and two-stage PCE for medium sized templates. An algorithm for estimating
the total number of elimination tests and the test locations has also been
proposed. The Extended Mode PCE algorithms are exact, with accuracy equivalent to
exhaustive search, and have been compared with existing fast exhaustive
techniques including ZEBC and FFTW3. On medium sized templates, the PCE algorithm
has outperformed the other algorithms by a significant margin, while on larger
templates it has shown competitive performance.
This chapter concludes the core contribution of this thesis. In the following two
chapters, two further research directions will be presented.
Chapter 8
COMPUTATION ELIMINATION ALGORITHMS FOR
ADABOOST BASED DETECTORS
Bound based computation elimination algorithms have been well investigated in the
context of fast computation of image match measures. However, we observe that
similar strategies may also be used to speed up other applications in the fields
of Computer Vision and Image Processing. We find that many object detectors may
be made faster simply by rearranging the computations and terminating them before
completion. In this regard, we have proposed early termination algorithms for
speeding up the detection phase of AdaBoost based object detectors.
As discussed in Chapters 6 and 7, a monotonic formulation of the correlation
coefficient was required for the Partial Correlation Elimination (PCE)
algorithms. For partial computation elimination of an AdaBoost based object
detector, we rearrange the computations such that the detector response becomes
monotonically decreasing. At a particular search location, as soon as the
response becomes less than the AdaBoost global threshold, the remaining
computations become redundant and may be skipped without any change in detection
accuracy.
In order to further reduce computations, we have incorporated the concept of
two-stage template matching into the framework of the non-maxima suppression
process. We have developed a new non-maxima suppression algorithm, named 'Early
Non-Maxima Suppression', which provides the opportunity to discard computations
based on the local maximum. We have implemented the proposed algorithms to speed
up an AdaBoost based edge-corner detector proposed by Mahmood (2007). Our
experiments show more than an order of magnitude speedup over the original
AdaBoost detector implementation. Significant speedups are also observed over
some other edge-corner detectors.
8.1 Introduction
Since the seminal work of P.Viola and Jones (2001, 2004) on real time face detection
using AdaBoost algorithm, the face detection problem has been well explored by
many other researchers as well, for example Vincenzo and Lisa (2007); Cristinacce
and Cootes (2003); Wu et al. (2004). In all of these techniques, a high speed up
has been obtained by exploiting the regularity of a human face. As an example, one
of the most extensively used rules is, if at a search location no eyes are detected,
that location cannot contain a face. Therefore, all search locations where no eyes are
detected are eliminated from the search space. Unfortunately, such rules cannot be
made for objects which do not possess a fixed orientation or highly regular patterns.
In contrast, our proposed early termination algorithms discussed in this chapter are
generic and applicable to the detection of any type of objects. In this regard we
have proposed two algorithms, a basic early termination algorithm and an early non-
maxima suppression algorithm.
In the basic early termination algorithm, each candidate location is initialized
with the total weight of the trained AdaBoost ensemble. If a weak learner
classifies the current location as a non-object, the weight of that learner is
subtracted from the current total weight. As more learners are processed, the
weight of the candidate location monotonically decreases, and as soon as the
current weight becomes less than the AdaBoost global threshold, that location can
never become a positive object instance; therefore further calculations may be
skipped and the location may be discarded.
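The basic early termination loop can be sketched as follows; the learner interface (a callable returning True for an Object vote) and the function name are our own illustrative assumptions:

```python
def early_terminate(learners, weights, window, global_threshold):
    """Basic early termination for an AdaBoost detector (sketch).

    learners: weak learners, assumed sorted by descending weight;
              learner(window) -> True if it predicts Object.
    The response starts at the total ensemble weight and is decremented by
    the weight of every learner that votes non-object. Once it falls below
    the global threshold the window can never be accepted, so the remaining
    learners are skipped without changing the detection result.
    Returns (is_object, learners_evaluated).
    """
    response = sum(weights)
    for k, (learner, alpha) in enumerate(zip(learners, weights), start=1):
        if not learner(window):
            response -= alpha
            if response < global_threshold:
                return False, k       # early exit: the decision is already fixed
    return True, len(learners)        # survived all learners -> Object
```

Sorting the ensemble by descending weight (Figure 8.1) makes the response drop as fast as possible at non-object locations, so most windows are rejected after evaluating only a few learners.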
In order to suppress multiple responses to the same object, only the local
maximum in each locality has to be retained, and the local non-maxima candidates
have to be suppressed to zero by a process known as Non-Maxima Suppression (NMS).
We reduce the computations at local non-maxima candidate locations through the
Early Non-Maxima Suppression (ENMS) algorithm. In the ENMS algorithm, we
partially compute the AdaBoost detector response at all candidate locations. In
each local NMS window, we choose the candidate location with the best partial
result and compute the final detector response at that location. If this final
response is larger than the AdaBoost classification threshold, then for the
remaining candidate locations in that NMS window, the early termination threshold
is raised to the final value of
the local maximum. That is, in a specific NMS window, a candidate location will
be discarded as soon as the detector response falls below the local maximum or
below the AdaBoost classification threshold, whichever is larger. The ENMS
algorithm helps reduce the redundant computations performed at local non-maxima
candidate locations.
The proposed early termination algorithm is incorporated within our previous imple-
mentation of the AdaBoost based edge-corner detector (Mahmood, 2007). The quality
of the detected edge-corners remains exactly the same, while the speed up over the
original algorithm is more than an order of magnitude. We have also compared the
quality and speed of the edge-corners detected by the AdaBoost detector with three
other detectors: the KLT detector, the Harris detector (Harris and Stephens, 1988)
and Xiao's detector (Xiao and Shah, 2003). We find that the edge-corners detected
by the AdaBoost detector are of comparable quality to those of the KLT, Harris and
Xiao detectors, while the execution time is up to 4.00 times faster than KLT, 17.13
times faster than Harris and 79.79 times faster than Xiao's detector.
8.2 Related Work
The details of the AdaBoost algorithm may be found in texts on machine learning, and
the details of edge-corner detection using the AdaBoost algorithm may be found in our
earlier work (Mahmood, 2007). For completeness, the detection phase of the AdaBoost
algorithm is briefly described here, as used by Viola and Jones (2004).
Suppose the trained ensemble of weak learners consists of m learners, {f1, f2, f3, ..., fm}, ordered in the descending order of weights: {α1 ≥ α2 ≥ α3 ≥ ... ≥ αm} (Figure 8.1).
At a specific candidate location rio,jo , where (io, jo) are the coordinates of first pixel
of the search window, AdaBoost detector response is given by:
Λ(rio,jo) = Σ_{k=1}^{m} αk Lk(rio,jo), (8.1)
where Lk(rio,jo) is the label of rio,jo as predicted by the learner fk. Lk(rio,jo) may have
Figure 8.1: The weights of weak learners are not always in decreasing order with respect to the selection number. Therefore, after the training phase, the ensemble should be sorted in decreasing order of weights.
only two values:
Lk(rio,jo) = 1 if the prediction is Object, and 0 otherwise. (8.2)
After evaluating Λ(rio,jo) in the whole search space, the labeling process starts: the
search locations where Λ(rio,jo) is larger than the AdaBoost global threshold Gt, are
labeled as objects, while the remaining locations are labeled as non-objects.
Lm(rio,jo) = 1 if Λ(rio,jo) ≥ Gt, and 0 otherwise, (8.3)
where Lm(·) is the final label of a search location. The AdaBoost global threshold, Gt,
is defined as:
Gt = Tα Σ_{k=1}^{m} αk, (8.4)
where 1.0 ≥ Tα ≥ 0.0.
Once the labeling process is complete, a Non-Maxima Suppression (NMS) process is
applied to suppress multiple responses to the same object.
8.3 AdaBoost Global Threshold Based Early Termination Algorithm
In our proposed algorithm, the current candidate search location is initially assigned
the maximum possible AdaBoost detector response, wm:
wm = Σ_{k=1}^{m} αk. (8.5)
Then, starting with the weak learner f1 with maximum weight α1, we evaluate the
learners of the trained ensemble in order of decreasing weights: {α1 ≥ α2 ≥ α3 ≥ ... ≥ αm}.
If a learner predicts the current search location as an object, we take no action;
however, if the predicted label is 0, we subtract the weight of that
Figure 8.2: Monotonically decreasing ensemble response at a non-object location. As more and more weak learners are evaluated, the response decreases monotonically.
learner from the current value of the response. The detector response after processing
i < m learners is therefore given by:
Λi(rio,jo) = wm − Σ_{k=1}^{i} αk (1 − Lk(rio,jo)). (8.6)
In this form, the AdaBoost detector response becomes a monotonically decreasing
function of the number of processed learners. After processing each learner, the
response either remains the same or decreases (Figure 8.2). As soon as the current
response, Λi(rio,jo), falls below the global threshold Gt, computation of the remaining
learners becomes redundant and may be skipped without any loss of accuracy.
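The early termination rule above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the (weight, predict) pair representation, the function name and the toy weights in the usage are all assumptions.

```python
def detect_early_termination(learners, G_t, window):
    """Evaluate a sorted AdaBoost ensemble at one candidate window.

    learners: list of (alpha_k, predict_k) pairs, sorted by decreasing
              alpha_k, where predict_k(window) returns 1 (object) or 0.
    G_t:      AdaBoost global threshold (Equation 8.4).
    Returns True if the window is labeled object, False otherwise.
    """
    # Initialize with the maximum possible response w_m (Equation 8.5).
    response = sum(alpha for alpha, _ in learners)
    for alpha, predict in learners:
        if predict(window) == 0:
            # A negative prediction removes this learner's weight (Eq. 8.6).
            response -= alpha
        if response < G_t:
            # The response decreases monotonically and can never recover,
            # so the remaining learners may be skipped (early termination).
            return False
    return response >= G_t

# Toy ensemble: three learners with weights 0.5, 0.25, 0.25 and G_t = 0.7.
learners = [(0.5, lambda w: 1), (0.25, lambda w: 0), (0.25, lambda w: 1)]
print(detect_early_termination(learners, 0.7, None))   # survives: 1.0 -> 0.75
```
Because the heaviest learners come first, a location rejected by the first learner here is discarded after a single evaluation.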
Since most of the ensemble weight is generally concentrated in the first few learners,
at non-object locations the detector response rapidly falls below the global threshold.
The average number of learners evaluated at any location therefore reduces to a very
small number, making detection significantly faster without any loss of accuracy.
Figure 8.3: Image dataset consisting of still images of varying detail, used for detection speed up comparison.
8.4 Early Non-Maxima Suppression Algorithm
The Non-Maxima Suppression (NMS) process is commonly used to suppress multiple
detections corresponding to the same real-world object. Assuming that the detector
response surface is smooth, and considering an NMS window of appropriate size
around the current search location, non-maxima suppression may be described as
follows: if the detector response at the current location is not the maximum within
the NMS window, the current location is labeled as non-object; otherwise it remains
labeled as object. That is, the label of the current location rio,jo is given by:
Lm(rio,jo) = 0 if Λ(rio,jo) ≤ Λ(ri′o,j′o), and 1 otherwise, (8.7)
where Λ(rio,jo) is the detector response at the current location and Λ(ri′o,j′o) is the
maximum detector response at any other location within the same NMS-window.
The early termination algorithm discussed in the last section may be integrated with
the NMS process to further reduce redundant computations. If, in a locality, the local
maximum Λ(ri′o,j′o) is significantly higher than the global threshold Gt, then all search
Figure 8.4: View invariance of the early terminated AdaBoost detector: (a)-(b) two views of the LUMS library building; (c)-(d) two views from the hotel sequence. Red crosses show AdaBoost detections, yellow dots show missing detections.
locations in that locality having response less than Λ(ri′o,j′o) are non-object locations.
Therefore computations at the current location can stop as soon as the detector
response falls below the local maximum, Λ(ri′o,j′o).
Lm(rio,jo) = 0 if Λi(rio,jo) ≤ max(Λ(ri′o,j′o), Gt), and 1 otherwise, (8.8)
where Λi(rio,jo) is the detector response at current location for i < m learners.
In order to find the local maximum, we compute the AdaBoost detector response over
all search locations for the first p learners, such that the sum of the weights of these
first p learners, wp, satisfies the following bound:
wp ≥ (1 − Tα) Σ_{k=1}^{m} αk, (8.9)
which means that wm − wp ≤ Gt, the global threshold. The partial response over the
first p learners is given by:
Λp(rio,jo) = wm − Σ_{k=1}^{p} αk (1 − Lk(rio,jo)). (8.10)
Search locations where Λp(rio,jo) ≤ Gt are labeled as 0, while the search locations
where Λp(rio,jo) > Gt are still undecided.
At these undecided locations, the ENMS algorithm is implemented as follows: if the
partial response at the current search location, ri′o,j′o, is larger than the partial response at all
Table 8.1: Execution time (sec) of AdaBoost, Early-terminated AdaBoost, KLT, Harris, and Xiao edge-corner detectors.
Img ID AdaBoost EAdaBoost KLT Hrs Xiao
1 24.08 1.73 5.86 22.57 34.44
2 24.16 1.46 5.89 22.86 9.82
3 24.11 2.76 5.89 22.91 244.15
4 24.19 3.96 6.02 22.93 300.16
5 24.12 1.34 5.83 22.96 5.366
6 24.09 1.84 6.00 22.93 60.20
7 24.15 1.77 5.81 22.47 54.60
8 24.11 2.27 5.95 22.90 130.24
9 24.14 1.84 5.81 22.87 48.81
10 24.15 2.69 5.91 22.86 86.23
Mean 24.18 2.16 5.90 22.83 97.40
search locations within the current NMS window, we calculate the complete response
over all m learners at that location:
Λm(ri′o,j′o) = Λp(ri′o,j′o) − Σ_{k=p+1}^{m} αk (1 − Lk(ri′o,j′o)). (8.11)
If Λm(ri′o,j′o) ≥ Gt, all remaining search locations in the current NMS window having
partial response less than Λm(ri′o,j′o) will be labeled as non-objects:
Lm(rio,jo) = 0 if Λp(rio,jo) ≤ Λm(ri′o,j′o), and u otherwise, (8.12)
where u means the label is yet undecided. At each of these undecided locations,
further learners are evaluated until the location is either labeled as non-object or its
final response is computed. In any locality, as soon as a maximum larger than the
previously known maximum is found, the previous best location is labeled as non-object.
When all locations are exhausted, the last undecided location in each locality
is labeled as object.
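The ENMS procedure can be sketched for a single NMS window as follows. This is a simplified illustration under stated assumptions: processing candidates strictly in order of decreasing partial response is one possible realization of the "best partial result first" strategy described above, and the (weight, predict) pair representation is invented for the sketch.

```python
def enms_window(locations, learners, p, G_t):
    """Early Non-Maxima Suppression within one NMS window (a sketch).

    locations: candidate windows in this NMS window; learners: (alpha,
    predict) pairs sorted by decreasing weight; p: learners used in the
    partial pass; G_t: AdaBoost global threshold. Returns the index of
    the surviving local maximum, or None if no candidate is an object.
    """
    w_m = sum(alpha for alpha, _ in learners)

    def response(loc, start, upto, init):
        r = init
        for alpha, predict in learners[start:upto]:
            if predict(loc) == 0:
                r -= alpha          # Equation 8.6: subtract on rejection
        return r

    # Partial responses over the first p learners (Equation 8.10).
    partial = [response(loc, 0, p, w_m) for loc in locations]
    best, best_full = None, G_t     # threshold starts at G_t (Eq. 8.8)
    for i in sorted(range(len(locations)), key=lambda i: -partial[i]):
        if partial[i] <= best_full:
            break                   # Eq. 8.12: cannot beat current maximum
        full = response(locations[i], p, len(learners), partial[i])
        if full > best_full:
            best, best_full = i, full   # previous best becomes non-object
    return best
```
In the sketch, each candidate is a tuple of the labels its learners would predict; raising `best_full` after every confirmed maximum is exactly the raised early termination threshold of Equation 8.8.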
8.5 Experiments and Results
The speed up generated by the proposed early termination algorithms is compared
with our previous AdaBoost implementation (Mahmood, 2007) as well as the KLT,
Harris and Xiao detectors. The speed up comparison is done on a dataset of ten images
shown in Figure 8.3, each of size 2304×3072 pixels, with varying levels of detail.
The number of detected edge-corners varies from a minimum of 309 in image 5 to
a maximum of 167,898 in image 3 (Figure 8.3). In the Xiao and EAdaBoost detectors,
execution time increases with the number of detected edge-corners, while in
the AdaBoost, Harris and KLT detectors the processing time remains the same. The
thresholds for each algorithm are set such that the number of detected edge-corners
remains approximately the same. In most of the experiments, the AdaBoost global
threshold was selected to be 0.70. The number of weak learners evaluated was varied
as p = 4, 8, 12 and 16; Figure 8.5 shows the corresponding fraction of eliminated
locations. After processing 16 learners, on average 91.75% of locations are eliminated.
On the remaining locations, the Early Non-Maxima Suppression algorithm was applied.
The speed up comparison is done on an HP Pavilion Notebook PC with an Intel Core 2
Duo 2.0 GHz CPU and 2 GB RAM. The early terminated AdaBoost detector is up to
4.00 times faster than the KLT detector, 17.13 times faster than the Harris detector,
18.00 times faster than the traditional AdaBoost detector and up to 79.79 times faster
than Xiao's detector (Table 8.1).
The invariance of the EAdaBoost detector is compared with the three other detectors
in the presence of view changes, scale changes, blurring, additive Gaussian noise and
rotation (Figure 8.4). In our experiments, we found that the quality of the AdaBoost
detector is comparable with that of the other detectors. Moreover, the quality of the
AdaBoost detector with and without early termination remains exactly the same.
8.6 Conclusion
In this chapter we have presented early termination algorithms to speed up the
AdaBoost based edge-corner detector. The proposed algorithms have been found to be
Figure 8.5: Fraction of eliminated search locations increases as the number of processed weak learners increases.
up to an order of magnitude faster than our original AdaBoost implementation (Mahmood,
2007). The proposed algorithms are exact; therefore the final results of the
AdaBoost detector remain exactly the same. The proposed elimination algorithms are
more generic than those used by Viola and Jones in their face detector and may
potentially be used to speed up many other object detectors, edge-corner detectors
and image feature detectors.
Chapter 9
USE OF CORRELATION COEFFICIENT FOR VIDEO
ENCODING
The process of block matching for motion estimation in video encoders may also be
considered an application of the image matching problem. In traditional encoders, the
best match position is defined by minimization of SAD, and motion compensation is
done by taking the simple difference between the best match location and the matched
block. In this chapter, we explore the role of the correlation coefficient in motion
compensation in video encoders. If motion estimation is done by maximization of the
correlation coefficient, the best match location is also the best linearly fitting location
and therefore has the least linear estimation error. Using this fact, motion compensation
may be done by finding two linear parameters that relate the block and the best match
location. We show theoretically that the reduction in the variance of the residue is
maximized if motion estimation is done by maximization of the correlation coefficient
and motion compensation is done by first order linear estimation. We find that by
using linear motion compensation, the entropy of the residue reduces significantly
compared to the entropy of the simple difference. In existing video encoders, the block
mean may be encoded in the bit stream as an extra parameter. One of the two linear
parameters estimated in our approach may be encoded instead of the mean. Therefore,
the overhead of our proposed approach is the encoding of the second parameter, which
is the slope of the best fitting line.
We have verified our findings through experimentation on a wide variety of datasets
taken from several commercial movies. In some cases, for the same number of bits
per pixel, our proposed scheme exhibits an improvement in peak signal to noise ratio
(PSNR) of up to 5 dB compared to the traditional encoding scheme.
9.1 Block Based Motion Compensation in Video Encoders
A digital video signal consists of a sequence of frames and is usually characterized by
strong temporal correlation between adjacent frames. This correlation is exploited in
standard video codecs to achieve significant compression, resulting in storage and
communication efficiency. For this purpose, block based motion compensation techniques
are used and have become an integral part of modern video codecs such as H.263 and
H.264/AVC. Block based motion compensation involves dividing each frame into non-overlapping
rectangular blocks, matching each block with a suitable block in
a previous frame, and finally taking the difference of the two matched blocks. Most
video encoders use minimization of the Sum of Absolute Differences (SAD) as the
criterion for finding the best match for a block (Ghanbari, 2003), a process commonly
known as motion estimation. Current video codecs expect high similarity between the
two matching blocks, such that the variance of the difference signal is smaller than the
variance of the current block to be encoded. In the video literature, this type of
encoding is referred to as predictive coding. However, it is important to realize that this
procedure simply takes the difference of the two matched blocks and is equivalent to
the differential encoding used for audio signals. The notion of predictive encoding in audio
signals is different and involves linear estimation of a signal sample from previously
observed samples; that is, the characteristics of a sample are predicted from previous
sample values. Although existing video encoding techniques predict the motion of a
block, they do not attempt to predict the relationship between two matched blocks.
We argue that it is beneficial to predict a relationship between two matched blocks
rather than simply taking their difference.
For motion estimation, SAD presents a computationally efficient solution (Vanne
et al., 2006) and is therefore used in existing video encoders. However, SAD is
implicitly based on the 'brightness constancy' assumption, i.e. the intensity values of
a block of video are not expected to change from one frame to another, although
the block may undergo a spatial shift. Such ideal conditions rarely exist:
brightness and contrast changes are frequently observed between frames, especially in
commercial videos. Even under simple linear changes, such as brightness variation,
SAD does not guarantee a correct match.
As a simple illustration of this fact, consider an m × n image block, and suppose that
the subsequent frame is brighter by a constant factor ∆ at each pixel. The correct
matching location for this block will thus have a SAD value of mn∆, and the difference
signal at this location will have a variance of zero. However, it is quite likely that
there will be other locations in the search area with a lower SAD value, because the
addition of ∆ causes the intensity levels at those locations to become closer to the
intensity levels of the original block to be encoded. Thus, a motion estimator based
on SAD will result in a match at an incorrect location, where the variance of the
difference signal can potentially be much higher than zero. Hence, even under simple
deviations from the brightness-constancy assumption, SAD no longer remains an
accurate motion estimator for video encoding.
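This failure mode can be reproduced with a small numeric sketch; the 2×2 block, the shift ∆ = 50 and the flat competitor block are invented purely for illustration:

```python
import numpy as np

# An invented 2x2 block and a uniform brightness increase Delta.
block = np.array([[20.0, 60.0], [100.0, 140.0]])
delta = 50.0

true_match = block + delta                       # same content, brighter
flat_match = np.full_like(block, block.mean())   # different (flat) content

sad_true = np.abs(block - true_match).sum()      # = m*n*Delta = 200
sad_flat = np.abs(block - flat_match).sum()      # = 160, lower than 200

# SAD prefers the flat block, yet the residue there has large variance,
# while the residue against the true brightness-shifted match is constant
# (variance zero) and would compress perfectly:
print(sad_flat < sad_true)                       # True: SAD picks the wrong block
print(np.var(block - true_match))                # 0.0
```
The correlation coefficient, by contrast, is invariant to such brightness and contrast changes, so the true match scores |ρ| = 1 regardless of ∆.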
We consider the use of a first order linear estimator to model the changes in intensity
of a block from frame to frame. This choice is motivated by observing the brightness
and contrast changes in real videos. Instead of taking the difference between two
matching blocks, we estimate one from the other and take the difference between
the actual and estimated values. That is, if r2 is the best match of r1, we use r2
to compute r̂1, the Minimum Mean Squared Error (MMSE) linear estimate of r1, and
then consider the estimation error, r1 − r̂1, as the residue for further processing. We
show that the variance of the linear estimation error is never larger than the variance
of the simple difference, r1 − r2, leading to better compression and resulting in storage
and communication efficiency. We further show that, when r1 − r̂1 is used instead of
r1 − r2, the optimal criterion for finding the best match is maximization of the
magnitude of the correlation coefficient. The proposed scheme, Video Coding with Linear
Compensation (VCLC), captures all first order variations in video signals. We have
observed through experimentation on eight commercial videos that using a non-linear
predictor instead of a linear predictor yields diminishing gains, indicating
that frame-to-frame changes in the video signal are well modeled by a linear predictor.
9.2 Problem Definition
We consider a digital video signal as a sequence of frames F indexed at discrete time
k. For the purpose of encoding, each frame F(k) is divided into non-overlapping
blocks r1(k, x, y), each of size m × n pixels, where the parameters x, y represent the
spatial position of the block r1(k, x, y) within frame F(k). The two primary steps in
the video encoding process are:
1. Motion prediction (or motion estimation), which is carried out on each block
r1(k, x, y) by finding its closest match r2(k + δk, x + δx, y + δy) in a judiciously
selected search area within a previous frame.
2. Motion compensation, which essentially means finding the motion compensated
differential signal:
∆ = r1(k, x, y)− h{r2(k + δk, x+ δx, y + δy)} (9.1)
where h(·) is an arbitrary function chosen such that the variance of ∆, σ²∆, is
minimized. ∆ is also known as the motion compensated residue, and σ²∆ is known as
the inter-frame variance (Jain and Jain, 1981). In current practice, h(·) is taken to
be the identity function.
The primary goal of video encoding is to maximize compression, for which a common
heuristic is to minimize the variance of the motion compensated residue ∆, so that
fewer bits are needed for its representation. Thus h(·) is used as an estimation filter
for r1(k, x, y), such that the estimation error variance σ²∆ is minimized.
We therefore intend to find the function h(·) for the motion compensation step, as
well as the criterion for finding the closest match in the motion estimation step, such
that σ²∆ is minimized. It is expected, as we will show, that the matching criterion
and the estimation function h(·) are closely related to each other.
9.3 Maximization of Gain Guaranteed by Maximization of Correlation Coefficient
A number of schemes have been proposed and standardized for video encoding. All
existing video encoders use minimization of SAD as the criterion for finding the best
match in the motion estimation step. That is, the values k + δk, x + δx, y + δy are
determined such that the SAD value given by the following expression is minimized:
SAD = Σ_{x=1}^{m} Σ_{y=1}^{n} |r1(k, x, y) − r2(k + δk, x + δx, y + δy)| (9.2)
Furthermore, existing video encoders select h(·) in Equation 9.1 such that h(θ) = θ.
In this case, the resulting motion compensated differential signal (∆d) is given by:
∆d(k, x, y) = r1(k, x, y)− r2(k + δk, x+ δx, y + δy) (9.3)
The variance of ∆d can be expressed in the following form:
σ²∆d = σ²1 + σ²2 − 2 ρ1,2 σ1 σ2 (9.4)
where r1 = r1(k, x, y) and r2 = r2(k + δk, x + δx, y + δy) and ρ1,2 is the correlation
coefficient between blocks r1 and r2. We define the gain of traditional video encoders
as: Gd = σ²1 / σ²∆d. Using Equation 9.4, an expression for Gd can be derived and is
given below:
Gd = σ²1 / σ²∆d = 1 / (1 + (σ2/σ1)(σ2/σ1 − 2ρ1,2)) (9.5)
If the video signal is assumed to be stationary, such that σ²1 = σ²2, then Equation 9.5
reduces to:
Gds = 1 / (2(1 − ρ1,2)) (9.6)
From 9.6, we note that Gds is maximized when ρ1,2 is maximized. However, minimization
of SAD does not guarantee maximization of ρ1,2; thus SAD is not the optimal
criterion for the maximization of Gds. Furthermore, in general the video signal is
non-stationary and the true gain is given by 9.5, whose maximum cannot be guaranteed
either by maximization of ρ1,2 or by minimization of SAD.
Nevertheless, from Equations 9.5 and 9.6, maximization of the correlation coefficient
appears to be a more attractive criterion for motion estimation than minimization
of SAD. However, the correlation coefficient has not been given serious consideration
in the video encoding literature because of its high computational complexity (Barnea
and Silverman, 1972). We have removed this objection to a large extent by developing
the Basic Mode Partial Correlation Elimination (PCE) algorithm, which performs
very well on small template sizes (see Chapter 6). It should also be noted that
with the availability of powerful processors, complex algorithms that enhance
compression efficiency are now practicable. For example, H.264/AVC uses much more
complex algorithms than those employed by previous video encoding standards.
The use of the correlation coefficient as a motion estimator has also been ignored
because of the comments made in the seminal paper by Jain and Jain (1981),
suggesting that the accuracy of the area correlation method is poor when the block
size is small and the blocks are not undergoing pure translation. However, for block
sizes commonly used in motion estimation algorithms, the correlation coefficient actually
outperforms SAD and other measures that are based upon the brightness constancy
assumption. We have verified this by performing a large number of experiments
on scenes taken from ten commercial videos. Furthermore, in the case of
non-translational motion, all block based motion estimation algorithms suffer some
degradation in performance; however, the performance of correlation coefficient based
estimators degrades more gracefully (Wu, 1995b).
We note that VCLC is a fundamental technique, and other schemes and optimizations
proposed in the literature or included in standards may be used in addition to VCLC.
Such additional schemes include overlapped block motion compensation (OBMC) (Orchard
and Sullivan, 1994; Su and Mersereau, 2000), which was invented to handle complex
motion within a block. Similarly, sub-pixel motion estimation (Girod, 1993), which
aims to increase the accuracy of motion compensation, may also be used with VCLC.
Figure 9.1: The average and standard deviation of the Mean Squared Error of different estimation filters h(·). More than 400,000 8×8 blocks taken from eight commercial movies were used to compute these statistics.
9.4 Video Coding with Linear Compensation (VCLC)
In traditional video encoding systems, the estimation filter h(·) in Equation 9.1 is
selected to be the identity function, h(θ) = θ; therefore, these systems use the difference
signal of Equation 9.3. Inherently, the use of Equation 9.3 is based on the brightness
constancy assumption for pixel intensities. However, we have observed that brightness
and contrast changes are so ubiquitous in natural videos, especially commercial videos
such as movies, that the assumption of constant pixel intensities breaks down frequently.
We have experimentally verified this observation by measuring the average MSE
between the original block and the matched block while varying the estimation filter
h(·). Figure 9.1 shows the reduction in MSE as h(·) was changed from the identity
to a first order linear estimator and then a quadratic estimator. A significant decline
in MSE was observed when the linear estimator was used compared to the identity
function. Increasing the complexity of the estimator from linear to quadratic resulted
in diminishing returns, and subsequent improvements were not as significant.
Hence, in this chapter, we propose that intensity changes between blocks in nearby
frames can be better modeled by a first order linear estimator. Therefore, we select
h(·) for the estimation of block r1 as:
h(r2) = αr2 + β (9.7)
where α and β are selected to minimize the mean squared error between h(r2) and
the block r1 being estimated. These two additional parameters are transmitted with
each block of video, but, as we will show, the corresponding reduction in the variance
of the linear compensated difference signal justifies this overhead.
In the next two subsections, we first discuss the theoretical impact of choosing the
first order linear model on the motion compensation strategy, and then discuss the
optimal motion estimator under this model.
9.4.1 Motion Compensation using Linear Estimator
For the motion compensation step, current input block r1(k, x, y) is linearly estimated
from the best matching block r2(k + δk, x + δx, y + δy), instead of computing the
difference as in traditional methods. Thus we use:
r̂1(k, x, y) = α r2(k + δk, x + δx, y + δy) + β (9.8)
The parameters α and β are selected such that the mean squared estimation error,
given below, is minimized:
Λ = Σ_{x=1}^{m} Σ_{y=1}^{n} (r1(k, x, y) − α r2(k + δk, x + δx, y + δy) − β)² (9.9)
Minimizing Λ with respect to α and β yields:
α = ρ1,2 σ1 / σ2 (9.10)
β = µ1 − ρ1,2 (σ1 / σ2) µ2 (9.11)
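A small numeric sketch of this estimator, assuming blocks are given as NumPy arrays; the function name and the invented example blocks (a contrast/brightness change of r2 plus noise) are illustrative, and the checks correspond to Equation 9.13 and Theorem 9.1 derived later in this section:

```python
import numpy as np

def linear_compensation(r1, r2):
    """MMSE linear parameters (Eq. 9.10, 9.11) and the residue r1 - r1_hat."""
    mu1, mu2 = r1.mean(), r2.mean()
    s1, s2 = r1.std(), r2.std()
    rho = ((r1 - mu1) * (r2 - mu2)).mean() / (s1 * s2)
    alpha = rho * s1 / s2            # Equation 9.10
    beta = mu1 - alpha * mu2         # Equation 9.11
    residue = r1 - (alpha * r2 + beta)
    return alpha, beta, rho, residue

# Invented example: r1 is a gain/offset change of r2 plus small noise.
rng = np.random.default_rng(1)
r2 = rng.uniform(0, 255, size=(8, 8))
r1 = 1.2 * r2 + 10 + rng.normal(0, 2, size=(8, 8))

alpha, beta, rho, residue = linear_compensation(r1, r2)
print(np.isclose(residue.var(), (1 - rho**2) * r1.var()))  # Equation 9.13
print(residue.var() <= (r1 - r2).var())                    # Theorem 9.1
```
Note that the residue is zero mean by construction, which is what later allows the DC coefficient of each transformed block to be dropped (Equation 9.19).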
In the proposed VCLC scheme, we define the motion compensated residue ∆p similarly
to the traditional case, but using the MMSE linear estimate r̂1 instead of r2:
∆p(k, x, y) = r1(k, x, y) − r̂1(k, x, y) (9.12)
It is straightforward to show that the mean of ∆p is always zero, regardless of the
form of the original and the matched block. The variance of ∆p has a direct impact
on compression efficiency: if σ²∆p < σ²∆d, VCLC would lead to better compression
compared to the traditional schemes. Since ∆p is zero mean, its variance is the
minimum mean square error of estimation given by Equation 9.9, which can also be
derived to the following form:
σ²∆p = (1 − ρ²1,2) σ²1 (9.13)
The above expression for σ²∆p, i.e. the variance of the linear compensated difference
signal, should be compared to the expression for σ²∆d in Equation 9.4, which is the
variance of the simple difference. Using the expression in 9.13, we can show that σ²∆p
is always less than or equal to σ²∆d.
Theorem 9.1 For the same motion estimator, σ²∆p is upper bounded by σ²∆d.
Proof 9.4.1.1 Since the square of any real number is non-negative, the following
inequality holds:
(ρ1,2 σ1/σ2 − 1)² ≥ 0 (9.14)
Multiplying through by σ²2 and rearranging, we get:
σ²1 − ρ²1,2 σ²1 ≤ σ²1 − 2 ρ1,2 σ1 σ2 + σ²2 (9.15)
Comparing 9.15 with 9.4 and 9.13, it follows that σ²∆p ≤ σ²∆d always holds, regardless
of the form of the input signal.
Similar to the definition of Gd in Section 9.3, we define the motion compensation
gain of the VCLC scheme as:
Gp = σ²1 / σ²∆p = 1 / (1 − ρ²1,2) (9.16)
Since σ²∆p ≤ σ²∆d, we have Gp ≥ Gd. Hence the use of the VCLC scheme will never
result in a lower gain when compared with the traditional encoding scheme.
9.4.2 Motion Estimation with Correlation Coefficient
In the previous discussion, the advantage of the VCLC scheme over traditional motion
compensation techniques was shown independent of the motion estimation process.
This implies that if the VCLC scheme is used with traditional motion estimation,
the gain will still improve. However, we notice from 9.16 that the gain of VCLC is
maximized when |ρ1,2| is maximized. This indicates that for VCLC, the optimal
criterion for finding the closest match in the motion estimation step is not minimization
of SAD but maximization of the magnitude of the correlation coefficient over the
search space. Thus the location of the best matching block is given by:
(δk, δx, δy) = arg max_{δk, δx, δy} |ρ1,2| (9.17)
where
ρ1,2 = [ Σ_{x=1}^{m} Σ_{y=1}^{n} (r1(k, x, y) − µ1)(r2(k + δk, x + δx, y + δy) − µ2) ] / (σ1 σ2) (9.18)
Thus there is no other location where the linear compensated differential signal would
have a lower variance, or a higher gain, than the one obtained by maximizing Equation
9.17 over (k + δk, x + δx, y + δy).
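This criterion can be sketched as a minimal exhaustive search; the function name, the invented reference frame and the nominal block position are all assumptions of the sketch, not the thesis implementation (which would use the PCE algorithm of Chapter 6 to prune the search):

```python
import numpy as np

def best_match(block, ref, y0, x0, search=4):
    """Exhaustive-search motion estimation maximizing |rho| (Equation 9.17).

    block: m x n block from the current frame; ref: reference frame;
    (y0, x0): nominal position of the block in ref; search: search range
    in pixels. Returns (rho, (dy, dx)) for the best matching offset.
    """
    m, n = block.shape
    b = block - block.mean()
    sb = b.std()
    best = (0.0, (0, 0))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + m > ref.shape[0] or x + n > ref.shape[1]:
                continue
            cand = ref[y:y + m, x:x + n]
            c = cand - cand.mean()
            sc = c.std()
            if sb == 0 or sc == 0:
                continue  # flat blocks carry no correlation information
            rho = (b * c).mean() / (sb * sc)  # Equation 9.18, normalized
            if abs(rho) > abs(best[0]):
                best = (rho, (dy, dx))
    return best

# Invented demo: the block is a gain/offset change (gain 2, offset 7) of the
# reference patch at offset (2, 1); |rho| still locates it exactly.
rng = np.random.default_rng(2)
ref = rng.uniform(0, 255, size=(16, 16))
block = 2.0 * ref[5:9, 6:10] + 7.0
rho, (dy, dx) = best_match(block, ref, 3, 5)
```
Because ρ is invariant to gain and offset, the true location scores |ρ| ≈ 1 even though every pixel value differs from the reference, a case in which SAD-based matching can fail.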
9.5 Video Coding With Linear Compensation: System Overview
Simplified block diagrams of VCLC encoder and decoder are shown in Figures 9.2
and 9.3 respectively. An input video frame to be encoded is sent to the motion vector
Figure 9.2: Block diagram of the video coder with linear compensation (VCLC). MVE: motion vector estimator, LPE: linear parameter estimator, LFE: linear frame estimator.
estimator (MVE), which also obtains a reference frame from the memory. The MVE
finds the best matching block in the reference frame for each block in the input video
frame by maximizing the magnitude of the correlation coefficient, as given in Equation
9.17. For each block, the MVE provides the motion vector information to the linear
parameter estimator (LPE), which computes α and β in accordance with 9.10 and
9.11. The LPE sends these parameters to the linear frame estimator (LFE), where the
linear estimate of the complete frame is formed from the linear estimates of the
individual blocks. The linear estimate of the complete frame is subtracted from the
input video frame, and the resulting residue is further processed through the transform
coder, quantizer and entropy coder.
Traditional decoders require the residue information along with motion vectors in
order to decode the current frame. The VCLC decoder additionally requires transmission
of the α, β parameters. We observe, however, that when using VCLC, the mean of the
motion compensated residue is zero, resulting in a zero DC value of the transform of
each block:
DCVCLC = 0 (9.19)
which eliminates the transmission of one parameter as compared to traditional
encoders. In traditional generic encoders (GE) (Ghanbari, 2003), the DC value of a
transformed block is the difference of the means of the input block r1 and its best matching
Figure 9.3: A generic inter-frame predictive decoder with linear compensation. MCP: motion compensated prediction, LPC: linear parameter compensation.
block r2:
DCGE = µ1 − µ2 (9.20)
and it is generally non-zero. Note that in the VCLC scheme, instead of transmitting
the α, β parameters, we can transmit α and (µ1 − µ2), and reconstruct β on the
decoder side using 9.11. Therefore, compared to traditional systems, the actual
overhead is only one parameter per block. Furthermore, for intermediate to large
block sizes, for example 8×8 or above, the cost of the α parameter, in terms of bits
per pixel, turns out to be insignificant. For smaller block sizes, which are not very
common in video encoders due to the large number of motion vectors, the cost of
sending an additional parameter becomes noticeable, requiring efficient quantization
and coding for the same.
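The parameter handling just described can be sketched as follows; the function names and numeric values are invented for illustration, and the sketch assumes the decoder knows µ2 from its own copy of the reference block:

```python
# Encoder side: send alpha and (mu1 - mu2), replacing the usual DC term.
def encode_params(alpha, mu1, mu2):
    return alpha, mu1 - mu2        # one extra parameter vs. a traditional GE

# Decoder side: recover mu1 from the transmitted difference and the known
# reference-block mean mu2, then reconstruct beta via Equation 9.11.
def decode_beta(alpha, dc, mu2):
    mu1 = dc + mu2
    return mu1 - alpha * mu2       # beta = mu1 - alpha * mu2

alpha, dc = encode_params(1.5, 130.0, 100.0)
beta = decode_beta(alpha, dc, 100.0)
print(beta)                        # -20.0, i.e. 130 - 1.5 * 100
```
This makes the overhead claim above concrete: only α is genuinely new, since (µ1 − µ2) takes the slot of the DC value that a generic encoder would transmit anyway.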
9.6 Experiments and Results
We have experimentally verified the theoretical results of the previous sections by
encoding scenes selected from numerous commercial videos. These videos often exhibit
significantly larger changes in lighting than the standard test sequences used in video
codec research. On this dataset, the efficiency of VCLC was compared with that of
the traditional Generic Encoder (GE) (Ghanbari, 2003), which used SAD
Figure 9.4: (a) The histogram of α always has a maximum at 1.00. (b) The histogram of µ1 − µ2 always has a maximum at 0.00, with a shape similar to a Laplacian distribution.
for motion estimation and simple differences for motion compensation. In our ex-
periments, the motion compensated residue of VCLC and GE was first transformed
using the DCT and then quantized by a uniform quantizer. The minimum number of bits
needed to transmit the quantized residue was estimated by computing its entropy.
Note that VCLC improves the motion compensation efficiency while the other blocks
of the video encoder remain the same.
The improvement in motion compensation is generally measured by the improvement
in prediction SNR, defined in (Jain and Jain, 1981):

SNR = 10 log10 ( M I_max^2 / Σ_{r=1}^{M} σ_r^2 )    (9.21)

where M is the total number of residue blocks, σ_r^2 is the variance of a block of
residue, and I_max is the maximum pixel intensity. For the traditional generic encoder,
σ_r^2 = σ_Δd^2, while for VCLC, σ_r^2 = σ_Δp^2. The SNR comparison shown in Table 9.1 was
computed for three motion estimation block sizes: 4×4, 8×8 and 16×16. The maximum
improvement in SNR was observed for the 4×4 block size, where it reached up to 11.3 dB,
whereas for the 8×8 and 16×16 block sizes it was up to 6.8 dB and 6.1 dB respectively.
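As a concrete illustration of Equation 9.21, the prediction SNR can be computed directly from the per-block residue variances. The sketch below (Python/NumPy, with synthetic residue blocks standing in for real motion compensated residues) shows that a lower-variance, VCLC-like residue yields a higher SNR than a higher-variance, GE-like one:

```python
import numpy as np

def prediction_snr(residue_blocks, i_max=255.0):
    """Prediction SNR of Eq. 9.21: 10*log10(M * Imax^2 / sum of block variances)."""
    variances = [float(np.var(b)) for b in residue_blocks]
    return 10.0 * np.log10(len(variances) * i_max**2 / sum(variances))

rng = np.random.default_rng(1)
vclc_res = [rng.normal(0.0, 4.0, (8, 8)) for _ in range(100)]   # lower-variance residue
ge_res = [rng.normal(0.0, 10.0, (8, 8)) for _ in range(100)]    # higher-variance residue
print(prediction_snr(vclc_res) > prediction_snr(ge_res))        # True
```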
The SNR in Equation 9.21 measures the performance of an encoding scheme based
on the variance of the motion compensated residue without considering the effects of
transform encoding and quantization. A better way to evaluate an encoding scheme is
[Six rate-distortion panels: Batman Begins, King Kong, Underworld Evolution, Blade 2, Lord of The Rings 3 and Mission Impossible; each plots PSNR (dB) against bits per pixel for VCLC and GE.]
Figure 9.5: Variation of PSNR with bits per pixel (bpp) for Video Coding with Linear Compensation (VCLC) and the traditional Generic Encoder (GE).
Table 9.1: Comparison of the traditional Generic Encoder (GE) and VCLC motion compensation SNR (dB)

                      SNR_VCLC                  SNR_GE
Dataset         4×4     8×8    16×16      4×4     8×8    16×16
Fast&Furious    36.11   36.51  29.03      29.09   33.04  25.32
BatmanBegins    41.83   39.33  32.97      34.05   34.66  28.82
KingKong        38.69   35.08  29.31      30.58   31.62  25.58
UnderWorld      42.70   39.58  35.25      34.95   34.34  29.15
Spiderman       35.01   31.56  26.44      29.64   29.46  24.41
PinkFloyd       40.59   37.66  35.14      37.18   35.83  33.29
Metallica       40.59   35.28  31.17      32.91   32.49  28.37
Blade           45.25   39.68  35.31      35.89   32.739 30.04
LordOfRings     39.69   35.84  33.60      34.15   31.91  30.13
MissionImps     36.70   31.87  27.66      29.42   26.60  23.75
to characterize the end-to-end performance of the system by measuring the distortion
in the decoded signal. Although the VCLC scheme improves only the motion estimation
and motion compensation steps, we show with simple experiments that the end-to-end
performance also improves. We computed the peak signal-to-noise ratio (PSNR)
defined in H.264 as:

PSNR = 10 log10 ( 255^2 / MSE )    (9.22)

where MSE is the mean squared error between the original frame and the corresponding
reconstructed frame.
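Equation 9.22 reduces to a one-line computation on a decoded frame. A minimal sketch (Python/NumPy; the frames here are synthetic stand-ins):

```python
import numpy as np

def psnr(original, reconstructed):
    """PSNR of Eq. 9.22 for 8-bit frames: 10*log10(255^2 / MSE)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return 10.0 * np.log10(255.0**2 / np.mean(diff**2))

frame = np.full((16, 16), 100, dtype=np.uint8)
noisy = frame.copy()
noisy[0, 0] += 16                    # one pixel off by 16: MSE = 16^2 / 256 = 1.0
print(round(psnr(frame, noisy), 2))  # 48.13 dB
```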
In an end-to-end system, the additional parameters α and (µ1 − µ2) also have to be
quantized before entropy coding. Typical histograms of both parameters are
shown in Figure 9.4. We used a generalized Lloyd-Max quantizer with 5 bits for α
and 4 bits for (µ1 − µ2). For our datasets, the combined entropy of these parameters for
an 8×8 block size was computed to be 6.46 bits, so the additional overhead of
these parameters is approximately 0.1 bits per pixel (6.46 bits spread over 64 pixels).
Figure 9.5 shows the average rate-distortion curves for six videos encoded using an 8×8
block size. In Figure 9.5, the slight rightward shift of the top curve, representing
VCLC's performance, is due to the overhead of the two additional parameters. We note
that the VCLC scheme exhibits an improvement of up to 5 dB in PSNR compared with
the traditional generic encoder.
9.7 Conclusion
In this chapter we demonstrated that motion estimation with the correlation coefficient
and motion compensation with an MMSE first-order linear estimator can be used to
reduce the number of bits required to encode a video at the same PSNR. The
proposed video encoding scheme, Video Coding with Linear Compensation (VCLC),
may therefore prove to be of practical significance for video transmission
and storage applications.
Chapter 10
CONCLUSIONS AND FUTURE DIRECTIONS
Despite the presence of a large number of fast approximate image matching techniques,
bound based computation elimination algorithms are of special interest because of the
exhaustive-equivalent accuracy and high speedups they offer. In this
thesis we have presented two different types of bound based computation elimination
algorithms for correlation based fast template matching, namely Transitive Elimina-
tion Algorithms (Mahmood and Khan, 2007b, 2008, 2010) and Partial Correlation
Elimination algorithms (Mahmood and Khan, 2007a, 2011). The first type consists of
complete elimination algorithms, while the second consists of partial elimination
algorithms.
10.1 Transitive Algorithms
While investigating the transitivity property of correlation based measures, we de-
rived two different types of transitive bounds on correlation based measures. The
first type was derived by applying the triangle inequality to the angular distance
measure, and the second by applying the triangle inequality to the Euclidean distance
measure. We compared both types of bounds theoretically and showed that the angular
distance based bounds are contained within the Euclidean distance based bounds. We
therefore concluded that the angular distance based bounds are tighter and more useful
from a computation elimination perspective.

We studied the tightness characteristics of the angular distance based transitive bounds
and found that these bounds remain tight if at least one of the two bounding correlations
is ensured to remain high. We suggested that the autocorrelation present in most
template matching systems should be used as the strong bounding correlation.
The transitive elimination algorithms presented in this thesis may be used efficiently in
the following scenarios:
1. In a typical template matching scenario, a template image is matched across
one or more large reference images. Natural scenes, especially remotely sensed
satellite images, have high local spatial autocorrelation. We have developed a
transitive algorithm for fast template matching in this scenario by exploiting
intra-reference autocorrelation. Fast algorithms for autocorrelation computation
were also developed.
2. In other template matching applications, such as object tracking in a video
sequence, and especially tracking humans in surveillance videos, one object
template has to be correlated with multiple video frames. The nearby frames of a
video sequence are generally highly correlated. To exploit this inter-frame
temporal autocorrelation, we have developed the Intra-Ref Transitive Elimination
Algorithm. Fast algorithms for computing the autocorrelation between two video
frames were also developed.
3. In some other applications, such as video geo-registration, a set of highly corre-
lated template frames is to be matched against the same reference image. To ex-
ploit the strong inter-template correlation, we have developed an Inter-Template
TEA algorithm.
4. The Inter-Template TEA algorithm also applies to rotation/scale invariant tem-
plate matching, because consecutive rotated and scaled versions of an object
are highly correlated. A high speedup is obtained by exploiting this inter-template
autocorrelation for fast rotation/scale invariant template matching.
The main principle of all of these transitive elimination algorithms is as follows: if, at a
particular search location, the upper transitive bound is found to be less than the current
known maximum, the correlation computation at that location may be skipped without
any loss of accuracy. The execution time speedup of transitive algorithms depends on
the strength of the autocorrelation found at nearby locations. If the autocorrelation is
strong, transitive algorithms may become extremely fast; if it is weak, transitive
elimination becomes less efficient and the speedup is reduced. To handle such
scenarios, we have proposed a second category of elimination algorithms, Partial
Correlation Elimination (PCE) algorithms. These are more generic than transitive
algorithms because they do not depend on the autocorrelation function.
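The skip test above can be written as a generic search skeleton. The sketch below is illustrative only: `correlate` and `transitive_upper_bound` are placeholder callables standing in for the full correlation computation and the precomputed transitive bound of the actual algorithms:

```python
def tea_search(locations, correlate, transitive_upper_bound):
    """Exhaustive-equivalent search: skip any location whose upper transitive
    bound cannot exceed the best correlation found so far."""
    best_loc, best_corr = None, -1.0
    for loc in locations:
        if transitive_upper_bound(loc) < best_corr:
            continue              # complete elimination: no correlation computed here
        rho = correlate(loc)      # full correlation only at surviving locations
        if rho > best_corr:
            best_loc, best_corr = loc, rho
    return best_loc, best_corr

# Toy usage: three locations with known correlations and a loose but valid upper bound.
corrs = {0: 0.2, 1: 0.9, 2: 0.3}
loc, rho = tea_search([1, 0, 2], corrs.get, lambda l: corrs[l] + 0.05)
print(loc, rho)  # 1 0.9 -- locations 0 and 2 were eliminated without computing correlation
```

Because the bound is a true upper bound on the correlation, skipping a location can never discard the global maximum, which is what gives the algorithms their exhaustive-equivalent accuracy.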
10.2 PCE Algorithms
In PCE algorithms, the correlation coefficient is computed using a monotonic decreas-
ing formulation. At a particular search location, as soon as the partial value of the
correlation falls below the current known maximum, the remaining computations may
be skipped without any loss of accuracy. Different versions of the PCE algorithm are
efficient in different template size ranges:
1. Basic Mode PCE is empirically found to be more efficient for small template
sizes, ≤ 21×21 pixels. For these sizes, we have also developed a novel initial-
ization scheme, named the Two-Stage Basic-Mode PCE algorithm. The Basic Mode
PCE algorithm has been found to be significantly faster than all existing fast
algorithms, because frequency domain implementations are slow for small
templates, and other efficient spatial implementations, such as ZEBC, are also
slow due to the high overhead of bound computation.
2. Extended Mode PCE with a multi-stage initialization scheme is efficient for medium-
sized templates, larger than 21×21 and smaller than 48×48 pixels. An
algorithm to find efficient elimination test locations has also been developed. For
these sizes, the PCE algorithm has remained faster than all other algorithms, although
the speedup margin shrinks as the template size increases.
3. Extended Mode PCE with a coarse-to-fine scheme is more efficient for larger
templates, ≥ 48×48 pixels. The algorithm for finding efficient elimination test
locations also applies to these sizes. For large template sizes, FFT performance
improves significantly, as does the performance of the efficient spatial domain
algorithm ZEBC. However, ZEBC remains slow on templates whose number of rows
is a prime number.
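The partial elimination test can be sketched in the same generic style. The formulation below uses a Cauchy-Schwarz bound on the unprocessed tail of zero-mean, unit-norm vectors as an illustrative monotonic upper bound; it is not the exact formulation used in the thesis:

```python
import numpy as np

def pce_correlation(t, w, best_so_far, block=16):
    """Compute the correlation of zero-mean, unit-norm vectors t and w, aborting
    as soon as the running upper bound (partial sum plus a Cauchy-Schwarz bound
    on the unprocessed tail) falls below the best correlation already known.
    Returns the full correlation, or None if the location is eliminated."""
    partial = 0.0
    for start in range(0, t.size, block):
        end = start + block
        partial += float(np.dot(t[start:end], w[start:end]))
        tail_bound = np.linalg.norm(t[end:]) * np.linalg.norm(w[end:])
        if partial + tail_bound < best_so_far:
            return None   # remaining pixels skipped with no loss of accuracy
    return partial

def unit(v):
    """Zero-mean, unit-norm normalization."""
    v = v - v.mean()
    return v / np.linalg.norm(v)

rng = np.random.default_rng(2)
t = unit(rng.normal(size=256))
w_bad = unit(rng.normal(size=256))                   # an uncorrelated candidate window
print(pce_correlation(t, w_bad, best_so_far=0.95))   # None: eliminated
print(pce_correlation(t, t, best_so_far=0.95))       # full match survives; correlation ~= 1.0
```

As in the thesis algorithms, elimination is exact: a skipped location provably cannot beat the current maximum, so the result matches exhaustive search.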
10.3 Limitations of TEA and PCE Algorithms
Despite the high speedups obtained by our proposed algorithms, these schemes also have
some important limitations. One important limitation is that these algorithms expect
a high correlation maximum to be present in the search space. The amount of
eliminated computation increases with the height of the known maximum.
For template matching applications that require finding a small maximum, for example
ρmax = 0.40, elimination algorithms no longer remain efficient: the amount of
eliminated computation decreases, resulting in an increase in execution time. In our
proposed algorithms, a high speedup is obtained only if a large-magnitude maximum
is found early in the search process.
10.4 Elimination Algorithms for Object Detection
We have shown that bound based computation elimination strategies can also be applied
to fast object detection (Mahmood and Khan, 2009). An early termination algorithm is
applied to speed up an AdaBoost based edge-corner detector (Mahmood, 2007). In this
regard, an Early Non-Maxima Suppression (ENMS) algorithm has also been proposed,
which integrates the detection process within the non-maxima suppression process to
reduce computation.
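The partial-sum early termination for additive detectors can be sketched generically; the weights, responses and thresholds below are illustrative, not those of the cited edge-corner detector:

```python
def early_terminated_score(weak_responses, alphas, reject_margins, threshold):
    """AdaBoost-style additive detector with early termination.
    reject_margins[k] upper-bounds how much the stages after k can still add;
    once even the best-case remainder cannot reach the threshold, the window
    is rejected without evaluating the remaining weak classifiers."""
    partial = 0.0
    for k, (h, a) in enumerate(zip(weak_responses, alphas)):
        partial += a * h
        if partial + reject_margins[k] < threshold:
            return None   # early rejection
    return partial

alphas = [2.0, 1.5, 1.0, 0.5]
# Best-case contribution of the remaining stages (weak responses in [-1, +1]):
margins = [sum(alphas[k + 1:]) for k in range(len(alphas))]   # [3.0, 1.5, 0.5, 0.0]
print(early_terminated_score([-1, -1, 1, 1], alphas, margins, threshold=1.0))  # None
print(early_terminated_score([1, 1, -1, 1], alphas, margins, threshold=1.0))   # 3.0
```

Since most windows in a detection scan are negatives, rejecting them after a few stages removes the bulk of the weak-classifier evaluations.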
10.5 Using Correlation Coefficient in Video Coding
The use of the correlation coefficient in video encoders (Mahmood et al., 2007) has also
been explored. We found that if the correlation coefficient is used for motion estimation
and a first-order linear estimator is used for motion compensation, the variance of the
residue signal is guaranteed to be less than that of traditional encoding schemes.
The proposed video encoding scheme may potentially be used to increase the compression
of the video signal.
10.6 Future Directions
Several future research directions emerge from the work presented in this thesis. Some
of the important ones are introduced in this section.
1. Cascading PCE and TEA Algorithms: An important extension of the TEA
and PCE algorithms is to combine them in the form of a cascade. A possible
cascading scheme is to first apply the transitive bounds at each search location to
obtain complete elimination; at the search locations where the transitive bounds fail to
produce elimination, the PCE algorithm may follow to obtain partial elimination. We
find, however, that the overhead of the TEA algorithm, including the autocorrelation
and central correlation computations, cannot be avoided by such a cascading scheme,
and the PCE algorithm can only reduce computations at search locations that the
transitive bound did not eliminate. Since most locations are eliminated by the
transitive bounds, the improvement margin left for the PCE algorithm is quite
small. A study of how to couple both algorithms so that the combination outperforms
each individual algorithm would be an interesting research direction.
2. Approximate Accuracy TEA Algorithms: The TEA algorithms discussed in this
thesis have exhaustive-equivalent accuracy. If this constraint is removed, the
TEA algorithm can be made extremely fast at some cost in accuracy. As
already discussed, the efficiency of the TEA algorithm depends strongly
on the strength of the local autocorrelation, which may be increased arbitrarily by
low-pass filtering the reference image. The TEA algorithm will then run extremely
fast on the blurred image, with some potential loss of accuracy. An important future
direction is to study the effect of low-pass filtering on the accuracy and speedup of
the TEA algorithm.
3. Approximate Accuracy PCE Algorithms: The PCE algorithms discussed in this
thesis also have exhaustive-equivalent accuracy, and can likewise be made
extremely fast if this constraint is removed. One possible approach to approximate
accuracy PCE is to estimate the downward slope of the monotonic decreasing
curve by matching as few template pixels as possible, and to define the best match
location as the one with the minimum downward slope. An important future direction
is to estimate the minimum number of pixels that must be processed to obtain a
reliable estimate of the slope. The fewer pixels processed, the faster the algorithm
will be; the more reliable the slope estimate, the more accurate it will be. The
accuracy-versus-speedup trade-off of approximate PCE algorithms must be studied.
4. Integrating Existing Approximate Algorithms with PCE: Another important re-
search direction is to combine the PCE algorithm with approximate-accuracy al-
gorithms such as Three-Step Search (TSS) or Two-Dimensional Logarithmic
(TDL) search. We believe that the PCE algorithm may effectively reduce
computation in most of these algorithms.
5. Early Terminated Object Detectors: An important future direction is to extend
the idea of partial elimination beyond the template matching problem. We have
demonstrated that the same idea can also speed up AdaBoost based
object detectors. We observe that if, at a particular search location, the final detector
response is computed as a summation of multiple partial values, computation
may be reduced by intelligent rearrangement. Different types of object detectors
should be explored to find which of them can benefit from partial
elimination algorithms.
6. Early Non-Maxima Suppression (ENMS): NMS is often used to suppress
multiple responses of a detector to a single object instance. It is
commonly used in pedestrian, face and other object detectors.
We observe that if the final detector response is a summation of multiple partial
sums, the ENMS algorithm may be used to reduce computation.
7. Video Coding with Linear Compensation (VCLC): An important future direc-
tion is to explore the idea of VCLC in significantly more detail. Our experiments
demonstrate that the VCLC algorithm may offer promising results, especially when
the videos to be compressed contain significant intensity and contrast variations.
The VCLC algorithm should be integrated within the framework of an existing codec,
such as H.264. We observe that the proposed scheme requires the transmission of two
parameters per block in addition to the residue signal, whereas the H.264 bit stream
provides for transmitting one parameter per block. The second parameter required by
the VCLC algorithm may therefore change the bit stream and make it incompatible with
standard video codecs. The transmission overhead of the extra parameters and the
benefit obtained from linear compensation should be compared theoretically; the
benefit must be significant to justify a new video encoding scheme.
8. Elimination Algorithms for Volume Registration: Volume registration is some-
times required in medical image processing. The computation elimination algo-
rithms may be extended to volume registration. We observe that volume data
have high local autocorrelation, which would result in significant speedups for TEA
algorithms. PCE algorithms may also be used to speed up the volume image
registration problem.
9. Expanding the Scope of Elimination Algorithms: Before the work presented in
this thesis, elimination algorithms were well known only for the Sum of Absolute
Differences (SAD) and the Sum of Squared Differences (SSD). We have extended
the scope of elimination algorithms to include correlation based measures as
well, with emphasis on the correlation coefficient. As discussed in Chapter
2, the correlation coefficient is robust to linear intensity variations and can measure
the strength of the linear association between two images; it cannot measure the
strength of non-linear, functional or stochastic associations. The strength of
functional associations may be measured by the correlation ratio, and the strength of
stochastic associations by mutual information, which has been used extensively in
medical image registration. A very important future direction is to extend the concept
of partial and complete elimination algorithms to the fast computation of the
correlation ratio and of mutual information. If efficient elimination algorithms are
developed for these measures, their application areas may expand.
APPENDICES
List of Publications Related to the Thesis
Following is the list of publications included in this thesis:
1. Arif Mahmood and Sohaib Khan, Exploiting inter-frame correlation for fast
video to reference image alignment, in Lecture Notes in Computer Science,
Asian Conference on Computer Vision (ACCV 2007), vol. 4843, pp. 647-656,
Springer Berlin / Heidelberg, 2007.
2. Arif Mahmood and S. Khan, Exploiting local auto-correlation function for fast
video to reference image alignment, in IEEE International Conference on Image
Processing (ICIP ’08), October 2008, pp. 2412-2415.
3. Arif Mahmood and S. Khan, Exploiting transitivity of correlation for fast tem-
plate matching, IEEE Transactions on Image Processing, vol. 19, no. 8, pp.
2190-2200, August 2010.
4. Arif Mahmood and S. Khan, Early termination algorithms for correlation co-
efficient based block matching, in IEEE International Conference on Image
Processing, (ICIP ’07), October 2007, vol. 2, pp. II-469-II-472.
5. Arif Mahmood and S. Khan, Correlation coefficient based fast template match-
ing through partial elimination, accepted for publication in IEEE Transactions
on Image Processing, May 2011.
6. Arif Mahmood and S. Khan, Early terminating algorithms for AdaBoost based
detectors, in IEEE International Conference on Image Processing (ICIP ’09),
November 2009, pp. 1209-1212.
7. Arif Mahmood, Z.A. Uzmi, and S. Khan, Video coding with linear compensa-
tion (VCLC), in IEEE International Conference on Communications,(ICC’07),
June 2007, pp. 6220-6225.
List of Publications Not Included in the Thesis
Following is the list of publications not included in the thesis:
1. Arif Mahmood, Structure-less object detection using Adaboost algorithm, in
International Conference on Machine Vision (ICMV 2007), December 2007,
Islamabad, Pakistan
2. M. Shahid Farid and Arif Mahmood, Image Morphing in Frequency Domain,
in Journal of Mathematical Imaging and Vision, Springer Netherlands, March
2011.
3. M. Shahid Farid, Hassan Khan and Arif Mahmood, Image Inpainting using
Cubic Hermit Spline, in International Conference on Signal and Information
Processing (IEEE ICSIP), December 2010, Changsha, China.
4. M. Shahid Farid, Hassan Khan and Arif Mahmood, Image Inpainting based on
Pyramids, in 10th IEEE International Conference on Signal Processing (ICSP),
November 2010, Beijing, China.
5. Mian Muhammad Awais, Arif Mahmood and Asim Karim, Automatically Gen-
erating Association Rules Under Diverse Operational Conditions for a Large
Scale Power Plant, In 2nd International Bhurban Conference on Applied Sci-
ences and Technology (IBCACT), June 2003, Islamabad, Pakistan.
Bibliography
Rosenfeld, A. and Vanderbrug, G. J. 1977. Coarse-fine template matching. IEEE
Trans. Syst., Man, Cybern. 7, 104–107.
Ahn, T. G., Moon, Y. H., and Kim, J. H. 2004. Fast full-search motion estimation
based on multilevel successive elimination algorithm. IEEE Trans. Circuits and
Systems for Video Technology 14, 11 (November), 1265–1270.
Avouac, J. P., Ayoub, F., Leprince, S., Konca, O., and Helmberger, D. V.
2006. The 2005, Mw 7.6 Kashmir earthquake: Sub-pixel correlation of ASTER im-
ages and seismic waveforms analysis. Science Direct, Earth and Planetary Science
Letters 249, 514–528.
Barnea, D. and Silverman, H. 1972. A class of algorithms for fast digital image
registration. IEEE Trans. Commun. 21, 2 (February), 179–186.
Bierling, M. 1988. Displacement estimation by hierarchical block matching. Proc.
SPIE, Visual Communications and Image Processing 10, 942–951.
Bouguezel, S., Ahmad, M., and Swamy, M. 2004. A new radix-2/8 FFT algorithm
for length-q×2^m DFTs. IEEE Transactions on Circuits and Systems 51, 9
(September), 1723–1732.
Bowen, M. M., Emery, W. J., and Wilkin, J. L. 2002. Extracting multi-
year surface currents from sequential thermal imagery using the maximum cross
correlation technique. J. Atmos. Oceanic Technol. 19, 1665–1676.
Briechle, K. and Hanebeck, U. D. 2001. Template matching using fast normal-
ized cross correlation. Proc. SPIE, Opt. Patt. Rec. XII 4387, 95–102.
Brown, L. 1992. A survey of image registration techniques. ACM Computing
Surveys 24, 326–373.
Brunig, M. and Niehsen, W. 2001. Fast full-search block matching. IEEE Trans.
Circuits Syst. Video Technol. 11, 2, 241–247.
Bryant, V. 1985. Metric Spaces: Iteration and Application. Cambridge University
Press, New York,USA.
Burago, D., Burago, Y. D., and Ivanov, S. 2001. A Course in Metric Geometry.
American Mathematical Society, New York,USA.
Burt, P. J. and Adelson, E. H. 1983. The laplacian pyramid as a compact image
code. IEEE Trans. Comput. 31.
Caelli, T. M. and Liu, Z. Q. 1988. On the minimum number of templates required
for shift, rotation and size invariant template matching. Patt. Rec. 21, 3, 205–216.
Caves, R. G., Harley, P. J., and Quegan, S. 1992. Matching map features
to synthetic aperture radar (SAR) images using template matching. IEEE Trans.
Geosci. Remote Sensing 30, 4, 680–685.
Cha, S.-H. 2007. Comprehensive survey on distance/similarity measures between
probability density functions. International Journal of Mathematical Models and
Methods in Applied Science. 1, 300–307.
Chalermwat, P. 1999. HIGH PERFORMANCE AUTOMATIC IMAGE REGIS-
TRATION FOR REMOTE SENSING. PhD Thesis, George Mason University,
Fairfax, Virginia.
Chan, S. C. and Ho, K. L. 1991. On indexing the prime-factor fast fourier trans-
form algorithm. IEEE Trans. Circuits and Systems 38, 8, 951–953.
Cheung, C. and Po, L. 2003. Adjustable partial distortion search algorithm for
fast block motion estimation. IEEE Trans. Circuits Syst. Video Technol. 13, 1,
100–110.
Cooley, J. W. and Tukey, J. W. 1965. An algorithm for the machine calculation
of complex fourier series. Math. Comput. 19, 297–301.
Coorg, S. and Teller, S. 2000. Spherical mosaics with quaternions and dense
correlation. IJCV 37, 3, 259–273.
Cristinacce, D. and Cootes, T. 2003. Facial feature detection using adaboost
with shape constraints. BMVC .
Crowell, K. J., Wilson, C. J., and Canfield, H. E. 2003. Application of local
surface matching to multi-date ALSM data for improved calculation of flood-driven
sediment deposition and erosion. In Fall Meeting. American Geophysical Union,
San Francisco.
Danielson, G. C. and Lanczos, C. 1942. Some improvements in practical fourier
analysis and their application to x-ray scattering from liquids. J. Franklin Inst. 233,
365–380 and 435–452.
Dare, P. M. and Fraser, C. S. 2000. Linear infrastructure mapping using air-
borne video imagery and subsequent integration into a gis. In IEEE International
Geoscience and Remote Sensing Symposium, IGARSS 2000. IEEE, Honolulu, HI ,
USA.
Dew, G. and Holmlund, K. 2000. Investigations of cross-correlation and euclidean
distance target matching techniques in the mpef environment. In Proc. 5th Int.
Winds Workshop,. IWWG, Lorne, Australia, 235–243.
Deza, E. and Deza, M. 2006. Dictionary of Distances. Elsevier.
Di Stefano, L. and Mattoccia, S. 2003. A sufficient condition based on the cauchy-
schwarz inequality for efficient template matching. ICIP , 269–272.
Di Stefano, L., Mattoccia, S., and Mola, M. 2003. An efficient algorithm for
exhaustive template matching based on normalized cross correlation. ICIAP , 322–
327.
Di Stefano, L., Mattoccia, S., and Tombari, F. 2005. ZNCC-based template
matching using bounded partial correlation. Pattern Recognition Ltr. 26, 14, 2129–
2134.
Dierking, W. and Skriver, H. 2002. Change detection for thematic mapping by
means of airborne multitemporal polarimetric sar imagery. IEEE Trans. Geosci.
Remote Sensing 40, 3, 618–636.
Du, Y., Cihlar, J., Beaubien, J., and Latifovic, R. 2001. Radiometric normal-
ization, compositing, and quality control for satellite high resolution image mosaics
over large areas. IEEE Trans. Geosci. Remote Sensing 39, 3, 623–634.
Duhamel, P. and Hollmann, H. 1984. Split-radix fft algorithm. Electron.
Lett. 20, 1, 14–16.
Duhamel, P. and Vetterli, M. 1990. Fast fourier transforms: a tutorial review
and a state of the art. Signal Processing 19, 259–299.
Eckart, S. and Fogg, C. 1995. Iso/iec mpeg-2 software video codec. Proc.
SPIE 2419, 100–118.
Ellis, G. A. and Peden, I. C. 1997. Cross-borehole sensing: Identification and
localization of underground tunnels in the presence of a horizontal stratification.
IEEE Trans. Geosci. Remote Sensing 35, 3, 756–761.
Emery, B., Matthews, D., and Baldwin, D. 2004. Mapping surface coastal
currents with satellite imagery and altimetry. In IEEE IGARSS’ 04. IEEE, USA.
Emery, W. J., Baldwin, D., and Matthews, D. 2003. Maximum cross correla-
tion automatic satellite image navigation and attitude corrections for open-ocean
image navigation. IEEE Trans. Geosci. Remote Sensing 41, 1, 33–41.
Emery, W. J., Fowler, C. W., Hawkins, J., and Preller, R. H. 1991. Fram
strait satellite image-derived ice motions. J. Geophys. Res. 96, 4751–4768.
Emery, W. J., Thomas, A. C., and Collins, M. J. 1986. An objective method
for computing advective surface velocities from sequential infrared images. J. Geo-
physical Res 91, 12865–12878.
Eumetsat. 1998. Workshop on wind extraction from operational meteorological
satellite data. In Proc. 4th Int. Winds Workshop. IWWG, Saanenmser, Switzerland.
Eumetsat. 2000. Workshop on wind extraction from operational meteorological
satellite data. In Proc. 5th Int. Winds Workshop. IWWG, Lorne, Australia.
Evans, A. N. 2000. Glacier surface motion computation from digital image se-
quences. IEEE Trans. Geosci. Remote Sensing 38, 2, 1064–1072.
Feind, R. E. and Welch, R. M. 1995. Cloud fraction and cloud shadow property
retrievals from coregistered tims and aviris imagery: The use of cloud morphology
for registration. IEEE Trans. Geosci. Remote Sensing 33, 172–184.
Fisher, R. 1925. Statistical methods for research workers. Oliver and Boyd.
Foroosh, H., Zerubia, J. B., and Berthod, M. 2002. Extension of phase
correlation to subpixel registration. IEEE Trans. Image Processing 11, 3 (March),
205–216.
Foster, M. P. 2005. Motion Estimation in Remote Sensed Multi-Channel Images.
MS thesis, University of Bath, UK.
Frigo, M. 1999. A fast Fourier transform compiler. In Proc. 1999 ACM SIGPLAN
Conf. on Programming Language Design and Implementation. Vol. 34. ACM, At-
lanta, GA, 169–180.
Frigo, M. and Johnson, S. G. 1998. FFTW: An adaptive software architecture for
the FFT. In Proc. 1998 IEEE Intl. Conf. Acoustics Speech and Signal Processing.
Vol. 3. IEEE, Seattle, WA, 1381–1384.
Frigo, M. and Johnson, S. G. 2005. The design and implementation of FFTW3.
Proceedings of the IEEE 93, 2, 216–231.
Gao, X., Duanmu, C., and Zou, C. 2000. A multilevel successive elimination al-
gorithm for block matching motion estimation. IEEE Trans. Image Processing 9, 3
(March), 501–504.
Garcia, C. A. E. and Robinson, I. S. 1989a. Sea surface velocities in shallow
seas extracted from sequential coastal zone color scanner satellite data. J. Atmos.
Oceanic Tech. 94, 12681–12691.
Garcia, C. A. E. and Robinson, I. S. 1989b. Sea surface velocities in shallow seas
extracted from sequential coastal zone colour scanner satellite data. J. Geophysical
Res 94, 12681–12691.
Ghanbari, M. 1990. The cross search algorithm for motion estimation. IEEE Trans.
Comput. 38, 7 (July), 950–953.
Ghanbari, M. 2003. Standard Codecs: Image compression to advanced video coding.
Vol. 49.
Girod, B. 1993. Motion compensating prediction with fractional pel accuracy. IEEE
Trans. Comput. 41, 4 (April), 604–611.
Gonzalez, R. C. and Woods, R. E. 2002. Digital Image Processing. Pearson
Education.
Good, I. J. 1960. The interaction algorithm and practical fourier analysis. J. R.
Statist. Soc. 22, 2, 373–375.
Goshtasby, A., Gage, S. H., and Bartholic, J. F. 1984. A two-stage cross
correlation approach to template matching. IEEE Trans. Pattern Anal. Machine
Intell. 6, 374–378.
Haralick, R. M. and Shapiro, L. G. 1992. Computer and Robot Vision. Vol. 2.
Addison-Wesley.
Harnett, D. L. 1982. Statistical Methods , 3 ed. Addison-Wesley Publishing Com-
pany, Inc., New York.
Harris, C. and Stephens, M. 1988. A combined corner and edge detector. In Pro-
ceedings of the Alvey Vision Conference. The British Machine Vision Association
and Society for Pattern Recognition, University of Manchester, UK.
Hel-Or, Y. and Hel-Or, H. 2003. Real-time pattern matching using projection
kernels. ICCV .
Hel-Or, Y. and Hel-Or, H. 2005. Real-time pattern matching using projection
kernels. IEEE Trans. Pattern Anal. Machine Intell. 27, 9 (September), 1430–1445.
Huang, Y.-W., Chen, C.-Y., Tsai, C.-H., Shen, C.-F., and Chen, L.-G.
2006a. Survey on block matching motion estimation algorithms and architec-
tures with new results. The Journal of VLSI Signal Processing 42, 297–320.
10.1007/s11265-006-4190-4.
Huang, Y.-W., Chen, C.-Y., Tsai, C.-H., Shen, C.-F., and Chen, L.-G.
2006b. Survey on block matching motion estimation algorithms and architectures
with new results. The Journal of VLSI Signal Processing 42, 297–320.
Irani, M. and Anandan, P. 1998. Robust multi-sensor image alignment. In ICCV.
IEEE, Bombay, India.
ITU-T. 1995. ITU-T recommendation H.263 software implementation. Digital Video
Coding Group, Telenor R&D.
Jain, J. and Jain, A. 1981. Displacement measurement and its application in
interframe image coding. IEEE Trans. Commun. 29, 12 (December), 1799–1808.
Jedrasiak, K. and Nawrat, A. 2009. Image recognition technique for unmanned
aerial vehicles. In Computer Vision and Graphics. Lecture Notes in Computer
Science, vol. 5337. Springer Berlin / Heidelberg, 391–399.
Jin, H., Favaro, P., and Soatto, S. 2001. Real-time feature tracking and outlier
rejection with changes in illumination. In ICCV. Vol. 1. IEEE, USA, 684–689.
Johnson, S. G. and Frigo, M. 2007. A modified split-radix FFT with fewer arith-
metic operations. IEEE Trans. Signal Processing 55, 1, 111–119.
Kamachi, M. 1989. Advective surface velocities derived from sequential images for
rotational flow field: Limitations and applications of maximum cross-correlation
method with rotational registration. J. Geophys. Res. 94, 18227–18233.
Kappagantula, S. and Rao, K. R. 1985. Motion compensated interframe image
prediction. IEEE Transactions on Communication 33, 9 (September), 1011–1015.
Kawanishi, T., Kurozumi, T., Kashino, K., and Takagi, S. 2004. A fast
template matching algorithm with adaptive skipping using inner-subtemplates dis-
tances. In ICPR. International Association for Pattern Recognition (IAPR), Cam-
bridge, England, UK, 654–657.
Kim, J. N. and Choi, T. S. 1999. Adaptive matching scan algorithm based on gra-
dient magnitude for fast full search in motion estimation. IEEE Trans. Consumer
Electron. 45, 3, 762–772.
Kim, J. N. and Choi, T. S. 2000. A fast full-search motion-estimation algorithm
using representative pixels and adaptive matching scan. IEEE Trans. Circuits Syst.
Video Technol. 10, 7, 1040–1048.
Kim, T. and Im, Y. J. 2003. Automatic satellite image registration by combi-
nation of matching and random sample consensus. IEEE Trans. Geosci. Remote
Sensing 41, 5, 1111–1117.
Koga, T., Iinuma, K., Hirano, A., Iijima, Y., and Ishiguro, T. 1981. Motion
compensated interframe coding for video conferencing. Proc. National Telecom.
Conf., G5.3.1–G5.3.5.
Kuglin, C. and Hines, D. 1975. The phase correlation image alignment method.
IEEE Conf. Cyb. Soc., 163–165.
Langford, E., Schwertman, N., and Owens, M. 2001. Is the property of being
positively correlated transitive? The American Statistician 55, 4, 322–325.
Lee, C. and Chen, L. 1997. A fast motion estimation algorithm based on the block
sum pyramid. IEEE Trans. Image Processing 6, 11 (November), 1587–1591.
Leese, J. A., Novak, S., and Clark, B. 1971. An automatic technique for
obtaining cloud motion from geosynchronous satellite data using cross correlation.
J. Appl. Meteor. 10, 118–132.
Lewis, J. 1995. Fast normalized cross-correlation. In International Conference on
Vision Interface. Canadian Image Processing and Pattern Recognition Society, Cal-
gary, Canada, 120–123.
Li, F. and Goldstein, R. 1990. Studies of multibaseline spaceborne interferometric
synthetic aperture radars. IEEE Trans. Geosci. Remote Sensing 28, 1, 88–97.
Li, H., Shi, R., Chen, W., and Shen, I.-F. 2006. Image tangent space for image
retrieval. In 18th International Conference on Pattern Recognition (ICPR 2006).
Vol. 2. 1126–1130.
Li, R., Zeng, B., and Liou, M. 1994. A new three-step search algorithm for block
motion estimation. IEEE Trans. Circuits Syst. Video Technol. 4, 4, 438–442.
Li, W. and Salari, E. 1995. Successive elimination algorithm for motion estima-
tion. IEEE Trans. Image Processing 4, 1 (January), 105–107.
Mahmood, A. 2007. Structure-less object detection using AdaBoost algorithm. In
International Conference on Machine Vision (ICMV 2007). National University of
Sciences and Technology, Islamabad, Pakistan, 85–90.
Mahmood, A. and Khan, S. 2007a. Early termination algorithms for correlation
coefficient based block matching. In IEEE International Conference on Image
Processing (ICIP ’07). Vol. 2. IEEE, San Antonio, TX, USA, II–469–II–472.
Mahmood, A. and Khan, S. 2007b. Exploiting inter-frame correlation for fast
video to reference image alignment. Lecture Notes in Computer Science, Asian
Conference on Computer Vision (ACCV 2007) 4843, 647–656.
Mahmood, A. and Khan, S. 2008. Exploiting local auto-correlation function for
fast video to reference image alignment. In IEEE International Conference on
Image Processing (ICIP ’08). IEEE, San Diego, CA, 2412–2415.
Mahmood, A. and Khan, S. 2009. Early terminating algorithms for AdaBoost-based
detectors. In IEEE International Conference on Image Processing (ICIP ’09). IEEE,
Cairo, Egypt, 1209–1212.
Mahmood, A. and Khan, S. 2010. Exploiting transitivity of correlation for fast
template matching. IEEE Transactions on Image Processing 19, 8 (August 2010),
2190–2200.
Mahmood, A. and Khan, S. 2011. Correlation coefficient based fast template
matching through partial elimination. Accepted for Publication in IEEE Transac-
tions on Image Processing x, y (May), xyz.
Mahmood, A., Uzmi, Z., and Khan, S. 2007. Video coding with linear compen-
sation (VCLC). In IEEE International Conference on Communications (ICC ’07).
IEEE, Glasgow, UK, 6220–6225.
Manduchi, R. and Mian, G. A. 1993. Accuracy analysis for correlation based
image registration algorithms. IEEE ISCAS, 834–837.
Mattoccia, S., Tombari, F., and Di Stefano, L. 2008a. Fast full-search equiv-
alent template matching by enhanced bounded correlation. IEEE Trans. Image
Processing 17, 4, 528–538.
Mattoccia, S., Tombari, F., and Di Stefano, L. 2008b. Reliable rejection of
mismatching candidates for efficient ZNCC template matching. In International
Conference on Image Processing. IEEE, San Diego, CA, 849–852.
Montgomery, D. C. and Peck, E. A. 1982. Introduction to Linear Regression
Analysis. John Wiley and Sons, Inc., New York,USA.
Montrucchio, B. and Quaglia, D. 2005. New sorting-based lossless motion
estimation algorithms and a partial distortion elimination performance analysis.
IEEE Trans. Circuits Syst. Video Technol. 15, 2, 210–220.
Mukherjee, D. P. and Acton, S. T. 2002. Cloud tracking by scale space classi-
fication. IEEE Trans. Geosci. Remote Sensing 40, 2 (February), 405–415.
Bryant, N., Zobrist, A., and Logan, T. 2003. Automatic co-registration of space-based
sensors for precision change detection and analysis. In Proc. IEEE Int. Geoscience
and Remote Sensing Symp. IGARSS ’03. IEEE, Centre de Congrès Pierre Baudis,
Toulouse, France, 1371–1373.
Niblack, W. 1986. An Introduction to Digital Image Processing. Prentice Hall,
115–116.
Nillius, P. and Eklundh, J. O. 2002. Fast block matching with normalized cross cor-
relation using Walsh transforms. TRITA-NA-P02/11, ISRN KTH/NA/P–02/11–
SE.
Ninnis, R. M., Emery, W. J., and Collins, M. J. 1986. Automated extraction
of pack ice motion from AVHRR imagery. J. Geophys. Res. 91, 10725–10734.
Otsu, N. 1979. A threshold selection method from gray-level histograms. IEEE
Transactions on Systems, Man, and Cybernetics 9, 1, 62–66.
Oller, G., Marthon, P., and Rognant, L. 2003. Correlation and similarity
measures for SAR image matching. In 10th Int. Sym. on Remote Sensing. SPIE,
Barcelona, Spain.
Orchard, M. T. and Sullivan, G. J. 1994. Overlapped block motion compensa-
tion: An estimation-theoretic approach. IEEE Trans. Image Processing 3, 5 (Sep),
693–699.
Ouchi, K., Maedoi, S., and Mitsuyasu, H. 1999. Determination of ocean wave
propagation direction by split-look processing using JERS-1 SAR data. IEEE Trans.
Geosci. Remote Sensing 37, 2, 849–855.
Po, L. M. and Ma, W. C. 1996. A novel four-step search algorithm for fast block
motion estimation. IEEE Trans. Circuits Syst. Video Technol. 6, 3, 313–317.
Pope, P. A. and Emery, W. J. 1994. Sea surface velocities from visible and
infrared multispectral atmospheric mapping sensor (MAMS) imagery. IEEE Trans.
Geosci. Remote Sensing 32, 1, 220–223.
Pratt, W. K. 2007. Digital Image Processing, 4th Edition. Wiley Interscience, New
Jersey, USA.
Puri, A., Hang, H. M., and Schilling, D. L. 1987. An efficient block matching
algorithm for motion compensated coding. Proc. IEEE ICASSP, 25.4.1–25.4.4.
Puymbroeck, N. V., Michel, R., Binet, R., Avouac, J. P., and Taboury, J.
2000. Measuring earthquakes from optical satellite images. Applied Optics 39, 20,
3486–3494.
Viola, P. and Jones, M. 2001. Rapid object detection using a boosted cascade of
simple features. In IEEE CVPR. IEEE Computer Society, Kauai, Hawaii, USA.
Viola, P. and Jones, M. 2004. Robust real-time face detection. International
Journal of Computer Vision (IJCV) 57, 2, 137–154.
Quaglia, D. and Montrucchio, B. 2001. Sobol partial distortion algorithm
for fast full search in block motion estimation. Proc. 6th Eurographics Workshop
Multimedia, 77–84.
Rader, C. M. and Brenner, N. M. 1976. A new principle for fast Fourier transfor-
mation. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-24,
264–266.
Ramachandran, S. and Srinivasan, S. 2001. FPGA implementation of a novel,
fast motion estimation algorithm for real-time video compression. Ninth Interna-
tional Symposium on FPGAs 2, 287–290.
Reddy, B. S. and Chatterji, B. N. 1996. An FFT-based technique for translation,
rotation, and scale-invariant image registration. IEEE Trans. Image Processing 5, 8
(August), 1266–1271.
Rietz, H. L. 1919. On functional relations for which the coefficient of correlation is zero.
Publications of the American Statistical Association 1, 472–476.
Robinson, D. and Milanfar, P. 2004. Fundamental performance limits in image
registration. IEEE Trans. Image Processing 13, 9 (September), 1185–1199.
Roche, A., Malandain, G., and Ayache, N. 2000. Unifying maximum likelihood
approaches in medical image registration. International Journal of Imaging Systems
and Technology: Special issue on 3D imaging 11, 71–80.
Roche, A., Malandain, G., Ayache, N., and Prima, S. 1999. Towards a
better comprehension of similarity measures used in medical image registration. In
Proc. 2th MICCAI. Lecture Notes in Computer Science, vol. 1679. Springer Verlag,
Cambridge, United Kingdom, 555–566.
Roche, A., Malandain, G., Pennec, X., and Ayache, N. 1998. The corre-
lation ratio as a new similarity measure for multimodal image registration. In
Proc. 1st MICCAI. Lecture Notes in Computer Science, vol. 1496. Springer Verlag,
Cambridge, MA, 1115–1124.
Rodgers, J. L. and Nicewander, W. A. 1988. Thirteen ways to look at the corre-
lation coefficient. The American Statistician 42, 59–66.
Roma, N., Santos-Victor, J., and Tom, J. 2000. A comparative analysis of
cross-correlation matching algorithms using a pyramidal resolution approach. In
2nd Workshop on Empirical Evaluation Methods in Computer Vision. World Sci-
entific Press, 117–142.
Scambos, T. A., Dutkiewicz, M. J., Wilson, J. C., and Bindschadler, R. A.
1992. Application of image cross-correlation to the measurement of glacier velocity
using satellite image data. Remote Sensing of Environment 42, 3, 177–186.
Schweitzer, H., Bell, J., and Wu, F. 2002. Very fast template matching. In
ECCV. Vol. IV, 358 ff.
Shah, M. and Kumar, R. 2003a. Video Registration. Kluwer Academic Publishers,
Boston.
Shah, M. and Kumar, R. 2003b. Video Registration. Kluwer Academic Publishers,
Boston.
Shanableh, T. and Ghanbari, M. 2000. Heterogeneous video transcoding to
lower spatio-temporal resolutions and different encoding formats. IEEE Trans.
Multimedia 2, 2, 101–110.
Sheikh, Y., Khan, S., and Shah, M. 2004. Feature-Based Georegistration of
Aerial Images. A. Stefanidis and S. Nittel (eds.) Geosensor Networks, Boca Raton,
Florida: CRC Press. ISBN 0415324041.
Sheikh, Y. and Shah, M. 2004. Aligning dissimilar images directly. In ACCV.
Asian Federation of Computer Vision Societies, Jeju, Korea.
Shi, J. and Tomasi, C. 1994. Good features to track. In IEEE CVPR. IEEE,
Seattle, WA, USA.
Shum, H. Y. and Szeliski, R. 2000. Systems and experiment paper: Construction
of panoramic image mosaics with global and local alignment. IJCV 36, 2, 101–130.
Sigley, D. T. and Stratton, W. T. 1942. Solid Geometry and Mensuration.
Dryden Press, Inc., New York.
Simonetto, E., Oriot, H., and Garello, R. 2005. Rectangular building ex-
traction from stereoscopic airborne radar images. IEEE Trans. Geosci. Remote
Sensing 43, 10, 2386–2395.
Singleton, R. C. 1969. An algorithm for computing the mixed radix fast Fourier
transform. IEEE Transactions on Audio and Electroacoustics 17, 2 (June), 93–103.
Snedecor, G. W. and Cochran, W. G. 1968. Statistical Methods, 6th ed. The
Iowa State University Press, Ames, Iowa, USA.
Sorensen, H. V., Heideman, M. T., and Burrus, C. S. 1986. On computing
the split-radix FFT. IEEE Trans. Acoust., Speech, Signal Processing 34, 1, 152–156.
Sotos, A. E. C., Vanhoof, S., Noortgate, W. V. D., and Onghena, P.
2007. The non-transitivity of Pearson’s correlation coefficient: An educational
perspective. International Statistical Institute, 56th Session.
Sotos, A. E. C., Vanhoof, S., Noortgate, W. V. D., and Onghena, P.
2009. The transitivity misconception of Pearson’s correlation coefficient. Statistics
Education Research Journal 8, 2, 33–55.
Spiegel, M. R. and Stephens, L. J. 1990. Schaum’s Outline of Statistics, 3rd
ed. Tata McGraw-Hill Publishing Company Ltd., New York.
Srinivasan, R. and Rao, K. R. 1985. Predictive coding based on efficient motion
estimation. IEEE Trans. Commun. COM-33, 8 (August), 888–896.
Stefano, L. D. and Mattoccia, S. 2003. Fast template matching using bounded
partial correlation. Machine Vision and Applications 13, 213–221.
Strozzi, T., Luckman, A., Murray, T., Wegmüller, U., and Werner, C. L.
2002. Glacier motion estimation using SAR offset-tracking procedures. IEEE Trans.
Geosci. Remote Sensing 40, 11 (Nov), 2384–2391.
Su, J. K. and Mersereau, R. M. 2000. Motion estimation methods for overlapped
block motion compensation. IEEE Trans. Image Processing 9, 9 (September),
1509–1521.
Sun, S., Park, H., Haynor, D. R., and Kim, Y. 2003. Fast template matching
using correlation based adaptive predictive search. Int. J. Img. Sys. Tech., Wiley
InterScience.
Svedlow, M., McGillem, C. D., and Anuta, P. E. 1976. Experimental ex-
amination of similarity measures and preprocessing methods used for image reg-
istration. In Symposium on Machine Processing of Remotely Sensed Data. The
Laboratory for Applications of Remote Sensing, Purdue University, West Lafayette,
Indiana.
Svedlow, M., McGillem, C. D., and Anuta, P. E. 1978. Image registration:
Similarity measure and preprocessing method comparisons. IEEE Transactions on
Aerospace and Electronic Systems 14, 1 (January), 141–150.
Cover, T. M. and Thomas, J. A. 1991. Elements of Information Theory. John Wiley
and Sons, New York.
Thomas, L. H. 1963. Using a computer to solve problems in physics. Applications
of Digital Computers.
Tokmakian, R., Strub, P. T., and McClean-Padman, J. 1990. Evaluation
of the maximum cross-correlation method of estimating sea surface velocities from
sequential satellite images. J. Geophys. Res. 7, 852–865.
Townshend, J., Justice, C., Gurney, C., and McManus, J. 1992. The impact
of misregistration on change detection. IEEE Trans. Geosci. Remote Sensing 30, 5
(Sep), 1054–1060.
Turin, G. L. 1960. An introduction to matched filters. IRE Transactions on Infor-
mation Theory 6, 3 (September), 311–329.
Vachon, P. W. and Raney, R. K. 1991. Resolution of the ocean wave propagation
direction in sar imagery. IEEE Trans. Geosci. Remote Sensing 29, 105–112.
Vachon, P. W. and West, J. C. 1992. Spectral estimation techniques for multi-
look sar images of ocean waves. IEEE Trans. Geosci. Remote Sensing 30, 568–577.
Vanderbrug, G. and Rosenfeld, A. 1977. Two-stage template matching. IEEE
Trans. Comput. 26, 4 (April), 384–393.
Vanderburg, G. J. and Rosenfeld, A. 1977. Two-stage template matching.
IEEE Trans. Comput. 26, 384–393.
Vanne, J., Aho, E., Hamalainen, T. D., and Kuusilinna, K. 2006. A high-
performance sum of absolute difference implementation for motion estimation.
IEEE Trans. Circuits Syst. Video Technol. 16, 7 (July), 876–883.
Vetterli, M. and Nussbaumer, H. J. 1984. Simple FFT and DCT algorithms with
reduced number of operations. Signal Processing 6, 4, 267–278.
Vincent, E. and Laganière, R. 2001. Matching feature points in stereo pairs: A
comparative study of some matching strategies. Machine Graphics and Vision 10, 3,
237–259.
Vincenzo, R. and Lisa, U. 2007. An improvement of AdaBoost for face detection
with motion and color information. In ICIAP.
Wang, H. and Mersereau, R. 1999. Fast algorithms for the estimation of motion
vectors. IEEE Trans. Image Processing 8, 3, 435–438.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. 2007.
Numerical Recipes: The Art of Scientific Computing, 3rd ed. Cambridge University
Press, Cambridge, UK.
Wu, B., Ai, H., Huang, C., and Lao, S. 2004. Fast rotation invariant multi-
view face detection based on real AdaBoost. In IEEE FG.
Wu, Q. X. 1995a. A correlation-relaxation-labeling framework for computing optical
flow - template matching from a new perspective. IEEE Trans. Pattern Anal.
Machine Intell. 17, 8 (September), 843–853.
Wu, Q. X. 1995b. A correlation-relaxation-labeling framework for computing optical
flow - template matching from a new perspective. IEEE Trans. Pattern Anal.
Machine Intell. 17, 8 (September), 843–853.
Wu, Q. X. and Pairman, D. 1995. A relaxation-labeling technique for computing
sea surface velocities from sea surface temperature. IEEE Trans. Geosci. Remote
Sensing 33, 1 (January), 216–220.
Wu, Q. X., Pairman, D., McNeill, S., and Barnes, E. J. 1992. Computing
advective velocities from satellite images of sea surface temperature. IEEE Trans.
Geosci. Remote Sensing 30, 166–176.
Xiao, J. and Shah, M. 2003. Two-frame wide baseline matching. In The
Ninth IEEE International Conference on Computer Vision (ICCV’03). IEEE,
Nice, France.
Yavne, R. 1968. An economical method for calculating the discrete Fourier trans-
form. Proc. AFIPS Fall Joint Computer Conf. 33, 115–125.
Yoshimura, S. and Kanade, T. 1994. Fast template matching based on the
normalized correlation by using multiresolution eigenimages. IEEE/RSJ/GI Int.
Conf. Int. Rob. and Sys. (IROS ’94) 3, 2086–2093.
Zhu, C., Qi, W., and Ser, W. 2005. Predictive fine granularity successive elim-
ination for fast optimal block-matching motion estimation. IEEE Trans. Image
Processing 14, 2 (February), 213–220.
Zhu, S. and Ma, K. K. 2000. A new diamond search algorithm for fast block-
matching motion estimation. IEEE Trans. Image Processing 9, 2, 287–290.
Zitová, B. and Flusser, J. 2003. Image registration methods: A survey. Image
and Vision Computing 21, 977–1000.