Computation Elimination Algorithms for Correlation Based
Fast Template Matching
By
Arif Mahmood
Dissertation
Presented to the
Department of Computer Science,
School of Science and Engineering
Lahore University of Management Sciences
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Submission Date: 28 January 2011
© Copyright 2011
By Arif Mahmood
Final Defense Committee Members
Dr. Sohaib Khan, Associate Professor, Department of Computer Science, LUMS
Dr. Arif Zaman, Professor, Department of Computer Science, LUMS
Dr. Nadeem Ahmad Khan, Associate Professor, Department of Computer Science, LUMS
Dr. Mian Muhammad Awais, Associate Professor, Department of Computer Science, LUMS
Dr. Ashfaq Khokhar, Professor, University of Illinois at Chicago
Dr. Shahab Munir Baqai, Associate Professor, Department of Computer Science, LUMS
Acknowledgements
I gratefully acknowledge the contributions and assistance of my teachers, friends and family members who enabled me to complete a good quality PhD. I wish to convey my gratitude to all of them.
I am especially thankful to my supervisor, Dr. Sohaib A. Khan, whose encouragement, guidance and support were essential to the completion of this thesis.
I would also like to thank Dr. Mansoor Sarwar for his generous support and kindness, which made my life much easier.
I am thankful to Dr. Shahab Munir Baqai for managing my thesis review process and thesis defense. I convey my gratitude to the Final Defense Committee (FDC) members, including Dr. Mian Muhammad Awais, Dr. Nadeem Khan and Dr. Arif Zaman. I am also obliged for the support of my external FDC member, Dr. Ashfaq Khokhar, and my reviewers, Dr. Nasir Rajpoot and Dr. Mubarak Shah. Thank you so much to the FDC members and the reviewers for sparing time and making sincere efforts to improve the quality of my PhD thesis.
I also wish to convey my gratitude to Dr. Javed Saim for strengthening my beliefs for success and for providing me with emotional reinforcement.
I am also thankful to Dr. Murtaza Taj for his efforts in improving the quality of my PhD thesis defense presentation. I am thankful to my fellow students, Ijaz Akhtar, Aamer Zaheer and others, for their support during my PhD program. I am especially thankful to Mr. Numan Sheikh for sharing the LaTeX files which I have used to write this thesis.
In the end, I would like to acknowledge the sacrifices made by my family and by my parents. I am thankful to my wife for managing the home and the children's education while I was busy with my studies.
Despite all of my hard work and efforts and the help of a number of people, I am fully convinced that the successful completion of such a good quality PhD is a blessing and favor of God.
My Lord! Grant me the power and ability that I may be grateful for your favors which you have bestowed on me and on my parents, and that I may do righteous good deeds that will please you, and admit me by your mercy among your righteous slaves.
Arif Mahmood
Computation Elimination Algorithms for Correlation Based
Fast Template Matching
Arif Mahmood
Department of Computer Science,
School of Science and Engineering
Ph.D. Dissertation, Submission Date: 28 January 2011
ABSTRACT
Template matching is frequently used in Digital Image Processing, Machine Vision, Remote Sensing and Pattern Recognition, and a large number of template matching algorithms have been proposed in the literature. The performance of these algorithms may be evaluated from the perspective of accuracy as well as computational complexity. Algorithm designers face a tradeoff between these two desirable characteristics; often, fast algorithms lack robustness and robust algorithms are computationally expensive.
The basic problem addressed in this thesis is to develop template matching algorithms that are both fast and robust. From the accuracy perspective, we choose correlation coefficient as the match measure because it is robust to the linear intensity variations often encountered in practical problems. To ensure computational efficiency, we choose bound based computation elimination approaches because they allow high speedups without compromising accuracy. Most existing elimination algorithms are based on simple match metrics such as Sum of Squared Differences and Sum of Absolute Differences. For correlation coefficient, which is a more robust match measure, very limited efforts have been made to develop efficient elimination schemes.
The main contribution of this thesis is the development of two different categories of bound based computation elimination algorithms for correlation coefficient based fast template matching. We have named the algorithms in the first category Transitive Elimination Algorithms (Mahmood and Khan, 2007b, 2008, 2010), because they are based on transitive bounds on correlation coefficient. In these algorithms, before computing the correlation coefficient, we compute bounds from neighboring search locations based on transitivity. The locations where the upper bounds are less than the current known maximum are skipped, as they can never become the best-match location. As the percentage of skipped search locations increases, the template matching process becomes faster. Empirically, we have demonstrated speedups of up to an order of magnitude compared to existing fast algorithms without compromising accuracy. The overall speedup depends on the tightness of the transitive bounds, which in turn depends on the strength of autocorrelation between nearby locations.
Although the high autocorrelation required for the efficiency of transitive algorithms is present in many template matching applications, it may not be guaranteed in general. We have therefore developed a second category of bound based computation elimination algorithms, which are more generic and do not require specific image statistics such as high autocorrelation. We have named this category Partial Correlation Elimination (PCE) algorithms (Mahmood and Khan, 2007a, 2011). These algorithms are based on a monotonic formulation of correlation coefficient, in which, at a particular search location, the correlation coefficient monotonically decreases as consecutive pixels are processed. As soon as the value of partial correlation becomes less than the current known maximum, the remaining computations are skipped. If a high magnitude maximum is found at the start of the search process, the amount of skipped computation increases significantly, resulting in a high speedup of the template matching process. In order to locate a high maximum at the start of the search process, we have developed novel initialization schemes which are effective for small and medium sized templates. For commonly used template sizes, we have demonstrated that PCE algorithms outperform existing algorithms by a significant margin.
Beyond the main contribution of developing elimination algorithms for correlation, two extensions of the basic theme of this thesis have also been explored. The first direction extends elimination schemes to object detection. To this end, we have shown that the detection phase of an AdaBoost based edge corner detector (Mahmood, 2007; Mahmood and Khan, 2009) can be significantly sped up by adapting elimination strategies to this problem. In the second direction, we prove that in video encoders, if motion estimation is done by maximization of correlation coefficient and motion compensation by first order linear estimation, the variance of the residue signal will always be less than that of existing motion compensation schemes (Mahmood et al., 2007). This result may potentially be used to achieve greater compression of the video signal than current techniques. The fast correlation strategies proposed in this thesis may be coupled with this result to develop correlation based video encoders with low computational cost.
Contents
1 Introduction
1.1 Our Contributions
1.1.1 Transitive Bounds on Correlation Based Image Match Measures
1.1.2 Transitive Elimination Algorithms
1.1.3 Basic Mode Partial Correlation Elimination Algorithm
1.1.4 Extended Mode Partial Correlation Elimination Algorithm
1.1.5 Elimination Algorithms for Fast Object Detection
1.1.6 Video Coding with Linear Compensation
1.2 Organization of Rest of the Thesis
2 A Review of the Commonly Used Image Match Measures
2.1 City Block Distance Measure
2.2 Euclidean Distance Measure
2.3 Minkowski Distance Measure
2.4 Angular Distance Measure
2.4.1 Relationship between Standardized Angular Distance and Standardized Euclidean Distance
2.5 Correlation Based Similarity Measures
2.5.1 Relationship between Correlation and Angular Distance Measure
2.5.2 Relationship between Correlation and Euclidean Distance Measure
2.5.3 Correlation Coefficient as a Measure of Strength of Linear Relationship
2.6 Correlation Ratio
2.6.1 Derivation of Correlation Ratio Formulation from Functional Regression
2.6.2 Relationship between Correlation Ratio and Correlation Coefficient
2.7 Entropy and Mutual Information
2.7.1 Relationship between Mutual Information and Correlation Coefficient
2.8 Conclusion
3 Computational Aspects of Commonly Used Image Match Measures
3.1 Fast Approximate Image Matching Techniques
3.1.1 Search Space Approximation Techniques
3.1.2 Algorithms Using Approximate Image Representations
3.2 Fast Exhaustive Accuracy Image Matching in Frequency Domain
3.2.1 Fast Fourier Transform (FFT) Algorithms
3.2.2 Image Matching by Correlation Theorem
3.2.3 Image Matching by Phase Only Correlation
3.3 Fast Exhaustive Spatial Domain Techniques
3.3.1 Efficient Rearrangement of Match Measure Formulation
3.3.2 Integral Image Approach
3.3.3 Running Sum Approach
3.4 Bound Based Computation Elimination Algorithms
3.4.1 Successive Similarity Detection Algorithms
3.4.2 Partial Correlation Elimination Algorithms
3.4.3 Successive Elimination Algorithms
3.4.4 Enhanced Bounded Partial Correlation Elimination Algorithm
3.4.5 Transitive Elimination Algorithms
3.4.6 Chapter Summary
4 Transitive Bounds on the Correlation Based Measures
4.1 Derivation of Angular Distance Based Transitive Bounds
4.2 Derivation of Euclidean Distance Based Transitive Bounds
4.3 Visualization of Transitive Bounds on Correlation
4.3.1 Visualization of Angular Distance Based Transitive Bounds
4.3.2 Visualization of Euclidean Distance Based Bounds
4.4 Tightness of Euclidean and Angular Distance Based Transitive Bounds
4.4.1 Comparison of Upper Transitive Bounds
4.4.2 Comparison of Lower Transitive Bounds
4.5 Tightness Analysis of Angular Distance Based Transitive Bounds
4.6 Conclusion
5 Transitive Elimination Algorithms for Correlation Based Measures
5.1 Exploiting Strong Intra-Reference Autocorrelation
5.2 Exploiting Strong Inter-Reference Auto-Correlation
5.3 Exploiting Strong Inter-Template Auto-Correlation
5.4 Experiments with Transitive Elimination Algorithms
5.4.1 Experiments with Intra-Reference Auto-correlation
5.4.2 Experiments with Inter-Reference Auto-correlation: Fast Feature Tracking
5.4.3 Experiments with Inter-Reference Auto-correlation: Fast Component Tracking
5.4.4 Experiments with Inter-Template Auto-correlation: Video Geo-registration
5.4.5 Experiments with Inter-Template Auto-correlation: Rotation / Scale Invariant Template Matching
5.4.6 Performance Comparison of Different Correlation Based Measures
5.5 Conclusion
6 Partial Correlation Elimination Algorithms
6.1 Monotonic Formulation of Correlation Coefficient
6.2 Basic Mode Partial Correlation Elimination Algorithm
6.3 Two-Stage Basic Mode PCE Algorithm
6.4 Overheads of Basic Mode PCE Algorithm
6.5 Experiments with Basic Mode PCE Algorithms
6.5.1 Block Motion Estimation Experiments Using Basic Mode PCE
6.5.2 Feature Matching Experiments Using Basic Mode PCE Algorithm
6.5.3 Feature Tracking Experiments Using Two-stage Basic Mode PCE Algorithm
6.6 Conclusion
7 Extended Mode Partial Correlation Elimination Algorithms
7.1 Extended Mode PCE Algorithm
7.2 PCE Mode Selection and Finding Efficient Testing Scheme
7.3 Initialization Schemes for Extended Mode PCE Algorithm
7.3.1 Extended Mode Multi-Stage PCE Algorithm
7.3.2 Initialization of Extended Mode PCE with Coarse-to-Fine Scheme
7.4 Experiments with Extended Mode PCE Algorithm
7.4.1 Feature Tracking with Extended Mode Two-stage PCE Algorithm
7.4.2 Template Matching with Extended Mode Two-Stage PCE Algorithm
7.4.3 Coarse-to-Fine Initialization of Extended Mode PCE Algorithm
7.5 Conclusion
8 Computation Elimination Algorithms for AdaBoost Based Detectors
8.1 Introduction
8.2 Related Work
8.3 AdaBoost Global Threshold Based Early Termination Algorithm
8.4 Early Non-Maxima Suppression Algorithm
8.5 Experiments and Results
8.6 Conclusion
9 Use of Correlation Coefficient for Video Encoding
9.1 Block Based Motion Compensation in Video Encoders
9.2 Problem Definition
9.3 Maximization of Gain Guaranteed by Maximization of Correlation Coefficient
9.4 Video Coding with Linear Compensation (VCLC)
9.4.1 Motion Compensation using Linear Estimator
9.4.2 Motion Estimation with Correlation Coefficient
9.5 Video Coding With Linear Compensation: System Overview
9.6 Experiments and Results
9.7 Conclusion
10 Conclusions and Future Directions
10.1 Transitive Algorithms
10.2 PCE Algorithms
10.3 Limitations of TEA and PCE Algorithms
10.4 Elimination Algorithms for Object Detection
10.5 Using Correlation Coefficient in Video Coding
10.6 Future Directions
Chapter 1
INTRODUCTION
Image comparison is a fundamental operation in visual information processing systems. In daily activities, we frequently compare newly observed images with those we have already observed and stored in our memories. The way humans perceive images is quite different from the way computers process digital image data. Human image perception is based on patterns as a whole and the combined effect of colors, while computers can only see images as large arrays of numbers. Therefore, the bulk of image comparisons done on computers are based on pixel-by-pixel matching, in which some image match measure is computed to establish the proximity between the two images. Image matching has often been used for image alignment and registration in numerous machine vision applications.
Image matching is frequently used in the areas of Image Processing, Machine Vision, Remote Sensing, Pattern Recognition and Digital Signal Processing. Typical applications include image registration and alignment, object detection and identification, content based image retrieval, image and video compression, object tracking, and computing 3D structure. In all of these applications, similarity computation is an important part of the overall problem, often termed template matching. Typically, a small template image is compared against multiple windows in a larger reference image by evaluating an image match measure, and the window which yields the best similarity score is selected as the match location. Although typical template matching applications may require finding a particular target in a larger image, template matching has often been used in many other scenarios, for example, finding point correspondences between multiple images to estimate the fundamental matrix in the stereo problem, or correspondence matching to compute geometric transformations between two images. In video encoding, block motion estimation may also be considered a form of template matching.
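The sliding-window search just described can be sketched in a few lines. The following is an illustrative NumPy implementation (function names are ours, not from the thesis), using correlation coefficient as the match measure:

```python
import numpy as np

def corr_coeff(t, w):
    """Pearson correlation coefficient between two equal-sized blocks."""
    t = t - t.mean()
    w = w - w.mean()
    denom = np.linalg.norm(t) * np.linalg.norm(w)
    return float(t.ravel() @ w.ravel() / denom) if denom else 0.0

def exhaustive_match(template, reference):
    """Slide the template over every window; return the best-match corner."""
    th, tw = template.shape
    best_score, best_loc = -2.0, None
    for y in range(reference.shape[0] - th + 1):    # every search location
        for x in range(reference.shape[1] - tw + 1):
            score = corr_coeff(template, reference[y:y+th, x:x+tw])
            if score > best_score:
                best_score, best_loc = score, (y, x)
    return best_loc, best_score
```

Every window is scored here, which is exactly the exhaustive search whose cost the rest of this thesis is concerned with reducing.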
Image match measures may be broadly divided into two categories: geometric measures computed from pixel intensity values and information theoretic measures based on some image statistic. Basic geometric measures include city block distance and Euclidean distance. These distance measures are defined by considering an image as a point in a high dimensional space: the larger the distance between two points in that space, the more dissimilar the corresponding images are. Correlation based measures are also geometric measures, defined by considering an image as a vector in a high dimensional space. Correlation between two images is the inner product of the two image vectors, and higher correlation means a stronger match between the corresponding images. Geometric measures are generally used for matching uni-modal images acquired by an optical still camera or digital video camera, and are also frequently used for matching remotely sensed satellite images. In the second category of image match measures, known as information theoretic measures, entropy and mutual information are the most commonly used. These are computed from the image histograms and are used for matching multi-modal medical images, for example, matching MRI and PET images. The main focus of this thesis is fast computation of some of the geometric measures, while information theoretic measures will only be discussed for completeness.
In image matching applications, if the search for the best-match location is done exhaustively over the entire search space, the image matching process becomes computationally expensive. Therefore, template matching applications often emphasize reducing the image matching cost by using approximate algorithms, which either approximate the search space with a smaller one, or approximate the image and the match measure with simpler versions. A large volume of research on approximate image matching schemes, with high speedups and varying levels of accuracy, may be found in the literature. Since approximate schemes cannot guarantee that the global maximum will be found, they are not suited for mission critical applications which require high accuracy along with low computational complexity.
Bound based computation elimination schemes are a viable option for mission critical applications, because these schemes guarantee the same accuracy as exhaustive template matching along with high speedups. In computation elimination algorithms, instead of performing the actual match measure computations, an alternate bounding statistic is computed. Actual computations may be skipped partially or even completely as a result of comparing the bounding statistic with the partial computation result or with a previously known result. Elimination algorithms offering the opportunity of skipping all of the direct computations at a search location may be termed complete elimination algorithms, while algorithms in which only a part of the direct computations may be skipped may be termed partial elimination algorithms. If a partial elimination algorithm is used for image matching, computations at a particular search location may be prematurely terminated as soon as it is determined that the current location cannot compete with the already known best-match location. The decision to skip the remaining computations is based on a comparison of the current known maximum and the bounding statistic. In the case of complete elimination algorithms, this comparison is performed before starting the actual match measure computations and, based on the comparison result, the complete computations may be skipped.
A variety of elimination schemes have been developed for simple image match measures including SAD and SSD. Complete elimination algorithms developed for these measures include the Successive Elimination Algorithm (SEA) and its variants by Li and Salari (1995) and Wang and Mersereau (1999). Triangular inequality based techniques have been proposed by Kawanishi et al. (2004) and Brunig and Niehsen (2001). Partial elimination algorithms, also known as Partial Distortion Elimination (PDE) and Successive Similarity Detection Algorithms (SSDA), have been proposed by Barnea and Silverman (1972); Montrucchio and Quaglia (2005); Cheung and Po (2003); Hel-Or and Hel-Or (2005). In each case, by skipping computations, elimination algorithms reduce the computational complexity while guaranteeing that the best-match result will not be compromised.
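Both families of elimination schemes are easy to sketch for SAD. The SEA-style test rests on the inequality SAD(t, w) >= |sum(t) - sum(w)|, so a window whose sum differs from the template sum by at least the best SAD so far can be skipped outright, while PDE stops accumulating the SAD as soon as it reaches the best value. A minimal illustration (our own sketch, not code from the cited papers):

```python
import numpy as np

def sad_match(template, reference):
    """SAD search with SEA complete elimination and PDE partial elimination."""
    th, tw = template.shape
    t_sum = template.sum()
    best_sad, best_loc = np.inf, None
    for y in range(reference.shape[0] - th + 1):
        for x in range(reference.shape[1] - tw + 1):
            window = reference[y:y+th, x:x+tw]
            # SEA (complete elimination): SAD >= |sum(template) - sum(window)|,
            # so if even this lower bound meets the best SAD, skip the window.
            if abs(t_sum - window.sum()) >= best_sad:
                continue
            # PDE (partial elimination): the partial SAD only grows, so stop
            # accumulating as soon as it reaches the current best.
            sad = 0.0
            for row in range(th):
                sad += np.abs(template[row] - window[row]).sum()
                if sad >= best_sad:
                    break
            if sad < best_sad:
                best_sad, best_loc = sad, (y, x)
    return best_loc, best_sad
```

Neither test can discard the true minimum, because both only ever skip windows whose SAD is provably at least the best value already found.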
Simple image match measures such as SAD and SSD are not robust to the image brightness and contrast variations which occur in many practical situations. Correlation coefficient is a more robust similarity measure: it denotes the strength of the linear relationship between two image blocks and is therefore invariant to linear intensity distortions. Correlation coefficient is preferred when significant intensity distortions are present across the images to be matched, as indicated by several researchers, for example, see (Brown, 1992; Zitova and Flusser, 2003; Svedlow et al., 1976, 1978; Leese et al., 1971; Dew and Holmlund, 2000; Pratt, 2007; Chalermwat, 1999). Thus, in most applications in which image matching is a challenging problem, correlation coefficient has been the preferred similarity measure. Examples include change detection (Townshend et al., 1992; N. Bryant, 2003; Dierking and Skriver, 2002), motion estimation (Foster, 2005), glacier surface movement detection (Evans, 2000; Scambos et al., 1992; Strozzi et al., 2002), ice motion estimation (Ninnis et al., 1986; Emery et al., 1991), earthquake damage assessment (Avouac et al., 2006; Puymbroeck et al., 2000), flood erosion and sedimentation analysis (Crowell et al., 2003), cloud motion vector estimation (Mukherjee and Acton, 2002; Wu, 1995a; Leese et al., 1971; Dew and Holmlund, 2000; Eumetsat, 1998, 2000; Feind and Welch, 1995), sea current tracking from sea surface temperature images (Wu and Pairman, 1995; Emery et al., 1986; Garcia and Robinson, 1989b; Kamachi, 1989; Wu et al., 1992; Bowen et al., 2002; Emery et al., 2004), ocean wave propagation direction estimation from split-look images (Ouchi et al., 1999; Vachon and Raney, 1991; Vachon and West, 1992), sea surface velocity estimation from sequential satellite images (Pope and Emery, 1994; Tokmakian et al., 1990; Garcia and Robinson, 1989a), satellite image mosaics (Du et al., 2001; Coorg and Teller, 2000; Shum and Szeliski, 2000), image and video geo-registration (Sheikh et al., 2004; Sheikh and Shah, 2004; Shah and Kumar, 2003a; Irani and Anandan, 1998), crop identification in remotely sensed images (Svedlow et al., 1976), automatic image navigation (Emery et al., 2003), feature matching under variable viewing conditions (Vincent and Laganière, 2001; Jin et al., 2001), automatic control point extraction (Kim and Im, 2003; Oller et al., 2003; Caves et al., 1992), underground target detection in cross-borehole configuration (Ellis and Peden, 1997), topographic data estimation from SAR (Li and Goldstein, 1990), rectangular building extraction from SAR airborne systems (Simonetto et al., 2005), and linear infrastructure mapping using airborne video imagery (Dare and Fraser, 2000). Note that this is only a partial list of the applications using correlation coefficient as the preferred similarity measure.
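The invariance that motivates this choice is easy to verify numerically: if a window is a linear intensity distortion w = a*t + b of the template, with gain a > 0, the correlation coefficient remains exactly 1, while a measure such as SAD grows with the distortion. A small illustrative check (our own code, not from the thesis):

```python
import numpy as np

def corr_coeff(t, w):
    """Pearson correlation coefficient between two equal-sized blocks."""
    t = t - t.mean()
    w = w - w.mean()
    return float(t.ravel() @ w.ravel() /
                 (np.linalg.norm(t) * np.linalg.norm(w)))

rng = np.random.default_rng(0)
t = rng.random((8, 8))
w = 1.7 * t + 25.0                  # brightness/contrast change: a*t + b, a > 0

rho = corr_coeff(t, w)              # centering removes b, normalization removes a
sad = float(np.abs(t - w).sum())    # grows with the size of the distortion
```

Here rho evaluates to 1.0 up to rounding, while sad is on the order of the offset times the number of pixels: mean-centering cancels the offset b, and dividing by the norms cancels the gain a.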
Although correlation coefficient is a more robust match measure than SAD and SSD and has been used extensively in numerous image matching applications, it has been criticized for its high computational complexity (Brown, 1992; Zitova and Flusser, 2003; Pratt, 2007; Barnea and Silverman, 1972; Wu, 1995a; Chalermwat, 1999). Traditionally, correlation coefficient implementations are based on the Fast Fourier Transform (FFT), and significant efforts have been made to reduce the time complexity of the FFT, for example, see (Frigo and Johnson, 1998; Frigo, 1999; Frigo and Johnson, 2005). However, as the template size decreases, the computational advantage of the frequency domain over the spatial domain shrinks, and for small template sizes, spatial domain implementations become faster. Another scenario in which FFT based implementations may not be efficient is finding point correspondences between two images: each feature from one image has to be correlated at only a few locations in the second image, often selected by a corner detection algorithm. This may easily be done in the spatial domain, while in the frequency domain, complete computations at all search locations have to be performed. Frequency domain implementations have also been criticized from other perspectives (Barnea and Silverman, 1972). For example, in most template matching applications, only the final best-match location is of interest, so the computations done at all remaining search locations are redundant. While this redundancy can be exploited in the spatial domain, no computation elimination scheme can be devised to reduce it in the frequency domain. FFT based implementations of correlation may also be criticized for being unable to exploit, for computation reduction, a guess about the value or location of the maximum which is available in many template matching applications.
Partial elimination algorithms as applied to SAD and SSD cannot be extended in a straightforward manner to correlation coefficient based image matching. This is because the growth of the value of correlation coefficient, as corresponding pixels of the two images are processed, is non-monotonic, while the values of SAD and SSD increase monotonically, ensuring that the final value of SAD or SSD is equal to or larger than the intermediate values. Since the best-match location for SAD or SSD is defined as the minimum over the entire search space, the remaining computations may be eliminated as soon as the current value of distortion exceeds the previously known minimum. In contrast, due to the non-monotonic growth pattern of correlation coefficient, no intermediate value is guaranteed to be larger or smaller than the final value of correlation coefficient. Secondly, in the case of correlation coefficient, the best-match location over the entire search space is defined as the location exhibiting the maximum value of correlation coefficient; therefore, a previously known maximum cannot be exploited to discard the remaining computations at an intermediate stage. Hence, partial elimination algorithms have broadly been considered inapplicable to correlation coefficient based template matching, as indicated by Brown (1992), Zitova and Flusser (2003), Pratt (2007), Barnea and Silverman (1972), and Wu (1995a).
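The contrast is visible directly in the running partial sums: the partial SAD can only grow as pixels are added, while the running inner product behind correlation can rise and fall. A toy numeric illustration (values are our own):

```python
import numpy as np

t = np.array([1.0, -1.0, 1.0, -1.0])     # toy "template" pixels
w = np.array([1.0,  2.0, 1.0,  2.0])     # toy "window" pixels

partial_sad = np.cumsum(np.abs(t - w))   # [0, 3, 3, 6]: never decreases
partial_dot = np.cumsum(t * w)           # [1, -1, 0, -2]: rises and falls
```

A threshold test against the final SAD is therefore safe at any intermediate step, whereas no such guarantee holds for the running correlation sum.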
Complete elimination algorithms as applied to SAD and SSD also may not be extended to correlation coefficient based template matching. This is because complete elimination algorithms require a tight upper bound on correlation coefficient which is also computable at low cost; otherwise the benefit of computation elimination may be eroded by the overhead of computing the bound. No such bound previously existed for correlation coefficient. The well known bound on correlation is based on the Cauchy–Schwarz inequality, which is too loose to yield any computation elimination. Therefore, due to the absence of tight upper bounds on correlation, only very limited efforts may be found in the literature on the development of complete elimination algorithms. These efforts include the category of algorithms proposed by Mattoccia et al. (2008b), which try to tighten the Cauchy–Schwarz based bound using different schemes. The bound proposed by Mattoccia et al. (2008b) is tight enough to yield elimination, but requires a large number of square root operations, which are computationally expensive and cause significant bound-computation overhead. The algorithm proposed by Mattoccia et al. (2008b) will be discussed in more detail in Chapter 3.
The main contribution of this thesis is that we have extended the idea of computation
elimination beyond simple image match measures like SAD and SSD: we
have extended these algorithms to correlation coefficient based fast template matching.
To this end, we have developed two different categories of elimination algorithms,
which we call 'Transitive Elimination Algorithms' and 'Partial Correlation
Elimination' algorithms. To obtain high speed up, transitive elimination
algorithms (Mahmood and Khan, 2007b, 2008, 2010) exploit the auto-correlation present
at nearby search locations by using the transitivity property of correlation. Therefore,
the speed up of these algorithms depends strongly on the magnitude
of auto-correlation present in the image matching system. Transitive algorithms have
exhaustive-equivalent accuracy and have been found to be faster than existing
fast algorithms by an order of magnitude.
Although the high autocorrelation required for the speed up of transitive algorithms is
present in many image matching systems, it cannot be guaranteed in general. Partial
Correlation Elimination algorithms may be used efficiently in such situations,
because these algorithms do not require high autocorrelation. However, we find that
in image matching systems exhibiting strong autocorrelation, transitive algorithms
are more efficient.
Partial Correlation Elimination (PCE) algorithms (Mahmood and Khan, 2007a, 2011)
are based on monotonic formulations of the correlation coefficient, and allow computations
to be partially eliminated at each search location. When computed by the monotonic
formulations, the correlation coefficient decreases monotonically from +1 towards -1 as
consecutive pixels are processed. At a particular search location, as soon as the current
value of the partial correlation falls below the previously known maximum, the remaining
computations become redundant and may be skipped without any loss of accuracy.
The two main categories of Partial Correlation Elimination algorithms are 'Basic Mode
PCE' and 'Extended Mode PCE'. Basic Mode PCE is more efficient for small
templates, while Extended Mode is more efficient for medium and large sized templates.
For small and medium sized templates, initialization using a two-stage template
matching approach has been found effective, while for larger templates, initialization
using a coarse-to-fine scheme has been found faster. PCE algorithms have
been compared with existing fast exhaustive techniques, including an FFTW3 based
frequency domain implementation and the techniques of Mattoccia et al. (2008b). For
commonly used template sizes, partial correlation elimination algorithms have
outperformed all existing algorithms by a significant margin.
Beyond these contributions, two additional research directions have also been
explored. The first is the use of elimination strategies to speed up
applications other than fast image match measure computation. To this end, computation
elimination algorithms have been developed to speed up the detection phase of an
AdaBoost based edge-corner detector (Mahmood, 2007; Mahmood and Khan, 2009).
In the second research direction, we have explored correlation coefficient based block
motion estimation and motion compensation by first order linear estimation in video
encoders. We show that if block motion estimation is performed by maximization of the
correlation coefficient and motion compensation is done by linear parameter estimation,
then the ratio of the variance of the original signal to the variance of the residue
signal, often known as the gain, is maximized, which in turn minimizes the entropy of
the residue signal (Mahmood et al., 2007). No such guarantee exists for
traditionally used schemes, which use minimization of SAD as the motion estimation
criterion and simple difference based motion compensation. The proposed encoding scheme has
also been verified by experiments and comparison with the traditional encoding
scheme.
The contributions of this thesis are introduced in more detail in the following sections:
1.1 Our Contributions
The main contribution of this thesis is extending computation elimination algorithms
to correlation based fast template matching. Previously, these algorithms were
well known only for simple image match measures such as SAD and SSD. The two different
types of elimination algorithms are bound based and monotonic growth based
algorithms. In this thesis, we have extended both types of computation elimination
algorithms to correlation based fast template matching.
For the implementation of bound based computation elimination algorithms for
correlation based template matching, we have derived novel bounds on correlation based
measures, which we call transitive bounds. Using transitive bounds, we
have developed different types of fast template matching algorithms, which we call
'Transitive Elimination Algorithms'.
For the implementation of growth based computation elimination algorithms for
correlation coefficient based fast template matching, we have proposed a monotonically
decreasing formulation of the correlation coefficient. While computing the correlation
coefficient by this monotonic formulation, as soon as the partial value of correlation
decreases below an already known maximum, the remaining computations may be skipped without
any loss of accuracy.
We have also extended the idea of monotonic growth based computation elimination
to AdaBoost based object detectors. We reduce computations in AdaBoost based
object detectors in two ways: first, by making the computations monotonic and
terminating them early; second, by early non-maxima suppression. Both of these
techniques are generic and may be applied to other object detectors as well.
In the template matching problems, we have focused on the correlation coefficient as the
match measure. Our choice is motivated by the fact that it is more robust than
other match measures, including SAD, SSD, cross-correlation and Normalized Cross
Correlation (NCC). As a result of our proposed algorithms, correlation coefficient
based template matching has become significantly faster.
In video encoders, block matching for temporal redundancy reduction is also
essentially a template matching problem. Due to its high computational cost, the
correlation coefficient has not been considered a viable option for block motion estimation.
As a result of our algorithms, the correlation coefficient may now be used in video encoders.
We have also explored the benefits of using the correlation coefficient as a similarity
measure in block matching algorithms. We find that if block matching is done by
maximization of the correlation coefficient and motion compensation by first order
linear estimation, the variance of the residue signal is guaranteed to be less than that
of the residue generated by minimization of SAD and simple difference. In most
cases, smaller variance means smaller entropy and fewer bits to encode a
video signal. The benefit of using the correlation coefficient in video encoding becomes
even more pronounced if the intensity and contrast variations between consecutive video
frames are significant.
A visual outline of the complete thesis is shown in the block diagram in Figure 1.1. The
contributions regarding fast template matching are arranged in four chapters, from
Chapter 4 to Chapter 7. The use of early termination in AdaBoost based object detectors
is discussed in Chapter 8, and video encoding using the correlation coefficient as a match
measure is discussed in Chapter 9. In the following subsections, we introduce each
contribution separately. Each of these subsections corresponds to a full chapter in
Figure 1.1: Organization of the full thesis: Related work is organized in Chapters 2 and 3. Chapters 4 and 5 are regarding Transitive Elimination Algorithms (TEA). Chapters 6 and 7 are about Partial Correlation Elimination (PCE) algorithms. Chapters 8 and 9 contain the additional contributions.
the thesis.
1.1.1 Transitive Bounds on Correlation Based Image Match
Measures
Complete elimination algorithms for SAD and SSD based image match measures have
been well investigated. However, we find that these algorithms may not be easily
extended to correlation based image matching. This is because complete
elimination requires that effective bounds on correlation be known.
The effectiveness of a bound may be defined in terms of tightness and computational
cost: the bound must be tight enough to yield computation elimination and must
have low computational cost. The existing bounds on correlation derived from the
Cauchy-Schwarz inequality are not tight, while the bounds derived by Mattoccia
et al. (2008b) have high computational cost.
In this thesis we have derived transitive bounds on correlation based measures, and
we show that these bounds may be computed efficiently with low computational
overhead. We also identify operating conditions under which the transitive bounds become
tight enough to produce significant computation elimination.
Transitive bounds are exact, involving no approximation. We have
derived two different types of transitive bounds, one using Euclidean distance and
one using angular distance. We theoretically compared the tightness of the two types
and found that the angular distance based transitive bounds are always
tighter than the Euclidean distance based bounds.
We have also analyzed the tightness characteristics of both types of transitive bounds.
We observed that these bounds become sufficiently tight under specific conditions.
We mapped these conditions onto the template matching problem and successfully
used the transitive bounds for computation elimination.
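The angular distance intuition above can be sketched in a few lines. Treating the mean-removed, unit-norm blocks as vectors, the correlation coefficient is the cosine of the angle between them, and the triangle inequality on angles bounds ρ(r1, r3) given ρ(r1, r2) and ρ(r2, r3). This is an illustrative sketch under that standard geometric identity, not the thesis implementation; the block sizes and data below are arbitrary:

```python
import numpy as np

def corr(a, b):
    """Correlation coefficient between two equal-size image blocks."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a.ravel() @ b.ravel() / (np.linalg.norm(a) * np.linalg.norm(b)))

def transitive_bounds(r12, r23):
    """Angular-distance transitive bounds on rho(r1, r3), given rho(r1, r2)
    and rho(r2, r3).  Since rho = cos(angle) for mean-removed unit vectors,
    the triangle inequality on angles gives
    cos(a12 + a23) <= rho13 <= cos(a12 - a23)."""
    s = np.sqrt((1.0 - r12 ** 2) * (1.0 - r23 ** 2))
    return r12 * r23 - s, r12 * r23 + s

rng = np.random.default_rng(0)
t = rng.standard_normal((8, 8))              # template (r1)
c = t + 0.1 * rng.standard_normal((8, 8))    # central block (r2)
o = c + 0.1 * rng.standard_normal((8, 8))    # outside block (r3)

lo, hi = transitive_bounds(corr(t, c), corr(c, o))
assert lo <= corr(t, o) <= hi                # the true rho lies in the bounds
```

Note how the bounds tighten as either known correlation approaches ±1, which is exactly the operating condition exploited by the transitive elimination algorithms.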
1.1.2 Transitive Elimination Algorithms
In Transitive Elimination Algorithms, transitive bounds are first computed at most of
the search locations. Search locations whose upper transitive bound is less than the
currently known maximum are skipped from the search space; correlation is computed
only at the non-skipped search locations. If a large number of
search locations are skipped, the template matching process becomes efficient.
Transitive Elimination Algorithms require a mapping of the transitive bounds onto the
template matching problem. In this mapping, we have addressed two main challenges:
the overhead of bound computation must be insignificant compared to the overall
computation, and the bounds must be tight enough to produce significant computation
elimination. The bound computation overhead is reduced by developing efficient
algorithms, and the tightness of the bounds is guaranteed by ensuring certain operating
conditions, as briefly discussed in the following paragraph.
At a particular search location, computation of the transitive bounds requires that two
bounding correlations be known. We show that a tight upper transitive bound may be
guaranteed if at least one of the two known correlations has large magnitude. We
ensure this condition by mapping one of the two known correlations onto the auto-
correlation present in the template matching problem. Auto-correlation may be
present in many different forms, resulting in different mappings of the transitive bounds
onto the template matching problem. Each mapping results in a different Transitive
Elimination Algorithm. In this thesis, we have developed three different Transitive
Elimination Algorithms.
1. Exploiting strong intra-reference autocorrelation (Mahmood and Khan, 2008)
The typical template matching scenario is to match one or more small template
images at all valid search locations in a large reference image. Transitive bounds
may be used to speed up the template matching process if the reference image
has high local auto-correlation, which means the spatially contiguous search
locations are highly correlated with each other. We divide the reference image
into non-overlapping rectangular windows; correlating the central block of each window
with the outside blocks in the same window yields the computation
of local auto-correlation.
The mapping of image blocks r1, r2 and r3 to this problem is as follows: r1
maps to the template image, r2 to the central block of each window,
and r3 to an outside block in the same window. The template image is only
correlated with the central block, yielding one of the two known correlations. By
using this correlation and the auto-correlation as the two known correlations,
we compute transitive bounds on the correlation between the outside image blocks
and the template image. If the transitive bounds are sufficiently tight, many of the
outside blocks may be eliminated from the search space.
The minimum possible size of a reference image window is 3 × 3 blocks, which
means one central block (r2) has eight surrounding neighboring blocks (r3). The
template image (r1) is matched only with the central block, and transitive
bounds are computed for the eight outside blocks. If the auto-correlation between
the central block and each of the outside blocks is sufficiently high, each of the
eight outside blocks may be eliminated. That means the ratio of work done to
total work may be as low as 1:9, ignoring the overhead required to compute the
auto-correlation. To compute the local auto-correlation, we have formulated a very
efficient algorithm which reduces this overhead to a negligibly small value.
As the window size increases, the total number of central blocks in the reference
image decreases, causing a decrease in the number of matches with the central
blocks. For example, for a window size of 5 × 5, the ratio of work done to
total work may reduce to 1:25, and if the window size is increased to 7 × 7,
the ratio may reduce to 1:49. The speed up over the exhaustive spatial domain
implementation may thus be 9, 25, or 49 times for 3 × 3, 5 × 5, and 7 × 7 window sizes
respectively.
Although larger window sizes yield more speed up, the window size may not be
increased arbitrarily. As the window grows, the auto-correlation
between the central block and the outside blocks decreases, which may result in loose
transitive bounds. If the upper transitive bound turns out to be larger than
the currently known maximum, the corresponding block is not eliminated
and correlation is computed on it. Increasing the window size may therefore
increase the number of un-eliminated blocks, resulting in an increase in the
computational cost. Thus the window size parameter is critical to obtaining the
maximum benefit of transitive bounds. We have investigated this parameter in
detail and proposed a formulation for automatic computation of the window size
parameter.
2. Exploiting strong inter-reference auto-correlation (Mahmood and Khan, 2010)
Many template matching applications, such as tracking an object in a surveillance
video, checking for missing components on a PCB production line or object
inspection over conveyor belts, require one template image to be correlated with
a set of reference frames. In such applications, the reference frames are often
highly correlated with each other. Therefore, inter-reference auto-correlation
may be used to speed up the template matching process.
We have developed a highly efficient algorithm for the computation of inter-
reference auto-correlation, which reduces its computational cost to a negligibly
small value. Based on that algorithm, we compute the auto-correlation of
all reference frames in the set with a specific frame, which may be the temporally
central frame. The template image is also correlated at all search locations
in the central frame. The two known correlations in this case are the correlation
between the central frame and the other frames in the set, and the correlation
between the template image and the central frame. Using these two known
correlations, transitive bounds on each of the other reference frames in the set
may be computed and used to speed up the template matching process.
3. Exploiting strong inter-template autocorrelation (Mahmood and Khan, 2007b)
Certain applications require a set of template images to be correlated with a
single reference image, for example, matching an aerial video with a satellite
image or exhaustive rotation-scale invariant template matching. In such cases,
if the set of templates has high autocorrelation, correlation of one template
with the reference image yields tight bounds upon the correlation of all other
templates within the set.
Correlating a central template with the other templates in the set yields the inter-
template auto-correlation, and correlating the central template with the reference
image yields one of the two known correlations. Correlation of the other templates
with the same reference image may then be made faster by computing the transitive
bounds. The computational cost of matching the other templates against the reference
image may be significantly reduced, depending on the inter-template autocorrelation.
The transitive elimination algorithms are implemented in C++ and compared with
currently known efficient algorithms, including Bounded Partial Correlation (Di Stefano
et al., 2005), Enhanced Bounded Correlation (Mattoccia et al., 2008b), fast algorithms
for SAD (Li and Salari, 1995; Montrucchio and Quaglia, 2005), an FFT based frequency
domain implementation (William et al., 2007) and an efficient spatial domain
implementation as described in Lewis (1995). Experiments are performed on a variety of
real image datasets. While the exact speed up of the proposed algorithms varies from
experiment to experiment, we have observed speed ups ranging from several times
to more than an order of magnitude.
1.1.3 Basic Mode Partial Correlation Elimination Algorithm
The performance of transitive elimination algorithms, as discussed in the previous
subsections, depends strongly on the magnitude of auto-correlation found in an
image matching application. High auto-correlation is present in many applications
but cannot be guaranteed in every scenario. The need for
more generic algorithms is satisfied by the development of partial correlation
elimination algorithms. These algorithms may be used for correlation coefficient based
fast template matching in applications that do not exhibit high auto-correlation, while
transitive elimination algorithms remain faster when high magnitude auto-correlation
is present.
In partial elimination algorithms, a portion of the computation must be done
at each search location before that location may be skipped. Partial
elimination algorithms have been well investigated for SAD and SSD based distance
measures, while for correlation coefficient based image matching, only the algorithms
presented by Di Stefano et al. (2005) and Mattoccia et al. (2008b) may be found in the
literature. This is because when SAD or SSD is computed between two
24
0 12 24 36 48 60 72 84 96 108 120 132 144−0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Number of Processed Pixels
Gro
wth
of P
artia
l Sim
ilarit
y
Threshold
Non−Monotonic
Monotonic
Figure 1.2: Growth of correlation coefficient in its traditional form (blue) and monotonic form (red). The curves show the intermediate value at each of 144 pixel locations for a pair of 12 × 12 pixel blocks. Both formulations reach the same final value of ρ = -0.0647. Computations may be skipped when the partial sum in the monotonic form becomes lower than the threshold.
image blocks and corresponding pixels are processed, the partial value of the distance
increases monotonically. Since the best match is defined as the location
exhibiting the minimum distance, at a particular search location, as soon as the
current value of the distance exceeds the previously known minimum, further
computations become redundant and may be skipped without any loss of accuracy. Thus,
partial elimination requires two basic properties of the match measure:
the first is a monotonic growth pattern, and the second is a best match defined by the
minimum distance location. Unfortunately, both of these properties are missing for
the correlation coefficient when computed by traditional formulations: the correlation
coefficient does not grow monotonically, and the best match is defined as the
location exhibiting the maximum value of the correlation coefficient.
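The monotonic growth property of SAD, and the early exit it permits, can be sketched as follows. This is an illustrative sketch only; the fast SAD algorithms cited above are considerably more sophisticated, and the data and threshold are made up:

```python
import numpy as np

def sad_with_early_exit(f, g, best_so_far):
    """Partial SAD: the running sum grows monotonically, so computation
    can stop as soon as it exceeds the best (minimum) SAD seen so far.
    Returns (sad_value, was_eliminated)."""
    total = 0.0
    for a, b in zip(f.ravel(), g.ravel()):
        total += abs(float(a) - float(b))
        if total > best_so_far:          # further pixels can only raise SAD
            return total, True
    return total, False

rng = np.random.default_rng(1)
template = rng.integers(0, 256, (16, 16))
good = template.copy()                   # perfect match: SAD = 0
bad = 255 - template                     # inverted image: poor match

sad_good, skipped_good = sad_with_early_exit(template, good, best_so_far=100.0)
sad_bad, skipped_bad = sad_with_early_exit(template, bad, best_so_far=100.0)
```

The good candidate completes its full sum, while the bad one is abandoned after only a few pixels; the correlation coefficient in its traditional form offers no such exit, which is precisely the gap the monotonic formulation below closes.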
Due to these two unfavorable properties, correlation has been criticized by many
researchers as not being amenable to partial computation elimination (Brown, 1992;
Zitova and Flusser, 2003; Pratt, 2007; Barnea and Silverman, 1972; Wu, 1995a).
However, we observe that if the correlation coefficient is computed using the normalized
Euclidean distance formulation, it turns out to be a monotonically decreasing measure.
Although the relationship between correlation and the normalized Euclidean distance has
long been known (Rodgers and Nicewander, 1988), it had never been exploited as a
means of fast computation of the correlation coefficient. In (Mahmood and Khan, 2007a)
we proposed, for the first time, a partial elimination algorithm for the correlation
coefficient that outperformed all of the existing fast exhaustive-equivalent algorithms by
a significant margin for small and medium sized templates.
If the correlation coefficient is computed using our proposed formulation, the similarity
starts from +1 at the first pixel of a block and monotonically decreases to the final
value of the correlation coefficient by the end of the computation (Figure 1.2). Any
intermediate value of the similarity is always larger than (or equal to) the final value. The
speed up occurs because, at any point during the computation, if the similarity falls
below the previously known maximum (or an initial threshold), the remaining
computations become redundant and may be skipped without any loss of accuracy. As
the total amount of skipped computation increases, the template matching process
accelerates accordingly.
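A minimal sketch of this monotonic formulation, using the identity rho = 1 - 0.5 * ||f_hat - g_hat||^2 for mean-removed, unit-norm blocks f_hat and g_hat (the block size and random data are illustrative, not from the thesis experiments):

```python
import numpy as np

def monotonic_partial_correlation(f, g):
    """Correlation coefficient via normalized Euclidean distance.
    Because ||f_hat - g_hat||^2 = 2 - 2*rho for mean-removed, unit-norm
    vectors, the running partial value 1 - 0.5 * cumsum((f_hat - g_hat)^2)
    starts at +1 and decreases monotonically to the final rho."""
    fh = (f - f.mean()).ravel()
    fh = fh / np.linalg.norm(fh)
    gh = (g - g.mean()).ravel()
    gh = gh / np.linalg.norm(gh)
    return 1.0 - 0.5 * np.cumsum((fh - gh) ** 2)   # last entry == rho(f, g)

rng = np.random.default_rng(2)
f = rng.standard_normal((12, 12))
g = rng.standard_normal((12, 12))

curve = monotonic_partial_correlation(f, g)
rho = np.corrcoef(f.ravel(), g.ravel())[0, 1]
```

Here `curve` never increases and its final entry equals the exact correlation coefficient, so comparing any intermediate entry against the current maximum is a safe elimination test.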
In the PCE algorithm, the amount of elimination depends on the magnitude and the
location of the currently known maximum. A high magnitude maximum found at the
start of the search process may significantly increase computation elimination and
hence reduce the execution time. For this purpose, we have developed an intelligent
re-arrangement of the PCE computations, conceptually similar to the two-stage template
matching proposed by Vanderbrug and Rosenfeld (1977). In the first stage, only a
small portion of the template is matched at all search locations using the PCE algorithm.
Based on this partial result, the complete correlation coefficient is computed at the best
match location only, and is used as the initial threshold in the second stage. By using
this strategy, we may quickly find a high threshold at no additional computational
cost. This scheme is effective for small and medium sized templates, since for these sizes
coarse-to-fine schemes often fail to provide a high initialization threshold. The two-stage
PCE algorithm is exact, having exhaustive-equivalent accuracy. In contrast,
the existing two-stage algorithm for normalized cross-correlation (NCC) proposed
by Goshtasby et al. (1984) is approximate, with a non-zero probability of missing the
NCC maximum.
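The two-stage strategy can be sketched as follows. This is a simplified illustration, not the optimized implementation: the number of first-stage pixels, the naive per-location normalization, and the planted-template example are all assumptions for demonstration.

```python
import numpy as np

def _hat(block):
    """Mean-removed, unit-norm version of a block, flattened."""
    v = (block - block.mean()).ravel().astype(float)
    return v / np.linalg.norm(v)

def two_stage_pce(template, reference, first_stage_pixels=16):
    """Two-stage PCE sketch using rho = 1 - 0.5 * ||f_hat - g_hat||^2.
    Stage 1 matches only the first few pixels everywhere; the exact
    correlation at the stage-1 winner seeds the elimination threshold."""
    th, tw = template.shape
    H, W = reference.shape
    fh = _hat(template)
    locs = [(y, x) for y in range(H - th + 1) for x in range(W - tw + 1)]

    def partial_sim(y, x, n):
        gh = _hat(reference[y:y + th, x:x + tw])
        return 1.0 - 0.5 * float(np.sum((fh[:n] - gh[:n]) ** 2))

    # Stage 1: cheap partial similarity, then one full evaluation.
    best_loc = max(locs, key=lambda p: partial_sim(*p, first_stage_pixels))
    best_rho = partial_sim(*best_loc, th * tw)   # exact rho at the winner

    # Stage 2: full PCE scan seeded with the stage-1 threshold.
    for y, x in locs:
        gh = _hat(reference[y:y + th, x:x + tw])
        partial, eliminated = 1.0, False
        for d in (fh - gh) ** 2:
            partial -= 0.5 * d
            if partial < best_rho:               # cannot beat the current best
                eliminated = True
                break
        if not eliminated and partial > best_rho:
            best_loc, best_rho = (y, x), partial
    return best_loc, best_rho

rng = np.random.default_rng(3)
reference = rng.standard_normal((24, 24))
# Plant the template at (5, 7) under a gain/offset change (rho stays 1).
template = 2.0 * reference[5:13, 7:15] + 3.0
loc, rho = two_stage_pce(template, reference)
```

Because the stage-1 winner already carries a threshold near the global maximum, most stage-2 locations are abandoned within a few pixels, while the result remains exhaustive-equivalent.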
1.1.4 Extended Mode Partial Correlation Elimination Algo-
rithm
The Basic Mode PCE algorithm, as discussed in the last section, is based on a monotonic
formulation of the correlation coefficient. We have analyzed the different overheads
associated with this monotonic form and found that it is efficient for speeding
up matching of small and medium sized templates. For large templates, the overhead of
the monotonic form may erode some of the computational advantage obtained by
elimination.
To further improve the computational efficiency, we have expanded the simple
monotonic form of the correlation coefficient and separated the pre-computable terms from
the run-time computable terms. The resulting form of the correlation coefficient is
still monotonic but more complex than the original simple form. However, we find
that full evaluation of the complex form is only required when the elimination test is
performed; otherwise the computation may proceed by accumulating only the cross-
correlation. The elimination test consists of a comparison of the current value of the
similarity with a previously known maximum, and therefore requires that the current
value of the similarity be known.
The PCE algorithm based on this complex monotonic form is named the Extended
Mode PCE algorithm; it is much faster on large templates. To
reduce the total number of full evaluations of the complex form, we have developed a
strategy to determine the number of elimination tests to be performed while matching
two image blocks, as well as suitable test indices. This strategy is based on
the currently known maximum and the downward slope of the monotonically decreasing
curve. We observe that the downward slope of this
curve is on average linear, and a safe value of the currently known maximum is used
to compute the total number of tests to be performed and their locations.
For large templates, a coarse-to-fine scheme is used to find a high initial threshold at
the start of the search process, at a small computational overhead.
Extended Mode PCE with the two-stage approach has also been implemented; however,
we observe that in many cases the coarse-to-fine scheme together with the PCE algorithm
provides the fastest image matching for large templates. We therefore use both
schemes side by side: if the coarse-to-fine scheme successfully finds a high maximum,
Extended Mode PCE follows; if the coarse-to-fine scheme fails, two-stage
Extended Mode PCE is used for fast match measure computation.
The PCE algorithms are compared with the currently known fast exhaustive-
equivalent algorithms, including a sequential frequency domain implementation
of the FFT (William et al., 2007), an optimized, adaptive and parallel implementation,
FFTW3 (Frigo and Johnson, 2005), a very fast spatial domain implementation,
ZEBC (Mattoccia et al., 2008b), and an efficient exhaustive spatial domain
implementation (Pratt, 2007). The comparisons are done over a wide variety of datasets
and on template sizes from 4 × 4 to 128 × 128 pixels. Although the exact speed up is data
dependent, in many cases the PCE algorithms have been found to be faster by more than
an order of magnitude than the other algorithms under consideration.
1.1.5 Elimination Algorithms for Fast Object Detection
Bound based computation elimination algorithms have traditionally been used only to
speed up image matching applications. However, we observe that similar schemes
may also be devised to speed up other applications in the fields of Image Processing
and Computer Vision. Obvious candidates that may benefit from
elimination schemes are object detectors and edge-corner detectors. As an example,
we consider the AdaBoost based object detector (Viola and Jones, 2001, 2004), in
which the detector response is the sum of the positive weights of a subset of the weak
learners in an ensemble. Using the ideas of elimination algorithms, the computation of
this summation may be terminated well before completion if it is established that the current
location cannot exceed a detection threshold.
In the face detector proposed by Viola and Jones (2001, 2004), an early rejection
scheme has also been implemented in the form of a cascade of ensembles. The
ensemble at the start of the cascade checks whether essential facial features,
for example the eyes, are missing, in which case the current location may be discarded as a
non-face. However, such schemes are only possible if the geometric pattern of the
object under consideration remains fixed. In the case of edge-corner detectors, no
aligned geometric patterns may be expected, so cascade based schemes
are no longer applicable. The computation elimination schemes proposed in
this thesis, for fast edge-corner detection by an AdaBoost based detector, are generic
and may be applied to any type of object. In this regard, we have developed two
types of elimination algorithms: the basic early termination algorithm and the early non-
maxima suppression algorithm. Both of these algorithms are briefly introduced in
the following paragraphs.
In the basic form of the early termination algorithm for fast edge-corner detection, each
candidate location is initialized with the total weight of the trained ensemble. If
a weak learner classifies the current location as a non-object, the weight of that
learner is subtracted from the current total weight. As more learners are processed,
the weight of the candidate location monotonically decreases, and as soon as the
current weight falls below the detection threshold, further computations may
be skipped without any loss of accuracy.
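The basic early termination idea can be sketched as follows; the weights, votes and threshold are made up for illustration, and real weak learners would of course evaluate features rather than receive precomputed votes:

```python
import numpy as np

def detect_with_early_termination(weak_votes, weights, threshold):
    """Early termination sketch for an AdaBoost detector.  The score
    starts at the total ensemble weight; each weak learner that votes
    'non-object' subtracts its weight.  The score is monotonically
    decreasing, so evaluation stops once it falls below the threshold.
    weak_votes: booleans, True where a learner votes 'object'.
    Returns (score, terminated_early)."""
    score = float(np.sum(weights))
    for vote, w in zip(weak_votes, weights):
        if not vote:
            score -= w
            if score < threshold:        # cannot recover: reject early
                return score, True
    return score, False

# Dyadic weights keep the arithmetic exact for this demonstration.
weights = np.array([0.5, 0.5, 0.25, 0.25, 0.125])
threshold = 1.0

# Rejected by the two heaviest learners: drops below the threshold
# after only two of the five evaluations.
score, early = detect_with_early_termination(
    [False, False, True, True, True], weights, threshold)
score_all, early_all = detect_with_early_termination(
    [True] * 5, weights, threshold)
```

The first candidate is abandoned after two learners, while a candidate accepted by every learner completes the full sum; the detection result is identical to evaluating the whole ensemble.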
In order to suppress multiple responses to the same object, only the local maximum in
each locality has to be retained, while the local non-maxima candidates have to be
suppressed to zero through the process of Non-Maxima Suppression (NMS). We
reduce the computations at local non-maxima candidate locations by developing the Early
Non-Maxima Suppression (ENMS) algorithm. In ENMS, we partially compute
the detector response at all candidate locations. In each local NMS window, we
choose the candidate location with the best partial result, and compute the final
detector response at that location. If this final response is larger than the detection
threshold, then for the remaining candidate locations in that NMS window, the early
termination threshold is raised to the final value of the local maximum. That is, in
a specific NMS window, a candidate location is discarded as soon as its detector
response falls below the local maximum or the global detection threshold, whichever
is larger. The ENMS algorithm thus reduces redundant computation at non-
maxima candidate locations.
The proposed partial computation elimination algorithms are incorporated into our
previous implementation of the AdaBoost based edge-corner detector (Mahmood, 2007).
The quality of the detected edge-corners remains exactly the same, while the
speed up over the original algorithm is more than an order of magnitude. We have also
compared the quality and speed of the edge-corners detected by the AdaBoost detector
with three other detectors: the KLT detector (Shi and Tomasi, 1994), the Harris
detector (Harris and Stephens, 1988) and Xiao's detector (Xiao and Shah, 2003). We
find that the edge-corners detected by the AdaBoost detector are of comparable quality
to those of the KLT, Harris and Xiao detectors, while its execution time is up to 4.00
times faster than KLT, 17.13 times faster than Harris and 79.79 times faster than Xiao's detector.
1.1.6 Video Coding with Linear Compensation
In traditional video encoders, block based motion compensation techniques are often
used to reduce the temporal redundancy of the video signal. In these
techniques, the current video frame is divided into non-overlapping blocks. Each block
from the current frame is searched in a previous frame by minimization
of SAD. At the best match location, the simple difference between the current block and the best
match block is computed for further processing. Since SAD is not robust to intensity
and contrast variations, in the presence of such variations it yields incorrect
match locations, resulting in a large residue variance and poor compression.
We observe that the correlation coefficient represents the goodness of linear fit between
two image blocks (this will be discussed in more detail in Chapter 2). We have explored
the benefit of this property for motion compensation in video encoders.
If block matching is done by maximization of the correlation coefficient, then the best
matching block is also the best linear-fitting block. If we get the maximum
correlation of 1.0, the template block and the matched block are in a perfect linear
30
relationship resulting in zero residues if motion compensation is done with first or-
der linear estimation. We theoretically find that if motion estimation is done by
maximization of correlation coefficient and motion compensation is done by linear
estimation, the variance of the motion compensated signal will always be less than
the variance of simple difference signal used in traditional encoding schemes. The
overhead of this approach is one extra parameter, per block, to be encoded which
may require a customized decoder.
The use of the correlation coefficient for block motion estimation may be criticized for its
high computational complexity. However, we have implemented the partial correlation
elimination algorithm for block motion estimation and compared its execution
time with that of SAD based motion estimation using the Successive Elimination Algorithm (Li and
Salari, 1995) and Partial Distortion Elimination (Montrucchio and Quaglia, 2005) optimizations.
We find that the execution time of correlation based block matching is
comparable to that of optimized SAD. This shows that the use of the correlation
coefficient as a block motion estimator is a viable option and may be considered
when the resulting compression is larger than that obtained by traditional SAD based
encoding schemes.
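The variance claim above can be checked numerically. The following is a minimal sketch with toy one-dimensional "blocks"; the pixel values are illustrative, not from the thesis experiments:

```python
# Compensating a block with the best first-order linear fit (alpha + beta*t)
# never leaves more residual variance than the simple difference r - t.

def mean(v):
    return sum(v) / len(v)

def linear_residual_variance(r, t):
    mu_r, mu_t = mean(r), mean(t)
    cov = mean([(a - mu_r) * (b - mu_t) for a, b in zip(r, t)])
    beta = cov / mean([(b - mu_t) ** 2 for b in t])   # slope of the linear fit
    alpha = mu_r - beta * mu_t                        # intercept
    residual = [a - (alpha + beta * b) for a, b in zip(r, t)]
    return mean([e * e for e in residual])  # residual mean is zero by construction

def simple_difference_variance(r, t):
    d = [a - b for a, b in zip(r, t)]
    mu = mean(d)
    return mean([(e - mu) ** 2 for e in d])

# t is roughly linear in r, with a brightness/contrast change:
r = [10.0, 14.0, 9.0, 20.0, 17.0]
t = [4.0, 6.0, 3.5, 9.0, 8.0]
assert linear_residual_variance(r, t) <= simple_difference_variance(r, t)
```

The inequality holds for any block pair with non-zero variance, since the linear residual variance is the minimum achievable by any affine compensation, of which the simple difference is one instance.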
1.2 Organization of the Rest of the Thesis
A visual organization of the thesis is shown in Figure 1.1. In this figure, the flow of
concepts and the dependencies between chapters are shown by arrows. A brief
overview of the rest of the thesis is as follows:
• Theoretical properties of various commonly used image match measures are
discussed in Chapter 2. Theoretical relationships between different match measures
are also explored.
• Chapter 3 contains a review of the existing state of the art image matching
algorithms. Positioning of our core contributions within the existing work is
also explained.
• Chapter 4 contains theoretical aspects of transitive bounds on correlation.
• Chapter 5 contains Transitive Elimination Algorithms, along with experiments
and results.
• Chapter 6 is about Basic Mode Partial Correlation Elimination algorithm.
• Chapter 7 is about Extended Mode Partial Correlation Elimination algorithm.
• Chapter 8 contains applications of elimination schemes for fast object detection.
Early termination and Early Non-Maxima Suppression (ENMS) algorithms are
discussed in the perspective of AdaBoost based edge-corner detector.
• Chapter 9 is about video coding with linear compensation, which is a new
correlation coefficient based video coding scheme.
• Finally, the thesis is concluded in Chapter 10.
Chapter 2
A REVIEW OF THE COMMONLY USED IMAGE
MATCH MEASURES
An image match measure is a function that accepts two images, I1 and I2, as input
and maps them to a single point on the line of real numbers:
M : I1 × I2 → R, (2.1)
where M is the match measure and R is the set of real numbers.
Image match measures may compute distance or dissimilarity, as well as similarity,
closeness and proximity between the input images. Image match measures which
compute dissimilarity between the input images are known as distortion measures
or distance measures, D(·, ·). Commonly used distance measures include the city block
distance measure and the Euclidean distance measure. A comprehensive list of distance
measures may be found in Deza and Deza (2006). For a distance measure to be
a metric, three necessary conditions must be satisfied (Bryant, 1985; Burago et al.,
2001):

1. Non-negativity: for two given images, r and t, each of size m × n pixels: D(r, t) ≥ 0, and D(r, t) = 0 if and only if r = t.

2. Symmetry: the distance from r to t must equal the distance from t to r: D(r, t) = D(t, r).

3. Triangular Inequality: for three given images, r, s and t, each of size m × n
pixels: D(r, t) ≤ D(r, s) + D(s, t).
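As a quick numeric illustration, the three conditions can be checked for the city block (SAD) distance on tiny flattened "images"; the pixel values below are arbitrary:

```python
# Numeric check of the three metric conditions for the city block distance.

def sad(r, t):
    return sum(abs(a - b) for a, b in zip(r, t))

# Three tiny 2x2 images, flattened to tuples:
imgs = [(0, 3, 7, 2), (1, 1, 5, 2), (9, 0, 4, 4)]
r, s, t = imgs
assert sad(r, t) >= 0 and sad(r, r) == 0          # non-negativity
assert sad(r, t) == sad(t, r)                      # symmetry
assert sad(r, t) <= sad(r, s) + sad(s, t)          # triangular inequality
```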
Other image match measures compute similarity or proximity between the two input
images. These image match measures are sometimes referred to as image similarity
measures, S(·, ·). Common examples of image similarity measures include correlation
based measures and mutual information based measures. For an image similarity
measure to be valid, only a subset of the above conditions applies:

1. Maximum similarity: for two given images, r and t, each of size m × n pixels,
S(r, t) should be maximum if r(i, j) = t(i, j), where (i, j) is a pixel location;
that is, the intensity values of the two images match exactly. In some cases,
similarity may also approach its maximum even if pixel intensities are not exactly
equal, but the intensities of both images are related by a perfect relationship.
For example, r(i, j) and t(i, j) may be related by t(i, j) = α + βr(i, j), where α and β are constants.
In this case, the similarity measured by the correlation coefficient will approach its maximum value.

2. Symmetry: similarity should be the same from r to t as from t to r: S(r, t) = S(t, r).
A commonly used image similarity measure, the correlation coefficient, may map two
input images to any point on the real line from -1.00 to +1.00:

\rho : I_1 \times I_2 \rightarrow R, \quad -1.00 \le R \le +1.00, \qquad (2.2)

therefore the similarity score may be positive or negative. The correlation coefficient does
not follow the triangular inequality; however, as we will show later, it follows a different
type of transitive inequality.
Many similarity measures may also be mapped to a distance measure using some
inverse function:

N(S(r, t)) \rightarrow D(r, t), \qquad (2.3)

where N(·) is an inverse mapping function, which maps a similarity score S(r, t) to
a distance score D(r, t). For example, correlation based similarity measures may be
mapped to Euclidean distance as well as angular distance based measures, as we will
discuss later in this chapter.
Besides the classification of image match measures as distance measures or similarity
measures, other classification schemes also exist. For example, image match measures
may also be classified by the fields in which they were originally defined. The City
Block and Euclidean distance measures were originally defined in geometry,
therefore they are sometimes referred to as geometric measures (Cha, 2007). Measures
initially defined in statistics, such as the correlation coefficient, may be called
statistical measures, and those founded in probability theory, such as Mutual Information,
are known as probabilistic measures. Some measures have been defined simultaneously
in more than one field; for example, the correlation coefficient has been defined in
Euclidean geometry as the inner product of two zero mean and unit magnitude vectors,
and in statistics as the covariance of two random variables normalized by their
individual standard deviations. Therefore, in such a classification, one match measure
may lie in more than one class.
Another classification of image match measures is based on the assumed relationship
between the two images to be compared (Roche et al., 1998, 1999, 2000). Some
measures assume brightness constancy for a perfect match, that is, the brightness of a real
world object remains the same in the two images to be compared. Examples of such
measures include the City Block distance measure, the Euclidean distance measure and
cross-correlation. Other measures assume a linear relationship between the two images to be
matched, for example the correlation coefficient, and some assume a non-linear functional
relationship between the images to be compared, for example the correlation ratio. Image
match measures that assume some probabilistic properties or appearance statistics of
an object remain the same are probabilistic similarity measures; these measures
include Entropy, Mutual Information and the Mahalanobis distance.

Some of the most commonly used image match measures, and their relationships with
each other, are discussed in more detail in the following sections.
2.1 City Block Distance Measure
The City Block distance measure is also known as the Manhattan distance, the
L1 norm, and, in the image processing literature, the Sum of Absolute Differences (SAD).
SAD assumes brightness constancy between the images to be compared.
SAD is the most frequently used measure for block motion
estimation in video encoders. The validity of SAD as a motion estimator may be justified
because consecutive video frames are captured by the same sensor at very small time
gaps, so the brightness constancy assumption often remains valid.
Given two image blocks r and t, each of size m × n pixels, SAD is the sum of absolute
differences at all m × n pixel locations:

\Phi(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} \left| r(x, y) - t(x, y) \right|, \qquad (2.4)
where | · | represents the absolute value function. If the brightness of t and r vary by a
multiplicative factor, or change of contrast, both images may be normalized to unit
magnitude before computing SAD:

\Phi_u(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} \left| \frac{r(x, y)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)^2}} - \frac{t(x, y)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2}} \right|, \qquad (2.5)

and if t and r vary by the addition of some constant, or change of brightness, both
images may be normalized to zero mean before computing SAD:

\Phi_z(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} \left| (r(x, y) - \mu_r) - (t(x, y) - \mu_t) \right|, \qquad (2.6)

where \mu_t and \mu_r are the mean intensity values of t and r. Combining both
normalizations results in the Normalized SAD (NSAD):

\Phi_{zu}(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} \left| \frac{r(x, y) - \mu_r}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2}} - \frac{t(x, y) - \mu_t}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}} \right|. \qquad (2.7)
Although a few authors have mentioned the normalized forms of SAD given by Equations
2.5, 2.6 and 2.7, for example (Roma et al., 2000), the use of normalized SAD
as an image match measure is infrequent. In video encoders, where SAD is often used
for block motion estimation, brightness and contrast changes are not expected, while
in image alignment and registration applications, where brightness and contrast variations
are expected, the correlation coefficient has often been used.
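The four SAD variants of Equations 2.4 to 2.7 can be sketched on flattened pixel lists as follows (illustrative, not the thesis implementation); the final assertions demonstrate NSAD's invariance to a combined brightness and contrast change, t = 2r + 5:

```python
import math

def sad(r, t):  # Eq. 2.4
    return sum(abs(a - b) for a, b in zip(r, t))

def sad_unit(r, t):  # Eq. 2.5: unit magnitude normalized
    nr = math.sqrt(sum(a * a for a in r))
    nt = math.sqrt(sum(b * b for b in t))
    return sum(abs(a / nr - b / nt) for a, b in zip(r, t))

def sad_zero_mean(r, t):  # Eq. 2.6: zero mean normalized
    mr, mt = sum(r) / len(r), sum(t) / len(t)
    return sum(abs((a - mr) - (b - mt)) for a, b in zip(r, t))

def sad_normalized(r, t):  # Eq. 2.7: NSAD (zero mean, then unit magnitude)
    mr, mt = sum(r) / len(r), sum(t) / len(t)
    nr = math.sqrt(sum((a - mr) ** 2 for a in r))
    nt = math.sqrt(sum((b - mt) ** 2 for b in t))
    return sum(abs((a - mr) / nr - (b - mt) / nt) for a, b in zip(r, t))

r = [10.0, 20.0, 30.0, 25.0]
t = [2.0 * a + 5.0 for a in r]                 # brightness + contrast change
assert sad(r, t) > 0                           # plain SAD is fooled
assert abs(sad_normalized(r, t)) < 1e-9        # NSAD sees a perfect match
assert sad_zero_mean(r, [a + 5.0 for a in r]) == 0  # Eq. 2.6 cancels brightness
```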
The computational cost of normalization renders the normalized versions of SAD, as given
by Equation 2.7, significantly more expensive than the simple version given by Equation 2.4.
This is because the absolute value function cannot be expanded
or simplified, which results in a higher normalization cost compared to the Euclidean
distance or the correlation coefficient, which may be rearranged to reduce computational
cost. A review of efficient image match measure computation techniques is given in
Chapter 3.
2.2 Euclidean Distance Measure
The Euclidean distance is based on the fact that the shortest distance between two points
in Euclidean space is a straight line, whose length may be computed using the
Pythagorean theorem. The basic concept of Euclidean distance extends seamlessly from two
dimensional Euclidean space to higher dimensional Euclidean spaces.
In order to define the Euclidean distance between two image blocks r and t, both images
must be considered as points in \mathbb{R}^{m \times n}. The Euclidean distance, \Delta(r, t), between r and t
is given by:

\Delta(r, t) = \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} \left( r(x, y) - t(x, y) \right)^2}. \qquad (2.8)
\Delta(r, t) is also known as the Euclidean norm, or L2 norm, of the difference. During image matching, we
search for the minimum value of \Delta(r, t). Since the square root function in Equation 2.8
does not affect the relative order of values, it may be removed to reduce
the computational cost. The resulting measure, which is \Delta^2, is often called the Sum of
Squared Differences (SSD):

SSD = \sum_{x=1}^{m} \sum_{y=1}^{n} \left( r(x, y) - t(x, y) \right)^2. \qquad (2.9)

In image processing, SSD has often been used instead of the Euclidean distance because
of its reduced complexity.
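Because the square root is monotonic, ranking candidate blocks by SSD is identical to ranking them by Euclidean distance, as the following toy example checks (arbitrary 4-pixel blocks, flattened to tuples):

```python
import math

def ssd(r, t):  # Eq. 2.9
    return sum((a - b) ** 2 for a, b in zip(r, t))

def euclid(r, t):  # Eq. 2.8
    return math.sqrt(ssd(r, t))

template = (5, 9, 2, 7)
candidates = [(5, 8, 2, 7), (0, 0, 0, 0), (5, 9, 2, 5), (9, 9, 9, 9)]
by_ssd = sorted(candidates, key=lambda c: ssd(template, c))
by_euclid = sorted(candidates, key=lambda c: euclid(template, c))
assert by_ssd == by_euclid  # identical ranking, cheaper computation
```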
The magnitude of \Delta(r, t) between two points may vary depending on the units of
measurement. In order to make \Delta(r, t) independent of the measurement units, it
may be normalized. In image processing, one image sensor may map the real
world intensities to a wider range of image intensities than another sensor.
In order to cater for the distortion effects produced by such variations, both images
must be normalized to unit magnitude. The unit magnitude normalized Euclidean
distance, \Delta_u(r, t), is given by:

\Delta_u(r, t) = \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} \left( \frac{r(x, y)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)^2}} - \frac{t(x, y)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2}} \right)^2}. \qquad (2.10)
The terms \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)^2} and \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2} are the Euclidean norms of r and
t, and represent the distance of each of the points r and t from the origin. Dividing
each dimension of r and t by this distance from the origin transforms r and
t into points at unit distance from the origin, that is, on the surface of a unit sphere.
Often, the two images to be matched are captured under different lighting conditions; one
image may be overall brighter than the other. In order to cancel the effect of light
intensity variations, the images must be zero mean normalized. The zero mean normalized
Euclidean distance, \Delta_z(r, t), is given by:

\Delta_z(r, t) = \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} \left( (r(x, y) - \mu_r) - (t(x, y) - \mu_t) \right)^2}. \qquad (2.11)
The Euclidean distance upon zero mean and then unit magnitude normalized images is
given by:

\Delta_{zu}(r, t) = \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} \left( \frac{r(x, y) - \mu_r}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2}} - \frac{t(x, y) - \mu_t}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}} \right)^2}. \qquad (2.12)

\Delta_{zu} is also called the Standardized Euclidean Distance. In the following sections, we will
see that image matching by minimization of \Delta_{zu}(r, t) is equivalent to image
matching by maximization of the correlation coefficient.
Figure 2.1: Angular distance measure follows the triangular inequality.
2.3 Minkowski Distance Measure
In the previous sections we have seen the L1 norm and L2 norm distance measures, named
the City Block distance measure and the Euclidean distance measure. In Euclidean space,
higher order norms, such as L3, L4 and so on, are also valid distance measures. In
general, the Lp norm is called the Minkowski distance of order p, for p ≥ 1.

Considering images r and t, each of size m × n pixels, as two points in the m × n
dimensional Euclidean space \mathbb{R}^{m \times n}, the Minkowski distance of order p may be defined
as:

L_p = \left( \sum_{x=1}^{m} \sum_{y=1}^{n} \left| r(x, y) - t(x, y) \right|^p \right)^{1/p}, \quad \text{for } p \ge 1. \qquad (2.13)

Equation 2.13 defines a metric only for p ≥ 1; for p < 1, L_p no longer satisfies the
triangular inequality and so is not a metric. In Equation 2.13, as p → ∞, we obtain the
L∞ norm, called the Chebyshev distance measure:

L_\infty = \lim_{p \to \infty} \left( \sum_{x=1}^{m} \sum_{y=1}^{n} \left| r(x, y) - t(x, y) \right|^p \right)^{1/p}. \qquad (2.14)

The Chebyshev distance between r and t is the greatest of their differences along any
dimension:

L_\infty = \max_{1 \le x \le m,\; 1 \le y \le n} \left| r(x, y) - t(x, y) \right|. \qquad (2.15)
The Chebyshev distance is also known as the Chessboard distance because, in chess, the
minimum number of moves the king needs to go from one square to another equals
the Chebyshev distance between the centers of the squares, assuming that the squares
have unit side length and the coordinate frame is aligned with the board edges. The
Chebyshev distance has also been used in image processing, for example see (Li et al., 2006;
Jedrasiak and Nawrat, 2009).
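A minimal sketch of the Minkowski distance of Equation 2.13 and its limiting Chebyshev form of Equation 2.15, on toy pixel vectors:

```python
def minkowski(r, t, p):  # Eq. 2.13, p >= 1
    return sum(abs(a - b) ** p for a, b in zip(r, t)) ** (1.0 / p)

def chebyshev(r, t):  # Eq. 2.15: greatest per-dimension difference
    return max(abs(a - b) for a, b in zip(r, t))

r, t = (3, 10, 4), (1, 2, 4)
assert abs(minkowski(r, t, 1) - 10) < 1e-9   # city block: 2 + 8 + 0
assert chebyshev(r, t) == 8
assert abs(minkowski(r, t, 50) - 8) < 0.5    # large p approaches Chebyshev
```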
2.4 Angular Distance Measure
The angular distance measure may be considered as the dissimilarity of the directions of
two vectors. If the angle between two vectors is zero, both vectors have the same direction
and therefore maximum positive association. If the angle is 180°, their directions
are exactly opposite, which shows maximum negative association. When
two vectors are orthogonal to each other, there is no association in their directions,
which may be considered as zero similarity.
The angular distance between two points in \mathbb{R}^{m \times n} is the angle between the vectors joining
these points with the origin. The images r and t may be considered as points in \mathbb{R}^{m \times n},
and the angular distance between the vectors joining these points with the origin shows the
dissimilarity between the images. Let πrt be the plane defined by the vectors r and t.
The angular distance, θ(r, t), between r and t is the angle measured in the plane πrt.
There are two possible angles between r and t in the plane πrt, namely θ(r, t) and
360° − θ(r, t). For consistency, and without loss of generality, the smaller
of the two angles may always be chosen as θ(r, t), so that 0° ≤ θ(r, t) ≤ 180°.
The angular distance measure is a valid metric because it satisfies the three conditions
of non-negativity, symmetry and the triangular inequality (Mahmood and Khan, 2007b):

1. Angular distance is non-negative: θ(r, t) ≥ 0, and θ(r, t) = 0 if and only if r = t.

2. Angular distance is symmetric: θ(r, t) = θ(t, r).

3. Angular distance follows the triangular inequality of distance measures. For
three images, r, s, and t: θ(r, t) ≤ θ(r, s) + θ(s, t). A simple proof of this fact
is given in the following paragraph.
Let πrs and πst be the planes uniquely defined by the vectors r, s and s, t, as shown in
Figure 2.1. Let θ(πrs, πst) and 360° − θ(πrs, πst) be the magnitudes of the two angles
between the planes πrs and πst. Without loss of generality, we may always select
the smaller of these two angles as θ(πrs, πst), which is bounded between 0° and 180°:
0° ≤ θ(πrs, πst) ≤ 180°.

The value of θ(r, t) depends on θ(πrs, πst). If the magnitude of θ(πrs, πst) is 180°, then
θ(r, t) is equal to θ(r, s) + θ(s, t). For all values of θ(πrs, πst) < 180°, the magnitude
of θ(r, t) remains less than θ(r, s) + θ(s, t). Therefore

\theta(r, t) \le \theta(r, s) + \theta(s, t), \qquad (2.16)

which shows that the angular distance follows the triangular inequality of distance
measures. □
If r · t represents the inner product of the two vectors, and ||r||₂ and ||t||₂ are the magnitudes
of the vectors r and t, then using the definition of the inner product, the angular
distance θ(r, t) may be given by:

\theta(r, t) = \cos^{-1}\left( \frac{r \cdot t}{\|r\|_2 \|t\|_2} \right), \qquad (2.17)

and the angular distance between two unit magnitude normalized vectors is:

\theta_u(r, t) = \cos^{-1}\left( \frac{r}{\|r\|_2} \cdot \frac{t}{\|t\|_2} \right). \qquad (2.18)

Magnitude normalization of a vector only changes the length of the vector while the
direction remains unchanged. Therefore the angular distance between two unit length
normalized vectors remains the same as for their un-normalized versions: θ(r, t) = θu(r, t).
The angular distance between zero mean normalized vectors is given by:

\theta_z(r, t) = \cos^{-1}\left( \frac{(r - \mu_r) \cdot (t - \mu_t)}{\|r - \mu_r\|_2 \|t - \mu_t\|_2} \right). \qquad (2.19)
Subtracting the mean from each dimension of a vector is equivalent to translating the end
point of the vector. Since both vectors are translated in different directions, the angular
distance between them may change as a result of zero mean normalization. In general,
θz(r, t) ≠ θ(r, t) and θz(r, t) ≠ θu(r, t). The angular distance between zero mean and unit
magnitude normalized images is given by:

\theta_{zu}(r, t) = \cos^{-1}\left( \frac{r - \mu_r}{\|r - \mu_r\|_2} \cdot \frac{t - \mu_t}{\|t - \mu_t\|_2} \right). \qquad (2.20)

Since length normalization does not change the angle, θz(r, t) = θzu(r, t).
θzu(r, t) may also be called the Standardized Angular Distance. The angular distance
measure is directly related to correlation based similarity measures and also to
Euclidean distance measures.
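These properties can be checked numerically on flat toy vectors: unit normalization leaves the angle unchanged (θ = θu), while mean subtraction generally changes it (θz ≠ θ):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def theta(r, t):  # Eq. 2.17
    return math.acos(dot(r, t) / (norm(r) * norm(t)))

def theta_z(r, t):  # Eq. 2.19: zero mean normalized angular distance
    mr, mt = sum(r) / len(r), sum(t) / len(t)
    return theta([x - mr for x in r], [x - mt for x in t])

r, t = [2.0, 4.0, 6.0], [1.0, 5.0, 3.0]
r_u = [x / norm(r) for x in r]   # unit magnitude normalization
t_u = [x / norm(t) for x in t]
assert abs(theta(r, t) - theta(r_u, t_u)) < 1e-9   # theta == theta_u
assert abs(theta(r, t) - theta_z(r, t)) > 1e-3     # theta_z generally differs
```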
2.4.1 Relationship between Standardized Angular Distance and Standardized Euclidean Distance
Considering two points on the surface of a unit sphere, the standardized Euclidean
distance is the length of the straight line joining them, whereas the standardized angular
distance is the angle between the vectors joining those points with the center of the
sphere, which is at the origin. The standardized Euclidean distance will be zero if both
points lie at the same position, and it will assume its maximum value of 2 if the two
points lie exactly opposite to each other along a diameter. For the minimum standardized
Euclidean distance, the standardized angular distance will also be at its minimum of 0°,
and for the case of maximum standardized Euclidean distance, the standardized angular
distance will also be at its maximum of 180°. The function relating the standardized Euclidean
distance to the standardized angular distance may be derived from Equation 2.20:

\cos\{\theta_{zu}(r, t)\} = \frac{r - \mu_r}{\|r - \mu_r\|_2} \cdot \frac{t - \mu_t}{\|t - \mu_t\|_2}, \qquad (2.21)
and from Equation 2.12, squaring both sides and simplifying, we get:

\Delta_{zu}^2(r, t) = 2 - 2 \sum_{x=1}^{m} \sum_{y=1}^{n} \frac{(r(x, y) - \mu_r)(t(x, y) - \mu_t)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2} \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}}. \qquad (2.22)

Substituting the value of \cos\theta_{zu}(r, t), the relating function is given by:

\Delta_{zu}(r, t) = \sqrt{2(1 - \cos\{\theta_{zu}(r, t)\})}, \qquad (2.23)

which shows that if θzu(r, t) = 0°, then ∆zu(r, t) = 0, and if θzu(r, t) = 180°, then ∆zu(r, t) = 2.
2.5 Correlation Based Similarity Measures
In signal processing, cross-correlation, or the closely related method known as the Matched
Spatial Filter (MSF), has often been used to search for a short duration signal within a
longer one. The main reason for using cross-correlation for signal detection is that,
in the presence of white Gaussian noise, it is the optimal linear
operator for signal detection (Turin, 1960). Cross-correlation is computed by taking the
inner product of two one dimensional signals. The image blocks r and t may also be
considered as two dimensional signals; the cross-correlation between them is given
by their inner product:

\psi(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y). \qquad (2.24)
In image processing, cross-correlation has often been used in its normalized form
to remove its bias towards brighter regions. The Normalized Cross-Correlation (NCC)
between image blocks r and t is often defined as:

\psi_u(r, t) = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r^2(x, y)} \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} t^2(x, y)}}. \qquad (2.25)
NCC is robust to contrast variations, but it is not robust to brightness variations.
A more robust measure, invariant to any linear change in the signal, is the correlation
coefficient, ρ, which is the cross-correlation between zero mean and unit magnitude
normalized images:

\rho(r, t) = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)(t(x, y) - \mu_t)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2} \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}}, \qquad (2.26)
where µr and µt are the means of r and t respectively. The formulation of the correlation
coefficient may also be viewed as the covariance normalized by the individual standard deviations:

\rho(r, t) = \frac{\sigma_{rt}^2}{\sigma_r \sigma_t}. \qquad (2.27)

A rearrangement of Equation 2.26 is given by:

\rho(r, t) = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)(t(x, y) - \mu_t)}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2} \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}}. \qquad (2.28)

Further rearrangement yields the computationally efficient form:

\rho(r, t) = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y) - mn\,\mu_r \mu_t}{\sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r^2(x, y) - mn\,\mu_r^2} \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} t^2(x, y) - mn\,\mu_t^2}}. \qquad (2.29)

The formulations given by Equations 2.26, 2.27, 2.28 and 2.29 are equivalent and
yield the same value of the correlation coefficient. However, they may vary in their
computational complexity, a topic discussed in detail in the next chapter.
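A quick numeric check that the definition of Equation 2.26 and the efficient form of Equation 2.29 agree, on a toy block pair (flattened; mn = 6 pixels):

```python
import math

def rho_definition(r, t):  # Eq. 2.26
    mr, mt = sum(r) / len(r), sum(t) / len(t)
    num = sum((a - mr) * (b - mt) for a, b in zip(r, t))
    den = math.sqrt(sum((a - mr) ** 2 for a in r)) * \
          math.sqrt(sum((b - mt) ** 2 for b in t))
    return num / den

def rho_efficient(r, t):  # Eq. 2.29: one pass over raw sums
    n = len(r)
    mr, mt = sum(r) / n, sum(t) / n
    num = sum(a * b for a, b in zip(r, t)) - n * mr * mt
    den = math.sqrt(sum(a * a for a in r) - n * mr * mr) * \
          math.sqrt(sum(b * b for b in t) - n * mt * mt)
    return num / den

r = [3.0, 8.0, 1.0, 6.0, 4.0, 9.0]
t = [2.0, 7.0, 2.5, 5.0, 3.0, 8.0]
assert abs(rho_definition(r, t) - rho_efficient(r, t)) < 1e-9
```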
2.5.1 Relationship between Correlation and Angular Distance
Measure
Correlation based measures are inversely related to angular distance measures.
The inverse relationship is realized by the cosine function, which maps a
larger angle to a smaller value and a smaller angle to a larger value within the 0° to
180° range. The maximum value of the cosine function is 1.00, for an angular distance of 0°,
and its minimum value is -1.00, for a distance of 180°. For a distance of 90°, the value of
the cosine function is 0.00.

The relationship between the unit normalized angular distance, θu, and the normalized
cross-correlation, ψu, may be written by comparing Equations 2.18 and 2.25:

\theta_u(r, t) = \cos^{-1}(\psi_u(r, t)). \qquad (2.30)

Similarly, the zero mean and unit variance normalized angular distance, θzu, may also be
related to the correlation coefficient ρ by using Equations 2.20 and 2.26:

\theta_{zu}(r, t) = \cos^{-1}(\rho(r, t)). \qquad (2.31)
Equation 2.31 may be simplified by using the relationship between cos⁻¹ and sin⁻¹:

\cos^{-1}(\rho(r, t)) = \frac{\pi}{2} - \sin^{-1}(\rho(r, t)), \qquad (2.32)

and expanding sin⁻¹(ρ(r, t)) using the Maclaurin series:

\cos^{-1}(\rho(r, t)) = \frac{\pi}{2} - \rho(r, t) - \frac{1}{6}\rho^3(r, t) - \frac{3}{40}\rho^5(r, t) - \frac{5}{112}\rho^7(r, t) - \cdots \qquad (2.33)

For small magnitudes, higher powers of ρ result in significantly smaller values.
Moreover, the coefficients of the higher powers of ρ are significantly small, therefore
the ρ⁵(r, t) and higher power terms may be ignored without causing a significant difference
in the value of the estimated angular distance. The relationship between θzu and ρ
may then be written as:

\theta_{zu}(r, t) \approx \frac{\pi}{2} - \rho(r, t)\left(1 + \frac{1}{6}\rho^2(r, t)\right). \qquad (2.34)
Using this relationship, for a given value of the correlation coefficient, we may estimate
the corresponding angular distance in radians.
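The accuracy of the truncated series in Equation 2.34 can be checked against the exact θzu = cos⁻¹(ρ) for a few correlation values:

```python
import math

def theta_approx(rho):  # Eq. 2.34: truncated series estimate of theta_zu
    return math.pi / 2 - rho * (1 + rho * rho / 6)

# The error stays small for moderate |rho|, growing with the magnitude:
for rho in (-0.5, -0.2, 0.0, 0.3, 0.6):
    assert abs(theta_approx(rho) - math.acos(rho)) < 0.02
```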
2.5.2 Relationship between Correlation and Euclidean Distance Measure
Cross-correlation is also inversely related to the Euclidean distance measure. To obtain
the relationship between ∆ and ψ, we expand Equation 2.8:

\Delta(r, t) = \sqrt{\sum_{x=1}^{m} \sum_{y=1}^{n} r^2(x, y) + \sum_{x=1}^{m} \sum_{y=1}^{n} t^2(x, y) - 2 \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y)}. \qquad (2.35)
Substituting the value of ψ from Equation 2.24 and writing the magnitude of
each image as its Euclidean norm, Equation 2.35 simplifies to the following form:

\Delta(r, t) = \sqrt{\|r\|_2^2 + \|t\|_2^2 - 2\psi(r, t)}, \qquad (2.36)

which gives the relationship between cross-correlation and the Euclidean distance measure.
We may extend this relationship to the case of normalized cross-correlation
(Equation 2.25) and the unit normalized Euclidean distance, ∆u (Equation 2.10). For
unit magnitude normalized images, the Euclidean norms become 1.00:

\|r\|_2 = \|t\|_2 = 1.00, \qquad (2.37)

therefore, from Equation 2.36:

\Delta_u(r, t) = \sqrt{2(1 - \psi_u(r, t))}. \qquad (2.38)
Equation 2.38 relates NCC with the unit normalized Euclidean distance.

The standardized Euclidean distance, ∆zu, as given by Equation 2.12, may also be related
to the correlation coefficient given by Equation 2.26, by using Equation 2.38:

\Delta_{zu}(r, t) = \sqrt{2(1 - \rho(r, t))}, \qquad (2.39)

or, simply rearranging Equation 2.39:

\rho(r, t) = 1 - \frac{1}{2}\Delta_{zu}^2(r, t), \qquad (2.40)

which gives an alternate understanding of the correlation coefficient (Rodgers and Nicewander, 1988).
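A numeric check of Equation 2.40 on toy blocks, with the standardization (zero mean, unit magnitude) performed explicitly:

```python
import math

def standardize(v):  # zero mean, then unit magnitude
    mu = sum(v) / len(v)
    mag = math.sqrt(sum((x - mu) ** 2 for x in v))
    return [(x - mu) / mag for x in v]

def rho(r, t):  # inner product of standardized images (Eq. 2.26)
    return sum(a * b for a, b in zip(standardize(r), standardize(t)))

def delta_zu(r, t):  # standardized Euclidean distance (Eq. 2.12)
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(standardize(r), standardize(t))))

r = [4.0, 9.0, 2.0, 7.0]
t = [3.0, 8.0, 4.0, 6.0]
assert abs(rho(r, t) - (1 - 0.5 * delta_zu(r, t) ** 2)) < 1e-9  # Eq. 2.40
```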
2.5.3 Correlation Coefficient as a Measure of Strength of Linear Relationship
In statistical analysis, the correlation coefficient has often been used to estimate the
strength of the linear relationship between two random variables (Harnett, 1982; Snedecor
and Cochran, 1968; Montgomery and Peck, 1982; Spigel and Stephens, 1990). This
understanding may be extended to image processing, to investigate the image
matching capabilities of the correlation coefficient. We may regard the two image blocks
to be matched, r and t, as two linearly associated random variables. We
may further assume that t is the independent random variable and r is the dependent
random variable. Since r and t are linearly associated, we may estimate r from a
given value of t:

\hat{r}(x, y) = \alpha_{rt} + \beta_{rt}\, t(x, y), \qquad (2.41)

where \hat{r}(x, y) is the estimate of r(x, y), \alpha_{rt} is the y-intercept and \beta_{rt} is the slope of the
regression line between r and t.

The regression analysis used to derive the correlation coefficient formulation is based on
three commonly used estimation error terms: the Sum of Squared Error (SSE), the Sum of
Squared Total Deviation (SSTD) and the Sum of Squared Regression (SSR). The first
term, SSE, is the sum of the squared estimation error over all pixels of r:

SSE = \sum_{x=1}^{m} \sum_{y=1}^{n} (\hat{r}(x, y) - r(x, y))^2 \qquad (2.42)
    = \sum_{x=1}^{m} \sum_{y=1}^{n} (\alpha_{rt} + \beta_{rt}\, t(x, y) - r(x, y))^2. \qquad (2.43)
The second term, SSTD, is the sum of the squared total deviation of r(x, y) from its
mean \mu_r, over all pixels of r:

SSTD = \sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)^2 \qquad (2.44)
     = mn\,\sigma_r^2. \qquad (2.45)

The third term, SSR, is the sum of the squared differences between the estimated values of r
and the mean of r, over all pixels:

SSR = \sum_{x=1}^{m} \sum_{y=1}^{n} (\hat{r}(x, y) - \mu_r)^2 \qquad (2.46)
    = \sum_{x=1}^{m} \sum_{y=1}^{n} (\alpha_{rt} + \beta_{rt}\, t(x, y) - \mu_r)^2. \qquad (2.47)
The relationship between these three error terms is given by:

SSTD = SSE + SSR. \qquad (2.48)

The proof of this relationship is nontrivial and is presented at the end of the current
section.

Using the three estimation error terms, the coefficient of correlation has been defined as
the square root of the SSR to SSTD ratio (Montgomery and Peck, 1982):

\rho = \pm\sqrt{\frac{SSR}{SSTD}}. \qquad (2.49)

By using Equation 2.48:

\rho = \pm\sqrt{\frac{SSR}{SSE + SSR}}. \qquad (2.50)
This definition may be elaborated by considering a perfect linear relationship between
r and t, that is, zero estimation error: \hat{r}(x, y) - r(x, y) = 0, or SSE = 0.
Then, from Equation 2.48, SSTD = SSR, which means the correlation coefficient will
evaluate to ±1. On the other hand, if the random variables r and t are independent
of each other, the slope of the regression line will be zero: \beta_{rt} = 0. In this case, the
y-intercept turns out to be equal to the mean of r: \alpha_{rt} = \mu_r, which results in
\hat{r}(x, y) - \mu_r = 0. Therefore, in this case, SSR = 0 and the total deviation of r is due to
the estimation error: SSTD = SSE. The correlation coefficient will then evaluate
to zero.
In the following subsections, we will first derive the expressions for the optimal linear
regression parameters, \alpha_{rt} and \beta_{rt}, and then use these parameters to derive
the formulation of the correlation coefficient. In this formulation, the correlation coefficient
is defined as the ratio of the covariance of r and t to the product of the individual standard deviations
of r and t:

\rho_{rt} = \frac{\sigma_{rt}^2}{\sigma_r \sigma_t}. \qquad (2.51)

At the end of this subsection, we will present a proof of the fact that SSTD = SSE + SSR.
Derivation of Optimal Linear Regression Parameters
For a given pair of random variables, r and t, the optimal regression line parameters
\alpha_{rt} and \beta_{rt} may be defined as those which minimize the sum of squared estimation error
given by Equation 2.43. These optimal parameters may be computed by taking the partial
derivatives of SSE with respect to \alpha_{rt} and with respect to \beta_{rt}:

\frac{\partial}{\partial \alpha_{rt}} SSE = 2 \sum_{x=1}^{m} \sum_{y=1}^{n} (\alpha_{rt} + \beta_{rt}\, t(x, y) - r(x, y)), \qquad (2.52)

and

\frac{\partial}{\partial \beta_{rt}} SSE = 2 \sum_{x=1}^{m} \sum_{y=1}^{n} \left[ (\alpha_{rt} + \beta_{rt}\, t(x, y) - r(x, y))\, t(x, y) \right]. \qquad (2.53)

In order to minimize the error, both of the partial derivatives given by Equations 2.52
and 2.53 must be set to zero:

mn\,\alpha_{rt} + \beta_{rt} \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y) = \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y) \qquad (2.54)
and

\alpha_{rt} \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y) + \beta_{rt} \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2 = \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y). \qquad (2.55)

Equations 2.54 and 2.55 are also known as the normal equations (Montgomery and Peck,
1982). We may solve these equations simultaneously to get a closed form solution
for the optimal parameters \alpha_{rt} and \beta_{rt}. Equation 2.54 may be simplified by dividing
both sides by the total number of pixels in each image, mn:
αrt + βrtµt = µr. (2.56)
Substituting the value of \alpha_{rt} from Equation 2.56 into Equation 2.55:

(\mu_r - \beta_{rt}\mu_t) \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y) + \beta_{rt} \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2 = \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y). \qquad (2.57)
Rearranging the terms, we get:

\beta_{rt} = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y)\, t(x, y) - \mu_r \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)}{\sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)^2 - \mu_t \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y)}, \qquad (2.58)

which may be further simplified to the following form:

\beta_{rt} = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} (r(x, y) - \mu_r)(t(x, y) - \mu_t)}{\sum_{x=1}^{m} \sum_{y=1}^{n} (t(x, y) - \mu_t)^2}. \qquad (2.59)
In terms of variance and covariance, \beta_{rt} may be written as:

\beta_{rt} = \frac{\sigma_{rt}^2}{\sigma_t^2}, \qquad (2.60)
where σ²rt is the covariance of the random variables r and t, and σ²t is the variance of
the independent random variable t. Equation 2.60 shows that the slope of the regression
line between the random variables r and t is given by the ratio of the covariance, σ²rt, to
the variance of the independent random variable, σ²t. If the two random variables are
independent, their covariance will be zero and hence the slope of the regression line
will also be zero.

Figure 2.2: Relationships of the zero mean and unit variance normalized Euclidean distance ∆zu, the zero mean and unit variance normalized angular distance θzu, the correlation ratio η, and mutual information I with the correlation coefficient ρ.
The value of the second regression parameter, α_rt, may be found by substituting the
value of the slope from Equation 2.60 into Equation 2.56:

α_rt = µ_r − µ_t σ_rt^2 / σ_t^2. (2.61)

Equation 2.61 shows that the y-intercept of the regression line is given by µ_r − β_rt µ_t. The
value of β_rt as given by Equation 2.60 and the value of α_rt as given by Equation
2.61 minimize the sum of squared estimation error (SSE) for a given pair of random
variables r and t. Note that if r and t are independent random variables, the slope
of the regression line becomes zero, β_rt = 0, and the y-intercept becomes equal
to the mean of the dependent variable, α_rt = µ_r. In this case, the correlation
coefficient between r and t will also become zero.
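The closed-form parameters of Equations 2.60 and 2.61 may be checked numerically. The following Python sketch is illustrative only; the synthetic images, the noise level and the variable names are our own choices, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.random((8, 8))                                   # template image (independent variable)
r = 2.0 * t + 3.0 + 0.01 * rng.standard_normal((8, 8))   # r is nearly linear in t

# Closed-form MMSE parameters: beta = cov(r, t)/var(t) (Equation 2.60),
# alpha = mu_r - beta*mu_t (Equation 2.61)
cov_rt = np.mean((r - r.mean()) * (t - t.mean()))
beta = cov_rt / t.var()
alpha = r.mean() - beta * t.mean()

# Cross-check against a generic least-squares line fit over the flattened pixels
b_ls, a_ls = np.polyfit(t.ravel(), r.ravel(), 1)
assert np.allclose([beta, alpha], [b_ls, a_ls])
print(alpha, beta)   # close to 3.0 and 2.0
```

Both routes recover the planted slope and intercept, because `np.polyfit` with degree one minimizes exactly the SSE criterion of Equations 2.52 and 2.53.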
Deriving Correlation Coefficient Formulation
The estimation error terms SSTD and SSR may be simplified under the assumption
of a linear association between r and t. In the last subsection, the Minimum Mean Squared
Error (MMSE) linear regression parameters were derived. By using Equations 2.60
and 2.61, the formulation of SSR given by Equation 2.47 may be further simplified
as follows:
SSR = Σ_{x=1}^{m} Σ_{y=1}^{n} (µ_r − µ_t σ_rt^2/σ_t^2 + (σ_rt^2/σ_t^2) t(x,y) − µ_r)^2 (2.62)
    = (σ_rt^4/σ_t^4) Σ_{x=1}^{m} Σ_{y=1}^{n} (t(x,y) − µ_t)^2 (2.63)
    = mn σ_rt^4/σ_t^2. (2.64)

Substituting the value of SSTD from Equation 2.45 and the value of SSR from Equation 2.64 in the correlation coefficient definition given by Equation 2.49,

ρ_rt = √( mn σ_rt^4 / (mn σ_r^2 σ_t^2) ), (2.65)

which simplifies to

ρ_rt = σ_rt^2 / (σ_r σ_t). (2.66)
The formulation of correlation coefficient as given by Equation 2.66 is its basic
formulation. All other formulations of correlation coefficient
found in the image processing literature may be derived from this basic formulation.

The formulation of the linear regression parameters used to simplify the SSR term
is based on minimization of the least-squares error criterion. However, in the presence of
outliers in the data, a least-squares fit may not be the best way to estimate these
parameters. Therefore, estimation of linear association by using correlation coefficient
may suffer in the presence of outliers, such as salt-and-pepper noise. In such cases,
an appropriate noise-removal procedure, such as median filtering, is recommended
before starting the image matching process.
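As a concrete illustration, Equation 2.66 may be evaluated directly from pixel statistics. The short Python sketch below uses synthetic images and a function name of our own choosing, and compares the result against a library implementation:

```python
import numpy as np

def correlation_coefficient(r, t):
    """rho = cov(r, t) / (sigma_r * sigma_t), as in Equation 2.66."""
    r = r.astype(float).ravel()
    t = t.astype(float).ravel()
    cov_rt = np.mean((r - r.mean()) * (t - t.mean()))
    return cov_rt / (r.std() * t.std())

rng = np.random.default_rng(1)
t = rng.random((16, 16))
r = -0.5 * t + 4.0                      # perfectly linear relationship
print(correlation_coefficient(r, t))    # -1.0 up to floating point error
assert np.isclose(correlation_coefficient(r, t),
                  np.corrcoef(r.ravel(), t.ravel())[0, 1])
```

A perfectly linear relationship with negative slope yields ρ = −1, the extreme of the measure's range.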
Proving that SSTD = SSE + SSR

We have used this relationship in the derivation of the correlation coefficient formulation
without actually proving it. In this subsection, we will present a proof of this
relationship. Expanding the SSTD definition given by Equation 2.44:
SSTD = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y) + r̂(x,y) − µ_r)^2
     = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y))^2 + Σ_{x=1}^{m} Σ_{y=1}^{n} (r̂(x,y) − µ_r)^2
     + 2 Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y))(r̂(x,y) − µ_r).

By using the definitions of SSE and SSR:

SSTD = SSE + SSR + 2 SPT, (2.67)

where the Sum of Product Terms (SPT) is given by

SPT = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y))(r̂(x,y) − µ_r). (2.68)

The desired relationship may be proved if the SPT term is equal to zero. For this purpose,
we now expand the SPT term:

SPT = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y))(α_rt + β_rt t(x,y) − µ_r). (2.69)
Expanding and taking constant terms out of the summations:

SPT = α_rt Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y)) + β_rt Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y)) t(x,y) − µ_r Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y)). (2.70)
From Equation 2.52, we get:

Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y)) = 0. (2.71)

Therefore the first and third terms in Equation 2.70 become zero. From Equation 2.53:

Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − α_rt − β_rt t(x,y)) t(x,y) = 0, (2.72)

which makes the second term in Equation 2.70 zero as well, and proves that the full expression of SPT is zero:

SPT = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y))(r̂(x,y) − µ_r) = 0, (2.73)

which completes the proof that SSTD is the sum of the SSE and SSR terms:

SSTD = SSE + SSR. (2.74)
Thus we may conclude that the similarity score produced by correlation coefficient
is actually an estimate of the strength of the linear relationship between the two images
to be matched. Therefore, a perfect score will only be produced if the relationship
between the two images is perfectly linear. As the image-to-image relationship
deviates from linearity, correlation coefficient may no longer remain the best measure
of the strength of association. In such cases, other measures, such as correlation ratio,
may be used as the preferred similarity measure.
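The decomposition just proved, and its link to ρ through Equation 2.49, can be verified numerically. The following sketch uses synthetic data of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.random((10, 10))                     # template image
r = 1.5 * t + rng.standard_normal((10, 10))  # reference block, noisy linear relation

# MMSE linear regression of r on t (Equations 2.60 and 2.61)
beta = np.mean((r - r.mean()) * (t - t.mean())) / t.var()
alpha = r.mean() - beta * t.mean()
r_hat = alpha + beta * t

sse = np.sum((r - r_hat) ** 2)          # Sum of Squared Estimation Error
ssr = np.sum((r_hat - r.mean()) ** 2)   # Sum of Squared Regression
sstd = np.sum((r - r.mean()) ** 2)      # Sum of Squared Total Deviation

assert np.isclose(sstd, sse + ssr)      # Equation 2.74
# rho^2 = SSR / SSTD (Equation 2.49)
assert np.isclose(ssr / sstd, np.corrcoef(r.ravel(), t.ravel())[0, 1] ** 2)
```

The cross terms cancel for the MMSE parameters, which is exactly why SPT vanishes in the proof above.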
2.6 Correlation Ratio

As we have discussed in the last section, the similarity score produced by correlation
coefficient represents the strength of linear association between two images. If
the association between two images is not linear, then the score produced by correlation
coefficient may actually be far less than the actual strength of association. For
example, consider that the association between two signals is given by the cosine
function, a(x) = cos(b(x)), where x is the sample index and there are n samples:
1 ≤ x ≤ n. Considering values of b(x) from 0 to 2π at discrete intervals, the corresponding
values of a(x) = cos(b(x)) vary from +1 to −1, and the means of the two signals
are µ_a = 0, µ_b = π. From Equation 2.26:
ρ(a, b) = Σ_{x=1}^{n} cos(b(x)) (b(x) − π) / [ √(Σ_{x=1}^{n} cos^2(b(x))) √(Σ_{x=1}^{n} (b(x) − π)^2) ] = 0, (2.75)
which shows that there is no linear relationship between the signals. We may also
observe that the standardized Euclidean distance Δ_zu and the angular distance θ_zu
measure the strength of linear relationship as well. For the case of the functional
relationship a(x) = cos(b(x)), the standardized Euclidean distance given by Equation 2.12
takes the value corresponding to zero correlation:

Δ_zu(a, b) = √2 = 1.414, (2.76)

while the angular distance θ_zu(a, b) is 90°, which means the two vectors are found to be
orthogonal to each other and no similarity in the direction of the two vectors has been
found. In general, any functional association whose area under the curve is zero
will yield ρ(a, b) = 0.00, Δ_zu(a, b) = √2 and θ_zu(a, b) = 90°, despite the existence
of a perfect functional relationship (Rietz, 1919). Thus angular distance, Euclidean
distance and correlation based measures expect a linear association between the two
images to be matched. In the case of a strong non-linear functional relationship, all of
these measures fail to yield the correct similarity score.
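The cosine example can be reproduced numerically. The sketch below samples the two signals and evaluates the three measures; the sample count is an arbitrary choice of ours:

```python
import numpy as np

n = 1000
b = np.linspace(0.0, 2.0 * np.pi, n)    # b(x) sampled from 0 to 2*pi
a = np.cos(b)                            # perfect functional relation a = cos(b)

# correlation coefficient (Equation 2.26)
rho = np.mean((a - a.mean()) * (b - b.mean())) / (a.std() * b.std())

# zero mean, unit variance normalized vectors
az = (a - a.mean()) / a.std()
bz = (b - b.mean()) / b.std()
delta_zu = np.sqrt(np.mean((az - bz) ** 2))            # standardized Euclidean distance
theta_zu = np.degrees(np.arccos(np.clip(rho, -1, 1)))  # angular distance

print(round(abs(rho), 2), round(delta_zu, 2), round(theta_zu, 0))
```

Despite the perfect functional dependence, ρ comes out essentially zero, Δ_zu ≈ √2 and θ_zu ≈ 90°, matching Equations 2.75 and 2.76.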
If similarity between two images is defined as the strength of association between
them, then for the case of functional associations, correlation ratio (η) is a preferred
similarity measure (Roche et al., 1998, 1999, 2000). Originally, correlation ratio was
used as a tool of variance analysis (Fisher, 1925). It is a measure of the association
between the dispersion within individual categories and the dispersion across the
whole sample of observations. Suppose a random variable r has a set of n observations,
with variance σ_r^2 and mean µ_r. Suppose the set of observations may be divided
into k categories. Let µ_i, i ∈ {1, 2, ..., k}, be the mean of each category. The variance
of the category means is given by:

σ_µ^2 = Σ_{i=1}^{k} n_i (µ_i − µ_r)^2 / n, (2.77)

where n_i is the number of observations in the ith category. Correlation ratio may be
defined as the ratio of the standard deviation of the k category means to the overall
standard deviation of the n observations (Rietz, 1919):

η^2(r|t) = σ_µ^2 / σ_r^2, (2.78)

and alternatively, in the notation of expectation, we may write (Roche et al., 2000):

η^2(r|t) = Var(E[r|k]) / Var(r), (2.79)

where Var(·) is variance and E[r|k] is the conditional expectation that represents the
category means.
In order to apply correlation ratio to image matching, we may assume that each image
pixel may contain any of k total intensity levels. The image r may be considered
as one set of observations and the image t as another set of observations. We may
divide r into k categories depending on the corresponding values in t. For example,
all pixel locations in r(x, y) which correspond to a category value of t(x, y) = 0 form one
category in r. Similarly, all pixel positions in r(x, y) which correspond to the category
value t(x, y) = 1 form the second category of r. Since the pixel values in a log2(k)-bit
image t may vary from 0 to k − 1, there may be k possible categories in r. The category
mean is given by:

µ_i(r|t) = Σ_{x,y : t(x,y)=i} r(x,y) / n_i, i ∈ {0, 1, 2, ..., k − 1}, (2.80)

where n_i = count(t(x,y) = i), and correlation ratio is given by

η^2(r|t) = Σ_{i=0}^{k−1} n_i (µ_i(r|t) − µ_r)^2 / Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − µ_r)^2. (2.81)

Similarly, we may also compute η(t|r), which requires categories to be made in t based
on the corresponding values from r, with r(x,y) = i and n_i = count(r(x,y) = i):

µ_i(t|r) = Σ_{x,y : r(x,y)=i} t(x,y) / n_i, i ∈ {0, 1, 2, ..., k − 1}, (2.82)

and

η^2(t|r) = Σ_{i=0}^{k−1} n_i (µ_i(t|r) − µ_t)^2 / Σ_{x=1}^{m} Σ_{y=1}^{n} (t(x,y) − µ_t)^2. (2.83)
Since the strength of the functional regression from r to t may be quite different from
that of the regression from t to r, correlation ratio is not a
symmetric similarity measure; that is, in general η(t|r) ≠ η(r|t). In the following
subsection, we will show how the definition of correlation ratio as given by Equations
2.78 and 2.81 is related to the strength of functional association.
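Equations 2.80 and 2.81 translate almost directly into code. The following sketch (function name, synthetic images and the particular non-linear mapping are our own illustrative choices) shows η reaching its perfect score where ρ does not:

```python
import numpy as np

def correlation_ratio(r, t, k=256):
    """eta(r|t): category means of r are taken over equal-intensity pixels of t
    (Equations 2.80 and 2.81). Images are integer-valued with k levels."""
    r = r.astype(float).ravel()
    t = t.ravel()
    mu_r = r.mean()
    num = 0.0
    for i in range(k):
        mask = (t == i)
        n_i = mask.sum()
        if n_i > 0:
            num += n_i * (r[mask].mean() - mu_r) ** 2
    sstd = np.sum((r - mu_r) ** 2)
    return np.sqrt(num / sstd)

rng = np.random.default_rng(3)
t = rng.integers(0, 256, size=(32, 32))
r = (t ** 2) % 256              # deterministic but strongly non-linear function of t
eta = correlation_ratio(r, t)
rho = np.corrcoef(r.ravel().astype(float), t.ravel().astype(float))[0, 1]
print(round(eta, 3), round(abs(rho), 3))   # eta is 1.0; |rho| is near zero
```

Because each t-category maps to a single r value, η(r|t) is exactly one, while the correlation coefficient reports almost no association.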
2.6.1 Derivation of Correlation Ratio Formulation from Functional Regression

Correlation ratio may be considered as a generalization of correlation coefficient,
because correlation ratio measures the strength of functional association, while
correlation coefficient measures only the strength of linear association.

Assume t to be the independent random variable and r to be the dependent random
variable, and assume the relationship between r(x, y) and t(x, y) to be a deterministic
function f(·). For a given value of t(x, y), we may estimate the value of r(x, y)
by using the function f(·):

r̂(x, y) = f(t(x, y)), (2.84)

where r̂(x, y) is the estimate of r(x, y). The estimation error terms, Sum of Squared
Estimation Error (SSE), Sum of Squared Total Deviation (SSTD), and Sum of
Squared Regression (SSR), may also be defined for the case of functional regression.
Since we have to estimate values of r from given values of t, the direction of regression
may be represented in the error terms by using the notation SSE(r|t) and SSR(r|t).
Since SSTD has no dependence on t, it will be denoted by SSTD(r).
SSE(r|t) may now be defined as

SSE(r|t) = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − r̂(x,y))^2 (2.85)
         = Σ_{x=1}^{m} Σ_{y=1}^{n} (f(t(x,y)) − r(x,y))^2. (2.86)

The term SSTD(r) is the sum of squared total deviation of r(x, y) from its mean µ_r,
over all pixels of r:

SSTD(r) = Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − µ_r)^2 (2.87)
        = mn σ_r^2. (2.88)

The term SSR(r|t) is the sum of squared differences between the estimated values of r
and the mean of r, over all pixels:

SSR(r|t) = Σ_{x=1}^{m} Σ_{y=1}^{n} (r̂(x,y) − µ_r)^2 (2.89)
         = Σ_{x=1}^{m} Σ_{y=1}^{n} (f(t(x,y)) − µ_r)^2. (2.90)

If each of the images r and t has log2(k) bits per pixel, the number of
discrete intensity levels any pixel in these images may take is k. In the template
image t, suppose each intensity level i occurs n_i times. The SSR(r|t) formulation given
by Equation 2.90 may be written in terms of intensity levels as follows:

SSR(r|t) = Σ_{i=0}^{k−1} n_i (f(i) − µ_r)^2. (2.91)
Since correlation ratio measures the strength of functional regression, it may
be defined parallel to the definition of correlation coefficient as a measure of the
strength of linear regression:

η(r|t) = ± √( SSR(r|t) / SSTD(r) ). (2.92)

Squaring both sides we get:

η^2(r|t) = SSR(r|t) / SSTD(r). (2.93)

Substituting the value of SSR from Equation 2.91 and SSTD from Equation 2.87:

η^2(r|t) = Σ_{i=0}^{k−1} n_i (f(i) − µ_r)^2 / Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − µ_r)^2. (2.94)

Assuming a perfect functional association between r and t, for the ith category in r,
corresponding to the intensity level i in t, the category mean will be f(i):

µ_i(r|t) = Σ_{x,y : t(x,y)=i} f(i) / n_i, i ∈ {0, 1, 2, ..., k − 1}, (2.95)

or

µ_i(r|t) = f(i), i ∈ {0, 1, 2, ..., k − 1}, (2.96)

which may be substituted in the correlation ratio formulation given by Equation 2.94:

η^2(r|t) = Σ_{i=0}^{k−1} n_i (µ_i − µ_r)^2 / Σ_{x=1}^{m} Σ_{y=1}^{n} (r(x,y) − µ_r)^2. (2.97)

By using Equation 2.77, the numerator is the variance of the category means and the
denominator is the variance of r:

η^2(r|t) = σ_µ^2 / σ_r^2. (2.98)
Hence we conclude that correlation ratio measures the strength of the functional
relationship between r and t. For a perfect functional relationship, correlation ratio turns
out to be exactly 1.00, and for a weak functional relationship, it is close to zero. For
the case of a perfect functional relationship, each category will have only one value,
equal to its mean. Therefore, the variance of the category means will become equal to
the overall variance, or SSR(r|t) = SSTD(r). In the case of no functional association
between the two images, all categories will have the same mean, therefore the variance of
the category means will become zero, or alternatively SSR(r|t) = 0, and correlation
ratio will also become zero.
2.6.2 Relationship between Correlation Ratio and Correlation Coefficient

If the functional relationship between two images is linear, the formulation of correlation
ratio between these images converges to the formulation of correlation coefficient.
Let the linear relationship between images r and t be given by r(x, y) = α + βt(x, y).
The formulation of the category means in r, µ_i(r|t), simplifies to the following expression:

µ_i(r|t) = Σ_{x,y : t(x,y)=i} (α + βi) / n_i, ∀i ∈ {0, 1, 2, ..., 255}, (2.99)

or

µ_i(r|t) = α + βi, ∀i ∈ {0, 1, 2, ..., 255}. (2.100)

Substituting it in the formulation of η^2(r|t):

η^2(r|t) = Σ_{i=0}^{255} n_i (α + βi − µ_r)^2 / (mn σ_r^2). (2.101)

Using the fact that α + βµ_t = µ_r,

η^2(r|t) = Σ_{i=0}^{255} n_i (α + βi − α − βµ_t)^2 / (mn σ_r^2), (2.102)

or

η^2(r|t) = (β^2/σ_r^2) Σ_{i=0}^{255} n_i (i − µ_t)^2 / (mn). (2.103)
Since i is an intensity in image t and n_i is the count of that intensity,

σ_t^2 = Σ_{i=0}^{255} n_i (i − µ_t)^2 / (mn), (2.104)

therefore

η^2(r|t) = β^2 σ_t^2 / σ_r^2. (2.105)

Substituting the value of β from Equation 2.60, it follows that

η(r|t) = ρ(r, t), (2.106)

which proves that correlation ratio is equal to correlation coefficient if the association
between r and t is linear.
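This convergence is easy to check numerically. The sketch below (synthetic 8-bit images of our own choosing) computes η through the category means and ρ through a library routine:

```python
import numpy as np

rng = np.random.default_rng(4)
t = rng.integers(0, 256, size=(64, 64))
r = 3 * t + 7                          # exactly linear: r = alpha + beta*t

rf, tf = r.ravel().astype(float), t.ravel()
mu_r = rf.mean()
num = 0.0
for i in range(256):                   # category means of r over equal-intensity pixels of t
    mask = (tf == i)
    if mask.any():
        num += mask.sum() * (rf[mask].mean() - mu_r) ** 2
eta = np.sqrt(num / np.sum((rf - mu_r) ** 2))   # Equation 2.81
rho = np.corrcoef(rf, tf.astype(float))[0, 1]
print(round(eta, 6), round(rho, 6))    # both 1.0 for an exactly linear relation
```

For a linear relation with positive slope the two measures coincide at the perfect score of 1.0, as Equation 2.106 predicts.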
Thus we may conclude that correlation ratio is the more general form of correlation
coefficient, because correlation ratio measures the strength of functional relationships,
whereas correlation coefficient measures the strength of only linear relationships. In
many image matching applications, the relationship between the images to be matched
is a multi-valued or stochastic function. In the presence of multi-valued
functional relationships, correlation ratio no longer remains the best choice because
it cannot measure the strength of multi-valued functional relationships. In such cases,
joint entropy and mutual information are the preferred similarity measures. These
measures are discussed in the following section.
2.7 Entropy and Mutual Information

As we have discussed in the last section, Euclidean distance, angular distance and
correlation based measures assume a linear association between the two images to
be matched. In the case of a non-linear functional association, these measures fail to
produce a high similarity score. For such cases, correlation ratio has been proposed to
measure the strength of functional relationships. Correlation ratio assumes that the
function is deterministic, or single valued. In case the association between two images
is multi-valued or non-deterministic, the similarity score generated by correlation
Figure 2.3: An application hierarchy of the commonly used image match measures.
The most general match measures are in the outermost circles while the most restrictive
match measures are in the innermost circles. From innermost to outermost: Brightness
Constancy Assumption (city block distance, Euclidean distance, cross-correlation);
Only Brightness Variations (zero mean normalized city block distance, zero mean
normalized Euclidean distance, zero mean cross-correlation); Only Contrast Variations
(unit variance normalized city block distance, unit variance normalized Euclidean
distance, normalized cross correlation); Linear Relationship (zero mean unit variance
normalized city block distance, zero mean unit variance normalized Euclidean distance,
correlation coefficient ρ); Functional Relationship (correlation ratio); Probabilistic
Relationship (joint entropy, mutual information).
ratio will be smaller than the actual similarity. As an example, suppose a random
variable a may assume three values, a ∈ {2, 4, 6}, with equal probability and another
variable b may assume six values with equal probability, b ∈ {0, 5, 10, 20, 25, 30}. The
association between a and b is such that each value from a maps to two different
values from b with equal probability: pr(a = 2, b = 0) = 1/6, pr(a = 2, b = 30) = 1/6,
pr(a = 4, b = 5) = 1/6, pr(a = 4, b = 25) = 1/6, pr(a = 6, b = 10) = 1/6,
pr(a = 6, b = 20) = 1/6, and all other joint probabilities are zero. The mean of each
of the three categories of b|a is 15: µ_i(b|a) = 15. The variance of the category means is
therefore zero, σ_µ^2 = 0, and as a result η(b|a) = 0. In this specific example, although there
is a perfect multi-valued functional relationship, correlation ratio is exactly zero,
which shows that in the case of multi-valued functional associations, the score generated
by correlation ratio may be smaller than the actual strength of association.
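The failure can be reproduced with a few lines of Python, enumerating the six equally likely outcomes from the example above:

```python
import numpy as np

# The six equally likely (a, b) outcomes from the example above
pairs = [(2, 0), (2, 30), (4, 5), (4, 25), (6, 10), (6, 20)]
a = np.array([p[0] for p in pairs], dtype=float)
b = np.array([p[1] for p in pairs], dtype=float)

# eta(b|a): categories of b are formed by equal values of a (Equation 2.83)
mu_b = b.mean()
num = sum((a == v).sum() * (b[a == v].mean() - mu_b) ** 2 for v in np.unique(a))
eta = np.sqrt(num / np.sum((b - mu_b) ** 2))
print(eta)   # 0.0: every category mean equals mu_b = 15
```

Every category of b averages to 15, so the numerator of Equation 2.83 collapses to zero even though b is fully determined (up to a two-way choice) by a.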
In order to measure the strength of association between images having non-deterministic
associations, entropy and mutual information based measures have been proposed.
Entropy is a measure of the dispersion of the image histogram. The entropy of an image
r, having a total of 256 intensity levels and size m × n pixels, is defined as

H(r) = − Σ_{i=0}^{255} p_r(i) log p_r(i), (2.107)

where p_r(i) is the probability of intensity i, which is computed from the image histogram.
If the intensity i occurs n_i times, the probability p_r(i) is given by:

p_r(i) = n_i / (mn). (2.108)

Similarly, the entropy of the image t is given by:

H(t) = − Σ_{i=0}^{255} p_t(i) log p_t(i). (2.109)

The entropies of the individual images, H(r) and H(t), are also known as marginal entropies.
A distribution having probability concentrated in small regions has low entropy, while
a dispersed distribution has high entropy. As an example, if an image has only one
intensity value at all pixels, the probability of that intensity will be 1.00 and the entropy
of that image will be 0.00 bits. In contrast, if an image has all intensity values in
exactly equal numbers of pixels, then for 256 intensity levels the probability of each
level will be 1/256, p_r(i) log_2 p_r(i) = −8/256 for i ∈ {0, 1, 2, ..., 255}, and the entropy of
this image will be maximum, 8.00 bits.
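The two extreme cases may be checked with a direct implementation of Equation 2.107 (the function name and test images below are our own illustrative choices):

```python
import numpy as np

def entropy_bits(img, levels=256):
    """Marginal entropy H (Equation 2.107), in bits, from the image histogram."""
    counts = np.bincount(img.ravel(), minlength=levels)
    p = counts / counts.sum()
    p = p[p > 0]                      # 0*log(0) terms contribute nothing
    return -np.sum(p * np.log2(p))

flat = np.full((16, 16), 128, dtype=np.uint8)          # single intensity everywhere
ramp = np.arange(256, dtype=np.uint8).reshape(16, 16)  # each level exactly once
print(entropy_bits(flat), entropy_bits(ramp))
```

The constant image yields 0 bits and the equalized image yields the maximum of 8 bits, as argued above.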
The joint entropy of two images is defined as:

H(r, t) = − Σ_{i=0}^{255} Σ_{j=0}^{255} p_rt(i, j) log p_rt(i, j), (2.110)

where H(r, t) is the joint entropy and p_rt(i, j) is the joint probability of the intensity
pair (i, j) such that r(x, y) = i and t(x, y) = j. We may estimate p_rt(i, j) by counting
the frequency of occurrence of each intensity pair and dividing the count of each pair
by the total pixel count:

p_rt(i, j) ≈ count(r(x, y) = i, t(x, y) = j) / (mn). (2.111)
If r and t have an association, the joint probability plot will have only a few
high probability concentrations along each row, whereas in the case of unrelated images,
the probability will be distributed over all outcomes. The dispersion or concentration
of the joint probability density function may be measured by using joint entropy.
A low value of H(r, t) shows a strong association between r and t, while a high value
of H(r, t) corresponds to a weak association. The relationship between joint entropy and
the marginal entropies is given by:

H(r, t) = H(t) + H(r|t) = H(r) + H(t|r). (2.112)

In some image regions, the joint entropy may be low merely because of low marginal
entropies; therefore a normalization with the marginal entropies is required, which is
provided by the mutual information of the two images.
Mutual information may also be used to measure the association between two images
when the association is in the form of a non-deterministic function. Mutual information
is a more robust measure than joint entropy because it has a normalizing effect.
Mutual information may be defined as (T. M. Cover, 1991):

M(r, t) = H(r) + H(t) − H(r, t). (2.113)

A rearrangement of terms may yield the following formulation of mutual information:

M(r, t) = Σ_{i=0}^{255} Σ_{j=0}^{255} p_rt(i, j) log_2 [ p_rt(i, j) / (p_r(i) p_t(j)) ], (2.114)

which is also known as the Kullback-Leibler distance between the joint and the marginal
distributions.
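Equation 2.114 can be estimated from a joint histogram, as in Equation 2.111. The sketch below is illustrative (synthetic images, our own function name); note that histogram-based MI estimates on small samples carry a positive bias, so only the relative comparison is meaningful:

```python
import numpy as np

def mutual_information(r, t, levels=256):
    """M(r, t) = sum_ij p(i,j) log2[p(i,j)/(p(i)p(j))] (Equation 2.114), in bits."""
    joint = np.histogram2d(r.ravel(), t.ravel(),
                           bins=levels, range=[[0, levels], [0, levels]])[0]
    p_rt = joint / joint.sum()
    p_r = p_rt.sum(axis=1, keepdims=True)    # marginal of r
    p_t = p_rt.sum(axis=0, keepdims=True)    # marginal of t
    nz = p_rt > 0
    return np.sum(p_rt[nz] * np.log2(p_rt[nz] / (p_r @ p_t)[nz]))

rng = np.random.default_rng(5)
t = rng.integers(0, 256, size=(64, 64))
related = 255 - t                              # deterministic (even multi-valued would do)
unrelated = rng.integers(0, 256, size=(64, 64))
print(mutual_information(related, t) > mutual_information(unrelated, t))  # True
```

The functionally related pair scores markedly higher mutual information than the independent pair, which is precisely the behavior that makes MI usable as a match measure.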
2.7.1 Relationship between Mutual Information and Correlation Coefficient

For random variables having a Gaussian probability distribution function, a relationship
between mutual information and correlation coefficient may be derived. If r and t are
Gaussian distributed random variables, then the probability of a particular intensity
i in r is given by:

p(r(x, y) = i) = (1/(√(2π) σ_r)) e^{−(i − µ_r)^2 / (2σ_r^2)}, (2.115)

and the probability of a particular intensity j in t is given by:

p(t(x, y) = j) = (1/(√(2π) σ_t)) e^{−(j − µ_t)^2 / (2σ_t^2)}. (2.116)

Using the identity

∫ r^2 e^{−a r^2} dr = (1/(2a)) √(π/a), (2.117)

the marginal entropies of r and t may be computed:

H(r) = (1/2) log_2(2πe σ_r^2), (2.118)

and

H(t) = (1/2) log_2(2πe σ_t^2). (2.119)

The joint probability density function of two Gaussian distributed random variables
is given by:

p(r(x, y) = i, t(x, y) = j) = (1/(2π√|Σ|)) e^{−(1/2) [i − µ_r, j − µ_t] Σ^{−1} [i − µ_r, j − µ_t]^T}, (2.120)

where Σ is defined as:

Σ = [ σ_r^2    σ_rt^2
      σ_rt^2   σ_t^2 ].

Using the identity 2.117, the joint entropy may be computed as follows:

H(r, t) = (1/2) log_2((2πe)^2 |Σ|). (2.121)

Using the marginal and joint entropy expressions, the mutual information of two
Gaussian distributed random variables turns out to be:

I(r, t) = (1/2) log_2 [ σ_r^2 σ_t^2 / |Σ| ]. (2.122)

This expression may further be simplified, by using the fact that |Σ| = σ_r^2 σ_t^2 − σ_rt^4, to:

I(r, t) = −(1/2) log_2(1 − ρ_rt^2). (2.123)

Since the logarithm is a monotonic function, one may easily conclude that for
two images which have Gaussian distributions, maximization of mutual information is
equivalent to maximization of the magnitude of the correlation coefficient. Image intensities,
however, are often not Gaussian distributed, and therefore the relationship between
correlation coefficient and mutual information does not hold in general.
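The identity can be verified by plain arithmetic on a hypothetical covariance matrix (the numeric values below are arbitrary; the standard bivariate Gaussian entropy H(r,t) = (1/2) log₂((2πe)²|Σ|) is assumed):

```python
import numpy as np

# Hypothetical Gaussian parameters; sigma_rt2 denotes the covariance
# (written sigma_rt^2 in the text)
sigma_r2, sigma_t2, sigma_rt2 = 4.0, 9.0, 3.0
Sigma = np.array([[sigma_r2, sigma_rt2],
                  [sigma_rt2, sigma_t2]])
det = np.linalg.det(Sigma)                      # = sigma_r2*sigma_t2 - sigma_rt2**2

H_r = 0.5 * np.log2(2 * np.pi * np.e * sigma_r2)      # marginal entropy of r
H_t = 0.5 * np.log2(2 * np.pi * np.e * sigma_t2)      # marginal entropy of t
H_rt = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * det)   # joint entropy of (r, t)

rho = sigma_rt2 / np.sqrt(sigma_r2 * sigma_t2)        # Equation 2.66
I_def = H_r + H_t - H_rt                              # Equation 2.113
I_rho = -0.5 * np.log2(1 - rho ** 2)                  # Gaussian MI identity
assert np.isclose(I_def, I_rho)
```

The 2πe factors cancel in the difference, leaving mutual information as a monotonically increasing function of ρ².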
2.8 Conclusion

In this chapter we have discussed some commonly used image match measures, including
city block distance, Euclidean distance, angular distance, correlation based
Table 2.1: Perfect Score (√) and Not-Perfect Score (×) produced by different Image
Match Measures (IMM), if the images to be matched exhibit constant intensity,
additive intensity variations, multiplicative intensity variations, linear associations,
functional associations and probabilistic associations. The image match measures are
SAD, SSD, Cross-Correlation (CC), NCC, correlation coefficient (ρ), correlation ratio
(η), Joint Entropy (JE), and Mutual Information (MI).

IMM  Constant  Additive  Multipl.  Linear  Functional  Probabilistic
SAD     √         ×         ×        ×         ×            ×
SSD     √         ×         ×        ×         ×            ×
CC      √         ×         ×        ×         ×            ×
NCC     √         ×         √        ×         ×            ×
ρ       √         √         √        √         ×            ×
η       √         √         √        √         √            ×
JE      √         √         √        √         √            √
MI      √         √         √        √         √            √
measures, correlation ratio, joint entropy and mutual information. Correlation coefficient
has been discussed in detail due to its larger significance as an image match
measure. Relationships of correlation coefficient with other match measures have also
been elaborated and are summarized in Figure 2.2. The relationship between correlation
coefficient and linear regression has also been discussed in significant detail.

For each image match measure, the underlying assumptions about the relationship
between the images to be matched are summarized in Figure 2.3. City block distance,
Euclidean distance, angular distance and cross correlation assume that the intensity
values of the images to be matched remain exactly the same for a perfect score (Table
2.1). Zero mean and unit variance normalized city block distance, zero mean and
unit variance normalized Euclidean distance and correlation coefficient all give a
perfect score if there is a linear relationship between the images to be matched.
Correlation ratio assumes that there is a single-valued functional relationship, which
may be linear or non-linear, but must be deterministic and single valued to yield a
perfect score. Joint entropy and mutual information based measures assume that
the relationship between the images to be matched may be a multi-valued functional
relationship, which may also be called a probabilistic relationship. Thus, the use of a
particular match measure in an image matching application strongly depends on the
type of association between the images to be matched.
Chapter 3
COMPUTATIONAL ASPECTS OF COMMONLY USED
IMAGE MATCH MEASURES
The most common template matching process consists of comparing a small template
image against multiple search locations within a relatively larger reference image
and evaluating an image match measure. The search location which yields the best
similarity score may be selected as the best match location. Suppose the template
image t, of size m × n pixels, has to be matched at all valid search locations within a
relatively large reference image r of size p × q pixels. For the purpose of matching,
the reference image r is considered to be divided into overlapping rectangular blocks
r_{io,jo}, each of size m × n pixels, where (io, jo) are the coordinates of the first pixel of
the reference block. Each reference block r_{io,jo} is a candidate search location.
If the match measure evaluated during each comparison is a distance measure, then
the best match location may be defined as the search location which yields the minimum
distance over the entire search space:

(i_min, j_min) = arg min_{io,jo} D(r_{io,jo}, t), (3.1)

where D(·, ·) is the function computing the distance between r_{io,jo} and t. Alternatively,
if the image match measure computed during each comparison is a similarity measure,
then the best match location will be defined as the search location exhibiting
maximum similarity over the entire search space:

(i_max, j_max) = arg max_{io,jo} S(r_{io,jo}, t), (3.2)

where S(·, ·) is a function computing the similarity score between r_{io,jo} and t.
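Equation 3.1 with the SSD distance measure can be realized as a direct exhaustive search. The sketch below uses synthetic data of our own choosing and is written for clarity rather than speed:

```python
import numpy as np

def full_search(reference, template):
    """Exhaustive template matching: evaluate the SSD distance at every valid
    (io, jo) and return the location of minimum distance (Equation 3.1)."""
    p, q = reference.shape
    m, n = template.shape
    best, best_loc = np.inf, (0, 0)
    for io in range(p - m + 1):
        for jo in range(q - n + 1):
            block = reference[io:io + m, jo:jo + n]     # reference block r_{io,jo}
            d = np.sum((block.astype(float) - template) ** 2)
            if d < best:
                best, best_loc = d, (io, jo)
    return best_loc

rng = np.random.default_rng(6)
ref = rng.integers(0, 256, size=(64, 64)).astype(float)
tpl = ref[20:28, 33:41].copy()       # plant the template at a known location
print(full_search(ref, tpl))         # (20, 33)
```

The (p − m + 1)(q − n + 1) block comparisons of mn pixels each are exactly the cost that the fast techniques surveyed in this chapter try to reduce.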
During the template matching process, the search for the best match location is done
over the two translational parameters (io, jo). Therefore, in the context of the general
image registration problem, the template matching process is sometimes referred to as
translation-only image registration. In the generic image registration problem, the
search for the best match location is done over the entire set of geometric transformation
parameters. For example, if the assumed transformation between r_{io,jo} and t is affine,
the search for the best match location has to be done over four affine parameters in
addition to the two translational parameters (io, jo):
T_A = [ a1  a2  io
        a3  a4  jo
        0   0   1 ]. (3.3)

If a projective transformation is assumed between r_{io,jo} and t, the search for the best
match location has to be done over six projective parameters in addition to the two
translational parameters (io, jo):

T_P = [ a1  a2  io
        a3  a4  jo
        c1  c2  1 ]. (3.4)

The search for the best match in the eight dimensional search space may be written as:

r_min ≜ min_{io,jo} ( min_{a1,a2,a3,a4,c1,c2} D(T_P(r_{io,jo}), t) ), (3.5)

where T_P(·) is the projective transformation function, which geometrically transforms
the input image.
As the dimensionality of the search space increases, the computational cost of the match
measure increases exponentially, which makes the process of image registration
practically intractable. Keeping in view the importance of image registration, significant
efforts have been made to reduce the computational cost of match measures in
the form of fast computational techniques.

Existing techniques for fast computation of match measures may be divided into two
main categories: fast approximate techniques and fast exhaustive techniques. Fast
approximate techniques are based on, as the name implies, some approximation,
which reduces the computational cost of the match measure but may incur an
associated reduction in the accuracy of finding the global maximum. Most of the fast
approximate techniques are implemented in the spatial domain, whereas fast exhaustive
techniques may be implemented in either the frequency domain or the spatial
domain. Fast exhaustive techniques guarantee that the global maximum over the entire
search space will be found.
A lot of research effort has been dedicated to the development of fast approximate
techniques, with emphasis on relatively less deterioration in accuracy and more computation
reduction. Fast approximate techniques may further be divided into two
categories, based upon the strategy used for computation reduction: 'Approximate
Search Space' techniques and 'Approximate Image Representation' techniques.
The most commonly used fast approximate techniques employ search space reduction by
approximating the actual search space with a smaller one, and thus reducing the
number of times the match measure has to be evaluated. In these techniques, the cost
of a one-time computation of the match measure remains the same. In the second class of
approximate techniques, the cost of one-time match measure computation is reduced by
approximating the template or the reference image with a simpler representation. In
the approximate image representation techniques, the actual match measure has also
been approximated with a match measure which is simpler to compute.

The exhaustive accuracy spatial domain techniques may also be divided into two
categories: 'Complete Computation' techniques and 'Computation Elimination'
techniques. Complete computation techniques employ efficient rearrangement
of match measure formulations to reduce the computational complexity by separating
the pre-computable terms from those which have to be computed at run time. By doing
so, the order of computational complexity may remain the same; however, the cost
of the operations with the highest order of complexity may reduce. The second category
of the exhaustive accuracy techniques is the bound based computation elimination
techniques. In these techniques a significant amount of computation is skipped by
comparing a theoretical bound on the match measure with the current known
maximum. This comparison discloses the fact that a specific search location
cannot compete with the already known best match location. These
techniques focus on skipping most of the computations while ensuring no change in
accuracy. Bound based computation elimination techniques may further be grouped
into two categories: 'Complete Elimination' techniques, which discard the entire
computation at a search location, and 'Partial Elimination' techniques, which discard
a portion of the computation at a particular search location when the unsuitability
of that search location is established.
3.1 Fast Approximate Image Matching Techniques

Fast approximate techniques reduce the computational complexity of image matching
by making different types of approximations, which may be divided into two categories.
The first category includes approximations of the search space with a smaller
search space, and the second category includes image approximations with simpler
representations, along with match measure approximations with simpler match measures.
All these approximations cause a reduction in the accuracy of the image matching
process.

3.1.1 Search Space Approximation Techniques

Search space approximation techniques include most of the commonly used fast
approximate algorithms. For better comprehension, these techniques are further
subdivided into small search space and large search space techniques.
Small Search Space Techniques
Most of the research on small search space approximate image matching techniques
has been done in the perspective of block motion estimation for temporal redundancy
reduction in the video encoders. In software-based video encoders, the computa-
tional cost of block motion estimation comprises almost 50% to 70% of the total
cost (Shanableh and Ghanbari, 2000). Therefore many fast search methods have
been proposed to reduce the computational complexity of the block motion estima-
tion (Huang et al., 2006a). The emphasis of all of these methods is to reduce the
number of search points by selectively checking the match measure at only a few
positions. Search space approximation is based upon the assumption that the image
match measure monotonically varies towards the global maximum which is close to
the starting position. The most frequently cited techniques in this category are the
following:
i- Two Dimensional Logarithmic (TDL) search was used by Jain and Jain (1981)
to track the direction of the minimum of the Sum of Squared Differences (SSD)
match measure. In this technique, in the first step, the match measure is computed
at only five initial positions: one at the center of the search space and four
positions to the left, right, above and below the center, each at half the maximum
displacement.
In the second step, three more positions are searched in the direction of the min-
imum SSD as found in the first step. The step size is then halved and the above
procedure is repeated until the step size becomes unity. In the last step, all the
nine positions are searched. As an example, for a search window size of 11× 11
pixels, ±5 pixels from the center, only 13 to 21 positions need to be searched, as
opposed to 121 positions required in the full search approach.
ii- Cross Search Algorithm (CSA) has been proposed by Ghanbari (1990). In CSA,
in the first step, match measure is computed at five search locations, including the
central location in the search window and the four locations in the four diagonal
directions, in the form of a cross (×), at half-way from center to the corner of the
search window. In the following step, four new evaluations are done, each at a
half step-size distance, around the position with minimum distortion value. The
same process continues until the step size reduces to one pixel. For a maximum
step size of w pixels, the total number of evaluated locations is 5 + 4 log2 w, for
w ≥ 1.
iii- Three Step Search (TSS), proposed by Koga et al. (1981), has been used to
estimate motion displacements of up to 6 pixels per frame. In this technique, in
the first step, match measure is computed at nine positions including the central
position and the eight surrounding positions at half way in each of the eight
principal directions. At the position with minimum distortion, the search step
size is halved and the next eight new positions are searched. As an example,
consider a search window of size 23 × 23 pixels, or ±11 pixels in the vertical
and horizontal directions. In the first step, match measure will be computed at
nine positions including the central position (0, 0) and eight positions at (±6,±6)
pixels, in the eight directions. In the second step, eight more evaluations are done
at (±3,±3) around the minimum distortion position found in the step one. In the
third step, eight more evaluations follow at (±1,±1) pixels around the minimum
distortion position found in step two. Thus, in the TSS technique, only 25
evaluations are done instead of the 529 required by full search. Improvements over
the basic algorithm have been proposed in the form of the New Three Step Search
(NTSS) (Li et al., 1994) and the Four Step Search (FSS) (Po and Ma, 1996).
iv- Orthogonal Search Algorithm (OSA) has been proposed by Puri et al. (1987). In
OSA, each step consists of two stages, a horizontal stage followed by a vertical
stage. In the first step, the match measure is evaluated at the center of the
search window and at two points in the horizontal direction at half-way from
the center to the end of the search window. Two more evaluations are done in
vertical direction around the position of minimum distortion in the horizontal
direction. In the following step, the same procedure is repeated with the step size
reduced by half. Since in each new step, the match measure is evaluated at only
four new locations, the total number of evaluations is 1 + 4 log2w, where w is the
initial step size. Therefore, OSA may be considered the fastest algorithm in
the category of small search space approximate techniques.
v- Modified Motion Estimation Algorithm (MMEA) has been proposed by
Kappagantula and Rao (1985). In MMEA, if the distortion value at the center of
the search space, (i, j), is less than a minimum threshold, the search is stopped
and the block is marked as unchanged. Otherwise, the match measure is evaluated
at four new locations: (i−4, j), (i, j+4), (i+4, j), and (i, j−4), assuming a search
window of ±7 pixels. If the minimum from these four locations is smaller than
the distortion at the central location, the algorithm proceeds to the next step;
otherwise the value at the central location is taken as the best available, and the
search process is stopped. Assuming the position with minimum mismatch in the
previous step was (i−4, j), the further positions to be searched are (i− 4, j − 4)
and (i− 4, j + 4). Therefore, during the first step, match measure
is evaluated at seven locations. In the following steps, the same procedure is
repeated with half the step size. Therefore, with this method for w = 7 pixels,
only 19 evaluations are required.
vi- Conjugate Direction Search (CDS) (Srinivasan and Rao, 1985): All small search
space approximate algorithms discussed in this subsection try to find a line in the
search space along which the minimum value of the distortion function may be
found. In the case of a two parameter search, the minimum along one parameter
may be found first and then the minimum along the second parameter may be
searched. This type of approach is called ‘One at a Time Search’ (OTS).
Conjugate Direction Search (CDS) is an extension of the OTS technique. For a
two variable function to be minimized, CDS obtains two conjugate direction
vectors. The search is done along each direction using the OTS approach. An
improvement over the basic algorithm, the Fast One Step Search (FOSS)
algorithm, has been proposed by Ramachandran and Srinivasan (2001).
If (i, j) is the center of the search space, the first step of the CDS algorithm
proceeds by evaluating the match measure at three positions, (i, j), (i, j + l), and
(i, j − l), and finding the position with minimum distortion. Suppose the position
(i, j + l) yields minimum distortion; then (i, j + 2l) is computed and the minimum
of (i, j), (i, j + l), and (i, j + 2l) is searched. If the minimum is found to lie
between two higher values, the search in this direction stops; otherwise further
positions are checked in the same direction. In the next step of the CDS algorithm,
the search continues in the i-direction, in the same manner as the first step.
vii- Diamond Search (DS) algorithm has been proposed by Zhu and Ma (2000). In
this algorithm, the match measure is evaluated at the center of the search space
and at eight locations around it using the Large Diamond Search Pattern (LDSP).
If the minimum value of distortion is observed at the central position, then the
match measure is evaluated at four more locations around the center in the form
of the Small Diamond Search Pattern (SDSP). Otherwise, if the minimum of the
LDSP occurs at some outer point, the LDSP is shifted so that it is centered at the
minimum distortion position, and the process is repeated.
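The step-search family above can be sketched in a few lines; TSS is shown here with the SAD measure, and the image sizes, search window and exact-match template in the test below are illustrative assumptions, not parameters from the cited papers.

```python
import numpy as np

def sad(ref, tpl, i, j):
    """Sum of absolute differences between tpl and the block of ref
    whose top-left corner is at (i, j)."""
    m, n = tpl.shape
    block = ref[i:i + m, j:j + n].astype(np.int64)
    return int(np.abs(block - tpl.astype(np.int64)).sum())

def three_step_search(ref, tpl, center, step=4):
    """TSS sketch: probe the current center and its eight neighbours at
    the current step size, recenter on the minimum, then halve the step."""
    m, n = tpl.shape
    best = center
    while step >= 1:
        cands = [(best[0] + di * step, best[1] + dj * step)
                 for di in (-1, 0, 1) for dj in (-1, 0, 1)]
        cands = [(i, j) for i, j in cands          # stay inside the image
                 if 0 <= i <= ref.shape[0] - m and 0 <= j <= ref.shape[1] - n]
        best = min(cands, key=lambda p: sad(ref, tpl, *p))
        step //= 2
    return best
```

With a starting step of half the search range, only a few dozen locations are probed instead of every location in the window.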
Thus, in all of the small search space approximation techniques, the search space is
assumed to be quite smooth and the global maximum is also assumed to be close
enough to the initial starting point. If the search space is not smooth, or the
global maximum lies far from the initial starting point, then these techniques
may get stuck in intermediate local maxima and fail to reach the actual global
maximum. In such cases, the large search space approximate techniques are
preferred over the small search space techniques. The large search space techniques
are discussed in the following section.
Large Search Space Techniques
The small search space techniques as discussed in the last subsection are based on the
assumption of monotonic match measure variation in the proximity of a maximum.
This assumption often causes false estimates in the presence of large displacements
and larger search regions. Large search space techniques have been developed to
handle large displacements, larger search regions and non-smooth variation of match
measure due to the presence of local maxima. Some of the commonly used large
search space techniques are discussed in the following paragraphs:
1. In order to detect large object motion, Hierarchical Block Matching (Bierling,
1988), which is also commonly known as coarse-to-fine template matching
(A. Rosenfeld, 1977; Burt and Adelson, 1983), has often been used. In coarse-to-fine
matching, both the template and the reference images are low-pass
filtered and sub-sampled multiple times. The resulting sequences of images with
reducing sizes are known as image pyramids. The smallest images, which form
the coarsest representation, are at the top levels of the pyramids, and the largest
images, having maximum detail, are at the lowest levels. At higher levels of the
pyramids, the apparent motion also reduces in accordance with the total amount
of sub-sampling done up to that level.
Image matching is initially done across the top level images in the template
image and the reference image pyramids. The best match position found at
the top level is propagated to the next lower level. At the lower level, the match
measure is evaluated at only a few locations around the expected location of the
maximum. The same procedure is repeated for each subsequent level, until
the lowest pyramid level is reached. In some implementations, the intermediate
levels are skipped and the best match found at the top pyramid level is directly
propagated to the original image.
2. To further speed up the coarse-to-fine template matching scheme, an approximate
algorithm using the Walsh transform has been proposed (Nillius and Eklundh,
2002). In this technique, the template image and each of the search locations
is efficiently projected onto the Walsh basis using a binary tree of filters. The
image match measure is computed using only a part of the Walsh basis. The
performance of the coarse-to-fine scheme using the Walsh basis has been studied,
and it has been reported that for only a 1% loss of accuracy, a speed up of 9% to
23% may be obtained.
3. Two-stage template matching has been proposed by Vanderburg and Rosenfeld
(1977) using Sum of Absolute Differences (SAD) as the match measure. In
the first stage of this algorithm, only a part of the template image, called the
sub-template, is matched at all search locations. In the second stage, the remaining
portion of the template is matched only at the selected search locations that
exhibited an SAD value less than a specific threshold in the first stage. The
algorithm may fail to detect the presence of an object or may also make false
detections.
The two-stage template matching algorithm has been extended for normalized
cross-correlation (NCC) by Goshtasby et al. (1984). The basic algorithm remains
the same, with NCC used in place of SAD. The NCC based two-stage
template matching algorithm is also an approximate algorithm, with a non-zero
probability of missing the NCC maximum.
4. Sun et al. (2003) have proposed Correlation-based Adaptive Predictive Search
(CAPS) algorithm for fast template matching in the large search space. In
CAPS algorithm, the search space is sub-sampled based on the width of the
auto-correlation function of the template image. The horizontal and vertical
widths of the autocorrelation function are the distances, in the horizontal and
vertical directions, over which the autocorrelation remains higher than a predefined
threshold value. The search space is sub-sampled in both directions at half the
horizontal and vertical widths. Once a position with correlation higher than a specific
threshold is found, full search is carried out in the neighborhood of this position.
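The hierarchical (coarse-to-fine) matching of Bierling (1988) described above may be sketched as follows; the 2 × 2 box-filter pyramid, the SSD measure and the refinement radius are illustrative choices of this sketch, not those of the cited papers.

```python
import numpy as np

def downsample(img):
    """Low-pass with a 2x2 box filter and subsample by two."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def ssd(ref, tpl, i, j):
    m, n = tpl.shape
    return ((ref[i:i + m, j:j + n] - tpl) ** 2).sum()

def coarse_to_fine(ref, tpl, levels=2, radius=2):
    """Match exhaustively at the coarsest pyramid level, then refine the
    propagated position within a small window at each finer level."""
    pyr = [(ref.astype(np.float64), tpl.astype(np.float64))]
    for _ in range(levels):
        pyr.append((downsample(pyr[-1][0]), downsample(pyr[-1][1])))
    r, t = pyr[-1]                      # exhaustive search at the top level
    m, n = t.shape
    scores = [(ssd(r, t, i, j), (i, j))
              for i in range(r.shape[0] - m + 1)
              for j in range(r.shape[1] - n + 1)]
    pos = min(scores)[1]
    for r, t in reversed(pyr[:-1]):     # propagate and refine
        ci, cj = 2 * pos[0], 2 * pos[1]
        m, n = t.shape
        cands = [(i, j)
                 for i in range(max(0, ci - radius),
                                min(r.shape[0] - m, ci + radius) + 1)
                 for j in range(max(0, cj - radius),
                                min(r.shape[1] - n, cj + radius) + 1)]
        pos = min(cands, key=lambda p: ssd(r, t, *p))
    return pos
```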
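The two-stage scheme of Vanderburg and Rosenfeld (1977) described above may be sketched as follows; the choice of sub-template (the top rows of the template) and the percentile-based stage-one threshold are illustrative assumptions of this sketch.

```python
import numpy as np

def two_stage_match(ref, tpl, sub_rows=4, keep_percent=5):
    """Stage one: match only the top `sub_rows` rows of the template at
    every location. Stage two: evaluate the full template only at the
    locations whose stage-one SAD fell in the best `keep_percent` percent."""
    t = tpl.astype(np.int64)
    m, n = t.shape
    sub = t[:sub_rows]
    H, W = ref.shape[0] - m + 1, ref.shape[1] - n + 1
    stage1 = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            stage1[i, j] = np.abs(
                ref[i:i + sub_rows, j:j + n].astype(np.int64) - sub).sum()
    thresh = np.percentile(stage1, keep_percent)
    best, best_val = None, None
    for i, j in zip(*np.nonzero(stage1 <= thresh)):   # surviving locations
        v = np.abs(ref[i:i + m, j:j + n].astype(np.int64) - t).sum()
        if best_val is None or v < best_val:
            best, best_val = (int(i), int(j)), v
    return best
```

As the text notes, a location whose sub-template SAD exceeds the threshold is never examined in full, which is where both the speed up and the possibility of a miss come from.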
The large search space techniques as discussed in this subsection reduce the computa-
tional cost of image matching by approximating the actual search space with a smaller
search space. The next category of fast approximate techniques reduces the
computational cost by approximating the image representation with a simpler one,
along with an approximate formulation of the match measure.
3.1.2 Algorithms Using Approximate Image Representations
In approximate image representation algorithms, either the template image or the
reference image or both are approximated with simpler representations. In order to
efficiently compute the image match measure, approximate formulations of the match
measures have also been proposed along with each technique.
1. Briechle and Hanebeck (2001) have approximated the template image as a sum
of rectangular basis functions. The correlation is computed for each of the
basis functions instead of the original images. The final value of the correlation
is computed as the weighted sum of the correlations of the individual basis
functions. The execution time speed up is obtained by reducing the number of
basis functions, which increases the approximation error. Moreover, instead of
using the actual definition, an approximate correlation formulation has also been
proposed.
2. Yoshimura and Kanade (1994) have used the Karhunen-Loeve transform to obtain
eigen images of a set of rotated templates. If the set of eigen images is smaller
than the set of templates, computations may be saved by using this alternate
representation. In order to compute the normalized correlation between the eigen
images and the reference image, an approximate formulation of the normalized
correlation has also been proposed. Further computation reduction has been
obtained by employing a coarse-to-fine strategy.
3. Schweitzer et al. (2002) have efficiently computed the least squares approxima-
tion polynomials for each search location in the reference image, by using the
integral images proposed by Viola and Jones (2001). The computational cost of
estimating the best fit polynomial increases with the order of the polynomial.
It has been experimentally shown that second-order polynomials provide a
sufficient approximation for image matching. An approximate formulation of
normalized correlation has also been proposed to compute the match measure
efficiently with the newly proposed reference image representation. In this
algorithm, the template image has been used without approximation.
4. In order to reduce the computational cost of the image matching, both the tem-
plate and the reference images may be approximated by one bit per pixel binary
representations. The computational cost is reduced because the computations
are done on one bit data instead of eight bits. Conversion from an 8 bit per
pixel gray scale image to a one bit per pixel binary image may be done using
a global thresholding scheme as proposed by Otsu (1979). However, with a
global threshold, important details in some image regions may be lost; therefore
an adaptive local thresholding scheme has been proposed by Niblack (1986).
The image match measure used for the binary images is based on bitwise XNOR:
\gamma_b(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} r_b(x, y) \oplus t_b(x, y), \qquad (3.6)
where rb and tb are the binary images each of size m× n pixels, converted from
the gray scale images r and t. The operator ⊕ represents the binary function
XNOR. In the case of a translational parameter search, the best match location
is the one at which γb(r, t) is maximum.
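As an illustration, the binary matching of Equation 3.6 may be sketched as follows; the global mean threshold used here is a simple stand-in for Otsu's method, an assumption of this sketch.

```python
import numpy as np

def binary_match(ref, tpl):
    """Binarise both images with one global threshold (the reference mean,
    a stand-in for Otsu's method), then maximise the XNOR agreement count
    gamma_b of Eq. 3.6 over all valid search locations."""
    thr = ref.mean()
    rb, tb = ref > thr, tpl > thr
    m, n = tb.shape
    best, best_val = None, -1
    for i in range(rb.shape[0] - m + 1):
        for j in range(rb.shape[1] - n + 1):
            # XNOR count: number of pixels where the binary images agree
            v = int((rb[i:i + m, j:j + n] == tb).sum())
            if v > best_val:
                best, best_val = (i, j), v
    return best, best_val
```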
Thus we may conclude that approximate image matching techniques obtain speed
up at the expense of a loss in accuracy. If exhaustive equivalent accuracy is required,
approximate techniques may not be used. Fast exhaustive accuracy algorithms
have been proposed in both the frequency domain and the spatial domain. In the
following section, fast exhaustive algorithms in the frequency domain are discussed.
3.2 Fast Exhaustive Accuracy Image Matching in
Frequency Domain
In many cases, exhaustive computation of cross-correlation, NCC, and ρ has been
most efficiently done by using frequency domain transformation of the template and
the reference image. Once correlation based measures are efficiently computed,
Euclidean distance based measures may also be found by using the relationships
discussed in Chapter 2. In order to transform images from the spatial domain to the
frequency domain, the Discrete Fourier Transform (DFT) has often been used.
Consider the template image t of size m × n pixels to be matched at all search
locations in the larger reference image r of size p × q pixels. The two dimensional
DFT of the template image may be defined as:

T(u, v) = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} t(x, y) \, e^{-j 2\pi (\frac{ux}{M} + \frac{vy}{N})}, \qquad (3.7)
where T (u, v) is the transformed template image and (u, v) are the index locations
in frequency domain. The parameters M and N are defined as M = m + p − 1 and
N = n+ q − 1. Since m < M and n < N , the extra image space is filled with zeros,
commonly known as zero padding.
The 2-D Discrete Fourier Transform may also be computed as two 1-D transforms,
owing to the separability property of the Fourier Transform. The first 1-D
transformation may be done along the rows only:

T(x, v) = \frac{1}{N} \sum_{y=1}^{N} t(x, y) \, e^{-j 2\pi \frac{vy}{N}}, \qquad (3.8)

and the second transformation may then be done along the columns only:

T(u, v) = \frac{1}{M} \sum_{x=1}^{M} T(x, v) \, e^{-j 2\pi \frac{ux}{M}}. \qquad (3.9)
The computational cost of the 2-D transformation is significantly reduced if computed
by using two 1-D transforms. Similarly, the 2-D transformation of the reference
image may be computed in two steps: the first transformation along the rows only
and the second along the columns only.
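The separability property is easy to verify numerically; the sketch below uses NumPy, whose FFT omits the 1/MN normalization of Equations 3.7–3.9 (a constant factor that does not affect separability):

```python
import numpy as np

# A small test image; the size is arbitrary.
rng = np.random.default_rng(0)
t = rng.standard_normal((8, 6))

# 1-D transforms along the rows first (cf. Eq. 3.8), then along the
# columns (cf. Eq. 3.9); the result equals the direct 2-D transform.
T_rows = np.fft.fft(t, axis=1)
T_sep = np.fft.fft(T_rows, axis=0)

assert np.allclose(T_sep, np.fft.fft2(t))
```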
3.2.1 Fast Fourier Transform (FFT) Algorithms
The straightforward computation of the 2-D DFT has computational complexity of
the order of O(M²N²); however, a large number of fast algorithms have been
developed. The following are often used categories of Fast Fourier Transform
(FFT) algorithms:
1. Radix-2 Algorithms: In these algorithms, the original problem has been suc-
cessively broken down into problems of smaller sizes and then bottom up com-
putation schemes have been used to speed up the computations. The basic
implementations of these algorithms were developed by Danielson and Lanczos
(1942) and later by Cooley and Tukey (1965). These implementations require
each image dimension, the rows M and the columns N, to be an exact power of 2.
Therefore, M and N are selected as M = 2^⌈log2(m+p−1)⌉ and N = 2^⌈log2(n+q−1)⌉,
where ⌈·⌉ represents the ceiling function. The computational cost of the DFT, if
computed by the FFT algorithms, reduces to O(MN log2(MN)). Using the
separability property of the Fourier transform, the complexity of the 2-D transform
further reduces to O(max(MN log2 M, MN log2 N)).
2. Mixed Radix algorithms (Singleton, 1969) do not require the image size to be
in exact powers of 2. These algorithms successively divide the problem by the
smallest prime factor along the respective image dimension. Mixed radix
algorithms may be considered a generalization of the radix-2 algorithms, because
they can break down any composite size into its factors. If the smallest prime
factor is small, these algorithms compute the transform efficiently, while if the
smallest prime factor of N (or M) is large, the performance deteriorates
accordingly. In the worst case, if N (or M) is a prime number, no subdivision of
the problem is possible. Therefore the complexity of the transformation increases
to the original complexity of O(max(MN², NM²)) instead of the reduced FFT
complexity, O(max(MN log2 N, MN log2 M)).
3. The Prime Factor Algorithms (Good, 1960; Thomas, 1963; Chan and Ho, 1991):
Another related class of FFT algorithms is the prime factor FFT algorithms,
which perform the transformation efficiently if the radix is a prime number.
However, these algorithms do not perform well for even numbered radices.
4. The Split Radix Algorithms: These algorithms rearrange computations in the
basic FFT implementation of Cooley and Tukey (1965) by blending radix-2 and
radix-4 to achieve a further speed up. Some of the well known split-radix algorithms
have been proposed by (Yavne, 1968; Duhamel and Hollmann, 1984; Vetterli
and Nussbaumer, 1984; Sorensen et al., 1986; Duhamel and Vetterli, 1990). A
new radix-2/8 split radix FFT algorithm has been proposed by Bouguezel et al.
(2004).
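The radix-2 decimation-in-time idea may be sketched recursively as follows; this is a textbook formulation for power-of-two lengths, not any of the cited implementations:

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])          # transform of even-indexed samples
    odd = fft_radix2(x[1::2])           # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):             # combine with twiddle factors
        w = cmath.exp(-2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out
```

The recursion halves the problem at each level, which is exactly where the O(N log2 N) cost comes from.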
Numerous FFT implementations are available publicly, both commercially as well as
freely, over various hardware platforms. Most of the well known implementations have
been benchmarked by M. Frigo and S. G. Johnson, from the accuracy and speed
point of view, by using their software named benchFFT. The benchmark results of
all these implementations over different types of commonly used hardware platforms
are available at their web site: http://www.fftw.org/benchfft/. In the following
paragraphs, we will briefly discuss only two freely available FFT implementations,
which we have used for execution time comparisons in the later chapters.
A simple FFT routine with minimal interface has been provided by William et al.
(2007). This is a simple sequential implementation of the radix-2 FFT as proposed
by Cooley and Tukey (1965). This routine was originally written by Rader
and Brenner (1976). This FFT routine may be considered a baseline of the
FFT algorithms, as it does not exploit optimizations such as split radix or
parallelism.
A comprehensive collection of fast C routines for computation of Discrete Fourier
Transform is freely available from www.fftw.org, as Fastest Fourier Transform in
the West (FFTW) (Frigo and Johnson, 1998, 2005; Johnson and Frigo, 2007). The
recent version of FFTW is named FFTW3, which is based upon the Cooley-Tukey
algorithm and also uses the prime factor algorithm, Rader's algorithm for prime
sizes, and a split-radix algorithm. The input data to FFTW3 may have any
arbitrary length, including prime numbers. FFTW3 adapts the DFT algorithm to
the underlying hardware in order to maximize performance. FFTW3 also utilizes
SIMD instructions, which perform the same operation on all elements of a data
array in parallel. The computation of FFTW3 is split into two phases: a planner
learns the fastest way to compute the transform on a given hardware and makes a
plan, which is then executed in the next phase to transform the input. The FFTW3
interface is organized into three levels of increasing complexity: the basic interface,
the advanced interface, and the guru interface for expert users.
We have compared the execution time of FFTW3-based image matching, which uses
the correlation theorem as discussed in the following section, with the partial
correlation elimination (PCE) algorithms discussed in Chapters 6 and 7. Note that
the implementation of the PCE algorithms is sequential, without any parallelism or
hardware specific optimization. Therefore, the comparison of PCE with FFTW3
may appear unfair; however, we observe that even then, PCE has remained faster
than FFTW3 in many cases. The details of these comparisons and the results are
discussed in Chapters 6 and 7.
3.2.2 Image Matching by Correlation Theorem
During the process of cross correlation computation, the template image is matched
with each valid search location, where a valid search location is a block in the
reference image having the same size as the template image. For a particular
position, cross correlation is computed by multiplying corresponding pixels and
then adding the results. The equation of cross correlation, given in Chapter 2, is
repeated here for easy reference:

\psi(r, t) = \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y) \, t(x, y). \qquad (3.10)
Convolution is a process very similar to the computation of cross correlation. The
convolution between the reference image r and the template image t may be written
as:

C_{r,t}(i_o, j_o) = \frac{1}{mn} \sum_{x=1}^{m} \sum_{y=1}^{n} t(x, y) \, r(i_o - x, j_o - y). \qquad (3.11)
In this equation, (io, jo) denotes a particular displacement, and the minus signs show
that the signal r is flipped about the origin. This flipping is inherent in the definition
of convolution, due to its interpretation as a way to compute the output of an LTI
system via its impulse response. The convolution process, as described
by Equation 3.11, may be summarized as flipping one image about the origin, then
shifting that image with respect to the other by changing the values of (io, jo) and
computing sum of products over all values of (x, y). For more details about convo-
lution, any digital image processing text, for example (Gonzalez and Woods, 2002),
may be consulted.
The convolution theorem states that convolution in spatial domain is equivalent to
the point by point multiplication in the frequency domain
r(io, jo) ∗ t(io, jo)⇐⇒ R(u, v)T (u, v), (3.12)
and convolution in frequency domain is equivalent to point by point multiplication in
spatial domain:
R(u, v) ∗ T (u, v)⇐⇒ r(io, jo)t(io, jo). (3.13)
To establish the link between correlation in the spatial and frequency domains, we
observe that the general formulation of correlation is given by:

\rho_{r,t}(i_o, j_o) = \frac{1}{mn} \sum_{x=1}^{m} \sum_{y=1}^{n} t^{*}(x, y) \, r(i_o + x, j_o + y), \qquad (3.14)
where t∗ denotes the complex conjugate of t. The positive signs in the indices of
r(io + x, jo + y) show that r is not mirrored about the origin. Using the similarity
of the correlation and convolution formulations, the correlation theorem has been
defined as
r(io, jo) ◦ t(io, jo)⇐⇒ R∗(u, v)T (u, v), (3.15)
where R∗(u, v) denotes the complex conjugate of the frequency domain representation
of the image r. The equivalent dual form of the correlation theorem follows from
the duality property of the Fourier Transform:
R(u, v) ◦ T (u, v)⇐⇒ r∗(io, jo)t(io, jo). (3.16)
Using correlation theorem, cross correlation may be computed in frequency domain
as follows:
1. Take Fourier Transform of the images r and t
R(u, v) = F{r(x, y)} (3.17)
T (u, v) = F{t(x, y)} (3.18)
2. Compute the complex conjugate of either of the two images in the frequency domain
R∗(u, v) = conj(R(u, v)) (3.19)
3. Compute the point by point complex multiplication of T (u, v) and R∗(u, v)
P (u, v) = T (u, v)R∗(u, v) (3.20)
4. Take the inverse Fourier transform of the product
r(io, jo) ◦ t(io, jo) = F−1{P (u, v)} (3.21)
Note that to perform point-by-point multiplication both R and T must be of the
same size. Therefore, it is necessary to zero pad both r and t before taking their
Fourier transforms. Zero padding also helps in avoiding undesirable overlap of one
image with the next period of the other image, because the Fourier transform
considers both images as two dimensional periodic signals. Therefore both the
template and the reference image must be zero padded to a common size: for a
template image of size m × n pixels and a reference image of size p × q pixels, the
zero padded images will be of size (p + m − 1) × (q + n − 1).
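The four steps above may be sketched with NumPy as follows. Two details are assumptions of this sketch rather than part of the thesis: `fft2` performs the zero padding through its `s` argument, and conjugating T rather than R (the mirror image of Equation 3.15) simply places the valid correlation values at non-negative displacements, which is convenient for indexing.

```python
import numpy as np

def cross_correlation_fft(r, t):
    """Cross-correlation of template t against reference r via the
    correlation theorem, with zero padding to (p+m-1) x (q+n-1).
    Returns the (p-m+1) x (q-n+1) map of valid displacements."""
    p, q = r.shape
    m, n = t.shape
    M, N = p + m - 1, q + n - 1
    R = np.fft.fft2(r, s=(M, N))        # step 1: forward transforms
    T = np.fft.fft2(t, s=(M, N))        # (fft2 zero-pads to M x N)
    P = R * np.conj(T)                  # steps 2-3: conjugate and multiply
    cc = np.fft.ifft2(P).real           # step 4: inverse transform
    return cc[:p - m + 1, :q - n + 1]
```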
The computation of cross-correlation is straightforward using the correlation
theorem. However, the computation of normalized cross correlation (NCC), given by
Equation 2.25, and the correlation coefficient (ρ), given by Equation 2.29, needs
separate computation of the normalization parameters. At each search location, the
computed value of cross correlation is normalized by these separately computed
parameters. The cost of the correlation coefficient may be reduced by rearranging
the formulation given by Equation 2.28. Since the FFT operates on real numbers,
converting one or both of the images from integer to real values does not change the
computational complexity. The template image t may be normalized to zero mean
and unit variance by subtracting the mean µt and dividing by the standard
deviation term:
t_{zu}(x, y) = \frac{t(x, y) - \mu_t}{\sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} (t(i, j) - \mu_t)^2}}, \qquad (3.22)
and Equation 2.28 may be written as

\rho(r, t) = \frac{1}{\sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} (r(i, j) - \mu_r)^2}} \sum_{x=1}^{m} \sum_{y=1}^{n} r(x, y) \, t_{zu}(x, y), \qquad (3.23)

which shows that the cross correlation between r and tzu may be computed using
the correlation theorem, and the resulting value, normalized by the reference image
standard deviation term, will give the correlation coefficient.
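Equations 3.22 and 3.23 may be combined into the following sketch: the numerator is obtained for all locations at once via the correlation theorem, while the reference deviation term is computed per window (here with plain loops for clarity; computing it efficiently is the subject of Section 3.3).

```python
import numpy as np

def correlation_coefficient_map(r, t):
    """rho at every valid location: FFT cross-correlation with the
    zero-mean, unit-norm template t_zu (Eq. 3.22), normalised by each
    reference window's own deviation term (Eq. 3.23)."""
    p, q = r.shape
    m, n = t.shape
    tzu = t - t.mean()
    tzu = tzu / np.sqrt((tzu ** 2).sum())   # Eq. 3.22 normalisation
    M, N = p + m - 1, q + n - 1
    num = np.fft.ifft2(np.fft.fft2(r, s=(M, N)) *
                       np.conj(np.fft.fft2(tzu, s=(M, N)))
                       ).real[:p - m + 1, :q - n + 1]
    rho = np.empty_like(num)
    for i in range(rho.shape[0]):
        for j in range(rho.shape[1]):       # per-window deviation term
            w = r[i:i + m, j:j + n]
            rho[i, j] = num[i, j] / np.sqrt(((w - w.mean()) ** 2).sum())
    return rho
```

Note that Σ r·tzu equals Σ (r − µr)·tzu because tzu is zero mean, which is why no mean subtraction of the reference windows is needed in the numerator.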
3.2.3 Image Matching by Phase Only Correlation
Phase only correlation method has been investigated by many researchers (Kuglin
and Hines, 1975; Reddy and Chatterji, 1996; Foroosh et al., 2002) and may also be
found in digital image processing text books, for example see (Pratt, 2007). Phase
only correlation method is applicable only to translation-only image registration
applications. If the images t and r are just translated versions of each other, such
that the shift is (xo, yo),
r(x, y) = t(x− xo, y − yo) (3.24)
then by the shift property of Fourier transform, frequency domain representations of
the images will be related by a complex exponential term
R(u, v) = T (u, v)e−i(uxo+vyo), (3.25)
where R is Fourier transform of r and T is Fourier transform of t. The phase shift
term may be computed by using the cross power spectrum of the two images:

G(u, v) = \frac{R(u, v) \, T^{*}(u, v)}{|R(u, v) \, T(u, v)|} = e^{-i(u x_o + v y_o)}. \qquad (3.26)
The amount of shift may be found by taking the inverse Fourier transform of the
function G(u, v)
F−1{G(u, v)} = δ(x− xo, y − yo). (3.27)
After taking the inverse Fourier transform, a peak in the spatial domain indicates
the position of the shift (xo, yo).
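The procedure above may be sketched as follows; the small constant added to the denominator guards against division by zero and is an implementation detail of this sketch, not part of Equation 3.26.

```python
import numpy as np

def phase_correlation_shift(r, t):
    """Recover the (circular) translation between two equal-sized images
    from the peak of the inverse transform of the cross power spectrum."""
    R, T = np.fft.fft2(r), np.fft.fft2(t)
    G = R * np.conj(T)
    G = G / (np.abs(G) + 1e-12)         # keep the phase only (cf. Eq. 3.26)
    peak = np.abs(np.fft.ifft2(G))      # ideally a delta at (x_o, y_o)
    return np.unravel_index(np.argmax(peak), peak.shape)
```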
Accuracy studies of phase only correlation have been done by Manduchi and Mian
(1993), in comparison with cross correlation, for input images corrupted with
additive white Gaussian noise. It has been reported that the phase correlation
technique is more sensitive to noise than the direct cross correlation technique, for
both low pass and high pass input signals. Lower accuracy of the phase only
correlation technique has also been reported by others, for example (Caelli and Liu,
1988).
The computational cost of correlation computation by the correlation theorem, as
well as by phase only correlation, involves two Fourier transforms in the forward
direction. For the correlation theorem, each transform requires complex computations
of the order of (p+m−1)(q+n−1) log2(p+m−1) + (p+m−1)(q+n−1) log2(q+n−1) =
(p+m−1)(q+n−1) log2((p+m−1)(q+n−1)), while for phase only correlation each
requires pq log2 p + pq log2 q = pq log2(pq). The domain transformations are followed
by complex multiplications of the order of O((p+m−1)(q+n−1)) for the correlation
theorem and O(pq) for phase only correlation. Phase only correlation also requires
computation of the magnitude |R(u, v)T(u, v)| and then a pixel by pixel division of
R(u, v)T∗(u, v) by this magnitude; each of these operations has a computational
complexity of the order of O(pq). In both methods, one domain transformation in
the inverse direction, having the same complexity as a forward transformation, is
also required. The computational cost of computing the complex conjugate may be
considered negligible. Thus the dominant computational complexity of correlation
computation by the correlation theorem remains
O((p+m−1)(q+n−1) log2((p+m−1)(q+n−1))), and for phase only correlation it
remains O(pq log2(pq)). Both of these complexities are significantly smaller than the
complexity of correlation computation in the spatial domain, O(pq × mn). Therefore
correlation computation has often been done in the frequency domain, using the
FFT for the domain transformation.
Often correlation coefficient implementations are based on Fast Fourier Transform
(FFT) and significant efforts have been made to reduce the time complexity of FFT.
However, as the template size reduces, the computational advantage of frequency
domain over spatial domain decreases and for small template sizes, spatial domain
implementations become faster (see Pratt (2007) and Lewis (1995)). This is because,
for small template sizes, the overheads involved in frequency domain computations
become significantly larger than the direct computational cost in the spatial domain.
Another scenario in which an FFT based implementation may not be
efficient, is finding point correspondences between two images. Each feature from one
image has to be correlated at only a few locations in the second image, often selected
by a corner detection algorithm. This may be efficiently computed in spatial domain
while in frequency domain, complete computations at all search locations have to be
performed.
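The crossover between the two domains can be seen by comparing the dominant operation counts directly. The following sketch (illustrative plain Python; the constant factors of complex arithmetic are ignored, so it is an order-of-magnitude comparison only) pits O(pqmn) spatial computation against the three padded FFTs plus a pointwise product of the convolution theorem route:

```python
import math

def spatial_ops(p, q, m, n):
    """Dominant cost of direct spatial correlation: one multiply-add
    per template pixel at each of roughly p*q search locations."""
    return p * q * m * n

def frequency_ops(p, q, m, n):
    """Dominant cost via the convolution theorem: two forward FFTs
    and one inverse FFT of the padded size, plus one pointwise
    product. Constant factors are ignored."""
    P, Q = p + m - 1, q + n - 1
    fft = P * Q * math.log2(P * Q)   # one (p+m-1) x (q+n-1) FFT
    return 3 * fft + P * Q           # 3 transforms + pointwise product
```

For a 512 × 512 reference image, a 64 × 64 template gives roughly 1.1e9 spatial operations against 1.9e7 frequency domain operations, whereas an 8 × 8 template gives roughly 1.7e7 against 1.5e7; once the substantial constant factors of complex arithmetic are included, the small template favors the spatial domain, consistent with Pratt (2007) and Lewis (1995).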
Thus, despite the availability of efficient FFT routines, spatial domain computation of correlation is still of significant practical importance. The spatial domain exhaustive accuracy algorithms discussed in the following section are also important because they provide a base for the development of more efficient spatial domain algorithms, the bound based computation elimination algorithms. The main contributions of this thesis also fall within the category of bound based computation elimination algorithms. The discussion on bound based computation elimination algorithms will follow the discussion on complete computation fast spatial domain algorithms in the next section.
3.3 Fast Exhaustive Spatial Domain Techniques
The straightforward way of computing the image match measures discussed in Chapter 2 is to perform complete computations in the spatial domain, achieving exhaustive accuracy. Different techniques have been used to speed up these exhaustive implementations. Frequently used techniques include efficient rearrangement of the match measure formulation and efficient pre-computation of the normalization parameters, using either the integral image approach or the running sum approach. These techniques are discussed in more detail in the following subsections.
3.3.1 Efficient Rearrangement of Match Measure Formulation
In many cases, the spatial domain formulation of a match measure may be rearranged such that the number of operations with the highest order of complexity is reduced. Different terms in the match measure formulation may have different orders of computational complexity. An effective rearrangement separates the lower complexity terms from the higher complexity terms such that the number of higher complexity operations is reduced to as few as possible. As an example, consider the formulation of the correlation coefficient as given in Chapter 2, repeated here for ease of reference:
\[
\rho(r,t) = \frac{\sum_{x=1}^{m}\sum_{y=1}^{n}\big(r(x,y)-\mu_r\big)\big(t(x,y)-\mu_t\big)}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n}\big(r(x,y)-\mu_r\big)^2}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n}\big(t(x,y)-\mu_t\big)^2}}. \tag{3.28}
\]
In this formulation, the mean and the variance terms for the template image need to be computed only once for a specific template image, matched over p × q search locations. Repeated computation of these two terms is easily avoided by computing them once and storing them for repeated usage.

For the reference image, the mean and the variance terms related to the search locations have to be computed once for each search location; therefore these terms are computed p × q times if the size of the reference image is p × q pixels. If multiple templates are to be matched with one reference image, repeated computation of these terms is easily avoided by computing them only once and storing them in memory for repeated usage. The reference image related terms may also be efficiently computed by using the integral image approach or the running sum approach, discussed in the following subsections.
If the mean and the variance terms are available from pre-computation, then in the numerator of the correlation coefficient formulation given by Equation 3.28, four operations have computational complexity of the order of O(mnpq): two real number subtractions, one real number multiplication and one real number addition. A rearrangement of the numerator term in Equation 3.28 yields a more computationally efficient form:
\[
\rho(r,t) = \frac{\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) - mn\,\mu_r\mu_t}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y) - mn\,\mu_r^2}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y) - mn\,\mu_t^2}}. \tag{3.29}
\]
In this formulation only one integer multiplication and one integer addition have computational complexity of the order of O(mnpq). Ignoring the computational cost of the mean and variance terms, all remaining operations in Equation 3.29 have computational complexity of the order of O(pq), which is significantly smaller than O(mnpq).
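As a concrete illustration, the rearranged form of Equation 3.29 can be sketched in a few lines (a minimal plain-Python sketch; the function name is ours, not from the thesis). Only the cross term Σ r(x,y)t(x,y) must be accumulated afresh at every search location; the remaining sums are exactly the pre-computable mean and variance terms discussed above:

```python
import math

def corr_coeff_rearranged(r, t):
    """Correlation coefficient of two equally sized m x n patches,
    computed via the rearranged formulation of Equation 3.29."""
    m, n = len(r), len(r[0])
    mn = m * n
    # The only term that must be accumulated at every search location:
    sum_rt = sum(r[x][y] * t[x][y] for x in range(m) for y in range(n))
    # Pre-computable sums (template sums once; reference-window sums
    # via integral images or running sums, as in the next subsections):
    sum_r = sum(map(sum, r))
    sum_t = sum(map(sum, t))
    sum_r2 = sum(v * v for row in r for v in row)
    sum_t2 = sum(v * v for row in t for v in row)
    mu_r, mu_t = sum_r / mn, sum_t / mn
    num = sum_rt - mn * mu_r * mu_t
    den = math.sqrt(sum_r2 - mn * mu_r ** 2) * math.sqrt(sum_t2 - mn * mu_t ** 2)
    return num / den
```

Here all five sums are computed in place for clarity; in a full template matcher only sum_rt would be computed per location, the rest being looked up from pre-computed tables.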
Thus, efficient rearrangement of the match measure formulation may reduce the computational cost significantly. The other technique for speeding up complete computation methods is efficient pre-computation of reusable terms. For example, in the correlation coefficient formulation given by Equation 3.29, the mean and variance terms of the reference image may be pre-computed and stored in memory for repeated use. In the following subsections we discuss efficient pre-computation techniques often used to speed up image match measure computations.
3.3.2 Integral Image Approach
The concept of the integral image has been exploited by Viola and Jones (2001, 2004) for efficient computation of the rectangular features used in real time object detection. Integral images have also been used by Schweitzer et al. (2002) for the estimation of polynomial parameters used for fast approximate template matching. In these applications, integral images have been used to find the summation over an arbitrary-sized image patch at a very low computational cost. The normalization parameters used in different match measure formulations, including the mean and variance related terms, may also be efficiently pre-computed by using the integral image approach.
The integral image I of the reference image r is defined as an image of the same size as r, in which each location contains the sum over all preceding locations of r:
\[
I(x,y) = \sum_{i=1}^{x}\sum_{j=1}^{y} r(i,j). \tag{3.30}
\]
Thus the integral image I contains the sums over all rectangular regions in the image r that have their sides parallel to the horizontal and vertical axes, their top left corner at the origin, and their bottom right corner at location (x, y).

Once the integral image has been computed, the sum over any arbitrary sized rectangular region of r may be computed very efficiently, in only four operations. Suppose we want to compute the sum of a rectangular patch of r, with (x1, y1) as its top left corner and (x2, y2) as its bottom right corner. The sum of all pixels included in this patch, r(x1 : x2, y1 : y2), is given by:
\[
\sum_{i=x_1}^{x_2}\sum_{j=y_1}^{y_2} r(i,j) = I(x_2,y_2) - I(x_2,y_1-1) - I(x_1-1,y_2) + I(x_1-1,y_1-1). \tag{3.31}
\]
The integral image itself may be computed efficiently in only one pass over the reference image r, by using a temporary row sum array s(x, y) and the following recursive formulation:

\[
s(x,y) = s(x,y-1) + r(x,y), \tag{3.32}
\]
\[
I(x,y) = I(x-1,y) + s(x,y), \tag{3.33}
\]

with the boundary conditions s(x, 0) = 0 and I(0, y) = 0. Thus, two additions per location are required to build the integral image itself, a total of 2pq additions, followed by 4pq operations required to compute the sums of all patches of size m × n pixels. Therefore, the total cost of computing the sums of all patches in r of the same size as the template image is 6pq operations.
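The construction above can be summarized in a short sketch (plain Python; 0-based lists with a zero-padded first row and column standing in for the 1-based boundary terms; names are illustrative):

```python
def integral_image(r):
    """Build the integral image I of r (Equation 3.30) in one pass,
    using the per-row running sum s of Equations 3.32-3.33. The
    zero-padded border row/column absorbs the x1-1 / y1-1 terms."""
    p, q = len(r), len(r[0])
    I = [[0] * (q + 1) for _ in range(p + 1)]
    for x in range(1, p + 1):
        s = 0                          # running sum of the current row
        for y in range(1, q + 1):
            s += r[x - 1][y - 1]       # Equation 3.32
            I[x][y] = I[x - 1][y] + s  # Equation 3.33
    return I

def patch_sum(I, x1, y1, x2, y2):
    """Sum of r(x1:x2, y1:y2) in four table look-ups (Equation 3.31)."""
    return I[x2][y2] - I[x2][y1 - 1] - I[x1 - 1][y2] + I[x1 - 1][y1 - 1]
```

With the zero border, I[x][y] corresponds to I(x, y) of Equation 3.30 for 1-based x and y, and patch_sum evaluates Equation 3.31 without special cases at the image border.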
In the following subsection, a more efficient summation method is discussed, which can compute the sums of all blocks of the same size m × n pixels in the reference image r in only 4pq summation operations. However, the integral image approach is more generic and may be used to compute summations over blocks of varying sizes, each in just 4 operations, assuming that the integral image is available pre-computed.
3.3.3 Running Sum Approach
In most template matching problems, all patches in the reference image over which the summation has to be computed have the same size, m × n pixels. In this case, the sums may be computed even more efficiently, using the running sum approach.

In the running sum approach, the summation of a block is separated into two steps: the summation along each row is computed first, and then the summation along each column. Considering the first step only, the summation process proceeds as follows:
1. Allocate a temporary array S, of the same size as the reference image r.

2. For each row in r, copy the value from the first cell to the corresponding position in S: S(x, 1) = r(x, 1).

3. For each row, compute the sum of the first two cells and place it at the position of the second cell in S. Then compute the sum of the first three cells and place it in the third cell of S. Repeat the process until the summation over the first n cells is computed in each row. This can be done efficiently by adding S(x, y − 1) + r(x, y) and placing the result in S(x, y): S(x, y) = S(x, y − 1) + r(x, y).

4. For each row, for cell numbers larger than n, add the current cell value to the previous sum and subtract one value from the trailing edge. Since the previous sum is available in S(x, y − 1), add r(x, y) to it and subtract r(x, y − n). Place the final value at S(x, y): S(x, y) = S(x, y − 1) + r(x, y) − r(x, y − n).

5. Continue the same process for each row, until the row end is reached.
The summation process during the first step may also be written in the form of an equation:

\[
S(x,y) =
\begin{cases}
r(x,y) & \text{if } y = 1;\\
S(x,y-1) + r(x,y) & \text{if } 1 < y \le n;\\
S(x,y-1) - r(x,y-n) + r(x,y) & \text{if } y > n.
\end{cases} \tag{3.34}
\]
In the second step, the summation along each column of the array S is computed. The second step may be written as follows:

1. Allocate a temporary array C, of the same size as the reference image r.

2. For each column in S, copy the value from the first cell to the corresponding position in C: C(1, y) = S(1, y).

3. For each column in S, compute the sum of the first two cells and place it at the second position in C. Then compute the sum of the first three cells and place it at the third position in C. Repeat the process until the summation over the first m cells is computed in each column: C(x, y) = C(x − 1, y) + S(x, y).

4. For each column in S, for cell numbers larger than m, add the current cell value to the previous sum and subtract one value from the trailing edge: C(x, y) = C(x − 1, y) + S(x, y) − S(x − m, y).
Figure 3.1: The template image of size 101 × 101 pixels and the reference image of size 736 × 1129 pixels (shown in reduced size) are used to generate the correlation coefficient based similarity surface shown in Figure 3.2. The images are taken from www.earth.google.com.
The second step process may also be written in equation form:

\[
C(x,y) =
\begin{cases}
S(x,y) & \text{if } x = 1;\\
C(x-1,y) + S(x,y) & \text{if } 1 < x \le m;\\
C(x-1,y) - S(x-m,y) + S(x,y) & \text{if } x > m.
\end{cases} \tag{3.35}
\]
In the running sum approach, each summation requires two operations in the first step and two operations in the second step. Therefore, we obtain each summation over a same-size block in just 4 operations.
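The two steps may be sketched as follows (a plain-Python sketch with 0-based indices; the trailing-edge subtractions correspond to the y > n and x > m cases of Equations 3.34 and 3.35):

```python
def box_sums(r, m, n):
    """All m x n block sums of r via the two-step running sum:
    rows first into S (Equation 3.34), then columns into C
    (Equation 3.35). C[x][y] holds the sum of the m x n block with
    bottom-right corner (x, y); smaller x or y give partial sums."""
    p, q = len(r), len(r[0])
    # Step 1: running sums of length n along each row.
    S = [[0] * q for _ in range(p)]
    for x in range(p):
        for y in range(q):
            S[x][y] = r[x][y]
            if y > 0:
                S[x][y] += S[x][y - 1]      # extend the previous sum
            if y >= n:
                S[x][y] -= r[x][y - n]      # drop the trailing edge
    # Step 2: running sums of length m along each column of S.
    C = [[0] * q for _ in range(p)]
    for y in range(q):
        for x in range(p):
            C[x][y] = S[x][y]
            if x > 0:
                C[x][y] += C[x - 1][y]
            if x >= m:
                C[x][y] -= S[x - m][y]
    return C
```

Each entry costs at most two additions/subtractions per step, which is where the 4pq total for all same-size block sums comes from.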
Another advantage of the running sum approach over the integral image approach is the avoidance of overflow errors. The values in the integral image may soon become larger than the maximum integer size, causing overflow errors. This error may be avoided by increasing the integer size, for example by using long or double data types in the C language. However, this causes more memory overhead and also increases the computational cost. Overflow problems do not typically appear in the running sum approach.

Figure 3.2: Correlation coefficient based similarity surface generated by matching the template and the reference images shown in Figure 3.1. The similarity surface is computed by using the correlation theorem based fast exhaustive frequency domain technique.
3.4 Bound Based Computation Elimination Algorithms
In Sections 3.2 and 3.3, different fast exhaustive image matching techniques were discussed. Some of these techniques are implemented in the frequency domain while others are implemented in the spatial domain. In all of these techniques, the template image is matched at all valid search locations in the reference image and complete computations are performed. Since the match measure values are computed at all search locations, a plot of these values over the entire search space may be visualized as the match measure surface. For the case of similarity measures, the match surface may also be called a similarity surface. A similarity surface, for the case of correlation coefficient based template matching, is shown in Figure 3.2. On this surface, multiple peaks and valleys may be seen, while the best match location is visible in the form of the highest peak.
In most image matching applications, computation of the complete match surface is redundant, because the interest is only in finding the best match location, which requires complete computations only in a small region around the peak location. If the important region around the peak is found by some alternate technique, then the redundant computations at all other locations may be skipped. This is the key idea behind bound based computation elimination algorithms, in which a bound is used to classify the search locations as falling in the redundant region or in the important peak region. The core contributions of this thesis, as discussed in Chapter 1, also fall in the category of bound based computation elimination algorithms.
In bound based computation elimination algorithms, instead of actually computing the match measure, an alternate statistic is computed which is essentially a bound upon the match measure under consideration. In the case of distance measures, such as SAD and SSD, the best match is defined by the minimum value of the match measure over the entire search space; therefore a lower bound is required for classification. At a particular search location, if the value of the lower bound is found to be larger than the already known minimum, then that search location may be labeled as redundant and skipped, because at that location the actual value of the match measure is guaranteed to be larger than the previously known minimum.
The same idea may also be applied to similarity measures, for example cross-correlation, NCC and the correlation coefficient. For similarity measures, the best match location is defined by the maximum value of the match measure over the entire search space; therefore in this case an upper bound is required for elimination. At a particular search location, if the upper bound is found to be smaller than a previously known maximum, then that location may be labeled as redundant and skipped.
The bounds used in the elimination algorithms are ensured to be exact, without any approximation. Therefore the skipped locations are guaranteed to fall in the redundant region; it is impossible for a search location in the peak region to get skipped. Thus, the computation elimination algorithms reduce computational cost without any compromise on match accuracy. These techniques guarantee the same accuracy as the exhaustive techniques performing complete computations. Therefore, bound based computation elimination techniques are also called 'Exhaustive Equivalent Accuracy' techniques.
In elimination algorithms, the execution time speed up strongly depends upon the ratio of the search locations labeled as redundant, and hence skipped, to the total number of search locations. As the amount of skipped computation increases, the template matching process accelerates accordingly. The amount of skipped computation strongly depends upon the position of the maximum found in the search process. A maximum found close to the start of the search process generates significantly more elimination than a maximum found near the end. Similarly, a maximum of high magnitude causes significantly more elimination in the subsequent region than a maximum of small magnitude.
In several template matching applications, a guess about the location of the maximum may be known from the context of the problem. This guess may define a region of the search space in which the probability of finding the peak is highest. In the elimination algorithms, the search process may start from the most probable region; that is why spiral search is popular in block matching applications. In the absence of any guess about the position of the maximum, approximate image matching techniques, as discussed in Section 3.1, have been used to find the approximate position of the maximum. The computational cost of the approximate search is justified by the increase in computation elimination resulting from a higher maximum found at the start of the search process.
The amount of skipped computation also depends upon the tightness of the bounds. A tighter bound may produce a significantly larger amount of eliminated computation than a loose bound. In the case of similarity computing image match measures, a tight upper bound is required, that is, one which remains close to the actual value of the similarity from above. For example, the bound given by the Cauchy-Schwarz inequality on cross-correlation, NCC or the correlation coefficient is a loose upper bound, because it always remains at the maximum possible value, no matter how far above the actual similarity that is. In the case of distortion computing image match measures, a tight lower bound is required, one which approaches the actual value of the distortion from below. If the bounds are tighter, the elimination test will succeed more often, causing increased computation elimination.
The main overhead of bound based computation elimination algorithms is the computational cost of the bound itself. High speed ups can only be obtained if the overheads are significantly smaller than the benefits obtained from the skipped search locations. If a low cost bound is not used, the overall cost may approach the cost of the exhaustive techniques, resulting in no speed up. If the computational cost of the bound becomes larger than the benefit obtained from skipped computations, the overall cost may even exceed the cost of the corresponding exhaustive algorithms. We have observed that for small template sizes, the computational cost of the bound used in the ZEBC algorithm (Mattoccia et al., 2008b) significantly exceeded the benefit of skipped computations (Mahmood and Khan, 2010, 2011), making the ZEBC algorithm slower than the corresponding fast exhaustive algorithm.
Based upon the type of computation elimination strategy, the elimination algorithms may be broadly divided into two categories, 'Partial Elimination Algorithms' and 'Complete Elimination Algorithms'. In Partial Elimination Algorithms, at each search location a portion of the match measure is computed and, using the result of that portion, a bound is computed, which is then used to perform the elimination test. In most cases, the elimination test consists of just comparing the bound value with the previously known maximum. If the bound value is found to be less than the previously known maximum, the elimination test is successful; in that case, at that particular search location, the remaining computations of the match measure may be skipped without any loss of accuracy. On the other hand, if the elimination test is not successful, then some more computations are performed and the elimination test is re-evaluated with the newly computed bound. The same process is repeated until the elimination test becomes successful or the computations at that location are completed. In this type of elimination algorithm, some computations are mandatory at each search location; therefore these algorithms are named 'Partial Elimination Algorithms'. Previously known partial elimination algorithms include Partial Distortion Elimination (PDE) algorithms and Bounded Partial Correlation (BPC) elimination algorithms. A significant part of the core contributions of this thesis are the Partial Correlation Elimination (PCE) algorithms, which are partial elimination algorithms for fast image matching by maximization of the correlation coefficient. We have proposed two main categories of PCE algorithms, Basic Mode PCE, discussed in Chapter 6, and Extended Mode PCE, discussed in Chapter 7.
In 'Complete Elimination Algorithms', at a particular search location, the alternate statistic is computed before the start of the image match measure computations. The elimination test consists of a comparison of the bounding statistic with the previously known maximum. If the bound evaluates to less than the previously known maximum, the computations of the match measure are completely skipped at that particular search location. Otherwise, if the bounding statistic is found to be larger than the previously known maximum, the elimination test is unsuccessful; in this case, complete computations of the match measure have to be performed at that particular search location. Well known complete elimination algorithms include the 'Successive Elimination Algorithm' and the 'Enhanced Bounded Partial Correlation' algorithm. A significant part of the core contributions of this thesis are the Transitive Elimination Algorithms (TEA), which are complete elimination algorithms. We discuss the theoretical aspects of TEA in Chapter 4, and different TEA algorithms are discussed in Chapter 5.
In the following subsections, the previously known computation elimination algorithms are discussed in more detail. For the purpose of completeness, our proposed algorithms, PCE and TEA, are also briefly described. TEA will be discussed in significant detail in Chapters 4 and 5, Basic Mode PCE in Chapter 6, and Extended Mode PCE in Chapter 7.
3.4.1 Successive Similarity Detection Algorithms
Successive Similarity Detection Algorithms (SSDA) were the first computation elimination algorithms, developed by Barnea and Silverman (1972). Later, SSDA algorithms were extensively studied in the context of block motion estimation in video encoders, where SSDA has also been renamed Partial Distortion Elimination (PDE). For example, Eckart and Fogg (1995); Quaglia and Montrucchio (2001); Kim and Choi (1999, 2000); Montrucchio and Quaglia (2005); Huang et al. (2006b) may be seen as important references on PDE algorithms.
SSDA (or PDE) algorithms exploit the monotonic growth pattern of those image match measures that evaluate the distortion or distance between two images. One such measure is the city block distance, commonly known as the Sum of Absolute Differences (SAD), as given by Equation 2.4:
\[
\Phi(r,t) = \sum_{x=1}^{m}\sum_{y=1}^{n} \big|r(x,y) - t(x,y)\big|, \tag{3.36}
\]
where | · | represents the absolute value function. In the formulation of SAD, the distortion is the sum of the absolute values of the differences between corresponding pixels. While computing SAD, at each pixel location, the current distortion is added to the previous sum. For example, if the SAD computation has been done up to u rows and v − 1 columns, then the SAD at position (u, v) is given by the sum of the previous SAD value and the absolute value of the current difference:
\[
\mathrm{SAD}_{u,v}(r,t) = \mathrm{SAD}_{u,v-1}(r,t) + \big|r(u,v) - t(u,v)\big|. \tag{3.37}
\]

Since |r(u, v) − t(u, v)| ≥ 0, therefore

\[
\mathrm{SAD}_{u,v}(r,t) \ge \mathrm{SAD}_{u,v-1}(r,t). \tag{3.38}
\]

In general we may write

\[
\mathrm{SAD}_{u,v}(r,t) \le \mathrm{SAD}_{m,n}(r,t), \quad \forall\, u \le m \text{ and } v \le n, \tag{3.39}
\]

where SAD_{m,n}(r, t) is the complete value of SAD computed over m × n pixels. Inequality 3.39 states that the partial value of SAD is always a lower bound upon the final value of SAD. If the previously known minimum of SAD is SAD_min, then if

\[
\mathrm{SAD}_{u,v}(r,t) > \mathrm{SAD}_{\min}, \tag{3.40}
\]

then it is guaranteed that

\[
\mathrm{SAD}_{m,n}(r,t) > \mathrm{SAD}_{\min}. \tag{3.41}
\]
Therefore, Inequality 3.40 may be considered a sufficient condition for elimination. If this condition is satisfied for a specific value of (u, v), computation beyond position (u, v) is redundant and may be skipped without any loss of accuracy. A more intuitive way to phrase the same result is that, as the computation of SAD proceeds in a particular block, processing more pixels can never decrease its partial sum. Hence, once the current partial sum exceeds a previously known minimum, the location is no longer a viable best match location; the remaining computations may therefore be skipped.
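A minimal sketch of this elimination loop (illustrative plain Python, not the thesis implementation) makes the control flow explicit: the partial SAD grows monotonically, and the test of Inequality 3.40 is applied after every pixel:

```python
def sad_with_pde(r_patch, t, sad_min):
    """Partial Distortion Elimination for SAD at one search location.

    Accumulates |r - t| pixel by pixel (Equation 3.37); as soon as the
    partial sum exceeds sad_min, the previously known minimum, the
    location cannot be the best match (Inequality 3.40) and the rest
    of the block is skipped. Returns (value, completed)."""
    partial = 0
    for row_r, row_t in zip(r_patch, t):
        for a, b in zip(row_r, row_t):
            partial += abs(a - b)
            if partial > sad_min:       # sufficient condition 3.40
                return partial, False   # eliminated: skip remainder
    return partial, True                # full SAD; may update sad_min
```

In a full matcher the caller updates sad_min whenever a completed location yields a smaller SAD, which is why finding a low minimum early (e.g., via spiral search) increases the amount of elimination.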
Another image match measure exhibiting the monotonic growth property is the squared Euclidean distance, also known as the Sum of Squared Differences (SSD), as given by Equation 2.9. In the formulation of SSD, the overall distortion is the sum of the squared differences between corresponding pixel values. If SSD has been computed up to u rows and v − 1 columns, then the SSD at position (u, v) is given by:

\[
\mathrm{SSD}_{u,v}(r,t) = \mathrm{SSD}_{u,v-1}(r,t) + \big[r(u,v) - t(u,v)\big]^2. \tag{3.42}
\]
Since [r(u, v) − t(u, v)]² ≥ 0, therefore

\[
\mathrm{SSD}_{u,v-1}(r,t) \le \mathrm{SSD}_{u,v}(r,t), \tag{3.43}
\]

which may be generalized as

\[
\mathrm{SSD}_{u,v}(r,t) \le \mathrm{SSD}_{m,n}(r,t), \quad \forall\, u \le m \text{ and } v \le n, \tag{3.44}
\]

where SSD_{m,n}(r, t) is the sum of squared differences between images r and t over m × n pixels. Inequality 3.44 states that the partial value of SSD is a lower bound upon its final value; therefore, as for SAD, if the partial value SSD_{u,v}(r, t) exceeds the previously known minimum, the complete value of SSD is guaranteed to be larger than that minimum. In that case, computation beyond (u, v) is redundant and may be skipped without any loss of accuracy.
In PDE algorithms, if a sufficiently low minimum is found at the start of the search process, the amount of computation elimination increases significantly. To exploit this fact in block matching applications, the search is started from the center of the search space and proceeds outwards in spiral form, known as Spiral PDE in the H.263 software implementation by ITU-T (1995).
Bounded Partial Correlation (BPC) Elimination Technique
The cross-correlation (Equation 3.10) and normalized cross-correlation (NCC) (Equation 2.25) increase monotonically as consecutive pixels are processed, because only positive values are added after processing each pixel. However, the concept of SSDA or PDE may not be extended in a straightforward manner to these measures, because the correlation based measures are similarity measures: the best match location is defined by the maximum value of cross-correlation (or NCC). A previously known maximum may not be utilized to skip computations of cross-correlation (or NCC) at the remaining search locations in the way a known minimum is used by PDE for the SAD and SSD match measures.
In order to skip computation in correlation based match measures, a theoretical upper bound on the final correlation value must be known in advance. If, at a particular search location, the upper bound on correlation is found to be lower than the previously known maximum, the remaining computations at that location may be skipped without any loss of accuracy.
Since cross-correlation is equivalent to computing the dot product, or inner product, of two images, a well known upper bound upon the inner product is given by the Cauchy-Schwarz inequality:

\[
\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \le \sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}. \tag{3.45}
\]
It turns out that for NCC, the Cauchy-Schwarz inequality yields +1 as the upper bound:

\[
\frac{\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y)}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}} \le 1.00. \tag{3.46}
\]
Hence the bound given by the Cauchy-Schwarz inequality may not be directly used for computation elimination, because it always remains fixed at the maximum possible value of cross-correlation or NCC. Such a bound is called a loose bound: no matter how small the actual value of cross-correlation is, the bound yielded by the Cauchy-Schwarz inequality stays at the maximum value, so it is not possible to find a maximum which exceeds this upper bound. No other useful bound upon the inner product is known which might be used instead of the Cauchy-Schwarz inequality.
An indirect way to exploit the Cauchy-Schwarz inequality for computation elimination, the Bounded Partial Correlation (BPC) algorithm, has been proposed by di Stefano et al. (2003); Stefano and Mattoccia (2003); di Stefano and Mattoccia (2003). They observed that if the Cauchy-Schwarz bound is computed on one portion of the images to be matched and cross-correlation is computed on the remaining portion, then the sum of the partial bound and the partial correlation is also a bound on the final value of the cross-correlation between those images. Suppose cross-correlation is computed on a small portion of the image, of size u × v pixels, and the Cauchy-Schwarz bound is computed on the remaining image, i.e., from rows u + 1 to m and columns v + 1 to n; then the BPC bound is given by
\[
\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \le \sum_{x=1}^{u}\sum_{y=1}^{v} r(x,y)\,t(x,y) + \sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} t^2(x,y)}. \tag{3.47}
\]
The image portion on which cross-correlation is computed may be called the correlation-area, and the remaining image portion, on which the bound is computed, the bound-area. If the correlation-area is reduced to zero, the BPC bound reduces to the Cauchy-Schwarz inequality. As the correlation-area increases, the BPC bound moves towards the actual cross-correlation, and when the bound-area reduces to zero, the BPC bound exactly matches the cross-correlation between the two images.
The BPC bound may also be computed for normalized cross-correlation by dividing both sides of Equation 3.47 by the L2 norms of the images:

\[
\frac{\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y)}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}} \le
\frac{\sum_{x=1}^{u}\sum_{y=1}^{v} r(x,y)\,t(x,y) + \sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} t^2(x,y)}}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}}. \tag{3.48}
\]
The BPC bound has also been extended to the correlation coefficient, named Zero mean Normalized Cross Correlation (ZNCC) based image matching by Di Stefano et al. (2005). Subtracting the image means in Equation 3.48 yields one formulation of the BPC bound for the correlation coefficient (Di Stefano et al., 2005):
\[
\frac{\sum_{x=1}^{m}\sum_{y=1}^{n} (r(x,y)-\mu_r)(t(x,y)-\mu_t)}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (r(x,y)-\mu_r)^2}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (t(x,y)-\mu_t)^2}} \le
\frac{\sum_{x=1}^{u}\sum_{y=1}^{v} (r(x,y)-\mu_r)(t(x,y)-\mu_t) + \sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} (r(x,y)-\mu_r)^2}\;\sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} (t(x,y)-\mu_t)^2}}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (r(x,y)-\mu_r)^2}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (t(x,y)-\mu_t)^2}}. \tag{3.49}
\]
An alternate form of the BPC bound for the correlation coefficient may be obtained by substituting the bound upon cross-correlation given by Equation 3.47 into the correlation coefficient formulation given by Equation 3.29:

\[
\rho(r,t) \le
\frac{\sum_{x=1}^{u}\sum_{y=1}^{v} r(x,y)\,t(x,y) + \sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} r^2(x,y)}\;\sqrt{\sum_{x=u+1}^{m}\sum_{y=v+1}^{n} t^2(x,y)} - mn\,\mu_r\mu_t}
{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y) - mn\,\mu_r^2}\;\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y) - mn\,\mu_t^2}}. \tag{3.50}
\]
At a particular search location, after processing u × v pixels, either BPC bound may be compared with the currently known correlation maximum. If the BPC bound is found to be less than the currently known maximum, the elimination condition is satisfied, because the final value of the correlation coefficient is then guaranteed to be less than that maximum. Therefore, the remaining computations at the current search location become redundant and may be skipped without any loss of accuracy. If the comparison shows that the BPC bound is higher than the maximum, no decision can be made; the computation of correlation then proceeds for a few more pixels, after which the BPC bound is computed again and compared against the currently known maximum. The same process is repeated until the elimination condition succeeds or the computations at the current location are completed.
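A sketch of the BPC bound itself, as written in Equation 3.47 (plain Python with 0-based indices; the names are ours): exact correlation over the first u rows and v columns, plus a Cauchy-Schwarz term over the remaining bottom-right block:

```python
import math

def bpc_bound(r_patch, t, u, v):
    """Upper bound of Equation 3.47 for two m x n patches: partial
    cross-correlation over rows 0..u-1, columns 0..v-1, plus the
    Cauchy-Schwarz bound over rows u..m-1, columns v..n-1."""
    m, n = len(t), len(t[0])
    partial = sum(r_patch[x][y] * t[x][y]
                  for x in range(u) for y in range(v))
    rem_r2 = sum(r_patch[x][y] ** 2
                 for x in range(u, m) for y in range(v, n))
    rem_t2 = sum(t[x][y] ** 2
                 for x in range(u, m) for y in range(v, n))
    return partial + math.sqrt(rem_r2) * math.sqrt(rem_t2)
```

At u = v = 0 this degenerates to the loose Cauchy-Schwarz bound of Equation 3.45, and at u = m, v = n it equals the exact cross-correlation; in the elimination loop, u and v are advanced until either the bound falls below the currently known maximum or the correlation is complete.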
As the bound-area reduces, the BPC bound approaches the actual correlation value and becomes tighter; the probability of the elimination condition being satisfied increases accordingly. In order to reduce the chances of elimination condition failure, a sufficiently large correlation-area may be selected. However, selecting a large correlation-area incurs a large overhead of direct correlation computation, while selecting a small correlation-area makes the BPC bound significantly loose; this tradeoff is fundamental to the BPC strategy.
The suitable size of the correlation-area depends on the magnitude of the currently known maximum, or the initial threshold. In order to find a high initial threshold, any prior information about the location of the maximum may be utilized by starting the search process from the expected best match location. In the absence of any initial guess, the threshold has been found automatically by using a coarse-to-fine scheme (Di Stefano et al., 2005).
3.4.2 Partial Correlation Elimination Algorithms
Partial Correlation Elimination (PCE) algorithms are one of the important contribu-
tions of this thesis. These algorithms are in the category of partial elimination algo-
rithms and extend the concept of Partial Distortion Elimination (PDE) to correlation-
coefficient based fast template matching. PCE algorithms will be discussed in signif-
icant detail in Chapters 6 and 7.
3.4.3 Successive Elimination Algorithms
Successive Elimination Algorithms fall in the category of complete elimination algo-
rithms, because in these algorithms the bound on the match measure is computed
before starting the actual match measure computations. The elimination test consists
of comparison of the bound statistic with the previous known minimum or maximum.
If elimination condition is satisfied, the search location is eliminated from the search
space; otherwise complete computations are done on that search location. That is,
the elimination test is performed only once, and in case of unsuccessful elimination
test, no subsequent test is done. The basic successive elimination algorithm was
compatible with the definition of complete elimination algorithms, however various
extensions of this algorithm deviate from that definition. For example, in multilevel
successive elimination algorithm, the elimination test is performed multiple times
with increasing bound tightness and after execution of additional computations.
The original Successive Elimination Algorithm (SEA) was developed by Li and Salari (1995) for the Sum of Absolute Differences image match measure. The algorithm is based on the following lower bound on the SAD between two images r and t, from Equation 3.36:

$$\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl|r(x,y) - t(x,y)\bigr| \;\ge\; \Bigl|\sum_{x=1}^{m}\sum_{y=1}^{n} |r(x,y)| - \sum_{x=1}^{m}\sum_{y=1}^{n} |t(x,y)|\Bigr|. \quad (3.51)$$
Since image values are always non-negative, r(x, y) ≥ 0 and t(x, y) ≥ 0, Inequality 3.51 simplifies to:
$$\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl|r(x,y) - t(x,y)\bigr| \;\ge\; \Bigl|\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y) - \sum_{x=1}^{m}\sum_{y=1}^{n} t(x,y)\Bigr|. \quad (3.52)$$
The sums of all m × n blocks of r may be found efficiently using the running-sum approach. Search locations for which the SEA lower bound on SAD exceeds the previously known minimum may be skipped without any loss of accuracy. Since no approximation is involved in the SEA bound given by Equation 3.52, this computation skipping incurs no loss of accuracy.
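Under the stated assumptions, the SEA test of Equation 3.52 can be sketched with a summed-area table, which yields every window sum in constant time; the helper names below are illustrative, not taken from the cited papers:

```python
def integral_image(img):
    """Summed-area table: S[i][j] = sum of img[0:i][0:j]."""
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            S[i + 1][j + 1] = img[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]
    return S

def window_sum(S, x, y, m, n):
    """Sum of the m-by-n window whose top-left pixel is (x, y)."""
    return S[x + m][y + n] - S[x][y + n] - S[x + m][y] + S[x][y]

def sea_candidates(img, template, best_sad):
    """Yield locations surviving the SEA test |sum_r - sum_t| <= best_sad."""
    m, n = len(template), len(template[0])
    t_sum = sum(map(sum, template))
    S = integral_image(img)
    for x in range(len(img) - m + 1):
        for y in range(len(img[0]) - n + 1):
            if abs(window_sum(S, x, y, m, n) - t_sum) <= best_sad:
                yield (x, y)  # the actual SAD must still be computed here
```

Locations pruned by the test can never hold a SAD below `best_sad`, so the survivors are exactly the locations where the full SAD computation is still required.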
The Successive Elimination Algorithm has also been extended to the Euclidean distance based image match measure by Wang and Mersereau (1999). The lower bound on the Euclidean distance is based on the relationship between SAD and Euclidean distance:
$$\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl|r(x,y) - t(x,y)\bigr| \;\le\; \sqrt{mn}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl(r(x,y) - t(x,y)\bigr)^2}. \quad (3.53)$$
Therefore, combining this with Equation 3.52, the Euclidean distance admits the same sum-difference lower bound as SAD:
$$\Bigl|\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y) - \sum_{x=1}^{m}\sum_{y=1}^{n} t(x,y)\Bigr| \;\le\; \sqrt{mn}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl(r(x,y) - t(x,y)\bigr)^2}. \quad (3.54)$$
Any search location where the absolute difference between the search-location sum and the template sum exceeds the previously known minimum may be skipped without any loss of accuracy.
The basic SEA algorithm has been extended by several researchers. For example, Gao et al. (2000) developed a Multilevel SEA algorithm (MSEA). Since the number of eliminated search locations depends upon the tightness of the lower bound, MSEA makes the bound tighter by computing the norm values on smaller sub-blocks. As an example, if the images to be matched, r and t, are divided into four sub-blocks and norms are computed over each block independently, then the sum of these partial bounds is tighter than the original bound:
$$\Bigl|\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y) - \sum_{x=1}^{m}\sum_{y=1}^{n} t(x,y)\Bigr| \;\le\; \Bigl|\sum_{x=1}^{m/2}\sum_{y=1}^{n/2} r(x,y) - \sum_{x=1}^{m/2}\sum_{y=1}^{n/2} t(x,y)\Bigr| + \Bigl|\sum_{x=m/2+1}^{m}\sum_{y=1}^{n/2} r(x,y) - \sum_{x=m/2+1}^{m}\sum_{y=1}^{n/2} t(x,y)\Bigr| + \Bigl|\sum_{x=1}^{m/2}\sum_{y=n/2+1}^{n} r(x,y) - \sum_{x=1}^{m/2}\sum_{y=n/2+1}^{n} t(x,y)\Bigr| + \Bigl|\sum_{x=m/2+1}^{m}\sum_{y=n/2+1}^{n} r(x,y) - \sum_{x=m/2+1}^{m}\sum_{y=n/2+1}^{n} t(x,y)\Bigr|. \quad (3.55)$$
If the block size is made even smaller, the bound becomes tighter still; in the limiting case where each block consists of a single pixel, the bound approaches the actual value of SAD. In the MSEA algorithm, the minimum block size used is 2 × 2 pixels. The first elimination test is performed with full-block norms. If the test is successful, the search location is skipped; if it is unsuccessful, the block width and height are halved and norms are computed over the four resulting blocks. If the elimination test is still unsuccessful, further subdivision proceeds in the same way until the blocks reach 2 × 2 pixels. An earlier paper by Lee and Chen (1997) had already formulated a very similar algorithm under the name Block Sum Pyramid, which is essentially the same formulation as MSEA. This algorithm was further improved by Ahn et al. (2004), who developed more effective schemes for detecting unsuitable blocks.
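A rough sketch of this multilevel test follows, with recursive quadrant splitting down to a minimum block side. The names and the stopping rule are illustrative simplifications of MSEA, and the per-level block sums would in practice come from precomputed running sums:

```python
def block_sum(img, x0, y0, x1, y1):
    """Sum of img over rows x0..x1-1 and columns y0..y1-1."""
    return sum(img[i][j] for i in range(x0, x1) for j in range(y0, y1))

def msea_eliminates(r, t, best_sad, min_side=2):
    """Multilevel SEA test: return True if window r can be pruned, i.e. some
    level's lower bound on SAD(r, t) already exceeds best_sad."""
    m, n = len(r), len(r[0])
    blocks = [(0, 0, m, n)]            # level 0: the whole block
    while blocks:
        lower = sum(abs(block_sum(r, *b) - block_sum(t, *b)) for b in blocks)
        if lower > best_sad:
            return True                # bound exceeds best known SAD: prune
        next_blocks = []
        for (x0, y0, x1, y1) in blocks:
            if x1 - x0 < 2 * min_side or y1 - y0 < 2 * min_side:
                return False           # cannot refine further; compute full SAD
            xm, ym = (x0 + x1) // 2, (y0 + y1) // 2
            next_blocks += [(x0, y0, xm, ym), (xm, y0, x1, ym),
                            (x0, ym, xm, y1), (xm, ym, x1, y1)]
        blocks = next_blocks
    return False
```

Note how two windows with equal total sums but different quadrant sums survive the level-0 test yet are pruned at level 1, which is exactly the tightening MSEA exploits.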
Zhu et al. (2005) reduced the granularity of MSEA by developing the Fine Granularity Successive Elimination (FGSE) algorithm. In FGSE, the gap between two MSEA levels is reduced by introducing intermediate levels. Moreover, the starting level is decided based on the elimination level of the neighboring blocks: if most of the neighboring blocks were eliminated at level 3, then matching of the current block also starts from the third level, which reduces the computational cost of unsuccessful elimination tests at the coarser levels.
Another related way to speed up image matching by minimization of Sum of Squared Differences (SSD) is the 'Projection Kernels' algorithm proposed by Hel-Or and Hel-Or (2003, 2005). The Projection Kernels algorithm was motivated by the real-time object detection scheme of Viola and Jones (2001, 2004), in which summations over image sub-blocks were computed very efficiently using the integral-image approach. In the Projection Kernels algorithm, the sums of different image partitions are obtained by projecting the images onto Walsh Hadamard basis vectors. Most of the computation elimination is obtained by the full-image sum comparison, which is very similar to the SEA algorithms of Li and Salari (1995) and Wang and Mersereau (1999). If a search location is not eliminated by the image sum comparison, more projections are computed to make the lower bound on SSD tighter. This concept is quite similar to MSEA (Gao et al., 2000; Lee and Chen, 1997), where tight lower bounds were obtained by partitioning the images into smaller blocks. Pattern matching with the Projection Kernels algorithm was compared with a naive FFT implementation and a speedup of two orders of magnitude was claimed, but no comparison was made with the more relevant MSEA implementations.
3.4.4 Enhanced Bounded Partial Correlation Elimination Algorithm
The Enhanced Bounded Partial Correlation (EBC) elimination algorithm was developed by Mattoccia et al. (2008a,b) for cross-correlation, NCC and correlation-coefficient based image match measures. The EBC algorithm extends the Bounded Partial Correlation (BPC) elimination algorithm in much the same way that Multilevel SEA (MSEA) extends the basic SEA algorithm. It is based on an increasingly tight upper bound on correlation, obtained by computing the bound on smaller image partitions. It falls in the category of complete elimination algorithms because the bound statistic is compared with the previously known correlation maximum before the actual correlation computations start. If the bound is found to be less than the previously known maximum, all computations at that search location are skipped. However, if the correlation maximum is found to be less than the bound, partial elimination tests follow. The EBC algorithm may therefore also be considered a cascade of complete and partial elimination algorithms.
The cross-correlation between two images is equivalent to the inner product of two vectors, and the inner product is bounded from above by the Cauchy Schwartz (CS) inequality. The CS inequality yields the maximum possible value of cross-correlation, so a correlation maximum higher than the CS bound can never be found. As a result, the bound computed by the CS inequality alone cannot be used for computation elimination. However, just like the bound-tightening process used in the MSEA algorithm, the CS bound may be tightened by dividing the two images to be matched into smaller partitions, computing the CS bound on each pair of corresponding partitions, and taking the final bound as the sum of the partition bounds. The final bound may hence be termed the Multi-level Cauchy Schwartz (MCS) bound, in parallel with the Multilevel SEA (MSEA) bound. The MSEA bound is actually more generic than the MCS bound currently used in the EBC and ZEBC algorithms, because the MCS bound is computed at only one level, while the MSEA bound is evaluated at multiple levels. This is because the MSEA bound requires only summations, while the MCS bound requires square-root operations, which are very costly and make the MCS bound computationally expensive.
To understand the formulation of the MCS bound, consider the problem of matching two images r and t, each of size m × n pixels and each divided into small non-overlapping partitions of size ∆x × ∆y pixels, such that 1 ≤ ∆x ≤ m and 1 ≤ ∆y ≤ n. The total number of partitions in each image is (mn)/(∆x∆y). The MCS inequality may be given by:

$$\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;\le\; \sum_{j=0}^{m/\Delta x - 1}\sum_{k=0}^{n/\Delta y - 1} \sqrt{\sum_{x=j\Delta x+1}^{(j+1)\Delta x}\sum_{y=k\Delta y+1}^{(k+1)\Delta y} r^2(x,y)}\,\sqrt{\sum_{x=j\Delta x+1}^{(j+1)\Delta x}\sum_{y=k\Delta y+1}^{(k+1)\Delta y} t^2(x,y)} \;\le\; \sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}. \quad (3.56)$$
One may easily observe from this equation that if ∆x = 1 and ∆y = 1, then MCS
bound will exactly match cross-correlation value:
$$\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;=\; \sum_{j=0}^{m-1}\sum_{k=0}^{n-1} \sqrt{\sum_{x=j+1}^{j+1}\sum_{y=k+1}^{k+1} r^2(x,y)}\,\sqrt{\sum_{x=j+1}^{j+1}\sum_{y=k+1}^{k+1} t^2(x,y)}, \quad (3.57)$$
which shows that if each partition has just one pixel, the MCS bound becomes equal to the actual inner product value. As the partition size increases, the MCS bound moves towards the CS bound, and in the limiting case of ∆x = m and ∆y = n, the MCS bound exactly matches the CS inequality bound:
$$\sum_{j=0}^{0}\sum_{k=0}^{0} \sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)} \;=\; \sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} t^2(x,y)}. \quad (3.58)$$
Hence there is an inherent tradeoff to be balanced: selecting a large number of partitions increases the number of square-root operations and thus the cost of computing the MCS bound, while selecting very large partition sizes renders the MCS bound too loose to generate any elimination. A suitable partition size is therefore critical, and Mattoccia et al. (2008a) proposed an algorithm to select the number of partitions automatically.
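A direct transcription of the MCS bound of Equation 3.56 might look as follows; the names are illustrative, and a real implementation would precompute the partition energies with running sums rather than loop over pixels at each search location:

```python
import math

def mcs_bound(r, t, dx, dy):
    """Multi-level Cauchy Schwartz upper bound on cross-correlation, with
    non-overlapping dx-by-dy partitions (dx and dy must divide m and n)."""
    m, n = len(r), len(r[0])
    assert m % dx == 0 and n % dy == 0
    bound = 0.0
    for j in range(0, m, dx):
        for k in range(0, n, dy):
            # Energy of the corresponding partitions of both images
            e_r = sum(r[x][y] ** 2 for x in range(j, j + dx) for y in range(k, k + dy))
            e_t = sum(t[x][y] ** 2 for x in range(j, j + dx) for y in range(k, k + dy))
            bound += math.sqrt(e_r) * math.sqrt(e_t)
    return bound
```

With ∆x = ∆y = 1 (and non-negative pixel values) the bound equals the cross-correlation itself, while with ∆x = m and ∆y = n it reduces to the plain CS bound, reflecting the tradeoff just described.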
In the EBC algorithm, the MCS bound is computed at each search location for suitable values of ∆x and ∆y. To reduce the computation cost, Mattoccia et al. (2008a) recommend ∆x = m/8 and ∆y = n. That is, each image is divided into 8 partitions along the rows, with no partitioning along the columns. For these settings, the MCS bound reduces to
$$\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;\le\; \sum_{j=0}^{7} \sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} t^2(x,y)}. \quad (3.59)$$
One limitation of this approach is the assumption that the number of rows in the template image is divisible by 8. For template sizes not divisible by 8, one may choose a suitable number of partitions such that all partitions are of equal size. However, if the number of rows is prime, the only available factor is 1; that is, each partition consists of a single image row.
For very small partition sizes, the MCS bound becomes very tight, and the complete elimination test may succeed at a very large number of search locations. However, the cost of computing the MCS bound may become significant for small partition sizes. For small to medium sized templates, the cost of computing the MCS bound with ∆x = 1 and ∆y = n exceeds the direct computational cost of cross-correlation; the EBC algorithm then becomes slower than exhaustive spatial domain implementations of correlation. In our experiments, we have observed that the EBC algorithm performs best for template sizes in the range of 64 × 64, 72 × 72 and 80 × 80 pixels (Mahmood and Khan, 2011).
At a particular search location, if the MCS bound given by Equation 3.59 is found to be smaller than the current known correlation maximum, complete computations at that location are skipped without any loss of accuracy. On the other hand, if the bound is larger than the known maximum, the bound is tightened by removing the first partition from the bounded area and computing cross-correlation over that partition. The resulting enhanced BPC bound may be written as:
$$\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;\le\; \sum_{x=1}^{m/8}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;+\; \sum_{j=1}^{7} \sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} t^2(x,y)}. \quad (3.60)$$
The bound is again compared with the current known maximum and, if found to be less, the remaining computations may be skipped. Alternatively, if the bound is still larger than the known maximum, more partitions are included in the correlation area and excluded from the bound area. In general, if cross-correlation has been computed over the first p partitions, the enhanced BPC bound may be written as
$$\sum_{x=1}^{m}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;\le\; \sum_{x=1}^{pm/8}\sum_{y=1}^{n} r(x,y)\,t(x,y) \;+\; \sum_{j=p}^{7} \sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} r^2(x,y)}\,\sqrt{\sum_{x=jm/8+1}^{(j+1)m/8}\sum_{y=1}^{n} t^2(x,y)}. \quad (3.61)$$
The same bound may easily be extended to NCC and to the correlation coefficient, as mentioned in the discussion of the BPC algorithm.
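A sketch of the progressively tightened bound of Equation 3.61, assuming the template height is divisible by the number of row partitions; the function and parameter names are hypothetical:

```python
import math

def ebc_bound(r, t, p, parts=8):
    """Enhanced BPC upper bound in the style of Equation 3.61: exact
    cross-correlation on the first p row-partitions, plus Cauchy Schwartz
    bounds on the remaining ones."""
    m, n = len(r), len(r[0])
    h = m // parts  # rows per partition; assumes parts divides m
    total = 0.0
    for j in range(parts):
        rows = range(j * h, (j + 1) * h)
        if j < p:   # partition already replaced by exact correlation
            total += sum(r[x][y] * t[x][y] for x in rows for y in range(n))
        else:       # partition still bounded via Cauchy Schwartz
            e_r = sum(r[x][y] ** 2 for x in rows for y in range(n))
            e_t = sum(t[x][y] ** 2 for x in rows for y in range(n))
            total += math.sqrt(e_r) * math.sqrt(e_t)
    return total
```

The bound is non-increasing in p and equals the exact cross-correlation when p reaches the number of partitions, which is what allows the elimination test to be retried after each tightening step.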
As already mentioned, the MCS bound suffers from the high overhead of the square-root operation. As the partition size decreases, the number of square-root operations increases, with a corresponding increase in the bound computation cost. For templates whose number of rows is prime, the cost of computing the MCS bound is significant, which makes the EBC algorithm quite slow. For small templates, with sizes in the range of 4 × 4 to 15 × 15 pixels, the number of partitions must equal the number of rows, which incurs significant computational cost and makes these algorithms slower than exhaustive spatial domain implementations; the EBC and ZEBC algorithms are therefore no longer a reasonable choice at these sizes. For templates with an even number of rows such as 16, 18, 20 or 22, a partition size of 2 may be selected, whereas for a prime number of rows such as 17, 19 or 23, a partition size of 1 again has to be selected. For m = 21, a partition size of 7 may reduce the computational cost. Similarly, for m = 24, 25, 26, 27, 28, 29, 30, 31 and 32, suitable partition sizes are 8, 5, 13, 9, 7, 1, 6, 1 and 8 respectively. For very large partition sizes, for example m = 26 with a partition size of 13, there are only two partitions and the MCS bound moves closer to the CS bound, reducing the amount of eliminated computation.
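The partition sizes quoted above for m = 21 and m = 24 to 32 are consistent with one plausible rule: pick the proper divisor of m closest to 8. The helper below is a hypothetical illustration of that heuristic, not an algorithm from the cited papers (other choices in the text, such as partition size 2 for m = 16, follow different considerations):

```python
def partition_size(m, target=8):
    """Pick a row-partition size for EBC-style bounds: the proper divisor of
    m closest to `target`, with ties broken toward the smaller divisor."""
    divisors = [d for d in range(1, m) if m % d == 0]
    return min(divisors, key=lambda d: (abs(d - target), d))
```

For a prime m the only proper divisor is 1, reproducing the costly single-row partitioning discussed above.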
3.4.5 Transitive Elimination Algorithms
A major contribution of this thesis is the development of Transitive Elimination Algorithms (Mahmood and Khan, 2007b, 2008, 2010), which fall in the category of complete elimination algorithms for cross-correlation, NCC and correlation-coefficient based image match measures. Transitive Elimination Algorithms are discussed in significant detail in Chapters 4 and 5.
3.4.6 Chapter Summary
In this chapter we have discussed the organization of image match measure computation techniques. These techniques are broadly divided into two categories: approximate accuracy and exhaustive accuracy techniques. Approximate accuracy algorithms are further divided into those approximating the search space with a smaller search space and those approximating the match measure and images with simpler versions. Approximate search space algorithms were further divided into large search space and small search space algorithms. Large search space approximate algorithms include the coarse-to-fine approach and two-stage template matching, while small search space approximate algorithms include three step search, two dimensional logarithmic search, four step search, diamond search and other algorithms. The second category of approximate algorithms includes polynomial image approximation, binary image approximation, approximation with a rectangular filter basis, and approximation with Walsh Transform basis functions (Figure 3.3).
The exhaustive equivalent accuracy techniques have two main approaches: frequency domain and spatial domain techniques. Frequency domain techniques include FFT based correlation computation and phase-only correlation. Spatial domain algorithms are further subdivided into two classes: complete computation algorithms and bound based computation elimination algorithms. Complete computation algorithms have been made efficient by reformulating the match measure and by pre-computing the normalization parameters with the running-sum or integral-image approach. Bound based computation elimination techniques, the main topic of this thesis, are divided into two types: complete elimination and partial elimination algorithms. Partial elimination algorithms include Partial Distortion Elimination (PDE), Bounded Partial Correlation (BPC) elimination and Partial Correlation Elimination (PCE) algorithms. Complete elimination algorithms include the Successive Elimination Algorithm (SEA), Multilevel SEA (MSEA), Enhanced
[Figure 3.3 shows a taxonomy tree of image matching algorithms:]

Image Matching Algorithms
- Fast Approximate Accuracy Algorithms
  - Approximate Search Space Algorithms
    - Large Search Space Algorithms: Coarse-to-Fine / Hierarchical Block Matching; Two-Stage Template Matching
    - Small Search Space Algorithms: Three Step Search; New Three Step Search; Two Dimensional Logarithmic Search; Cross Search; Four Step Search; Orthogonal Search; Conjugate Direction Search; Diamond Search; Modified Motion Estimation; Correlation Adaptive Predictive Search; Block Matching using Walsh Transform
  - Approximate Image Representation Algorithms: Polynomial Image Approximation; Eigen Image Approximation; Binary Image Approximation; Sum of Rectangular Basis Functions
- Fast Exhaustive Accuracy Algorithms
  - Frequency Domain Algorithms: Convolution Theorem; Phase-Only Correlation
  - Spatial Domain Algorithms
    - Complete Computation Algorithms: Efficient Rearrangement; Efficient Pre-computation
    - Computation Elimination Algorithms
      - Complete Elimination Algorithms: Successive Elimination Algorithms; Multi-scale SEA; Fine Granularity SEA; Enhanced Bounded Correlation; Pattern Matching with Projection Kernels; Transitive Elimination Algorithms
      - Partial Elimination Algorithms: SSDA or Partial Distortion Elimination; Bounded Partial Correlation Elimination; Partial Correlation Elimination Algorithm

Figure 3.3: An Organization of Image Match Measure Computation Algorithms.
Bounded Correlation (EBC) elimination algorithm and Transitive Elimination Algorithms (TEA). Since the Partial Correlation Elimination algorithms and the Transitive Elimination Algorithms constitute major contributions of this thesis, both are discussed in significant detail in Chapters 4 to 7.
Chapter 4
TRANSITIVE BOUNDS ON THE CORRELATION
BASED MEASURES
Due to their guaranteed exhaustive equivalent accuracy, bound based computation elimination algorithms constitute an important part of image matching techniques and one of the main topics of this thesis. In Chapter 3, two categories of elimination algorithms were discussed: partial elimination algorithms and complete elimination algorithms. In this chapter we focus on developing complete elimination algorithms for correlation based similarity measures. In this category, the elimination test is performed before the match measure computation starts, and if the test is successful, complete computations at the current search location are skipped without any loss of accuracy.
Complete elimination algorithms have been well investigated for match measures such as Sum of Squared Differences (SSD) and Sum of Absolute Differences (SAD) (Li and Salari, 1995; Gao et al., 2000; Lee and Chen, 1997; Zhu et al., 2005; Ahn et al., 2004; Wang and Mersereau, 1999; Kawanishi et al., 2004; Brunig and Niehsen, 2001; Cheung and Po, 2003). However, for correlation based measures, which include cross-correlation, Normalized Cross Correlation (NCC) and the correlation coefficient, only limited effort in this regard is found in the literature. This is because complete elimination algorithms require a tight upper bound on correlation that is also computable at low cost; otherwise the benefit of computation elimination may be eroded by the overhead of the bound computation. The well known bound on correlation based on the Cauchy Schwartz inequality is too loose to yield any computation elimination. Therefore, to the best of our knowledge, only one algorithm, proposed by Mattoccia et al. (2008b), is found in the literature; it tightens the Cauchy Schwartz bound by using a partitioning technique. In this technique, the Cauchy Schwartz inequality is computed over smaller partitions of the two images to be matched and the final bound is computed as the sum of the bounds for all partitions (see Chapter 3 for more details of this algorithm). The bound computed by this partitioning technique may become tight enough to yield elimination, but it requires a large number of square-root operations, whose high computational complexity causes a significant bound-computation overhead. Therefore, this bound provides limited speedup for small and medium sized templates, as well as for templates whose number of rows is a multiple of a large prime number. In contrast, in this chapter we present transitive bounds on correlation based image match measures which have low computational complexity, and we develop methods to make them tight enough to produce significant computation elimination.
The best known direct bounds on the correlation coefficient are either too loose to generate any computation elimination or have very high computational cost. While searching for direct bounds on correlation based measures, we discovered a special type of bound, which we named transitive bounds. To the best of our knowledge, transitive bounds had not previously been used to speed up correlation based template matching. Their use is motivated by the fact that we were unable to find any direct bounds on correlation that are tight enough to yield computation elimination while having low computational overhead. We explored the transitive bounds in detail and discovered conditions under which they remain tight enough to yield significant elimination. Moreover, we developed fast and efficient algorithms for bound computation, which significantly reduce the computational overhead of these bounds.
Since correlation based image match measures are geometric similarity measures, they can be related to geometric distance measures, including the Euclidean and angular distance measures. Both Euclidean distance and angular distance, being metrics, are non-negative and symmetric and follow the triangular inequality of distance measures. The relationship between correlation based measures and Euclidean distance may be used to transform the triangular inequality for Euclidean distance into transitive bounds on the correlation based measures. Similarly, exploiting the relationship between the angular distance measure and correlation based measures, the triangular inequality for angular distance may be transformed into another formulation of transitive bounds on the correlation based measures. Tight transitive bounds are more useful from the computation elimination perspective. The transitive bounds derived from the two different triangular inequalities vary in tightness; we show theoretically that the bounds based on the angular distance measure are tighter than those based on the Euclidean distance measure.
In this chapter, we analyze the tightness characteristics of the angular distance based transitive bounds in detail and define the conditions under which both the upper and the lower transitive bounds become tight, the conditions under which only the upper bound becomes tight while the lower bound remains loose, and the conditions under which both bounds remain loose. The angular distance based transitive bounds and the tightness conditions are exploited for the development of transitive elimination algorithms in the following chapter.
4.1 Derivation of Angular Distance Based Transitive Bounds
Let r1 and r2 be two image blocks, each of size m × n pixels, and let ψ1,2 be the cross-correlation between them:

$$\psi_{1,2} = \sum_{i=0}^{m-1}\sum_{j=0}^{n-1} r_1(i,j)\,r_2(i,j). \quad (4.1)$$
r1 and r2 may also be considered as vectors in R^{m×n}. Let θ1,2 be the angular distance between these vectors. Using the definition of the scalar product, θ1,2 can be related to the cross-correlation ψ1,2:

$$\theta_{1,2} = \cos^{-1}\frac{\psi_{1,2}}{\|r_1\|_2\,\|r_2\|_2}, \quad (4.2)$$
where ‖·‖2 denotes the L2 norm. The angular distance is symmetric, θ1,2 = θ2,1, and bounded between 0° and 180°. In addition, the angular distance follows the triangular inequality of distance measures (Mahmood and Khan, 2007b); that is, for three
image blocks r1, r2 and r3 (Figure 4.1):
θ1,2 + θ2,3 ≥ θ1,3 ≥ |θ1,2 − θ2,3| (4.3)
where θ1,3 is the angular distance between r1 and r3, and θ2,3 is the angular distance between r2 and r3. The minimum and maximum angular distances between r1 and r3 occur when r3 lies in the same plane as r1 and r2 (see Figure 4.1). Therefore the upper and lower triangular bounds are also bounded between 0° and 180°, and the triangular inequality in Equation 4.3 may be written as:
min{360 ◦ − (θ1,2 + θ2,3), (θ1,2 + θ2,3)} ≥ θ1,3 ≥ |θ1,2 − θ2,3| (4.4)
To link this inequality to correlation, we observe that the cosine function decreases monotonically from +1 to −1 as θ varies from 0° to 180°. Taking the cosine of the triangular inequality, we get the basic form of the transitive inequality:
cos(θ1,2 + θ2,3) ≤ cos(θ1,3) ≤ cos(θ1,2 − θ2,3). (4.5)
This may be rearranged using trigonometric identities to:
$$\cos\theta_{1,2}\cos\theta_{2,3} - \sqrt{1-\cos^2\theta_{1,2}}\,\sqrt{1-\cos^2\theta_{2,3}} \;\le\; \cos\theta_{1,3} \;\le\; \cos\theta_{1,2}\cos\theta_{2,3} + \sqrt{1-\cos^2\theta_{1,2}}\,\sqrt{1-\cos^2\theta_{2,3}}. \quad (4.6)$$
Multiplying this inequality by (‖r1‖2‖r2‖2)(‖r2‖2‖r3‖2) and simplifying using Equation 4.2, we get the transitive inequality for cross-correlation:

$$\frac{\psi_{1,2}\psi_{2,3} - \sqrt{(\|r_1\|_2\|r_2\|_2)^2 - \psi_{1,2}^2}\,\sqrt{(\|r_2\|_2\|r_3\|_2)^2 - \psi_{2,3}^2}}{(\|r_2\|_2)^2} \;\le\; \psi_{1,3} \;\le\; \frac{\psi_{1,2}\psi_{2,3} + \sqrt{(\|r_1\|_2\|r_2\|_2)^2 - \psi_{1,2}^2}\,\sqrt{(\|r_2\|_2\|r_3\|_2)^2 - \psi_{2,3}^2}}{(\|r_2\|_2)^2}. \quad (4.7)$$
This inequality provides transitive bounds on cross-correlation between r1 and r3, if
cross-correlation between r1 and r2 and that between r2 and r3 is already known.
Cross-correlation is often used in its normalized form to remove its bias towards brighter regions. Normalized Cross-Correlation (NCC) between image blocks r1 and r2 is defined as:

$$\phi_{1,2} = \frac{\psi_{1,2}}{\|r_1\|_2\,\|r_2\|_2}. \quad (4.8)$$
The angular distance between two image blocks may also be written in terms of NCC as θ1,2 = cos−1(φ1,2). The transitive inequality given by Equation 4.7 becomes, for NCC:

$$\phi_{1,2}\phi_{2,3} - \sqrt{1-\phi_{1,2}^2}\,\sqrt{1-\phi_{2,3}^2} \;\le\; \phi_{1,3} \;\le\; \phi_{1,2}\phi_{2,3} + \sqrt{1-\phi_{1,2}^2}\,\sqrt{1-\phi_{2,3}^2}. \quad (4.9)$$
This inequality yields transitive bounds on NCC between image blocks r1 and r3, if
NCC between r1 and r2 and that between r2 and r3 is already known.
NCC is robust to contrast variations, but not to brightness variations. A more robust measure, invariant to all linear changes in the signal, is the correlation coefficient, defined as:

$$\rho_{1,2} = \frac{\psi_{1,2} - mn\mu_1\mu_2}{\|r_1-\mu_1\|_2\,\|r_2-\mu_2\|_2}, \quad (4.10)$$
where µ1 and µ2 are the means of r1 and r2 respectively. The correlation coefficient can also be written in terms of an angular distance as ρ1,2 = cos(θ̄1,2), where θ̄1,2 is the angular distance between r1 − µ1 and r2 − µ2. The transitive inequality in terms of θ̄ can be derived by following the same steps as for θ, and yields:

$$\cos(\bar\theta_{1,2} + \bar\theta_{2,3}) \;\le\; \cos(\bar\theta_{1,3}) \;\le\; \cos(\bar\theta_{1,2} - \bar\theta_{2,3}). \quad (4.11)$$
This can be expanded into the transitive inequality for the correlation coefficient:

$$\rho_{1,2}\rho_{2,3} - \sqrt{1-\rho_{1,2}^2}\,\sqrt{1-\rho_{2,3}^2} \;\le\; \rho_{1,3} \;\le\; \rho_{1,2}\rho_{2,3} + \sqrt{1-\rho_{1,2}^2}\,\sqrt{1-\rho_{2,3}^2}. \quad (4.12)$$

This inequality gives bounds on the correlation coefficient between image blocks r1 and r3, if the values of ρ1,2 and ρ2,3 are known.
Figure 4.1: Triangular inequality for the angular distance measure: (a) Image blocks r1, r2 and r3 represented as vertices, with the angular distances between them shown as the edges of a triangle. (b) θ1,3 depends on the angle between planes π and π′. (c)-(d) θ1,3 attains its maximum θ1,2 + θ2,3 when φπ,π′ = 180° and its minimum |θ1,2 − θ2,3| when φπ,π′ = 0°.
In the statistics literature, angular distance based transitive bounds on the correlation coefficient have been very briefly mentioned by Sigley and Stratton (1942) and Langford et al. (2001). However, a comprehensive mathematical treatment, analysis, and demonstration of their practical utility for speeding up template matching had not been carried out before this work. The emphasis of most researchers in statistics has been on the fact that a positive correlation coefficient is not transitive; see, for example, Sotos et al. (2007, 2009). The notion of transitivity assumed by these authors is that if r1 and r2 are positively correlated and r2 and r3 are positively correlated, it does not follow that r1 and r3 are also positively correlated. This result may also be seen from Equation 4.12 by substituting, for example, ρ1,2 = ρ2,3 = 0.50, which gives −0.50 ≤ ρ1,3 ≤ 1.00; that is, ρ1,3 may turn out to be negative. This shows that we need to identify ranges of ρ1,2 and ρ2,3 in which the upper and lower bounds remain close to each other, i.e., in which the bounds remain tight. The tightness of the transitive bounds is discussed in detail in Sections 4.3 and 4.4.
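The bounds of Equation 4.12 are easy to verify numerically. The following sketch checks them on random vectors and reproduces the ρ1,2 = ρ2,3 = 0.50 example; the helper names are illustrative:

```python
import math
import random

def corrcoef(a, b):
    """Correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (math.sqrt(sum((x - ma) ** 2 for x in a))
           * math.sqrt(sum((y - mb) ** 2 for y in b)))
    return num / den

def transitive_bounds(r12, r23):
    """Transitive (lower, upper) bounds on rho_{1,3} given rho_{1,2}, rho_{2,3}."""
    s = math.sqrt(1 - r12 ** 2) * math.sqrt(1 - r23 ** 2)
    return r12 * r23 - s, r12 * r23 + s

# The bounds must contain the true rho_{1,3} for any three signals.
random.seed(0)
for _ in range(100):
    v1 = [random.gauss(0, 1) for _ in range(16)]
    v2 = [random.gauss(0, 1) for _ in range(16)]
    v3 = [random.gauss(0, 1) for _ in range(16)]
    lo, hi = transitive_bounds(corrcoef(v1, v2), corrcoef(v2, v3))
    assert lo - 1e-9 <= corrcoef(v1, v3) <= hi + 1e-9
```

Calling `transitive_bounds(0.5, 0.5)` gives approximately (−0.5, 1.0), matching the non-transitivity example above.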
4.2 Derivation of Euclidean Distance Based Transitive Bounds
In the previous section, we exploited the link between correlation based match measures and angular distance to derive transitive inequalities for correlation. A different set of transitive inequalities may also be derived by exploiting the relationship between correlation and Euclidean distance based measures. For Euclidean distance based image match measures, the image blocks r1, r2 and r3 may be considered as points in R^{m×n}. Let ∆1,2 be the Euclidean distance between r1 and r2; from Equation 2.8,

$$\Delta_{1,2} = \sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} \bigl(r_1(x,y) - r_2(x,y)\bigr)^2}. \quad (4.13)$$

Similarly, let ∆1,3 be the Euclidean distance between r1 and r3, and ∆2,3 the Euclidean distance between r2 and r3.
The Euclidean distance, being a metric, satisfies the three properties of distance measures. Non-negativity states that all distances are non-negative: ∆1,2 ≥ 0, ∆1,3 ≥ 0, ∆2,3 ≥ 0. Symmetry requires that ∆1,2 = ∆2,1, ∆1,3 = ∆3,1, ∆2,3 = ∆3,2. The triangular inequality requires that:
|∆1,2 −∆2,3| ≤ ∆1,3 ≤ ∆1,2 + ∆2,3. (4.14)
Squaring all sides of the inequality,

$$(\Delta_{1,2} - \Delta_{2,3})^2 \;\le\; \Delta_{1,3}^2 \;\le\; (\Delta_{1,2} + \Delta_{2,3})^2, \quad (4.15)$$

which may be written as

$$\Delta_{1,2}^2 + \Delta_{2,3}^2 - 2\Delta_{1,2}\Delta_{2,3} \;\le\; \Delta_{1,3}^2 \;\le\; \Delta_{1,2}^2 + \Delta_{2,3}^2 + 2\Delta_{1,2}\Delta_{2,3}. \quad (4.16)$$
To relate Euclidean distance to cross-correlation, we note from Equation 2.36 that

$$\Delta_{1,2}^2 = \Delta_{1,1}^2 + \Delta_{2,2}^2 - 2\psi_{1,2}, \quad (4.17)$$

where ∆²1,1 and ∆²2,2 denote the squared Euclidean norms, or magnitudes, of the respective images.
Substituting the Euclidean distances, expressed in terms of Euclidean norms and cross-correlations, into the triangular inequality of Equation 4.16 yields

$$\Delta_{1,1}^2 + 2\Delta_{2,2}^2 + \Delta_{3,3}^2 - 2\psi_{1,2} - 2\psi_{2,3} - 2\sqrt{(\Delta_{1,1}^2 + \Delta_{2,2}^2 - 2\psi_{1,2})(\Delta_{2,2}^2 + \Delta_{3,3}^2 - 2\psi_{2,3})} \;\le\; \Delta_{1,1}^2 + \Delta_{3,3}^2 - 2\psi_{1,3} \;\le\; \Delta_{1,1}^2 + 2\Delta_{2,2}^2 + \Delta_{3,3}^2 - 2\psi_{1,2} - 2\psi_{2,3} + 2\sqrt{(\Delta_{1,1}^2 + \Delta_{2,2}^2 - 2\psi_{1,2})(\Delta_{2,2}^2 + \Delta_{3,3}^2 - 2\psi_{2,3})}. \quad (4.18)$$
The first part of the inequality yields the upper bound on cross correlation:
(ψ2,3 + ψ1,2 − ∆2,2²) + √[(∆1,1² + ∆2,2² − 2ψ1,2)(∆2,2² + ∆3,3² − 2ψ2,3)] ≥ ψ1,3,    (4.19)
and the second part of the inequality yields the lower bound on cross correlation
ψ1,3 ≥ (ψ2,3 + ψ1,2 − ∆2,2²) − √[(∆1,1² + ∆2,2² − 2ψ1,2)(∆2,2² + ∆3,3² − 2ψ2,3)].    (4.20)
Similar inequalities may also be derived for Normalized Cross-Correlation (NCC), as given
by Equation 4.8. NCC is the cross-correlation between two unit-magnitude normalized
images. Since the Euclidean norm of a unit-magnitude normalized image is 1.00, the upper
bound on NCC may be obtained from Equation 4.19:

(φ2,3 + φ1,2 − 1) + 2√[(1 − φ1,2)(1 − φ2,3)] ≥ φ1,3,    (4.21)

and the lower bound on NCC may be obtained from Equation 4.20:

φ1,3 ≥ (φ2,3 + φ1,2 − 1) − 2√[(1 − φ1,2)(1 − φ2,3)].    (4.22)
Sometimes the images to be matched also contain additive intensity variations in addition
to the multiplicative, or contrast, changes. Robustness to both additive and multiplicative
changes requires the image match measure to be computed on zero-mean and unit-variance
normalized images. The Euclidean distance between two zero-mean and unit-variance
normalized images, denoted ∆̄1,2, is given by Equation 2.12:

∆̄1,2 = √[ Σ_{x=1..n} Σ_{y=1..m} ( (r1(x, y) − µ1)/σ1 − (r2(x, y) − µ2)/σ2 )² ].    (4.23)
The triangle inequality for this case is given by:

|∆̄1,2 − ∆̄2,3| ≤ ∆̄1,3 ≤ ∆̄1,2 + ∆̄2,3.    (4.24)

Squaring all sides, we get:

(∆̄1,2 − ∆̄2,3)² ≤ ∆̄1,3² ≤ (∆̄1,2 + ∆̄2,3)².    (4.25)
In order to relate the normalized Euclidean distance to the correlation coefficient,
Figure 4.2: Angular distance based transitive bounds for a = {0.00, 0.20, 0.40, 0.60, 0.80,
0.95, 1.00}, where a is ρ1,2 and ρ2,3 varies along the x-axis. The upper transitive bounds
are shown by solid lines and the lower bounds by dotted lines. The bound curves change
from circle to ellipse, and finally the upper and lower bounds merge together in the diagonal
line for a = 1.00.
Equation 2.39 may be used:
ρ1,2 = 1 − (1/2) ∆̄1,2².    (4.26)
Simplifying the resulting expression, we get normalized Euclidean distance based
bounds on correlation coefficient:
(ρ1,2 + ρ2,3 − 1) + 2√[(1 − ρ1,2)(1 − ρ2,3)] ≥ ρ1,3
≥ (ρ1,2 + ρ2,3 − 1) − 2√[(1 − ρ1,2)(1 − ρ2,3)].    (4.27)
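These bounds are easy to verify numerically. The sketch below (Python, illustrative only and not part of the thesis implementation) draws random zero-mean, unit-variance normalized patches and confirms that the true correlation coefficient ρ1,3 always falls between the lower and upper bounds of Equation 4.27.

```python
import numpy as np

def euclidean_bounds(r12, r23):
    """Lower and upper transitive bounds on rho_{1,3} from Equation 4.27."""
    s = 2.0 * np.sqrt(max(0.0, (1.0 - r12) * (1.0 - r23)))
    base = r12 + r23 - 1.0
    return base - s, base + s

def corr(a, b):
    """Correlation coefficient of two zero-mean, unit-variance vectors."""
    return float(np.dot(a, b)) / a.size

rng = np.random.default_rng(0)
for _ in range(1000):
    # Three random patches, flattened and zero-mean / unit-variance normalized.
    r1, r2, r3 = [(v - v.mean()) / v.std() for v in rng.standard_normal((3, 64))]
    lo, hi = euclidean_bounds(corr(r1, r2), corr(r2, r3))
    assert lo - 1e-9 <= corr(r1, r3) <= hi + 1e-9
```

The `max(0.0, ...)` guard only protects the square root against tiny negative values caused by floating-point rounding when one of the bounding correlations is very close to 1.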
The transitive bounds on cross-correlation given by Equations 4.19 and 4.20, on NCC by
Equations 4.21 and 4.22, and on the correlation coefficient by Equation 4.27 are parallel
to the bounds based on angular distance, given by Equations 4.7, 4.9 and 4.12, as derived
in the last section. In Section 4.4, we will compare the first type of transitive bounds on
the correlation coefficient, given by Equation 4.12, with the second type, given by Equation
4.27, and find that the bounds based on angular distance may be preferred over the bounds
based on Euclidean distance because they are tighter. In the following section, different
visualizations of the transitive bounds are presented, which are helpful for a better
comprehension of both types of bounds.
4.3 Visualization of Transitive Bounds on Correlation
Tight bounds on the correlation coefficient are necessary to obtain computation elimination.
In order to understand the tightness characteristics of the transitive bounds, we visualize
them by plotting the bound surfaces. The angular distance based and the Euclidean distance
based bounds are visualized separately in the following subsections.
4.3.1 Visualization of Angular Distance Based Transitive Bounds
The angular distance based transitive bounds, as given by Equation 4.12, may be visualized
by fixing one of the two bounding correlations, ρ1,2 and ρ2,3, to a constant value and varying
the other over its full range of −1 to +1. We may fix ρ1,2 = a, where a is a constant, and
study the variation of the bounds with the variation of ρ2,3. Putting ρ1,2 = a in Equation 4.12:

aρ2,3 + √(1 − a²) √(1 − ρ2,3²) ≥ ρ1,3,    (4.28)

aρ2,3 − √(1 − a²) √(1 − ρ2,3²) ≤ ρ1,3.    (4.29)

In both of these inequalities, taking the term aρ2,3 to the other side:

√(1 − a²) √(1 − ρ2,3²) ≥ ρ1,3 − aρ2,3,    (4.30)

−√(1 − a²) √(1 − ρ2,3²) ≤ ρ1,3 − aρ2,3.    (4.31)

Squaring both sides and rearranging, we find that Equations 4.30 and 4.31 reduce to

ρ1,3² + ρ2,3² − 2aρ1,3ρ2,3 ≤ 1 − a²,    (4.32)

ρ1,3² + ρ2,3² − 2aρ1,3ρ2,3 ≥ 1 − a²,    (4.33)

which means that, on the bound curves, where equality holds, we obtain the equation of
an ellipse:

ρ1,3² + ρ2,3² − 2aρ1,3ρ2,3 = 1 − a².    (4.34)
For a = 0.00, we get the equation of the unit circle:

ρ1,3² + ρ2,3² = 1.    (4.35)

For a = 1.00, Equation 4.34 reduces to

ρ1,3 = ρ2,3,    (4.36)

which is the equation of a straight line passing through the origin at 45°.
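The fact that both bound curves of Equation 4.12 lie on the ellipse of Equation 4.34 can be checked numerically. The following sketch (Python, illustrative only) evaluates the upper and lower bound at several values of a and ρ2,3 and substitutes each into the ellipse equation.

```python
import math

def angular_bounds(a, r23):
    """Lower and upper transitive bounds on rho_{1,3} (Equation 4.12), with rho_{1,2} = a."""
    s = math.sqrt(1.0 - a * a) * math.sqrt(1.0 - r23 * r23)
    return a * r23 - s, a * r23 + s

def on_ellipse(a, r23, r13, tol=1e-9):
    """Check Equation 4.34: rho13^2 + rho23^2 - 2*a*rho13*rho23 = 1 - a^2."""
    return abs(r13 * r13 + r23 * r23 - 2.0 * a * r13 * r23 - (1.0 - a * a)) < tol

# Every point of both bound curves satisfies the same ellipse equation.
for a in (0.0, 0.2, 0.6, 0.95):
    for r23 in (-0.9, -0.3, 0.1, 0.7):
        lo, hi = angular_bounds(a, r23)
        assert on_ellipse(a, r23, lo) and on_ellipse(a, r23, hi)
```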
In Figure 4.2, we have plotted Equation 4.34 for different values of a, including a = {0.00,
0.20, 0.40, 0.60, 0.80, 0.95, 1.00}. We observe that, as the value of a increases from 0.00
to 1.00, the unit circle transforms into an ellipse whose minor axis continuously shrinks.
Ultimately the minor axis becomes zero for a = 1.00, where the ellipse degenerates into the
single diagonal line ρ2,3 = ρ1,3.
For any value of ρ2,3, the vertical distance between the upper and the lower bounds in
Figure 4.2 shows the range within which ρ1,3 is constrained. We observe that for larger
magnitudes of a, for example a = 0.95, the range containing ρ1,3 remains small for all
values of ρ2,3. However, for smaller magnitudes of a, for example a = 0.00, the range of
ρ1,3 is very close to its maximum, and shrinks only when ρ2,3 approaches −1.00 or +1.00.
The maximum range of ρ1,3 occurs only when both correlations ρ1,2 and ρ2,3 are 0.00.
This analysis shows that if at least one of ρ1,2 or ρ2,3 has a high magnitude, the transitive
bounds become tight.
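Subtracting the lower bound of Equation 4.12 from the upper bound gives the closed form 2√[(1 − a²)(1 − ρ2,3²)] for this range. The short sketch below (Python, illustrative only) confirms that, at a fixed ρ2,3, the range shrinks monotonically as |a| grows and collapses to zero at |a| = 1.

```python
import math

def bound_range(a, r23):
    """Width (upper minus lower) of the angular transitive bounds of Equation 4.12."""
    return 2.0 * math.sqrt((1.0 - a * a) * (1.0 - r23 * r23))

# For fixed rho_{2,3}, the range shrinks monotonically as |rho_{1,2}| grows and
# becomes zero at |rho_{1,2}| = 1, where the two bounds merge into one line.
widths = [bound_range(a, 0.3) for a in (0.0, 0.2, 0.6, 0.95, 1.0)]
assert all(w1 > w2 for w1, w2 in zip(widths, widths[1:]))
assert widths[-1] == 0.0
```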
If both correlations ρ1,2 and ρ2,3 are varied from −1 to +1, rather than keeping one of them
fixed, a bounding surface is generated. Such bounding surfaces, plotted for the upper and
lower transitive bounds using Equation 4.12, are shown in Figures 4.3 and 4.4. The upper
bound surface approaches its lowest value when one of ρ1,2 and ρ2,3 approaches the highest
value of +1.00 and the other approaches the lowest value of −1.00. The upper bound surface
remains at the maximum value of +1 if both correlations are equal, ρ1,2 = ρ2,3. The lower
bound surface approaches its maximum value if both ρ1,2 and ρ2,3 are +1 or both are −1.
When one of ρ1,2 and ρ2,3 is +1 and the other is −1, the lower bound approaches its minimum.
Thus the behavior of the upper and the lower bounds is quite different.
The upper and lower bound surfaces, if combined, form the space containing ρ1,3, shown in
Figure 4.5. No value of ρ1,3 can occur outside this space. The shape of this space is very
similar to a special type of tetrahedron having each 2D section an ellipse and four corners
at (ρ1,2, ρ2,3, ρ1,3) = (1,−1,−1), (−1,1,−1), (−1,−1,1), (1,1,1). At each of the corners, the
upper and lower bound surfaces meet, reducing the range of ρ1,3 to a single value. If both
ρ1,2 and ρ2,3 have the same value of +1 or −1, ρ1,3 is +1. If one of these correlations is −1
and the other is +1, then ρ1,3 can have only one value, which is −1. The portions of the
space close to the corners of the tetrahedron are of special interest, because in these regions
both surfaces are sufficiently close to each other, yielding tight upper and lower transitive
bounds.
The range of ρ1,3 is important because a small range is more useful than a large one. A
small range results when both bounds are tight, and a large range results when one or both
bounds are loose. The range of ρ1,3 is plotted in Figure 4.6, with the smaller ranges shown
in blue and the larger ranges in red.
4.3.2 Visualization of Euclidean Distance Based Bounds
In order to visualize the Euclidean distance based transitive bounds, we fix one of the
two bounding correlations, ρ1,2 and ρ2,3, and observe the bound variation as the other
varies. If we fix the value of ρ1,2 in Equation 4.27 to a constant value a, and
Figure 4.3: Angular distance based upper transitive bound surface shown in pseudo colors.
Figure 4.4: Angular distance based lower transitive bound surface shown in pseudo colors.
Figure 4.5: Space of ρ1,3 based on the upper and lower transitive bounds, computed from
Equation 4.12.
Figure 4.6: Range of ρ1,3 computed from the (upper − lower) transitive bounds based on
angular distance.
Figure 4.7: Euclidean distance based bounds for ρ1,2 = {0.00, 0.20, 0.40, 0.60, 0.80, 0.95,
1.00}. The margin between the upper and lower bounds reduces as the value of ρ1,2 increases,
and ultimately becomes zero when ρ1,2 approaches 1.00.
simplify both sides of the equation to get the following form:
(ρ1,3 − ρ2,3)² + 2(1 − a)(ρ1,3 + ρ2,3) = 3 − 2a − a².    (4.37)
We plot this equation for a = {0.00, 0.40, 0.80, 0.95, 1.00} in Figure 4.7. As the value of
a increases, the quadratic curves converge towards the center and, for a = 1, become the
diagonal line ρ2,3 = ρ1,3. In Figure 4.7, even for very high values of ρ1,2, for example
ρ1,2 = 0.95, the range of ρ1,3 increases as the value of ρ2,3 decreases. Also, as the value of
ρ1,2 decreases, the range of these bounds increases rapidly. Tight bounds can only be
obtained if both ρ1,2 and ρ2,3 have high values.
In Figure 4.8, we have plotted the upper bound surface, and in Figure 4.9 the lower bound
surface. The upper bound surface approaches ρ1,3 = −1 only if one of ρ1,2 and ρ2,3 is +1
and the other is −1. The lower bound surface shows that the lower bound is significantly
loose for most values of ρ1,2 and ρ2,3. When both ρ1,2 and ρ2,3 are −1, the lower bound
approaches −7.00, which is far below the least possible value of the correlation coefficient.
The ρ1,3 space is shown in Figure 4.10 by plotting both upper and lower bound surfaces
Figure 4.8: Euclidean distance based upper transitive bound surface shown in pseudo colors.
simultaneously. We observe that the combined bound surface resembles a cone with its tip
at (1,1,1), which descends as the values of ρ1,2 and ρ2,3 reduce from 1.00. The space of ρ1,3
is significantly larger than the space computed from the angular distance based transitive
bounds shown in Figure 4.5. Also, the space shown in Figure 4.10 is open, while the space
shown in Figure 4.5 is closed. Thus we observe that the angular distance based transitive
bound ranges are significantly smaller than the Euclidean distance based ones. In the
following section, we theoretically compare the tightness of the angular distance based
bounds with the Euclidean distance based bounds and find that the angular distance based
bounds are tighter.
4.4 Tightness of Euclidean and Angular Distance Based Transitive Bounds
In bound based computation elimination algorithms, the tightness of the bound is an
important parameter from the algorithm performance point of view. Tight bounds
Figure 4.9: Euclidean distance based lower transitive bound surface shown in pseudo colors.
Figure 4.10: The upper and lower bound surfaces merge to form a cone with tip at (1,1,1).
The difference between the upper and lower Euclidean bounds shows the bound tightness;
Euclidean bounds become tight only if both of the bounding correlations are high.
produce more elimination than loose bounds. Comparing the two types of transitive bounds,
we find that the angular distance based bounds are theoretically tighter than the Euclidean
distance based bounds. The upper and the lower transitive bounds are compared separately
in the following subsections.
4.4.1 Comparison of Upper Transitive Bounds
A tight upper bound is one which approaches the actual measure from above. If multiple
upper bounds are available for the same measure, the smallest upper bound is the tightest.
The upper transitive bound based on Euclidean distance is found to be greater than or equal
to the upper transitive bound based on angular distance; therefore the angular distance
based bounds are tighter than the Euclidean distance based bounds. In this comparison we
consider the upper bounds on the correlation coefficient only; the analysis for
cross-correlation and NCC follows in a similar way.
The correlation coefficient is bounded between −1.00 and +1.00, i.e., −1 ≤ ρi,j ≤ +1;
therefore the following inequality is always true:

(1/2)√[(1 − ρ1,2)(1 − ρ2,3)] + (1/2)√[(1 + ρ1,2)(1 + ρ2,3)] ≤ 1.    (4.38)

The left hand side of Inequality 4.38 attains its maximum value of +1 when both correlations
are equal, ρ1,2 = ρ2,3; for all other combinations, ρ1,2 ≠ ρ2,3, it remains less than +1.
In order to bring Inequality 4.38 into a form which can be related to the transitive bound
formulation, we need to include some additional terms. For this purpose, we use the
non-negativity property of distance measures, that is, a product of two normalized Euclidean
distance terms is always non-negative:

∆̄1,2 ∆̄2,3 ≥ 0.    (4.39)

Therefore both sides of Inequality 4.38 may be multiplied by the term ∆̄1,2 ∆̄2,3 without
changing its direction:

∆̄1,2 ∆̄2,3 [ (1/2)√[(1 − ρ1,2)(1 − ρ2,3)] + (1/2)√[(1 + ρ1,2)(1 + ρ2,3)] ] ≤ ∆̄1,2 ∆̄2,3.    (4.40)

We may convert Inequality 4.40 to pure correlation coefficient terms by using the following
relationship between the correlation coefficient and the normalized Euclidean distance,
given by Equation 2.39:

∆̄1,2 = √[2(1 − ρ1,2)],    (4.41)

which may be used to derive the following equation:

∆̄1,2 ∆̄2,3 = 2√[(1 − ρ1,2)(1 − ρ2,3)].    (4.42)

Therefore, Inequality 4.40 may be converted to correlation coefficient terms by using
Equation 4.42:

√[(1 − ρ1,2)(1 − ρ2,3)] [ (1/2)√[(1 − ρ1,2)(1 − ρ2,3)] + (1/2)√[(1 + ρ1,2)(1 + ρ2,3)] ]
≤ √[(1 − ρ1,2)(1 − ρ2,3)],    (4.43)

which may be rearranged into the following final form:

ρ1,2ρ2,3 + √[(1 − ρ1,2²)(1 − ρ2,3²)] ≤ (ρ1,2 + ρ2,3 − 1) + 2√[(1 − ρ1,2)(1 − ρ2,3)].    (4.44)
The left hand side of Inequality 4.44 is the upper transitive bound on ρ1,3 based on the
angular distance measure, as given by Equation 4.12, while the right hand side is the upper
transitive bound based on the Euclidean distance measure, given by Inequality 4.27. Since
an upper bound with the smaller value is tighter, the angular distance based transitive
bound is tighter than the Euclidean distance based bound.
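Inequality 4.44 can also be checked numerically. The sketch below (Python, illustrative only) evaluates both upper bounds over a grid of bounding correlations and confirms that the angular bound never exceeds the Euclidean one.

```python
import math

def upper_angular(r12, r23):
    """Upper transitive bound based on angular distance (Equation 4.12)."""
    return r12 * r23 + math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

def upper_euclidean(r12, r23):
    """Upper transitive bound based on Euclidean distance (Equation 4.27)."""
    return (r12 + r23 - 1) + 2 * math.sqrt((1 - r12) * (1 - r23))

# Inequality 4.44: on the whole grid, the angular upper bound is at least as tight.
grid = [i / 10.0 for i in range(-10, 11)]
for r12 in grid:
    for r23 in grid:
        assert upper_angular(r12, r23) <= upper_euclidean(r12, r23) + 1e-9
```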
In order to illustrate the result given by Inequality 4.44, both types of transitive bounds
are plotted simultaneously for the same values of ρ1,2 and ρ2,3. We observe that the upper
and lower angular distance based transitive bounds are contained within the Euclidean
distance based transitive bounds, as shown in Figure 4.11.
4.4.2 Comparison of Lower Transitive Bounds
If multiple lower bounds are available, then the best lower bound is one which is the
maximum of all bounds. In this subsection, we show that angular distance based
Figure 4.11: Comparison of the upper transitive bounds based on angular distance (UA)
with the upper transitive bounds based on Euclidean distance (UE). The angular distance
based lower transitive bounds (LA) are also compared with the Euclidean distance based
lower transitive bounds (LE). In each panel, ρ2,3 is plotted on the x-axis and the bounds on
the y-axis, for (a) ρ1,2 = 0.40, (b) ρ1,2 = 0.50, (c) ρ1,2 = 0.60, (d) ρ1,2 = 0.70.
lower transitive bound is always larger than (or equal to) the Euclidean distance based
lower transitive bound.

Using the facts that the normalized Euclidean distance is bounded between 0.00 and 2.00
and the correlation coefficient is bounded between +1.00 and −1.00, the proofs of the
following inequalities are trivial:
(1/2)√[(1 + ρ1,2)(1 + ρ2,3)] ≤ 1,    (4.45)

∆̄1,2 ∆̄2,3 ≥ 0.    (4.46)
Multiplying both sides of Inequality 4.45 by ∆̄1,2 ∆̄2,3 will not change the direction of the
inequality:

(1/2) ∆̄1,2 ∆̄2,3 √[(1 + ρ1,2)(1 + ρ2,3)] ≤ ∆̄1,2 ∆̄2,3.    (4.47)
The proof of the following inequality is also trivial:

ρ1,2ρ2,3 ≥ ρ1,2 + ρ2,3 − 1.    (4.48)
Multiplying Inequality 4.47 by −1 inverts the direction of the inequality:

−(1/2) ∆̄1,2 ∆̄2,3 √[(1 + ρ1,2)(1 + ρ2,3)] ≥ −∆̄1,2 ∆̄2,3.    (4.49)

Adding Inequalities 4.49 and 4.48:

ρ1,2ρ2,3 − (1/2) ∆̄1,2 ∆̄2,3 √[(1 + ρ1,2)(1 + ρ2,3)] ≥ (ρ1,2 + ρ2,3 − 1) − ∆̄1,2 ∆̄2,3,    (4.50)

and substituting the value of ∆̄1,2 ∆̄2,3 from Equation 4.42:

ρ1,2ρ2,3 − √[(1 − ρ1,2²)(1 − ρ2,3²)] ≥ (ρ1,2 + ρ2,3 − 1) − 2√[(1 − ρ1,2)(1 − ρ2,3)].    (4.51)
The left hand side of Inequality 4.51 is the lower transitive bound on ρ1,3 based on the
angular distance measure, as given by Equation 4.12, while the right hand side is the lower
transitive bound based on the Euclidean distance measure, given by Inequality 4.27.
Inequality 4.51 shows that the Euclidean distance based lower transitive bound is always
less than or equal to the angular distance based lower transitive bound. This proves that
the lower transitive bound derived from angular distance is tighter than the formulation
based on Euclidean distance.
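As with the upper bounds, Inequality 4.51 can be checked numerically. The sketch below (Python, illustrative only) confirms that the angular lower bound dominates the Euclidean one over a grid of bounding correlations, and reproduces the −7.00 value noted in Section 4.3.2.

```python
import math

def lower_angular(r12, r23):
    """Lower transitive bound based on angular distance (Equation 4.12)."""
    return r12 * r23 - math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

def lower_euclidean(r12, r23):
    """Lower transitive bound based on Euclidean distance (Equation 4.27)."""
    return (r12 + r23 - 1) - 2 * math.sqrt((1 - r12) * (1 - r23))

# Inequality 4.51: the angular lower bound is never below the Euclidean one.
grid = [i / 10.0 for i in range(-10, 11)]
for r12 in grid:
    for r23 in grid:
        assert lower_angular(r12, r23) >= lower_euclidean(r12, r23) - 1e-9

# At rho_{1,2} = rho_{2,3} = -1 the Euclidean bound drops to -7.00 (Figure 4.9),
# while the angular bound correctly forces rho_{1,3} = +1.
assert lower_euclidean(-1.0, -1.0) == -7.0 and lower_angular(-1.0, -1.0) == 1.0
```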
Inequality 4.51 may also be observed by plotting the angular distance based lower transitive
bound (Inequality 4.12) and the Euclidean distance based lower transitive bound (Inequality
4.27), as shown in Figure 4.11. In this figure, we observe that the angular distance based
transitive bounds are contained within the Euclidean distance based transitive bounds.
In the following section, we therefore further explore the tightness characteristics of the
angular distance based transitive bounds, which will be used for the development of
transitive elimination algorithms in Chapter 5.
Figure 4.12: The tightness of transitive bounds: (a) Case 1: both angles are small; (b) Case
2: one angle is small and the other is large; (c) Case 3: both angles are large.
4.5 Tightness Analysis of Angular Distance Based Transitive Bounds
For a particular search location, transitive bounds indicate the maximum and the min-
imum limits on correlation, which can be used to discard unsuitable search locations.
For example, at a specific location, if the maximum limit is less than the correlation
value at some previous location, correlation computation becomes redundant and
may be skipped without any loss of accuracy. As the percentage of skipped search
locations increases, the template matching process accelerates accordingly. In order
to compute angular distance based transitive bounds, three transitive inequalities
were presented in Section 4.1. In each of these inequalities, there are two Bounding
Correlations which must be known in order to find bounds on the third Bounded
Correlation. For example, in Equation 4.12, ρ1,2 and ρ2,3 are the two bounding cor-
relations which constrain the upper and the lower limits on the bounded correlation
ρ1,3.
The tightness of the transitive bounds depends on the magnitude of the two bounding
correlations, and requires that the upper bound to be low and the lower bound to
be high. This dependency may be more clearly understood by considering transitive
inequalities in terms of angular distances as given by Equations 4.5 or 4.11. In these
equations, a tight upper bound means cos(θ1,2 − θ2,3) resulting a value significantly
lesser than +1, which implies |θ1,2 − θ2,3| has a value significantly larger than 0 ◦.
Similarly, lower bound will be tight if cos(θ1,2 + θ2,3) results a higher value, which
implies that θ1,2 +θ2,3 should have a value close to 0 ◦. Considering different ranges of
values which θ1,2 and θ2,3 may assume, three possible cases are shown in Figure 4.12:
1. Case I: If both angles are small (Figure 4.12a), their difference will be even
smaller and their sum will also be a relatively small number. Therefore both
upper and lower transitive bounds will approach +1. This ensures tight upper
and lower bounds because in this case, the bounded correlation will also be very
high.
2. Case II: If one angle is small while the other is large (Figure 4.12b), then their
difference will be large, resulting in a tight upper bound, and their sum will also
be a relatively large number, resulting in a loose lower bound.
3. Case III: If both of the angles are large (Figure 4.12c), then their difference will
be a small number, resulting in a very loose upper bound while their sum will
be a significantly larger number, resulting in a very loose lower bound.
Of these three cases, Case I yields tight upper and lower bounds and can potentially be
exploited for computation elimination. In practice, however, this case may occur quite
infrequently, because it is unlikely for all three image patches to be highly correlated.
Case III yields loose upper and lower bounds and therefore cannot be exploited for
computation elimination. Case II yields a tight upper bound and requires only that one of
the two bounding correlations has a high magnitude; it may therefore be exploited for
computation elimination.
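The elimination test implied by Case II can be sketched as follows (Python, illustrative only; the function names and threshold values are ours, not the thesis notation). A high bounding correlation combined with the upper bound of Equation 4.12 lets a search location be skipped whenever its bound cannot beat the current known maximum.

```python
import math

def upper_bound(r12, r23):
    """Angular distance based upper transitive bound on rho_{1,3} (Equation 4.12)."""
    return r12 * r23 + math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

def can_skip(bounding_corr, central_corr, current_max):
    """Skip a bounded location when its upper bound cannot beat the current max."""
    return upper_bound(bounding_corr, central_corr) < current_max

# Case II: one bounding correlation is very high (e.g. autocorrelation 0.98),
# so even a moderate central correlation gives a bound well below a good max.
assert can_skip(0.98, 0.30, current_max=0.80)
# Case III: both bounding correlations are low; the bound is ~1.0, so no skip.
assert not can_skip(0.30, 0.30, current_max=0.80)
```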
We have experimentally studied the characteristics of upper and lower transitive
bounds based on angular distance. Figure 4.13 shows the variation of these bounds
with the variation of ρ1,2 and ρ2,3 on a real image dataset. In Figure 4.13, each pair
of upper and lower bounds corresponds to a fixed value of ρ1,2, while the variation
along the x-axis is due to the variation in ρ2,3 on consecutive pixel positions in the
bigger image. From Figure 4.13, it can be observed that if both of the correlations,
ρ1,2 and ρ2,3, are large, then both of the upper and lower bounds become tight. If
Figure 4.13: Variation of the upper and lower bounds on the correlation coefficient with the
variation of the bounding correlations ρ1,2 and ρ2,3. In this figure, ρ1,2 varies across the
curves while ρ2,3 varies with the pixel position along a row in the reference image. Curves
1 to 5 show the upper bounds for ρ1,2 = 0.306, 0.441, 0.571, 0.722, and 0.896 respectively.
Curve 6 is the actual value of the correlation coefficient. Curves 7 to 11 show the lower
bounds for ρ1,2 = 0.896, 0.722, 0.571, 0.441, and 0.306 respectively. The Cauchy-Schwarz
inequality based upper bound is always +1 and the Cauchy-Schwarz lower bound is always −1.
one of the two correlations is high and the other is low then the upper bound remains
tight while the lower bound becomes loose.
4.6 Conclusion
In this chapter we have presented the derivation of transitive bounds on correlation based
measures using two different approaches. The resulting bounds were compared, and it was
proved that for the correlation coefficient the angular distance based bounds are tighter
than the Euclidean distance based bounds. The angular distance based bounds were further
studied from the tightness perspective, and a practically useful case, Case II, was identified.
In the following chapter, Case II will be exploited for the development of transitive
elimination algorithms.
Chapter 5
TRANSITIVE ELIMINATION ALGORITHMS FOR
CORRELATION BASED MEASURES
Tight transitive bounds are essential for good elimination performance. In Chapter 4, we
showed that angular distance based transitive bounds are tighter than Euclidean distance
based bounds. Moreover, the tightness characteristics of the angular distance based bounds
were also studied, and an important case was identified in which a tight upper transitive
bound may be obtained. Building upon the results of Chapter 4, in the current chapter we
develop transitive elimination algorithms.
We obtain a tight upper transitive bound by ensuring that at least one of the two bounding
correlations has a large magnitude. This is achieved by using, as one of the bounding
correlations, the different forms of autocorrelation found in the images to be matched. Most
template matching applications exhibit strong autocorrelation in one of three forms: strong
intra-reference autocorrelation, strong inter-reference autocorrelation, or strong
inter-template autocorrelation. To exploit each of these types, we propose three different
transitive elimination algorithms.
In the three transitive elimination algorithms proposed in this chapter, we use
autocorrelation as one of the two bounding correlations. In order to get the second bounding
correlation, we divide the search locations into two categories, bounding locations and
bounded locations. We ensure that the bounding category is only a small fraction of the
total search locations, while the bounded category makes up the bulk of the locations.
Correlation at the bounding search locations must be computed, because it is used as the
second bounding correlation, while the computations at the bounded search locations may
be skipped by using the transitive bounds.
Following is an overview of the three transitive elimination algorithms:
1. Exploiting strong intra-reference autocorrelation (Mahmood and Khan, 2008):
Most natural images are low-frequency signals and hence exhibit high local spatial
autocorrelation. We divide the reference image into non-overlapping windows of equal size
and compute the local autocorrelation of the central block in each window with its neighbors.
This autocorrelation is used as the first bounding correlation. The template image is
matched only with the central block in each window to get the second bounding correlation.
The correlation of the template with the other blocks in each window is the bounded
correlation and may be skipped by using the transitive bounds. This concept is illustrated
in Figure 5.1.

The computation of the local autocorrelation is an algorithmic overhead; therefore we also
present an efficient algorithm for its computation. As a result, this overhead turns out to
be insignificant compared to the amount of computation elimination achieved in this
algorithm.
2. Exploiting strong inter-reference autocorrelation (Mahmood and Khan, 2010):
Tracking an object in a surveillance video, checking for missing components on a PCB
production line, or inspecting objects on conveyor belts requires one template image to be
correlated across multiple reference frames. In such applications, the reference images are
often highly correlated with each other, because the camera is often static, a fact which can
be exploited for high elimination. The temporal autocorrelation between consecutive frames
is used as one bounding correlation. The object template is fully correlated with a
temporally central frame, and the resulting correlations are used as the second bounding
correlations. The correlation of the object template with the other frames is the bounded
correlation and may be skipped by using the transitive bounds. This concept is illustrated
in Figure 5.2. The computation of autocorrelation between different frames is an overhead
of this algorithm; we have formulated an efficient algorithm for the computation of
inter-frame autocorrelation, which reduces this overhead to a small amount.
3. Exploiting strong inter-template autocorrelation (Mahmood and Khan, 2007b):
Certain applications require a set of template images to be correlated with a single reference
image, for example matching an aerial video with a satellite image, or exhaustive
rotation- and scale-invariant template matching. In such cases, if the set of templates has
high autocorrelation, the correlation of one template with the reference image yields tight
bounds on the correlations of all other templates in the set with the same reference image.
This concept is illustrated in Figure 5.3. The correlation between templates is an overhead,
but it requires quite a small amount of computation and may therefore be ignored. The
correlation of one template with the full reference image is a small part of the overall
required computation.
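The shared structure of these three algorithms, shown here for the intra-reference case, can be sketched as follows. This is a minimal illustration, not the thesis C++ implementation: the helper names are ours, and the per-group autocorrelation is computed naively rather than with the efficient algorithm described in Section 5.1.

```python
import numpy as np

def ncorr(a, b):
    """Correlation coefficient between two equally sized image patches."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float((a * b).mean())

def upper_bound(r12, r23):
    """Angular distance based upper transitive bound (Equation 4.12)."""
    return r12 * r23 + np.sqrt(max(0.0, (1 - r12 ** 2) * (1 - r23 ** 2)))

def match_group(template, center_patch, neighbor_patches, current_max):
    """Process one group: correlate the center, then bound-test each neighbor.

    The center-neighbor autocorrelation is the first bounding correlation and
    the template-center correlation the second; a neighbor whose upper bound
    cannot beat the best value so far is skipped without computing it.
    Returns (best_corr, computed, skipped)."""
    cc = ncorr(template, center_patch)      # central correlation (always computed)
    best = max(current_max, cc)
    computed, skipped = 1, 0
    for nb in neighbor_patches:
        a_s = ncorr(center_patch, nb)       # local spatial autocorrelation AS
        if upper_bound(a_s, cc) < best:
            skipped += 1                    # elimination: this location cannot win
        else:
            best = max(best, ncorr(template, nb))
            computed += 1
    return best, computed, skipped
```

A location is skipped exactly when its upper transitive bound falls below the best correlation seen so far, so skipping never changes the final best match.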
The transitive elimination algorithms are implemented in C++ and compared with currently
known efficient algorithms, including Enhanced Bounded Correlation (Mattoccia et al.,
2008b), Bounded Partial Correlation (Di Stefano et al., 2005), SAD with the SEA algorithm
(Li and Salari, 1995), the PDE algorithm (Montrucchio and Quaglia, 2005), an FFT based
frequency domain implementation (William et al., 2007), and the fast exhaustive spatial
domain implementation discussed in Chapter 3. Experiments are performed on a variety of
real image datasets. While the exact speedup of the proposed algorithms varies from
experiment to experiment, we have observed speedups ranging from several times to more
than an order of magnitude.
5.1 Exploiting Strong Intra-Reference Autocorrelation
The most common case of template matching requires a single template to be correlated
with a single reference image. In such applications, the local spatial autocorrelation of the
reference image may be exploited for fast template matching. For this purpose, we divide
the search locations within the reference image into non-overlapping rectangular groups and
compute the local autocorrelation (AS) of the central location with the neighboring locations
of each group (Figure 5.1).
In each group, the template image is correlated with the central search location to yield
the Central Correlation (CC), and the correlation of the template with the remaining
locations is delayed until the evaluation of the elimination test. As shown in
Figure 5.1: Groups of Search Locations: A 'search location' is the central pixel of a possible matching location of the template within the reference image. Small squares show 81 search locations divided into non-overlapping 3×3 groups. Each group has a central search location, shown in red, and neighboring search locations, shown in blue. The template always has to be correlated with the central locations, while its correlation with the neighboring locations may be eliminated based upon the transitive bounds.
Figure 5.1, both the local autocorrelation and the central correlation are used as bounding
correlations to compute transitive bounds for the remaining locations, and those with
upper bounds less than a current known maximum (or less than a conservative initial
threshold) may be skipped without any loss of accuracy. Since the spatial autocorrelation
with close neighbors is often high for natural images, this results in a tight
upper bound and hence high elimination at most locations. Complete pseudo-code
for this algorithm is shown as Intra-Ref-TEA.
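As an illustration, the transitive elimination test evaluated inside Intra-Ref-TEA may be sketched in C++ as follows. This is a minimal sketch: the function names are ours, while the bound expression is the one used in Algorithm 1.

```cpp
#include <cmath>

// Transitive upper bound on the correlation of the template with a
// neighboring search location, given the two bounding correlations:
//   as: local spatial autocorrelation A_S between the central and the
//       neighboring search location,
//   cc: central correlation C_C of the template with the central location.
// Both inputs are correlation coefficients in [-1, 1].
double transitive_upper_bound(double as, double cc) {
    return as * cc + std::sqrt((1.0 - as * as) * (1.0 - cc * cc));
}

// The neighboring location may be skipped, without any loss of accuracy,
// when its upper bound cannot exceed the current known maximum.
bool may_skip(double as, double cc, double c_max) {
    return transitive_upper_bound(as, cc) < c_max;
}
```

For example, with AS = 0.95 and CC = 0.60 the upper bound is about 0.82, so the neighboring location is skipped whenever the current known maximum already exceeds that value.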
In Algorithm 1 the speedup is obtained from the bounded correlations, shown as dotted
arrows in Figure 5.1, whereas the bounding correlations constitute an overhead for
the algorithm. There are two types of overheads: the computation of the local spatial
autocorrelation of the reference image and the computation of the central correlation
in each group. For the first type, the standard implementation has computational
complexity of the order of O(mnpq) Mahmood and Khan (2008), where m × n is the
template size and p × q is the reference image size. However, redundant computations
can be eliminated by using a more efficient algorithm, which reduces the computational
complexity to O(shswpq) (as discussed later in this section), where sh × sw is
the size of the group of locations.
Input: Template Image, Reference Image, Size of Group of Locations, Initial Correlation Threshold
begin
    AS ⇐ Local Spatial Auto-correlation;
    Cmax ⇐ Initial Correlation Threshold;
    foreach Group of Search Locations do
        CC ⇐ correlate(Template, Central Search Location);
        if CC > Cmax then
            (Cmax, imax, jmax) ⇐ (CC, Central Location Indices);
        end
        foreach Remaining Search Location Within the Current Group do
            UpperBound ⇐ AS·CC + √((1 − AS²)(1 − CC²));
            if UpperBound < Cmax then
                Skip Current Location;
            else
                C ⇐ correlate(Template, Current Search Location);
                if C > Cmax then
                    (Cmax, imax, jmax) ⇐ (C, Current Location Indices);
                end
            end
        end
    end
    return imax, jmax, Cmax;
end
Algorithm 1: Intra-Ref-TEA
For the overhead due to central correlation, we observe that at least one correlation
is required for each group. Since the number of groups is pq/(shsw), and one correlation
of the template of size m × n must be performed for each group, the overhead cost is
O(mnpq/(shsw)).
The total overhead for both types can be written as the summation of the two overheads:

$$\eta = \xi\left(s_h s_w\,pq + \frac{mnpq}{s_h s_w}\right), \qquad (5.1)$$
where ξ is a machine dependent constant. If k templates are to be matched with the
same reference image, the local autocorrelation overhead is further amortized to yield
a total overhead of

$$\eta = \xi\left(\frac{s_h s_w}{k} + \frac{mn}{s_h s_w}\right)pq. \qquad (5.2)$$
Assuming the cost of spatial domain template matching to be ξmnpq, a theoretical
upper bound upon the speedup of Intra-Ref-TEA may be written as:

$$\mathrm{SpeedUp} \le \frac{mn}{\dfrac{s_h s_w}{k} + \dfrac{mn}{s_h s_w}}. \qquad (5.3)$$
As an illustration, if 10 templates, each of size 64×64 pixels, are to be matched with a
reference image (of any size) and the group size is 5×5, the upper bound upon the maximum
achievable speedup over the spatial domain is 24.624.
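The bound of Equation (5.3) and the worked example above can be checked with a few lines of C++ (the function name is ours):

```cpp
#include <cmath>

// Theoretical upper bound on the speedup of Intra-Ref-TEA over the
// exhaustive spatial domain implementation, Equation (5.3):
//   SpeedUp <= mn / (sh*sw/k + mn/(sh*sw))
// m x n: template size, sh x sw: group size, k: number of templates.
double speedup_bound(double m, double n, double sh, double sw, double k) {
    double group = sh * sw;
    return (m * n) / (group / k + (m * n) / group);
}
```

For 10 templates of size 64×64 and a 5×5 group, `speedup_bound(64, 64, 5, 5, 10)` evaluates to approximately 24.624, matching the figure quoted above.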
Equation (5.3) indicates that more speedup is possible with larger group sizes. However,
at larger sizes the local autocorrelation may decay to a small value,
reducing the tightness of the transitive bounds and therefore the
percentage elimination. The proper choice of the group-size parameter
therefore depends upon the spread of the local autocorrelation function in the
reference image and the magnitude of the known correlation maximum. The smallest
size of a symmetrical group is 3 × 3 search locations, which means that the central
search location will be correlated with its eight neighbors only. In practice, one
may adapt to the proper group size by observing the computation elimination. For
an sh × sw group size, the computations at one location are mandatory, so the maximum
number of skipped locations is shsw − 1. If the percentage of eliminated computations
approaches the maximum limit (shsw − 1)/(shsw) × 100, the group size may be increased to
(sh + 1) × (sw + 1). This is because approaching the maximum limit of elimination
indicates that the reference image may have a wider autocorrelation, which may allow
an even larger group size and hence more speedup. On the other hand, if the computation
elimination falls below the maximum limit of the smaller group, given by
((sh − 1)(sw − 1) − 1)/((sh − 1)(sw − 1)) × 100, then the size may be reduced to the
smaller group size, (sh − 1) × (sw − 1). We experimentally observed that for images
in our datasets, a group size of 5 × 5 yields good computation elimination; therefore
we have used the size of 5 × 5 in all of our experiments.
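The adaptation rule just described may be sketched as follows. The function name and the closeness tolerance `tol` are our assumptions, since the text does not fix how close the observed elimination must be to the maximum limit before the group is grown.

```cpp
#include <utility>

// Adapt the group size from the observed percentage of eliminated
// computations. Grow to (sh+1) x (sw+1) when elimination approaches the
// maximum limit of the current group; shrink to (sh-1) x (sw-1) when it
// falls below the maximum limit of the smaller group.
std::pair<int, int> adapt_group_size(int sh, int sw,
                                     double observed_elim_percent,
                                     double tol = 1.0) {
    double upper_limit = 100.0 * (sh * sw - 1) / (sh * sw);
    double lower_limit =
        100.0 * ((sh - 1) * (sw - 1) - 1) / ((sh - 1) * (sw - 1));
    if (observed_elim_percent >= upper_limit - tol)
        return {sh + 1, sw + 1};  // autocorrelation likely wider than assumed
    if (observed_elim_percent < lower_limit && sh > 3 && sw > 3)
        return {sh - 1, sw - 1};  // bounds not tight enough for this size
    return {sh, sw};
}
```

For a 5×5 group the two limits are 96% and 93.75%, so, for example, an observed elimination of 95.5% grows the group to 6×6, while 90% shrinks it to 4×4.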
As mentioned earlier, the computation of the local autocorrelation can be made much
more efficient than its standard implementation by exploiting the redundancy in its
computation. We propose an algorithm in which the correlation between the central
location rc and a nearby location rn is computed simultaneously over all groups,
through pixel-by-pixel multiplication of the reference image with its (wr, wc)-translated
version, where (wr, wc) is the row and column difference between rc and rn. Then, using
the running-sum approach, we compute the sum of all m × n blocks in the product
array, in just four operations per block. This yields the correlation of each search
location with the location translated by (wr, wc) pixels. We copy only the required values
into a final LA-Array, as shown in the LA-Algorithm. The same process is repeated shsw
times, and each time pq integer multiplications and 4pq additions are performed. Therefore,
the overall complexity of the proposed algorithm for local spatial autocorrelation
computation is O(shswpq). The additional memory required by the LA-Algorithm consists
of three arrays, Pr, Sf and LA, each of size equal to that of the reference image, p × q.
In the LA-Algorithm, an efficient running-sum algorithm is used to compute the summation
over all m × n blocks of the products in the Pr array. In this algorithm, summation
along the rows is computed first, and then summation along the columns is computed
over these row-sums. For the computation of row-sums, in each row the first n
columns are summed up, and each subsequent sum is computed by adding the leading
column and subtracting the trailing column. Once the row-sums are complete, column-sums
are computed by applying the same strategy to the row-sums. The pseudo-code for the
Running-Sum-Algorithm is given as Algorithm 3. In this algorithm, each internal
m × n block sum requires only 4 operations. If there are p × q blocks to be
summed, the overall complexity of the Running-Sum-Algorithm is O(pq).
Input: Reference Image, Size of Group of Locations, Size of Template Image
begin
    Iref ⇐ Reference Image;
    (m, n) ⇐ Template Image Rows and Columns;
    (sh, sw) ⇐ Size of Group of Locations;
    for wr = 1 to sh do
        for wc = 1 to sw do
            foreach pixel (i, j) in the Reference Image do
                Pr(i, j) ⇐ Iref(i, j) Iref(i + wr, j + wc);
            end
            Sf ⇐ Running sum of all m × n patches in Pr;
            Comment: Copy only the required values from Sf to the LA-array;
            foreach (i, j) in the final LA-array do
                LA(i + wr, j + wc) ⇐ Sf(i + m, j + n);
                i ⇐ i + sh;
                j ⇐ j + sw;
            end
        end
    end
end
Algorithm 2: Local Autocorrelation (LA) Algorithm
Figure 5.2: Exploiting strong inter-frame autocorrelation for fast template matching. The template is fully correlated with only one frame (shown in dark red), while for the remaining frames transitive bounds are computed.
Input: Reference Image, Template Image Size
begin
    (p, q) ⇐ Reference-Image Rows and Columns;
    (m, n) ⇐ Template-Image Rows and Columns;
    Comment: One pass through all reference image rows;
    for i ⇐ 1 to p do
        sum ⇐ 0;
        Comment: Compute the sum over the first n columns, where n < q;
        for j ⇐ 1 to n do
            sum ⇐ sum + Pr(i, j);
        end
        Sr(i, n) ⇐ sum;
        Comment: Sr is a temporary array which contains row-sums;
        Comment: Onward, use the running sum;
        for j ⇐ n + 1 to q do
            Sr(i, j) ⇐ Sr(i, j − 1) + Pr(i, j) − Pr(i, j − n);
        end
    end
    Comment: One pass through all reference image columns;
    for j ⇐ 1 to q do
        sum ⇐ 0;
        Comment: Compute the sum over the first m row-sums, where m < p;
        for i ⇐ 1 to m do
            sum ⇐ sum + Sr(i, j);
        end
        Sf(m, j) ⇐ sum;
        Comment: Sf is the final array holding the summation values;
        Comment: Onward, use the running sum;
        for i ⇐ m + 1 to p do
            Sf(i, j) ⇐ Sf(i − 1, j) + Sr(i, j) − Sr(i − m, j);
        end
    end
end
Algorithm 3: Efficient Running-Sum-Algorithm
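A C++ sketch of the two-pass running-sum computation follows. The array names follow Algorithm 3, while the zero-based indexing and the top-left-corner convention for the result are ours.

```cpp
#include <vector>

// Sum of every m x n block of Pr via running sums, O(p*q) total, about
// four operations per interior block, as in Algorithm 3. Pr is p x q,
// row-major, with p >= m and q >= n. The result has (p-m+1) x (q-n+1)
// entries; entry (i, j) holds the sum of the block whose top-left
// corner is (i, j).
std::vector<std::vector<double>>
block_sums(const std::vector<std::vector<double>>& Pr, int m, int n) {
    int p = Pr.size(), q = Pr[0].size();
    // Pass 1: running sums of n consecutive columns along each row (Sr).
    std::vector<std::vector<double>> Sr(p, std::vector<double>(q - n + 1));
    for (int i = 0; i < p; ++i) {
        double sum = 0;
        for (int j = 0; j < n; ++j) sum += Pr[i][j];
        Sr[i][0] = sum;
        for (int j = 1; j + n - 1 < q; ++j)
            Sr[i][j] = Sr[i][j - 1] + Pr[i][j + n - 1] - Pr[i][j - 1];
    }
    // Pass 2: running sums of m consecutive row-sums down each column (Sf).
    std::vector<std::vector<double>> Sf(p - m + 1,
                                        std::vector<double>(q - n + 1));
    for (int j = 0; j + n - 1 < q; ++j) {
        double sum = 0;
        for (int i = 0; i < m; ++i) sum += Sr[i][j];
        Sf[0][j] = sum;
        for (int i = 1; i + m - 1 < p; ++i)
            Sf[i][j] = Sf[i - 1][j] + Sr[i + m - 1][j] - Sr[i - 1][j];
    }
    return Sf;
}
```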
5.2 Exploiting Strong Inter-Reference Autocorrelation
In some template matching applications, for example tracking objects across a video
sequence, one template image has to be correlated with multiple reference frames.
If the reference frames are correlated temporally, such as in the case of a static
surveillance camera, we can exploit their temporal autocorrelation (AT ) to get tight
transitive bounds. The concept is illustrated in Figure 5.2. In this scenario, the
central correlation (CC) is obtained by completely correlating the template image
with a specific reference frame. The correlation with the remaining frames is delayed
until evaluation of the transitive elimination test.
Using AT and CC as bounding correlations, we compute transitive upper and lower
bounds at all search locations in the remaining frames, and those match locations
with upper bounds less than the current known maximum (or an initial correlation
threshold) may be discarded without any loss of accuracy.
In some applications, for example automatically checking for missing components in
a circuit board manufacturing facility, the three image patches may happen to be
very similar. In that case, both the upper and the lower bounds are tight, as given
by Case I. In such applications, all search locations where the upper bound is less than
the maximum of the lower bound may also be skipped without any loss of accuracy. The
pseudo-code for the complete algorithm is given as Inter-Ref-TEA.
This algorithm also carries an overhead, but this time it is the temporal autocorrelation
of the sequence of reference frames. We employ a strategy similar to the
previous case and compute this overhead in O(pq), where p × q is the size of the reference
image. This is done by multiplying the two reference frames pixel by pixel and
then using the running-sum approach to compute the summation over all patches of
size m × n in the product array. This summation of products is the cross-correlation
between corresponding blocks of the two frames. Since the complexity of the running-sum
algorithm is O(pq), and pq integer multiplications are carried out beforehand,
the overall complexity of the inter-frame autocorrelation computation is of the order
of O(pq), which is significantly smaller than the O(mnpq) complexity of correlating one
template with one reference frame. Hence the computational cost of inter-frame
autocorrelation computation is insignificant compared to the overall cost of
template matching.
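The inter-frame overhead computation can be sketched as follows. Here we use a 2D prefix sum over the product image, an equivalent O(pq), constant-operations-per-block alternative to the running-sum pass of Algorithm 3; the names and zero-based index convention are ours.

```cpp
#include <vector>

// Sum of products over every m x n block of two aligned p x q frames,
// computed in O(p*q): pixelwise products followed by a 2D prefix sum,
// after which each block sum costs four operations.
std::vector<std::vector<double>>
interframe_block_correlation(const std::vector<std::vector<double>>& f1,
                             const std::vector<std::vector<double>>& f2,
                             int m, int n) {
    int p = f1.size(), q = f1[0].size();
    // Prefix sums of the pixelwise product image, with a zero border.
    std::vector<std::vector<double>> S(p + 1, std::vector<double>(q + 1, 0.0));
    for (int i = 0; i < p; ++i)
        for (int j = 0; j < q; ++j)
            S[i + 1][j + 1] = f1[i][j] * f2[i][j]
                            + S[i][j + 1] + S[i + 1][j] - S[i][j];
    // Each m x n block sum from four prefix-sum lookups.
    std::vector<std::vector<double>> out(p - m + 1,
                                         std::vector<double>(q - n + 1));
    for (int i = 0; i + m <= p; ++i)
        for (int j = 0; j + n <= q; ++j)
            out[i][j] = S[i + m][j + n] - S[i][j + n]
                      - S[i + m][j] + S[i][j];
    return out;
}
```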
Input: Sequence of Reference Images, Template Image, Initial Correlation Threshold
begin
    fc ⇐ Fully Correlated Reference Frame;
    CC ⇐ correlate(Template Image, fc);
    Output fc, imax, jmax, max(CC);
    foreach Remaining Reference Frame fk do
        AT ⇐ Autocorrelate fc with fk;
        Cmax ⇐ Initial Correlation Threshold;
        Lmax ⇐ Maximum of the transitive lower bound over fk;
        if Lmax > Cmax then
            Cmax ⇐ Lmax;
        end
        foreach Search Location in fk do
            UpperBound ⇐ AT·CC + √((1 − AT²)(1 − CC²));
            if UpperBound < Cmax then
                Skip Current Location;
            else
                C ⇐ correlate(Template, Current Search Location);
                if C > Cmax then
                    (Cmax, imax, jmax) ⇐ (C, Current Location Indices);
                end
            end
        end
        Output fk, imax, jmax, Cmax;
    end
end
Algorithm 4: Inter-Ref-TEA
5.3 Exploiting Strong Inter-Template Autocorrelation
In some template matching applications, for example registration of an aerial video
with a satellite image (Shah and Kumar, 2003b), a sequence of template frames is
to be correlated with the same reference image. In such applications, if consecutive
template frames exhibit strong inter-template autocorrelation, the transitive bounds
may be used to speed up the template matching process. For this purpose, we divide
the sequence of template frames into groups such that all templates within each
group exhibit strong autocorrelation A′T with the temporally central frame.

Figure 5.3: Inter-Template-TEA: Exploiting strong inter-template autocorrelation for fast template matching.

One such group of templates is shown in Figure 5.3, in which the central template is shown in
red, and the central correlation CC is obtained by correlating the central template with
the complete reference image. Then, using A′T and CC as bounding correlations, we
compute the transitive bounds upon the correlation of each remaining template in
the group. All match locations with transitive upper bounds less than the current
known maximum or the initial correlation threshold may be discarded without any
loss of accuracy.
Input: A Sequence of Template Images, Reference Image, Size of Group of Templates, Initial Threshold
begin
    foreach Group of Templates do
        tc ⇐ Central Template;
        CT ⇐ correlate(tc, Reference Image);
        foreach Non-central Template tn in the Current Group do
            CA ⇐ correlate(tc, tn);
            foreach Search Location in the Reference Image do
                UB ⇐ CT·CA + √((1 − CT²)(1 − CA²));
                if UB < Corrmax then
                    Skip Current Location;
                else
                    cl ⇐ Current Location Values;
                    Corr ⇐ correlate(cl, tn);
                    if Corr > Corrmax then
                        (Corrmax, imax, jmax) ⇐ (Corr, Current Location Indices);
                    end
                end
            end
            return tn, imax, jmax, Corrmax;
        end
    end
end
Algorithm 5: Inter-Template (IT) TEA
In long template video sequences, the temporal autocorrelation may vary significantly
over time, requiring different group lengths. To find the appropriate group length at
runtime, we have developed a simple algorithm which adapts the length of the current
group using the percentage computation elimination of the previous group.
Let the actual elimination obtained in the (k−1)-th group be $e_{act}^{k-1}$, and the maximum
possible elimination be $e_{max}^{k-1}$:

$$e_{max}^{k-1} = \frac{L[k-1]-1}{L[k-1]}, \qquad (5.4)$$

where L[·] denotes the length of a group. Equation (5.4) is based on the fact that
one central correlation must be performed. If these two eliminations are close to
each other, the autocorrelation may be under-utilized and the group length may be
increased, while if $e_{act}^{k-1}$ is significantly less than $e_{max}^{k-1}$, the autocorrelation is less than
expected, and therefore the group length, L[k − 1], must be decreased for the next group:

$$L[k] = \begin{cases} L[k-1]+2, & \text{if } e_{max}^{k-1}-e_{act}^{k-1} < \delta_l \\ L[k-1]-2, & \text{if } e_{max}^{k-1}-e_{act}^{k-1} > \delta_h \\ L[k-1], & \text{otherwise,} \end{cases} \qquad (5.5)$$
where δl and δh are low and high thresholds on the elimination. Keeping a very low value
of δl will result in an increase in the number of groups, and hence in the overhead of the
number of fully correlated templates, while keeping a high value of δh may cause an
increase in computational cost due to reduced elimination.
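Equations (5.4) and (5.5) translate directly into code; a small sketch with illustrative names, taking eliminations and thresholds as percentages:

```cpp
// Maximum possible elimination for a group of length L, Equation (5.4):
// one central correlation per group is mandatory.
double max_elimination_percent(int L) {
    return 100.0 * (L - 1) / L;
}

// Group-length adaptation of Equation (5.5), with low/high thresholds
// delta_l and delta_h on the elimination gap.
int adapt_group_length(int L_prev, double actual_elim_percent,
                       double delta_l, double delta_h) {
    double gap = max_elimination_percent(L_prev) - actual_elim_percent;
    if (gap < delta_l) return L_prev + 2;  // autocorrelation under-utilized
    if (gap > delta_h) return L_prev - 2;  // elimination lower than expected
    return L_prev;
}
```

With the thresholds used later in the experiments (δl = 3%, δh = 10%), a group of length 7 whose observed elimination is 84% (against a maximum of about 85.7%) grows to length 9, while one at 70% shrinks to length 5.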
The only overhead in this algorithm is the computation of the inter-template autocorrelation,
which is of the order of O(mn), where m × n is the template size. Therefore,
the computational cost of this overhead is negligibly small compared to the overall
computations.
5.4 Experiments with Transitive Elimination Algorithms
We have performed extensive empirical evaluation of the three different types of
template matching problems described in the last three sections. Our experiments
are performed on ten different datasets, consisting of 424 reference images and 8465
template images. The size of the reference images ranges from 240 × 320 to 1394 × 2194
pixels, while the smallest template is of size 16 × 8 and the largest is 128 × 128
pixels. No template image is cropped from within a reference image, and the templates
contain various types of distortions, described in each subsection.
The proposed algorithms were implemented in C++ and compared with the currently
known fast exhaustive template matching techniques, including the FFT based
frequency domain implementation Lewis (1995), Zero-mean Bounded Partial Correlation
(ZBPC) Di Stefano et al. (2005), Zero-mean Enhanced Bounded Correlation
(ZNccEbc) Mattoccia et al. (2008b) and an exhaustive spatial domain implementation
(Spat) Haralick and Shapiro (1992). We have implemented the ZBPC algorithm,
and all experiments are carried out with a correlation area of 20% and a bound area
of 80%, as recommended in Di Stefano et al. (2005). The implementation of the ZNccEbc
algorithm provided by the original authors Mattoccia et al. (2008b) has been used,
and the parameter representing the number of partitions, r, has been set to 8
where possible, as recommended in Mattoccia et al. (2008b). However, for template sizes
that are not divisible by 8, a suitable value of r has been selected as described
later.
Other than correlation based measures, we have also implemented the Sum of Absolute
Differences (SAD) with the Partial Distortion Elimination Montrucchio and Quaglia
(2005) and Successive Elimination Algorithm (Li and Salari, 1995) optimizations. In
order to ensure a realistic comparison, we have used only sequential implementations
of all algorithms. The execution times are measured on an IBM machine with an Intel
Core 2 CPU (2.13 GHz) and 1 GB RAM.
The experiments are divided into six subsections. The first five subsections correspond to
the three proposed elimination algorithms using the correlation coefficient match
measure, and in the sixth, the elimination performance of different correlation
based measures is compared. The datasets used in each group,
implementation codes, and experimental setup details, along with complete results,
are available on our web site: http://cvlab.lums.edu.pk/tea.
Figure 5.4: Satellite Image (SI) dataset used for experiments on exploiting intra-reference autocorrelation.
Figure 5.5: Two Circuit Board (TCB) and Circuit Board (CB) datasets used for experiments on exploiting intra-reference autocorrelation.
Figure 5.6: Aerial Image (AI) dataset used for experiments on exploiting intra-reference autocorrelation.
5.4.1 Experiments with Intra-Reference Auto-correlation
These experiments are performed on four datasets: the Satellite Images (SI) dataset,
Aerial Images (AI) dataset, Circuit Board (CB) dataset and Two Circuit Boards
(TCB) dataset (see Table 5.1 and Figures 5.4, 5.5 and 5.6). The images to be matched
have projective distortions due to differences in viewing geometry. In addition, the
reference image of the SI dataset has high brightness while the templates have low
brightness and contrast; these brightness and contrast variations were synthetically
introduced into the dataset. In the CB and TCB datasets, templates and reference
images are taken from different boards. In the AI dataset, available from flickr.com
under a 'Creative Commons' license, the templates and the reference are aerial images
of the same scene, taken from two different aircraft positions.
For the Intra-Ref-TEA algorithm, the results reported in Table 5.2 are for a group
size of 5 × 5 search locations for all datasets. For the ZNccEbc algorithm, when the
number of rows of the template was not divisible by 8, we picked the factor which was
perceived to generate the higher speedup. We selected r = {8, 8, 8, 8, 8, 8, 17, 17, 17, 5, 9}
for SI(a,b,c), CB(a,b,c), TCB(a,b,c) and AI(a,c) respectively. The TCB(a,b,c) datasets
were also experimented with r = {2, 3, 4}, which yielded execution times of {329.51, 403.17,
455.63} seconds. These timings are significantly larger than the timings for r = 17,
as given in Table 5.2. The templates in the AI.b dataset have 97 rows, which, being a
prime number, cannot be factorized; therefore one may select r = 1 or r = 97. We
experimentally compared the two choices and found r = 97 to be more efficient. In
Table 5.2, the ZNccEbc results on the AI.b dataset are reported for r = 97.
Instead of using a coarse-to-fine strategy to initialize ZNccEbc, ZBPC and Intra-Ref-TEA,
a fixed initial correlation threshold of ρ = 0.80 has been used. Table 5.2 shows the
total execution time taken by each algorithm on each dataset. The execution time
reported for Intra-Ref-TEA includes the local auto-correlation computation overhead,
which is {1.463 s, 0.270 s, 0.505 s, 0.963 s} for the AI, CB, SI and TCB datasets respectively.
The execution time speedup of Intra-Ref-TEA over the other algorithms is dataset
dependent. The maximum observed speedup over ZBPC is 15.549 times, over ZNccEbc
4.464 times, over FFT 24.626 times and over Spat 22.680 times. Intra-Ref-TEA
Table 5.1: Dataset description for experiments with Intra-Ref-TEA

Dataset   Template Size a   Template Size b   Template Size c   Total Frames   Reference Size
SI        64×64             112×112           128×128           711            800×1000
TCB       34×34             51×51             68×68             579            807×1716
CB        16×8              24×12             32×16             328            762×1000
AI        95×95             97×97             99×99             171            1453×1548
has remained faster than the other correlation coefficient based algorithms, while for the CB
and SI datasets SAD has exhibited the highest speed. However, SAD suffers badly from
lack of accuracy on these datasets, due to the brightness and contrast variations. For
SI, the accuracy of SAD is zero percent, and for the CB dataset, out of 328 templates
only 25 were correctly matched. In contrast, the accuracy of the correlation coefficient
based algorithms has remained 100% over all datasets.
Over a portion of the four datasets {AI.a, CB.c, TCB.c, SI.c}, the variation of percentage
computation elimination and average execution time per template has been studied
by varying the group size parameter over {3×3, 5×5, 7×7, 9×9} (see Table 5.6).
The datasets AI.a and TCB.c have shown the best performance at a group size of 5×5,
while CB.c and SI.c performed best at 3×3 and 7×7, respectively. Thus, by tuning
the group size parameter, the speedups reported in Table 5.2 may be further improved
for the CB and SI datasets, even though all experiments reported in Table 5.2 are for
the 5×5 group size.
The maximum, minimum, and average speedups of TEA over the other algorithms, with
confidence intervals for a confidence level of 0.95, are reported in Table 5.3. The average
speedup along with the confidence intervals is plotted in Figure 5.7.
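The confidence intervals reported in the speedup tables (e.g., Table 5.3) use $z_\alpha\sigma/\sqrt{N}$ with $z_\alpha = 1.645$. A small sketch; the helper names are ours, and σ is taken as the population standard deviation of the speedup samples:

```cpp
#include <cmath>
#include <vector>

// Mean of the speedup samples.
double mean(const std::vector<double>& x) {
    double s = 0;
    for (double v : x) s += v;
    return s / x.size();
}

// Half-width of the confidence interval, z_alpha * sigma / sqrt(N),
// with z_alpha = 1.645 as used in the thesis tables.
double confidence_halfwidth(const std::vector<double>& x,
                            double z_alpha = 1.645) {
    double mu = mean(x), ss = 0;
    for (double v : x) ss += (v - mu) * (v - mu);
    double sigma = std::sqrt(ss / x.size());
    return z_alpha * sigma / std::sqrt(static_cast<double>(x.size()));
}
```

A table entry of the form "9.72±2.25" is then simply `mean(x)` and `confidence_halfwidth(x)` over that row's per-dataset speedups.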
5.4.2 Experiments with Inter-Reference Auto-correlation: Fast Feature Tracking
In this experiment, manually extracted features are tracked across the Pedestrian (PED)
and Cyclist (CYC) video datasets. Both videos were acquired in a typical surveillance
scenario (see Table 5.7 and Figure 5.8). Both datasets contain dissimilarities produced
Table 5.2: Total execution time in seconds taken by Intra-Ref-TEA and other algorithms on the datasets described in Table 5.1

Dataset  IR-TEA  ZBPC    ZNccEbc  FFT     SAD     Spat
AI.a     108.89  1176.6  306.91   368.05  460.89  2375.9
AI.b     141.61  1604.3  632.19   474.49  625.14  3099.4
AI.c     185.53  2194.7  413.00   639.87  812.38  4207.8
CB.a     9.71    24.14   29.34    193.07  1.45    35.62
CB.b     17.14   51.86   28.76    188.03  2.91    70.66
CB.c     26.58   81.72   31.94    191.25  5.87    118.58
TCB.a    63.80   675.70  249.03   880.89  50.23   870.78
TCB.b    103.10  1426.0  263.64   848.97  130.38  1827.0
TCB.c    160.75  2499.5  278.67   838.24  267.31  3260.90
SI.a     352.68  2332.0  460.55   1307.5  5.13    2557.0
SI.b     449.84  5717.9  831.12   1152.7  12.13   6356.6
SI.c     465.31  6882.4  961.84   1108.9  15.76   7667.3
Table 5.3: Maximum, minimum and average speedup of Intra-Ref-TEA (from Table 5.2). Speedup is computed by dividing the execution time of each algorithm by the execution time of TEA. Confidence intervals $z_\alpha\sigma/\sqrt{N}$ are also computed for α = 0.05 (confidence level of 0.95) and zα = 1.645, where σ is the standard deviation of the speedup for N = 7 datasets.

              ZBPC       ZNccEbc    FFT        SAD         Spat         PCE
MaxSpeedup    15.55      4.46       19.88      4.41        22.68        1
MinSpeedup    2.49       1.20       2.38       0.01        3.67         1
AvgSpeedup    9.72       2.40       7.01       1.45        14.01        1
ConfInterval  9.72±2.25  2.40±0.48  7.01±2.59  1.45±0.87   14.01±3.52   1±0
Table 5.4: Fast sequence to reference image alignment: computation elimination (%) comparison between different algorithms

Dataset  TEA     ZBPC    ZNccEbc  FFT   SAD     Spatial
CB.1     87.278  2.832   96.201   0.00  63.517  0.00
CB.2     90.528  2.524   94.426   0.00  52.131  0.00
CB.3     92.137  2.407   95.310   0.00  50.683  0.00
CB.4     93.045  2.211   96.275   0.00  49.365  0.00
SI.1     86.368  9.035   93.563   0.00  99.841  0.00
SI.2     89.851  10.487  93.521   0.00  99.807  0.00
SI.3     91.767  10.365  93.687   0.00  99.806  0.00
SI.4     93.030  10.403  94.220   0.00  99.826  0.00
SI.5     94.044  10.778  94.165   0.00  99.810  0.00
Figure 5.7: Plot of the average execution time speedup of Intra-Ref-TEA over the other algorithms, with confidence intervals for a confidence level of 0.95. The corresponding values are given in Table 5.3.
Table 5.5: Local autocorrelation computation time in seconds for the LA-Algorithm (LAF) and for the previous algorithm (CEA) Mahmood and Khan (2008), for a group size of 5×5 search locations.

SI Dataset  SI.1   SI.2    SI.3    SI.4    SI.5
LAF Time    0.499  0.499   0.499   0.484   0.484
CEA Time    8.937  13.780  18.639  24.216  30.184

CB Dataset  CB.1   CB.2    CB.3    CB.4
LAF Time    1.671  1.671   1.671   1.656
CEA Time    8.390  18.013  32.730  49.260
Table 5.6: Intra-Ref-TEA: variation of percent computation elimination (%E) and average execution time in seconds per template (T) with the group size parameter (GrSz).

GrSz     3×3           5×5           7×7           9×9
DSet     T     %E      T     %E      T     %E      T     %E
AI.a     7.01  87.2    2.74  95.1    2.99  94.6    4.69  91.4
CB.c     0.18  85.3    0.24  80.3    0.25  79.2    0.27  77.3
TCB.c    2.04  88.8    0.86  95.4    0.96  94.8    1.79  90.1
SI.c     3.23  88.9    1.24  95.8    1.13  96.1    1.60  94.4
Figure 5.8: (a) Pedestrian dataset: four reference frames and 21 feature templates. (b) Cyclist dataset: four reference frames and 5 feature templates.
Figure 5.9: Fast component tracking dataset: 6 reference frames and 5 component templates. Original images were taken from (Mattoccia et al., 2008a).
Table 5.7: Dataset description for the fast feature tracking and fast component tracking experiments

Dataset  # of Feat.  Feat. Size  # of Frames  Frame Size
PED      21          23×11       325          240×320
CYC      5           17×17       38           240×320
CT.a     6           63×63       16           479×640
CT.b     1           178×62      16           479×640
CT.c     1           136×104     16           479×640
CT.d     1           147×63      16           479×640
AT       20          95×95       25           1453×1548
by human motion as well as frame-to-frame illumination variations. The initial correlation
threshold is set to 0.70 for each of the ZNccEbc, ZBPC and Inter-Ref-TEA algorithms.
The partition parameter r in ZNccEbc has been set to 23 for PED and 17 for
CYC. The total execution times for Inter-Ref-TEA in Table 5.8 include the
time of the central correlation and inter-frame temporal autocorrelation overheads.
In these experiments Inter-Ref-TEA has remained significantly faster than all the other
algorithms. The maximum execution time speedup over ZBPC is 7.147 times, over
ZNccEbc 9.410 times, over FFT 15.073 times, over SAD 2.020 times and over Spat 7.400 times.
The slow execution times of the ZNccEbc algorithm are due to unfavorable template
sizes, which increased the bound computation overhead. The percentage of eliminated
computations is also reported in Table 5.9. For the PED dataset the ZNccEbc algorithm
has obtained the maximum elimination, while for the CYC dataset Inter-Ref-TEA
has obtained the maximum computation elimination. Despite the high computation elimination
obtained by the ZNccEbc algorithm on the PED dataset, it has remained significantly
slower than all the other algorithms, including the exhaustive spatial domain implementation,
Spat. This is because, in the ZNccEbc algorithm, the overhead cost
of bound computation is significantly larger than the elimination benefit obtained from
the skipped computations.
Table 5.8: Total execution time in seconds for the datasets described in Table 5.7 for Inter-Ref-TEA and other algorithms.

Data  IRTEA   ZNccEbc  ZBPC    FFT     SAD     Spat
PED   58.30   548.61   340.38  268.91  110.27  374.51
CYC   1.50    14.04    10.72   22.61   3.03    11.10
CT.a  12.76   65.90    198.50  166.08  86.27   263.45
CT.b  4.06    27.92    82.05   27.59   49.53   88.53
CT.c  3.31    29.88    101.64  27.69   73.47   125.72
CT.d  2.05    8.62     57.70   27.72   38.08   81.68
AT    223.43  5235.1   22668   4171.3  6105.7  27099
Table 5.9: Percentage computation elimination for Inter-Ref-TEA and other elimination algorithms.

Dataset  IR-TEA  ZNccEbc  ZBPC    SAD
PED      80.496  93.839   12.571  75.583
CYC      93.749  89.607   8.451   77.259
CT.a     92.250  97.691   24.957  69.192
CT.b     88.150  93.047   8.103   49.063
CT.c     91.029  98.631   19.751  43.889
CT.d     93.162  99.560   29.585  57.397
AT       95.755  98.947   17.733  78.318
5.4.3 Experiments with Inter-Reference Auto-correlation: Fast Component Tracking
In this dataset there is no local motion, and the component templates are significantly
larger than the feature templates. Two datasets are used: Component Tracking (CT)
and Aerial Tracking (AT) (see Figure 5.9 and Table 5.7). The original images in CT
were taken from Mattoccia et al. (2008a), and the AT dataset is a portion of the AI
dataset used in Subsection 5.4.1. The following frame-to-frame variations were
synthetically produced: affine photometric variations, non-linear photometric variations,
contrast complementing, sharpening by edge enhancement, and geometric transformation
of the original images.
An initial correlation threshold of 0.70 has been used for Inter-Ref-TEA, ZNccEbc and
ZBPC. The central correlations in Inter-Ref-TEA have been computed using the
FFT based implementation. For ZNccEbc, the r parameter has been set to
{7, 89, 8, 7, 5} for CT(a-d) and AT. For CT.b, we experimented with r = 2 as well;
however, we found that the performance of ZNccEbc is better with r = 89, which
is reported in Table 5.8. From the total execution times reported in Table 5.8, the
maximum speedup observed for Inter-Ref-TEA over the ZNccEbc algorithm is 23.431
times, over ZBPC 101.46 times, over FFT 18.669 times, over SAD 27.327 times and over Spat
121.290 times.
The maximum, minimum, and average speedups of TEA compared to the other algorithms,
with confidence intervals for a confidence level of 0.95, are reported in Table 5.10. The average
speedup of TEA and the confidence intervals are also plotted in Figure 5.10.
5.4.4 Experiments with Inter-Template Auto-correlation: Video Geo-registration
These experiments are performed on two datasets, DS1 and DS2 (see Table 5.11 and
Figure 5.11). The two reference images are 800K-pixel and 3000K-pixel satellite
images taken from Google Earth, earth.google.com. The video frames are acquired
by modeling a flight simulation on satellite images of the same area, but captured
Table 5.10: Maximum, minimum and average speedup of TEA for the fast component tracking experiments (Table 5.8). Speedup is computed by dividing the execution time of each algorithm by the execution time of TEA. Confidence intervals $z_\alpha\sigma/\sqrt{N}$ are also computed for α = 0.05 (confidence level of 0.95) and zα = 1.645, where σ is the standard deviation of the speedup for N = 7 datasets.

              ZNccEbc     ZBPC         FFT         SAD         Spat         PCE
MaxSpeedup    23.43       101.45       18.67       27.33       121.28       1
MinSpeedup    4.20        5.84         4.61        1.89        6.42         1
AvgSpeedup    10.57       35.15        11.48       13.35       42.57        1
ConfInterval  10.57±4.74  35.15±24.18  11.48±3.49  13.35±6.75  42.57±28.97  1±0
Figure 5.10: Plot of the average execution time speedup of TEA for the fast component tracking experiments, with confidence intervals for a confidence level of 0.95. The corresponding values are given in Table 5.10.
at a different time of the year, provided by Microsoft TerraServer, now available at
www.terraserver.com. In the simulation, the scale and orientation are assumed to
be approximately the same as those of the reference images. The images to be matched
contain dissimilarities due to differences in imaging sensor and viewing geometry.
Additional dissimilarities were generated by reducing the dynamic range of the templates in
DS1 to one third of the original range, while the templates in DS2 were contrast
reversed. Contrast reversals are frequently observed in practical situations when matching
is to be done across infra-red and optical imagery.
For ZNccEbc, ZBPC and Inter-Template TEA (IT-TEA), the initial correlation threshold is
0.80 for DS1 and -0.85 for DS2. In the ZNccEbc algorithm, r = 8 has been used for
both datasets. In IT-TEA, the correlation of the central templates with the reference
images is computed using the FFT based implementation, and the length of the
group of templates is initialized to 7 for DS1 and 5 for DS2. For the remaining groups,
the length was automatically adapted by using δl = 3% and δh = 10% in Equation (5.5).
The average group length remained {8.6, 10.9, 11.6, 12.1, 12.4, 7.8, 7.2, 7.7, 8.2, 7.7} for the DS1(a-e) and DS2(a-e) datasets respectively.
Execution time comparison of IT-TEA and other algorithms is given in Table 5.12.
For DS1, maximum execution time speedup of IT-TEA over ZBPC is 9.772 times, over
ZNccEbc is 1.685, over FFT is 3.610 and over Spat is 15.101 times. For DS2, maximum
observed speedup of IT-TEA over ZBPC is 10.218, over ZNccEbc is 6.376, over FFT is
3.057 and over Spat is 10.264 times. The low performance of ZBPC and ZNccEbc
on DS2 can be attributed to the fact that these algorithms have been developed
to find only the positive maximum of the correlation coefficient, whereas in the case of DS2
negative peaks have to be searched. The transitive elimination algorithm does not require
any modification to search for negative peaks.
Maximum speedup, minimum speedup, and average speedup of TEA, along with confidence
intervals for a confidence level of 0.95, are reported in Table 5.13. The average speedup
of TEA, along with confidence intervals, is plotted in Figure 5.12.
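For reference, the speedup statistics and the z_α σ/√N confidence intervals used in Tables 5.10 and 5.13 can be computed as follows; a minimal Python sketch with hypothetical per-dataset speedup values, not the thesis code:

```python
import math

def speedup_stats(speedups, z_alpha=1.645):
    """Max/min/mean speedup with a z-based confidence interval
    z_alpha * sigma / sqrt(N), as used for Tables 5.10 and 5.13
    (z_alpha = 1.645 for a 0.95 confidence level)."""
    n = len(speedups)
    mean = sum(speedups) / n
    # sigma: standard deviation of the per-dataset speedups
    sigma = math.sqrt(sum((s - mean) ** 2 for s in speedups) / n)
    half_width = z_alpha * sigma / math.sqrt(n)
    return max(speedups), min(speedups), mean, half_width

# Hypothetical per-dataset speedups, for illustration only
max_s, min_s, avg, ci = speedup_stats([4.2, 23.4, 10.5, 8.0, 12.1, 6.3, 9.5])
```

The reported interval is then mean ± half_width, one row per competing algorithm.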
Table 5.11: Dataset details used for video geo-registration experiments

Dataset  # of Frames  Frame Size  Ref. Size   Avg. ρmax
DS1.a    734          64×64       736×1129    0.939
DS1.b    744          80×80       736×1129    0.961
DS1.c    694          96×96       736×1129    0.963
DS1.d    641          112×112     736×1129    0.961
DS1.e    594          128×128     736×1129    0.958
DS2.a    659          64×64       1394×2152   -0.935
DS2.b    645          80×80       1394×2152   -0.921
DS2.c    648          96×96       1394×2152   -0.874
DS2.d    632          112×112     1394×2152   -0.924
DS2.e    616          128×128     1394×2152   -0.794
Table 5.12: Video geo-registration: average execution time in seconds per template frame.

Dataset  IT-TEA  ZNccEbc  ZBPC     FFT     SAD     Spat
DS1.a    1.217   1.366    6.415    4.223   0.107   8.455
DS1.b    1.156   1.675    8.575    4.173   0.156   13.587
DS1.c    1.413   2.314    12.736   4.161   0.258   18.553
DS1.d    1.669   2.787    16.310   4.261   0.436   24.018
DS1.e    1.977   3.333    16.715   4.266   0.610   29.855
DS2.a    6.394   16.725   32.163   19.547  2.969   32.848
DS2.b    8.614   28.378   53.303   19.552  4.976   53.760
DS2.c    12.290  42.751   74.399   19.606  7.030   74.933
DS2.d    15.534  58.995   98.432   19.458  9.374   99.027
DS2.e    12.250  78.110   125.170  19.563  11.959  125.740
Table 5.13: Maximum, minimum and average speedup of TEA for the Video Geo-registration experiment (Table 5.12). Speedup is computed by dividing the execution time of each algorithm by the execution time of TEA. Confidence intervals z_α σ/√N are also computed for α = 0.05 (confidence level of 0.95), z_α = 1.645, where σ is the standard deviation of the speedup over N = 10 datasets.

              ZNccEbc    ZEBC       FFT        SAD        Spat      PCE
MaxSpeedup    6.376      10.218     3.60       0.976      15.10     1
MinSpeedup    1.12       5.03       1.25       0.088      5.14      1
AvgSpeedup    2.71       7.37       2.45       0.42       9.54      1
ConfInterval  2.71±0.83  7.37±0.98  2.45±0.43  0.42±0.14  9.54±2.0  1±0
Figure 5.11: Video geo-registration dataset: (a) DS1 (b) DS2. In both datasets, reference images are taken from earth.google.com and templates from terraserver.microsoft.com.
Figure 5.12: Plot of average execution time speedup of TEA on the Video Geo-registration dataset. Confidence intervals for a confidence level of 0.95 are also plotted. Corresponding values may be seen in Table 5.13.
Figure 5.13: Rotation and Scale invariant template matching: Nine reference imagesand 14 templates.
5.4.5 Experiments with Inter-Template Auto-correlation: Ro-
tation / Scale Invariant Template Matching
Consecutive rotated and scaled versions of an object are generally highly correlated.
We have used this correlation to speed up exhaustive rotation/scale invariant
template matching by using IT-TEA. These experiments are performed on an optical
character recognition dataset using scanned pages from multiple books. The template
images consist of 14 letters: {a, c, e, g, i, k, m, o, p, s, v, w, x, z}, which were
extracted from one of the scanned images (see Table 5.14 and Figure 5.13). Each
template is rotated from -5° to +5° and scaled from -8% to +8% at a step size of
2%, resulting in 99 rotated/scaled versions. All of these rotated/scaled versions are
exhaustively correlated with each of the 14 reference images, which have varying background
colors, arbitrary rotations, arbitrary scaling, aliasing effects due to poor scanner
resolution, and broken and irregular character boundaries.
Out of 99 rotated/scaled versions of each template, only one template (with zero
rotation and unit scaling) is fully correlated with the complete reference image while
for all of the remaining templates, transitive bounds are computed. In these experiments,
the initial correlation threshold is set to 0.80 for ZBPC, ZNccEbc and IT-TEA. In
ZNccEbc, the partition parameter r is set to {19, 19, 17, 13, 5, 5, 9, 9, 5, 9, 9, 19, 9, 19} respectively for the 14 templates given in Table 5.14.
For each algorithm, total execution time including all overheads is shown in Table
5.15. The maximum execution time speedup obtained by IT-TEA is 28.292 times
over ZBPC, 30.322 times over ZNccEbc, 126.70 times over FFT, 12.674 times over SAD
and 29.261 times over Spat. On this dataset, the speedup obtained by IT-TEA over the other
algorithms is enhanced because of the small template sizes and high autocorrelation
between consecutive rotated/scaled template versions.
Table 5.14: Rotation and Scale invariant template matching: dataset for character recognition

Letter  Tmp. Size  Ref. Size   Letter  Tmp. Size  Ref. Size
a       19×14      679×889     o       18×17      671×1215
c       19×15      755×977     p       25×17      702×1206
e       17×15      552×1005    s       18×12      711×1224
g       26×16      593×1209    v       18×17      681×1271
i       25×8       907×1263    w       19×23      756×1341
k       25×17      684×1031    x       18×16      475×1463
m       18×24      647×1046    z       19×15      291×758
Table 5.15: Rotation and Scale invariant template matching: total execution time (in seconds) for IT-TEA and other algorithms.

Dataset  IT-TEA  ZNccEbc  ZBPC    FFT     SAD     Spat
a        43.06   1164.7   849.77  4975.6  322.97  836.80
c        41.20   1187.2   880.01  4769.8  372.29  891.00
e        40.12   1091.7   784.39  4808.9  308.02  807.84
g        50.37   974.53   1230.8  4761.0  516.43  1245.2
i        47.64   445.77   682.37  4804.7  253.40  679.14
k        45.45   643.08   1285.9  4756.3  578.13  1289.0
m        63.67   760.49   1250.7  5132.5  467.56  1311.3
o        42.67   699.70   921.67  4803.1  370.72  955.35
p        46.43   559.51   1286.8  4845.2  546.72  1288.7
s        38.01   680.66   712.88  4815.8  260.81  706.61
v        39.75   682.33   927.68  4829.8  387.66  954.36
w        45.21   1311.5   1264.9  5046.7  571.79  1322.9
x        41.85   733.22   878.91  4886.5  399.99  888.03
z        40.22   1219.6   900.72  4816.2  457.56  893.48
Table 5.16: Total execution time (T) (sec) and average percent elimination (E) for cross-correlation (ψ), NCC (φ) and the correlation-coefficient (ρ).

Dataset  Tψ      Tφ      Tρ      Eψ     Eφ     Eρ
PED      23.614  45.75   58.318  99.36  82.24  80.5
DS1.a    213.66  967.32  732.12  95.13  75.4   83.21
DS1.b    316.42  1321.6  861.47  94.66  70.5   85.32
DS1.c    548.14  1387.9  980.37  92.56  71.42  85.79
DS1.d    777.27  1462.1  1069.9  89.13  72.34  86.29
DS1.e    931.09  1572.1  1174.4  85.26  72.6   86.49
5.4.6 Performance Comparison of Different Correlation Based
Measures
We compared the execution times and the computation elimination performance of the
three correlation based similarity measures: cross-correlation, NCC and correlation-
coefficient, on six datasets: DS1 (a, b, c, d, e) and PED. For DS1, IT-TEA has been
used for comparison, and for PED, Inter-Ref-TEA. The total execution time and
the average computation elimination per frame are reported in Table 5.16.
In these experiments we observe that cross-correlation is the fastest of the three
measures. Maximum speedup obtained by cross-correlation over NCC is 4.527 times
and over correlation coefficient is 3.427 times. NCC was found to be faster than
correlation coefficient on the PED dataset, while slower on the DS1 datasets. This may
be because NCC is not robust to additive intensity variations; in the presence of
such variations the magnitude of the NCC maximum may decrease, causing a reduction
in elimination and an increase in execution time. However, it should be noted
that the relative speedups are data dependent and may vary for other datasets.
5.5 Conclusion
In Chapter 5 we have demonstrated that the transitive property of correlation
based match measures may be exploited for fast template matching by developing
different elimination algorithms. Three variations of transitive elimination algorithms
are presented, which cater to different types of template matching problems. The
proposed algorithms have exhaustive-equivalent accuracy and are compared with
currently known fast exhaustive techniques on a wide variety of real image datasets.
Our empirical results, based on the correlation of 8465 templates with 424
reference images, demonstrate that the proposed algorithms outperform the
currently known algorithms by a significant margin.
Chapter 6
PARTIAL CORRELATION ELIMINATION
ALGORITHMS
Bound based computation elimination algorithms are of special interest in mission
critical applications because these algorithms guarantee exhaustive-equivalent
accuracy despite a large amount of skipped computations. In Chapter 3, elimination
algorithms were broadly divided into two categories: complete elimination algorithms
and partial elimination algorithms. The category of complete elimination algorithms
contains the transitive elimination algorithms discussed in the last chapter. Transitive
elimination algorithms exploit strong autocorrelation present in a template matching
system to skip computations and to obtain high speedup. Strong autocorrelation may
be found in many template matching systems; however, it cannot be guaranteed in
general. As discussed in the last two chapters, in the absence of strong autocorrelation,
the speedup performance of transitive elimination algorithms may degrade.
Therefore, in such cases, a more generic elimination scheme is required which does
not depend on the autocorrelation function. In the current chapter, we propose
partial correlation elimination algorithms for correlation coefficient based fast template
matching. These algorithms are generic, and their performance is independent
of the autocorrelation function of the template matching system.
Most of the existing partial elimination algorithms have been developed for sim-
ple image match measures including Sum of Absolute Difference (SAD) and Sum of
Squared Differences (SSD). However, these measures are not invariant to brightness
and contrast variations, which frequently occur in practical problems. Compared
to SAD and SSD, correlation coefficient is more robust: it is invariant to linear
intensity distortions and is therefore preferred when such distortions are present. A
wide variety of applications using correlation coefficient as a preferred match measure
have been listed in Chapter 1. Therefore an efficient partial elimination scheme for
correlation coefficient based template matching is of significant practical importance.
The partial elimination algorithms for SAD and SSD, for example Partial Distor-
tion Elimination (PDE) algorithms and Sequential Similarity Detection Algorithms
(SSDA), exploit the fact that these measures grow monotonically as consecutive pix-
els are processed within a block at a particular search location. The final value of
distortion is always equal to or larger than the intermediate values. Therefore, the
basic underlying principle of these algorithms is to skip the remaining computations
as soon as the current value of distortion exceeds the previously known minimum.
This is because a location whose partial distortion is larger than the currently known
minimum distortion cannot beat the currently known best match location. Therefore,
all such locations are skipped without any loss of accuracy.
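The PDE principle described above can be sketched as follows; an illustrative Python sketch, with hypothetical function names and data, not taken from the PDE literature:

```python
def sad_with_pde(block, candidate, best_so_far):
    """Partial Distortion Elimination for SAD: accumulate absolute
    differences row by row and abandon the candidate as soon as the
    running (monotonically non-decreasing) sum exceeds the best SAD
    seen so far. Returns the final SAD, or None if eliminated."""
    partial = 0
    for row_b, row_c in zip(block, candidate):
        partial += sum(abs(b - c) for b, c in zip(row_b, row_c))
        if partial > best_so_far:   # elimination test
            return None             # cannot beat the current best match
    return partial

def best_match(block, candidates):
    """Scan all candidate locations, keeping the minimum SAD."""
    best_sad, best_idx = float("inf"), -1
    for idx, cand in enumerate(candidates):
        sad = sad_with_pde(block, cand, best_sad)
        if sad is not None and sad < best_sad:
            best_sad, best_idx = sad, idx
    return best_idx, best_sad

idx, sad = best_match([[1, 2], [3, 4]],
                      [[[9, 9], [9, 9]], [[1, 2], [3, 5]], [[1, 2], [3, 4]]])
```

Because the running distortion can only grow, skipping a location once the test fires provably loses no accuracy.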
Partial elimination techniques as applied to SAD or SSD cannot be extended in a
straightforward manner to speed up correlation coefficient based image matching,
because of two unfavorable properties. Firstly, the growth of correlation coefficient
is non-monotonic as consecutive pixels within a block are processed; therefore no
intermediate value is guaranteed to be larger than the final value. Secondly, the
best match location over the entire search space is often defined as the location
exhibiting the maximum value of correlation coefficient. Hence a previously known
maximum may not be exploited to discard the remaining computations of a block at an
intermediate stage. This is why partial elimination algorithms have largely been
considered inapplicable to correlation coefficient based template matching (Brown, 1992;
Zitova and Flusser, 2003; Pratt, 2007; Barnea and Silverman, 1972; Wu, 1995a), with
the exception of a recently proposed technique (Mattoccia et al., 2008b), which we have
discussed in detail in Chapter 3.
One of the major contributions of this thesis is the development of partial elimination
algorithms for correlation coefficient based fast template matching. In these
techniques, we extend the concept of PDE and SSDA algorithms to correlation coefficient
based template matching; therefore, by analogy, we have named these techniques
Partial Correlation Elimination (PCE) algorithms. PCE algorithms are based on
a monotonic formulation of correlation coefficient. To the best of our knowledge, this
formulation has not previously been used to speed up the template matching process. If
correlation coefficient is computed using this formulation, the similarity starts from
+1 at the first pixel of a block and monotonically decreases to the final value till
Figure 6.1: If computations are done with the traditional correlation-coefficient formulation, the partial value of similarity grows non-monotonically. Such growth patterns are shown for four different pairs of 8 × 8 pixels image blocks, yielding correlation-coefficients of {0.464, 0.680, 0.108, -0.210}.
the end of the computations (Figure 6.2). Any intermediate value of similarity is
always larger than (or equal to) the final value. The speedup occurs because, at any
point during the computation, if the similarity happens to be less than a previously
known maximum (or an initial threshold), the remaining computations become redundant
and may be skipped without any loss of accuracy. As the total amount of skipped
computations increases, the template matching process accelerates accordingly. In
this chapter we present only the Basic Mode PCE algorithm, while further extensions
of the PCE algorithm will be discussed in Chapter 7.
In the PCE algorithm, the amount of eliminated computations depends on the location
and value of the currently known maximum. A high maximum found at the start
of the search process may significantly increase computation elimination and hence
reduce the execution time. If an approximate position of the maximum is known
from the context of the problem, such as in block motion estimation, the search process
may start from that location. If no such guess is available, we propose an intelligent
re-arrangement of PCE computations, motivated by the two-stage template matching
technique (Vanderbrug and Rosenfeld, 1977), as a means of finding a high threshold
Figure 6.2: If computations are done with our proposed monotonic formulation, the partial value of similarity monotonically decreases from +1.00 to the correlation-coefficient between the two image blocks. The monotonic growth pattern is shown for the same four pairs of 8 × 8 pixels image blocks as used in Figure 6.1.
early in the search process. In the first stage of the proposed technique, only a small
portion of the template is matched at all search locations. Based on the partial
results, the complete correlation coefficient is computed at the best match location, and
this value is used as the initial threshold in the second stage. By using this strategy, we
may quickly find a high threshold at no additional computational cost, and the speedup
is obtained with no loss of accuracy. This initialization scheme is effective for small to
medium sized templates, while for larger template sizes initialization of PCE with a
coarse-to-fine scheme is more efficient. Two-stage PCE is exact, having exhaustive-equivalent
accuracy. In contrast, the existing two-stage algorithm for normalized
cross-correlation (NCC), developed by Goshtasby et al. (1984), is approximate, with a
non-zero probability of missing the NCC maximum.
6.1 Monotonic Formulation of Correlation Coefficient

The correlation coefficient between a template image t and any search location in the
reference image r_i, i ∈ {1, 2, 3, ..., p}, each of size m × n pixels, is defined as (Haralick
and Shapiro, 1992)

\rho_{t,i} = \frac{\sum_{x=1}^{m}\sum_{y=1}^{n} (t(x,y)-\mu_t)\,(r_i(x,y)-\mu_i)}{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (t(x,y)-\mu_t)^2}\,\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n} (r_i(x,y)-\mu_i)^2}}. \quad (6.1)
This may be written as the normalized dot product of two vectors,

\rho_{t,i} = \sum_{x=1}^{m}\sum_{y=1}^{n} \frac{\delta_t(x,y)}{\sigma_t}\,\frac{\delta_i(x,y)}{\sigma_i}, \quad (6.2)

where δ_t and δ_i are the mean-subtracted versions of the template and the reference
location respectively, and σ_t and σ_i are proportional to the standard deviations of the
respective signals.
Partial elimination algorithms require a monotonic behavior of the partial similarity
value, which is the summation in Equation 6.2. We observe that in the currently used
form of correlation coefficient, no monotonic behavior exists. This is because
δ_t(x, y) evaluates to a positive number if t(x, y) > μ_t, a negative number if
t(x, y) < μ_t, and zero if t(x, y) = μ_t. Similarly, δ_i(x, y) may also evaluate to be
positive, negative or zero, depending upon the value of μ_i. After processing each location
(x, y), the summation in Equation 6.2 may increase if δ_t(x, y) and δ_i(x, y) have the
same sign, may decrease if they have opposite signs, or may remain the
same if either of them is zero. Therefore, the partial similarity value
varies non-monotonically and no direct relationship may be established between
any intermediate value and the final value (Figure 6.1).
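This non-monotonic behavior is easy to observe numerically. A small sketch with hypothetical 2 × 2 blocks tracks the running value of the summation in Equation 6.2:

```python
import math

def unit_vector(block):
    """Flatten a block, subtract its mean, and scale to unit norm
    (the delta/sigma terms of Equation 6.2)."""
    vals = [v for row in block for v in row]
    mu = sum(vals) / len(vals)
    delta = [v - mu for v in vals]
    sigma = math.sqrt(sum(d * d for d in delta))
    return [d / sigma for d in delta]

def partial_sums(t_block, r_block):
    """Running value of the summation in Equation 6.2 after each pixel."""
    sums, acc = [], 0.0
    for a, b in zip(unit_vector(t_block), unit_vector(r_block)):
        acc += a * b   # each term may be positive, negative or zero
        sums.append(acc)
    return sums

# Hypothetical 2 x 2 blocks: the running sum rises, then falls to the
# final correlation coefficient of 0.28
s = partial_sums([[1, 9], [2, 8]], [[1, 9], [8, 2]])
```

Here the partial value overshoots the final coefficient before falling back, so no intermediate value bounds the final one.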
To derive a form of correlation coefficient which exhibits monotonic growth, we observe
that the norm of each of the vectors in Equation 6.2 is unity. This implies that

\sum_{x=1}^{m}\sum_{y=1}^{n} \frac{\delta_t^2(x,y)}{\sigma_t^2} + \sum_{x=1}^{m}\sum_{y=1}^{n} \frac{\delta_i^2(x,y)}{\sigma_i^2} = 2. \quad (6.3)
From 6.2 and 6.3:

\rho_{t,i} = 2 + \sum_{x=1}^{m}\sum_{y=1}^{n} \left( \frac{\delta_t(x,y)\,\delta_i(x,y)}{\sigma_t\,\sigma_i} - \frac{\delta_t^2(x,y)}{\sigma_t^2} - \frac{\delta_i^2(x,y)}{\sigma_i^2} \right). \quad (6.4)
Rearranging and simplifying:

\rho_{t,i} = 1 - \frac{1}{2}\sum_{x=1}^{m}\sum_{y=1}^{n} \left( \frac{\delta_t(x,y)}{\sigma_t} - \frac{\delta_i(x,y)}{\sigma_i} \right)^{2}. \quad (6.5)
The summation in 6.5 may be viewed as the square of the normalized Euclidean
distance, and each term in this summation is the squared distance, or dissimilarity,
contributed by the corresponding pixel. Therefore, as consecutive pixels are processed,
only positive values (or zeros) are subtracted from the previous value of partial
similarity.

We find that the formulation of correlation coefficient given by Equation 6.5
has already been reported in the statistics literature, for example see Rodgers and
Nicewander (1988). However, its implications for the template matching problem and
its use for computational speedup had not been identified before our work (Mahmood
and Khan, 2007a).
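As a numerical sanity check, the monotonic form of Equation 6.5 can be compared against the dot-product form of Equation 6.2; a minimal sketch with hypothetical 2 × 2 blocks, not the thesis code:

```python
import math

def unit_vector(block):
    """delta/sigma: flatten, mean-subtract and scale the block to unit norm."""
    vals = [v for row in block for v in row]
    mu = sum(vals) / len(vals)
    delta = [v - mu for v in vals]
    sigma = math.sqrt(sum(d * d for d in delta))
    return [d / sigma for d in delta]

def rho_dot(t_block, r_block):
    """Correlation coefficient as the dot product of Equation 6.2."""
    return sum(a * b for a, b in zip(unit_vector(t_block), unit_vector(r_block)))

def rho_monotonic_trace(t_block, r_block):
    """Partial similarity of Equation 6.5: starts at +1 and only decreases,
    since each pixel subtracts a non-negative squared distance."""
    trace, sim = [], 1.0
    for a, b in zip(unit_vector(t_block), unit_vector(r_block)):
        sim -= 0.5 * (a - b) ** 2
        trace.append(sim)
    return trace

t, r = [[1, 9], [2, 8]], [[1, 9], [8, 2]]
trace = rho_monotonic_trace(t, r)
```

The trace decreases monotonically from +1 and its last entry equals the correlation coefficient computed the standard way.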
6.2 Basic Mode Partial Correlation Elimination Algorithm

The best match of a template image t over p search locations may be defined as the
search location maximizing the magnitude of the correlation coefficient ρ_{t,i}:

i_{\max} = \arg\max_{i} |\rho_{t,i}| \quad \forall\, 1 \le i \le p. \quad (6.6)
Let λ_{t,i}(u, v) be the value of partial similarity between t and r_i, computed over u rows
and v columns, such that 0 ≤ u ≤ m and 0 ≤ v ≤ n. From 6.5, it follows that

\lambda_{t,i}(u,v) = 1 - \frac{1}{2}\sum_{x=1}^{u}\sum_{y=1}^{v} \left( \frac{\delta_t(x,y)}{\sigma_t} - \frac{\delta_i(x,y)}{\sigma_i} \right)^{2}. \quad (6.7)

λ_{t,i}(u, v) monotonically decreases from +1 to ρ_{t,i} as (u, v) increases from (0, 0) to
(m, n). Due to this monotonically decreasing pattern, λ_{t,i}(u, v) is an upper bound on
ρ_{t,i}:

\lambda_{t,i}(u,v) \ge \rho_{t,i} \quad \forall\, (0 \le u \le m,\ 0 \le v \le n). \quad (6.8)
The key idea of the PCE algorithm is that, after processing some initial number of pixels
(u, v) = (u_0, v_0) at r_i, if λ_{t,i}(u_0, v_0) is found to be less than a previously known
correlation coefficient maximum, or correlation threshold ρ_th, then the final value of the
correlation coefficient ρ_{t,i} is also guaranteed to be less than ρ_th. Therefore further
computations between t and r_i become redundant and may be skipped without affecting
the search for the best match location. The comparison of λ_{t,i}(u, v) with ρ_th is called
the elimination test. If λ_{t,i}(u, v) < ρ_th, the elimination test is true and, consequently,
the remaining computations may be skipped. If λ_{t,i}(u, v) ≥ ρ_th, the elimination test
is false and computations must be continued. After processing more pixels, the elimination
test needs to be re-evaluated, as the partial similarity value may have reduced further.
Thus, to correlate a block, the elimination test may have to be evaluated
multiple times, until either the test is true or the computations are completed.
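The Basic Mode loop can be sketched as follows. This is a simplified illustration, not the thesis C++ implementation: the elimination test runs after every row, and for brevity the sketch searches for the positive maximum only (Equation 6.6 uses |ρ|):

```python
import math

def normalized_rows(block):
    """Rows of delta/sigma: mean-subtract the block and scale to unit norm."""
    w = len(block[0])
    vals = [v for row in block for v in row]
    mu = sum(vals) / len(vals)
    delta = [v - mu for v in vals]
    sigma = math.sqrt(sum(d * d for d in delta))
    return [[d / sigma for d in delta[r * w:(r + 1) * w]]
            for r in range(len(block))]

def pce_correlate(t_block, r_block, rho_th):
    """Basic Mode PCE at one location: accumulate Equation 6.5 row by row,
    running the elimination test after each row. Returns rho, or None
    if the location is eliminated."""
    sim = 1.0
    for t_row, r_row in zip(normalized_rows(t_block), normalized_rows(r_block)):
        sim -= 0.5 * sum((a - b) ** 2 for a, b in zip(t_row, r_row))
        if sim < rho_th:   # elimination test: final rho <= sim < rho_th
            return None
    return sim

def pce_search(t_block, locations, rho_th=-1.0):
    """Scan all search locations, raising the threshold at each new maximum."""
    best_i = -1
    for i, r_block in enumerate(locations):
        rho = pce_correlate(t_block, r_block, rho_th)
        if rho is not None and rho > rho_th:
            rho_th, best_i = rho, i
    return best_i, rho_th

best_i, best_rho = pce_search([[1, 2], [3, 4]],
                              [[[4, 3], [2, 1]], [[1, 2], [3, 4]]])
```

In practice the normalized template rows would be precomputed once and the reference statistics obtained from running sums, rather than renormalizing every block as done here for clarity.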
6.3 Two-Stage Basic Mode PCE Algorithm
Like other elimination algorithms, in the Basic Mode PCE algorithm the amount
of eliminated computations strongly depends upon the position of the maximum in
the search process. A high maximum found at the start of the search process may
enhance the elimination performance significantly, as compared to a maximum found
near the end of the search process. For small templates, we find that the coarse-to-fine
scheme (Mattoccia et al., 2008a) fails to find an effective initial threshold. This
is because the coarser representation of a small template becomes too small to
remain unique and may match at any arbitrary location. As an example, if a 20 × 20
pixels template is low-pass filtered by a mask of size 3 × 3 and sub-sampled once,
it reduces to 9 × 9 pixels. One more low-pass filtering and sub-sampling reduces
its size to 3 × 3 pixels, with all values close to the average intensity. The remaining
information in the template image is too little to yield a correct match.
Due to the inefficiency of the coarse-to-fine scheme, we have developed another
initialization scheme for small templates. We find the concept of two-stage template
matching (Vanderbrug and Rosenfeld, 1977) to be quite helpful in this regard. For
small template sizes, we rearrange the computations in the PCE algorithm by dividing
the template into two portions: a smaller portion to be matched in the first stage and
a larger portion to be matched in the second stage. At the end of the first stage, we
choose the search location with the maximum value of partial correlation and perform
the complete computations at this location. The final value of the correlation coefficient
found at this search location is used as the initial threshold in the second matching stage.

In the two-stage Basic Mode PCE algorithm, we use only one elimination test covering
all rows in the first stage, and one test per row in the second stage. If there
are n rows in the template and k rows are to be matched in the first
stage, then the total number of elimination tests is n − k + 1. The first stage consists
of one scan of the complete search space. In this scan, at each search location, only
k template rows are matched using the basic monotonic formulation, and the partial
results are preserved. The search location with the best partial result over k rows is
taken as the guess of the best match location. At this location, the complete
correlation coefficient, ρ_th, is computed, which is used as the threshold in the following
stage.
The second stage consists of one more scan of the search space. During this
scan, at each valid search location, the elimination test is executed using the threshold
found at the end of the first stage. Search locations where the partial correlation result
over k rows, λ_{t,i}(k, n), is found to be less than ρ_th are eliminated from the search
space. At each non-skipped location, computations start from the (k + 1)-st
row and continue until that location gets eliminated or the computations are completed. If the
final value of correlation is larger than ρ_th, then ρ_th is immediately updated.
The overheads of two-stage PCE include one more scan of the search space and one 2D
memory array required to store the temporary results. The results of two-stage PCE
are the same as those of exhaustive template matching, without any loss of accuracy.
Our proposed two-stage technique improves on the technique proposed by Goshtasby
et al. (1984), which is approximate, with no guarantee of always finding the
correlation maximum.
6.4 Overheads of Basic Mode PCE Algorithm

The overhead of Basic Mode PCE may be found by comparing it with the traditional
fast spatial domain implementation (Haralick and Shapiro, 1992) of correlation
coefficient:

\rho_{t,i} = \frac{1}{\sigma_t\,\sigma_i}\,\psi_{t,i} - mn\,\frac{\mu_t}{\sigma_t}\,\frac{\mu_i}{\sigma_i}, \quad (6.9)

where ψ_{t,i} is the cross-correlation term:

\psi_{t,i} = \sum_{x=1}^{m}\sum_{y=1}^{n} t(x,y)\,r_i(x,y). \quad (6.10)
The speed up in this form comes from the efficient computation of the first and second
order statistics, µt, µi, σt, and σi, which can be computed at any location in a few
operations via the running sum approach.
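The running-sum approach mentioned above is commonly realized with summed-area (integral) tables; a generic sketch, not the thesis code:

```python
def integral_image(img):
    """Summed-area table S with an extra zero row and column, so that
    S[y][x] holds the sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S

def window_sum(S, top, left, m, n):
    """Sum over the m x n window with top-left corner (top, left), in O(1)."""
    return (S[top + m][left + n] - S[top][left + n]
            - S[top + m][left] + S[top][left])

S = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

With one such table over the reference image and another over its squared intensities, μ_i and σ_i at every search location cost only a handful of additions.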
Computationally, the traditional implementation via Equation 6.9 is more efficient
than computing correlation coefficient via Equation 6.5 if all computations
are completed without any elimination. This is because the dominant cost
of computing ρ_{t,i} via Equation 6.9 is that of ψ_{t,i}, which requires O(2mn) operations
at one search location. In contrast, Equation 6.5, if implemented efficiently,
requires at least O(5mn) operations at each search location. Although both
implementations have the same growth rate, O(mn), the constant factor is 2.5 times
larger for the monotonic formulation. However, we experimentally observe that this
factor is actually smaller than 2.5 and depends on the template size. For very
small template sizes, 4 × 4 pixels, if no elimination is done with
the monotonic formulation, both formulations take the same amount of execution time.
However, as the template size increases, this factor grows beyond 1.00. For templates
of size 32 × 32, we observe that the monotonic formulation with no elimination is
slower than the traditional form by a factor of almost 1.5.

Figure 6.3: Two frames from each of the movies: 'Fast and Furious', 'Batman Begins', 'King Kong', 'Under World', 'Spider Man' and 'Pink Floyd' are shown from top to bottom respectively. Horror movies like 'Under World' contain significant frame-to-frame brightness variations.
For smaller templates, the monotonic overhead is small and is easily offset by
the amount of eliminated computations. For larger templates, the overhead
of Basic Mode PCE may erode some of the computational advantage realized by
computation elimination. Therefore, another version of the PCE algorithm has been
developed which is faster on larger templates. We have named this version
Extended Mode PCE; it is discussed in Chapter 7.
6.5 Experiments with Basic Mode PCE Algorithms
We have performed extensive empirical evaluation of the proposed algorithms on
commonly used small template sizes ranging from 4 × 4 pixels to 21 × 21 pixels. For
larger template sizes, Extended Mode PCE is used, which is discussed in Chapter
7. In the datasets used in this chapter, each template is an independently captured
image containing natural, and in some cases synthetically generated, distortions.
The Basic Mode partial correlation elimination algorithm is implemented in C++
and compared with the currently known fast exhaustive template matching techniques,
including a highly optimized FFT implementation known as FFTW3 (Frigo
and Johnson, 2005), Zero-mean Enhanced Bounded Correlation (ZNccEbc) (Mattoccia
et al., 2008b) and an exhaustive spatial domain implementation (Spat) (Haralick and
Shapiro, 1992) based on 6.9. The implementation of the ZEBC algorithm was provided
by the original authors (Mattoccia et al., 2008b). Besides correlation coefficient, we
have also implemented Sum of Absolute Differences (SAD) with the Partial Distortion
Elimination (PDE) (Montrucchio and Quaglia, 2005) and Successive Elimination
Algorithm (SEA) (Li and Salari, 1995) optimizations.
The execution times are measured on a Dell Inspiron 6400 with an Intel Core 2 CPU
2.13 GHz processor and 2 GB physical memory. The datasets, executable scripts and
Table 6.1: Dataset description for the block motion estimation experiments

MovieName    Dataset  FrameSize  BlockSize  #Blocks
FastFurios   FF4      256×608    4×4        18151
BatmnBegns   BB8      288×704    8×8        10432
KingKong     KK8      240×640    8×4        16610
UnderWorld   UW12     272×640    12×12      4498
SpiderMan    SM12     224×512    12×8       4515
PinkFloyd    PF12     287×346    12×4       7804
Metallica    MT16     240×352    16×16      1320
Blade-2      BL16     336×608    16×12      4117
ReturnKing   RK16     259×640    16×8       4948
MissionImp   MI16     218×516    16×4       6621
PiratCaribb  PC8      368×720    8×12       9761
detailed results are available on our web site: http://cvlab.lums.edu.pk/pce.
6.5.1 Block Motion Estimation Experiments Using Basic Mode
PCE
These experiments are performed in the scenario of block matching for motion estimation.
The use of correlation coefficient for block motion estimation has been
motivated by Mahmood et al. (2007). These experiments are performed on 11 different
datasets (Mahmood and Khan, 2007a), taken from different commercial movies
(Table 6.1 and Figures 6.3 and 6.4). In these experiments, the current video frame
is divided into non-overlapping blocks and each block is matched with the temporally
previous frame using a full frame search technique. In Basic Mode PCE, elimination
tests are performed at the end of each row. For the PCE and ZEBC algorithms an
initial correlation threshold of 0.90 has been used. In the ZEBC algorithm, the partition
parameter has been selected as {4, 8, 8, 6, 6, 6, 8, 8, 8, 8, 8} for the 11 datasets respectively.
Total execution time for the block matching experiment, over five frames in each
dataset, is shown in Table 6.2, which includes all computational overheads including
file I/O. In these experiments, the PCE algorithm has been found to be 28.03 times
faster than FFTW3, 29.93 times faster than ZEBC, and 19.92 times faster than Spat.
Average computation elimination over all experiments in this section is 91.2% for ZEBC,
189
Figure 6.4: A pair of selected frames from each of the movies: ‘Metallica’, ‘Blade 2’,‘Mission Impossible’ and ‘Pirates of the Caribbean’. In the scene taken from ‘Blade2’, only light intensity varies over a static scene.
190
86.3% for PCE, and 94.6% for SAD. ZEBC has higher elimination than PCE, however
due to small template sizes, the bound computation cost has increased than the com-
putation elimination benefit, therefore ZEBC has remained significantly slower than
PCE. In these experiments SAD has remained faster than all correlation coefficient
based algorithms however the margin between SAD and PCE is significantly smaller
as compared to other algorithms.
6.5.2 Feature Matching Experiments Using Basic Mode PCE Algorithm
These experiments are performed in the scenario of feature matching for point correspondence, on a video dataset obtained from a small UAV. The UAV dataset consists of 74 frames, each of size 240 × 320 pixels (see Figure 6.5). Due to the instability of the vehicle, the viewing geometry continuously changes, resulting in projective distortions. In each video frame, the 1000 best features are marked using the KLT feature tracker (http://www.ces.clemson.edu). Each feature from the current frame is matched against only the 1000 features in the next frame, using the correlation coefficient to find the best match. In the FFTW3 implementation, the FFT of the full reference frame is computed only once for all features. The FFT of each feature template is computed and multiplied point-wise with the full-frame FFT of the reference frame; the inverse transform then yields the correlation surface in the spatial domain. In this implementation, there are 1001 transforms of size 240 × 320 for each frame. A second approach, which we have not used, is to take the FFT of a small portion around each feature pair to be matched. In that case, the number of transforms would be 1,000,000 per frame, each roughly double the size of the feature, and the overall cost would further increase due to the large number of transforms. Total execution time for the FFTW3, ZEBC, Spat and PCE algorithms is shown in Table 6.4. The maximum speedup of PCE over FFTW3 is 168.68, over ZEBC is 13.60, and over Spat is 3.30 times. Due to a large number of redundant computations and the small template sizes, the performance of FFTW3 is significantly degraded.
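The frequency-domain scheme described above can be sketched as follows (an illustrative prototype, not the thesis implementation; it computes plain cross-correlation, without the mean/variance normalisation of the correlation coefficient):

```python
import numpy as np

def fft_match_scores(reference, templates):
    """One forward FFT of the reference frame is shared by all templates;
    each template then costs one full-frame FFT of its zero-padded copy,
    a point-wise product, and one inverse FFT giving the correlation
    surface over the whole frame."""
    H, W = reference.shape
    R = np.fft.rfft2(reference)              # one transform for the frame
    surfaces = []
    for t in templates:                      # one transform per template
        T = np.fft.rfft2(t, s=(H, W))        # zero-padded to frame size
        # conjugation turns circular convolution into correlation
        surfaces.append(np.fft.irfft2(np.conj(T) * R, s=(H, W)))
    return surfaces
```

With 1000 feature templates this yields the 1001 full-frame transforms per frame counted in the text.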
Figure 6.5: Selected frames from a video obtained by a camera mounted on an Unmanned Aerial Vehicle (UAV). Images obtained from the UAV manufacturing company SATUMA.
Table 6.2: Total execution time (sec) for the block motion estimation experiments

  Dataset   FFTW3    ZEBC      SPAT     PCE      SAD
  FF4       1141.1   1218.2    593.2    40.7     127.2
  BB8       1141.2   1081.7    1608.0   80.7     21.4
  KK8       1102.4   1084.9    355.1    78.83    22.0
  UW12      364.5    300.0     270.3    64.3     18.3
  SM12      201.5    193.097   152.6    30.9     6.2
  PF12      363.7    277.8     167.48   37.3     6.9
  MT16      54.4     49.2      58.7     11.6     5.8
  BL16      551.0    404.16    397.4    66.394   15.8
  RK16      445.4    311.9     280.4    57.3     14.9
  MI16      329.5    288.6     189.3    41.1     5.3
  PC8       1116.9   1450.1    630.6    125.6    40.6
Table 6.3: Percent computation elimination comparison for the block motion estimation experiments

  Dataset   FFTW3   ZEBC     SPAT   PCE      SAD
  FF4       -       50.371   -      86.882   74.037
  BB8       -       89.277   -      85.934   98.142
  KK8       -       91.509   -      84.671   98.114
  UW12      -       96.996   -      85.433   95.050
  SM12      -       98.187   -      86.727   97.534
  PF12      -       98.321   -      85.53    98.021
  MT16      -       98.381   -      87.64    91.166
  BL16      -       97.466   -      89.115   97.062
  RK16      -       96.999   -      86.671   96.377
  MI16      -       99.052   -      84.539   98.904
  PC8       -       86.645   -      85.990   96.254
Figure 6.6: Selected frames from the Night-time Highway (NH) video. Dataset obtained with a hand-held SONY Handycam video camera.
Table 6.4: Total execution time (sec) for the feature tracking experiments to get point correspondences

  FeatSize   FFTW3       ZEBC      SPAT     PCE
  5 × 5      15560.607   1579.17   124.06   112.70
  7 × 7      13345.641   1590.65   130.43   107.00
  9 × 9      13238.55    1593.81   145.88   117.20
  11 × 11    17822.29    1597.99   161.57   122.93
  13 × 13    12695.35    1596.72   172.41   129.22
  15 × 15    22454.87    1554.33   187.31   133.12
  17 × 17    15650.54    1619.38   212.43   139.00
  19 × 19    11821.91    1614.19   218.92   147.53
  21 × 21    16288.27    1557.50   246.00   151.05
6.5.3 Feature Tracking Experiments Using Two-stage Basic Mode PCE Algorithm
The two-stage PCE experiments have been performed on three datasets: the night-time highway video, the UAV video, and time-lapse cloud images from a still camera. In each of these datasets, manually extracted feature templates from a single frame are tracked in the remaining frames.

In the Night-time Highway (NH) video dataset (Figure 6.6), the only illumination sources are the headlights and rear lights of the vehicles, which results in uneven scene illumination. The UAV dataset is the same as shown in Figure 6.5 and used in the previous subsection. The Cloud tracking (CL) dataset (Figure 6.7) consists of cloud images acquired by a still camera with a 60-second interval between consecutive images. The cloud structure is non-rigid and the illumination conditions also vary over time, which makes the tracking process quite hard. See Table 6.5 for further dataset details.
In these experiments, two-stage Basic Mode PCE has been used. The first stage consists of only 1 row for templates of size 4×4 to 18×18, and 2 rows for 19×19 to 21×21. In the second stage, an elimination test is evaluated at the end of each row. An initial threshold of 0.90 has been used for both the PCE and ZEBC algorithms. If a maximum higher than 0.90 is found in the first stage, that maximum is used as the threshold in the second stage; otherwise the threshold remains 0.90. In ZEBC, the
Table 6.5: Dataset description for the experiments on feature tracking across video frames

  Dataset   # of Feat   Feat Size   # of Frames   Frame Size
  NH04      60          4 × 4       82            288 × 360
  NH05      60          5 × 5       82            288 × 360
  NH06      60          6 × 6       82            288 × 360
  NH07      60          7 × 7       82            288 × 360
  NH08      60          8 × 8       82            288 × 360
  NH09      60          9 × 9       82            288 × 360
  UA10      73          10 × 10     35            240 × 320
  UA11      73          11 × 11     35            240 × 320
  UA12      73          12 × 12     35            240 × 320
  UA13      73          13 × 13     35            240 × 320
  UA14      73          14 × 14     35            240 × 320
  UA15      73          15 × 15     35            240 × 320
  CL16      58          16 × 16     15            360 × 648
  CL17      58          17 × 17     15            360 × 648
  CL18      58          18 × 18     15            360 × 648
  CL19      58          19 × 19     15            360 × 648
  CL20      58          20 × 20     15            360 × 648
  CL21      58          21 × 21     15            360 × 648
Figure 6.7: Selected frames from the Cloud Tracking (CL) dataset. The dataset is obtained by a still image camera in a fixed position, taking cloud images at 60-second intervals.
Figure 6.8: Plot of execution time for the two-stage basic mode PCE experiments (Table 6.6), normalized to 100 templates and 100 reference frames for each dataset. The horizontal axis lists the datasets in order of increasing feature size (NH, UA and CL groups); the vertical axis shows execution time in seconds for Spat, FFTW3, ZNccEbc (ZEBC) and PCE.
partition parameter has been selected to be {4, 5, 6, 7, 8, 9, 5, 11, 6, 13, 7, 5, 8, 17,
9, 19, 10, 7} respectively. The total execution time for FFT, FFTW3, ZEBC, Spat and PCE is given in Table 6.6. In this experiment, the PCE algorithm remained faster than the other algorithms over all template sizes. The maximum speedup of PCE over FFT is 44.74, over FFTW3 is 9.16, over SPAT is 9.04, and over ZEBC is 7.12 times.
The maximum, minimum, and average speedup of the PCE algorithm compared to the other algorithms, together with confidence intervals at the 0.95 confidence level, are reported in Table 6.7. The average speedup of PCE along with the confidence intervals is also plotted in Figure 6.9.
Table 6.6: Total execution time (sec) comparison in two-stage basic mode PCE experiments for template sizes ≤ 21 × 21 pixels

  Dataset   FFT       FFTW3     ZEBC     Spat      PCE
  NH04      943.07    281.81    175.07   82.16     33.49
  NH05      1015.32   288.33    266.91   136.37    38.60
  NH06      998.74    417.40    237.24   144.16    45.72
  NH07      1057.75   279.49    236.46   205.98    55.38
  NH08      982.42    497.47    386.53   185.16    54.31
  NH09      1030.78   350.89    327.01   258.14    75.27
  UA10      4870.46   352.71    383.74   524.30    108.87
  UA11      4851.96   819.82    529.14   529.38    110.99
  UA12      4911.31   894.62    346.98   822.15    115.13
  UA13      4986.54   565.11    575.46   1071.88   120.09
  UA14      5237.24   892.43    369.44   956.55    122.41
  UA15      5221.95   1007.37   349.35   1179.15   130.42
  CL16      727.75    106.47    112.44   213.15    35.23
  CL17      723.05    176.21    200.74   269.55    41.21
  CL18      737.98    132.43    118.04   250.05    42.64
  CL19      739.53    113.02    211.56   307.95    52.09
  CL20      721.40    169.88    140.55   255.15    60.45
  CL21      727.02    148.37    105.74   351.30    63.80
Table 6.7: Maximum, minimum and average speedup of Two-Stage Basic Mode PCE for the feature tracking experiment (Table 6.6). Speedup is computed by dividing the execution time of each algorithm by the execution time of PCE. Confidence intervals z_α σ/√N are also computed for α = .05 (confidence level of 0.95) and z_α = 1.645, where σ is the standard deviation of the speedup for N = 12 datasets.

                        FFT          FFTW3        ZNccEbc      Spat         PCE
  Max Speedup           44.74        9.16         7.12         9.04         1
  Min Speedup           11.40        2.17         1.66         2.45         1
  Average Speedup       26.43        5.54         4.10         5.35         1
  Confidence Interval   26.43±4.88   5.54±0.965   4.10±0.579   5.35±0.772   1±0
Figure 6.9: Plot of the average execution-time speedup of Two-stage Basic Mode PCE for the feature tracking experiments, over FFT, FFTW3, ZNccEbc and Spat. Confidence intervals for a confidence level of 0.95 are also plotted. Corresponding values may be seen in Table 6.7.
6.6 Conclusion

In this chapter we have presented the basic formulation of the Partial Correlation Elimination (PCE) algorithm for correlation coefficient based fast template matching. An effective initialization strategy has also been developed, which we have named the 'Two-stage Basic Mode PCE' algorithm. For small template sizes, the coarse-to-fine scheme often fails to yield effective initialization, whereas the two-stage approach has often been found effective. The Basic Mode PCE and Two-stage Basic Mode PCE algorithms are exact, having exhaustive-equivalent accuracy. These algorithms are compared with existing fast exhaustive techniques, including ZEBC and FFTW3-based implementations of the correlation coefficient. On small templates ranging from 4 × 4 to 21 × 21 pixels, the PCE algorithms have outperformed the other algorithms, by up to two orders of magnitude in some cases.

For template sizes larger than 21 × 21 pixels, the overhead of Basic Mode PCE increases, reducing the speedup margin. Therefore, for medium and larger sized templates, we have developed an extension of the PCE algorithm, which is discussed in the following chapter.
Chapter 7
EXTENDED MODE PARTIAL CORRELATION
ELIMINATION ALGORITHMS
In Chapter 6, Basic Mode Partial Correlation Elimination (PCE) algorithms were discussed, which are based on the monotonic formulation of the correlation coefficient. When the correlation coefficient is computed using this formulation, the similarity starts from +1 at the first pixel of a block and monotonically decreases to its final value at the last pixel. Any intermediate value of the similarity is therefore always larger than (or equal to) the final value. The speedup arises because, at any point during the computation, if the similarity happens to be less than a previously known maximum, the remaining computations become redundant and may be skipped without any loss of accuracy.
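The elimination principle described above can be sketched as follows (an illustrative Python prototype with hypothetical helper names, not the thesis code; here σ denotes the root of the unnormalized sum of squared deviations, as in Chapter 6, so the final value of the partial similarity equals the correlation coefficient):

```python
import numpy as np

def pce_basic_match(template, candidates, rho_init=-1.0):
    """Basic-mode PCE sketch: lam starts at 1 and monotonically decreases
    as rows of squared differences between the normalised template and
    candidate are accumulated; once lam falls below the best value seen
    so far, the remaining rows of that candidate are skipped."""
    tz = template - template.mean()
    t = tz / np.sqrt((tz ** 2).sum())
    best_val, best_idx = rho_init, -1
    for idx, r in enumerate(candidates):
        rz = r - r.mean()
        rn = rz / np.sqrt((rz ** 2).sum())
        lam = 1.0
        for row in range(t.shape[0]):
            lam -= 0.5 * ((t[row] - rn[row]) ** 2).sum()
            if lam < best_val:          # elimination test at a row boundary
                break
        else:
            best_val, best_idx = lam, idx
    return best_idx, best_val
```

Because the partial value can only decrease, skipping a candidate once it drops below the best-so-far never changes the final answer: the result is exhaustive-equivalent.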
The computational overhead of the Basic Mode PCE algorithm was discussed in Chapter 6. In the basic monotonic formulation of the correlation coefficient, the per-pixel cost is larger than that of the efficient spatial domain formulations. This overhead grows with template size, because as the template size increases, the number of pixels that must be processed before elimination can take place also increases, which raises the direct computational cost. Due to this overhead, for larger templates, some of the speedup obtained by computation elimination may be eroded. Therefore, Basic Mode PCE as discussed in Chapter 6 is more suitable for only the small template sizes, while the algorithm presented in the current chapter is more efficient on medium and large templates. Although we successfully reduce one type of overhead cost, another overhead increases in the extended mode, which may make it less efficient on small templates. Therefore, on small templates, Basic Mode PCE remains more efficient.
In this chapter, we derive another monotonic formulation of the correlation coefficient, which reduces the computational cost of processed pixels to the minimum level, equivalent to the other efficient spatial domain formulations. We have named the algorithm based on this formulation the 'Extended Mode PCE' algorithm. Due to the lower cost of processed pixels, Extended Mode PCE is faster than Basic Mode PCE on medium and large templates. In Extended Mode PCE, although the cost of processed pixels is reduced, another overhead, the cost of each elimination test, is higher than in the Basic Mode formulation. To keep this cost low, the number of elimination tests should be kept as small as possible. For this purpose, we have developed an algorithm that determines both the number of elimination tests and efficient test locations. In addition, we have developed a criterion for selecting between Basic Mode PCE and Extended Mode PCE, based on a comparison of the overheads associated with the two algorithms.
In Extended Mode PCE, the amount of eliminated computations may significantly increase if a maximum of large magnitude is found near the start of the search process. For medium-sized templates, Extended Mode PCE may be initialized using the two-stage approach discussed in Chapter 6 in the context of Basic Mode PCE. For larger templates, Extended Mode PCE may also be initialized by the coarse-to-fine scheme (A. Rosenfeld, 1977). Extended Mode PCE with either of these initialization schemes has exhaustive-equivalent accuracy. The proposed algorithms are compared with currently known fast exhaustive-equivalent-accuracy algorithms, including a frequency domain sequential implementation of the FFT (William et al., 2007), an optimized, adaptive and parallel implementation, FFTW3 (Frigo and Johnson, 2005), a very fast spatial domain implementation, ZEBC (Mattoccia et al., 2008b), and an efficient exhaustive spatial domain implementation (Pratt, 2007). The comparisons are done over a wide variety of datasets and on template sizes from 22 × 22 to 128 × 128 pixels. For medium-sized templates, 22 × 22 to 48 × 48, Extended Mode PCE is found to be faster than all other techniques, including the FFTW3-based implementation. For larger templates, 64 × 64 to 128 × 128, the performance of Extended Mode PCE was sometimes better than ZEBC and FFTW3 and in other cases equivalent to these techniques.
7.1 Extended Mode PCE Algorithm
Basic Mode PCE algorithm, discussed in Chapter 6, is based on the following monotonic formulation of the correlation coefficient:
\[
\lambda_{t,i}(u,v) = 1 - \frac{1}{2}\sum_{x=1}^{u}\sum_{y=1}^{v}\left(\frac{t(x,y)-\mu_t}{\sigma_t} - \frac{r_i(x,y)-\mu_i}{\sigma_i}\right)^2, \tag{7.1}
\]
where t(x, y) is the template image intensity at position (x, y) and r_i is the i-th search location in the reference image of size p × q pixels. In this formulation, the template-related term, (t(x, y) − µ_t)/σ_t, may be pre-computed once, because the template image has only one mean term, µ_t, and one variance term, σ_t; there are only m × n normalized template terms, which are easily stored. However, the reference image term (r_i(x, y) − µ_i)/σ_i cannot be pre-computed, because storing m × n terms at each search location would explode the size of the reference image to p × q × m × n. Therefore, the reference image terms have to be computed for each pixel at each search location, resulting in a total of 5 operations per processed pixel in Equation 7.1. We want to reduce this computational cost to 2 operations per pixel, as in the following non-monotonic but efficient formulation:
\[
\rho_{t,i} = \frac{1}{\sigma_t\sigma_i}\,\psi_{t,i} - mn\,\frac{\mu_t}{\sigma_t}\,\frac{\mu_i}{\sigma_i}, \tag{7.2}
\]
where ψ_{t,i} is the cross-correlation term:
\[
\psi_{t,i} = \sum_{x=1}^{m}\sum_{y=1}^{n} t(x,y)\,r_i(x,y). \tag{7.3}
\]
The dominant computational cost of ρ_{t,i} in Equation 7.2 is the computation of ψ_{t,i} by Equation 7.3, which requires two operations per processed pixel: one multiplication and one addition.
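Equation 7.2 can be checked against a direct Pearson computation with a short sketch (illustrative only; here σ denotes the root of the unnormalized sum of squared deviations, consistent with the surrounding derivation):

```python
import numpy as np

def corr_coeff_fast(t, r):
    """Correlation coefficient via Eq. 7.2: the only run-time term is the
    cross-correlation psi (Eq. 7.3); mean and sigma of the template are
    pre-computable, and those of the search location come from running
    sums in a full implementation."""
    m, n = t.shape
    mu_t, mu_r = t.mean(), r.mean()
    sig_t = np.sqrt(((t - mu_t) ** 2).sum())
    sig_r = np.sqrt(((r - mu_r) ** 2).sum())
    psi = (t * r).sum()                      # Eq. 7.3
    return psi / (sig_t * sig_r) - m * n * mu_t * mu_r / (sig_t * sig_r)
```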
Correlation coefficient computation by Equation 7.2 is efficient because the pre-computable terms are separated from the run-time computable terms. A similar strategy can reduce the computational cost of Equation 7.1. For this purpose, we expand Equation 7.1:
\[
\begin{aligned}
\lambda_{t,i}(u,v) ={}& 1 - \frac{1}{2\sigma_t^2}\sum_{x=1}^{u}\sum_{y=1}^{v} t^2(x,y) - \frac{1}{2\sigma_i^2}\sum_{x=1}^{u}\sum_{y=1}^{v} r_i^2(x,y) \\
&+ \frac{1}{\sigma_t}\left(\frac{\mu_t}{\sigma_t}-\frac{\mu_i}{\sigma_i}\right)\sum_{x=1}^{u}\sum_{y=1}^{v} t(x,y) - \frac{1}{\sigma_i}\left(\frac{\mu_t}{\sigma_t}-\frac{\mu_i}{\sigma_i}\right)\sum_{x=1}^{u}\sum_{y=1}^{v} r_i(x,y) \\
&- \frac{uv}{2}\left(\frac{\mu_t}{\sigma_t}-\frac{\mu_i}{\sigma_i}\right)^2 + \frac{1}{\sigma_t\sigma_i}\,\psi_{t,i}(u,v). 
\end{aligned}
\tag{7.4}
\]
Thus we have separated the run-time-computable cross-correlation term ψ_{t,i}(u, v) from the pre-computable terms. We simplify and regroup the pre-computable terms, while maintaining the monotonic growth property:
\[
\lambda_{t,i}(u,v) = 1 - \frac{\sigma_t(u,v) + \sigma_i(u,v)}{2} + \frac{\psi_{t,i}(u,v) - \mu_{t,i}(u,v)}{\sigma_t\sigma_i}. \tag{7.5}
\]
The term σ_t(u, v) in Equation 7.5 is a template image statistic and may be pre-computed at the specific (u, v) locations, once for each template image:
\[
\sigma_t(u,v) = \frac{\sum_{x=1}^{u}\sum_{y=1}^{v} t^2(x,y) - 2\mu_t\sum_{x=1}^{u}\sum_{y=1}^{v} t(x,y) + uv\,\mu_t^2}{\sigma_t^2}. \tag{7.6}
\]
The term σ_i(u, v) in Equation 7.5 is a search location statistic and may be pre-computed at the specific (u, v) locations, once for a given set of search locations:
\[
\sigma_i(u,v) = \frac{\sum_{x=1}^{u}\sum_{y=1}^{v} r_i^2(x,y) - 2\mu_i\sum_{x=1}^{u}\sum_{y=1}^{v} r_i(x,y) + uv\,\mu_i^2}{\sigma_i^2}. \tag{7.7}
\]
If partial summations are available, the computation of σi(u, v) requires 9 operations.
The term µ_{t,i}(u, v) in Equation 7.5 is a hybrid statistic, computed from both the search location and the template; therefore this term cannot be pre-computed:
\[
\mu_{t,i}(u,v) = \mu_i\sum_{x=1}^{u}\sum_{y=1}^{v} t(x,y) - uv\,\mu_i\mu_t + \mu_t\sum_{x=1}^{u}\sum_{y=1}^{v} r_i(x,y). \tag{7.8}
\]
If partial summations are available, the computation of µ_{t,i}(u, v) requires 7 operations.
A complete evaluation of Equation 7.5 is required only when an elimination test is to be executed; otherwise the computation proceeds by updating only ψ_{t,i}(u, v). If no elimination test is executed and the computation runs to completion, then putting (u, v) = (m, n), both σ_t(m, n) and σ_i(m, n) evaluate to 1 and µ_{t,i}(m, n) evaluates to mnµ_tµ_i. Substituting these values into Equation 7.5 reduces it to Equation 7.2. Therefore, if no elimination test is executed, the cost of Extended Mode PCE is exactly the same as that of the efficient spatial domain form given by Equation 7.2. Although in Extended Mode PCE the cost of processed pixels is reduced to its minimum, the cost of an elimination test increases from one simple comparison in Basic Mode to about 22 operations in Extended Mode. This high per-test overhead is balanced by the ability to exploit pre-computation, which is not possible in Basic Mode PCE. Since for larger template sizes the ratio of the number of tests to the total number of pixels is significantly smaller, Extended Mode PCE becomes more efficient for larger templates.
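The partial form of Equation 7.5, with tests aligned to row boundaries (v = n), may be prototyped as follows (an illustrative sketch with hypothetical helper names; σ again denotes the root of the unnormalized sum of squared deviations):

```python
import numpy as np

def lambda_partial(t, r, u):
    """Partial similarity after u rows, per Eq. 7.5. Eqs. 7.6-7.8 supply
    the statistics; in a full implementation sigma_t(u,v) is pre-computed
    per template and sigma_i(u,v) per search location."""
    m, n = t.shape
    mu_t, mu_i = t.mean(), r.mean()
    s_t = np.sqrt(((t - mu_t) ** 2).sum())
    s_i = np.sqrt(((r - mu_i) ** 2).sum())
    tp, rp = t[:u], r[:u]
    uv = u * n
    sig_t_uv = ((tp**2).sum() - 2*mu_t*tp.sum() + uv*mu_t**2) / s_t**2   # Eq. 7.6
    sig_i_uv = ((rp**2).sum() - 2*mu_i*rp.sum() + uv*mu_i**2) / s_i**2   # Eq. 7.7
    mu_ti = mu_i*tp.sum() - uv*mu_i*mu_t + mu_t*rp.sum()                 # Eq. 7.8
    psi = (tp * rp).sum()                                                # run-time term
    return 1 - (sig_t_uv + sig_i_uv) / 2 + (psi - mu_ti) / (s_t * s_i)   # Eq. 7.5
```

The sketch makes the two key properties directly checkable: the value is non-increasing in u, and at u = m it equals the correlation coefficient.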
For Extended Mode PCE, it is therefore important to design an effective testing scheme in which the cost of the elimination tests is minimized without a significant reduction in computation elimination. This means that the pixel locations at which the elimination test is executed have to be carefully selected. Having many test locations within a block has the advantage of detecting, almost as soon as it happens, that the partial value has dropped below the threshold; however, the overhead of conducting the tests increases. If the number of test locations is reduced, the penalty of not detecting until the next test that the partial value has dropped below the threshold may be large.

In the following section we present criteria for PCE algorithm mode selection, together with an algorithm that determines the total number of elimination tests and efficient test locations for achieving high elimination.
7.2 PCE Mode Selection and Finding an Efficient Testing Scheme
For a specific dataset, the selection of Basic Mode or Extended Mode PCE may be done by comparing the overheads associated with each algorithm. For Basic Mode PCE the cost of processed pixels is larger, while in Extended Mode the cost of elimination tests dominates. Therefore, mode selection requires estimating the total computations to be performed and the total number of elimination test executions. The total computations consist of the number of pixels processed at each search location until the growth curve intersects the correlation threshold. These computations depend on two factors: the slope of the monotonically decreasing curve and the magnitude of the current known maximum used as the threshold (Fig. 7.2). If the slope of the growth curve is large and the known maximum is high, the growth curve intersects the threshold after only a few pixels have been processed, and the remaining pixels constitute the eliminated computations.
At a particular search location, the slope of the monotonic growth curve depends on two factors: the final value of the correlation coefficient, ρ_{t,i}, and the distribution of dissimilarity over the set of pixels to be processed. If the final value of the correlation coefficient is significantly low, the slope of the growth curve is large and, consequently, the number of processed pixels is small, because the elimination threshold is reached faster. If most of the search locations produce a very low correlation coefficient, the total performed computations decrease and the eliminated computations increase. Therefore, the amount of performed computations strongly depends upon the probability distribution of ρ_{t,i} over the range −1.00 to +1.00. A distribution skewed towards the negative side reduces the performed computations, on average, when a positive maximum is sought. Fig. 7.1 shows correlation coefficient histograms for four different datasets. We observe that the shape of the histogram varies with the size of the template and the content of the images to be matched. Due to large variations in image content, a generic parametric form of the correlation coefficient distribution may not be useful in practice.
Figure 7.1: Correlation coefficient histograms plotted for templates from four different datasets: CL dataset, template 16 × 16; IP dataset, template 24 × 24; IT dataset, template 32 × 32; and VG dataset, template 48 × 48. The horizontal axis spans correlation coefficient values from −1 to +1; the vertical axis shows relative frequencies. Dataset details are given in Section 5.4.
The amount of performed computations also depends upon the number of elimination tests and the locations at which these tests are executed. If only one elimination test is to be executed, its location may be determined by the intersection of the average growth curve and the correlation threshold. If more than one elimination test is to be performed, we divide the correlation range, −1.00 to +1.00, into multiple intervals and compute an average growth curve for the search locations falling in each interval. The test locations are then found as the intersections of each average growth curve with the correlation threshold. For example, in Fig. 7.2 the range of the correlation coefficient is divided into 12 intervals (or bins), each of size 0.1667. Based upon the final value of the correlation coefficient, each search location is assigned to a specific interval, and average growth curves are computed for all search locations within the same interval. The intersection of each average curve with the threshold yields the index at which the respective test should be executed.
We observe that the average curves in Fig. 7.2 are close to straight lines, therefore the
average distribution of dissimilarity may be assumed to be approximately uniform over
all pixels. For a specific search location, the average dissimilarity per pixel, E[∆2t,i],
may be defined as the total dissimilarity divided by the number of pixels:
\[
E[\Delta^2_{t,i}] = \frac{1}{mn}\sum_{x=1}^{m}\sum_{y=1}^{n}\left(\frac{\delta_t(x,y)}{\sigma_t} - \frac{\delta_i(x,y)}{\sigma_i}\right)^2. \tag{7.9}
\]
Using Equation 6.5, Equation 7.9 may be written in the form
\[
E[\Delta^2_{t,i}] = \frac{2(1-\rho_{t,i})}{mn}. \tag{7.10}
\]
If the value of the current known maximum, or correlation threshold, is ρ_th, then the number of pixels to be processed before the partial similarity value falls to ρ_th is given by
\[
rc = \frac{2(1-\rho_{th})}{E[\Delta^2_{t,i}]}, \tag{7.11}
\]
where r is the number of rows and c the number of columns to be processed. From Equations 7.10 and 7.11,
\[
rc = \frac{1-\rho_{th}}{1-\rho_{t,i}}\,mn. \tag{7.12}
\]
In general, if ν tests are to be performed, then to find the test locations the range of the correlation coefficient is divided into ν intervals. Any search location within the k-th interval, 1 ≤ k ≤ ν, yields ρ_{t,i} such that −1 + 2(k−1)/ν < ρ_{t,i} ≤ −1 + 2k/ν. Let r_k be the number of rows and c_k the number of columns to be processed to eliminate all search locations within the k-th bin. Setting ρ_{t,i} in Equation 7.12 equal to the upper boundary of the interval, −1 + 2k/ν, gives the k-th test location:
\[
r_k c_k = \frac{\nu(1-\rho_{th})}{2(\nu-k)}\,mn. \tag{7.13}
\]
The association between a particular search location and the interval it will finally map to is not known in advance. Therefore, the elimination test designed to eliminate the search locations in the k-th interval has to be performed on all surviving (non-eliminated) search locations.
If the number of elimination tests is already known, Equation 7.13 may be used
to compute efficient test locations. The maximum number of elimination tests in
Figure 7.2: Average growth curves of the monotonic correlation coefficient, plotted for 106 randomly selected 48 × 48 pixel templates from the VG dataset. The horizontal axis is the number of processed rows (elimination test indexes 1 to 9); the vertical axis is the partial correlation value. The correlation coefficient range is divided into 12 bins. For a threshold of 0.837, 9 elimination tests are found, executed after processing 4, 5, 6, 7, 9, 12, 16, 26, and 48 rows respectively. Elimination tests are aligned with row boundaries.
Extended Mode PCE is constrained by the test overhead cost. A comparison between the elimination test overhead in Extended Mode PCE and the direct computation overhead in Basic Mode PCE may be used to select a specific mode, as well as the maximum number of elimination tests if Extended Mode is selected.
The overhead of Basic Mode PCE may be estimated by computing the total amount of performed computations, which requires the correlation coefficient histogram. Let n_k be the count of search locations in the k-th bin, and Pr{k} = n_k/p the probability that a search location maps to the k-th bin. Since the number of processed pixels per location is c_k r_k, the computations done in the k-th bin are p c_k r_k Pr{k}, and the total computations to be performed, w_t, are given by the summation over all bins. Substituting the value of c_k r_k from Equation 7.13,
\[
w_t = mnp\,\frac{\nu(1-\rho_{th})}{2}\sum_{k=1}^{\nu}\frac{\Pr\{k\}}{\nu-k}. \tag{7.14}
\]
The overhead of Extended Mode PCE may be estimated by computing the total number of elimination test executions. Each elimination test is performed on a different number of search locations: the first elimination test is performed on all search locations, the second test on all except those eliminated in the first bin, and so on. Let l_k be the number of search locations on which the k-th elimination test is executed:
\[
l_k = p\left(1 - \sum_{i=1}^{k-1}\Pr\{i\}\right). \tag{7.15}
\]
The total number of elimination test executions, l_t, is given by the summation over all tests,
\[
l_t = \sum_{k=1}^{\nu} p\left(1 - \sum_{i=1}^{k-1}\Pr\{i\}\right), \tag{7.16}
\]
which may be simplified to
\[
l_t = p\sum_{k=1}^{\nu} k\,\Pr\{k\}. \tag{7.17}
\]
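Equations 7.14 and 7.17 may be evaluated from an empirical bin histogram as follows (a sketch; skipping the k = ν term of Equation 7.14, whose denominator is zero, is an implementation assumption on our part, since locations in the top bin are never eliminated):

```python
def overhead_estimates(pr, m, n, p, rho_th):
    """Estimate total performed computations w_t (Eq. 7.14) and total
    elimination-test executions l_t (Eq. 7.17). pr[k-1] = Pr{k} is the
    fraction of the p search locations whose final correlation falls in
    bin k, for nu = len(pr) bins."""
    nu = len(pr)
    w_t = m * n * p * nu * (1 - rho_th) / 2 * sum(
        pr[k - 1] / (nu - k) for k in range(1, nu))
    l_t = p * sum(k * pr[k - 1] for k in range(1, nu + 1))
    return w_t, l_t
```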
After estimating the total performed computations and the total number of elimination test executions, we may compare the Basic Mode and Extended Mode PCE overheads. If the computational cost of one elimination test in Extended Mode is c_t operations, and the ratio of the per-pixel computational cost of Basic Mode to Extended Mode is c_m, then Extended Mode is preferred only if c_t l_t + w_t < c_m w_t, or, from Equations 7.14 and 7.17,
\[
\sum_{k=1}^{\nu} k\,\Pr\{k\} \le \frac{\nu\,(c_m-1)(1-\rho_{th})\,mn}{2c_t}\sum_{k=1}^{\nu}\frac{\Pr\{k\}}{\nu-k}. \tag{7.18}
\]
For a given dataset, the correlation coefficient histogram may be estimated empirically using a representative sample. For known values of m × n, ρ_th, c_m, and c_t, the maximum value of ν ≥ 1 which satisfies this inequality is an upper bound on the number of elimination tests that may be performed with Extended Mode PCE. If no value of ν ≥ 1 satisfies the inequality, Basic Mode is selected.
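Inequality 7.18 can be checked directly for a measured histogram (illustrative sketch, with the same k = ν simplification as above, skipping the zero-denominator term):

```python
def prefer_extended(pr, mn, rho_th, c_m, c_t):
    """True if inequality 7.18 holds for histogram pr (pr[k-1] = Pr{k},
    nu = len(pr) bins), i.e. the elimination-test overhead of Extended
    Mode is outweighed by its cheaper per-pixel cost."""
    nu = len(pr)
    lhs = sum(k * pr[k - 1] for k in range(1, nu + 1))
    rhs = nu * (c_m - 1) * (1 - rho_th) * mn / (2 * c_t) * sum(
        pr[k - 1] / (nu - k) for k in range(1, nu))
    return lhs <= rhs
```

With a uniform histogram, the check reproduces the trend described in the text: Extended Mode is favoured for 32 × 32 templates but not for 20 × 20 ones.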
As an illustration, if we assume the distribution of the correlation coefficient is uniform, then substituting Pr{k} = 1/ν in Equation 7.18 and simplifying, we get
\[
\nu + 1 \le \frac{mn\,(c_m-1)(1-\rho_{th})}{c_t}\,\ln\nu. \tag{7.19}
\]
For a given template size, initial correlation threshold, elimination test cost, monotonic formulation overhead, and correlation coefficient distribution, Equation 7.19 may be used to find the total number of elimination tests. For m × n = 32 × 32, c_t = 22, c_m = 2.5, ρ_th = 0.90, we get ν + 1 ≤ 6.98 ln ν, which is satisfied for all ν ≤ 32. For smaller template sizes, for example 20 × 20, with the same values for the other parameters, Equation 7.19 gives ν + 1 ≤ 2 ln ν, which cannot be satisfied for any ν > 1. Therefore, for this size, Basic Mode PCE is preferred over Extended Mode PCE.
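For the uniform case, the largest admissible ν can be found by direct search (illustrative sketch; `max_tests` is our name, not part of the thesis code):

```python
import math

def max_tests(mn, c_m, c_t, rho_th, nu_limit=10000):
    """Largest nu >= 2 satisfying Eq. 7.19 for a uniform histogram:
    nu + 1 <= mn*(c_m - 1)*(1 - rho_th)/c_t * ln(nu).
    Returns 0 if no nu satisfies it, i.e. Basic Mode is preferred."""
    coef = mn * (c_m - 1) * (1 - rho_th) / c_t
    best = 0
    for nu in range(2, nu_limit):
        if nu + 1 <= coef * math.log(nu):
            best = nu
    return best
```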
Another constraint on the maximum number of tests in Extended Mode is that the testing cost should be significantly less than the elimination benefit: e_t ≥ c_t f_t l_t, where e_t is the total elimination and f_t is a factor which should be significantly larger than 1.00. From Equation 7.14, e_t = pmn − w_t, or
\[
e_t = mnp\left(1 - \frac{\nu(1-\rho_{th})}{2}\sum_{k=1}^{\nu}\frac{\Pr\{k\}}{\nu-k}\right). \tag{7.20}
\]
From Equations 7.20 and 7.17,
\[
mn\left(1 - \nu(1-\rho_{th})\sum_{k=1}^{\nu}\frac{\Pr\{k\}}{\nu-k}\right) \ge c_t f_t \sum_{k=1}^{\nu} k\,\Pr\{k\}. \tag{7.21}
\]
If all other parameters are known, this constraint gives an upper bound on the number of elimination tests, ν ≥ 1, that may be performed with Extended Mode PCE.
Again, as an illustration, assuming a uniform distribution of the correlation coefficient and substituting Pr{k} = 1/ν in Equation 7.21, simplification yields
\[
\frac{2mn\left(1 - \ln\nu\,(1-\rho_{th})\right)}{c_t(\nu+1)} \ge f_t. \tag{7.22}
\]
For m × n = 32 × 32, ρ_th = 0.90, f_t = 5, c_t = 20, the constraint on ν is 17.6 ≥ 1.86 ln ν + ν, which is satisfied for all ν ≤ 11.
The constraints given by inequalities 7.18 and 7.21 may be used for the selection of the Basic Mode or Extended Mode PCE algorithm; if Extended Mode is selected, the maximum number of elimination tests is also known. The location of each test in Extended Mode may be computed using Equation 7.13. In Basic Mode, elimination tests may be executed more frequently; in our implementation of Basic Mode PCE we use one elimination test at the end of each row. In Extended Mode as well, for ease of implementation, we align the elimination tests with the row boundaries, which may be done by applying a rounding function to Equation 7.13. Since each test is executed at a unique row index, the number of tests in Extended Mode may further decrease if more than one test maps to the same row index.

From the analysis given in this section, as well as from a large number of experiments, we find that Basic Mode is more efficient on small template sizes, while Extended Mode is more efficient on medium to large templates. In our experiments, we observe that Basic Mode is more efficient for template sizes 4 × 4 to 21 × 21, while Extended Mode is more efficient on all sizes ≥ 22 × 22 pixels.
7.3 Initialization Schemes for Extended Mode PCE Algorithm
The amount of eliminated computations in the Extended Mode PCE algorithm strongly depends upon when a maximum is found in the search process. A maximum found at the start of the search may enhance the elimination performance significantly compared to a maximum found near the end. For larger template sizes, we use the coarse-to-fine initialization scheme (Mattoccia et al., 2008a) to find a high correlation maximum before the actual search begins. For medium-sized templates, we find that the coarse-to-fine scheme often fails to yield an effective initial threshold. This is because, due to low-pass filtering and sub-sampling, the coarse representation of a medium-sized template loses uniqueness and may match at arbitrary locations (Robinson and Milanfar, 2004); when the full-size template is then matched at the corresponding location, no correlation maximum is found.
7.3.1 Extended Mode Multi-Stage PCE Algorithm
The Extended Mode Multi-Stage PCE algorithm is an enhanced version of the basic
mode two-stage algorithm discussed in Chapter 6. Due to the increased cost of the
elimination tests, the number of tests is significantly smaller than in the basic
mode. Efficient test locations may be computed by using Equation 7.13. The rows
between two consecutive test locations may be considered as one partition of the
template image: the rows before the first elimination test comprise the first
partition, the rows between the first and second elimination tests comprise the
second partition, and so on.
For ease of implementation, we assume the partition boundaries are aligned with
the row boundaries. That is, no partition can have a size of less than one row,
and all partition sizes are in terms of complete rows. The number of rows in the
first partition is given by ε1 = Round(r1c1), as given by Equation 7.13. ε1 is the
number of rows to be processed before the first elimination test is executed, and
these rows constitute the first partition of the template image. Similarly, the
number of rows between the first and second elimination tests constitutes the
second partition: ε2 = Round(r2c2) − Round(r1c1). Note that in some cases, due to
the round operation, the size of a partition may evaluate to zero. This happens
when the size of a partition is less than one row; on rounding to the nearest row
boundary, that partition merges into a neighboring partition. In such cases, the
corresponding elimination test is also skipped.
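The row-aligned partitioning described above can be sketched as follows. The fractional test locations would come from Equation 7.13 (not reproduced here), so they are treated as an assumed input, and the function name is ours:

```python
def partition_rows(test_rows_fractional, total_rows):
    """Convert fractional elimination-test locations into whole-row partitions.

    test_rows_fractional: increasing cumulative (fractional) row counts at
    which each test should ideally run -- a stand-in for the values produced
    by Equation 7.13. Partitions that round to zero rows merge into their
    neighbor, and the corresponding elimination test is skipped.
    """
    boundaries = [round(t) for t in test_rows_fractional]
    sizes, prev = [], 0
    for b in boundaries:
        size = b - prev
        if size > 0:          # zero-size partition -> its test is skipped
            sizes.append(size)
            prev = b
    if total_rows > prev:     # rows after the last test form the final partition
        sizes.append(total_rows - prev)
    return sizes
```

For instance, tests ideally placed at rows 2.3, 2.9, and 5.6 of a 10-row template give partitions of 2, 1, 3, and 4 rows, while a test at row 0.4 rounds to zero rows and is dropped.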
In the multi-stage algorithm, we initially correlate the first partition of the
template, consisting of ε1 rows, at all search locations. The partial correlation
value, λt,i(ε1, n), computed at each search location is stored, and the maximum
partial correlation value is tracked. After completing a scan of the full search
space, the complete correlation value is computed at the location exhibiting the
maximum partial correlation value. This complete correlation value serves as the
first initial threshold, ρth1, for the following partitions.
In the second stage, at each search location the first elimination test is
executed: λt,i(ε1, n) < ρth1. Search locations where this comparison evaluates to
true are marked as skipped from the search space. At the non-skipped search
locations, only the rows of partition two, ε2, are matched. The partial
correlation result obtained from the second partition is accumulated with the
result of the first partition, λt,i(ε1 + ε2, n). Therefore, the partial
correlation results after matching partition two cover all rows included in
partitions one and two. Once again, the complete correlation is computed at the
location with the maximum partial correlation over the two partitions. This
correlation value is used as the second initial threshold, ρth2, for partition
three. Note that the value of the threshold increases as more and more partitions
are processed, ρth2 ≥ ρth1, while the partial correlation values decrease:
λt,i(ε1 + ε2, n) ≤ λt,i(ε1, n).
In the third stage, the elimination test is performed at all non-skipped
locations: λt,i(ε1 + ε2, n) < ρth2. Search locations where the test succeeds are
again skipped from the search space. At the remaining locations, the rows in
partition three are matched. The same process is repeated for the following
partitions, until the value and location of the partial maximum remain fixed at
the same search location for multiple iterations: ρth(i−1) = ρth(i). This means
the maximum has converged to the correct position, so further iterations become
redundant. All remaining partitions in the template image are then matched at the
non-skipped search locations by using the Extended Mode PCE algorithm in only one
scan of the search space.
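The staged search can be sketched as below. The `partial_corr` and `full_corr` callables are hypothetical stand-ins for the monotonically decreasing partial correlation λt,i and the complete correlation coefficient; the final single-scan completion performed by the full algorithm is omitted from this sketch:

```python
def multistage_pce(locations, partial_corr, full_corr, partitions,
                   switch_fraction=0.05):
    """Sketch of the Extended Mode multi-stage PCE search.

    locations:    list of candidate search locations
    partial_corr: partial_corr(loc, rows) -> upper bound on the correlation
                  using the first `rows` template rows (monotonically
                  decreasing in `rows`); stand-in for lambda_{t,i}
    full_corr:    full_corr(loc) -> exact correlation coefficient at loc
    partitions:   partition sizes in rows (e.g. derived from Eq. 7.13)
    Returns (best_location, best_correlation). Hypothetical interface.
    """
    alive = {loc: 0.0 for loc in locations}
    threshold, best_loc = -1.0, None
    rows_done = 0
    for eps in partitions:
        rows_done += eps
        # Elimination test: drop locations whose bound fell below threshold.
        for loc in list(alive):
            bound = partial_corr(loc, rows_done)
            alive[loc] = bound
            if bound < threshold:
                del alive[loc]
        # Promote the best partial result to a full correlation -> new threshold.
        cand = max(alive, key=alive.get)
        value = full_corr(cand)
        if value > threshold:
            threshold, best_loc = value, cand
        elif best_loc == cand:
            break  # maximum has converged; remaining stages are redundant
        # Few locations left: the full algorithm would finish the survivors
        # with one Extended Mode PCE scan (omitted in this sketch).
        if len(alive) <= switch_fraction * len(locations):
            break
    return best_loc, threshold
```

The early `break` on convergence mirrors the ρth(i−1) = ρth(i) stopping rule, and the `switch_fraction` check mirrors the switch to the last stage once few locations remain.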
The execution time performance of the multi-stage algorithm depends on how
quickly the maximum found at the end of each stage converges to the global
maximum. If convergence is fast and obtained in only one or two stages, the
remaining computations are significantly reduced. If convergence is slow,
requiring a large number of stages, the execution time increases. We observe that
the convergence of the multi-stage algorithm depends on the downward slope of the
monotonically decreasing growth curve. In Figure 7.2, average growth curves are
shown for different final values of the correlation coefficient. Curves with
final values at the lower end have larger slopes, while curves with final values
at the higher end have relatively smaller slopes. For search locations with a
perfect correlation coefficient score of +1, the slope of the monotonically
decreasing curve is zero.
For a particular template image, the growth curve at each search location has a
different average slope. The growth curve at the best match location has the
minimum average slope, while for locations with larger dissimilarities, the
growth curves have larger slopes. Convergence of the multi-stage algorithm
depends on estimating the average slope of the growth curves in as few stages as
possible. Once the growth curve with the minimum average slope is identified,
convergence is complete, because the global maximum has been found.
The overhead of the multi-stage algorithm is the multiple scans of the search
space: if there are k stages, there are k scans of the search space. A fast
converging multi-stage algorithm may have only two stages, and correspondingly
only two scans of the search space. A slowly converging algorithm may have many
stages and therefore an increased cost of multiple scans. In the case of slow
convergence, as the stage number increases, the number of search locations still
to be processed decreases. Therefore, when the percentage of remaining locations
falls below a certain threshold, for example ≤ 5%, the algorithm may switch to
the last stage.
7.3.2 Initialization of Extended Mode PCE with Coarse-to-Fine Scheme
The coarse-to-fine scheme is a fast technique for finding the approximate
location of the maximum in a large search space. We have discussed this technique
in Chapter 3 among the approximate techniques for large search spaces. Although
this technique is not effective for initializing small and medium sized
templates, ≤ 48 × 48, it is quite effective for larger templates, ≥ 80 × 80
pixels. We observe that for larger templates, initialization of Extended Mode PCE
with the coarse-to-fine scheme is more efficient than initialization by the
multi-stage algorithm discussed in the last section.
As discussed in the last section, the efficiency of the multi-stage algorithm
depends on the speed at which the approximate maximum converges to the global
maximum. This convergence depends on the accuracy of predicting the average slope
of the growth curve by processing as few pixels as possible. The multi-stage
algorithm converges quickly if the growth curve with the minimum average slope is
identified by processing only a few pixels, and slowly if a large number of
pixels must be processed to identify it.
As the template size increases, the number of pixels required to estimate the
average slope of the monotonically decreasing growth curve also increases. This
is because, as the template size grows, the average contribution of each pixel to
the slope of the growth curve decreases.
It may also be observed that the average amount of normalized distortion per
pixel decreases as the template size increases. This is because the total amount
of distortion is 1.00, which is shared by all pixels. If the template size is
4 × 4 pixels, the average contribution per pixel is 1/16, and if the template
size is 80 × 80, the average contribution per pixel is 1/6400. The average
contribution of one pixel in a 4 × 4 template is the same as the contribution of
400 pixels in an 80 × 80 template. Therefore, for larger template sizes, using
the coarse-to-fine scheme to find the initial correlation threshold becomes more
efficient than the multi-stage approach.
Multiple implementations of the coarse-to-fine scheme are possible for finding a
high initial threshold for the Extended Mode PCE algorithm. In our
implementation, we have down-sampled both the template and the reference image by
1/4 in each dimension, which reduces the number of pixels in both images by a
factor of 16. This is equivalent to reducing the images to the second pyramid
level. The reduced template is matched at all search locations in the reduced
reference image. The best match location found at the coarser level is projected
to the actual images, and the full sized template is matched in a 5 × 5 block
around the expected maximum location in the actual reference image. The maximum
value of the correlation coefficient found at these 25 locations is selected as
the initial threshold for the Extended Mode PCE algorithm.
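A minimal sketch of this initialization is given below, using 4 × 4 block averaging as a stand-in for the level-2 Gaussian pyramid (the actual implementation filters with a 5-tap Gaussian before each subsampling). All function names are our own, and the zero-mean NCC is the standard correlation coefficient:

```python
import numpy as np

def ncc(template, window):
    """Zero-mean normalized cross-correlation (correlation coefficient)."""
    t = template - template.mean()
    w = window - window.mean()
    denom = np.sqrt((t * t).sum() * (w * w).sum())
    return float((t * w).sum() / denom) if denom > 0 else 0.0

def downsample4(img):
    """Crude stand-in for a level-2 Gaussian pyramid: 4x4 block averaging."""
    h, w = img.shape[0] // 4 * 4, img.shape[1] // 4 * 4
    return img[:h, :w].reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))

def coarse_to_fine_threshold(template, reference, fallback=0.90):
    """Find an initial PCE threshold with the coarse-to-fine scheme.

    Matches the 1/4-downsampled template exhaustively in the downsampled
    reference, projects the best coarse location back to full resolution,
    refines over a 5x5 block, and returns the best full-resolution
    correlation (or `fallback` if it is not higher). Sketch only.
    """
    tc, rc = downsample4(template), downsample4(reference)
    th, tw = tc.shape
    best, best_ij = -1.0, (0, 0)
    for i in range(rc.shape[0] - th + 1):          # exhaustive coarse search
        for j in range(rc.shape[1] - tw + 1):
            v = ncc(tc, rc[i:i + th, j:j + tw])
            if v > best:
                best, best_ij = v, (i, j)
    ci, cj = 4 * best_ij[0], 4 * best_ij[1]        # project to full resolution
    m, n = template.shape
    best_full = -1.0
    for i in range(max(0, ci - 2), min(reference.shape[0] - m, ci + 2) + 1):
        for j in range(max(0, cj - 2), min(reference.shape[1] - n, cj + 2) + 1):
            best_full = max(best_full, ncc(template, reference[i:i + m, j:j + n]))
    return max(best_full, fallback)
```

When the template is an exact crop of the reference, the scheme recovers a threshold near 1.0; when the coarse search mislocates, the `fallback` of 0.90 is used, matching the behavior described in the text.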
The computational overhead of our implementation of the coarse-to-fine scheme is
negligibly small. Assuming the size of the template to be m × n pixels and the
size of the reference image to be p × q pixels, the spatial domain template
matching complexity is O(mnpq). The overhead of matching the reduced template
with the reduced reference image is O(mnpq/256), which turns out to be only 0.39%
of the total spatial domain computations. Therefore, the computational overhead
of the coarse-to-fine scheme may easily be ignored.
7.4 Experiments with Extended Mode PCE Algorithm
We have performed extensive empirical evaluation of the Extended Mode PCE
algorithms on template sizes ranging from 22 × 22 to 128 × 128 pixels. In our
datasets, each template is an independently captured image containing natural,
and in some cases synthetically generated, distortions. The Extended Mode PCE
algorithms are implemented in C++ and compared with the currently known fast
exhaustive template matching techniques, including a sequential implementation of
FFT (William et al., 2007), the highly optimized parallel implementation FFTW3
(Frigo and Johnson, 2005), the ZNccEbc algorithm (Mattoccia et al., 2008b), and
an exhaustive spatial domain implementation (Haralick and Shapiro, 1992). The
implementation of the ZNccEbc algorithm was provided by the original authors.
Besides the correlation coefficient, we have also implemented Sum of Absolute
Differences (SAD) with the optimizations proposed by Montrucchio and Quaglia
(2005) and Li and Salari (1995). The execution times are measured on a Dell
Inspiron 6400 with an Intel Core 2 CPU 2.13 GHz processor and 2 GB physical
memory. The datasets, executable scripts, and detailed results are available on
our web site: http://cvlab.lums.edu.pk/pce.
7.4.1 Feature Tracking with Extended Mode Two-stage PCE
Algorithm
These experiments have been performed for feature tracking in Infra Red (IR)
video datasets. Two IR video datasets acquired in two different scenarios have
been used: an IR Pedestrian video (IP dataset) used for tracking humans, and an
IR Traffic video (IT dataset) used for tracking vehicles. Due to very low
incident energy, both IR videos suffer from significant background noise, which
has been removed by a simple averaging technique. See Table 7.1 and Fig. 7.3 for
dataset details.
In the Extended Mode Two-stage PCE algorithm, the number of elimination tests has
been evaluated to be {7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10} respectively, by
using Equations 7.18 and 7.21. For all sizes, 2 rows are required to be processed
before executing the
Table 7.1: Dataset description for two-stage Extended Mode PCE experiments on feature tracking across video frames

Dataset   # of Feat   Feat Size   # of Frames   Frame Size
IP22      27          22 × 22     56            240 × 360
IP23      27          23 × 23     56            240 × 360
IP24      27          24 × 24     56            240 × 360
IP25      27          25 × 25     56            240 × 360
IP26      27          26 × 26     56            240 × 360
IP27      27          27 × 27     56            240 × 360
IT28      33          28 × 28     74            240 × 360
IT29      33          29 × 29     74            240 × 360
IT30      33          30 × 30     74            240 × 360
IT31      33          31 × 31     74            240 × 360
IT32      33          32 × 32     74            240 × 360
IT33      33          33 × 33     74            240 × 360
first elimination test. For simplicity, the first stage consists of only one
elimination test, which covers the first two rows of each template. An initial
threshold of 0.90 has been used for both the PCE and ZEBC algorithms. If a
maximum higher than 0.90 is found in the first stage, that maximum is used as the
threshold in the second stage; otherwise the threshold remains 0.90. In the ZEBC
algorithm, the partition parameter has been selected to be {11, 23, 12, 5, 13, 9,
14, 29, 10, 31, 8, 11} respectively.
The average computation elimination for ZEBC, PCE, and SAD is 99.57%, 90.56%, and
88.88% respectively. ZEBC has achieved the maximum computation elimination over
all template sizes; however, the cost of the elimination test in ZEBC is still
larger than the benefit obtained by computation elimination. Therefore ZEBC has
remained slower than PCE. The total execution time for FFTW3, ZEBC, PCE, and SAD
is given in Table 7.3. The PCE algorithm has remained faster than all the other
algorithms over all template sizes. The maximum speedup of PCE over FFTW3 is 4.94
times, over SPAT is 12.60, and over ZEBC is 5.53. PCE has been found to be faster
than SAD only on the IP27 dataset, while for the remaining datasets SAD is
faster. However, SAD is not robust to intensity and contrast variations; due to
the presence of these variations in these datasets, the accuracy of SAD has
remained less than 5%.
Table 7.2: Percent computation elimination comparison for the feature tracking experiment for template sizes ≥ 22 × 22 pixels

Dataset   Elim ZEBC   Elim TPCE   Elim SAD
IP22      99.35       87.97       84.21
IP23      99.04       88.31       83.09
IP24      99.36       88.67       81.94
IP25      99.37       89.04       80.74
IP26      99.34       89.32       79.56
IP27      99.43       89.60       78.40
IT28      99.85       91.69       96.76
IT29      99.76       91.94       96.60
IT30      99.88       92.22       96.51
IT31      99.76       92.45       96.38
IT32      99.89       92.67       96.24
IT33      99.88       92.85       96.11
The maximum, minimum, and average speedups of PCE over the other algorithms,
together with confidence intervals at the 0.95 confidence level, are reported in
Table 7.4. The average speedup of PCE along with confidence intervals is plotted
in Figure 7.5.
7.4.2 Template Matching with Extended Mode Two-Stage
PCE Algorithm
These experiments are performed on the Video Geo-registration (VG) dataset taken
from (Mahmood and Khan, 2010). With the addition of some new template sizes, it
consists of 300 square templates of each size: {47, 48, 63, 64, 79, 80, 95, 96,
111, 112, 127, 128}. The reference image is 736 × 1129 pixels. Further dataset
details may be found in Chapter 5.
In the Extended Mode PCE algorithm, the number of elimination tests evaluates to
{10, 10, 12, 12, 14, 14, 16, 16, 19, 19, 22, 22} respectively, by using Equations
7.18 and 7.21, for ρth = 0.90. The partition parameter in ZEBC has been selected
to be {47, 8, 9, 8, 79, 8, 5, 8, 37, 8, 127, 8} respectively. The Transitive
Elimination Algorithm (TEA) (Mahmood and Khan, 2010) for correlated templates has
also been run on the VG dataset. The GOP parameter was initialized to 7, while
the actual GOP length
Figure 7.3: (a) Four frames from the IR camera pedestrian video dataset. (b) Four frames from the IR traffic video dataset
Table 7.3: Total execution time (sec) comparison for two-stage extended mode PCE experiments, for template sizes ≥ 22 × 22

Dataset   FFTW3    ZEBC     SPAT     TPCE    SAD
IP22      221.03   211.22   400.68   70.26   49.42
IP23      247.24   365.91   529.76   66.21   59.02
IP24      296.13   279.19   540.40   73.30   64.42
IP25      105.67   134.24   565.04   93.26   75.79
IP26      208.77   197.33   610.40   81.40   80.03
IP27      247.24   189.76   552.72   81.71   94.96
IT28      213.56   195.38   520.96   59.83   15.05
IT29      215.86   308.35   614.20   58.26   17.27
IT30      281.64   168.53   568.32   56.95   17.58
IT31      99.23    278.20   495.06   59.77   19.99
IT32      236.65   138.86   700.04   55.54   20.26
IT33      136.23   160.07   666.00   58.38   23.19
Figure 7.4: Plot of execution time for two-stage extended mode PCE experiments (Table 7.3), normalized to 100 templates and 100 reference frames for each dataset.
Table 7.4: Maximum, minimum and average speedup of Two-Stage Extended Mode PCE for the feature tracking experiment (Table 7.3). Speedup is computed by dividing the execution time of each algorithm by the execution time of PCE. Confidence intervals zα σ/√N are also computed for α = .05 (confidence level of 0.95), zα = 1.645, where σ is the standard deviation of the speedup over N = 12 datasets.

                      FFTW3       ZEBC         SPAT        PCE   SAD
Max Speedup           4.95        5.52         12.60       1     1.16
Min Speedup           1.13        1.44         5.70        1     0.25
Average Speedup       3.17        3.33         8.58        1     0.615
Confidence Interval   3.17±0.52   3.33±0.596   8.58±1.02   0     0.615±0.153
Figure 7.5: Plot of average execution time speedup of Two-stage Extended Mode PCE, along with confidence intervals for a confidence level of 0.95. Corresponding values may be seen in Table 7.4.
Figure 7.6: Video Geo-registration (VG) dataset: reference image from earth.google.com and templates from terraserver.microsoft.com.
was computed at run time.
In these experiments, the total execution time of the FFT, ZEBC, SPAT, TEA, and
PCE algorithms is given in Table 7.7, which includes all computational overheads.
The maximum speedup of PCE over FFT is 11.32, over ZEBC 5.06, over SPAT 22.49,
and over TEA 2.17 times. For the VG80 dataset, ZEBC has remained 1.44 times
faster than PCE, while on the VG63, VG64, VG95, and VG128 datasets both
algorithms have performed quite similarly. This is because ZEBC performs quite
well on template sizes that factor into small primes, while its performance
deteriorates on sizes containing large prime factors; for example, for dataset
VG047 the execution time of ZEBC is 609.44 seconds, while for VG048 its execution
time is only 180.86 seconds. In contrast, the PCE algorithm has no issue with any
particular size and has performed equally well on prime and non-prime sizes.
Comparing PCE with TEA, TEA has remained faster than the PCE algorithm on 4
datasets, with a maximum speedup of 1.32 times. The speedup of TEA depends upon
inter-template auto-correlation: if the inter-template auto-correlation is high,
TEA will be faster, and if low, TEA will be slower. A high auto-correlation
cannot be guaranteed in all template matching scenarios. In contrast, the
performance of the PCE algorithm has no dependence upon auto-correlation;
therefore the PCE algorithm has a significantly broader scope than TEA.
7.4.3 Coarse-to-Fine Initialization of Extended Mode PCE
Algorithm
In the coarse-to-fine scheme, we have used a 5-tap Gaussian low-pass filter to
filter the templates and the reference images, which are then down-sampled by
1/2. The down-sampled images are again low-pass filtered and again down-sampled
by 1/2. Each coarse image representation has 1/16 the size of the original image.
The computational cost of the coarse-to-fine initialization scheme is 1/256 of
the spatial domain template matching, which is a negligibly small cost.
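The pyramid construction just described can be sketched with a separable 5-tap binomial kernel standing in for the Gaussian filter (the exact filter coefficients are not specified here, so the [1 4 6 4 1]/16 kernel is an assumption):

```python
import numpy as np

# 5-tap binomial approximation to a Gaussian low-pass filter (assumed taps)
KERNEL_5TAP = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0

def pyr_down(img):
    """One pyramid level: separable 5-tap low-pass filtering, then dropping
    every other row and column (down-sampling by 1/2 in each dimension)."""
    blurred = np.apply_along_axis(
        lambda r: np.convolve(r, KERNEL_5TAP, mode="same"), 1, img)
    blurred = np.apply_along_axis(
        lambda c: np.convolve(c, KERNEL_5TAP, mode="same"), 0, blurred)
    return blurred[::2, ::2]

def level2(img):
    """Level-2 pyramid: two smooth-and-subsample passes, 1/16 the pixels."""
    return pyr_down(pyr_down(img))
```

Two passes of `pyr_down` reduce a p × q image to roughly p/4 × q/4, i.e. 1/16 of the pixels, matching the cost analysis above.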
On the VG dataset, for template sizes ≥ 79 × 79 pixels, the coarse-to-fine
initialization scheme succeeds in finding a high initial threshold, while for the
smaller templates,
Figure 7.7: The correlation coefficient range from -1 to +1 is divided into 12 bins of equal size, and the average growth curves of search locations within each bin are computed. The number of locations in each bin is found to be: 0, 4025, 25093, 56970, 113044, 187363, 167027, 113314, 52598, 20953, 3294, and 47 respectively, and the average value of the correlation coefficient in each non-empty bin is: -0.7059, -0.5693, -0.4082, -0.2406, -0.0812, 0.0796, 0.2431, 0.4036, 0.5683, 0.7047, 0.8925. This experiment is done for the first template in the VG48 dataset.
Table 7.6: Video Geo-Registration Dataset for Experiments on Larger Sized Templates (Mahmood and Khan, 2010)

Dataset   # Frames   Frame Size   # of Ref.   Ref. Size
VG047     300        47 × 47      1           736 × 1129
VG048     300        48 × 48      1           736 × 1129
VG063     300        63 × 63      1           736 × 1129
VG064     300        64 × 64      1           736 × 1129
VG079     200        79 × 79      1           736 × 1129
VG080     200        80 × 80      1           736 × 1129
VG095     200        95 × 95      1           736 × 1129
VG096     200        96 × 96      1           736 × 1129
VG111     150        111 × 111    1           736 × 1129
VG112     150        112 × 112    1           736 × 1129
VG127     100        127 × 127    1           736 × 1129
VG128     100        128 × 128    1           736 × 1129
Table 7.7: Total execution time in seconds for the Video Geo-Registration dataset with Two-Stage PCE implementation

Dataset   FFT       ZEBC      Spat      TPCE     TEA
VG047     1254.38   609.44    1175.50   120.36   247.96
VG048     1289.64   180.86    1194.20   113.88   247.92
VG063     1256.11   222.92    2256.00   224.97   250.86
VG064     1230.05   194.29    2185.20   211.43   251.00
VG079     1242.3    965.41    3488.7    230.31   261.42
VG080     1249.8    202.97    3195.1    292.49   261.75
VG095     1234.6    311.26    4518.1    325.59   265.47
VG096     1265.9    247.77    4531.4    239.99   265.64
VG111     1296.3    444.6     6205.6    293.16   267.56
VG112     1292.3    288.26    6049.6    268.94   267.3
VG127     1247.7    1277.9    7154.4    348.51   267.99
VG128     1267.8    352.83    7052.4    353.34   268.29
we observe that this scheme has high failure rates. If the coarse-to-fine scheme
fails to produce an initial threshold larger than 0.90, we use 0.90 as the
initial threshold in the ZEBC and PCE algorithms. If the coarse-to-fine scheme
successfully finds a high initial threshold, we use the Extended Mode PCE
algorithm without the two-stage optimization. The total number of elimination
tests and the test locations are computed for an initial correlation threshold of
0.90. In ZEBC, the partition parameter has been selected to be the same as in the
experiments of the last subsection.
Both the ZEBC and PCE algorithms are initialized with the coarse-to-fine scheme.
Gaussian level-2 pyramids are used to find an approximate matching location. The
best match location found at level 2 is projected to the actual size, and around
the approximate location, correlations over a 5 × 5 block are computed for the
fine search. The maximum found by this scheme is used as ρth if it is larger than
0.90; otherwise ρth = 0.90 has been used.
The execution times of the different algorithms in the PCE experiments with the
coarse-to-fine scheme are shown in Table 7.8. In these experiments, PCE has
remained faster than ZEBC over 8 datasets, with a maximum speedup of 6.15 times,
while on the 4 remaining datasets the performance of PCE is close to ZEBC. In
comparison with FFTW3, the PCE algorithm has remained faster over 6 datasets,
with a maximum speedup of 1.60, while FFTW3 has remained faster than PCE on the 6
remaining datasets. FFTW3 adapts to the hardware to maximize performance and also
utilizes SIMD instructions, which perform the same operation on all elements in a
data array in parallel. The PCE algorithm has been sequentially implemented,
without hardware specific optimizations similar to those used in FFTW3 (Frigo and
Johnson, 2005). Despite the hardware specific optimizations in FFTW3, in our
experiments the PCE algorithms have remained faster than FFTW3 for all template
sizes from 4 × 4 to 48 × 48, and then for 64 × 64, 79 × 79, 80 × 80, and
95 × 95, while no other spatial domain algorithm has remained faster than FFTW3
for so many sizes.
The maximum, minimum, and average speedups of PCE over the other algorithms,
along with confidence intervals at the 0.95 confidence level, are reported in
Table 7.9. The average speedup of PCE along with confidence intervals is plotted
in Figure 7.10.
Figure 7.8: Plot of execution time on the VG dataset with the coarse-to-fine initialization scheme.
Table 7.8: Total execution time in seconds for Video Geo-registration with the coarse-to-fine initialization scheme used for the PCE and ZEBC algorithms

Dataset   FFT       FFTW3    ZEBC      Spat      PCE
VG047     1254.38   193.13   609.44    1175.50   120.36
VG048     1289.64   169.10   180.86    1194.20   113.88
VG063     1256.11   184.39   222.92    2256.00   224.97
VG064     1230.05   221.21   194.29    2185.20   211.43
VG079     1242.3    172.92   941.29    3488.7    162.4
VG080     1249.8    217.86   189.24    3195.1    164.81
VG095     1234.6    246.41   258.6     4518.1    221.22
VG096     1265.9    180.65   212.69    4531.4    222.12
VG111     1296.3    239.32   489.1     6205.6    260.42
VG112     1292.3    169.74   267.76    6049.6    267.98
VG127     1247.7    227.43   1645.1    7154.4    267.36
VG128     1267.8    250.86   280.68    7052.4    266.58
Figure 7.9: Plot of execution time of PCE with the coarse-to-fine initialization scheme, compared with FFTW3 and TEA on the VG dataset.
Table 7.9: Maximum, minimum and average speedup of Extended Mode PCE with coarse-to-fine initialization for the Video Geo-registration experiment (Table 7.8). Speedup is computed by dividing the execution time of each algorithm by the execution time of PCE. Confidence intervals zα σ/√N are also computed for α = .05 (confidence level of 0.95), zα = 1.645, where σ is the standard deviation of the speedup over N = 12 datasets.

                      FFT          FFTW3        ZNccEbc      Spat         PCE
Max Speedup           11.32        1.60         6.15         26.76        1
Min Speedup           4.66         0.633        0.918        9.76         1
Average Speedup       6.57         1.05         2.31         18.49        1
Confidence Interval   6.57±1.071   1.05±0.138   2.31±0.983   18.49±3.13   1±0
Figure 7.10: Plot of average execution time speedup of Two-stage Extended Mode PCE with Coarse-to-Fine initialization on the Video Geo-registration dataset. Confidence intervals for a confidence level of 0.95 are also plotted. Corresponding values may be seen in Table 7.9.
7.5 Conclusion
In this chapter we have discussed the Extended Mode Partial Correlation
Elimination (PCE) algorithm, which is more efficient on medium and large
templates than the Basic Mode PCE discussed in Chapter 6. For a given dataset, a
scheme for selecting a particular mode has also been proposed, along with two
effective initialization strategies: a coarse-to-fine scheme for large template
sizes and two-stage PCE for medium sized templates. An algorithm for estimating
the total number of elimination tests and the test locations has also been
proposed. The Extended Mode PCE algorithms are exact, with accuracy equivalent to
exhaustive search, and have been compared with existing fast exhaustive
techniques including ZEBC and FFTW3. On medium sized templates, the PCE algorithm
has outperformed the other algorithms by a significant margin, while on larger
templates it has shown competitive performance.
This chapter concludes the core contribution of this thesis. In the following two
chapters, two further research directions will be presented.
Chapter 8
COMPUTATION ELIMINATION ALGORITHMS FOR
ADABOOST BASED DETECTORS
Bound based computation elimination algorithms have been well investigated in the
context of fast computation of image match measures. However, we observe that
similar strategies may also be used to speed up other applications in the fields
of Computer Vision and Image Processing. We find that many object detectors may
be made faster simply by rearranging the computations and terminating them before
completion. In this regard, we have proposed early termination algorithms for
speeding up the detection phase of AdaBoost based object detectors.
As discussed in Chapters 6 and 7, a monotonic formulation of the correlation
coefficient was required for the Partial Correlation Elimination (PCE)
algorithms. For partial computation elimination of an AdaBoost based object
detector, we rearrange the computations such that the detector response becomes
monotonically decreasing. At a particular search location, as soon as the
response becomes less than the AdaBoost global threshold, the remaining
computations become redundant and may be skipped without any change in detection
accuracy.
In order to further reduce computations, we have incorporated the concept of
two-stage template matching into the framework of the non-maxima suppression
process. We have developed a new non-maxima suppression algorithm, named 'Early
Non-Maxima Suppression', which provides the opportunity to discard computations
based on the local maximum. We have implemented the proposed algorithms to speed
up an AdaBoost based edge-corner detector proposed by Mahmood (2007). Our
experiments show more than an order of magnitude speedup over the original
AdaBoost detector implementation. Significant speedups are also observed over
some other edge-corner detectors.
8.1 Introduction
Since the seminal work of P.Viola and Jones (2001, 2004) on real time face detection
using AdaBoost algorithm, the face detection problem has been well explored by
many other researchers as well, for example Vincenzo and Lisa (2007); Cristinacce
and Cootes (2003); Wu et al. (2004). In all of these techniques, a high speed up
has been obtained by exploiting the regularity of a human face. As an example, one
of the most extensively used rules is, if at a search location no eyes are detected,
that location cannot contain a face. Therefore, all search locations where no eyes are
detected are eliminated from the search space. Unfortunately, such rules cannot be
made for objects which do not possess a fixed orientation or highly regular patterns.
In contrast, our proposed early termination algorithms discussed in this chapter are
generic and applicable to the detection of any type of objects. In this regard we
have proposed two algorithms, a basic early termination algorithm and an early non-
maxima suppression algorithm.
In the basic early termination algorithm, each candidate location is initialized
with the total weight of the trained AdaBoost ensemble. If a weak learner
classifies the current location as a non-object, the weight of that learner is
subtracted from the current total weight. As more learners are processed, the
weight of the candidate location monotonically decreases, and as soon as the
current weight becomes less than the AdaBoost global threshold, that location can
never become a positive object instance; therefore further calculations may be
skipped and the location may be discarded.
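The basic early termination loop can be sketched as follows; the learner interface (a callable returning True for an Object vote) and the function name are our own illustrative assumptions:

```python
def early_terminate(learners, weights, window, global_threshold):
    """Basic early termination for an AdaBoost detector (sketch).

    learners: weak learners, assumed sorted by descending weight;
              learner(window) -> True if it predicts Object.
    The response starts at the total ensemble weight and is decremented by
    the weight of every learner that votes non-object. Once it falls below
    the global threshold the window can never be accepted, so the remaining
    learners are skipped without changing the detection result.
    Returns (is_object, learners_evaluated).
    """
    response = sum(weights)
    for k, (learner, alpha) in enumerate(zip(learners, weights), start=1):
        if not learner(window):
            response -= alpha
            if response < global_threshold:
                return False, k       # early exit: the decision is already fixed
    return True, len(learners)        # survived all learners -> Object
```

Sorting the ensemble by descending weight (Figure 8.1) makes the response drop as fast as possible at non-object locations, so most windows are rejected after evaluating only a few learners.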
In order to suppress multiple responses to the same object, only the local
maximum in each locality has to be retained, and the local non-maxima candidates
have to be suppressed to zero by a process known as Non-Maxima Suppression (NMS).
We reduce the computations at local non-maxima candidate locations through the
Early Non-Maxima Suppression (ENMS) algorithm. In the ENMS algorithm, we
partially compute the AdaBoost detector response at all candidate locations. In
each local NMS window, we choose the candidate location with the best partial
result and compute the final detector response at that location. If this final
response is larger than the AdaBoost classification threshold, then for the
remaining candidate locations in that NMS window, the early termination threshold
is raised to the final value of
the local maximum. That is, in a specific NMS window, a candidate location will
be discarded as soon as the detector response falls below the local maximum or
below the AdaBoost classification threshold, whichever is larger. The ENMS
algorithm helps reduce the redundant computations performed at local non-maxima
candidate locations.
The proposed early termination algorithm is incorporated within our previous imple-
mentation of the AdaBoost based edge-corner detector (Mahmood, 2007). The quality
of the detected edge-corners remains exactly the same, while the speed up over the
original algorithm is more than an order of magnitude. We have also compared the
quality and speed of the edge-corners detected by the AdaBoost detector with three
other detectors: the KLT detector, the Harris detector (Harris and Stephens, 1988)
and Xiao's detector (Xiao and Shah, 2003). We find that the edge-corners detected
by the AdaBoost detector are of comparable quality to those of the KLT, Harris and
Xiao detectors, while the execution time is up to 4.00 times faster than KLT, 17.13
times faster than Harris and 79.79 times faster than Xiao's detector.
8.2 Related Work
The details of the AdaBoost algorithm may be found in texts on machine learning, and
the details of edge-corner detection using the AdaBoost algorithm may be found in our
earlier work (Mahmood, 2007). For completeness, the detection phase of the AdaBoost
algorithm is briefly described here, as used by Viola and Jones (2004).
Suppose the trained ensemble of weak learners consists of m learners, {f1, f2, f3, ..., fm}, ordered in the descending order of weights: {α1 ≥ α2 ≥ α3 ≥ ... ≥ αm} (Figure 8.1).
At a specific candidate location rio,jo , where (io, jo) are the coordinates of first pixel
of the search window, AdaBoost detector response is given by:
Λ(rio,jo) = Σ_{k=1}^{m} αk Lk(rio,jo), (8.1)
where Lk(rio,jo) is the label of rio,jo as predicted by the learner fk. Lk(rio,jo) may have
Figure 8.1: The weights of weak learners are not always in decreasing order with respect to the selection number. Therefore, after the training phase, the ensemble should be sorted in decreasing order of weights.
only two values:
Lk(rio,jo) = 1 if the prediction is Object, and 0 otherwise. (8.2)
After evaluating Λ(rio,jo) in the whole search space, the labeling process starts: the
search locations where Λ(rio,jo) is larger than the AdaBoost global threshold Gt, are
labeled as objects, while the remaining locations are labeled as non-objects.
Lm(rio,jo) = 1 if Λ(rio,jo) ≥ Gt, and 0 otherwise, (8.3)
where Lm(·) is the final label of a search location. The AdaBoost global threshold, Gt,
is defined as:
Gt = Tα Σ_{k=1}^{m} αk, (8.4)
where 1.0 ≥ Tα ≥ 0.0.
Once the labeling process is complete, a Non-Maxima Suppression (NMS) process is
applied to suppress multiple responses to the same object.
8.3 AdaBoost Global Threshold Based Early Termination Algorithm
In our proposed algorithm, the current candidate search location is initially assigned
the maximum possible AdaBoost detector response, wm:
wm = Σ_{k=1}^{m} αk. (8.5)
Then, starting with the weak learner f1 with maximum weight α1, we evaluate the
learners of the trained ensemble in order of decreasing weights: {α1 ≥ α2 ≥ α3 ≥ ... ≥ αm}.
If a learner predicts the current search location as an object, we take no action;
however, if the predicted label is 0, we subtract the weight of that
Figure 8.2: Monotonically decreasing ensemble response at a non-object location. As more and more weak learners are evaluated, the response decreases monotonically.
learner from the current value of the response. The detector response after processing
i < m learners is therefore given by:
Λi(rio,jo) = wm − Σ_{k=1}^{i} αk (1 − Lk(rio,jo)). (8.6)
In this form, the AdaBoost detector response becomes a monotonically decreasing
function of the number of processed learners. After processing each learner, the
response either remains the same or decreases (Figure 8.2). As soon as the current
response, Λi(rio,jo), falls below the global threshold Gt, computation of the remaining
learners becomes redundant and may be skipped without any loss of accuracy.
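The early termination rule above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the (weight, predict) pair representation, the function name and the toy weights in the usage are all assumptions.

```python
def detect_early_termination(learners, G_t, window):
    """Evaluate a sorted AdaBoost ensemble at one candidate window.

    learners: list of (alpha_k, predict_k) pairs, sorted by decreasing
              alpha_k, where predict_k(window) returns 1 (object) or 0.
    G_t:      AdaBoost global threshold (Equation 8.4).
    Returns True if the window is labeled object, False otherwise.
    """
    # Initialize with the maximum possible response w_m (Equation 8.5).
    response = sum(alpha for alpha, _ in learners)
    for alpha, predict in learners:
        if predict(window) == 0:
            # A negative prediction removes this learner's weight (Eq. 8.6).
            response -= alpha
        if response < G_t:
            # The response decreases monotonically and can never recover,
            # so the remaining learners may be skipped (early termination).
            return False
    return response >= G_t

# Toy ensemble: three learners with weights 0.5, 0.25, 0.25 and G_t = 0.7.
learners = [(0.5, lambda w: 1), (0.25, lambda w: 0), (0.25, lambda w: 1)]
print(detect_early_termination(learners, 0.7, None))   # survives: 1.0 -> 0.75
```
Because the heaviest learners come first, a location rejected by the first learner here is discarded after a single evaluation.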
Since most of the ensemble weight is generally concentrated in the first few learners,
at non-object locations the detector response rapidly falls below the global threshold.
The average number of learners evaluated at any location therefore reduces to a very
small number, making detection significantly faster without any loss of accuracy.
Figure 8.3: Image dataset consisting of still images of varying detail, used for detection speed up comparison.
8.4 Early Non-Maxima Suppression Algorithm
The Non-Maxima Suppression (NMS) process is commonly used to suppress multiple
detections corresponding to the same real-world object. Assuming that the detector
response surface is smooth, and considering an NMS window of appropriate size
around the current search location, non-maxima suppression may be described as
follows: if the detector response at the current location is not the maximum within
the NMS window, the current location is labeled as non-object; otherwise it remains
labeled as object. That is, the label of the current location rio,jo is given by:
Lm(rio,jo) = 0 if Λ(rio,jo) ≤ Λ(ri′o,j′o), and 1 otherwise, (8.7)
where Λ(rio,jo) is the detector response at the current location and Λ(ri′o,j′o) is the
maximum detector response at any other location within the same NMS-window.
The early termination algorithm discussed in the last section may be integrated with
the NMS process to further reduce redundant computations. If, in a locality, the local
maximum Λ(ri′o,j′o) is significantly higher than the global threshold Gt, then all search
Figure 8.4: View invariance of the early terminated AdaBoost detector: (a)-(b) two views of the LUMS library building; (c)-(d) two views from the hotel sequence. Red crosses show AdaBoost detections, yellow dots show missing detections.
locations in that locality having response less than Λ(ri′o,j′o) are non-object locations.
Therefore computations at the current location can stop as soon as the detector
response falls below the local maximum, Λ(ri′o,j′o).
Lm(rio,jo) = 0 if Λi(rio,jo) ≤ max(Λ(ri′o,j′o), Gt), and 1 otherwise, (8.8)
where Λi(rio,jo) is the detector response at current location for i < m learners.
In order to find the local maximum, we compute the AdaBoost detector response over
all search locations for the first p learners, such that the sum of the weights of these
first p learners, wp, satisfies the following bound:
wp ≥ (1 − Tα) Σ_{k=1}^{m} αk, (8.9)
which means that wm − wp ≤ Gt, the global threshold. The partial response over the
first p learners is given by:
Λp(rio,jo) = wm − Σ_{k=1}^{p} αk (1 − Lk(rio,jo)). (8.10)
Search locations where Λp(rio,jo) ≤ Gt are labeled as 0, while the search locations
where Λp(rio,jo) > Gt are still undecided.
At these undecided locations, the ENMS algorithm is implemented as follows: if the
partial response at the current search location, ri′o,j′o, is larger than the partial response at all
Table 8.1: Execution time (sec) of AdaBoost, Early-terminated AdaBoost, KLT, Harris, and Xiao edge-corner detectors.
Img ID AdaBoost EAdaBoost KLT Hrs Xiao
1 24.08 1.73 5.86 22.57 34.44
2 24.16 1.46 5.89 22.86 9.82
3 24.11 2.76 5.89 22.91 244.15
4 24.19 3.96 6.02 22.93 300.16
5 24.12 1.34 5.83 22.96 5.366
6 24.09 1.84 6.00 22.93 60.20
7 24.15 1.77 5.81 22.47 54.60
8 24.11 2.27 5.95 22.90 130.24
9 24.14 1.84 5.81 22.87 48.81
10 24.15 2.69 5.91 22.86 86.23
Mean 24.18 2.16 5.90 22.83 97.40
search locations within the current NMS window, we calculate the complete response
over all m learners at that location:
Λm(ri′o,j′o) = Λp(ri′o,j′o) − Σ_{k=p+1}^{m} αk (1 − Lk(ri′o,j′o)). (8.11)
If Λm(ri′o,j′o) ≥ Gt, all remaining search locations in the current NMS window having
partial response less than Λm(ri′o,j′o) will be labeled as non-objects:
Lm(rio,jo) = 0 if Λp(rio,jo) ≤ Λm(ri′o,j′o), and u otherwise, (8.12)
where u means the label is yet undecided. At each of these undecided locations,
further learners are evaluated until the location is either labeled as non-object or its
final response is computed. In any locality, as soon as a maximum larger than the
previously known maximum is found, the previous best location is labeled as non-object.
When all locations are exhausted, the last undecided location in each locality
is labeled as object.
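The ENMS procedure can be sketched for a single NMS window as follows. This is a simplified illustration under stated assumptions: processing candidates strictly in order of decreasing partial response is one possible realization of the "best partial result first" strategy described above, and the (weight, predict) pair representation is invented for the sketch.

```python
def enms_window(locations, learners, p, G_t):
    """Early Non-Maxima Suppression within one NMS window (a sketch).

    locations: candidate windows in this NMS window; learners: (alpha,
    predict) pairs sorted by decreasing weight; p: learners used in the
    partial pass; G_t: AdaBoost global threshold. Returns the index of
    the surviving local maximum, or None if no candidate is an object.
    """
    w_m = sum(alpha for alpha, _ in learners)

    def response(loc, start, upto, init):
        r = init
        for alpha, predict in learners[start:upto]:
            if predict(loc) == 0:
                r -= alpha          # Equation 8.6: subtract on rejection
        return r

    # Partial responses over the first p learners (Equation 8.10).
    partial = [response(loc, 0, p, w_m) for loc in locations]
    best, best_full = None, G_t     # threshold starts at G_t (Eq. 8.8)
    for i in sorted(range(len(locations)), key=lambda i: -partial[i]):
        if partial[i] <= best_full:
            break                   # Eq. 8.12: cannot beat current maximum
        full = response(locations[i], p, len(learners), partial[i])
        if full > best_full:
            best, best_full = i, full   # previous best becomes non-object
    return best
```
In the sketch, each candidate is a tuple of the labels its learners would predict; raising `best_full` after every confirmed maximum is exactly the raised early termination threshold of Equation 8.8.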
8.5 Experiments and Results
The speed up generated by the proposed early termination algorithms is compared
with our previous AdaBoost implementation (Mahmood, 2007) as well as the KLT,
Harris and Xiao detectors. The speed up comparison is done on a dataset of ten images
shown in Figure 8.3, each of size 2304×3072 pixels, with varying levels of detail.
The number of detected edge-corners varies from a minimum of 309 in image 5 to
a maximum of 167,898 in image 3 (Figure 8.3). In the Xiao and EAdaBoost detectors,
execution time increases with the number of detected edge-corners, while in
the AdaBoost, Harris and KLT detectors the processing time remains the same. The
thresholds for each algorithm are set such that the number of detected edge-corners
remains approximately the same. In most of the experiments, the AdaBoost global
threshold was selected to be 0.70. The number of weak learners evaluated was varied
as p = 4, 8, 12 and 16; Figure 8.5 shows the corresponding fraction of eliminated
locations. After processing 16 learners, on average 91.75% of locations are eliminated.
On the remaining locations, the Early Non-Maxima Suppression algorithm was applied.
The speed up comparison is done on an HP Pavilion Notebook PC with an Intel Core 2
Duo 2.0 GHz CPU and 2 GB RAM. The early terminated AdaBoost detector is up to
4.00 times faster than the KLT detector, 17.13 times faster than the Harris detector,
18.00 times faster than the traditional AdaBoost detector and up to 79.79 times faster
than Xiao's detector (Table 8.1).
The invariance of the EAdaBoost detector is compared with the three other detectors
in the presence of view changes, scale changes, blurring, additive Gaussian noise and
rotation (Figure 8.4). In our experiments, we found that the quality of the AdaBoost
detector is comparable with that of the other detectors. Moreover, the quality of the
AdaBoost detector with and without early termination remains exactly the same.
8.6 Conclusion
In this chapter we have presented early termination algorithms to speed up the
AdaBoost based edge-corner detector. The proposed algorithms have been found to be
Figure 8.5: Fraction of eliminated search locations increases as the number of processed weak learners increases.
up to an order of magnitude faster than our original AdaBoost implementation (Mahmood,
2007). The proposed algorithms are exact; therefore the final results of the
AdaBoost detector remain exactly the same. The proposed elimination algorithms are
more generic than those used by Viola and Jones in their face detector and may
potentially be used to speed up many other object detectors, edge-corner detectors
and image feature detectors.
Chapter 9
USE OF CORRELATION COEFFICIENT FOR VIDEO
ENCODING
The process of block matching for motion estimation in video encoders may also be
considered an application of the image matching problem. In traditional encoders, the
best match position is defined by minimization of SAD, and motion compensation is
done by taking the simple difference between the best match location and the matched
block. In this chapter, we explore the role of the correlation coefficient in motion
compensation in video encoders. If motion estimation is done by maximization of the
correlation coefficient, the best match location is also the best linearly fitting location
and therefore has the least linear estimation error. Using this fact, motion compensation
may be done by finding two linear parameters that relate the block and the best match
location. We show theoretically that the reduction in the variance of the residue is
maximized if motion estimation is done by maximization of the correlation coefficient
and motion compensation is done by first order linear estimation. We find that by
using linear motion compensation, the entropy of the residue reduces significantly
compared to the entropy of the simple difference. In existing video encoders, the block
mean may be encoded in the bit stream as an extra parameter. One of the two linear
parameters estimated in our approach may be encoded instead of the mean. Therefore,
the overhead of our proposed approach is the encoding of the second parameter, which
is the slope of the best fitting line.
We have verified our findings through experimentation on a wide variety of datasets
taken from several commercial movies. In some cases, for the same number of bits
per pixel, our proposed scheme exhibits an improvement in peak signal to noise ratio
(PSNR) of up to 5 dB compared to the traditional encoding scheme.
9.1 Block Based Motion Compensation in Video Encoders
A digital video signal consists of a sequence of frames and is usually characterized by
strong temporal correlation between adjacent frames. This correlation is exploited in
standard video codecs to achieve significant compression, resulting in storage and
communication efficiency. For this purpose, block based motion compensation techniques
are used and have become an integral part of modern video codecs such as H.263 and
H.264/AVC. Block based motion compensation involves dividing each frame into non-overlapping
rectangular blocks, matching each block with a suitable block in
a previous frame, and finally taking the difference of the two matched blocks. Most
video encoders use minimization of the Sum of Absolute Differences (SAD) as the
criterion for finding the best match for a block (Ghanbari, 2003), a process commonly
known as motion estimation. Current video codecs expect high similarity between the
two matching blocks, such that the variance of the difference signal is smaller than the
variance of the current block to be encoded. In the video literature, this type of
encoding is referred to as predictive coding. However, it is important to realize that this
procedure simply takes the difference of the two matched blocks and is equivalent to
the differential encoding used for audio signals. The notion of predictive encoding in audio
signals is different and involves linear estimation of a signal sample from previously
observed samples; that is, the characteristics of a sample are predicted from previous
sample values. Although existing video encoding techniques predict the motion of a
block, they do not attempt to predict the relationship between two matched blocks.
We argue that it is beneficial to predict a relationship between two matched blocks
rather than simply taking their difference.
For motion estimation, SAD presents a computationally efficient solution (Vanne
et al., 2006) and is therefore used in existing video encoders. However, SAD is
implicitly based on the 'brightness constancy' assumption, i.e. the intensity values of
a block of video are not expected to change from one frame to another, although
the block may undergo a spatial shift. Such ideal conditions rarely exist:
brightness and contrast changes are frequently observed between frames, especially in
commercial videos. Even under simple linear changes, such as brightness variation,
SAD does not guarantee a correct match.
As a simple illustration of this fact, consider an m × n image block, and suppose that
the subsequent frame is brighter by a constant factor ∆ at each pixel. The correct
matching location for this block will thus have a SAD value of mn∆, and the difference
signal at this location will have a variance of zero. However, it is quite likely that
there will be other locations in the search area with a lower SAD value, because the
addition of ∆ causes the intensity levels at those locations to become closer to the
intensity levels of the original block to be encoded. Thus, a motion estimator based
on SAD will result in a match at an incorrect location, where the variance of the
difference signal can potentially be much higher than zero. Hence, even under simple
deviations from the brightness-constancy assumption, SAD no longer remains an
accurate motion estimator for video encoding.
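This failure mode can be reproduced with a small numeric sketch; the 2×2 block, the shift ∆ = 50 and the flat competitor block are invented purely for illustration:

```python
import numpy as np

# An invented 2x2 block and a uniform brightness increase Delta.
block = np.array([[20.0, 60.0], [100.0, 140.0]])
delta = 50.0

true_match = block + delta                       # same content, brighter
flat_match = np.full_like(block, block.mean())   # different (flat) content

sad_true = np.abs(block - true_match).sum()      # = m*n*Delta = 200
sad_flat = np.abs(block - flat_match).sum()      # = 160, lower than 200

# SAD prefers the flat block, yet the residue there has large variance,
# while the residue against the true brightness-shifted match is constant
# (variance zero) and would compress perfectly:
print(sad_flat < sad_true)                       # True: SAD picks the wrong block
print(np.var(block - true_match))                # 0.0
```
The correlation coefficient, by contrast, is invariant to such brightness and contrast changes, so the true match scores |ρ| = 1 regardless of ∆.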
We consider the use of a first order linear estimator to model the changes in intensity
of a block from frame to frame. This choice is motivated by observing the brightness
and contrast changes in real videos. Instead of taking the difference between two
matching blocks, we estimate one from the other and take the difference between
the actual and estimated values. That is, if r2 is the best match of r1, we use r2
to compute r̂1, the Minimum Mean Squared Error (MMSE) linear estimate of r1, and
then consider the estimation error, r1 − r̂1, as the residue for further processing. We
show that the variance of the linear estimation error is never larger than the variance
of the simple difference, r1 − r2, leading to better compression and resulting in storage
and communication efficiency. We further show that, when r1 − r̂1 is used instead of
r1 − r2, the optimal criterion for finding the best match is maximization of the
magnitude of the correlation coefficient. The proposed scheme, Video Coding with Linear
Compensation (VCLC), captures all first order variations in video signals. We have
observed through experimentation on eight commercial videos that using a non-linear
predictor instead of a linear predictor yields diminishing gains, indicating
that frame-to-frame changes in the video signal are well modeled by a linear predictor.
9.2 Problem Definition
We consider a digital video signal as a sequence of frames F indexed at discrete time
k. For the purpose of encoding, each frame F(k) is divided into non-overlapping
blocks r1(k, x, y), each of size m × n pixels, where the parameters x, y represent the
spatial position of the block r1(k, x, y) within frame F(k). The two primary steps in
the video encoding process are:
1. Motion prediction (or motion estimation), which is carried out on each block
r1(k, x, y) by finding its closest match r2(k + δk, x + δx, y + δy) in a judiciously
selected search area within a previous frame.
2. Motion compensation, which essentially means finding the motion compensated
differential signal:
∆ = r1(k, x, y)− h{r2(k + δk, x+ δx, y + δy)} (9.1)
where h(·) is an arbitrary function chosen such that the variance of ∆, σ²∆, is
minimized. ∆ is also known as the motion compensated residue, and σ²∆ is known as
the inter-frame variance (Jain and Jain, 1981). In current practice, h(·) is taken to
be the identity function.
The primary goal of video encoding is to maximize compression, for which a common
heuristic is to minimize the variance of the motion compensated residue ∆, so that
fewer bits are needed for its representation. Thus h(·) is used as an estimation filter
for r1(k, x, y), such that the estimation error variance σ²∆ is minimized.
We therefore intend to find the function h(·) for the motion compensation step, as
well as the criterion for finding the closest match in the motion estimation step, such
that σ²∆ is minimized. It is expected, as we will show, that the matching criterion
and the estimation function h(·) are closely related to each other.
9.3 Maximization of Gain Guaranteed by Maximization of Correlation Coefficient
A number of schemes have been proposed and standardized for video encoding. All
existing video encoders use minimization of SAD as the criterion for finding the best
match in the motion estimation step. That is, the values k + δk, x + δx, y + δy are
determined such that the SAD value given by the following expression is minimized:
SAD = Σ_{x=1}^{m} Σ_{y=1}^{n} |r1(k, x, y) − r2(k + δk, x + δx, y + δy)| (9.2)
Furthermore, existing video encoders select h(·) in Equation 9.1 such that h(θ) = θ.
In this case, the resulting motion compensated differential signal (∆d) is given by:
∆d(k, x, y) = r1(k, x, y)− r2(k + δk, x+ δx, y + δy) (9.3)
The variance of ∆d can be expressed in the following form:
σ²∆d = σ²1 + σ²2 − 2 ρ1,2 σ1 σ2 (9.4)
where r1 = r1(k, x, y) and r2 = r2(k + δk, x + δx, y + δy) and ρ1,2 is the correlation
coefficient between blocks r1 and r2. We define the gain of traditional video encoders
as: Gd = σ²1 / σ²∆d. Using Equation 9.4, an expression for Gd can be derived and is
given below:
Gd = σ²1 / σ²∆d = 1 / (1 + (σ2/σ1)(σ2/σ1 − 2ρ1,2)) (9.5)
If the video signal is assumed to be stationary, such that σ²1 = σ²2, then Equation 9.5
reduces to:
Gds = 1 / (2(1 − ρ1,2)) (9.6)
From 9.6, we note that Gds is maximized when ρ1,2 is maximized. However, minimization
of SAD does not guarantee maximization of ρ1,2; thus SAD is not the optimal
criterion for the maximization of Gds. Furthermore, in general the video signal is
non-stationary and the true gain is given by 9.5, whose maximum cannot be guaranteed
either by maximization of ρ1,2 or by minimization of SAD.
Nevertheless, from Equations 9.5 and 9.6, maximization of the correlation coefficient
appears to be a more attractive criterion for motion estimation than minimization
of SAD. However, the correlation coefficient has not been given serious consideration
in the video encoding literature because of its high computational complexity (Barnea
and Silverman, 1972). We have removed this objection to a large extent by developing
the Basic Mode Partial Correlation Elimination (PCE) algorithm, which performs
very well on small template sizes (see Chapter 6). It should also be noted that
with the availability of powerful processors, complex algorithms that enhance
compression efficiency are now practicable. For example, H.264/AVC uses much more
complex algorithms than those employed by previous video encoding standards.
The use of the correlation coefficient as a motion estimator has also been ignored
because of the comments made in the seminal paper by Jain and Jain (1981),
suggesting that the accuracy of the area correlation method is poor when the block
size is small and the blocks are not undergoing pure translation. However, for block
sizes commonly used in motion estimation algorithms, the correlation coefficient actually
outperforms SAD and other measures that are based upon the brightness constancy
assumption. We have verified this by performing a large number of experiments
on scenes taken from ten commercial videos. Furthermore, in the case of
non-translational motion, all block based motion estimation algorithms suffer some
degradation in performance; however, the performance of correlation coefficient based
estimators degrades more gracefully (Wu, 1995b).
We note that VCLC is a fundamental technique, and other schemes and optimizations
proposed in the literature or included in standards may be used in addition to VCLC.
Such additional schemes include overlapped block motion compensation (OBMC) (Orchard
and Sullivan, 1994; Su and Mersereau, 2000), which was invented to handle complex
motion within a block. Similarly, sub-pixel motion estimation (Girod, 1993), which
aims to increase the accuracy of motion compensation, may also be used with VCLC.
Figure 9.1: The average and standard deviation of the Mean Squared Error of different estimation filters h(·). More than 400,000 8×8 blocks taken from eight commercial movies were used to compute these statistics.
9.4 Video Coding with Linear Compensation (VCLC)
In traditional video encoding systems, the estimation filter h(·) in Equation 9.1 is
selected to be the identity function, h(θ) = θ; therefore, these systems use the difference
signal of Equation 9.3. Inherently, the use of Equation 9.3 is based on the brightness
constancy assumption for pixel intensities. However, we have observed that brightness
and contrast changes are so ubiquitous in natural videos, especially commercial videos
such as movies, that the assumption of constant pixel intensities breaks down frequently.
We have experimentally verified this observation by measuring the average MSE
between the original block and the matched block while varying the estimation filter
h(·). Figure 9.1 shows the reduction in MSE as h(·) was changed from the identity
to a first order linear estimator and then a quadratic estimator. A significant decline
in MSE was observed when the linear estimator was used compared to the identity
function. Increasing the complexity of the estimator from linear to quadratic resulted
in diminishing returns, and subsequent improvements were not as significant.
Hence, in this chapter, we propose that intensity changes between blocks in nearby
frames can be better modeled by a first order linear estimator. Therefore, we select
h(·) for the estimation of block r1 as:
h(r2) = αr2 + β (9.7)
where α and β are selected to minimize the mean squared error between h(r2) and
the block r1 being estimated. These two additional parameters are transmitted with
each block of video, but, as we will show, the corresponding reduction in the variance
of the linear compensated difference signal justifies this overhead.
In the next two subsections, we first discuss the theoretical impact of choosing the
first order linear model on the motion compensation strategy, and then discuss the
optimal motion estimator under this model.
9.4.1 Motion Compensation using Linear Estimator
For the motion compensation step, current input block r1(k, x, y) is linearly estimated
from the best matching block r2(k + δk, x + δx, y + δy), instead of computing the
difference as in traditional methods. Thus we use:
r̂1(k, x, y) = α r2(k + δk, x + δx, y + δy) + β (9.8)
The parameters α and β are selected such that the mean squared estimation error,
given below, is minimized:
Λ = Σ_{x=1}^{m} Σ_{y=1}^{n} (r1(k, x, y) − α r2(k + δk, x + δx, y + δy) − β)² (9.9)
Minimizing Λ with respect to α and β yields:
α = ρ1,2 σ1 / σ2 (9.10)
β = µ1 − ρ1,2 (σ1 / σ2) µ2 (9.11)
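A small numeric sketch of this estimator, assuming blocks are given as NumPy arrays; the function name and the invented example blocks (a contrast/brightness change of r2 plus noise) are illustrative, and the checks correspond to Equation 9.13 and Theorem 9.1 derived later in this section:

```python
import numpy as np

def linear_compensation(r1, r2):
    """MMSE linear parameters (Eq. 9.10, 9.11) and the residue r1 - r1_hat."""
    mu1, mu2 = r1.mean(), r2.mean()
    s1, s2 = r1.std(), r2.std()
    rho = ((r1 - mu1) * (r2 - mu2)).mean() / (s1 * s2)
    alpha = rho * s1 / s2            # Equation 9.10
    beta = mu1 - alpha * mu2         # Equation 9.11
    residue = r1 - (alpha * r2 + beta)
    return alpha, beta, rho, residue

# Invented example: r1 is a gain/offset change of r2 plus small noise.
rng = np.random.default_rng(1)
r2 = rng.uniform(0, 255, size=(8, 8))
r1 = 1.2 * r2 + 10 + rng.normal(0, 2, size=(8, 8))

alpha, beta, rho, residue = linear_compensation(r1, r2)
print(np.isclose(residue.var(), (1 - rho**2) * r1.var()))  # Equation 9.13
print(residue.var() <= (r1 - r2).var())                    # Theorem 9.1
```
Note that the residue is zero mean by construction, which is what later allows the DC coefficient of each transformed block to be dropped (Equation 9.19).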
In the proposed VCLC scheme, we define the motion compensated residue ∆p similarly
to the traditional case, but using the MMSE linear estimate r̂1 instead of r2:
∆p(k, x, y) = r1(k, x, y) − r̂1(k, x, y) (9.12)
It is straightforward to show that the mean of ∆p is always zero, regardless of the
form of the original and the matched block. The variance of ∆p has a direct impact
on compression efficiency: if σ²∆p < σ²∆d, VCLC would lead to better compression
compared to the traditional schemes. Since ∆p is zero mean, its variance is the
minimum mean square error of estimation given by Equation 9.9, which can also be
derived to the following form:
σ²∆p = (1 − ρ²1,2) σ²1 (9.13)
The above expression for σ²∆p, i.e. the variance of the linear compensated difference
signal, should be compared to the expression for σ²∆d in Equation 9.4, which is the
variance of the simple difference. Using the expression in 9.13, we can show that σ²∆p
is always less than or equal to σ²∆d.
Theorem 9.1 For the same motion estimator, σ²∆p is upper bounded by σ²∆d.
Proof 9.4.1.1 Since the square of any real number is non-negative, the following
inequality holds:
(ρ1,2 σ1/σ2 − 1)² ≥ 0 (9.14)
Multiplying through by σ²2 and rearranging, we get:
σ²1 − ρ²1,2 σ²1 ≤ σ²1 − 2 ρ1,2 σ1 σ2 + σ²2 (9.15)
Comparing 9.15 with 9.4 and 9.13, it follows that σ²∆p ≤ σ²∆d always holds, regardless
of the form of the input signal.
Similar to the definition of Gd in Section 9.3, we define the motion compensation
gain of the VCLC scheme as:
Gp = σ²1 / σ²∆p = 1 / (1 − ρ²1,2) (9.16)
Since σ²∆p ≤ σ²∆d, we have Gp ≥ Gd. Hence the use of the VCLC scheme will never
result in a lower gain when compared with the traditional encoding scheme.
9.4.2 Motion Estimation with Correlation Coefficient
In the previous discussion, the advantage of the VCLC scheme over traditional motion
compensation techniques was shown independent of the motion estimation process.
This implies that if the VCLC scheme is used with traditional motion estimation,
the gain will still improve. However, we notice from 9.16 that the gain of VCLC is
maximized when |ρ1,2| is maximized. This indicates that for VCLC, the optimal
criterion for finding the closest match in the motion estimation step is not minimization
of SAD but maximization of the magnitude of the correlation coefficient over the
search space. Thus the location of the best matching block is given by:
(δk, δx, δy) = arg max_{δk, δx, δy} |ρ1,2| (9.17)
where
ρ1,2 = [ Σ_{x=1}^{m} Σ_{y=1}^{n} (r1(k, x, y) − µ1)(r2(k + δk, x + δx, y + δy) − µ2) ] / (σ1 σ2) (9.18)
Thus there is no other location where the linear compensated differential signal would
have a lower variance, or a higher gain, than the one obtained by maximizing Equation
9.17 over (k + δk, x + δx, y + δy).
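This criterion can be sketched as a minimal exhaustive search; the function name, the invented reference frame and the nominal block position are all assumptions of the sketch, not the thesis implementation (which would use the PCE algorithm of Chapter 6 to prune the search):

```python
import numpy as np

def best_match(block, ref, y0, x0, search=4):
    """Exhaustive-search motion estimation maximizing |rho| (Equation 9.17).

    block: m x n block from the current frame; ref: reference frame;
    (y0, x0): nominal position of the block in ref; search: search range
    in pixels. Returns (rho, (dy, dx)) for the best matching offset.
    """
    m, n = block.shape
    b = block - block.mean()
    sb = b.std()
    best = (0.0, (0, 0))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + m > ref.shape[0] or x + n > ref.shape[1]:
                continue
            cand = ref[y:y + m, x:x + n]
            c = cand - cand.mean()
            sc = c.std()
            if sb == 0 or sc == 0:
                continue  # flat blocks carry no correlation information
            rho = (b * c).mean() / (sb * sc)  # Equation 9.18, normalized
            if abs(rho) > abs(best[0]):
                best = (rho, (dy, dx))
    return best

# Invented demo: the block is a gain/offset change (gain 2, offset 7) of the
# reference patch at offset (2, 1); |rho| still locates it exactly.
rng = np.random.default_rng(2)
ref = rng.uniform(0, 255, size=(16, 16))
block = 2.0 * ref[5:9, 6:10] + 7.0
rho, (dy, dx) = best_match(block, ref, 3, 5)
```
Because ρ is invariant to gain and offset, the true location scores |ρ| ≈ 1 even though every pixel value differs from the reference, a case in which SAD-based matching can fail.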
9.5 Video Coding With Linear Compensation: System Overview
Simplified block diagrams of VCLC encoder and decoder are shown in Figures 9.2
and 9.3 respectively. An input video frame to be encoded is sent to the motion vector
Figure 9.2: Block diagram of the video coder with linear compensation (VCLC). MVE: motion vector estimator, LPE: linear parameter estimator, LFE: linear frame estimator.
estimator (MVE), which also obtains a reference frame from the memory. The MVE
finds the best matching block in the reference frame for each block in the input video
frame by maximizing the magnitude of the correlation coefficient, as given in Equation
9.17. For each block, the MVE provides the motion vector information to the linear
parameter estimator (LPE), which computes α and β in accordance with 9.10 and
9.11. The LPE sends these parameters to the linear frame estimator (LFE), where the
linear estimate of the complete frame is formed from the linear estimates of the
individual blocks. The linear estimate of the complete frame is subtracted from the
input video frame, and the resulting residue is further processed through the transform
coder, quantizer and entropy coder.
Traditional decoders require the residue information along with motion vectors in
order to decode the current frame. The VCLC decoder additionally requires transmission
of the α, β parameters. We observe, however, that when using VCLC, the mean of the
motion compensated residue is zero, resulting in a zero DC value of the transform of
each block:
DCVCLC = 0 (9.19)
which eliminates the transmission of one parameter as compared to traditional
encoders. In traditional generic encoders (GE) (Ghanbari, 2003), the DC value of a
transformed block is the difference of the means of the input block r1 and its best matching
Figure 9.3: A generic inter-frame predictive decoder with linear compensation. MCP: motion compensated prediction, LPC: linear parameter compensation.
block r2:
DCGE = µ1 − µ2 (9.20)
and it is generally non-zero. Note that in the VCLC scheme, instead of transmitting
the α, β parameters, we can transmit α and (µ1 − µ2), and reconstruct β on the
decoder side using 9.11. Therefore, compared to traditional systems, the actual
overhead is only one parameter per block. Furthermore, for intermediate to large
block sizes, for example 8×8 or above, the cost of the α parameter, in terms of bits
per pixel, turns out to be insignificant. For smaller block sizes, which are not very
common in video encoders due to the large number of motion vectors, the cost of
sending an additional parameter becomes noticeable, requiring efficient quantization
and coding for the same.
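The parameter handling just described can be sketched as follows; the function names and numeric values are invented for illustration, and the sketch assumes the decoder knows µ2 from its own copy of the reference block:

```python
# Encoder side: send alpha and (mu1 - mu2), replacing the usual DC term.
def encode_params(alpha, mu1, mu2):
    return alpha, mu1 - mu2        # one extra parameter vs. a traditional GE

# Decoder side: recover mu1 from the transmitted difference and the known
# reference-block mean mu2, then reconstruct beta via Equation 9.11.
def decode_beta(alpha, dc, mu2):
    mu1 = dc + mu2
    return mu1 - alpha * mu2       # beta = mu1 - alpha * mu2

alpha, dc = encode_params(1.5, 130.0, 100.0)
beta = decode_beta(alpha, dc, 100.0)
print(beta)                        # -20.0, i.e. 130 - 1.5 * 100
```
This makes the overhead claim above concrete: only α is genuinely new, since (µ1 − µ2) takes the slot of the DC value that a generic encoder would transmit anyway.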
9.6 Experiments and Results
We have experimentally verified the theoretical results of the previous sections by
encoding scenes selected from numerous commercial videos. These videos often exhibit
significantly larger changes in lighting than the standard test sequences used in video
codec research. On this dataset, the efficiency of VCLC was compared with that of
the traditional Generic Encoder (GE) (Ghanbari, 2003), which used SAD
Figure 9.4: (a) The histogram of α always has a maximum at 1.00. (b) The histogram of µ1 − µ2 always has a maximum at 0.00, with a shape similar to a Laplacian distribution.
for motion estimation and simple differences for motion compensation. In our ex-
periments, the motion compensated residue of VCLC and GE was first transformed
using the DCT and then quantized by a uniform quantizer. The minimum number of bits
needed to transmit the quantized residue was estimated by computing its entropy.
Note that VCLC improves the motion compensation efficiency while the other blocks
of the video encoder remain the same.
The improvement in motion compensation is generally measured by the improvement
in prediction SNR, defined in (Jain and Jain, 1981):

SNR = 10 log10 ( M I_max^2 / Σ_{r=1}^{M} σ_r^2 )    (9.21)

where M is the total number of residue blocks, σ_r^2 is the variance of a block of
residue, and I_max is the maximum pixel intensity. For the traditional generic encoder,
σ_r^2 = σ_Δd^2, while for VCLC, σ_r^2 = σ_Δp^2. The SNR comparison shown in Table 9.1 was
computed for three motion estimation block sizes: 4×4, 8×8 and 16×16. The maximum
improvement in SNR was observed for the 4×4 block size, where it reached up to 11.3 dB,
whereas for the 8×8 and 16×16 block sizes it was up to 6.8 dB and 6.1 dB respectively.
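As a concrete illustration of Equation 9.21, the prediction SNR can be computed directly from the per-block residue variances. The sketch below (Python/NumPy, with synthetic residue blocks standing in for real motion compensated residues) shows that a lower-variance, VCLC-like residue yields a higher SNR than a higher-variance, GE-like one:

```python
import numpy as np

def prediction_snr(residue_blocks, i_max=255.0):
    """Prediction SNR of Eq. 9.21: 10*log10(M * Imax^2 / sum of block variances)."""
    variances = [float(np.var(b)) for b in residue_blocks]
    return 10.0 * np.log10(len(variances) * i_max**2 / sum(variances))

rng = np.random.default_rng(1)
vclc_res = [rng.normal(0.0, 4.0, (8, 8)) for _ in range(100)]   # lower-variance residue
ge_res = [rng.normal(0.0, 10.0, (8, 8)) for _ in range(100)]    # higher-variance residue
print(prediction_snr(vclc_res) > prediction_snr(ge_res))        # True
```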
The SNR in Equation 9.21 measures the performance of an encoding scheme based
on the variance of the motion compensated residue without considering the effects of
transform encoding and quantization. A better way to evaluate an encoding scheme is
[Six rate-distortion panels: Batman Begins, King Kong, Underworld Evolution, Blade 2, Lord of The Rings 3 and Mission Impossible; each plots PSNR (dB) against bits per pixel for VCLC and GE.]
Figure 9.5: Variation of PSNR with bits per pixel (bpp) for Video Coding with Linear Compensation (VCLC) and the traditional Generic Encoder (GE).
Table 9.1: Comparison of the traditional Generic Encoder (GE) and VCLC motion compensation SNR (dB)

                      SNR_VCLC                  SNR_GE
Dataset         4×4     8×8    16×16      4×4     8×8    16×16
Fast&Furious    36.11   36.51  29.03      29.09   33.04  25.32
BatmanBegins    41.83   39.33  32.97      34.05   34.66  28.82
KingKong        38.69   35.08  29.31      30.58   31.62  25.58
UnderWorld      42.70   39.58  35.25      34.95   34.34  29.15
Spiderman       35.01   31.56  26.44      29.64   29.46  24.41
PinkFloyd       40.59   37.66  35.14      37.18   35.83  33.29
Metallica       40.59   35.28  31.17      32.91   32.49  28.37
Blade           45.25   39.68  35.31      35.89   32.739 30.04
LordOfRings     39.69   35.84  33.60      34.15   31.91  30.13
MissionImps     36.70   31.87  27.66      29.42   26.60  23.75
to characterize the end-to-end performance of the system by measuring the distortion
in the decoded signal. Although the VCLC scheme improves only the motion estimation
and motion compensation steps, we show with simple experiments that the end-to-end
performance also improves. We computed the peak signal-to-noise ratio (PSNR)
defined in H.264 as:

PSNR = 10 log10 ( 255^2 / MSE )    (9.22)

where MSE is the mean squared error between the original frame and the corresponding
reconstructed frame.
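Equation 9.22 reduces to a one-line computation on a decoded frame. A minimal sketch (Python/NumPy; the frames here are synthetic stand-ins):

```python
import numpy as np

def psnr(original, reconstructed):
    """PSNR of Eq. 9.22 for 8-bit frames: 10*log10(255^2 / MSE)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return 10.0 * np.log10(255.0**2 / np.mean(diff**2))

frame = np.full((16, 16), 100, dtype=np.uint8)
noisy = frame.copy()
noisy[0, 0] += 16                    # one pixel off by 16: MSE = 16^2 / 256 = 1.0
print(round(psnr(frame, noisy), 2))  # 48.13 dB
```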
In an end-to-end system, the additional parameters α and (µ1 − µ2) also have to be
quantized before entropy coding. Typical histograms of both parameters are
shown in Figure 9.4. We used a generalized Lloyd-Max quantizer with 5 bits for α
and 4 bits for (µ1 − µ2). For our datasets, the combined entropy of these parameters for
an 8×8 block size was computed to be 6.46 bits, so the additional overhead of
these parameters is approximately 0.1 bits per pixel (6.46 bits spread over 64 pixels).
Figure 9.5 shows the average rate-distortion curves for six videos encoded using an 8×8
block size. In Figure 9.5, the slight rightward shift of the top curve, representing
VCLC's performance, is due to the overhead of the two additional parameters. We note
that the VCLC scheme exhibits an improvement of up to 5 dB in PSNR compared with
the traditional generic encoder.
9.7 Conclusion
In this chapter we demonstrated that motion estimation with the correlation coefficient
and motion compensation with an MMSE first-order linear estimator can be used to
reduce the number of bits required to encode a video at the same PSNR. The
proposed video encoding scheme, Video Coding with Linear Compensation (VCLC),
may therefore prove to be of practical significance for video transmission
and storage applications.
Chapter 10
CONCLUSIONS AND FUTURE DIRECTIONS
Despite the presence of a large number of fast approximate image matching techniques,
bound based computation elimination algorithms are of special interest because of the
exhaustive-equivalent accuracy and high speedups they offer. In this
thesis we have presented two different types of bound based computation elimination
algorithms for correlation based fast template matching, namely Transitive Elimina-
tion Algorithms (Mahmood and Khan, 2007b, 2008, 2010) and Partial Correlation
Elimination algorithms (Mahmood and Khan, 2007a, 2011). The first type consists of
complete elimination algorithms, while the second consists of partial elimination
algorithms.
10.1 Transitive Algorithms
While investigating the transitivity property of correlation based measures, we de-
rived two different types of transitive bounds on correlation based measures. The
first type was derived by applying the triangle inequality to the angular distance
measure, and the second by applying the triangle inequality to the Euclidean distance
measure. We compared both types of bounds theoretically and showed that the angular
distance based bounds are contained within the Euclidean distance based bounds. We
therefore concluded that the angular distance based bounds are tighter and more useful
from a computation elimination perspective.

We studied the tightness characteristics of the angular distance based transitive bounds
and found that these bounds remain tight if at least one of the two bounding correlations
is ensured to remain high. We suggested that the autocorrelation present in most
template matching systems should be used as the strong bounding correlation.
The transitive elimination algorithms presented in this thesis may be used efficiently in
the following scenarios:
1. In a typical template matching scenario, a template image is matched across
one or more large reference images. Natural scenes, especially remotely sensed
satellite images, have high local spatial autocorrelation. We have developed a
transitive algorithm for fast template matching in this scenario by exploiting
intra-reference autocorrelation. Fast algorithms for autocorrelation computation
were also developed.
2. In other template matching applications, such as object tracking in a video
sequence, and especially tracking humans in surveillance videos, one object
template has to be correlated with multiple video frames. The nearby frames of a
video sequence are generally highly correlated. To exploit this inter-frame
temporal autocorrelation, we have developed the Intra-Ref Transitive Elimination
Algorithm. Fast algorithms for computing the autocorrelation between two video
frames were also developed.
3. In some other applications, such as video geo-registration, a set of highly corre-
lated template frames is to be matched against the same reference image. To ex-
ploit the strong inter-template correlation, we have developed an Inter-Template
TEA algorithm.
4. The Inter-Template TEA algorithm also applies to rotation/scale invariant tem-
plate matching, because consecutive rotated and scaled versions of an object
are highly correlated. A high speedup is obtained by exploiting this inter-template
autocorrelation for fast rotation/scale invariant template matching.
The main principle of all of these transitive elimination algorithms is as follows: if, at a
particular search location, the upper transitive bound is found to be less than the current
known maximum, the correlation computation at that location may be skipped without
any loss of accuracy. The execution time speedup of transitive algorithms depends on
the strength of the autocorrelation found at nearby locations. If the autocorrelation is
strong, transitive algorithms may become extremely fast; if it is weak, transitive
elimination becomes less efficient and the speedup is reduced. To handle such
scenarios, we have proposed a second category of elimination algorithms, Partial
Correlation Elimination (PCE) algorithms. These are more generic than transitive
algorithms because they do not depend on the autocorrelation function.
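The skip test above can be written as a generic search skeleton. The sketch below is illustrative only: `correlate` and `transitive_upper_bound` are placeholder callables standing in for the full correlation computation and the precomputed transitive bound of the actual algorithms:

```python
def tea_search(locations, correlate, transitive_upper_bound):
    """Exhaustive-equivalent search: skip any location whose upper transitive
    bound cannot exceed the best correlation found so far."""
    best_loc, best_corr = None, -1.0
    for loc in locations:
        if transitive_upper_bound(loc) < best_corr:
            continue              # complete elimination: no correlation computed here
        rho = correlate(loc)      # full correlation only at surviving locations
        if rho > best_corr:
            best_loc, best_corr = loc, rho
    return best_loc, best_corr

# Toy usage: three locations with known correlations and a loose but valid upper bound.
corrs = {0: 0.2, 1: 0.9, 2: 0.3}
loc, rho = tea_search([1, 0, 2], corrs.get, lambda l: corrs[l] + 0.05)
print(loc, rho)  # 1 0.9 -- locations 0 and 2 were eliminated without computing correlation
```

Because the bound is a true upper bound on the correlation, skipping a location can never discard the global maximum, which is what gives the algorithms their exhaustive-equivalent accuracy.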
10.2 PCE Algorithms
In PCE algorithms, the correlation coefficient is computed using a monotonic decreas-
ing formulation. At a particular search location, as soon as the partial value of the
correlation falls below the current known maximum, the remaining computations may
be skipped without any loss of accuracy. Different versions of the PCE algorithm are
efficient in different template size ranges:
1. Basic Mode PCE is empirically found to be more efficient for small template
sizes, ≤ 21×21 pixels. For these sizes, we have also developed a novel initial-
ization scheme, named the Two-Stage Basic-Mode PCE algorithm. The Basic Mode
PCE algorithm has been found to be significantly faster than all existing fast
algorithms, because frequency domain implementations are slow for small
templates, and other efficient spatial implementations, such as ZEBC, are also
slow due to the high overhead of bound computation.
2. Extended Mode PCE with a multi-stage initialization scheme is efficient for medium-
sized templates, larger than 21×21 and smaller than 48×48 pixels. An
algorithm to find efficient elimination test locations has also been developed. For
these sizes, the PCE algorithm has remained faster than all other algorithms, although
the speedup margin shrinks as the template size increases.
3. Extended Mode PCE with a coarse-to-fine scheme is more efficient for larger
templates, ≥ 48×48 pixels. The algorithm for finding efficient elimination test
locations also applies to these sizes. For large template sizes, FFT performance
improves significantly, as does the performance of the efficient spatial domain
algorithm ZEBC. However, ZEBC remains slow on templates whose number of rows
is a prime number.
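The partial elimination test can be sketched in the same generic style. The formulation below uses a Cauchy-Schwarz bound on the unprocessed tail of zero-mean, unit-norm vectors as an illustrative monotonic upper bound; it is not the exact formulation used in the thesis:

```python
import numpy as np

def pce_correlation(t, w, best_so_far, block=16):
    """Compute the correlation of zero-mean, unit-norm vectors t and w, aborting
    as soon as the running upper bound (partial sum plus a Cauchy-Schwarz bound
    on the unprocessed tail) falls below the best correlation already known.
    Returns the full correlation, or None if the location is eliminated."""
    partial = 0.0
    for start in range(0, t.size, block):
        end = start + block
        partial += float(np.dot(t[start:end], w[start:end]))
        tail_bound = np.linalg.norm(t[end:]) * np.linalg.norm(w[end:])
        if partial + tail_bound < best_so_far:
            return None   # remaining pixels skipped with no loss of accuracy
    return partial

def unit(v):
    """Zero-mean, unit-norm normalization."""
    v = v - v.mean()
    return v / np.linalg.norm(v)

rng = np.random.default_rng(2)
t = unit(rng.normal(size=256))
w_bad = unit(rng.normal(size=256))                   # an uncorrelated candidate window
print(pce_correlation(t, w_bad, best_so_far=0.95))   # None: eliminated
print(pce_correlation(t, t, best_so_far=0.95))       # full match survives; correlation ~= 1.0
```

As in the thesis algorithms, elimination is exact: a skipped location provably cannot beat the current maximum, so the result matches exhaustive search.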
10.3 Limitations of TEA and PCE Algorithms
Despite the high speedups obtained by our proposed algorithms, these schemes also have
some important limitations. One important limitation is that these algorithms expect
a high correlation maximum to be present in the search space. The amount of
eliminated computation increases with the height of the known maximum.
For template matching applications that require finding a small maximum, for example
ρmax = 0.40, elimination algorithms no longer remain efficient: the amount of
eliminated computation decreases, resulting in an increase in execution time. In our
proposed algorithms, a high speedup is obtained only if a large-magnitude maximum
is found early in the search process.
10.4 Elimination Algorithms for Object Detection
We have shown that bound based computation elimination strategies can also be applied
to fast object detection (Mahmood and Khan, 2009). An early termination algorithm is
applied to speed up an AdaBoost based edge-corner detector (Mahmood, 2007). In this
regard, an Early Non-Maxima Suppression (ENMS) algorithm has also been proposed,
which integrates the detection process within the non-maxima suppression process to
reduce computation.
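The partial-sum early termination for additive detectors can be sketched generically; the weights, responses and thresholds below are illustrative, not those of the cited edge-corner detector:

```python
def early_terminated_score(weak_responses, alphas, reject_margins, threshold):
    """AdaBoost-style additive detector with early termination.
    reject_margins[k] upper-bounds how much the stages after k can still add;
    once even the best-case remainder cannot reach the threshold, the window
    is rejected without evaluating the remaining weak classifiers."""
    partial = 0.0
    for k, (h, a) in enumerate(zip(weak_responses, alphas)):
        partial += a * h
        if partial + reject_margins[k] < threshold:
            return None   # early rejection
    return partial

alphas = [2.0, 1.5, 1.0, 0.5]
# Best-case contribution of the remaining stages (weak responses in [-1, +1]):
margins = [sum(alphas[k + 1:]) for k in range(len(alphas))]   # [3.0, 1.5, 0.5, 0.0]
print(early_terminated_score([-1, -1, 1, 1], alphas, margins, threshold=1.0))  # None
print(early_terminated_score([1, 1, -1, 1], alphas, margins, threshold=1.0))   # 3.0
```

Since most windows in a detection scan are negatives, rejecting them after a few stages removes the bulk of the weak-classifier evaluations.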
10.5 Using Correlation Coefficient in Video Coding
The use of the correlation coefficient in video encoders (Mahmood et al., 2007) has also
been explored. We found that if the correlation coefficient is used for motion estimation
and a first-order linear estimator is used for motion compensation, the variance of the
residue signal is guaranteed to be less than that of traditional encoding schemes.
The proposed video encoding scheme may potentially be used to increase the compression
of the video signal.
10.6 Future Directions
Several future research directions emerge from the work presented in this thesis. Some
of the important ones are introduced in this section.
1. Cascading PCE and TEA Algorithms: An important extension of the TEA
and PCE algorithms is to combine them in the form of a cascade. A possible
cascading scheme is to first apply the transitive bounds at each search location to
obtain complete elimination; at the search locations where the transitive bounds fail to
produce elimination, the PCE algorithm may follow to obtain partial elimination. We
find, however, that the overhead of the TEA algorithm, including the autocorrelation
and central correlation computations, cannot be avoided by such a cascading scheme,
and the PCE algorithm can only reduce computations at search locations that the
transitive bound did not eliminate. Since most locations are eliminated by the
transitive bounds, the improvement margin left for the PCE algorithm is quite
small. A study of how to couple both algorithms so that the combination outperforms
each individual algorithm would be an interesting research direction.
2. Approximate Accuracy TEA Algorithms: The TEA algorithms discussed in this
thesis have exhaustive-equivalent accuracy. If this constraint is removed, the
TEA algorithm can be made extremely fast at some cost in accuracy. As
already discussed, the efficiency of the TEA algorithm depends strongly
on the strength of the local autocorrelation, which may be increased arbitrarily by
low-pass filtering the reference image. The TEA algorithm will then run extremely
fast on the blurred image, with some potential loss of accuracy. An important future
direction is to study the effect of low-pass filtering on the accuracy and speedup of
the TEA algorithm.
3. Approximate Accuracy PCE Algorithms: The PCE algorithms discussed in this
thesis also have exhaustive-equivalent accuracy, and can likewise be made
extremely fast if this constraint is removed. One possible approach to approximate
accuracy PCE is to estimate the downward slope of the monotonic decreasing
curve by matching as few template pixels as possible, and to define the best match
location as the one with the minimum downward slope. An important future direction
is to estimate the minimum number of pixels that must be processed to obtain a
reliable estimate of the slope. The fewer pixels processed, the faster the algorithm
will be; the more reliable the slope estimate, the more accurate it will be. The
accuracy-versus-speedup trade-off of approximate PCE algorithms must be studied.
4. Integrating Existing Approximate Algorithms with PCE: Another important re-
search direction is to combine the PCE algorithm with approximate-accuracy al-
gorithms such as Three-Step Search (TSS) or Two-Dimensional Logarithmic
(TDL) search. We believe that the PCE algorithm may effectively reduce
computation in most of these algorithms.
5. Early Terminated Object Detectors: An important future direction is to extend
the idea of partial elimination beyond the template matching problem. We have
demonstrated that the same idea can also speed up AdaBoost based
object detectors. We observe that if, at a particular search location, the final detector
response is computed as a summation of multiple partial values, computation
may be reduced by intelligent rearrangement. Different types of object detectors
should be explored to find which of them can benefit from partial
elimination algorithms.
6. Early Non-Maxima Suppression (ENMS): NMS is often used to suppress
multiple responses of a detector to a single object instance. It is
commonly used in pedestrian, face and other object detectors.
We observe that if the final detector response is a summation of multiple partial
sums, the ENMS algorithm may be used to reduce computation.
7. Video Coding with Linear Compensation (VCLC): An important future direc-
tion is to explore the idea of VCLC in significantly more detail. Our experiments
demonstrate that the VCLC algorithm may offer promising results, especially when
the videos to be compressed contain significant intensity and contrast variations.
The VCLC algorithm should be integrated within the framework of an existing codec,
such as H.264. We observe that the proposed scheme requires the transmission of two
parameters per block in addition to the residue signal, whereas the H.264 bit stream
provides for transmitting one parameter per block. The second parameter required by
the VCLC algorithm may therefore change the bit stream and make it incompatible with
standard video codecs. The transmission overhead of the extra parameters and the
benefit obtained from linear compensation should be compared theoretically; the
benefit must be significant to justify a new video encoding scheme.
8. Elimination Algorithms for Volume Registration: Volume registration is some-
times required in medical image processing. The computation elimination algo-
rithms may be extended to volume registration. We observe that volume data
have high local autocorrelation, which would result in significant speedups for TEA
algorithms. PCE algorithms may also be used to speed up the volume image
registration problem.
9. Expanding the Scope of Elimination Algorithms: Before the work presented in
this thesis, elimination algorithms were well known only for the Sum of Absolute
Differences (SAD) and the Sum of Squared Differences (SSD). We have extended
the scope of elimination algorithms to include correlation based measures as
well, with emphasis on the correlation coefficient. As discussed in Chapter
2, the correlation coefficient is robust to linear intensity variations and can measure
the strength of the linear association between two images; it cannot measure the
strength of non-linear, functional or stochastic associations. The strength of
functional associations may be measured by the correlation ratio, and the strength of
stochastic associations by mutual information, which has been used extensively in
medical image registration. A very important future direction is to extend the concept
of partial and complete elimination algorithms to the fast computation of the
correlation ratio and of mutual information. If efficient elimination algorithms are
developed for these measures, their application areas may expand.
APPENDICES
List of Publications Related to the Thesis
Following is the list of publications included in this thesis:
1. Arif Mahmood and Sohaib Khan, Exploiting inter-frame correlation for fast
video to reference image alignment, in Lecture Notes in Computer Science,
Asian Conference on Computer Vision (ACCV 2007), vol. 4843, pp. 647-656,
Springer Berlin / Heidelberg, 2007.
2. Arif Mahmood and S. Khan, Exploiting local auto-correlation function for fast
video to reference image alignment, in IEEE International Conference on Image
Processing (ICIP ’08), October 2008, pp. 2412-2415.
3. Arif Mahmood and S. Khan, Exploiting transitivity of correlation for fast tem-
plate matching, IEEE Transactions on Image Processing, vol. 19, no. 8, pp.
2190-2200, August 2010.
4. Arif Mahmood and S. Khan, Early termination algorithms for correlation co-
efficient based block matching, in IEEE International Conference on Image
Processing, (ICIP ’07), October 2007, vol. 2, pp. II-469-II-472.
5. Arif Mahmood and S. Khan, Correlation coefficient based fast template match-
ing through partial elimination, accepted for publication in IEEE Transactions
on Image Processing, May 2011.
6. Arif Mahmood and S. Khan, Early terminating algorithms for AdaBoost based
detectors, in IEEE International Conference on Image Processing (ICIP ’09),
November 2009, pp. 1209-1212.
7. Arif Mahmood, Z.A. Uzmi, and S. Khan, Video coding with linear compensa-
tion (VCLC), in IEEE International Conference on Communications,(ICC’07),
June 2007, pp. 6220-6225.
List of Publications Not Included in the Thesis
Following is the list of publications not included in the thesis:
1. Arif Mahmood, Structure-less object detection using Adaboost algorithm, in
International Conference on Machine Vision (ICMV 2007), December 2007,
Islamabad, Pakistan
2. M. Shahid Farid and Arif Mahmood, Image Morphing in Frequency Domain,
in Journal of Mathematical Imaging and Vision, Springer Netherlands, March
2011.
3. M. Shahid Farid, Hassan Khan and Arif Mahmood, Image Inpainting using
Cubic Hermit Spline, in International Conference on Signal and Information
Processing (IEEE ICSIP), December 2010, Changsha, China.
4. M. Shahid Farid, Hassan Khan and Arif Mahmood, Image Inpainting based on
Pyramids, in 10th IEEE International Conference on Signal Processing (ICSP),
November 2010, Beijing, China.
5. Mian Muhammad Awais, Arif Mahmood and Asim Karim, Automatically Gen-
erating Association Rules Under Diverse Operational Conditions for a Large
Scale Power Plant, In 2nd International Bhurban Conference on Applied Sci-
ences and Technology (IBCACT), June 2003, Islamabad, Pakistan.
Bibliography
Rosenfeld, A. and Vanderbrug, G. J. 1977. Coarse-fine template matching. IEEE
Trans. Syst., Man, Cybern. 7, 104–107.
Ahn, T. G., Moon, Y. H., and Kim, J. H. 2004. Fast full-search motion estimation
based on multilevel successive elimination algorithm. IEEE Trans. Circuits and
Systems for Video Technology 14, 11 (November), 1265–1270.
Avouac, J. P., Ayoub, F., Leprince, S., Konca, O., and Helmberger, D. V.
2006. The 2005, Mw 7.6 Kashmir earthquake: Sub-pixel correlation of ASTER im-
ages and seismic waveforms analysis. Science Direct, Earth and Planetary Science
Letters 249, 514–528.
Barnea, D. and Silverman, H. 1972. A class of algorithms for fast digital image
registration. IEEE Trans. Commun. 21, 2 (February), 179–186.
Bierling, M. 1988. Displacement estimation by hierarchical block matching. Proc.
SPIE, Visual Communications and Image Processing 10, 942–951.
Bouguezel, S., Ahmad, M., and Swamy, M. 2004. A new radix-2/8 FFT algorithm
for length-q×2^m DFTs. IEEE Transactions on Circuits and Systems 51, 9
(September), 1723–1732.
Bowen, M. M., Emery, W. J., and Wilkin, J. L. 2002. Extracting multi-
year surface currents from sequential thermal imagery using the maximum cross
correlation technique. J. Atmos. Oceanic Technol. 19, 1665–1676.
Briechle, K. and Hanebeck, U. D. 2001. Template matching using fast normal-
ized cross correlation. Proc. SPIE, Opt. Patt. Rec. XII 4387, 95–102.
Brown, L. 1992. A survey of image registration techniques. ACM Computing
Surveys 24, 326–373.
Brunig, M. and Niehsen, W. 2001. Fast full-search block matching. IEEE Trans.
Circuits Syst. Video Technol. 11, 2, 241–247.
Bryant, V. 1985. Metric Spaces: Iteration and Application. Cambridge University
Press, New York,USA.
Burago, D., Burago, Y. D., and Ivanov, S. 2001. A Course in Metric Geometry.
American Mathematical Society, New York,USA.
Burt, P. J. and Adelson, E. H. 1983. The laplacian pyramid as a compact image
code. IEEE Trans. Comput. 31.
Caelli, T. M. and Liu, Z. Q. 1988. On the minimum number of templates required
for shift, rotation and size invariant template matching. Patt. Rec. 21, 3, 205–216.
Caves, R. G., Harley, P. J., and Quegan, S. 1992. Matching map features
to synthetic aperture radar (SAR) images using template matching. IEEE Trans.
Geosci. Remote Sensing 30, 4, 680–685.
Cha, S.-H. 2007. Comprehensive survey on distance/similarity measures between
probability density functions. International Journal of Mathematical Models and
Methods in Applied Science. 1, 300–307.
Chalermwat, P. 1999. HIGH PERFORMANCE AUTOMATIC IMAGE REGIS-
TRATION FOR REMOTE SENSING. PhD Thesis, George Mason University,
Fairfax, Virginia.
Chan, S. C. and Ho, K. L. 1991. On indexing the prime-factor fast fourier trans-
form algorithm. IEEE Trans. Circuits and Systems 38, 8, 951–953.
Cheung, C. and Po, L. 2003. Adjustable partial distortion search algorithm for
fast block motion estimation. IEEE Trans. Circuits Syst. Video Technol. 13, 1,
100–110.
Cooley, J. W. and Tukey, J. W. 1965. An algorithm for the machine calculation
of complex fourier series. Math. Comput. 19, 297–301.
Coorg, S. and Teller, S. 2000. Spherical mosaics with quaternions and dense
correlation. IJCV 37, 3, 259–273.
Cristinacce, D. and Cootes, T. 2003. Facial feature detection using adaboost
with shape constraints. BMVC .
Crowell, K. J., Wilson, C. J., and Canfield, H. E. 2003. Application of local
surface matching to multi-date ALSM data for improved calculation of flood-driven
sediment deposition and erosion. In Fall Meeting. American Geophysical Union,
San Francisco.
Danielson, G. C. and Lanczos, C. 1942. Some improvements in practical fourier
analysis and their application to x-ray scattering from liquids. J. Franklin Inst. 233,
365–380 and 435–452.
Dare, P. M. and Fraser, C. S. 2000. Linear infrastructure mapping using air-
borne video imagery and subsequent integration into a gis. In IEEE International
Geoscience and Remote Sensing Symposium, IGARSS 2000. IEEE, Honolulu, HI ,
USA.
Dew, G. and Holmlund, K. 2000. Investigations of cross-correlation and euclidean
distance target matching techniques in the mpef environment. In Proc. 5th Int.
Winds Workshop,. IWWG, Lorne, Australia, 235–243.
Deza, E. and Deza, M. 2006. Dictionary of Distances. Elsevier.
Di Stefano, L. and Mattoccia, S. 2003. A sufficient condition based on the cauchy-
schwarz inequality for efficient template matching. ICIP , 269–272.
Di Stefano, L., Mattoccia, S., and Mola, M. 2003. An efficient algorithm for
exhaustive template matching based on normalized cross correlation. ICIAP , 322–
327.
Di Stefano, L., Mattoccia, S., and Tombari, F. 2005. ZNCC-based template
matching using bounded partial correlation. Pattern Recognition Ltr. 26, 14, 2129–
2134.
Dierking, W. and Skriver, H. 2002. Change detection for thematic mapping by
means of airborne multitemporal polarimetric sar imagery. IEEE Trans. Geosci.
Remote Sensing 40, 3, 618–636.
Du, Y., Cihlar, J., Beaubien, J., and Latifovic, R. 2001. Radiometric normal-
ization, compositing, and quality control for satellite high resolution image mosaics
over large areas. IEEE Trans. Geosci. Remote Sensing 39, 3, 623–634.
Duhamel, P. and Hollmann, H. 1984. Split-radix fft algorithm. Electron.
Lett. 20, 1, 14–16.
Duhamel, P. and Vetterli, M. 1990. Fast fourier transforms: a tutorial review
and a state of the art. Signal Processing 19, 259–299.
Eckart, S. and Fogg, C. 1995. Iso/iec mpeg-2 software video codec. Proc.
SPIE 2419, 100–118.
Ellis, G. A. and Peden, I. C. 1997. Cross-borehole sensing: Identification and
localization of underground tunnels in the presence of a horizontal stratification.
IEEE Trans. Geosci. Remote Sensing 35, 3, 756–761.
Emery, B., Matthews, D., and Baldwin, D. 2004. Mapping surface coastal
currents with satellite imagery and altimetry. In IEEE IGARSS’ 04. IEEE, USA.
Emery, W. J., Baldwin, D., and Matthews, D. 2003. Maximum cross correla-
tion automatic satellite image navigation and attitude corrections for open-ocean
image navigation. IEEE Trans. Geosci. Remote Sensing 41, 1, 33–41.
Emery, W. J., Fowler, C. W., Hawkins, J., and Preller, R. H. 1991. Fram
strait satellite image-derived ice motions. J. Geophys. Res. 96, 4751–4768.
Emery, W. J., Thomas, A. C., and Collins, M. J. 1986. An objective method
for computing advective surface velocities from sequential infrared images. J. Geo-
physical Res 91, 12865–12878.
Eumetsat. 1998. Workshop on wind extraction from operational meteorological
satellite data. In Proc. 4th Int. Winds Workshop. IWWG, Saanenmser, Switzerland.
Eumetsat. 2000. Workshop on wind extraction from operational meteorological
satellite data. In Proc. 5th Int. Winds Workshop. IWWG, Lorne, Australia.
Evans, A. N. 2000. Glacier surface motion computation from digital image se-
quences. IEEE Trans. Geosci. Remote Sensing 38, 2, 1064–1072.
Feind, R. E. and Welch, R. M. 1995. Cloud fraction and cloud shadow property
retrievals from coregistered tims and aviris imagery: The use of cloud morphology
for registration. IEEE Trans. Geosci. Remote Sensing 33, 172–184.
Fisher, R. 1925. Statistical methods for research workers. Oliver and Boyd.
Foroosh, H., Zerubia, J. B., and Berthod, M. 2002. Extension of phase
correlation to subpixel registration. IEEE Trans. Image Processing 11, 3 (March),
205–216.
Foster, M. P. 2005. Motion Estimation in Remote Sensed Multi-Channel Images.
MS thesis, University of Bath, UK.
Frigo, M. 1999. A fast Fourier transform compiler. In Proc. 1999 ACM SIGPLAN
Conf. on Programming Language Design and Implementation. Vol. 34. ACM, At-
lanta, GA, 169–180.
Frigo, M. and Johnson, S. G. 1998. FFTW: An adaptive software architecture for
the FFT. In Proc. 1998 IEEE Intl. Conf. Acoustics Speech and Signal Processing.
Vol. 3. IEEE, Seattle, WA, 1381–1384.
Frigo, M. and Johnson, S. G. 2005. The design and implementation of FFTW3.
Proceedings of the IEEE 93, 2, 216–231.
Gao, X., Duanmu, C., and Zou, C. 2000. A multilevel successive elimination al-
gorithm for block matching motion estimation. IEEE Trans. Image Processing 9, 3
(March), 501–504.
Garcia, C. A. E. and Robinson, I. S. 1989a. Sea surface velocities in shallow
seas extracted from sequential coastal zone color scanner satellite data. J. Atmos.
Oceanic Tech. 94, 12681–12691.
Garcia, C. A. E. and Robinson, I. S. 1989b. Sea surface velocities in shallow seas
extracted from sequential coastal zone colour scanner satellite data. J. Geophysical
Res 94, 12681–12691.
Ghanbari, M. 1990. The cross search algorithm for motion estimation. IEEE Trans.
Comput. 38, 7 (July), 950–953.
Ghanbari, M. 2003. Standard Codecs: Image compression to advanced video coding.
Vol. 49.
Girod, B. 1993. Motion compensating prediction with fractional pel accuracy. IEEE
Trans. Comput. 41, 4 (April), 604–611.
Gonzalez, R. C. and Woods, R. E. 2002. Digital Image Processing. Pearson
Education.
Good, I. J. 1960. The interaction algorithm and practical fourier analysis. J. R.
Statist. Soc. 22, 2, 373–375.
Goshtasby, A., Gage, S. H., and Bartholic, J. F. 1984. A two-stage cross
correlation approach to template matching. IEEE Trans. Pattern Anal. Machine
Intell. 6, 374–378.
Haralick, R. M. and Shapiro, L. G. 1992. Computer and Robot Vision. Vol. 2.
Addison-Wesley.
Harnett, D. L. 1982. Statistical Methods , 3 ed. Addison-Wesley Publishing Com-
pany, Inc., New York.
Harris, C. and Stephens, M. 1988. A combined corner and edge detector. In Pro-
ceedings of the Alvey Vision Conference. The British Machine Vision Association
and Society for Pattern Recognition, University of Manchester, UK.
Hel-Or, Y. and Hel-Or, H. 2003. Real-time pattern matching using projection
kernels. ICCV .
Hel-Or, Y. and Hel-Or, H. 2005. Real-time pattern matching using projection
kernels. IEEE Trans. Pattern Anal. Machine Intell. 27, 9 (September), 1430–1445.
Huang, Y.-W., Chen, C.-Y., Tsai, C.-H., Shen, C.-F., and Chen, L.-G.
2006a. Survey on block matching motion estimation algorithms and architec-
tures with new results. The Journal of VLSI Signal Processing 42, 297–320.
10.1007/s11265-006-4190-4.
Huang, Y.-W., Chen, C.-Y., Tsai, C.-H., Shen, C.-F., and Chen, L.-G.
2006b. Survey on block matching motion estimation algorithms and architectures
with new results. The Journal of VLSI Signal Processing 42, 297–320.
Irani, M. and Anandan, P. 1998. Robust multi-sensor image alignment. In ICCV.
IEEE, Bombay, India.
ITU-T. 1995. ITU-T recommendation H.263 software implementation. Digital Video
Coding Group, Telenor R&D.
Jain, J. and Jain, A. 1981. Displacement measurement and its application in
interframe image coding. IEEE Trans. Commun. 29, 12 (December), 1799–1808.
Jedrasiak, K. and Nawrat, A. 2009. Image recognition technique for unmanned
aerial vehicles. In Computer Vision and Graphics. Lecture Notes in Computer
Science, vol. 5337. Springer Berlin / Heidelberg, 391–399.
Jin, H., Favaro, P., and Soatto, S. 2001. Real-time feature tracking and outlier
rejection with changes in illumination. In ICCV. Vol. 1. IEEE, USA, 684–689.
Johnson, S. G. and Frigo, M. 2007. A modified split-radix FFT with fewer arith-
metic operations. IEEE Trans. Signal Processing 55, 1, 111–119.
Kamachi, M. 1989. Advective surface velocities derived from sequential images for
rotational flow field: Limitations and applications of maximum cross-correlation
method with rotational registration. J. Geophys. Res. 94, 18227–18233.
Kappagantula, S. and Rao, K. R. 1985. Motion compensated interframe image
prediction. IEEE Transactions on Communication 33, 9 (September), 1011–1015.
Kawanishi, T., Kurozumi, T., Kashino, K., and Takagi, S. 2004. A fast
template matching algorithm with adaptive skipping using inner-subtemplates dis-
tances. In ICPR. International Association for Pattern Recognition (IAPR), Cam-
bridge, England, UK, 654–657.
Kim, J. N. and Choi, T. S. 1999. Adaptive matching scan algorithm based on gra-
dient magnitude for fast full search in motion estimation. IEEE Trans. Consumer
Electron. 45, 3, 762–772.
Kim, J. N. and Choi, T. S. 2000. A fast full-search motion-estimation algorithm
using representative pixels and adaptive matching scan. IEEE Trans. Circuits Syst.
Video Technol. 10, 7, 1040–1048.
Kim, T. and Im, Y. J. 2003. Automatic satellite image registration by combi-
nation of matching and random sample consensus. IEEE Trans. Geosci. Remote
Sensing 41, 5, 1111–1117.
Koga, T., Iinuma, K., Hirano, A., Iijima, Y., and Ishiguro, T. 1981. Motion
compensated interframe coding for video conferencing. Proc. National Telecom.
Conf., G5.3.1–G5.3.5.
Kuglin, C. and Hines, D. 1975. The phase correlation image alignment method.
IEEE Conf. Cyb. Soc., 163–165.
Langford, E., Schwertman, N., and Owens, M. 2001. Is the property of being
positively correlated transitive? The American Statistician 55, 4, 322–325.
Lee, C. and Chen, L. 1997. A fast motion estimation algorithm based on the block
sum pyramid. IEEE Trans. Image Processing 6, 11 (November), 1587–1591.
Leese, J. A., Novak, S., and Clark, B. 1971. An automatic technique for
obtaining cloud motion from geosynchronous satellite data using cross correlation.
J. Appl. Meteor. 10, 118–132.
Lewis, J. 1995. Fast normalized cross-correlation. In International Conference on
Vision Interface. Canadian Image Processing and Pattern Recognition Society, Cal-
gary, Canada, 120–123.
Li, F. and Goldstein, R. 1990. Studies of multibaseline spaceborne interferometric
synthetic aperture radars. IEEE Trans. Geosci. Remote Sensing 28, 1, 88–97.
Li, H., Shi, R., Chen, W., and Shen, I.-F. 2006. Image tangent space for image
retrieval. In 18th International Conference on Pattern Recognition (ICPR 2006).
Vol. 2. 1126–1130.
Li, R., Zeng, B., and Liou, M. 1994. A new three-step search algorithm for block
motion estimation. IEEE Trans. Circuits Syst. Video Technol. 4, 4, 438–442.
Li, W. and Salari, E. 1995. Successive elimination algorithm for motion estima-
tion. IEEE Trans. Image Processing 4, 1 (January), 105–107.
Mahmood, A. 2007. Structure-less object detection using AdaBoost algorithm. In
International Conference on Machine Vision (ICMV 2007). National University of
Sciences and Technology, Islamabad, Pakistan, 85–90.
Mahmood, A. and Khan, S. 2007a. Early termination algorithms for correlation
coefficient based block matching. In IEEE International Conference on Image
Processing (ICIP ’07). Vol. 2. IEEE, San Antonio, TX, USA, II–469–II–472.
Mahmood, A. and Khan, S. 2007b. Exploiting inter-frame correlation for fast
video to reference image alignment. Lecture Notes in Computer Science, Asian
Conference on Computer Vision (ACCV 2007) 4843, 647–656.
Mahmood, A. and Khan, S. 2008. Exploiting local auto-correlation function for
fast video to reference image alignment. In IEEE International Conference on
Image Processing (ICIP ’08). IEEE, San Diego, CA, 2412–2415.
Mahmood, A. and Khan, S. 2009. Early terminating algorithms for AdaBoost-based
detectors. In IEEE International Conference on Image Processing (ICIP ’09). IEEE,
Cairo, Egypt, 1209–1212.
Mahmood, A. and Khan, S. 2010. Exploiting transitivity of correlation for fast
template matching. IEEE Transactions on Image Processing 19, 8 (August 2010),
2190–2200.
Mahmood, A. and Khan, S. 2011. Correlation coefficient based fast template
matching through partial elimination. Accepted for Publication in IEEE Transac-
tions on Image Processing x, y (May), xyz.
Mahmood, A., Uzmi, Z., and Khan, S. 2007. Video coding with linear compen-
sation (VCLC). In IEEE International Conference on Communications (ICC ’07).
IEEE, Glasgow, UK, 6220–6225.
Manduchi, R. and Mian, G. A. 1993. Accuracy analysis for correlation based
image registration algorithms. IEEE ISCAS, 834–837.
Mattoccia, S., Tombari, F., and Di Stefano, L. 2008a. Fast full-search equiv-
alent template matching by enhanced bounded correlation. IEEE Trans. Image
Processing 17, 4, 528–538.
Mattoccia, S., Tombari, F., and Di Stefano, L. 2008b. Reliable rejection of
mismatching candidates for efficient ZNCC template matching. In International
Conference on Image Processing. IEEE, San Diego, CA, 849–852.
Montgomery, D. C. and Peck, E. A. 1982. Introduction to Linear Regression
Analysis. John Wiley and Sons, Inc., New York,USA.
Montrucchio, B. and Quaglia, D. 2005. New sorting-based lossless motion
estimation algorithms and a partial distortion elimination performance analysis.
IEEE Trans. Circuits Syst. Video Technol. 15, 2, 210–220.
Mukherjee, D. P. and Acton, S. T. 2002. Cloud tracking by scale space classi-
fication. IEEE Trans. Geosci. Remote Sensing 40, 2 (February), 405–415.
Bryant, N., Zobrist, A., and Logan, T. 2003. Automatic co-registration of space-based
sensors for precision change detection and analysis. In Proc. IEEE Int. Geoscience
and Remote Sensing Symp. IGARSS ’03. IEEE, Centre de Congrès Pierre Baudis,
Toulouse, France, 1371–1373.
Niblack, W. 1986. An Introduction to Digital Image Processing. Prentice Hall,
115–116.
Nillius, P. and Eklundh, J. O. 2002. Fast block matching with normalized cross cor-
relation using Walsh transforms. TRITA-NA-P02/11, ISRN KTH/NA/P–02/11–
SE.
Ninnis, R. M., Emery, W. J., and Collins, M. J. 1986. Automated extraction
of pack ice motion from AVHRR imagery. J. Geophys. Res. 91, 10725–10734.
Otsu, N. 1979. A threshold selection method from gray-level histograms. IEEE
Transactions on Systems, Man, and Cybernetics 9, 1, 62–66.
Oller, G., Marthon, P., and Rognant, L. 2003. Correlation and similarity
measures for SAR image matching. In 10th Int. Sym. on Remote Sensing. SPIE,
Barcelona, Spain.
Orchard, M. T. and Sullivan, G. J. 1994. Overlapped block motion compensa-
tion: An estimation-theoretic approach. IEEE Trans. Image Processing 3, 5 (Sep),
693–699.
Ouchi, K., Maedoi, S., and Mitsuyasu, H. 1999. Determination of ocean wave
propagation direction by split-look processing using JERS-1 SAR data. IEEE Trans.
Geosci. Remote Sensing 37, 2, 849–855.
Po, L. M. and Ma, W. C. 1996. A novel four-step search algorithm for fast block
motion estimation. IEEE Trans. Circuits Syst. Video Technol. 6, 3, 313–317.
Pope, P. A. and Emery, W. J. 1994. Sea surface velocities from visible and
infrared multispectral atmospheric mapping sensor (MAMS) imagery. IEEE Trans.
Geosci. Remote Sensing 32, 1, 220–223.
Pratt, W. K. 2007. Digital Image Processing, 4th Edition. Wiley Interscience, New
Jersey, USA.
Puri, A., Hang, H. M., and Schilling, D. L. 1987. An efficient block matching
algorithm for motion compensated coding. Proc. IEEE ICASSP, 25.4.1–25.4.4.
Puymbroeck, N. V., Michel, R., Binet, R., Avouac, J. P., and Taboury, J.
2000. Measuring earthquakes from optical satellite images. Applied Optics 39, 20,
3486–3494.
Viola, P. and Jones, M. 2001. Rapid object detection using a boosted cascade of
simple features. In IEEE CVPR. IEEE Computer Society, Kauai, Hawaii, USA.
Viola, P. and Jones, M. 2004. Robust real-time face detection. International
Journal of Computer Vision (IJCV) 57, 2, 137–154.
Quaglia, D. and Montrucchio, B. 2001. Sobol partial distortion algorithm
for fast full search in block motion estimation. Proc. 6th Eurographics Workshop
Multimedia, 77–84.
Rader, C. M. and Brenner, N. M. 1976. A new principle for fast Fourier transfor-
mation. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-24,
264–266.
Ramachandran, S. and Srinivasan, S. 2001. FPGA implementation of a novel,
fast motion estimation algorithm for real-time video compression. Ninth Interna-
tional Symposium on FPGAs 2, 287–290.
Reddy, B. S. and Chatterji, B. N. 1996. An FFT-based technique for translation,
rotation, and scale-invariant image registration. IEEE Trans. Image Processing 5, 8
(August), 1266–1271.
Rietz, H. L. 1919. On functional relations for which the coefficient of correlation is zero.
Publications of the American Statistical Association 1, 472–476.
Robinson, D. and Milanfar, P. 2004. Fundamental performance limits in image
registration. IEEE Trans. Image Processing 13, 9 (September), 1185–1199.
Roche, A., Malandain, G., and Ayache, N. 2000. Unifying maximum likelihood
approaches in medical image registration. International Journal of Imaging Systems
and Technology: Special issue on 3D imaging 11, 71–80.
Roche, A., Malandain, G., Ayache, N., and Prima, S. 1999. Towards a
better comprehension of similarity measures used in medical image registration. In
Proc. 2th MICCAI. Lecture Notes in Computer Science, vol. 1679. Springer Verlag,
Cambridge, United Kingdom, 555–566.
Roche, A., Malandain, G., Pennec, X., and Ayache, N. 1998. The corre-
lation ratio as a new similarity measure for multimodal image registration. In
Proc. 1st MICCAI. Lecture Notes in Computer Science, vol. 1496. Springer Verlag,
Cambridge, MA, 1115–1124.
Rodgers, J. L. and Nicewander, W. A. 1988. Thirteen ways to look at the corre-
lation coefficient. The American Statistician 42, 59–66.
Roma, N., Santos-Victor, J., and Tom, J. 2000. A comparative analysis of
cross-correlation matching algorithms using a pyramidal resolution approach. In
2nd Workshop on Empirical Evaluation Methods in Computer Vision. World Sci-
entific Press, 117–142.
Scambos, T. A., Dutkiewicz, M. J., Wilson, J. C., and Bindschadler, R. A.
1992. Application of image cross-correlation to the measurement of glacier velocity
using satellite image data. Remote Sensing of Environment 42, 3, 177–186.
Schweitzer, H., Bell, J., and Wu, F. 2002. Very fast template matching. In
ECCV. Vol. IV, 358 ff.
Shah, M. and Kumar, R. 2003a. Video Registration. Kluwer Academic Publishers,
Boston.
Shah, M. and Kumar, R. 2003b. Video Registration. Kluwer Academic Publishers,
Boston.
Shanableh, T. and Ghanbari, M. 2000. Heterogeneous video transcoding to
lower spatio-temporal resolutions and different encoding formats. IEEE Trans.
Multimedia 2, 2, 101–110.
Sheikh, Y., Khan, S., and Shah, M. 2004. Feature-Based Georegistration of
Aerial Images. A. Stefanidis and S. Nittel (eds.) Geosensor Networks, Boca Raton,
Florida: CRC Press. ISBN 0415324041.
Sheikh, Y. and Shah, M. 2004. Aligning dissimilar images directly. In ACCV.
Asian Federation of Computer Vision Societies, Jeju, Korea.
Shi, J. and Tomasi, C. 1994. Good features to track. In IEEE CVPR. IEEE,
Seattle, WA, USA.
Shum, H. Y. and Szeliski, R. 2000. Systems and experiment paper: Construction
of panoramic image mosaics with global and local alignment. IJCV 36, 2, 101–130.
Sigley, D. T. and Stratton, W. T. 1942. Solid Geometry and Mensuration.
Dryden Press, Inc., New York.
Simonetto, E., Oriot, H., and Garello, R. 2005. Rectangular building ex-
traction from stereoscopic airborne radar images. IEEE Trans. Geosci. Remote
Sensing 43, 10, 2386–2395.
Singleton, R. C. 1969. An algorithm for computing the mixed radix fast Fourier
transform. IEEE Transactions on Audio and Electroacoustics 17, 2 (June), 93–103.
Snedecor, G. W. and Cochran, W. G. 1968. Statistical Methods, 6th ed. The
Iowa State University Press, Ames, Iowa, USA.
Sorensen, H. V., Heideman, M. T., and Burrus, C. S. 1986. On computing
the split-radix FFT. IEEE Trans. Acoust., Speech, Signal Processing 34, 1, 152–156.
Sotos, A. E. C., Vanhoof, S., Noortgate, W. V. D., and Onghena, P.
2007. The non-transitivity of Pearson’s correlation coefficient: An educational
perspective. International Statistical Institute, 56th Session.
Sotos, A. E. C., Vanhoof, S., Noortgate, W. V. D., and Onghena, P.
2009. The transitivity misconception of Pearson’s correlation coefficient. Statistics
Education Research Journal 8, 2, 33–55.
Spiegel, M. R. and Stephens, L. J. 1990. Schaum’s Outline of Statistics, 3rd
ed. Tata McGraw-Hill Publishing Company Ltd., New York.
Srinivasan, R. and Rao, K. R. 1985. Predictive coding based on efficient motion
estimation. IEEE Trans. Commun. COM-33, 8 (August), 888–896.
Stefano, L. D. and Mattoccia, S. 2003. Fast template matching using bounded
partial correlation. Machine Vision and Applications 13, 213–221.
Strozzi, T., Luckman, A., Murray, T., Wegmüller, U., and Werner, C. L.
2002. Glacier motion estimation using SAR offset-tracking procedures. IEEE Trans.
Geosci. Remote Sensing 40, 11 (Nov), 2384–2391.
Su, J. K. and Mersereau, R. M. 2000. Motion estimation methods for overlapped
block motion compensation. IEEE Trans. Image Processing 9, 9 (September),
1509–1521.
Sun, S., Park, H., Haynor, D. R., and Kim, Y. 2003. Fast template matching
using correlation based adaptive predictive search. Int. J. Img. Sys. Tech., Wiley
InterScience.
Svedlow, M., McGillem, C. D., and Anuta, P. E. 1976. Experimental ex-
amination of similarity measures and preprocessing methods used for image reg-
istration. In Symposium on Machine Processing of Remotely Sensed Data. The
Laboratory for Applications of Remote Sensing, Purdue University, West Lafayette,
Indiana.
Svedlow, M., McGillem, C. D., and Anuta, P. E. 1978. Image registration:
Similarity measure and preprocessing method comparisons. IEEE Transactions on
Aerospace and Electronic Systems 14, 1 (January), 141–150.
Cover, T. M. and Thomas, J. A. 1991. Elements of Information Theory. John Wiley
and Sons, New York.
Thomas, L. H. 1963. Using a computer to solve problems in physics. Applications
of Digital Computers.
Tokmakian, R., Strub, P. T., and McClean-Padman, J. 1990. Evaluation
of the maximum cross-correlation method of estimating sea surface velocities from
sequential satellite images. J. Geophys. Res. 7, 852–865.
Townshend, J., Justice, C., Gurney, C., and McManus, J. 1992. The impact
of misregistration on change detection. IEEE Trans. Geosci. Remote Sensing 30, 5
(Sep), 1054–1060.
Turin, G. L. 1960. An introduction to matched filters. IRE Transactions on Infor-
mation Theory 6, 3 (September), 311–329.
Vachon, P. W. and Raney, R. K. 1991. Resolution of the ocean wave propagation
direction in sar imagery. IEEE Trans. Geosci. Remote Sensing 29, 105–112.
Vachon, P. W. and West, J. C. 1992. Spectral estimation techniques for multi-
look sar images of ocean waves. IEEE Trans. Geosci. Remote Sensing 30, 568–577.
Vanderbrug, G. and Rosenfeld, A. 1977. Two-stage template matching. IEEE
Trans. Comput. 26, 4 (April), 384–393.
Vanderburg, G. J. and Rosenfeld, A. 1977. Two-stage template matching.
IEEE Trans. Comput. 26, 384–393.
Vanne, J., Aho, E., Hamalainen, T. D., and Kuusilinna, K. 2006. A high-
performance sum of absolute difference implementation for motion estimation.
IEEE Trans. Circuits Syst. Video Technol. 16, 7 (July), 876–883.
Vetterli, M. and Nussbaumer, H. J. 1984. Simple FFT and DCT algorithms with
reduced number of operations. Signal Processing 6, 4, 267–278.
Vincent, E. and Laganière, R. 2001. Matching feature points in stereo pairs: A
comparative study of some matching strategies. Machine Graphics and Vision 10, 3,
237–259.
Vincenzo, R. and Lisa, U. 2007. An improvement of AdaBoost for face detection
with motion and color information. In ICIAP.
Wang, H. and Mersereau, R. 1999. Fast algorithms for the estimation of motion
vectors. IEEE Trans. Image Processing 8, 3, 435–438.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. 2007.
Numerical Recipes: The Art of Scientific Computing, 3rd ed. Cambridge University
Press, Cambridge, UK.
Wu, B., Ai, H., Huang, C., and Lao, S. 2004. Fast rotation invariant multi-
view face detection based on real AdaBoost. In IEEE FG.
Wu, Q. X. 1995a. A correlation-relaxation-labeling framework for computing optical
flow - template matching from a new perspective. IEEE Trans. Pattern Anal.
Machine Intell. 17, 8 (September), 843–853.
Wu, Q. X. 1995b. A correlation-relaxation-labeling framework for computing optical
flow - template matching from a new perspective. IEEE Trans. Pattern Anal.
Machine Intell. 17, 8 (September), 843–853.
Wu, Q. X. and Pairman, D. 1995. A relaxation-labeling technique for computing
sea surface velocities from sea surface temperature. IEEE Trans. Geosci. Remote
Sensing 33, 1 (January), 216–220.
Wu, Q. X., Pairman, D., McNeill, S., and Barnes, E. J. 1992. Computing
advective velocities from satellite images of sea surface temperature. IEEE Trans.
Geosci. Remote Sensing 30, 166–176.
Xiao, J. and Shah, M. 2003. Two-frame wide baseline matching. In The
Ninth IEEE International Conference on Computer Vision (ICCV’03). IEEE,
Nice, France.
Yavne, R. 1968. An economical method for calculating the discrete Fourier trans-
form. Proc. AFIPS Fall Joint Computer Conf. 33, 115–125.
Yoshimura, S. and Kanade, T. 1994. Fast template matching based on the
normalized correlation by using multiresolution eigenimages. IEEE/RSJ/GI Int.
Conf. Int. Rob. and Sys. (IROS ’94) 3, 2086–2093.
Zhu, C., Qi, W., and Ser, W. 2005. Predictive fine granularity successive elim-
ination for fast optimal block-matching motion estimation. IEEE Trans. Image
Processing 14, 2 (February), 213–220.
Zhu, S. and Ma, K. K. 2000. A new diamond search algorithm for fast block-
matching motion estimation. IEEE Trans. Image Processing 9, 2, 287–290.
Zitová, B. and Flusser, J. 2003. Image registration methods: A survey. Image
and Vision Computing 21, 977–1000.