
Pattern Recognition 33 (2000) 1863–1880

Convergence of a hill-climbing genetic algorithm for graph matching

Andrew D.J. Cross, Richard Myers, Edwin R. Hancock*

Department of Computer Science, University of York, York YO1 5DD, UK

Received 10 December 1998; received in revised form 12 July 1999; accepted 12 July 1999

Abstract

This paper presents a convergence analysis for the problem of consistent labelling using genetic search. The work builds on a recent empirical study of graph matching where we showed that a Bayesian consistency measure could be efficiently optimised using a hybrid genetic search procedure which incorporated a hill-climbing step. In the present study we return to the algorithm and provide some theoretical justification for its observed convergence behaviour. The novelty of the analysis is to demonstrate analytically that the hill-climbing step significantly accelerates convergence, and that the convergence rate is polynomial in the size of the node-set of the graphs being matched. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Genetic algorithms; Graph matching; Convergence analysis; Consistent labelling; Hybrid genetic algorithm; Bayesian consistency measure

1. Introduction

Configurational optimisation problems permeate all fields of machine intelligence. Broadly speaking, they are concerned with assigning symbolic or discretely defined variables to sites organised on a regular or irregular network in such a way as to satisfy certain hard constraints governing the structure of the final solution. The problem has been studied for over three decades. Concrete examples include the travelling salesman [1] and N-queens problems [2] together with a variety of network labelling [3,4] and graph matching [5] or graph colouring problems. The search for consistent solutions has been addressed using a number of computational techniques. Early examples from the artificial intelligence literature include Mackworth's constraint networks [3,4], Waltz's use of discrete relaxation to locate consistent interpretations of line drawings [6], Shapiro and Haralick's use of forward-checking and backtracking to

*Corresponding author. Tel.: +44-1904-43-2767; fax: +44-1904-43-2767.

E-mail address: erh@minster.cs.york.ac.uk (E.R. Hancock).

solve the consistent labelling problem [7], together with a host of applications involving the A* algorithm [8,9]. More recently, the quest for effective search strategies has widened to include algorithms which offer improved global convergence properties. Examples include the use of simulated annealing [1,10,11], mean-field annealing [12], tabu-search [13–16] and most recently genetic search [17].

Despite stimulating a large number of application studies in the machine intelligence literature, the convergence properties of these modern global optimisation methods are generally less well understood than their classical counterparts. For instance, in the case of genetic search, although there has been considerable effort directed at understanding the convergence for infinite populations of linear chromosomes [18,19], little attention has been directed towards understanding the performance of the algorithm for discrete entities organised on a network structure. However, in a recent study, we have taken some first steps in the analysis of genetic algorithms for consistent labelling. There are two contributions of our earlier work which merit further discussion. First, we have provided a factor analysis of the significance of the different parameters required for

0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00171-5

convergence [20]. In a second study, we have performed an empirical investigation of genetic search for the graph-matching problem [21].

Our motivation for embarking on this study of genetic algorithms for consistent labelling was previous work in which we had developed a Bayesian framework for gauging relational consistency [22]. We have not only used the new theory to show how the underlying consistency model accounts for a number of associative memory architectures [23,24], but have also developed some highly effective algorithms for inexact graph matching [25–27] and correcting structural errors in networks [27,28]. Our initial aim was to consider how the Bayesian consistency measure [22] could be used to assess fitness and how the process of correcting graph errors via edit operations [27] could be mapped onto the architecture of genetic search [21]. The main conclusions of our study were threefold. First, the consistent labelling of graphs was only amenable to genetic search if a hill-climbing operator was incorporated. Second, the quality of the final solution was greatly improved if cross-over (or genetic recombination) was conducted by exchanging connected subgraphs. Finally, we found the optimisation process to be relatively insensitive to the choice of mutation rate.

Unfortunately, our analysis of the empirical results has hitherto been extremely limited and has been couched only in terms of a rather qualitative model of the pattern space in which configurational optimisation is performed [21]. This has meant that we have been unable either to predict the convergence behaviour or to account for the three interesting empirical properties listed above. The aim in this paper is to remedy this shortcoming by presenting a detailed analysis of algorithm behaviour. It is important to stress that although there have been several analyses of genetic search, these differ from the study described here in three important ways. First, we are concerned specifically with the graph-matching problem. This means that we present an analysis that is more pertinent to the consistent labelling problem where there is network organisation rather than a linear chromosome. Second, we pose our analysis in terms of discrete assignment variables rather than continuous ones. Finally, we deal with the infinite population size assumption in a critical manner.

2. Paper outline

Stated succinctly, the aim of this paper is to provide an analysis of the convergence of the genetic optimisation procedure when applied to the problem of graph matching [5,29–31]. We commence by investigating each of the genetic operators in turn. Once we have understood the behaviour of the operators in isolation, we turn our attention to predicting their collective behaviour. In order to demonstrate the validity of our theoretical predictions, we compare them with a Monte-Carlo study.

It is important at the outset to point out that the analysis presented in this paper commences from the same basic standpoint as the work of Qi and Palmieri [18,19]. However, there are two principal differences between their analysis and the one presented here. First, their model of the genetic algorithm operates in a continuous space. While this renders the analysis algebraically tractable, it represents a simplistic model of discretely defined configurational optimisation which is unrealistic. Second, their analysis makes non-specific assumptions concerning the form of the fitness function. In consequence, it is of limited use in understanding the graph-matching or consistent labelling problem.

By contrast, we provide an analysis which is more specific to the Bayesian framework developed by Wilson and Hancock [26] for consistent labelling problems. It is the Bayesian consistency measure developed in this work that has been explored in an empirical manner in the evolutionary search procedure developed by Cross and Hancock [21]. The new analysis presented here not only allows us to develop quantitative predictions of overall population behaviour, it also allows us to attempt a realistic analysis of the algorithm time complexity. One of the main conclusions of our empirical study of genetic search [21] was that the addition of a hill-climbing operator can yield significant improvements in both the convergence rate and solution quality. For this reason, we will supplement our analysis of the traditional operators with a study of the hybrid hill-climbing algorithm. This is one of the novel contributions of the paper.

While the results obtained for the mutation and cross-over operators are relatively generic, the analysis of the selection and hill-climbing operators requires a more detailed model of the problem at hand. It is here that our analysis becomes problem-specific to graph matching.

The paper outline is as follows. In Section 3, we briefly review the fitness measure that underpins our graph-matching algorithm. Section 4 details the main stages of the genetic search procedure. These two sections are effectively a synopsis of our recent paper [21] which reports the details of the evolutionary graph-matching technique. In Section 5, we commence our modelling of the distribution of the Bayesian consistency measure which fulfils the role of fitness in our graph-matching technique. Section 6 exploits this distribution model to predict the evolution of fitness under each of the genetic operators, i.e. mutation, selection, cross-over and hill-climbing. In Section 7, we use the individual operator characteristics to understand the iterative behaviour of the combined hill-climbing genetic operator. This section also comments on the validity of our analysis. Section 8 provides some illustrative examples of the graph-matching method for real-world images. Finally, Section 9 offers some conclusions.


3. Relational graphs

Central to this paper is the aim of matching relational graphs represented in terms of configurations of symbolic labels. We represent such a graph by $G = (V, E)$, where $V$ is the symbolic label-set assigned to the set of nodes and $E$ is the set of edges between the nodes. Formally, we represent the matching of the nodes in the data graph $G_1 = (V_1, E_1)$ against those in the model graph $G_2 = (V_2, E_2)$ by the function $f : V_1 \rightarrow V_2$. In other words, the current state of match is denoted by the set of Cartesian pairs constituting the function $f$.

In order to describe local interactions between the nodes at a manageable level, we will represent the graphs in terms of their clique structure. The clique associated with the node indexed $j$ consists of those nodes that are connected by an edge of the graph, i.e. $C_j = \{j\} \cup \{i \in V_1 \mid (i, j) \in E_1\}$. The labelling or mapping of this clique onto the nodes of the graph $G_2$ is denoted by $\Gamma_j = \{f(i) \in V_2, \forall i \in C_j\}$. Suppose that we have access to a set of patterns that represent feasible relational mappings between the cliques of graph $G_1$ and those of graph $G_2$. Typically, these relational mappings would be configurations of consistent clique labellings which we want to recover from an initial inconsistent state of the matched graph $G_1$. Assume that there are $Z_j$ relational mappings for the clique $C_j$ which we denote by $\Lambda_k = \{\lambda_i^k \in V_2, \forall i \in C_j\}$, where $k \in \{1, \ldots, Z_j\}$ is a pattern index. According to this notation $\lambda_i^k \in V_2$ is the match onto graph $G_2$ assigned to the node $i \in V_1$ of graph $G_1$ by the $k$th relational mapping. The complete set of legal relational mappings for the clique $C_j$ are stored in a dictionary which we denote by $\Theta_j = \{\Lambda_k \mid k = 1, 2, \ldots, Z_j\}$.

The discrete relaxation procedure is based on maximising the joint probability of the matched label configuration, i.e. $P(\Gamma_j)$. It is therefore necessary to find a way of enumerating $P(\Gamma_j)$ when the label configuration is highly inconsistent. The approach is to adopt a Bayesian viewpoint in which it is assumed that only consistent labellings in the dictionary are legal and have uniform non-zero a priori probabilities of occurrence, i.e. $P(\Lambda_k) = Z_j^{-1}$. Other configurations do not occur a priori but are the corrupted realisations of the dictionary items. This idea is realised by applying the axiomatic property of joint probability to expand $P(\Gamma_j)$ over the space of consistent configurations

$$P(\Gamma_j) = \sum_{k=1}^{Z_j} P(\Gamma_j \mid \Lambda_k) P(\Lambda_k). \quad (1)$$

Further development of a useful objective function for discrete relaxation requires a model of the label corruption process, that is of the conditional probabilities of the potentially inconsistent configurations given each of the $Z_j$ feasible relational mappings $P(\Gamma_j \mid \Lambda_k)$. We adopt a very simple viewpoint; matching errors are assumed to be memoryless and to occur with uniform probability.

The first consequence of the assumed absence of memory is that the errors are independent. As a result we can factorize the conditional probabilities over the individual nodes in the graph, i.e.

$$P(\Gamma_j \mid \Lambda_k) = \prod_{i \in C_j} P(f(i) \mid \lambda_i^k). \quad (2)$$

Our next step is to propose a model for the label corruption mechanism at each node in the graph. Again, taking recourse to the memoryless assumption, we take the probability of label errors on individual objects to be independent of the class of label. If $P_e$ is the label error probability, then the distribution function for the label confusion probabilities is

$$P(f(i) \mid \lambda_i^k) = \begin{cases} 1 - P_e & \text{if } f(i) = \lambda_i^k, \\ P_e & \text{otherwise.} \end{cases} \quad (3)$$

As a result of this distribution rule, the conditional matching probabilities depend on the Hamming distance $H(\Gamma_j, \Lambda_k)$ between the matched configuration $\Gamma_j$ and the individual dictionary items $\Lambda_k$, i.e.

$$P(\Gamma_j \mid \Lambda_k) = (1 - P_e)^{|C_j| - H(\Gamma_j, \Lambda_k)} P_e^{H(\Gamma_j, \Lambda_k)}, \quad (4)$$

where the Hamming distance $H(\Gamma_j, \Lambda_k)$ is defined using the Kronecker delta function to be $H(\Gamma_j, \Lambda_k) = \sum_{i \in C_j} (1 - \delta_{f(i), \lambda_i^k})$. The model components given in Eqs. (2)–(4) naturally lead to the following expression for $P(\Gamma_j)$ in terms of the set of Hamming distances to the consistent labellings residing in the dictionary

$$P(\Gamma_j) = \frac{b}{Z_j} \sum_{k=1}^{Z_j} \exp[-k_e H(\Gamma_j, \Lambda_k)], \quad (5)$$

where $b = (1 - P_e)^{|C_j|}$ and $k_e = \ln[(1 - P_e)/P_e]$. According to our simple model of label errors Hamming distance is the basic measure of consistency. Systematic softening of the constraints residing in the dictionary is controlled by the parameter $P_e$.

The configurational probability $P(\Gamma_j)$ is the basic ingredient of our genetic search procedure. It represents the probability of a particular matching configuration evaluated over the state space of feasible possibilities (i.e. the dictionary). We use as our global measure of consistency the average of the clique configurational probabilities, i.e.

$$P_G = \frac{1}{|V_1|} \sum_{j \in V_1} P(\Gamma_j). \quad (6)$$

In the next section of this paper we will describe how this average consistency measure can be utilised as a fitness measure in the genetic search for relational matches.
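The consistency measure of Eqs. (4)–(6) can be sketched directly in code. The following Python fragment is a minimal illustration, not the authors' implementation: cliques and dictionary items are represented as equal-length label sequences, and the graph structure is left implicit.

```python
import math

def clique_probability(gamma, dictionary, p_e):
    """Eq. (5): P(Gamma_j) as a weighted sum over dictionary items,
    each penalised exponentially by its Hamming distance to gamma."""
    c = len(gamma)                       # clique size |C_j|
    z = len(dictionary)                  # number of dictionary items Z_j
    b = (1.0 - p_e) ** c
    k_e = math.log((1.0 - p_e) / p_e)
    total = 0.0
    for item in dictionary:
        h = sum(1 for g, l in zip(gamma, item) if g != l)  # Hamming distance
        total += math.exp(-k_e * h)
    return (b / z) * total

def global_consistency(cliques, dictionaries, p_e):
    """Eq. (6): average of the clique configurational probabilities."""
    probs = [clique_probability(g, d, p_e)
             for g, d in zip(cliques, dictionaries)]
    return sum(probs) / len(probs)

# A clique labelling with zero Hamming distance to one dictionary item
# contributes (1 - P_e)^{|C_j|}/Z_j for that item, plus small terms.
gamma = ['a', 'b', 'c']
dictionary = [['a', 'b', 'c'], ['c', 'a', 'b']]
p = clique_probability(gamma, dictionary, p_e=0.1)
```

Note how the softening parameter `p_e` controls the penalty $k_e$: as $P_e \rightarrow 0$ only exact dictionary matches carry probability mass.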

4. Genetic search

In a recent paper [21], we showed that a hill-climbing genetic search procedure provides a very natural way of locating the global optimum of the global consistency measure described in the previous section. In essence, the approach relies on generating a population of random initial matching configurations. These undergo cross-over, mutation and selection to locate the match that optimises the Bayesian consistency measure defined in the previous section of this paper. The main stages of the algorithm are outlined below and more detailed discussion can be found in Ref. [21].

4.1. Initial population generation

The idea underpinning genetic search is to maintain a population of alternative solution vectors and to refine this population using various evolutionary operators. To distinguish the different solutions we use a population index $a$. We let $f^{(a)}(i)$ denote the match assigned to the node $i$ in the data-graph by the $a$th solution in the current population. The fitness associated with the solution indexed $a$ is denoted by $P_G^{(a)}$.

In order to initialise the algorithm we randomly assign matches. In other words, our initial solution vectors are random configurations of labels drawn from the model graph.

4.2. Cross-over

Cross-over is the process which mixes the pool of solutions to produce new ones. If effectively controlled, the process can be used to combine pairs of suboptimal or partially consistent matches to produce one of improved global consistency. Typically, deterministic updating of the match will propagate constraints only over the distance of one neighbourhood with each iteration. Cross-over can accelerate this process by combining disconnected yet internally consistent subgraphs from the individual solutions in the pool.

The standard cross-over procedure involves selecting at random pairs of global matching configurations from the current population. Matches at the corresponding sites at randomly chosen locations in the two graphs are then interchanged with uniform probability 1/2. However, this uniform cross-over mechanism will not necessarily facilitate the merging of locally consistent subgraphs. Moreover, the process also ignores the underlying structure of the graphs. A better strategy is to combine the solutions by physically dividing the graphs into two disjoint subgraphs. In this way internally consistent portions of the individual solutions may be exchanged at the structural level.
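One way to realise structural cross-over is sketched below. This is a hypothetical construction, not the paper's exact procedure: a connected region is grown by breadth-first search from a random seed node, and the matches on that region are exchanged between the two parents.

```python
import random

def subgraph_crossover(parent_a, parent_b, adjacency, rng=random.Random(0)):
    """Sketch of structural cross-over: grow a connected subgraph by
    breadth-first search from a random seed, then swap the matches
    assigned on that subgraph between the two parent solutions."""
    nodes = list(parent_a)
    seed = rng.choice(nodes)
    region, frontier = {seed}, [seed]
    # Grow the region until it covers roughly half the nodes.
    while frontier and len(region) < len(nodes) // 2:
        u = frontier.pop(0)
        for v in adjacency.get(u, ()):
            if v not in region:
                region.add(v)
                frontier.append(v)
    child_a, child_b = dict(parent_a), dict(parent_b)
    for u in region:                     # exchange matches on the region
        child_a[u], child_b[u] = parent_b[u], parent_a[u]
    return child_a, child_b

# Toy 4-node chain graph and two parent matching configurations.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pa = {0: 'p', 1: 'q', 2: 'r', 3: 's'}
pb = {0: 'w', 1: 'x', 2: 'y', 3: 'z'}
ca, cb = subgraph_crossover(pa, pb, adjacency)
```

Because the swapped region is connected, any internally consistent labelling it carries is transplanted intact, which is the point of exchanging at the structural rather than the site level.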

4.3. Mutation

A further randomisation stage is applied to the individual matches to introduce new information into the population of global matches through a process of mutation. This is effected by randomly re-assigning the matches on individual sites. The probability of re-assignment is uniform across the sites. In other words, we randomly re-assign a fixed fraction of the matches in $f^{(a)}(i)$ with random labels selected from the set $V_2$.

4.4. Hill-climbing

The aim in performing hill-climbing operations is to restore consistency to graphs modified by the cross-over and mutation operations. Although this can be effected by stochastic means, this is time consuming. The hill-climbing stage involves iteratively reconfiguring the graphs modified by cross-over or mutation to maximise the value of $P_G^{(a)}$. Formally, this corresponds to a parallel iterative application of the following decision rule:

$$f^{(a)}(i) = \arg\max_{V_2} P_G^{(a)}. \quad (7)$$

This application of the rule has the effect of locating the nearest local optima of the global consistency measure. It therefore redistributes the population of solutions to reside at the modes of this fitness measure. Suboptimal modes become increasingly unlikely as they are removed from the population by the stochastic selection operations. This process not only accelerates convergence, but also diminishes the requirement for a large population of graphs.
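A serial greedy version of the decision rule in Eq. (7) can be sketched as follows. The fitness callable and the toy ground-truth fitness used in the demo are illustrative assumptions; in the paper the fitness is the consistency measure $P_G$.

```python
def hill_climb(solution, labels, fitness):
    """Greedy sketch of Eq. (7): repeatedly re-assign each node's match
    to the label that maximises the global fitness, stopping when no
    single re-assignment improves it (a local optimum)."""
    current = dict(solution)
    best = fitness(current)
    improved = True
    while improved:
        improved = False
        for node in list(current):
            for label in labels:
                candidate = dict(current)
                candidate[node] = label
                score = fitness(candidate)
                if score > best:
                    current, best = candidate, score
                    improved = True
    return current

# Toy fitness for illustration only: fraction of nodes agreeing with a
# hypothetical ground truth (the paper would use P_G here instead).
truth = {0: 'a', 1: 'b', 2: 'c'}
fit = lambda f: sum(f[i] == truth[i] for i in f) / len(f)
result = hill_climb({0: 'c', 1: 'b', 2: 'a'}, ['a', 'b', 'c'], fit)
```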

4.5. Selection

The final stochastic element of genetic search is the selection process. The aim here is to randomly admit the configurations refined by the hill-climbing process to the population on the basis of their fitness measure. The probability distribution defined in Eq. (5) lends itself naturally to the definition of a population membership probability. Suppose that $P_G^{(a)}$ denotes the global configurational probability for the $a$th member of the pool (population) of graphs. By normalising the sum of clique configuration probabilities over the population of matches, the probability for randomly admitting the $a$th solution to the pool of graphs $P$ is

$$P_s^{(a)} = \frac{P_G^{(a)}}{\sum_{a' \in P} P_G^{(a')}}. \quad (8)$$
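The normalisation in Eq. (8) is exactly roulette-wheel selection, which can be sketched in a few lines (the pool entries and fitness values below are hypothetical placeholders):

```python
import random

def select_population(population, fitnesses, k=None, rng=random.Random(42)):
    """Eq. (8): admit each solution to the next pool with probability
    proportional to its global configurational probability
    (roulette-wheel selection, sampled with replacement)."""
    total = sum(fitnesses)
    weights = [f / total for f in fitnesses]
    k = k if k is not None else len(population)
    return rng.choices(population, weights=weights, k=k)

pool = ['match-A', 'match-B', 'match-C']   # stand-ins for matching configurations
fits = [0.7, 0.2, 0.1]                     # their (hypothetical) P_G values
new_pool = select_population(pool, fits)
```

Sampling with replacement means fit solutions may be duplicated while unfit ones tend to disappear, which is the mechanism that removes suboptimal modes from the population.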

4.6. Empirical findings

Based on an empirical study of the resulting graph matching algorithm, we reached the following conclusions concerning its convergence behaviour:

- The method was relatively insensitive to mutation rate. In fact, provided that the mutation probability did not exceed 0.600, then the number of iterations required for convergence was approximately constant.
- The addition of the hill-climbing step considerably reduced the number of iterations required for convergence.
- Once the population size exceeded a critical value, then the convergence rate was essentially independent of population size.
- The number of iterations required for convergence was approximately polynomial in the number of graph nodes.

The aim in the remainder of this paper is to provide an analysis which supports these empirical findings.

5. Distribution analysis

We formulate our investigation of graph matching as a discrete-time process with the states defined over the state space of all possible correspondences between a pair of graphs. Our analysis of the population is a statistical one, in which we assume that the population is sufficiently large that we can invoke the central limit theorem. For this reason we direct our attention to the modelling of the probability density function for the distribution of solution vectors.

5.1. Formal ingredients of the model

For the problem of graph matching, each solution vector $f^{(a)} : V_1 \rightarrow V_2$ represents the labelling of the nodes of a data graph $V_1$ with the nodes of a model graph $V_2$. In order to simplify the analysis, we will assume that the pair of graphs have an identical number of nodes, i.e. $|V| = |V_1| = |V_2|$.

To commence our modelling of the distribution of solution vectors, we focus our attention on the fraction of mappings that are in agreement with known ground truth. If the configuration of ground-truth correspondence matches is denoted by $\tilde{f}$, then the fraction of correctly assigned matches for the solution vector indexed $a$ is equal to

$$F_a^{(n)} = \frac{1}{|V|} \sum_{i \in V} \delta_{f^{(a)}(i), \tilde{f}(i)}, \quad (9)$$

where $a$ is a population index of the solution vector, $n$ is the iteration number and $\delta$ is the Kronecker delta function. A solution vector $f^{(a)}$ in which each of the matches is correct would have $F_a^{(n)} = 1$. By contrast, a solution vector in which none of the correspondence matches are correct would have $F_a^{(n)} = 0$. In order to analyse how the genetic graph-matching process performs, we wish to evaluate the distribution of $F_a^{(n)}$ over the entire population of candidate solution vectors. At iteration $n$, we denote the distribution of fractional matching error by $P_D^{(n)}(F = c)$.

The overall goal of our analysis is to model how the distribution of the fraction of correct matches evolves with iteration number. For reasons of tractability, we largely confine our attention to understanding how the mean fraction of correct matches evolves with iteration number. The quantity of interest is

$$\langle F_a^{(n)} \rangle = \frac{1}{|P|} \sum_{a \in P} F_a^{(n)} = \int c \, P_D^{(n)}(F = c) \, \mathrm{d}c. \quad (10)$$

Since the mutation and cross-over operators do not draw upon the fitness of the individuals in the population, it follows that we can model their effect on the distribution of correct matches without reference to the specific nature of our fitness measure. In contrast, the selection operator draws upon our measure of relational consistency in order to determine the probability that each individual belonging to the population survives into the next generation. For graph matching, it is clear that the fitness measure is not related in a monotonic manner to the fraction of correct matches. In other words, there is no one-to-one relationship between $P_G$ and $F_a^{(n)}$. It is for this reason that we must turn to genetic search rather than hill-climbing gradient ascent as a means of optimisation. However, since the fitness measure draws on Hamming distance between the super-cliques as a means of gauging relational consistency via Eqs. (5) and (6), we would expect that $P_G$ becomes small when $F_a^{(n)}$ approaches zero. Conversely, $P_G$ will be large when $F_a^{(n)}$ approaches unity. In other words, extreme survival probabilities should correspond to the extremes of the distribution of correct matches.

Most attempts to analyse the convergence properties of genetic algorithms [18,19] have relied on the assumption that the population size is infinite. Although this assumption is made for reasons of computational tractability, it is rarely possible to realise it in practice. In fact, such an assumption corresponds to taking the case in which the entire search space is populated by candidate solutions. This is clearly at odds with the spirit of genetic search, where it is the role of the evolutionary operators to advantageously position a small number of distinct individuals. Moreover, if the whole sample space were filled with solutions, then global optimisation could be trivially performed by exhaustive selection of the fittest individual. However, while we criticise the infinite population size assumption on the grounds of realism, we do not wish to detract from its importance in understanding the convergence properties of genetic search. One of the important features of our analysis is that even at relatively small population sizes, our results predict the convergence behaviour accurately. It is interesting to note that in their analysis Qi and Palmieri [18,19] show that the finite population density function approaches the infinite population limit with a fractional error that is proportional to $1/\sqrt{|V|}$.

We commence our analysis by assuming that the number of correct matches in the initial population follows a binomial distribution. If the initial population is chosen in a random fashion, then at the outset the expected fraction of correct matches is given by

$$\langle F^{(0)} \rangle = \frac{1}{|V|}. \quad (11)$$

Under the binomial assumption, the number of correct matches has mean $F^{(0)}|V|$ and standard deviation $\sqrt{|V| F^{(0)} (1 - F^{(0)})}$. As a result the initial number of correct matches $q$ is distributed in the following manner:

$$P(q) = \frac{|V|!}{q! \, (|V| - q)!} \, \langle F^{(0)} \rangle^q \, (1 - \langle F^{(0)} \rangle)^{|V| - q}. \quad (12)$$

Since we are interested in the fraction of correct matches, we turn our attention to the distribution of the random variable $c = q/|V|$. By appealing to the central limit theorem under the assumption that the graphs have large numbers of nodes, we can replace the binomial distribution of the number of correct matches by a Gaussian distribution of the fraction of correct matches. In practice, we deal with graphs whose size exceeds 40 nodes, and so this approximation will be a faithful one. The probability distribution for the fraction of correct solutions in the population is therefore

$$P_D^{(0)}(F = c) \approx \frac{1}{\sqrt{2\pi |V| \langle F^{(0)} \rangle (1 - \langle F^{(0)} \rangle)}} \exp\left( -\frac{(|V| c - |V| \langle F^{(0)} \rangle)^2}{2 |V| \langle F^{(0)} \rangle (1 - \langle F^{(0)} \rangle)} \right). \quad (13)$$

Because the distribution is Gaussian, the mode is located at the position $c = \langle F^{(0)} \rangle$.
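The binomial model of Eqs. (11)–(13) is easy to check by simulation. The sketch below (node-set size and trial count are arbitrary choices for illustration) draws random initial assignments and compares the sample mean and spread of the fraction of correct matches against the analytic values.

```python
import math
import random

# Monte-Carlo check of Eqs. (11)-(13): with random initial assignments,
# each node is correct with probability 1/|V|, so the number of correct
# matches per solution is binomial and the fraction correct is
# approximately Gaussian for large graphs.
rng = random.Random(0)
V = 50          # assumed node-set size, for illustration
trials = 20000  # number of random initial solution vectors

fractions = []
for _ in range(trials):
    # drawing a random label from V_2 matches the truth with probability 1/V
    correct = sum(1 for _ in range(V) if rng.randrange(V) == 0)
    fractions.append(correct / V)

mean_f = sum(fractions) / trials                     # should approach 1/|V|
var_f = sum((f - mean_f) ** 2 for f in fractions) / trials
sigma_pred = math.sqrt((1 / V) * (1 - 1 / V) / V)    # Gaussian spread, Eq. (13)
```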

6. Genetic operators

In this section we will investigate the role that the three traditional genetic operators, i.e. mutation, cross-over and selection, play in the evolution of the average fraction of correct solutions in the genetic population. We will supplement this analysis with a discussion of the hill-climbing process. At this stage, our interest lies not with the prediction of the collective behaviour of the operators, but with the effect that each one has in isolation upon the population. Collective behaviour is the topic of Section 7.

6.1. The mutation operator

The goal of the mutation operator is to increase population diversity by performing a stochastic state swapping process. This is effected by randomly re-assigning the individual node matches. The process is applied to each of the different solutions that constitute the current population. This process proceeds independently for both the individual nodes and the individual solutions. This is in contrast with the cross-over which serves to exchange information between pairs of solutions in order to form new individuals that are admitted to the population on the basis of their fitness. The uniform probability of any match undergoing a random state swapping process is $P_m$. For each individual solution vector, there are three possible transitions that can occur in the state of match. First, an individual mutation could increase the number of correct matches by one unit; in this case the increase in the fraction of correct matches is $P_m (1 - F_a^{(n)}) (1/|V|)$. The second possible outcome is a reduction in the number of correctly assigned matches by one unit; in this case the decrease in the fraction of correct matches is $P_m F_a^{(n)} (|V| - 1)/|V|$. Finally, the mutation could leave the number of correct correspondences unchanged; in this case the fraction of correct matches remains unchanged at the value $F_a^{(n)}$. For moderate mutation rates, the most likely change to the fraction of matches is due to the second transition. This corresponds to a disruption of the set of correctly assigned matches.

We are interested in the effect that the mutation operator has upon a solution vector in which the fraction of correct matches is $F_a^{(n)}$ at iteration $n$. In particular, we would like to compute the average value of the fraction of correct matches at iteration $n+1$. Based on the three assignment transitions outlined above, the new average fraction of correct matches is

$$\langle F_a^{(n+1)} \rangle = F_a^{(n)} + P_m (1-F_a^{(n)}) \frac{1}{|V|} - P_m F_a^{(n)} \frac{|V|-1}{|V|}. \quad (14)$$

After some straightforward algebra, we can re-write this recursion formula in terms of the fraction of matches correct at the outset, i.e. $F_a^{(0)}$. As a result, the average fraction of correct matches at iteration $n$ is

$$\langle F_a^{(n)} \rangle = (1-P_m)^n \left( F_a^{(0)} - \frac{1}{|V|} \right) + \frac{1}{|V|}. \quad (15)$$
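As a quick numerical check, the recursion of Eq. (14) can be iterated and compared against the closed form of Eq. (15). The sketch below uses assumed illustrative values for $P_m$, $|V|$ and $F_a^{(0)}$; it is a sanity check, not part of the original analysis.

```python
# Sketch with assumed parameter values: iterate the one-step mutation
# recursion of Eq. (14) and check it against the closed form of Eq. (15).
V = 40          # |V|, number of nodes (illustrative value)
Pm = 0.025      # mutation probability (illustrative value)
F0 = 0.9        # initial fraction of correct matches (illustrative value)

def recursion_step(F):
    # Eq. (14): expected fraction of correct matches after one mutation round
    return F + Pm * (1.0 - F) / V - Pm * F * (V - 1) / V

def closed_form(n):
    # Eq. (15): (1 - Pm)^n (F0 - 1/|V|) + 1/|V|
    return (1.0 - Pm) ** n * (F0 - 1.0 / V) + 1.0 / V

F = F0
for n in range(1, 51):
    F = recursion_step(F)
    assert abs(F - closed_form(n)) < 1e-12   # the two expressions agree
print(round(F, 6))
```

The recursion collapses to $\langle F' \rangle = (1-P_m)F + P_m/|V|$, whose fixed point is the random-assignment level $1/|V|$, confirming the drift described below Eq. (16).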

This result depends on only the iteration number and the initial fraction of correct matches. We can make the exponential character of the formula clearer by rewriting it in terms of the natural exponential function. As a result

$$\langle F_a^{(n)} \rangle = \frac{1}{|V|} + \left( F_a^{(0)} - \frac{1}{|V|} \right) \exp(-k_m n), \quad (16)$$

where $k_m = \ln(1/(1-P_m))$. There are a number of interesting features of this formula that deserve further comment. First, the equation represents an exponential decay that tends towards a minimum value of $1/|V|$, i.e. the probability of randomly assigning a correct match. The rate of decay is determined by the logarithm of the probability that a mutation operation does not take place, i.e. $1-P_m$. In qualitative terms, the mutation process represents an exponential drift towards highly unfit solutions. The rate of drift is controlled by two factors. The first of these is the mutation rate $P_m$: as the mutation probability increases, the disruptive effect of the operator becomes more pronounced. The second factor is the initial fraction of correct matches present prior to mutation: as this initial fraction increases, then so does the disruptive effect of the mutation operator. The effect of this second drift process is to impose a higher rate of disruption on solutions in the population that are approaching a consistent state. Poor or highly inconsistent solutions, on the other hand, are not significantly affected. This latter drift effect can be viewed as a natural mechanism for escaping from local optima that may be encountered when the global solution is approached in a complex fitness landscape.

A.D.J. Cross et al. / Pattern Recognition 33 (2000) 1863–1880

Fig. 1. A numerical simulation of the distribution of correct solutions in the population using only the mutation operator.

Fig. 2. The comparison of the analytic predictions of the mutation operator with the simulation run results.

After a very large number of mutation processes, the population will approach a completely random state. In this state the number of correct matches will be governed by a binomial distribution in which the expected value of the fraction of correct matches is $1/|V|$. In other words, we expect any initial distribution of solutions to drift towards a binomial one in an exponential fashion.

To verify the validity of the result, we have conducted a Monte-Carlo simulation. We set the mutation probability to a value of $1/40$ and plot the distribution $P_D$ as a function of iteration number. The resulting plot is given in Fig. 1. We observe the predicted exponential decay. Specifically, the peak of the population distribution drifts towards the origin in the fashion predicted. Moreover, the width of the distribution becomes narrower as it approaches a binomial distribution at the origin. In order to quantify this process we plot the most probable fraction of correct matches in the population against iteration number. This plot is shown in Fig. 2. We note that there is good agreement between our prediction of exponential decay and what is observed experimentally.
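A Monte-Carlo experiment of this kind can be sketched as follows. The mutation rate $1/40$ follows the text; the graph size, population size and seed are illustrative assumptions.

```python
# A minimal Monte-Carlo sketch of the mutation drift. Assumed setup:
# |V| = 40 nodes, 100 solutions; Pm = 1/40 as quoted in the text.
import random

random.seed(0)
V, Pm, POP = 40, 1.0 / 40.0, 100

# Start every solution fully correct: solution[i] == i is a correct match.
population = [list(range(V)) for _ in range(POP)]

def mutate(sol):
    # Each node match is randomly re-assigned with probability Pm.
    for i in range(V):
        if random.random() < Pm:
            sol[i] = random.randrange(V)

def mean_fraction_correct(pop):
    return sum(sum(s[i] == i for i in range(V)) for s in pop) / (POP * V)

fractions = [mean_fraction_correct(population)]
for _ in range(100):
    for sol in population:
        mutate(sol)
    fractions.append(mean_fraction_correct(population))

# The population drifts exponentially towards the random-assignment
# level 1/|V|, as Eq. (16) predicts.
print(fractions[0], round(fractions[-1], 3))
```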

6.2. An analysis of the selection operator

In contrast with the mutation operator, which is a uniform random process, selection draws upon the fitness function to determine the probability that the different solution vectors survive into the next population generation. Because of this task-specific nature of the fitness function, it is not possible to undertake a general analysis of the selection process. Moreover, in the case of graph matching, the compound exponential structure of our fitness measure further complicates the analysis. To overcome this problem, we present an approximation to our Bayesian consistency measure which allows us to relate the survival probability to the fraction of correct matches. This approximate expression for the survival probability turns out to be polynomial in the fraction of correct matches.

We commence by writing the fitness using the expression for the super-clique matching probability given in Eq. (6). To make the role of error probability more explicit, we re-write the matching probability in terms of Kronecker delta functions that express the compatibility between the current matching assignments and the consistent matches demanded by the configuration residing in the dictionary. As a result

$$P(\Gamma_j) = \frac{1}{|\Theta_j|} \sum_{\Lambda^k \in \Theta_j} \prod_{i \in C_j} P_e^{(1-\delta_{f(i),\lambda_i^k})} (1-P_e)^{\delta_{f(i),\lambda_i^k}}. \quad (17)$$

Our aim is to compute the average value of the global consistency measure, $P_G^a$. Because the consistency function averages the matching probability $P(\Gamma_j)$, the expected value of the global probability is equal to

$$P_G^a = E\left[ \frac{1}{|\Theta_j|} \sum_{\Lambda^k \in \Theta_j} \prod_{i \in C_j} P_e^{(1-\delta_{f(i),\lambda_i^k})} (1-P_e)^{\delta_{f(i),\lambda_i^k}} \right]. \quad (18)$$

Fig. 3. A plot of the empirical values of our fitness measure as a function of the fraction of correct mappings. The dashed line represents our theoretical prediction, while the points are the collected data.

We now note that the expected value of the exponential function under the product can be re-expressed in terms of the assignment probabilities in the following manner:

$$E\left[ P_e^{(1-\delta_{f(i),\lambda_i^k})} (1-P_e)^{\delta_{f(i),\lambda_i^k}} \right] = P_e P(f(i) \neq \lambda_i^k) + (1-P_e) P(f(i) = \lambda_i^k). \quad (19)$$
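The identity of Eq. (19) can be checked by averaging directly over the two values of the Kronecker delta. The numerical values of $P_e$ and of the match probability below are illustrative assumptions.

```python
# Check of Eq. (19) by enumeration over the Bernoulli indicator d:
# E[Pe^(1-d) * (1-Pe)^d] with P(d=1) = P(f(i) = lambda_i^k).
Pe = 0.1       # assumed error probability
p_match = 0.7  # assumed P(f(i) = lambda_i^k)

# d = 0 (mismatch) contributes Pe; d = 1 (match) contributes (1 - Pe).
expectation = (1 - p_match) * Pe ** 1 * (1 - Pe) ** 0 \
            + p_match * Pe ** 0 * (1 - Pe) ** 1
rhs = Pe * (1 - p_match) + (1 - Pe) * p_match
assert abs(expectation - rhs) < 1e-12
print(expectation)
```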

As a result, the expected value of the global matching probability, i.e. the probability of survival, is equal to

$$P_G^a = \frac{1}{|\Theta_j|} \sum_{\Lambda^k \in \Theta_j} \prod_{i \in C_j} \left\{ P_e P(f(i) \neq \lambda_i^k) + (1-P_e) P(f(i) = \lambda_i^k) \right\}. \quad (20)$$

Unfortunately, this expression still contains reference to the dictionary of structure-preserving mappings. In order to further simplify matters, we observe that when the configuration of assigned matches becomes consistent then, provided the error probability $P_e$ is small, we would expect the sum of exponentials appearing in Eq. (5) to be dominated by the single dictionary item that is fully congruent with the ground-truth match. The remaining dictionary items make a negligible contribution. Suppose that $\hat{k}$ is the index of the correctly matching dictionary item; then we can write

$$\exp[-k_e H(\Gamma_j, \Lambda^{\hat{k}})] \gg \sum_{\Lambda^k \in \Theta_j - \Lambda^{\hat{k}}} \exp[-k_e H(\Gamma_j, \Lambda^k)]. \quad (21)$$

We can now approximate the super-clique matching probability by considering only the dominant dictionary item. This allows us to remove the summation over dictionary items. Finally, we note that the average value of the probability of a correspondence match, i.e. $P(f(i) = \lambda_i^k)$, is simply equal to the fraction of correct matches $F_a^{(n)}$. By assuming that all super-cliques are of approximately the same average cardinality, denoted by $|\hat{C}|$, we can approximate the global probability of match in the following manner:

$$P_G^a = \frac{1}{|\Theta_j|} \left[ P_e (1 - F_a^{(n)}) + (1-P_e) F_a^{(n)} \right]^{|\hat{C}|}. \quad (22)$$

In other words, our measure of relational consistency is polynomial in the fraction of correct matches. Moreover, the order of the polynomial is equal to the average node connectivity $|\hat{C}|$. As the average neighbourhood size or node connectivity in the graphs increases, the discriminating power of the cost function becomes more pronounced.

The model developed in this section relates the Bayesian consistency measure to the fraction of correct matches in the population. In order to verify the model, we have conducted the following experiment. We have generated a large number of random graphs. For each graph in turn, we have generated a set of random self-correspondences with varying degrees of error. For each set of correspondences, we have computed the Bayesian consistency measure. We have then compared the measured and predicted values of $P_G^a$ for each value of the known fraction of correct matches. This plot is shown in Fig. 3 for randomly generated graphs containing 40 nodes. The graphs used in this experiment are Delaunay triangulations generated from random point sets. For the Delaunay triangulation, the average clique size or node connectivity is $|\hat{C}| = 5.2$. The predicted value agrees well with the true match probability $P_G$ as the fractional error increases. At lower values, the disparity between our prediction and the true functional becomes more pronounced, although the general form of the curve is still reflected in the data.
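The approximate fitness of Eq. (22) is straightforward to evaluate. The sketch below uses the Delaunay connectivity $|\hat{C}| = 5.2$ quoted above; the error probability and dictionary size are illustrative assumptions, since $|\Theta_j|$ is problem-dependent.

```python
# Sketch of the approximate consistency measure of Eq. (22).
Pe = 0.01        # assumed label-error probability
C_avg = 5.2      # average super-clique size for a Delaunay triangulation
dict_size = 10   # |Theta_j|: an assumed, illustrative dictionary size

def approx_fitness(c):
    # Eq. (22): P_G^a = (1/|Theta_j|) * [Pe(1-c) + (1-Pe)c]^|C|
    return (Pe * (1.0 - c) + (1.0 - Pe) * c) ** C_avg / dict_size

# The measure grows polynomially (order |C|) in the fraction of correct
# matches c, so discriminating power sharpens as connectivity rises.
for c in (0.25, 0.5, 0.75, 1.0):
    print(c, approx_fitness(c))
```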

Having derived an approximate relationship between the Bayesian measure of relational consistency and the fraction of correct matches, we recall that the survival probability for the solution vector indexed $a$ between generations $n$ and $n+1$ under the roulette wheel selection strategy [32] is given by

$$P^a_{\mathrm{Selection}} = \frac{P_G^a}{\sum_{b \in P} P_G^b}. \quad (23)$$

In order to model the selection process, we recast the roulette wheel probabilities in terms of the distribution of the fraction of correct matches. This involves taking the approximation for the matching probability and weighting according to the probability distribution for the fraction of correct matches. The expression for the selection probability becomes

$$P_D^{(n+1)}(F^{(n)} = c) = \frac{P_D^{(n)}(c) \, P_G(c)}{\int_0^1 P_D^{(n)}(c') \, P_G(c') \, dc'}. \quad (24)$$

Fig. 4. The predicted fraction of correct mappings as a function of the iteration number when using only selection.

The normalisation integral appearing in the denominator is not of prime interest to us, since it depends only on the iteration number and not upon the overall fraction of correct matches. Here we are concerned with the dependence on the fraction of correct matches. In order to proceed, we observe that the normalisation factor serves only to guarantee that the total survival probability sums to unity. Stated alternatively, this means that the populations at generations $n$ and $n+1$ are of the same size. Furthermore, if we confine our attention to the convergence properties of the mode of the distribution of the fraction of correct matches, we can neglect the normalization term and investigate the quantity

$$P_D^{(n+1)}(F^{(n+1)} = c) = B(n) \, P_D^{(n)}(c) \, P_G(c), \quad (25)$$

where $B(n)$ is an iteration-dependent normalization constant that is not a function of the fraction of matches correct. The recursive application of this equation yields the distribution of the fraction of correct matches at any generation in terms of the initial fraction. The relationship is a power law of the form

$$P_D^{(n)}(F^{(n)} = c) = \alpha(n) \, P_D^{(0)}(c) \, P_G(c)^n. \quad (26)$$
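The power-law update of Eq. (26) can be explored numerically by discretizing the distribution over $c$. The sketch below combines a Gaussian initial distribution of the form used in Eq. (27) with the polynomial fitness of Eq. (22); all parameter values are illustrative, and log-weights are used to avoid underflow.

```python
# Numerical sketch of the selection update of Eqs. (25)-(26): the mode of
# the distribution over the fraction of correct matches c drifts upwards
# as selection is iterated. Parameters (Pe, |C|, |V|) are illustrative.
import math

Pe, C_avg, V = 0.01, 5.2, 40
F0 = 1.0 / V                      # random initial fraction of correct matches
var = F0 * (1.0 - F0) / V         # variance of the binomial approximation
bins = [i / 100.0 for i in range(101)]

def log_weight(c, n):
    # log of Eq. (27): Gaussian initial distribution times P_G(c)^n
    return -(c - F0) ** 2 / (2.0 * var) \
        + n * C_avg * math.log(Pe * (1.0 - c) + (1.0 - Pe) * c)

def mode(n):
    return max(bins, key=lambda c: log_weight(c, n))

modes = [mode(n) for n in (0, 50, 200, 1000, 5000)]
# The modal fraction of correct matches climbs monotonically towards 1.
print(modes)
```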

Substituting for the approximate initial distribution given in Eq. (13) together with our approximation to the cost function from Eq. (22), we find

$$P_D^{(n)}(F^{(n)} = c) \approx \frac{1}{\sqrt{2\pi |V| F^{(0)} (1-F^{(0)})}} \exp\left( -\frac{(|V|c - |V|F^{(0)})^2}{2|V|F^{(0)}(1-F^{(0)})} \right) \times \left( P_e(1-c) + (1-P_e)c \right)^{|\hat{C}| n}. \quad (27)$$

The required distribution is simply a Gaussian distribution that is modulated by a polynomial of order $|\hat{C}| n$. This demonstrates that the average fraction of correct matches in the population will tend to increase as the value of $n$ increases. In other words, the iteration process improves the fraction of correct matches. By confining our attention to the solutions that occur most frequently in the population, we can track the iteration dependence of the mode or peak, $F^{(n)}_{\max}$, of the distribution of correct matches. To locate the most frequently occurring solution in the population, we proceed as follows. First, we evaluate the derivative of the distribution function in Eq. (27) with respect to the fraction of correct mappings, i.e. $c$. Next, we set the derivative equal to zero. By solving the resulting saddle-point equation for $F^{(n)}$ and after rejecting the non-physical values of $c$ that fall outside the interval $[0, 1]$, we find that the maximum value of $P_D$ is located at the position

$$F^{(n)}_{\max} = \frac{1}{2}\left( F^{(0)} - \frac{P_e}{\kappa_2} + \frac{\sqrt{\kappa_1^2 \left( P_e^2 + 2 P_e F^{(0)} \kappa_2 + (F^{(0)})^2 \kappa_2^2 \right) + 2 \kappa_1 \kappa_2^2 |\hat{C}| n}}{\kappa_1 \kappa_2} \right), \quad (28)$$

where

$$\kappa_1 = \frac{|V|}{2 F^{(0)} (1-F^{(0)})} \quad (29)$$

and

$$\kappa_2 = 1 - 2P_e. \quad (30)$$

In Fig. 4 we plot the iteration dependence of the modal fraction of correct matches. The example shown in this plot has the label error probability set to $P_e = 0.01$, the graph size $|V|$ is 40, and the average super-clique size $|\hat{C}|$ is 5.5.
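Eq. (28) can be sanity-checked numerically: at $n = 0$ it should return the initial fraction $F^{(0)}$, and at the convergence iteration of Eq. (31) it should return unity. The sketch below uses the parameters quoted in the text ($P_e = 0.01$, $|V| = 40$, $|\hat{C}| = 5.5$) with a random initial population.

```python
# Evaluating the modal-fraction formula, Eq. (28), with Eqs. (29)-(31).
import math

Pe, V, C = 0.01, 40, 5.5
F0 = 1.0 / V                          # random initial population
k2 = 1.0 - 2.0 * Pe                   # Eq. (30)
k1 = V / (2.0 * F0 * (1.0 - F0))      # Eq. (29)

def F_max(n):
    # Eq. (28): modal fraction of correct matches after n selection steps
    disc = k1 ** 2 * (Pe + F0 * k2) ** 2 + 2.0 * k1 * k2 ** 2 * C * n
    return 0.5 * (F0 - Pe / k2 + math.sqrt(disc) / (k1 * k2))

n_conv = V ** 2 * (1.0 - Pe) / (C * (1.0 - 2.0 * Pe))   # Eq. (31)
print(round(n_conv, 1), round(F_max(n_conv), 6))
```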

With this model of the iteration dependence of $F^{(n)}_{\max}$ under the selection operator to hand, we are in a position to compute the number of iterations required for algorithm convergence. Our convergence condition is that the modal fraction of correct matches is unity. We identify the value of the iteration index $n$ that satisfies this condition by setting $F^{(n)}_{\max} = 1$ in Eq. (28). Furthermore, we assume that the initial population is randomly chosen and as a result $F^{(0)} = 1/|V|$. Solving for $n$, we find the number of iterations required for convergence to be equal to

$$n_{\mathrm{converge}} = \frac{|V|^2 (1-P_e)}{|\hat{C}| (1-2P_e)}. \quad (31)$$

In other words, commencing from a simple model of the selection process that uses a number of domain-specific assumptions concerning our Bayesian consistency measure, we have shown that we would expect the number of iterations required for convergence to be polynomial with respect to the number of nodes in the graphs under match. A plot of the number of iterations required for convergence is shown in Fig. 5.

Fig. 5. The predicted number of iterations needed to ensure convergence as a function of the graph size.

Fig. 6. A numerical simulation of the distribution of correct solutions in a genetic population using only the selection operator.

In order to provide some justification for our modelling of the population mode, we have investigated how the selection operator modifies the distribution of correct correspondences in the genetic population. To embark on this study, we use our simple simulation process to generate the distribution $P_D^{(n)}$ under the selection operator as a function of the iteration number. The results are shown in Fig. 6. The main point to note is that the width of the distribution remains narrow as the iterations proceed. This is because only the selection operator is used. It is important to stress that there is no diversification process at play.

6.3. An analysis of the cross-over operator

The cross-over or recombination operator is responsible for exchanging the assigned matches at corresponding sites in pairs of solution vectors. This contrasts with the mutation and hill-climbing stages of the genetic search procedure, where the matches are re-assigned. In other words, cross-over mixes the matching configurations through an exchange process, while mutation and hill-climbing are responsible for reconfiguration.

To proceed with our analysis of the cross-over operator, let us assume that we are exchanging the assigned matches between the candidate solutions indexed $a$ and $b$. These two solutions are chosen at random from the population at iteration $n$. The fraction of matches correctly assigned in the solution indexed $a$ is $F_a^{(n)}$, while for the solution indexed $b$ the fraction is $F_b^{(n)}$. Suppose that the fraction of nodes exchanged from the solution indexed $a$ to the solution indexed $b$ is denoted by $s$. Under these circumstances, we form two new solutions, indexed $a'$ and $b'$, which have the following fractions of correct matches:

$$F^{(n)}_{a'} = s F^{(n)}_a + (1-s) F^{(n)}_b, \quad (32)$$

$$F^{(n)}_{b'} = (1-s) F^{(n)}_a + s F^{(n)}_b. \quad (33)$$

Since the recombination process involves only swaps and does not reconfigure the locations of the matches, it has no overall effect on the average number of correctly assigned node matches per solution in the population. It will, however, blur the distributions of both fitness and the fraction of correct solutions.

Let us consider the effect of a large number of cross-over operations. According to the central-limit theorem, if we consider the distribution of a random variable associated with the solutions in the population, then as the number of trials increases the distribution will approach a Gaussian. To provide an illustration of this effect, we construct an initial population in which each solution has half the matches correctly assigned, i.e. $F^{(n)} = 0.5$. After five cross-over iterations, the resulting population distributions are plotted in Fig. 7.
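The conservation-plus-blurring behaviour of Eqs. (32) and (33) can be illustrated with a small Monte-Carlo sketch. The population size, graph size and single-point exchange scheme below are illustrative assumptions.

```python
# Monte-Carlo sketch of cross-over blurring. Assumed setup: |V| = 40,
# 200 solutions, each starting with exactly half of its matches correct,
# single-point cross-over at a random site.
import random
import statistics

random.seed(1)
V, POP = 40, 200

def fresh_solution():
    # 1 marks a correct match; every solution starts with F = 0.5.
    bits = [1] * (V // 2) + [0] * (V // 2)
    random.shuffle(bits)
    return bits

population = [fresh_solution() for _ in range(POP)]

def crossover_generation(pop):
    random.shuffle(pop)
    for a, b in zip(pop[::2], pop[1::2]):
        cut = random.randrange(1, V)          # exchanged fraction s = cut/V
        a[:cut], b[:cut] = b[:cut], a[:cut]   # Eqs. (32) and (33)

def fractions(pop):
    return [sum(s) / V for s in pop]

before = fractions(population)
for _ in range(5):
    crossover_generation(population)
after = fractions(population)

# The mean fraction of correct matches is conserved, while the spread of
# the distribution increases: the Gaussian blurring illustrated in Fig. 7.
print(statistics.mean(before), statistics.mean(after))
print(statistics.stdev(before), statistics.stdev(after))
```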

This blurring process, while not directly affecting the algorithm convergence, plays an important role when combined with the selection operator. The blurring occurs symmetrically about the mode of the distribution. If this blurred distribution were subjected to a selection operation, then the high-fitness tail would be selected in preference to the low-fitness tail. The resulting distribution would therefore become skewed towards higher fractions of correct matches.

6.4. An analysis of the hill-climbing operator

Since the hill-climbing operator is only used to make local changes that increase $P_G$, it is clear that it can only improve the quality of the match. In this section of our analysis, we aim to determine to what extent the hill-climbing operator affects the overall convergence rate of our algorithm.

Fig. 7. A demonstration of the blurring process effected by the genetic cross-over operator.

Fig. 8. Empirical results demonstrating how we expect gradient ascent to perform. The dotted curve represents the best fit that was found.

Fig. 9. A prediction of the number of iterations required to ensure convergence when using only the hill-climbing operator.

Modelling the behaviour of the gradient ascent step of the algorithm is clearly a difficult problem, since it is highly dependent on the local structure of the global landscape of the fitness measure. One way of simplifying the analysis is to adopt a semi-empirical approach. Here we aim to Monte-Carlo the gradient ascent process and extract a parameterisation of the iteration dependence of the required distribution parameters. Our starting point is to generate 1000 random graphs. Commencing from a controlled fraction of initially correct matches, we perform gradient ascent until the configuration of matches stabilises and no more updates can be made. We plot the final fraction of correct matches against the fraction initially correct in Fig. 8. The best fit to the data gives the following iteration dependence:

$$F_a^{(n+1)} = 1 - (1 - F_a^{(n)})^{2.8}. \quad (34)$$

This result relates the fraction of correct matches at iterations $n$ and $n+1$ resulting from the application of the hill-climbing operator.

By expanding the recursion in iteration number, we can obtain the dependence on the initial fraction of correct matches. At iteration $n$, the fraction of correct matches is given by

$$F_a^{(n)} = 1 - (1 - F_a^{(0)})^{2.8n}. \quad (35)$$

We can use the empirical iteration dependence of the expected fraction of correct solutions to make a number of predictions about the convergence rate of population-based hill-climbing. We commence by assuming that the initial set of matches is selected in a random manner. As before, this corresponds to the case $F^{(0)} = 1/|V|$. Our condition for convergence is that less than one of the matches per solution is in error, i.e. $F^{(n)} > (|V|-1)/|V|$. By substituting this condition into Eq. (35) and solving for $n$, we find

$$n = \frac{0.36 \, \ln(|V|)}{\ln[|V|/(|V|-1)]}. \quad (36)$$

So when the graphs are large, $n \approx 0.36 \, |V| \ln |V|$. The number of iterations required for convergence is plotted as a function of graph size in Fig. 9. The convergence rate increases slowly with graph size. For modest numbers of nodes the increase is approximately linear.

It is interesting to contrast the dependence on graph size with that for the selection operator (see Fig. 9). Whereas hill-climbing has a slow dependence on graph size, in the case of selection there is a more rapid polynomial dependence. As a figure of merit, for a graph of size 50 nodes, the number of iterations required by selection is a factor of 10 larger than that required by hill-climbing.
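The two predictions, Eq. (31) for selection and Eq. (36) for hill-climbing, are easy to tabulate. The sketch below uses $P_e = 0.01$ and $|\hat{C}| = 5.5$ as in the text; the exact ratio between the two counts naturally depends on these assumed values.

```python
# Comparing the predicted iteration counts of Eq. (31) (selection) and
# Eq. (36) (hill-climbing) as the graph size grows.
import math

Pe, C = 0.01, 5.5   # values quoted in the text

def n_selection(V):
    # Eq. (31): polynomial in the graph size
    return V ** 2 * (1.0 - Pe) / (C * (1.0 - 2.0 * Pe))

def n_hillclimb(V):
    # Eq. (36): roughly 0.36 * V * ln(V) for large V
    return 0.36 * math.log(V) / math.log(V / (V - 1.0))

for V in (10, 25, 50, 100):
    print(V, round(n_selection(V)), round(n_hillclimb(V)))
```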


7. An analysis of the combined operators

As we pointed out earlier, the overall goal in this paper is an analysis of the combined effect of the mutation, cross-over, selection and hill-climbing operators described in Section 3. This is not a straightforward task. Previous attempts have included the work of Qi and Palmieri [18,19]. The distinguishing feature of this work was to use a general framework to derive a sufficient condition for monotonic increase of the average fitness under the processes of selection and mutation. Unfortunately, the framework assumes that the optimisation process operates in a continuous space rather than a discrete one. It is the discrete search space of the consistent labelling problem posed by graph matching which is the focus of attention in this paper.

7.1. Standard genetic search

Our aim is to extend the analysis of the individual operators presented in Section 6 by deriving a sufficient condition that ensures a monotonic increase of the expected fraction of correct solutions when the composite operators are applied. To embark on this study, we must first consider the order in which the different genetic operators are applied. As the population of candidate solutions enters the new iteration $n+1$ from the preceding iteration $n$, we first perform the cross-over operation. As we have discussed earlier, this process results in a post-cross-over population that is distributed according to a Gaussian distribution. As a result, the mode of the distribution is located where the fraction of correct solutions is equal to $F^{(n)}$, while we let the standard deviation of the distribution be equal to $\sigma^{(n)}$. The mutation operator is applied after the cross-over process. Its main effect is to shift the mode of the Gaussian distribution to a lower fitness value. To model the combined effect of the cross-over and mutation operators, we use Eq. (15) in order to compute the change in the fraction of correct solutions due to a mutation operation. The change is equal to

$$\Delta F_{\mathrm{mutation}} = -\frac{P_m (F^{(n)} |V| - 1)}{|V|}. \quad (37)$$

As expected, there is a decrease in the expected fraction of correct solutions. Immediately following the application of the mutation operator, we do not know the exact distribution of the fraction of correct matches. However, as demonstrated earlier, we know that for a large number of mutation operations the distribution is binomial, which in turn can be approximated well by a Gaussian for large $|V|$. As a result, the required distribution can be approximated in a Gaussian manner. The mode of the distribution is located at the position

$$F_{\mathrm{mutation}} = F^{(n)} + \Delta F_{\mathrm{mutation}}. \quad (38)$$

If the mutation probability $P_m$ is relatively small, as is usually the case after a single mutation operation, then we can assume that the standard deviation of the Gaussian, i.e. $\sigma^{(n)}$, remains unchanged.

In order to determine how the peak or mode of this distribution is shifted by the selection operator, we recall Eq. (28). Our interest is now with the change in the fraction of correct matches that the peak of the distribution undergoes under the combined selection and mutation operators. This quantity is equal to the peak value offset by the shift due to mutation, i.e.

$$\Delta F_{\mathrm{selection}} = F^{\mathrm{selection}}_{\max} - \Delta F_{\mathrm{mutation}}. \quad (39)$$

It is important to note that the distribution used as input to the selection operator is the result of the sequential application of the cross-over and mutation processes. Computing the distribution shift after selection is straightforward, but algebraically tedious. For this reason we will not reproduce the details here. Given that we now have a prediction of how we expect the peak of the population distribution to evolve under the processes of cross-over, mutation and subsequent selection, we are in a position to construct a condition for monotonic convergence. Clearly, for the population to converge, the downward shift (i.e. fitness reduction) due to the mutation operator must be smaller than the upward shift (i.e. fitness increase) resulting from selection. In order to investigate this balance of operators, we consider the break-even point between mutation and selection, which occurs where

$$\Delta F_{\mathrm{selection}} = \Delta F_{\mathrm{mutation}}. \quad (40)$$

Substituting from Eqs. (37) and (39), and solving for $P_m$, the break-even condition is satisfied when

$$P_m \leq \frac{|V| \, |\hat{C}| \, (1-2P_e)}{(P_e/F^{(n)} + 1)(1-2P_e) - F^{(n)}(1-2P_e) - |V| P_e} \cdot \frac{\sigma^{(n)}}{F^{(n)}}. \quad (41)$$

It is important to note that this condition on the mutation probability is very similar to that derived by Qi and Palmieri [18,19]. In fact, the maximum mutation rate is proportional to the ratio of the variance of the fraction of correct mappings in the population to the current expected fraction of matches correct. Moreover, the limiting value of the mutation is proportional to the total number of edges in the graphs, i.e. $|V| \cdot |\hat{C}|$. Finally, we note that as the fraction of correct matches increases, the mutation rate must be reduced in order to ensure convergence. It is important to emphasize that we have confined our attention to deriving the condition for monotonic convergence of the expected fitness value. This condition does not guarantee that the search procedure will converge to the global optimum. Neither does it make any attempt to capture the possibility of premature convergence to a local optimum.


Fig. 10. The maximum mutation rate that may be used to ensure monotonic convergence of a hybrid genetic hill-climbing optimisation scheme.

7.2. Hybrid hill-climbing

Having derived the monotonic convergence condition for the combined effect of the three standard genetic operators, we now turn our attention to the hybrid hill-climbing algorithm used in our empirical study of graph matching [21]. As before, we compute the change in the fraction of correct matches that we would expect to result from the additional application of the hill-climbing operator. Since this step immediately follows mutation, the population shift is given by

$$\Delta F_{\mathrm{hillclimb}} = 1 - (1 - F_{\mathrm{mutation}})^{2.8} - F_{\mathrm{mutation}}. \quad (42)$$

For completeness, our analysis should next focus on the effect of the selection operator. However, as we demonstrated in Section 6.4, the rate of convergence for the selection operator is significantly slower than that of the hill-climbing operator. This observation suggests that we can neglect the effects of selection when investigating the hybrid hill-climbing algorithm.

In order to identify the monotonic convergence criterion for the hybrid hill-climbing algorithm, we focus on the interplay between the opposing population shifts caused by mutation and hill-climbing. This analysis is entirely analogous to the case presented in the previous subsection, where we considered the interplay between mutation and selection for the standard genetic algorithm. In the case of the hybrid hill-climbing algorithm, break-even occurs when

$$\Delta F_{\mathrm{hillclimb}} \geq -\Delta F_{\mathrm{mutation}}. \quad (43)$$

Proceeding as before, the inequality yields the following relationship:

$$1 - \left( 1 + \frac{F^{(n)}|V|(P_m - 1) - P_m}{|V|} \right)^{2.8} + \frac{F^{(n)}|V|(P_m - 1) - P_m}{|V|} \geq \frac{P_m (F^{(n)}|V| - 1)}{|V|}. \quad (44)$$

By collecting terms and solving for the mutation probability, we arrive at the following convergence condition:

$$P_m \leq \frac{|V| \left[ (1-F^{(n)})^{1/2.8} + F^{(n)} - 1 \right]}{F^{(n)}|V| - 1}. \quad (45)$$

To simplify the convergence condition further, we make the reasonable assumption that the size of the graphs is large, i.e. $|V| \gg 1$. Under this assumption, the dependence on $|V|$ cancels, with the result

$$P_m \leq \frac{(1-F^{(n)})^{1/2.8} + F^{(n)} - 1}{F^{(n)}}. \quad (46)$$

This limiting mutation rate is plotted in Fig. 10 as a function of $F^{(n)}$. In practice, we must select the operating value of $P_m$ to fall within the envelope defined by the curve in Fig. 10. As the fraction of correct matches approaches unity (i.e. the algorithm is close to convergence), the mutation rate should be annealed towards zero. More interestingly, we can use the convergence condition to determine the largest value of the mutation rate for which convergence can be obtained. By taking the limit as the fraction of correct matches approaches zero, we find that $\lim_{F^{(n)} \to 0} P_m = 0.6430$. This agrees well with the empirical findings reported in our previous work [21].
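The envelope of Eq. (46) can be evaluated directly; its limit as $F^{(n)} \to 0$ is $1 - 1/2.8 \approx 0.643$, the quoted value. The small evaluation points below are illustrative.

```python
# The hybrid convergence envelope of Eq. (46) and its F -> 0 limit.
def max_mutation_rate(F):
    # Eq. (46): largest Pm that still guarantees monotonic convergence
    return ((1.0 - F) ** (1.0 / 2.8) + F - 1.0) / F

# The bound anneals towards zero as the match approaches consistency,
# and tends to 1 - 1/2.8 ~ 0.643 for a nearly random population.
for F in (1e-6, 0.25, 0.5, 0.9):
    print(F, max_mutation_rate(F))
```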

7.3. Result validity

Before providing some illustration of the matching method, we return to the question of population size. The aim here is to use a Monte-Carlo study to assess how the theoretical predictions, made under the central-limit assumption for large population size, degrade as the population size becomes relatively small.

To embark on this study, we use the simulation method outlined in Section 5.2. The problem studied involves a 20-node graph. In the first set of runs, we use only the three traditional genetic operators, i.e. mutation, cross-over and selection. In Fig. 11, we plot $F^{(n)}$ as a function of iteration number for increasing population sizes. It is clear that beyond a population size of about 50 solutions, the convergence curves for the different population sizes become increasingly similar.

We repeat this investigation, but supplement the standard operators with the hill-climbing operator. The results are shown in Fig. 12. As expected, we find that the agreement between curves, even for small population sizes, is extremely high. We would consequently expect our convergence predictions for the hybrid hill-climbing genetic algorithm to be valid for all population sizes that would be used in practice.


Fig. 11. The expected value of $F^{(n)}$ under the traditional genetic operators for various population sizes.

Fig. 12. The expected value of $F^{(n)}$ under the hybrid hill-climbing genetic algorithm for different population sizes.

8. Real images

This section furnishes two examples of the performance of the hybrid genetic algorithm on matching feature sets. The first task was matching corners extracted from an office scene, as shown in panels (a) and (b) of Fig. 13. These are low-quality images taken with an IndyCam. There is no calibration information: ground truth data were obtained by hand. The corner sets were Delaunay triangulated, as shown in panels (c) and (d). By considering the number of mappings in the ground truth data which were inconsistent with the triangulations, the amount of relational corruption was estimated at 15%. Each graph contains about 70 nodes. The graphs were matched using a hybrid genetic algorithm with a population size of 10. The cross-over and mutation rates were 1.0 and 0.4, respectively. The matching results are shown in panels (e) and (f). The initial guess is random and contains no correct mappings. After only five iterations, the hybrid genetic algorithm has converged on a final match containing 99% correct mappings.

The second example is more difficult. Panels (a) and (b) of Fig. 14 show the left and right images. This time, about 70 regions were segmented from each image using a simple thresholding procedure. As can be seen from the graphs in panels (c) and (d), there is considerable relational corruption. The ground truth suggested that there was as much as 50% relational corruption. The graphs were matched using the same algorithm as before. Again, the initial guess was poor. After 10 iterations, the algorithm had converged to a solution with about 67% correct mappings.

9. Conclusions

In this paper our aim has been to understand the role that the different genetic operators play in the convergence of a genetic algorithm. In addition, our investigation has allowed us to make predictions about the number of iterations required for convergence under the processes of selection and hill-climbing. In particular, we have found that the selection process converges in a polynomial number of iterations as a function of graph size. The typical number of iterations required for convergence was in the thousands. Hill-climbing was found to converge in very nearly linear time complexity, with logarithmic behaviour as the graph sizes became very small. The typical number of iterations required for convergence was less than 10.

The mutation operator was found to produce an exponential drift of the population distribution towards incorrect mappings. The drift rate was found to depend on both the mutation rate and the current fraction of correct correspondences. In other words, there is greater disruption when the population is dominated by a single consistent solution. In the case when the population contains a large number of dissimilar yet poor solutions, there is less disruptive drift. In contrast with the other operators, the role of the cross-over operator is to exchange information via recombination. The net effect is to blur the distribution of the fraction of correct solutions in a Gaussian manner. In other words, the mean fraction of correct solutions remains stationary, while the corresponding variance increases.

Based on this operator-by-operator analysis, we have analysed the conditions that are required in order to guarantee monotonic convergence of the most likely item in the population. In particular, we have obtained two interesting convergence conditions. First, we derived the convergence condition for the standard genetic algorithm, composed of cross-over, mutation and selection. This condition was found to have a similar structure to that reported by Qi and Palmieri [18,19]. However, the condition for convergence makes the role of graph structure explicit, in that the limiting value of the mutation probability is proportional to the total number of edges in the graphs being matched. The second result applies to the convergence of a hill-climbing genetic algorithm. The striking feature of the hybrid algorithm is that the limiting value of the mutation probability is independent of the structure of the graphs being matched. This result accords well with our previous empirical findings.

A.D.J. Cross et al. / Pattern Recognition 33 (2000) 1863–1880

Fig. 13. Uncalibrated Stereogram 1. The camera positions are not known. (a) Left image; (b) right image; (c) left feature graph; (d) right feature graph; (e) initial guess (0%); (f) final match (99%).

Fig. 14. Uncalibrated Stereogram 2. The camera positions are not known. There is considerable relational corruption: the two Delaunay triangulations are very different. (a) Left image; (b) right image; (c) left feature graph; (d) right feature graph; (e) initial guess (0.05%); (f) final match (67%).

In a final section we have investigated the validity of our results when generalised to problems with finite population sizes. In particular, we have discussed the errors introduced by the simplifying approximations that have been made in order to derive our convergence conditions. One approximation that we have made throughout is that the population distribution can be assumed to be Gaussian. By the central-limit theorem, this approximation becomes increasingly accurate as the size of the graphs increases. While deriving the convergence results for the selection operator we were required to make an approximation of our functional which simplifies the dictionary model of consistency. Specifically, we assume that the dominant contribution is from a single noise-corrupted dictionary item. Of all the assumptions that have been made, this is the least satisfactory. When the fraction of correct matches is low, this approximation tends to underestimate the true value of the fitness functional. However, at higher values it yields accurate results. In other words, the approximation exaggerates the true discriminating power of our consistency measure. Consequently, we would expect to find that our theoretical convergence rates are underestimates. The final assumption has been to approximate the behaviour of the hill-climbing step using an empirical performance curve. In practice, the error resulting from the fitted polynomial approximation was never found to exceed 5%.

References

[1] S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Optimisation by simulated annealing, Science 220 (1983) 671–680.

[2] V. Kumar, Algorithms for constraint-satisfaction problems: a survey, AI Magazine 13 (1992) 32–44.

[3] A.K. Mackworth, E.C. Freuder, The complexity of some polynomial network consistency algorithms for constraint satisfaction problems, Artif. Intell. 25 (1985) 65–74.

[4] A.K. Mackworth, Consistency in a network of relations, Artif. Intell. 8 (1977) 99–118.

[5] A. Sanfeliu, K.S. Fu, A distance measure between attributed relational graphs for pattern recognition, IEEE Trans. Systems Man Cybernet. 13 (1983) 353–362.

[6] D. Waltz, Understanding line drawings of scenes with shadows, in: P.H. Winston (Ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975.

[7] L.G. Shapiro, R.M. Haralick, A metric for comparing relational descriptions, IEEE Trans. Pattern Machine Intell. 7 (1985) 90–94.

[8] J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, Reading, MA, 1984.

[9] A.M. Yuille, J. Coughlan, Twenty questions, focus of attention and A*: a theoretical comparison of optimisation strategies, in: M. Pelillo, E.R. Hancock (Eds.), Energy Minimisation Methods in Computer Vision and Pattern Recognition, Lecture Notes in Computer Science, vol. 1223, Springer, Berlin, 1997, pp. 197–212.

[10] S. Geman, D. Geman, Stochastic relaxation, Gibbs distributions and Bayesian restoration of images, IEEE Trans. Pattern Machine Intell. PAMI-6 (1984) 721–741.

[11] B. Gidas, A re-normalisation-group approach to image processing problems, IEEE Trans. Pattern Machine Intell. 11 (1989) 164–180.

[12] A. Yuille, Generalised deformable models, statistical physics and matching problems, Neural Comput. 2 (1990) 1–24.

[13] F. Glover, Ejection chains, reference structures and alternating path methods for traveling salesman problems, Discrete Appl. Math. 65 (1996) 223–253.

[14] E. Rolland, H. Pirkul, F. Glover, Tabu search for graph partitioning, Ann. Oper. Res. 63 (1996) 232–290.

[15] F. Glover, Genetic algorithms and tabu search: hybrids for optimisation, Discrete Appl. Math. 49 (1995) 111–134.

[16] F. Glover, Tabu search for nonlinear and parametric optimisation (with links to genetic algorithms), Discrete Appl. Math. 49 (1995) 231–255.

[17] D.B. Fogel, An introduction to simulated evolutionary optimisation, IEEE Trans. Neural Networks 5 (1994) 3–14.

[18] X.F. Qi, F. Palmieri, Theoretical analysis of evolutionary algorithms with an infinite population in continuous space: basic properties of selection and mutation, IEEE Trans. Neural Networks 5 (1994) 102–119.

[19] X.F. Qi, F. Palmieri, Theoretical analysis of evolutionary algorithms with an infinite population in continuous space: analysis of the diversification role of cross-over, IEEE Trans. Neural Networks 5 (1994) 120–129.

[20] R. Myers, E.R. Hancock, Genetic algorithm parameter sets for line labelling, Pattern Recognition Lett. 18 (1998) 1283–1292.

[21] A.D.J. Cross, R.C. Wilson, E.R. Hancock, Inexact graph matching using genetic search, Pattern Recognition 30 (1997) 953–970.

[22] E.R. Hancock, J. Kittler, A Bayesian interpretation for the Hopfield network, IEEE International Conference on Neural Networks, 1993, pp. 341–346.

[23] E.R. Hancock, J. Kittler, Discrete relaxation, Pattern Recognition 23 (1990) 711–733.

[24] E.R. Hancock, M. Pelillo, A Bayesian interpretation for the exponential correlation associative memory, Pattern Recognition Lett. 19 (1998) 139–149.

[25] R.C. Wilson, E.R. Hancock, Gauging consistency and controlling structural errors, IEEE Computer Society Computer Vision and Pattern Recognition Conference, 1996.

[26] R.C. Wilson, E.R. Hancock, Structural matching by discrete relaxation, IEEE Trans. Pattern Machine Intell. 19 (1997) 634–648.

[27] R.C. Wilson, E.R. Hancock, Structural matching with active triangulations, Computer Vision Image Understanding 72 (1998) 21–38.

[28] R.C. Wilson, E.R. Hancock, Relational matching with dynamic graph structures, Proceedings of the Fifth International Conference on Computer Vision, 1995, pp. 450–456.

[29] H.G. Barrow, R.J. Popplestone, Relational descriptions in picture processing, Machine Intell. 6 (1971).

[30] B.T. Messmer, H. Bunke, Efficient error-tolerant subgraph isomorphism detection, in: D. Dori, A. Bruckstein (Eds.), Shape, Structure and Pattern Recognition, 1995, pp. 231–240.

[31] A.K.C. Wong, M. You, Entropy and distance of random graphs with application to structural pattern recognition, IEEE Trans. Pattern Machine Intell. 7 (1985) 599–609.

[32] D. Goldberg, Genetic Algorithms in Search, Optimisation and Machine Learning, Addison-Wesley, Reading, MA, 1989.

About the Author: ANDREW CROSS gained his B.Sc. in Computational Physics with first-class honours from the University of Manchester Institute of Science and Technology in 1994. Between 1994 and 1998 he undertook research in the area of optimisation methods for computer vision at the University of York. He was awarded the D.Phil. degree for this work in July 1998. Following a period of postdoctoral research at York, Dr. Cross took up an appointment with NewTek in San Antonio, Texas. His interests are in computer vision, graphics and image processing.

About the Author: RICHARD MYERS took his B.A. in Natural Sciences from the University of Cambridge in 1989. In 1995 he gained an M.Sc. with distinction in Information Processing at the University of York. He has recently completed a D.Phil. in the Computer Vision Group at the Department of Computer Science at the University of York. The main topic of his research is the use of genetic algorithms to solve consistent labelling problems arising in the machine vision domain. In 1997 he spent two months working at NEC Corporation in Kawasaki, Japan, sponsored by a REES/JISTEC fellowship. His interests include evolutionary computation, perceptual organisation and labelling problems.

About the Author: EDWIN HANCOCK gained his B.Sc. in physics in 1977 and Ph.D. in high energy nuclear physics in 1981, both from the University of Durham, UK. After a period of postdoctoral research working on charm-photoproduction experiments at the Stanford Linear Accelerator Centre, he moved into the fields of computer vision and pattern recognition in 1985. Between 1981 and 1991, he held posts at the Rutherford-Appleton Laboratory, the Open University and the University of Surrey. He is currently Professor of Computer Vision in the Department of Computer Science at the University of York, where he leads a group of some 15 researchers in the areas of computer vision and pattern recognition. Professor Hancock has published about 180 refereed papers in the fields of high energy nuclear physics, computer vision, image processing and pattern recognition. He was awarded the 1990 Pattern Recognition Society Medal and received an outstanding paper award in 1997. Professor Hancock serves as an Associate Editor of the journal Pattern Recognition and has been a guest editor for the Image and Vision Computing Journal. He is currently guest-editing a special edition of the Pattern Recognition journal devoted to energy minimisation methods in computer vision and pattern recognition. He chaired the 1994 British Machine Vision Conference and has been a programme committee member for several national and international conferences.
