Heuristic Algorithms for Combinatorial Optimization
PhD Course in Computer Science
Roberto Cordone
DI - Università degli Studi di Milano
E-mail: [email protected]
Web page: https://homes.di.unimi.it/cordone/courses/2017-ae/2017-ae.html
Lesson 4: Local search heuristics
Milano, A.A. 2017/18
Exchange algorithms
In Combinatorial Optimization each solution x is a subset of B

An exchange heuristic updates step by step a current subset x(t)
1 start from a feasible solution x(0) ∈ X found somehow (often by a constructive heuristic)
2 exchange subsets A external to the solution x(t) and subsets D internal to it (add/drop), so as to generate a family of feasible solutions

x′A,D = x ∪ A \ D with A ⊆ B \ x and D ⊆ x

3 at each step t, select which subsets to exchange based on a suitable selection criterion ϕ(x, A, D)

(A∗, D∗) = arg min(A,D) ϕ(x, A, D)

4 generate the new current solution performing the chosen modifications

x(t+1) := x(t) ∪ A∗ \ D∗

5 if a termination condition holds, terminate; otherwise, go back to point 2
Neighbourhood
An exchange heuristic is defined by:
• the pairs of subsets (A, D) available for an exchange, i.e. the solutions generated by a single exchange starting from x
• the selection criterion ϕ(x, A, D)

A neighbourhood N : X → 2^X is a function which associates to each feasible solution x ∈ X a subset of feasible solutions N(x) ⊆ X
The situation can be formally described with a search graph in which
• the nodes represent the feasible solutions x ∈ X
• the arcs connect each solution x to those of its neighbourhood N (x)
Given a search graph
• a run of the algorithm corresponds to a path
• an arc is also denoted as a move, because it transforms a solution into another one by moving elements
How does one define a neighbourhood and select a move?
Neighbourhoods based on distance
Every solution x ∈ X can be represented by its incidence vector

xi = 1 if i ∈ x, 0 if i ∈ B \ x

The Hamming distance between two solutions x and x′ is the number of elements in which their incidence vectors differ

dH(x, x′) = Σi∈B |xi − x′i|

Referring to the subsets, dH(x, x′) = |x \ x′| + |x′ \ x|

A typical definition of neighbourhood of x, with an integer parameter k, is the set of all solutions with a Hamming distance from x not larger than k

NHk(x) = {x′ ∈ X : dH(x, x′) ≤ k}
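Since solutions are subsets, the Hamming distance is just the size of the symmetric difference; a minimal Python sketch (the example instance is hypothetical):

    def hamming_distance(x: set, x_prime: set) -> int:
        # d_H(x, x') = |x \ x'| + |x' \ x| = size of the symmetric difference
        return len(x ^ x_prime)

    # hypothetical example: x = {1, 3, 4} and x' = {1, 2} differ in elements 2, 3, 4
    assert hamming_distance({1, 3, 4}, {1, 2}) == 3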
Example: the KP
The instance of KP with B = {1, 2, 3, 4}, w = [5 4 3 2] and W = 10 has the following solutions, where the subsets {1, 2, 3, 4}, {1, 2, 3} and {1, 2, 4} are unfeasible.

Solution x = {1, 3, 4} (shown in blue in the original figure) has a neighbourhood NH2(x) of 7 elements (in pink); the subsets in black do not belong to the neighbourhood, because their Hamming distance from x is > 2
Neighbourhoods based on operations
Another common definition of neighbourhood is obtained by defining
• a family O of operations on the solutions of the problem
• the set of the solutions generated applying to x the operations of O

NO(x) = {x′ ∈ X : ∃o ∈ O : o(x) = x′}

Considering again the example of the KP, O can be defined as
• adding to x an element of B \ x
• removing from x an element
• exchanging one element of x with one of B \ x

The resulting neighbourhood NO is related to those defined by the Hamming distance, but does not coincide with any of them

NH1 ⊂ NO ⊂ NH2

These neighbourhoods can be parametrized performing sequences of k operations of O instead of a single one, exactly as in the distance-based neighbourhoods

NOk(x) = {x′ ∈ X : ∃o1, . . . , ok ∈ O : ok(ok−1(. . . o1(x))) = x′}
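A minimal sketch of this operation-based neighbourhood for the KP, using the instance of the previous example (function and variable names are illustrative):

    from itertools import product

    def neighbourhood_O(x, B, w, W):
        # feasible neighbours of x under O = {add, drop, swap};
        # w[i] is the weight of element i, W the knapsack capacity
        weight = sum(w[i] for i in x)
        out = [j for j in B if j not in x]
        neighbours = [x | {j} for j in out if weight + w[j] <= W]    # additions
        neighbours += [x - {i} for i in x]                           # removals
        neighbours += [x - {i} | {j} for i, j in product(x, out)     # exchanges
                       if weight - w[i] + w[j] <= W]
        return neighbours

    B, w, W = {1, 2, 3, 4}, {1: 5, 2: 4, 3: 3, 4: 2}, 10
    print(neighbourhood_O({1, 3, 4}, B, w, W))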
Distance and operation-based neighbourhoods: differences
In general, an operation-based neighbourhood includes solutions with different Hamming distances

For the TSP one can define a neighbourhood NS1 including the solutions obtained by exchanging two nodes in their visit order

The neighbourhood of solution x = (3, 1, 4, 5, 2) is:

NS1(x) = {(1, 3, 4, 5, 2), (4, 1, 3, 5, 2), (5, 1, 4, 3, 2), (2, 1, 4, 5, 3), (3, 4, 1, 5, 2), (3, 5, 4, 1, 2), (3, 2, 4, 5, 1), (3, 1, 5, 4, 2), (3, 1, 2, 5, 4), (3, 1, 4, 2, 5)}

If the two exchanged nodes are adjacent in the tour, the modified arcs are 3 deleted and 3 added; otherwise, they are 4 and 4
Distance and operation-based neighbourhoods: relations
Sometimes the neighbourhoods obtained with the two definitions coincide
• for the MDP
  • neighbourhood NH2 (solutions at Hamming distance equal to 2)
  • neighbourhood NS1 (exchange one element of x with one of B \ x)
• for the BPP
  • neighbourhood NH2 (solutions at Hamming distance equal to 2)
  • neighbourhood NT1 (transfers of an object to another container)
• for SAT
  • neighbourhood NH2 (solutions at Hamming distance equal to 2)
  • neighbourhood NF1 (“flips”, replacing the truth assignment of a variable with the opposite one)

This is typical of problems with solutions of fixed cardinality:
• perform a sequence of k exchanges between single elements (|A| = |D| = 1): k elements go into x and k elements out of it
• the Hamming distance between the two extreme solutions is ≤ 2k (if all the exchanged elements are different, it is exactly 2k)
Different neighbourhoods for the same problem: the CMST
A problem can admit different neighbourhoods based on different operations

In the CMST it is possible to
• exchange edges: delete (i, j), add (i, n)
• exchange vertices: n moves from subtree 2 to subtree 1 (computing the edges that keep each subtree connected at minimum cost)
Different neighbourhoods for the same problem: the PMSP
For the PMSP it is possible to
• the transfer neighbourhood NT1, based on the set T1 of all transfers of a task to another machine
• the exchange neighbourhood NS1, based on the set S1 of the exchanges of two tasks between two machines (one from each machine)
Connectivity of the solution space
An exchange heuristic can find an optimal solution only if at least one optimal solution can be reached from each starting solution

The search graph is denoted as weakly connected to the optimum when it includes for each solution x ∈ X a path from x to X∗

As X∗ is unknown, a stronger condition is often used: the search graph is denoted as strongly connected when it includes for each pair of solutions x, y ∈ X a path from x to y

An exchange heuristic should guarantee one of these conditions

This is not always possible
• in the MDP, neighbourhood NS1 allows to connect any pair of solutions in at most k steps
• in the KP and the SCP, no neighbourhood NSk guarantees that, because feasible solutions can have any cardinality
• the search graph becomes connected also in the KP and the SCP if, besides exchanges, also removals and additions are admitted
Connectivity of the solution space
If feasibility is defined in a sophisticated way, exchanging, adding and deleting single elements can be insufficient to reach all solutions: the unfeasible subsets can break all paths between some feasible solutions

Given W = 8, there are only three feasible solutions, all with two subtrees of weight 4:
• x = {(r, a), (a, b), (b, e), (r, d), (c, d), (d, g), (f, g)}
• x = {(r, a), (a, e), (e, f), (f, g), (r, d), (c, d), (b, c)}
• x = {(r, a), (a, b), (b, c), (r, d), (d, g), (f, g), (e, f)}

The three solutions are mutually reachable only by exchanging two edges at a time; exchanging only one yields unfeasible subsets
Steepest descent (hill-climbing) heuristics
Quite often the selection criterion ϕ(x, A, D) of the new solution in N(x) is the objective function, that is, each step of the heuristic moves from the current solution to the best solution in its neighbourhood

To avoid cyclic behaviour, only strictly improving solutions are accepted

Algorithm SteepestDescent(I, x(0))
  x := x(0);
  Stop := false;
  While Stop = false do
    x̄ := arg min{ f(x′) : x′ ∈ N(x) };
    If f(x̄) ≥ f(x) then Stop := true; else x := x̄;
  EndWhile;
  Return (x, f(x));
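A minimal Python transcription of the scheme above, assuming the caller supplies the objective f and a generator N of feasible neighbours:

    def steepest_descent(x0, f, N):
        # N(x) yields the feasible neighbours of x; stops at the first local optimum
        x = x0
        while True:
            x_best = min(N(x), key=f, default=None)  # arg min over N(x)
            if x_best is None or f(x_best) >= f(x):  # no strictly improving neighbour
                return x, f(x)
            x = x_best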
Local and global optimality
A steepest descent heuristic terminates when it finds a locally optimal solution, that is a solution x̄ ∈ X such that

f(x̄) ≤ f(x) for each x ∈ N(x̄)

A globally optimal solution is always also locally optimal, but the opposite is not always true: denoting by XN the set of the local optima, X∗ ⊆ XN ⊆ X
Exact neighbourhood
An exact neighbourhood is a neighbourhood function N : X → 2^X such that each local optimum is also a global optimum

XN = X∗

Trivial case: the neighbourhood of each solution coincides with the whole set of feasible solutions (N(x) = X for each x ∈ X); it is a useless neighbourhood: too wide to explore

Exact neighbourhoods are extremely rare
• the exchange between basic and nonbasic variables used by the simplex algorithm for Linear Programming
• the exchange between edges for the Minimum Spanning Tree problem

In general, the steepest descent heuristic does not find a global optimum: its effectiveness depends on the properties of the search graph and of the objective
Properties of the search graph

Some relevant properties are
• the size of the search space |X|
• the connectivity of the search graph (as discussed above)
• the diameter of the search graph, that is the number of arcs of the minimum path between the two farthest solutions: a richer neighbourhood produces graphs of smaller diameter (but other factors exist: see the “small world” effect)

For example, for the symmetric TSP on a complete graph with the neighbourhood NS1:
• the search space includes |X| = (n − 1)! solutions
• the neighbourhood NS1 (two-node exchange) includes (n choose 2) = n(n − 1)/2 solutions
• the search graph is strongly connected and has diameter ≤ n − 2: each solution transforms into any other with at most n − 2 exchanges

For example, x = (1, 5, 4, 2, 3) becomes x′ = (1, 2, 3, 4, 5) in 3 steps

x = (1, 5, 4, 2, 3) → (1, 2, 4, 5, 3) → (1, 2, 3, 5, 4) → (1, 2, 3, 4, 5) = x′

(the first node is always 1, the last one is automatically managed)
Properties of the search graph
Other relevant properties
• the density of the global optima (|X∗|/|X|) and of the local optima (|XN|/|X|): if the local optima are numerous, it is hard to find the global ones
• the distribution of the quality δ(x) of the local optima (SQD diagram): if the local optima are good, it is less important to find a global one
• the distribution of the locally optimal solutions in the search space: if the local optima are close to each other, it is not necessary to explore the whole space

These indices would require an exhaustive exploration of the search graph; in practice, one performs a sampling, but such analyses
• require very long times
• can be misleading, especially if the global optima are unknown
Example: the TSP
Typical results for the TSP on a complete symmetric graph with Euclidean costs
• the Hamming distance between two local optima is on average ≪ n: the local optima gather in a small region of X
• the Hamming distance between local optima on average exceeds that between local and global optima: the global optima tend to gather in the middle of the local optima
• the FDC diagram (Fitness-Distance Correlation) relates the quality δ(x) to the distance from the global optima dH(x, X∗): if there is a correlation, the best local optima are closer to the global optima
Fitness-Distance Correlation
If the correlation between quality and closeness to the global optima is strong
• building good starting solutions is profitable, because they initialize the local search near a good local optimum
• it is better to intensify than to diversify

On the contrary, if the correlation is weak
• a good initialization is less important
• it is better to diversify than to intensify

This happens, for example, for the Quadratic Assignment Problem (QAP)
Landscape
A landscape is a triple (X, N, f), where
• X is the search space, or set of feasible solutions
• N : X → 2^X is the neighbourhood function
• f : X → N is the objective function

It can be seen as the search graph with node weights given by the objective

The effectiveness of exchange heuristics strongly depends on the landscape: rugged landscapes are associated with many local optima, hence with less effective heuristics
Different kinds of landscape
There is a great variety of landscapes, very different from one another
Autocorrelation coefficient
The complexity of a landscape can be empirically estimated by
1 performing a random walk in the search graph
2 determining the sequence of values of the objective f(1), . . . , f(tmax)
3 computing the sample mean f̄ = (1/tmax) Σ_{t=1..tmax} f(t)
4 computing the empirical autocorrelation coefficient (see the sketch below)

r(i) = [ Σ_{t=1..tmax−i} (f(t) − f̄)(f(t+i) − f̄) / (tmax − i) ] / [ Σ_{t=1..tmax} (f(t) − f̄)² / tmax ]

This is a function of i starting from r(0) = 1; in general it decreases
• if it keeps r(i) ≈ 1, the landscape is smooth:
  • the neighbour solutions have values close to the current one
  • there are few local optima
  • the steepest descent heuristic is effective
• if it varies steeply, the landscape is rugged:
  • the neighbour solutions have values far from the current one
  • there are many local optima
  • the steepest descent heuristic is ineffective
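A minimal sketch of this estimate, given the list fs = [f(1), . . . , f(tmax)] of objective values sampled along a random walk:

    def autocorrelation(fs, i):
        # empirical autocorrelation r(i): a direct transcription of the formula above
        tmax = len(fs)
        mean = sum(fs) / tmax
        num = sum((fs[t] - mean) * (fs[t + i] - mean)
                  for t in range(tmax - i)) / (tmax - i)
        den = sum((ft - mean) ** 2 for ft in fs) / tmax
        return num / den  # r(0) = 1 by construction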
Plateau
The search graph can be analyzed dividing it by objective levels
• a plateau of value f is any subset of solutions of value f that are adjacent in the search graph

Very large plateaus complicate the choice of the solution in the steepest descent heuristic, because they make it depend on the visit order of N(x)

An extremely uniform landscape is not an advantage!

Example: all transfers and exchanges between machines 1 and 3 leave the objective value unchanged (the other moves worsen it)
Attraction basins
An alternative partition of the search graph is based on the concept of:
• the attraction basin of a locally optimal solution x̄, that is the set of the solutions x(0) ∈ X such that the steepest descent heuristic starting from x(0) returns x̄
The steepest descent heuristic is
• effective if the attraction basins are few and large (especially if the global optima have larger basins)
• ineffective if the attraction basins are many and small (especially if the global optima have smaller basins)
Complexity
Algorithm SteepestDescent(I, x(0))
  x := x(0);
  Stop := false;
  While Stop = false do
    x̄ := arg min{ f(x′) : x′ ∈ N(x) };
    If f(x̄) ≥ f(x) then Stop := true; else x := x̄;
  EndWhile;
  Return (x, f(x));

The complexity of the steepest descent heuristic depends on
1 the number of steps, which depends on the structure of the search graph (width of the attraction basins) and is hard to predict a priori
2 the search for the best solution in the neighbourhood, which depends on how the search itself is performed
The exploration of the neighbourhood
Two strategies are possible
1 exhaustive search: evaluate all the neighbour solutions; the complexity of a single step is the product of
  • the number of neighbour solutions (|N(x)|)
  • the time needed to evaluate the cost of each solution (γN(|B|, x))
  Sometimes it is not easy to restrict the visit only to the neighbour solutions:
  • visit a superset of the neighbourhood
  • for each element, evaluate its feasibility
  • for the feasible ones, evaluate the cost
2 efficient exploration of the neighbourhood: instead of visiting the whole neighbourhood, find its optimal solution by solving an auxiliary problem; only some special neighbourhoods allow that
Exhaustive visit of the neighbourhood
Algorithm SteepestDescent(I, x(0))
  x := x(0);
  Stop := false;
  While Stop = false do
    x̄ := x; { computes x̄ := arg min{ f(x′) : x′ ∈ N(x) } }
    For each x′ ∈ N(x) do
      If f(x′) < f(x̄) then x̄ := x′;
    EndFor;
    If f(x̄) ≥ f(x) then Stop := true; else x := x̄;
  EndWhile;
  Return (x, f(x));

The complexity is the product of three terms
1 the number of iterations tmax performed to reach the local optimum
2 the number of solutions |N(x(t))| visited at each iteration
3 the time γN(x(t), |B|) needed to evaluate the objective

In general, the three terms are overestimated with expressions depending on |B|, but not on x(t)
Objective evaluation time: the additive case
The first device to accelerate an exchange algorithm is to minimize the time needed to evaluate the objective: in general, updating the value of f(x) costs much less than recomputing it

For example, if f(x) is additive and the numbers of added (|A|) and deleted (|D|) elements are small, the evaluation takes constant time:
• sum φj for each element j ∈ A, added to x
• subtract φi for each element i ∈ D, deleted from x

Moving from solution x to x′ = x \ {i} ∪ {j}, the objective changes by

δf(x, i, j) = f(x \ {i} ∪ {j}) − f(x) = φj − φi

Notice that δf(x, i, j) does not depend on x: we will come back to this later

This holds for the simple exchange of objects in the KP and of edges in the CMSTP
Example: the symmetric TSP
The neighbourhood NR2 for the TSP
• deletes two nonconsecutive arcs (si, si+1) and (sj, sj+1)
• adds the two arcs (si, sj) and (si+1, sj+1)
• reverts the path (si+1, . . . , sj) (modifying O(n) arcs!)

If the graph and the cost function are symmetric, the variation of f(x) is

δf(x, i, j) = csi,sj + csi+1,sj+1 − csi,si+1 − csj,sj+1
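A sketch of this constant-time evaluation (the tour s is a list of nodes and c a symmetric cost matrix; both are hypothetical inputs):

    def delta_2opt(s, c, i, j):
        # delete (s[i], s[i+1]) and (s[j], s[j+1]); add (s[i], s[j]) and (s[i+1], s[j+1]);
        # the reversed path keeps its cost because c is symmetric
        n = len(s)
        a, b = s[i], s[(i + 1) % n]
        d, e = s[j], s[(j + 1) % n]
        return c[a][d] + c[b][e] - c[a][b] - c[d][e]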
In many other cases, however, the function is not additive
Quadratic objective functions
The MDP has a quadratic objective function: computing it costs Θ(n²)

Moving from x to x′ = x \ {i} ∪ {j} (neighbourhood NS1), the objective changes by

δf(x, i, j) = f(x \ {i} ∪ {j}) − f(x) = Σ_{h,k ∈ x\{i}∪{j}} dhk − Σ_{h,k ∈ x} dhk

which depends on O(n) distance terms, related to points i and j

There is a general trick for symmetric quadratic functions

δf(x, i, j) = Σ_{h ∈ x\{i}∪{j}} Σ_{k ∈ x\{i}∪{j}} dhk − Σ_{h ∈ x} Σ_{k ∈ x} dhk = 2 Σ_{k∈x} djk − 2 Σ_{k∈x} dik − 2 dij = 2 (Dj(x) − Di(x) − dij)

If Dℓ(x) = Σ_{k∈x} dℓk is known for each ℓ ∈ B, the computation takes O(1)
Example: the MDP

Let us consider f(x)/2 and evaluate the exchange

x → x′ = x \ {i} ∪ {j} with i ∈ x and j ∈ B \ x

f(x′) = f(x) − Di + Dj − dij

• the pairs including i are lost (term −Di)
• the pairs including j are acquired (term +Dj)
• but the pair (i, j) is counted in excess (term −dij)

[Figure in the original slides: the exchange of i ∈ x with j ∈ E \ x, with the three terms highlighted in turn]
Example: the MDP

Update of the data structures:

Dℓ := Dℓ − dℓi + dℓj for each ℓ ∈ B

Each element ℓ sees
• dℓi disappear
• dℓj appear
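A sketch of the whole mechanism in Python (d is the distance matrix, B the ground set, and D the vector of auxiliary values defined above; all names are illustrative):

    def mdp_swap_delta(D, d, i, j):
        # O(1) evaluation of x -> x \ {i} u {j}, with D[l] = sum of d[l][k] for k in x
        return 2 * (D[j] - D[i] - d[i][j])

    def mdp_apply_swap(D, d, i, j, B):
        # O(n) refresh of the auxiliary values after performing the swap
        for l in B:
            D[l] += d[l][j] - d[l][i]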
Use of auxiliary information
Many nonlinear functions can be updated in a similar way
• save aggregated information on the current solution x(t)
• use it to compute f(x′) efficiently for each x′ ∈ N(x(t))
• update it when moving to the following solution x(t+1)

In the PMSP with the transfer (NT1) and exchange (NS1) neighbourhoods, the objective can be evaluated in constant time by saving and updating
• the completion time of each machine
• the indices of the machines with the maximum and second maximum completion time
Example: the PMSP
Consider the exchange o = (i, j) of tasks i and j (i on machine Mi, j on machine Mj)
• compute in constant time the new completion times: one increases, the other decreases (or both remain constant)
• test in constant time whether either exceeds the current maximum
• if the maximum time decreases, test in constant time whether the other modified time or the second maximum becomes the new maximum

Once the neighbourhood has been visited and the exchange selected, update
• the two modified completion times (each one in constant time)
• their positions in a max-heap (each one in time O(log |M|)), as sketched below
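A sketch of the constant-time evaluation (C holds the machine completion times and p the processing times; others_max is the maximum over the untouched machines, recovered from the stored maximum and second maximum; all names are illustrative):

    def evaluate_task_exchange(C, p, i, j, Mi, Mj, others_max):
        # new completion times of the only two machines touched by the exchange
        new_Ci = C[Mi] - p[i] + p[j]
        new_Cj = C[Mj] - p[j] + p[i]
        # new makespan: every untouched machine is dominated by others_max
        return max(new_Ci, new_Cj, others_max)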
Use of auxiliary information
The information can refer to
• the current solution x
• the previous solution visited in the neighbourhood, according to a suitable order

Consider the neighbourhood NR2 for the asymmetric TSP:
• the neighbour solutions differ from x in O(n) arcs
• the neighbour solutions differ from each other in O(n) arcs
• if the pairs of deleted arcs (si, si+1) and (sj, sj+1) follow the lexicographic order, the reverted path changes only by one arc
Example: the asymmetric TSP
In general, the variation of f(x) is

δf(x, i, j) = csi,sj + csi+1,sj+1 − csi,si+1 − csj,sj+1 + csj...si+1 − csi+1...sj

where csi+1...sj and csj...si+1 denote the costs of the internal path in the two directions

Moving from exchange (si, sj) to exchange (si, sj+1)
• the first four terms change, but they are inputs of the problem
• the last two terms can be updated in constant time

csj+1...si+1 = csj...si+1 + csj+1,sj and csi+1...sj+1 = csi+1...sj + csj,sj+1

What is the effect of exploring the neighbourhood in a predefined order?
And feasibility?
Some operations of the neighbourhood can generate unfeasible subsets

N̄O(x) = {x′ ⊆ B : ∃o ∈ O : o(x) = x′} ⊇ NO(x) = N̄O(x) ∩ X

(for example, this holds for the KP, BPP, SCP, CMSTP. . . )

In that case, for each element of N̄O(x) one needs to
• test feasibility
• for the feasible elements, evaluate the cost

To test feasibility, the same techniques used for the objective can be applied
Example: the CMSTP
Consider the neighbourhood NS1, which adds one edge and deletes another
• if the two edges are in the same subtree, the solution remains feasible
• if they are in different subtrees, one loses weight and the other acquires it: the variation is equal to the weight of the transferred subtree

Saving for each vertex the weight of its appended subtree, it is enough to compare that weight with the residual capacity of the subtree that receives it

Once the best exchange has been performed, this information must be updated in time O(n) by visiting the tree (each vertex has an appended subtree)
A general scheme of sophisticated exploration
The use of auxiliary information implies
1 the initialization of suitable
  • local data structures, related to the exploration of a single neighbourhood
  • global data structures, related to the whole search process
2 their update from a solution or an iteration to the following one

Algorithm SteepestDescent(I, x(0))
  x := x(0); GD := InitializeGD(); Stop := false;
  While Stop = false do
    x̄ := x; δ̄ := 0; LD := InitializeLD();
    For each x′ ∈ N(x) do
      If δ(x′) < δ̄ then x̄ := x′; δ̄ := δ(x′);
      LD := UpdateLD(LD, x′);
    EndFor;
    If f(x̄) ≥ f(x)
      then Stop := true;
      else x := x̄; GD := UpdateGD(GD, x);
    EndIf;
  EndWhile;
  Return (x, f(x));
Partial saving of the neighbourhood
Often, when performing operation o ∈ O on a solution x ∈ X, the variation δf(x, o) does not depend on x, or depends only on a part of x

In particular, it can happen that

δf(x′, o) = δf(x, o) for each o ∈ Ō, where x′ = o∗(x) is the new solution and Ō ⊂ O is a large subset of O

In that case, it is advantageous to
1 compute δf(x, o) for each o ∈ O and save the values
2 perform the best operation o∗, generating the new solution x′
3 compute δf(x′, o) only for o ∈ O \ Ō, and retrieve the values saved for o ∈ Ō, which are still valid
4 go back to point 2 (see the sketch below)
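A generic Python sketch of points 1–3; delta, affected and apply_move are problem-specific callbacks, and affected(o) returns the set of solution components touched by move o (all names are illustrative):

    def descent_with_saved_deltas(x, moves, delta, affected, apply_move):
        cache = {o: delta(x, o) for o in moves}   # point 1: evaluate every move once
        while True:
            best = min(cache, key=cache.get)
            if cache[best] >= 0:                  # no improving move is left
                return x
            x = apply_move(x, best)               # point 2: perform the best move
            dirty = affected(best)
            for o in moves:                       # point 3: re-evaluate only the moves
                if affected(o) & dirty:           # whose saved deltas became stale
                    cache[o] = delta(x, o)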
Example: the CMST
Consider the neighbourhood NS1 for the CMST:
• addition of an edge j ∈ E \ x
• deletion of an edge i ∈ x

Two branches are involved: one acquires a subtree, the other loses it

Once the exchange is performed, the effect of the exchanges involving other branches remains unchanged

δf(x, i, j) = δf(x′, i, j)

Therefore one can
• keep the set of the feasible exchanges, with the associated values δf
• perform the best exchange (i, j) and remove from the set the exchanges involving one of the two branches involved in the best one
• recompute the exchanges that involve any such branch and add them to the set

If the branches are numerous, the saving is very strong
Efficiency-effectiveness trade-off
The complexity depends on three factors
1 the number of local search steps
2 the cardinality of the visited neighbourhood
3 the cost of evaluating a single solution

Assuming an identical effectiveness, the first two factors are in clear conflict:
• a small neighbourhood is fast to explore, but requires several steps to reach a local optimum
• a large neighbourhood requires few steps, but is slow to explore

The optimal trade-off is somewhere in the middle: the neighbourhood should be
• large enough to include good solutions
• small enough to be explored quickly

A priori it is not easy to identify the best trade-off, because
• the effectiveness is not identical: large neighbourhoods have better local optima
• the efficiency quickly decreases as the neighbourhood grows
Fine tuning of the neighbourhoods
It is also possible to define a neighbourhood N and tune its size
• explore only a promising subneighbourhood N′ ⊂ N
  For example, if the objective function is additive, one can
  • add only elements j ∈ B \ x of low cost φj
  • delete only elements i ∈ x of high cost φi
• terminate the visit after finding a promising solution
  For example, the first-best strategy stops the exploration at the first solution better than the current one (see the sketch below)

  If f(x′) < f(x) then Stop := true;

The effectiveness depends on the objective
• if the cost of some elements influences the objective very much, it is worth taking it into account, fixing or forbidding those elements

and on the structure of the neighbourhood
• if the landscape is smooth, the first improving solution approximates well the best solution of the neighbourhood: it is better to stop
• if the landscape is rugged, the best solution of the neighbourhood could be much better: it is better to go on
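A minimal sketch of the first-best strategy:

    def first_best_step(x, f, N):
        # scan N(x) and stop at the first strictly improving neighbour;
        # returns None when x is locally optimal for this visit order
        fx = f(x)
        for x_prime in N(x):
            if f(x_prime) < fx:
                return x_prime
        return None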
Very Large Scale Neighbourhood Search
The steepest descent heuristic is
• very effective if the attraction basins are few and large
• ineffective if the attraction basins are many and small

Enlarging the neighbourhood in general yields larger attraction basins, but also longer exploration times

The Very Large Scale Neighbourhood (VLSN) Search approaches use
• neighbourhoods exponential in |B|
• explored in polynomial time

Two strategies prevent the computational time from becoming exponential
1 explore the neighbourhood heuristically, returning a candidate solution instead of the best of the whole neighbourhood
2 select a neighbourhood in which the objective can be optimized in polynomial time even if the number of solutions is exponential
Variable Depth Search (VDS)
The neighbourhoods based on operations are easily parametrizable

NOk(x) = {x′ ∈ X : x′ = ok(ok−1(. . . o1(x))) with o1, . . . , ok ∈ O}

where k is the number of operations performed to generate the solutions

The idea is to
• define a composite move as a sequence of elementary moves
• build the sequence optimizing each elementary step
• accept the final solution only if it improves the starting one

Parameter k assumes high values only when profitable: it adapts at each step

Similar considerations hold for the neighbourhoods based on distance
Scheme of the Variable Depth Search
Given x(t), for each x′ ∈ N(x(t)), instead of evaluating only f(x′)
1 select the best solution x′′ in a neighbourhood N̄(x′) ⊆ N(x′)
2 if x′′ improves the starting solution x(t), perform the move and go back to point 1; otherwise terminate
3 return the best solution found during the whole process

Where steepest descent would simply compute f(x′), the Variable Depth Search performs

  y := x′; y∗ := x′; Stop := false;
  While Stop = false do
    ȳ := arg min{ f(y′) : y′ ∈ N̄(y) };
    If f(ȳ) ≥ f(x(t)) then Stop := true; else y := ȳ;
    If f(y) < f(y∗) then y∗ := y;
  EndWhile;
  Compute f(y∗);

It is a sort of roll-out for exchange algorithms
Differences with respect to steepest descent
How does it differ from the steepest descent exploration?
• it finds a local optimum for each solution of the neighbourhood, as if it performed a sort of one-step look-ahead
• it admits worsenings along the sequence of elementary moves (never with respect to the starting solution)
• it requires devices to avoid that the elementary moves of the sequence cancel each other, creating a cycle
• often, for efficiency, it uses
  • a reduced neighbourhood N̄ ⊂ N to scan the elementary moves
  • the first-best strategy in the exploration of N̄ (elementary step)
  • the first-best strategy in the exploration of N (starting neighbourhood)
Lin-Kernighan’s algorithm for the TSP
The neighbourhood NRk(x) for the symmetric TSP includes the solutions obtained by
• deleting k arcs of x
• adding other k arcs that rebuild a Hamiltonian circuit
• possibly inverting parts of the circuit (leaving the cost unchanged)

Lin-Kernighan’s algorithm applies the VDS to sequences of 2-opt exchanges: each k-opt exchange is equivalent to a sequence of (k − 1) 2-opt exchanges, each of which deletes one of the two arcs added by the previous one

Then, for each solution x′ ∈ NR2(x) obtained by the exchange (i, j)
• evaluate the 2-opt exchanges that delete the added arc (si, sj+1) and one arc of x ∩ x′, to find the best exchange (i′, j′)
• if it improves on x, perform the exchange (i′, j′), obtaining x′′
• evaluate the exchanges that delete the arc (si′, sj′+1) and one arc of x ∩ x′′. . .
• . . .
• if the best solution among x′, x′′, . . . is better than x, accept it
Example: Lin-Kernighan’s algorithm
Start with the exchange (v1, v4), which reverts the stretch (v2, . . . , v4)
Example: Lin-Kernighan’s algorithm
The exchange has replaced (v1, v2) and (v4, v3) with (v1, v4) and (v2, v3)
Now scan the exchanges that delete (v1, v4) and another arc of x ∩ x ′
Example: Lin-Kernighan’s algorithm
The best exchange replaces (v1, v4) and (v6, v5) with (v1, v6) and (v4, v5)
Now scan the exchanges that delete (v1, v6) and another arc of x ∩ x ′′
Example: Lin-Kernighan’s algorithm
The best exchange replaces (v1, v6) and (v8, v7) with (v1, v8) and (v6, v7)
Now scan the exchanges that delete (v1, v8) and another arc of x ∩ x ′′′
Example: Lin-Kernighan’s algorithm
The best exchange replaces (v1, v8) and (v10, v9) with (v1, v10) and(v9, v8)
Now scan the exchanges that delete (v1, v10) and another arc of x ∩ x iv
Example: Lin-Kernighan’s algorithm
The best exchange replaces (v1, v10) and (v12, v11) with (v1, v12) and(v11, v10)
Now scan the exchanges that delete (v1, v12) and another arc:as they are all worsening with respect to the starting solution, terminate
Implementation details
• to avoid destroying the already performed moves, the second arc deleted at each step must belong to the starting solution (the first is one of those added by the previous exchange)
• this implies an upper bound on the length of the sequence
• it can be proved that stopping the sequence as soon as the exchanges no longer improve the starting solution does not impair the result
  • the overall variation of the objective is the sum of the variations due to the single exchanges

  δ(x, o1, . . . , ok) = Σ_{ℓ=1..k} δ(x, oℓ)

  • every sequence of numbers with negative sum admits a cyclic permutation whose partial sums are all negative: for example, (3, −5, 1, −2) has sum −3 and its cyclic permutation (−5, 1, −2, 3) has partial sums −5, −4, −6, −3
  • therefore, there is a cyclic permutation of the moves o1, . . . , ok that is advantageous at each step

  δ(x, o1, . . . , ok) < 0 ⇒ ∃h : δ(x, oh+1, . . . , oh+ℓ) < 0 for each ℓ = 1, . . . , k (indices taken cyclically)
Destroy-and-repair methods
Each exchange from solution x to solution x′ can be seen as
• the addition to x of the subset A = x′ \ x
• the deletion from x of the subset D = x \ x′

Combined additions and deletions produce exchanges, but
• all the subsets x ∪ {j} \ {i} obtained by exchanges of single elements could be unfeasible or very bad
• enlarging the neighbourhood to exchanges of more elements can be inefficient
• in many problems A and D can have different cardinalities, since the cardinality of the solutions is nonuniform (e.g., KP, SCP. . . )

The alternative is
• deleting a subset D ⊂ x of cardinality ≤ k and completing the result with a constructive heuristic (especially if x \ D ∈ X for each D)
• adding a set A ⊂ B \ x of cardinality ≤ k and reducing x with a destructive heuristic (especially if x ∪ A ∈ X for each A)
Destroy-and-repair methods
The complexity decreases from O(n^|A| n^|D| γ(n)), where
• the number of the possible subsets to add is O(n^|A|)
• the number of the possible subsets to delete is O(n^|D|)
• γ(n) is the complexity of evaluating feasibility and objective

to O(n^|D| Tconstr(n)) or O(n^|A| Tdestr(n)), where
• Tconstr(n) is the complexity of the constructive heuristic
• Tdestr(n) is the complexity of the destructive heuristic

The complexity can be further reduced as already described
• focus the exchanges on elements of low or high cost
• apply the first-best strategy instead of the global-best strategy
Efficient visit of exponential neighbourhoods
Another family of methods is based on neighbourhoods whose optimal solution can be found by solving a problem on an auxiliary matrix or graph
• packing: Dynasearch
• negative cost cycle: cyclic exchanges
• shortest path: ejection chains, order-and-split

Such auxiliary tools are usually called improvement matrices or improvement graphs
Dynasearch
The variation δ(x, o) of the objective function implied by an elementary move o ∈ O often depends only on a part of the solution

Operations o′ which act on other parts of the solution have an independent effect: the order in which they are performed is irrelevant

f(o′(o(x))) = f(o(o′(x)))

If f(·) is additive, the effects of the two exchanges simply sum up

Examples:
• exchanges between different branches for the CMSTP, or between different paths for the VRP
• 2-opt exchanges for the TSP operating on disjoint segments of the circuit
Dynasearch
The composite move is a set of elementary moves, as in VDS, but the elementary moves must have mutually independent effects on feasibility and the objective

The situation can be modelled with an improvement matrix A in which
• the rows represent the components of the solution (e.g., branches in the CMSTP, paths in the VRP, circuit segments in the TSP)
• the columns represent the elementary moves: each column has a value equal to the objective improvement −δf
• if move j impacts on component i, aij = 1; otherwise aij = 0

Determine the optimum packing of the columns, that is the subset of nonconflicting columns of maximum value

The Set Packing Problem is in general NP-hard, but
• on special matrices it is polynomial (this is the case of the TSP)
• if each move modifies at most two components
  • the rows can be seen as nodes
  • the columns can be seen as edges
  • each packing of columns becomes a matching
  and the maximum matching problem is polynomial
Cyclic exchanges
In many problems
• the feasible solutions partition the elements into components Sℓ (the vertices or edges into branches for the CMSTP, the nodes or arcs into paths for the VRP, the objects into containers in the BPP, etc.)
• feasibility is associated with the single components
• the objective function is additive with respect to the components

f(x) = Σ_{ℓ=1..r} f(Sℓ)

It is natural to define for these problems the set of operations Tk, which includes the transfers of k elements from their component to another one; Tk generates the neighbourhood NTk
• often the feasibility constraints forbid the simple transfers
• but the number of multiple transfers quickly grows with k

We want to find a subset of NTk that is efficient to explore
The improvement graph
The improvement graph allows to describe sequences of transfers
• a node i corresponds to an element i of the ground set B
• an arc (i, j) corresponds to
  • the transfer of element i from its current component Si to the current component Sj of element j
  • the deletion of element j
• the cost cij of the arc corresponds to the variation of the contribution of Sj to the objective

cij = f(Sj ∪ {i} \ {j}) − f(Sj)

with cij = +∞ if it is unfeasible to transfer i deleting j

A circuit in such a graph corresponds to a closed sequence of transfers, and the cost of the circuit corresponds to the cost of the sequence
• but only if each node belongs to a different component

Find the minimum cost circuit satisfying this condition (a sketch of the arc-cost construction follows)
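A sketch of the arc-cost construction in Python (comp maps each element to the index of its component, S lists the component sets, f is the component cost and feasible the feasibility test; all names are illustrative):

    from math import inf

    def improvement_graph_costs(B, comp, S, f, feasible):
        # c[i, j] = f(S_j u {i} \ {j}) - f(S_j), or +inf if the transfer is unfeasible
        c = {}
        for i in B:
            for j in B:
                if comp[i] == comp[j]:
                    continue                       # i must come from another component
                S_new = (S[comp[j]] | {i}) - {j}   # j leaves, i enters its component
                c[i, j] = f(S_new) - f(S[comp[j]]) if feasible(S_new) else inf
        return c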
Example: the CMSTP
Composite move:
• vertex 4 moves from the red branch to the blue branch
• vertex 3 moves from the blue branch to the green branch
• vertex 11 moves from the green branch to the brown branch
• vertex 8 moves from the brown branch to the red branch

The weights of the various branches remain the same (if wv = 1) or similar, so the solution is certainly (or probably) feasible
Search for the minimum cost circuit
The problem is actually NP-hard, but
• the constraint of visiting each component at most once allows a rather efficient dynamic programming algorithm (if there are r components, the circuit has at most r arcs)
• since a sequence of numbers with negative sum always admits a cyclic permutation with all negative partial sums, one can delete all the partial paths of cost ≥ 0
• there are polynomial algorithms to compute
  • any negative circuit (Floyd-Warshall)
  • the minimum average cost circuit (total cost divided by the number of arcs)
  Even violating the constraint on the components, these provide
  • an underestimate, which can prove the nonexistence of negative circuits
  • a negative circuit, which can respect the constraint by chance or be modified so as to respect it
• there are polynomial heuristic algorithms
Noncyclic exchange chains
It is also possible to create noncyclic transfer chains, so that the cardinality of the components can vary

It is enough to add to the improvement graph
• a source node
• a node for each component
• arcs from the source node to the nodes associated with the elements
• arcs from the nodes associated with the elements to the nodes associated with the components

Then, find the minimum cost path that
• starts from the source node and ends in a component node
• visits a single component node (the topology guarantees that)
• if it visits a component node, does not visit any element node associated with it

These paths correspond to open transfer chains in which
• a component loses an element
• zero or more components lose an element and acquire another one
• a component acquires an element
Example: the CMSTP [figure only in the original slides]
Order-first split-second
The Order-first split-second method for partition problems
• builds a starting permutation of the elements
• partitions the elements into components in an optimum way, under the additional constraint that elements of the same component be consecutive in the starting permutation

Of course, the solution depends on the starting permutation: it is reasonable to repeat the resolution for different permutations, creating a two-level method
1 at the upper level, the permutation is modified
2 at the lower level, the optimal partition for that permutation is computed

Problem: since the permutations are more numerous than the solutions, different permutations can yield the same solution
The auxiliary graph
Once again, we exploit an auxiliary graph

Given the permutation (s1, . . . , sn) of the elements of the ground set E
• each node si corresponds to an element si of the ground set E, plus a fictitious node s0
• each arc (si, sj) with i < j corresponds to a potential component Sℓ = (si+1, . . . , sj) formed by the elements of the permutation from si excluded to sj included
• the cost csi,sj corresponds to the cost f(Sℓ) of the component
• the arc does not exist if the component is unfeasible

Consequently
• each path from s0 to sn represents a partition of B
• the cost of the path coincides with the cost of the partition
• the graph is acyclic: finding the optimum path costs O(m), where m ≤ n(n − 1)/2 is the number of arcs (see the sketch below)
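A sketch of the optimum-path computation by dynamic programming over this acyclic graph (cost(i, j) returns the cost of the component formed by the elements in positions i + 1, . . . , j of the permutation, or +inf if it is unfeasible; names are illustrative):

    from math import inf

    def split(n, cost):
        # dist[j] = cost of the best partition of the first j elements;
        # the graph is acyclic, so one forward pass over the nodes suffices
        dist = [0.0] + [inf] * n
        pred = [0] * (n + 1)
        for j in range(1, n + 1):
            for i in range(j):
                if dist[i] + cost(i, j) < dist[j]:
                    dist[j] = dist[i] + cost(i, j)
                    pred[j] = i
        components, j = [], n          # walk back to recover the optimal partition
        while j > 0:
            components.append((pred[j], j))   # positions pred[j]+1 .. j
            j = pred[j]
        return dist[n], components[::-1]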
Example: the VRP
Given an instance of the VRP with 5 nodes and capacity W = 10, the arcs corresponding to unfeasible components (weight > W) do not exist, and the costs are the costs of the cyclic route (d, si+1, . . . , sj, d)

The optimum path corresponds to three cycles: (d, s1, s2, d), (d, s3, d) and (d, s4, s5, d)