Heuristic Algorithms for Combinatorial Optimization
PhD Course in Computer Science
Roberto Cordone
DI - Università degli Studi di Milano
E-mail: [email protected]
Web page: https://homes.di.unimi.it/cordone/courses/2017-ae/2017-ae.html
Lesson 4: Local search heuristics
Milano, A.A. 2017/18
Exchange algorithms
In Combinatorial Optimization each solution x is a subset of B

An exchange heuristic updates step by step a current subset x(t)
1 start from a feasible solution x(0) ∈ X found somehow (often by a constructive heuristic)
2 exchange subsets A external to the solution x(t) and subsets D internal to it (add/drop), so as to generate a family of feasible solutions

x′A,D = x ∪ A \ D with A ⊆ B \ x and D ⊆ x

3 at each step t, select which subsets to exchange based on a suitable selection criterion ϕ(x, A, D)

(A∗, D∗) = arg min(A,D) ϕ(x, A, D)

4 generate the new current solution performing the chosen modifications

x(t+1) := x(t) ∪ A∗ \ D∗

5 if a termination condition holds, terminate; otherwise, go back to point 2
Neighbourhood
An exchange heuristic is defined by:
• the pairs of subsets (A, D) available for an exchange, i.e. the solutions generated by a single exchange starting from x
• the selection criterion ϕ(x, A, D)

A neighbourhood N : X → 2^X is a function which associates to each feasible solution x ∈ X a subset of feasible solutions N(x) ⊆ X
The situation can be formally described with a search graph in which
• the nodes represent the feasible solutions x ∈ X
• the arcs connect each solution x to those of its neighbourhood N (x)
Given a search graph
• a run of the algorithm corresponds to a path
• an arc is also denoted as a move, because it transforms a solution into another one by moving elements
How does one define a neighbourhood and select a move?
Neighbourhoods based on distance
Every solution x ∈ X can be represented by its incidence vector

xi = 1 if i ∈ x, 0 if i ∈ B \ x

The Hamming distance between two solutions x and x′ is the number of elements in which their incidence vectors differ

dH(x, x′) = Σi∈B |xi − x′i|

Referring to the subsets, dH(x, x′) = |x \ x′| + |x′ \ x|

A typical definition of neighbourhood of x, with an integer parameter k, is the set of all solutions with a Hamming distance from x not larger than k

NHk(x) = {x′ ∈ X : dH(x, x′) ≤ k}
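Since solutions are subsets, the Hamming distance is just the size of the symmetric difference; a minimal Python sketch (the example instance is hypothetical):

    def hamming_distance(x: set, x_prime: set) -> int:
        # d_H(x, x') = |x \ x'| + |x' \ x| = size of the symmetric difference
        return len(x ^ x_prime)

    # hypothetical example: x = {1, 3, 4} and x' = {1, 2} differ in elements 2, 3, 4
    assert hamming_distance({1, 3, 4}, {1, 2}) == 3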
Example: the KP
The instance of KP with B = {1, 2, 3, 4}, w = [5 4 3 2] and W = 10 has the following solutions, where the subsets {1, 2, 3, 4}, {1, 2, 3} and {1, 2, 4} are unfeasible.

Solution x = {1, 3, 4} (shown in blue in the original figure) has a neighbourhood NH2(x) of 7 elements (in pink); the subsets in black do not belong to the neighbourhood, because their Hamming distance from x is > 2
Neighbourhoods based on operations
Another common definition of neighbourhood is obtained by defining
• a family O of operations on the solutions of the problem
• the set of the solutions generated applying to x the operations of O

NO(x) = {x′ ∈ X : ∃o ∈ O : o(x) = x′}

Considering again the example of the KP, O can be defined as
• adding to x an element of B \ x
• removing from x an element
• exchanging one element of x with one of B \ x

The resulting neighbourhood NO is related to those defined by the Hamming distance, but does not coincide with any of them

NH1 ⊂ NO ⊂ NH2

These neighbourhoods can be parametrized performing sequences of k operations of O instead of a single one, exactly as in the distance-based neighbourhoods

NOk(x) = {x′ ∈ X : ∃o1, . . . , ok ∈ O : ok(ok−1(. . . o1(x))) = x′}
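A minimal sketch of this operation-based neighbourhood for the KP, using the instance of the previous example (function and variable names are illustrative):

    from itertools import product

    def neighbourhood_O(x, B, w, W):
        # feasible neighbours of x under O = {add, drop, swap};
        # w[i] is the weight of element i, W the knapsack capacity
        weight = sum(w[i] for i in x)
        out = [j for j in B if j not in x]
        neighbours = [x | {j} for j in out if weight + w[j] <= W]    # additions
        neighbours += [x - {i} for i in x]                           # removals
        neighbours += [x - {i} | {j} for i, j in product(x, out)     # exchanges
                       if weight - w[i] + w[j] <= W]
        return neighbours

    B, w, W = {1, 2, 3, 4}, {1: 5, 2: 4, 3: 3, 4: 2}, 10
    print(neighbourhood_O({1, 3, 4}, B, w, W))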
Distance and operation-based neighbourhoods: differences
In general, an operation-based neighbourhood includes solutions with different Hamming distances

For the TSP one can define a neighbourhood NS1 including the solutions obtained by exchanging two nodes in their visit order

The neighbourhood of solution x = (3, 1, 4, 5, 2) is:

NS1(x) = {(1, 3, 4, 5, 2), (4, 1, 3, 5, 2), (5, 1, 4, 3, 2), (2, 1, 4, 5, 3), (3, 4, 1, 5, 2), (3, 5, 4, 1, 2), (3, 2, 4, 5, 1), (3, 1, 5, 4, 2), (3, 1, 2, 5, 4), (3, 1, 4, 2, 5)}

If the two exchanged nodes are adjacent in the tour, the modified arcs are 3 deleted and 3 added; otherwise, they are 4 and 4
Distance and operation-based neighbourhoods: relations
Sometimes the neighbourhoods obtained with the two definitions coincide
• for the MDP
  • neighbourhood NH2 (solutions at Hamming distance equal to 2)
  • neighbourhood NS1 (exchange one element of x with one of B \ x)
• for the BPP
  • neighbourhood NH2 (solutions at Hamming distance equal to 2)
  • neighbourhood NT1 (transfers of an object to another container)
• for SAT
  • neighbourhood NH2 (solutions at Hamming distance equal to 2)
  • neighbourhood NF1 (“flips”, replacing the truth assignment of a variable with the opposite one)

This is typical of problems with solutions of fixed cardinality:
• perform a sequence of k exchanges between single elements (|A| = |D| = 1): k elements go into x and k elements out of it
• the Hamming distance between the two extreme solutions is ≤ 2k (if all the exchanged elements are different, it is exactly 2k)
Different neighbourhoods for the same problem: the CMST
A problem can admit different neighbourhoods based on different operations

In the CMST it is possible to
• exchange edges: delete (i, j), add (i, n)
• exchange vertices: n moves from subtree 2 to subtree 1 (computing the edges that keep each subtree connected at minimum cost)
Different neighbourhoods for the same problem: the PMSP
For the PMSP it is possible to
• the transfer neighbourhood NT1, based on the set T1 of all transfers of a task to another machine
• the exchange neighbourhood NS1, based on the set S1 of the exchanges of two tasks between two machines (one from each machine)
Connectivity of the solution space
An exchange heuristic can find an optimal solution only if at least one optimal solution can be reached from each starting solution

The search graph is denoted as weakly connected to the optimum when it includes for each solution x ∈ X a path from x to X∗

As X∗ is unknown, a stronger condition is often used: the search graph is denoted as strongly connected when it includes for each pair of solutions x, y ∈ X a path from x to y

An exchange heuristic should guarantee one of these conditions

This is not always possible
• in the MDP, neighbourhood NS1 allows to connect any pair of solutions in at most k steps
• in the KP and the SCP, no neighbourhood NSk guarantees that, because feasible solutions can have any cardinality
• the search graph becomes connected also in the KP and the SCP if, besides exchanges, also removals and additions are admitted
Connectivity of the solution space
If feasibility is defined in a sophisticated way, exchanging, adding and deleting single elements can be insufficient to reach all solutions: the unfeasible subsets can break all paths between some feasible solutions

Given W = 8, there are only three feasible solutions, all with two subtrees of weight 4:
• x = {(r, a), (a, b), (b, e), (r, d), (c, d), (d, g), (f, g)}
• x = {(r, a), (a, e), (e, f), (f, g), (r, d), (c, d), (b, c)}
• x = {(r, a), (a, b), (b, c), (r, d), (d, g), (f, g), (e, f)}

The three solutions are mutually reachable only by exchanging two edges at a time; exchanging only one yields unfeasible subsets
Steepest descent (hill-climbing) heuristics
Quite often the selection criterion ϕ(x, A, D) of the new solution in N(x) is the objective function, that is, each step of the heuristic moves from the current solution to the best solution in its neighbourhood

To avoid cyclic behaviour, only strictly improving solutions are accepted

Algorithm SteepestDescent(I, x(0))
  x := x(0);
  Stop := false;
  While Stop = false do
    x̄ := arg min{ f(x′) : x′ ∈ N(x) };
    If f(x̄) ≥ f(x) then Stop := true; else x := x̄;
  EndWhile;
  Return (x, f(x));
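A minimal Python transcription of the scheme above, assuming the caller supplies the objective f and a generator N of feasible neighbours:

    def steepest_descent(x0, f, N):
        # N(x) yields the feasible neighbours of x; stops at the first local optimum
        x = x0
        while True:
            x_best = min(N(x), key=f, default=None)  # arg min over N(x)
            if x_best is None or f(x_best) >= f(x):  # no strictly improving neighbour
                return x, f(x)
            x = x_best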
Local and global optimality
A steepest descent heuristic terminates when it finds a locally optimal solution, that is a solution x̄ ∈ X such that

f(x̄) ≤ f(x) for each x ∈ N(x̄)

A globally optimal solution is always also locally optimal, but the opposite is not always true: denoting by XN the set of the local optima, X∗ ⊆ XN ⊆ X
Exact neighbourhood
An exact neighbourhood is a neighbourhood function N : X → 2^X such that each local optimum is also a global optimum

XN = X∗

Trivial case: the neighbourhood of each solution coincides with the whole set of feasible solutions (N(x) = X for each x ∈ X); it is a useless neighbourhood: too wide to explore

Exact neighbourhoods are extremely rare
• the exchange between basic and nonbasic variables used by the simplex algorithm for Linear Programming
• the exchange between edges for the Minimum Spanning Tree problem

In general, the steepest descent heuristic does not find a global optimum: its effectiveness depends on the properties of the search graph and of the objective
Properties of the search graph

Some relevant properties are
• the size of the search space |X|
• the connectivity of the search graph (as discussed above)
• the diameter of the search graph, that is the number of arcs of the minimum path between the two farthest solutions: a richer neighbourhood produces graphs of smaller diameter (but other factors exist: see the “small world” effect)

For example, for the symmetric TSP on a complete graph with the neighbourhood NS1:
• the search space includes |X| = (n − 1)! solutions
• the neighbourhood NS1 (two-node exchange) includes (n choose 2) = n(n − 1)/2 solutions
• the search graph is strongly connected and has diameter ≤ n − 2: each solution transforms into any other with at most n − 2 exchanges

For example, x = (1, 5, 4, 2, 3) becomes x′ = (1, 2, 3, 4, 5) in 3 steps

x = (1, 5, 4, 2, 3) → (1, 2, 4, 5, 3) → (1, 2, 3, 5, 4) → (1, 2, 3, 4, 5) = x′

(the first node is always 1, the last one is automatically managed)
Properties of the search graph
Other relevant properties
• the density of the global optima (|X∗|/|X|) and of the local optima (|XN|/|X|): if the local optima are numerous, it is hard to find the global ones
• the distribution of the quality δ(x) of the local optima (SQD diagram): if the local optima are good, it is less important to find a global one
• the distribution of the locally optimal solutions in the search space: if the local optima are close to each other, it is not necessary to explore the whole space

These indices would require an exhaustive exploration of the search graph; in practice, one performs a sampling, but such analyses
• require very long times
• can be misleading, especially if the global optima are unknown
Example: the TSP
Typical results for the TSP on a complete symmetric graph with Euclidean costs
• the Hamming distance between two local optima is on average ≪ n: the local optima gather in a small region of X
• the Hamming distance between local optima on average exceeds that between local and global optima: the global optima tend to gather in the middle of the local optima
• the FDC diagram (Fitness-Distance Correlation) relates the quality δ(x) to the distance from the global optima dH(x, X∗): if there is a correlation, the best local optima are closer to the global optima
Fitness-Distance Correlation
If the correlation between quality and closeness to the global optima is strong
• building good starting solutions is profitable, because they initialize the local search near a good local optimum
• it is better to intensify than to diversify

On the contrary, if the correlation is weak
• a good initialization is less important
• it is better to diversify than to intensify

This happens, for example, for the Quadratic Assignment Problem (QAP)
Landscape
A landscape is a triple (X, N, f), where
• X is the search space, or set of feasible solutions
• N : X → 2^X is the neighbourhood function
• f : X → N is the objective function

It can be seen as the search graph with node weights given by the objective

The effectiveness of exchange heuristics strongly depends on the landscape: rugged landscapes are associated with many local optima, hence with less effective heuristics
Different kinds of landscape
There is a great variety of landscapes, very different from one another
Autocorrelation coefficient
The complexity of a landscape can be empirically estimated by
1 performing a random walk in the search graph
2 determining the sequence of values of the objective f(1), . . . , f(tmax)
3 computing the sample mean f̄ = (1/tmax) Σ_{t=1..tmax} f(t)
4 computing the empirical autocorrelation coefficient (see the sketch below)

r(i) = [ Σ_{t=1..tmax−i} (f(t) − f̄)(f(t+i) − f̄) / (tmax − i) ] / [ Σ_{t=1..tmax} (f(t) − f̄)² / tmax ]

This is a function of i starting from r(0) = 1; in general it decreases
• if it keeps r(i) ≈ 1, the landscape is smooth:
  • the neighbour solutions have values close to the current one
  • there are few local optima
  • the steepest descent heuristic is effective
• if it varies steeply, the landscape is rugged:
  • the neighbour solutions have values far from the current one
  • there are many local optima
  • the steepest descent heuristic is ineffective
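A minimal sketch of this estimate, given the list fs = [f(1), . . . , f(tmax)] of objective values sampled along a random walk:

    def autocorrelation(fs, i):
        # empirical autocorrelation r(i): a direct transcription of the formula above
        tmax = len(fs)
        mean = sum(fs) / tmax
        num = sum((fs[t] - mean) * (fs[t + i] - mean)
                  for t in range(tmax - i)) / (tmax - i)
        den = sum((ft - mean) ** 2 for ft in fs) / tmax
        return num / den  # r(0) = 1 by construction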
Plateau
The search graph can be analyzed dividing it by objective levels
• a plateau of value f is any subset of solutions of value f that are adjacent in the search graph

Very large plateaus complicate the choice of the solution in the steepest descent heuristic, because they make it depend on the visit order of N(x)

An extremely uniform landscape is not an advantage!

Example: all transfers and exchanges between machines 1 and 3 leave the objective value unchanged (the other moves worsen it)
Attraction basins
An alternative partition of the search graph is based on the concept of:
• the attraction basin of a locally optimal solution x̄, that is the set of the solutions x(0) ∈ X such that the steepest descent heuristic starting from x(0) returns x̄
The steepest descent heuristic is
• effective if the attraction basins are few and large (especially if the global optima have larger basins)
• ineffective if the attraction basins are many and small (especially if the global optima have smaller basins)
Complexity
Algorithm SteepestDescent(I, x(0))
  x := x(0);
  Stop := false;
  While Stop = false do
    x̄ := arg min{ f(x′) : x′ ∈ N(x) };
    If f(x̄) ≥ f(x) then Stop := true; else x := x̄;
  EndWhile;
  Return (x, f(x));

The complexity of the steepest descent heuristic depends on
1 the number of steps, which depends on the structure of the search graph (width of the attraction basins) and is hard to predict a priori
2 the search for the best solution in the neighbourhood, which depends on how the search itself is performed
The exploration of the neighbourhood
Two strategies are possible
1 exhaustive search: evaluate all the neighbour solutions; the complexity of a single step is the product of
  • the number of neighbour solutions (|N(x)|)
  • the time needed to evaluate the cost of each solution (γN(|B|, x))
  Sometimes it is not easy to restrict the visit only to the neighbour solutions:
  • visit a superset of the neighbourhood
  • for each element, evaluate its feasibility
  • for the feasible ones, evaluate the cost
2 efficient exploration of the neighbourhood: instead of visiting the whole neighbourhood, find its optimal solution by solving an auxiliary problem; only some special neighbourhoods allow that
Exhaustive visit of the neighbourhood
Algorithm SteepestDescent(I, x(0))
  x := x(0);
  Stop := false;
  While Stop = false do
    x̄ := x; { computes x̄ := arg min{ f(x′) : x′ ∈ N(x) } }
    For each x′ ∈ N(x) do
      If f(x′) < f(x̄) then x̄ := x′;
    EndFor;
    If f(x̄) ≥ f(x) then Stop := true; else x := x̄;
  EndWhile;
  Return (x, f(x));

The complexity is the product of three terms
1 the number of iterations tmax performed to reach the local optimum
2 the number of solutions |N(x(t))| visited at each iteration
3 the time γN(x(t), |B|) needed to evaluate the objective

In general, the three terms are overestimated with expressions depending on |B|, but not on x(t)
Objective evaluation time: the additive case
The first device to accelerate an exchange algorithm is to minimize the time needed to evaluate the objective: in general, updating the value of f(x) costs much less than recomputing it

For example, if f(x) is additive and the numbers of added (|A|) and deleted (|D|) elements are small, the evaluation takes constant time:
• sum φj for each element j ∈ A, added to x
• subtract φi for each element i ∈ D, deleted from x

Moving from solution x to x′ = x \ {i} ∪ {j}, the objective changes by

δf(x, i, j) = f(x \ {i} ∪ {j}) − f(x) = φj − φi

Notice that δf(x, i, j) does not depend on x: we will come back to this later

This holds for the simple exchange of objects in the KP and of edges in the CMSTP
Example: the symmetric TSP
The neighbourhood NR2 for the TSP
• deletes two nonconsecutive arcs (si, si+1) and (sj, sj+1)
• adds the two arcs (si, sj) and (si+1, sj+1)
• reverts the path (si+1, . . . , sj) (modifying O(n) arcs!)

If the graph and the cost function are symmetric, the variation of f(x) is

δf(x, i, j) = csi,sj + csi+1,sj+1 − csi,si+1 − csj,sj+1
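A sketch of this constant-time evaluation (the tour s is a list of nodes and c a symmetric cost matrix; both are hypothetical inputs):

    def delta_2opt(s, c, i, j):
        # delete (s[i], s[i+1]) and (s[j], s[j+1]); add (s[i], s[j]) and (s[i+1], s[j+1]);
        # the reversed path keeps its cost because c is symmetric
        n = len(s)
        a, b = s[i], s[(i + 1) % n]
        d, e = s[j], s[(j + 1) % n]
        return c[a][d] + c[b][e] - c[a][b] - c[d][e]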
In many other cases, however, the function is not additive
Quadratic objective functions
The MDP has a quadratic objective function: computing it costs Θ(n²)

Moving from x to x′ = x \ {i} ∪ {j} (neighbourhood NS1), the objective changes by

δf(x, i, j) = f(x \ {i} ∪ {j}) − f(x) = Σ_{h,k ∈ x\{i}∪{j}} dhk − Σ_{h,k ∈ x} dhk

which depends on O(n) distance terms, related to points i and j

There is a general trick for symmetric quadratic functions

δf(x, i, j) = Σ_{h ∈ x\{i}∪{j}} Σ_{k ∈ x\{i}∪{j}} dhk − Σ_{h ∈ x} Σ_{k ∈ x} dhk = 2 Σ_{k∈x} djk − 2 Σ_{k∈x} dik − 2 dij = 2 (Dj(x) − Di(x) − dij)

If Dℓ(x) = Σ_{k∈x} dℓk is known for each ℓ ∈ B, the computation takes O(1)
Example: the MDP

Let us consider f(x)/2 and evaluate the exchange

x → x′ = x \ {i} ∪ {j} with i ∈ x and j ∈ B \ x

f(x′) = f(x) − Di + Dj − dij

• the pairs including i are lost (term −Di)
• the pairs including j are acquired (term +Dj)
• but the pair (i, j) is counted in excess (term −dij)

[Figure in the original slides: the exchange of i ∈ x with j ∈ E \ x, with the three terms highlighted in turn]
Example: the MDP

Update of the data structures:

Dℓ := Dℓ − dℓi + dℓj for each ℓ ∈ B

Each element ℓ sees
• dℓi disappear
• dℓj appear
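A sketch of the whole mechanism in Python (d is the distance matrix, B the ground set, and D the vector of auxiliary values defined above; all names are illustrative):

    def mdp_swap_delta(D, d, i, j):
        # O(1) evaluation of x -> x \ {i} u {j}, with D[l] = sum of d[l][k] for k in x
        return 2 * (D[j] - D[i] - d[i][j])

    def mdp_apply_swap(D, d, i, j, B):
        # O(n) refresh of the auxiliary values after performing the swap
        for l in B:
            D[l] += d[l][j] - d[l][i]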
Use of auxiliary information
Many nonlinear functions can be updated in a similar way
• save aggregated information on the current solution x(t)
• use it to compute f(x′) efficiently for each x′ ∈ N(x(t))
• update it when moving to the following solution x(t+1)

In the PMSP with the transfer (NT1) and exchange (NS1) neighbourhoods, the objective can be evaluated in constant time by saving and updating
• the completion time of each machine
• the indices of the machines with the maximum and second maximum completion time
Example: the PMSP
Consider the exchange o = (i, j) of tasks i and j (i on machine Mi, j on machine Mj)
• compute in constant time the new completion times: one increases, the other decreases (or both remain constant)
• test in constant time whether either exceeds the current maximum
• if the maximum time decreases, test in constant time whether the other modified time or the second maximum becomes the new maximum

Once the neighbourhood has been visited and the exchange selected, update
• the two modified completion times (each one in constant time)
• their positions in a max-heap (each one in time O(log |M|)), as sketched below
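A sketch of the constant-time evaluation (C holds the machine completion times and p the processing times; others_max is the maximum over the untouched machines, recovered from the stored maximum and second maximum; all names are illustrative):

    def evaluate_task_exchange(C, p, i, j, Mi, Mj, others_max):
        # new completion times of the only two machines touched by the exchange
        new_Ci = C[Mi] - p[i] + p[j]
        new_Cj = C[Mj] - p[j] + p[i]
        # new makespan: every untouched machine is dominated by others_max
        return max(new_Ci, new_Cj, others_max)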
Use of auxiliary information
The information can refer to
• the current solution x
• the previous solution visited in the neighbourhood, according to a suitable order

Consider the neighbourhood NR2 for the asymmetric TSP:
• the neighbour solutions differ from x in O(n) arcs
• the neighbour solutions differ from each other in O(n) arcs
• if the pairs of deleted arcs (si, si+1) and (sj, sj+1) follow the lexicographic order, the reverted path changes only by one arc
Example: the asymmetric TSP
In general, the variation of f(x) is

δf(x, i, j) = csi,sj + csi+1,sj+1 − csi,si+1 − csj,sj+1 + csj...si+1 − csi+1...sj

where csi+1...sj and csj...si+1 denote the costs of the internal path in the two directions

Moving from exchange (si, sj) to exchange (si, sj+1)
• the first four terms change, but they are inputs of the problem
• the last two terms can be updated in constant time

csj+1...si+1 = csj...si+1 + csj+1,sj and csi+1...sj+1 = csi+1...sj + csj,sj+1

What is the effect of exploring the neighbourhood in a predefined order?
And feasibility?
Some operations of the neighbourhood can generate unfeasible subsets

N̄O(x) = {x′ ⊆ B : ∃o ∈ O : o(x) = x′} ⊇ NO(x) = N̄O(x) ∩ X

(for example, this holds for the KP, BPP, SCP, CMSTP. . . )

In that case, for each element of N̄O(x) one needs to
• test feasibility
• for the feasible elements, evaluate the cost

To test feasibility, the same techniques used for the objective can be applied
Example: the CMSTP
Consider the neighbourhood NS1, which adds one edge and deletes another
• if the two edges are in the same subtree, the solution remains feasible
• if they are in different subtrees, one loses weight and the other acquires it: the variation is equal to the weight of the transferred subtree

Saving for each vertex the weight of its appended subtree, it is enough to compare that weight with the residual capacity of the subtree that receives it

Once the best exchange has been performed, this information must be updated in time O(n) by visiting the tree (each vertex has an appended subtree)
A general scheme of sophisticated exploration
The use of auxiliary information implies
1 the initialization of suitable
  • local data structures, related to the exploration of a single neighbourhood
  • global data structures, related to the whole search process
2 their update from a solution or an iteration to the following one

Algorithm SteepestDescent(I, x(0))
  x := x(0); GD := InitializeGD(); Stop := false;
  While Stop = false do
    x̄ := x; δ̄ := 0; LD := InitializeLD();
    For each x′ ∈ N(x) do
      If δ(x′) < δ̄ then x̄ := x′; δ̄ := δ(x′);
      LD := UpdateLD(LD, x′);
    EndFor;
    If f(x̄) ≥ f(x)
      then Stop := true;
      else x := x̄; GD := UpdateGD(GD, x);
    EndIf;
  EndWhile;
  Return (x, f(x));
Partial saving of the neighbourhood
Often, when performing operation o ∈ O on a solution x ∈ X, the variation δf(x, o) does not depend on x, or depends only on a part of x

In particular, it can happen that

δf(x′, o) = δf(x, o) for each o ∈ Ō, where x′ = o∗(x) is the new solution and Ō ⊂ O is a large subset of O

In that case, it is advantageous to
1 compute δf(x, o) for each o ∈ O and save the values
2 perform the best operation o∗, generating the new solution x′
3 compute δf(x′, o) only for o ∈ O \ Ō, and retrieve the values saved for o ∈ Ō, which are still valid
4 go back to point 2 (see the sketch below)
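A generic Python sketch of points 1–3; delta, affected and apply_move are problem-specific callbacks, and affected(o) returns the set of solution components touched by move o (all names are illustrative):

    def descent_with_saved_deltas(x, moves, delta, affected, apply_move):
        cache = {o: delta(x, o) for o in moves}   # point 1: evaluate every move once
        while True:
            best = min(cache, key=cache.get)
            if cache[best] >= 0:                  # no improving move is left
                return x
            x = apply_move(x, best)               # point 2: perform the best move
            dirty = affected(best)
            for o in moves:                       # point 3: re-evaluate only the moves
                if affected(o) & dirty:           # whose saved deltas became stale
                    cache[o] = delta(x, o)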
Example: the CMST
Consider the neighbourhood NS1 for the CMST:
• addition of an edge j ∈ E \ x
• deletion of an edge i ∈ x

Two branches are involved: one acquires a subtree, the other loses it

Once the exchange is performed, the effect of the exchanges involving other branches remains unchanged

δf(x, i, j) = δf(x′, i, j)

Therefore one can
• keep the set of the feasible exchanges, with the associated values δf
• perform the best exchange (i, j) and remove from the set the exchanges involving one of the two branches involved in the best one
• recompute the exchanges that involve any such branch and add them to the set

If the branches are numerous, the saving is very strong
Efficiency-effectiveness trade-off
The complexity depends on three factors
1 the number of local search steps
2 the cardinality of the visited neighbourhood
3 the cost of evaluating a single solution

Assuming an identical effectiveness, the first two factors are in clear conflict:
• a small neighbourhood is fast to explore, but requires several steps to reach a local optimum
• a large neighbourhood requires few steps, but is slow to explore

The optimal trade-off is somewhere in the middle: the neighbourhood should be
• large enough to include good solutions
• small enough to be explored quickly

A priori it is not easy to identify the best trade-off, because
• the effectiveness is not identical: large neighbourhoods have better local optima
• the efficiency quickly decreases as the neighbourhood grows
Fine tuning of the neighbourhoods
It is also possible to define a neighbourhood N and tune its size
• explore only a promising subneighbourhood N′ ⊂ N
  For example, if the objective function is additive, one can
  • add only elements j ∈ B \ x of low cost φj
  • delete only elements i ∈ x of high cost φi
• terminate the visit after finding a promising solution
  For example, the first-best strategy stops the exploration at the first solution better than the current one (see the sketch below)

  If f(x′) < f(x) then Stop := true;

The effectiveness depends on the objective
• if the cost of some elements influences the objective very much, it is worth taking it into account, fixing or forbidding those elements

and on the structure of the neighbourhood
• if the landscape is smooth, the first improving solution approximates well the best solution of the neighbourhood: it is better to stop
• if the landscape is rugged, the best solution of the neighbourhood could be much better: it is better to go on
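A minimal sketch of the first-best strategy:

    def first_best_step(x, f, N):
        # scan N(x) and stop at the first strictly improving neighbour;
        # returns None when x is locally optimal for this visit order
        fx = f(x)
        for x_prime in N(x):
            if f(x_prime) < fx:
                return x_prime
        return None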
Very Large Scale Neighbourhood Search
The steepest descent heuristic is
• very effective if the attraction basins are few and large
• ineffective if the attraction basins are many and small

Enlarging the neighbourhood in general yields larger attraction basins, but also longer exploration times

The Very Large Scale Neighbourhood (VLSN) Search approaches use
• neighbourhoods exponential in |B|
• explored in polynomial time

Two strategies prevent the computational time from becoming exponential
1 explore the neighbourhood heuristically, returning a candidate solution instead of the best of the whole neighbourhood
2 select a neighbourhood in which the objective can be optimized in polynomial time even if the number of solutions is exponential
Variable Depth Search (VDS)
The neighbourhoods based on operations are easily parametrizable

NOk(x) = {x′ ∈ X : x′ = ok(ok−1(. . . o1(x))) with o1, . . . , ok ∈ O}

where k is the number of operations performed to generate the solutions

The idea is to
• define a composite move as a sequence of elementary moves
• build the sequence optimizing each elementary step
• accept the final solution only if it improves the starting one

Parameter k assumes high values only when profitable: it adapts at each step

Similar considerations hold for the neighbourhoods based on distance
Scheme of the Variable Depth Search
Given x(t), for each x′ ∈ N(x(t)), instead of evaluating only f(x′)
1 select the best solution x′′ in a neighbourhood N̄(x′) ⊆ N(x′)
2 if x′′ improves the starting solution x(t), perform the move and go back to point 1; otherwise terminate
3 return the best solution found during the whole process

Where steepest descent would simply compute f(x′), the Variable Depth Search performs

  y := x′; y∗ := x′; Stop := false;
  While Stop = false do
    ȳ := arg min{ f(y′) : y′ ∈ N̄(y) };
    If f(ȳ) ≥ f(x(t)) then Stop := true; else y := ȳ;
    If f(y) < f(y∗) then y∗ := y;
  EndWhile;
  Compute f(y∗);

It is a sort of roll-out for exchange algorithms
Differences with respect to steepest descent
How does it differ from the steepest descent exploration?
• it finds a local optimum for each solution of the neighbourhood, as if it performed a sort of one-step look-ahead
• it admits worsenings along the sequence of elementary moves (never with respect to the starting solution)
• it requires devices to avoid that the elementary moves of the sequence cancel each other, creating a cycle
• often, for efficiency, it uses
  • a reduced neighbourhood N̄ ⊂ N to scan the elementary moves
  • the first-best strategy in the exploration of N̄ (elementary step)
  • the first-best strategy in the exploration of N (starting neighbourhood)
Lin-Kernighan’s algorithm for the TSP
The neighbourhood NRk(x) for the symmetric TSP includes the solutions obtained by
• deleting k arcs of x
• adding other k arcs that rebuild a Hamiltonian circuit
• possibly inverting parts of the circuit (leaving the cost unchanged)

Lin-Kernighan’s algorithm applies the VDS to sequences of 2-opt exchanges: each k-opt exchange is equivalent to a sequence of (k − 1) 2-opt exchanges, each of which deletes one of the two arcs added by the previous one

Then, for each solution x′ ∈ NR2(x) obtained by the exchange (i, j)
• evaluate the 2-opt exchanges that delete the added arc (si, sj+1) and one arc of x ∩ x′, to find the best exchange (i′, j′)
• if it improves on x, perform the exchange (i′, j′), obtaining x′′
• evaluate the exchanges that delete the arc (si′, sj′+1) and one arc of x ∩ x′′. . .
• . . .
• if the best solution among x′, x′′, . . . is better than x, accept it
Example: Lin-Kernighan’s algorithm
Start with the exchange (v1, v4), which reverts the stretch (v2, . . . , v4)
Example: Lin-Kernighan’s algorithm
The exchange has replaced (v1, v2) and (v4, v3) with (v1, v4) and (v2, v3)
Now scan the exchanges that delete (v1, v4) and another arc of x ∩ x ′
Example: Lin-Kernighan’s algorithm
The best exchange replaces (v1, v4) and (v6, v5) with (v1, v6) and (v4, v5)
Now scan the exchanges that delete (v1, v6) and another arc of x ∩ x ′′
Example: Lin-Kernighan’s algorithm
The best exchange replaces (v1, v6) and (v8, v7) with (v1, v8) and (v6, v7)
Now scan the exchanges that delete (v1, v8) and another arc of x ∩ x ′′′
Example: Lin-Kernighan’s algorithm
The best exchange replaces (v1, v8) and (v10, v9) with (v1, v10) and(v9, v8)
Now scan the exchanges that delete (v1, v10) and another arc of x ∩ x iv
Example: Lin-Kernighan’s algorithm
The best exchange replaces (v1, v10) and (v12, v11) with (v1, v12) and(v11, v10)
Now scan the exchanges that delete (v1, v12) and another arc:as they are all worsening with respect to the starting solution, terminate
Implementation details
• to avoid destroying the already performed moves, the second arc deleted at each step must belong to the starting solution (the first is one of those added by the previous exchange)
• this implies an upper bound on the length of the sequence
• it can be proved that stopping the sequence as soon as the exchanges no longer improve the starting solution does not impair the result
  • the overall variation of the objective is the sum of the variations due to the single exchanges

  δ(x, o1, . . . , ok) = Σ_{ℓ=1..k} δ(x, oℓ)

  • every sequence of numbers with negative sum admits a cyclic permutation whose partial sums are all negative: for example, (3, −5, 1, −2) has sum −3 and its cyclic permutation (−5, 1, −2, 3) has partial sums −5, −4, −6, −3
  • therefore, there is a cyclic permutation of the moves o1, . . . , ok that is advantageous at each step

  δ(x, o1, . . . , ok) < 0 ⇒ ∃h : δ(x, oh+1, . . . , oh+ℓ) < 0 for each ℓ = 1, . . . , k (indices taken cyclically)
Destroy-and-repair methods
Each exchange from solution x to solution x′ can be seen as
• the addition to x of the subset A = x′ \ x
• the deletion from x of the subset D = x \ x′

Combined additions and deletions produce exchanges, but
• all the subsets x ∪ {j} \ {i} obtained by exchanges of single elements could be unfeasible or very bad
• enlarging the neighbourhood to exchanges of more elements can be inefficient
• in many problems A and D can have different cardinalities, since the cardinality of the solutions is nonuniform (e.g., KP, SCP. . . )

The alternative is
• deleting a subset D ⊂ x of cardinality ≤ k and completing the result with a constructive heuristic (especially if x \ D ∈ X for each D)
• adding a set A ⊂ B \ x of cardinality ≤ k and reducing x with a destructive heuristic (especially if x ∪ A ∈ X for each A)
Destroy-and-repair methods
The complexity decreases from O(n^|A| n^|D| γ(n)), where
• the number of the possible subsets to add is O(n^|A|)
• the number of the possible subsets to delete is O(n^|D|)
• γ(n) is the complexity of evaluating feasibility and objective

to O(n^|D| Tconstr(n)) or O(n^|A| Tdestr(n)), where
• Tconstr(n) is the complexity of the constructive heuristic
• Tdestr(n) is the complexity of the destructive heuristic

The complexity can be further reduced as already described
• focus the exchanges on elements of low or high cost
• apply the first-best strategy instead of the global-best strategy
Efficient visit of exponential neighbourhoods
Another family of methods is based on neighbourhoods whose optimal solution can be found by solving a problem on an auxiliary matrix or graph
• packing: Dynasearch
• negative cost cycle: cyclic exchanges
• shortest path: ejection chains, order-and-split

Such auxiliary tools are usually called improvement matrices or improvement graphs
Dynasearch
The variation δ(x, o) of the objective function implied by an elementary move o ∈ O often depends only on a part of the solution

Operations o′ which act on other parts of the solution have an independent effect: the order in which they are performed is irrelevant

f(o′(o(x))) = f(o(o′(x)))

If f(·) is additive, the effects of the two exchanges simply sum up

Examples:
• exchanges between different branches for the CMSTP, or between different paths for the VRP
• 2-opt exchanges for the TSP operating on disjoint segments of the circuit
Dynasearch
The composite move is a set of elementary moves, as in VDS, but the elementary moves must have mutually independent effects on feasibility and the objective

The situation can be modelled with an improvement matrix A in which
• the rows represent the components of the solution (e.g., branches in the CMSTP, paths in the VRP, circuit segments in the TSP)
• the columns represent the elementary moves: each column has a value equal to the objective improvement −δf
• if move j impacts on component i, aij = 1; otherwise aij = 0

Determine the optimum packing of the columns, that is the subset of nonconflicting columns of maximum value

The Set Packing Problem is in general NP-hard, but
• on special matrices it is polynomial (this is the case of the TSP)
• if each move modifies at most two components
  • the rows can be seen as nodes
  • the columns can be seen as edges
  • each packing of columns becomes a matching
  and the maximum matching problem is polynomial
Cyclic exchanges
In many problems
• the feasible solutions partition the elements into components Sℓ (the vertices or edges into branches for the CMSTP, the nodes or arcs into paths for the VRP, the objects into containers in the BPP, etc.)
• feasibility is associated with the single components
• the objective function is additive with respect to the components

f(x) = Σ_{ℓ=1..r} f(Sℓ)

It is natural to define for these problems the set of operations Tk, which includes the transfers of k elements from their component to another one; Tk generates the neighbourhood NTk
• often the feasibility constraints forbid the simple transfers
• but the number of multiple transfers quickly grows with k

We want to find a subset of NTk that is efficient to explore
The improvement graph
The improvement graph allows to describe sequences of transfers
• a node i corresponds to an element i of the ground set B
• an arc (i, j) corresponds to
  • the transfer of element i from its current component Si to the current component Sj of element j
  • the deletion of element j
• the cost cij of the arc corresponds to the variation of the contribution of Sj to the objective

cij = f(Sj ∪ {i} \ {j}) − f(Sj)

with cij = +∞ if it is unfeasible to transfer i deleting j

A circuit in such a graph corresponds to a closed sequence of transfers, and the cost of the circuit corresponds to the cost of the sequence
• but only if each node belongs to a different component

Find the minimum cost circuit satisfying this condition (a sketch of the arc-cost construction follows)
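A sketch of the arc-cost construction in Python (comp maps each element to the index of its component, S lists the component sets, f is the component cost and feasible the feasibility test; all names are illustrative):

    from math import inf

    def improvement_graph_costs(B, comp, S, f, feasible):
        # c[i, j] = f(S_j u {i} \ {j}) - f(S_j), or +inf if the transfer is unfeasible
        c = {}
        for i in B:
            for j in B:
                if comp[i] == comp[j]:
                    continue                       # i must come from another component
                S_new = (S[comp[j]] | {i}) - {j}   # j leaves, i enters its component
                c[i, j] = f(S_new) - f(S[comp[j]]) if feasible(S_new) else inf
        return c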
Example: the CMSTP
Composite move:
• vertex 4 moves from the red branch to the blue branch
• vertex 3 moves from the blue branch to the green branch
• vertex 11 moves from the green branch to the brown branch
• vertex 8 moves from the brown branch to the red branch

The weights of the various branches remain the same (if wv = 1) or similar, so the solution is certainly (or probably) feasible
Search for the minimum cost circuit
The problem is actually NP-hard, but
• the constraint of visiting each component at most once allows a rather efficient dynamic programming algorithm (if there are r components, the circuit has at most r arcs)
• since a sequence of numbers with negative sum always admits a cyclic permutation with all negative partial sums, one can delete all the partial paths of cost ≥ 0
• there are polynomial algorithms to compute
  • any negative circuit (Floyd-Warshall)
  • the minimum average cost circuit (total cost divided by the number of arcs)
  Even violating the constraint on the components, these provide
  • an underestimate, which can prove the nonexistence of negative circuits
  • a negative circuit, which can respect the constraint by chance or be modified so as to respect it
• there are polynomial heuristic algorithms
Noncyclic exchange chains
It is also possible to create noncyclic transfer chains, so that the cardinality of the components can vary

It is enough to add to the improvement graph
• a source node
• a node for each component
• arcs from the source node to the nodes associated with the elements
• arcs from the nodes associated with the elements to the nodes associated with the components

Then, find the minimum cost path that
• starts from the source node and ends in a component node
• visits a single component node (the topology guarantees that)
• if it visits a component node, does not visit any element node associated with it

These paths correspond to open transfer chains in which
• a component loses an element
• zero or more components lose an element and acquire another one
• a component acquires an element
Example: the CMSTP [figure only in the original slides]
Order-first split-second
The Order-first split-second method for partition problems
• builds a starting permutation of the elements
• partitions the elements into components in an optimum way, under the additional constraint that elements of the same component be consecutive in the starting permutation

Of course, the solution depends on the starting permutation: it is reasonable to repeat the resolution for different permutations, creating a two-level method
1 at the upper level, the permutation is modified
2 at the lower level, the optimal partition for that permutation is computed

Problem: since the permutations are more numerous than the solutions, different permutations can yield the same solution
The auxiliary graph
Once again, we exploit an auxiliary graph

Given the permutation (s1, . . . , sn) of the elements of the ground set E
• each node si corresponds to an element si of the ground set E, plus a fictitious node s0
• each arc (si, sj) with i < j corresponds to a potential component Sℓ = (si+1, . . . , sj) formed by the elements of the permutation from si excluded to sj included
• the cost csi,sj corresponds to the cost f(Sℓ) of the component
• the arc does not exist if the component is unfeasible

Consequently
• each path from s0 to sn represents a partition of B
• the cost of the path coincides with the cost of the partition
• the graph is acyclic: finding the optimum path costs O(m), where m ≤ n(n − 1)/2 is the number of arcs (see the sketch below)
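A sketch of the optimum-path computation by dynamic programming over this acyclic graph (cost(i, j) returns the cost of the component formed by the elements in positions i + 1, . . . , j of the permutation, or +inf if it is unfeasible; names are illustrative):

    from math import inf

    def split(n, cost):
        # dist[j] = cost of the best partition of the first j elements;
        # the graph is acyclic, so one forward pass over the nodes suffices
        dist = [0.0] + [inf] * n
        pred = [0] * (n + 1)
        for j in range(1, n + 1):
            for i in range(j):
                if dist[i] + cost(i, j) < dist[j]:
                    dist[j] = dist[i] + cost(i, j)
                    pred[j] = i
        components, j = [], n          # walk back to recover the optimal partition
        while j > 0:
            components.append((pred[j], j))   # positions pred[j]+1 .. j
            j = pred[j]
        return dist[n], components[::-1]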
Example: the VRP
Given an instance of the VRP with 5 nodes and capacity W = 10, the arcs corresponding to unfeasible components (weight > W) do not exist, and the costs are the costs of the cyclic route (d, si+1, . . . , sj, d)

The optimum path corresponds to three cycles: (d, s1, s2, d), (d, s3, d) and (d, s4, s5, d)