
    CS261 - Optimization Paradigms

    Lecture Notes for 2011-2012 Academic Year

© Serge Plotkin

    January 2013


    Contents

1 Steiner Tree Approximation Algorithm

2 The Traveling Salesman Problem
  2.1 General TSP
    2.1.1 Definitions
    2.1.2 Example of a TSP
    2.1.3 Computational Complexity of General TSP
    2.1.4 Approximation Methods of General TSP?
  2.2 TSP with triangle inequality
    2.2.1 Computational Complexity of TSP with Triangle Inequality
    2.2.2 Approximation Methods for TSP with Triangle Inequality

3 Matchings, Edge Covers, Node Covers, and Independent Sets
  3.1 Minimum Edge Cover and Maximum Matching
  3.2 Maximum Independent Set and Minimum Node Cover
  3.3 Minimum Node Cover Approximation

4 Intro to Linear Programming
  4.1 Overview
  4.2 Transformations
  4.3 Geometric interpretation of LP
  4.4 Existence of optimum at a vertex of the polytope
  4.5 Primal and Dual
  4.6 Geometric view of Linear Programming duality in two dimensions
  4.7 Historical comments

5 Approximating Weighted Node Cover
  5.1 Overview and definitions
  5.2 Min Weight Node Cover as an Integer Program
  5.3 Relaxing the Linear Program


  5.4 Primal/Dual Approach
  5.5 Summary

6 Approximating Set Cover
  6.1 Solving Minimum Set Cover

7 Randomized Algorithms
  7.1 Maximum Weight Crossing Edge Set
  7.2 The Wire Routing Problem
    7.2.1 Overview
    7.2.2 Approximate Linear Program Formulation
    7.2.3 Rounding
    7.2.4 Analysis

8 Introduction to Network Flow
  8.1 Flow-Related Problems
  8.2 Residual Graphs
  8.3 Equivalence of min cut and max flow
  8.4 Polynomial max-flow algorithms
    8.4.1 Fat-path algorithm and flow decomposition
    8.4.2 Polynomial algorithm for Max-flow using scaling
    8.4.3 Strongly polynomial algorithm for Max-flow
  8.5 The Push/Relabel Algorithm for Max-Flow
    8.5.1 Motivation and Overview
    8.5.2 Description of Framework
    8.5.3 An Illustrated Example
    8.5.4 Analysis of the Algorithm
    8.5.5 A better implementation: Discharge/Relabel
  8.6 Flow with Lower Bounds
    8.6.1 Motivating example - Job Assignment problem
    8.6.2 Flows With Lower Bound Constraints


9 Matchings, Flows, Cuts and Covers in Bipartite Graphs
  9.1 Matching and Bipartite Graphs
  9.2 Finding Maximum Matching in a Bipartite Graph
  9.3 Equivalence Between Min Cut and Minimum Vertex Cover
  9.4 A faster algorithm for Bipartite Matching
  9.5 Another O(m√n) algorithm that runs faster in practice
  9.6 Tricks in the implementation of flow algorithms

10 Partial Orders and Dilworth's Theorem
  10.1 Partial orders
  10.2 Dilworth's Theorem

11 Farkas Lemma and Proof of Duality
  11.1 The Farkas Lemma
  11.2 Alternative Forms of Farkas Lemma
  11.3 Duality of Linear Programming

12 Examples of primal/dual relationships
  12.1 Complementary Slackness
  12.2 Maximum bipartite matching
  12.3 Shortest path from source to sink
  12.4 Max flow and min-cut
  12.5 Multicommodity Flow

13 Online Algorithms
  13.1 Introduction
  13.2 Ski Problem and Competitive Ratio
  13.3 Paging and Caching
    13.3.1 Last-in First-out (LIFO)
    13.3.2 Longest Forward Distance (LFD)
    13.3.3 Least Recently Used (LRU)
  13.4 Load Balancing


  13.5 On-line Steiner trees


    1 Steiner Tree Approximation Algorithm

Given a connected graph G = (V, E) with non-negative edge costs, and a set of special nodes S ⊆ V, a subgraph of G is a Steiner tree if it is a tree that spans (connects) all the (special) nodes in S.

    The Steiner Tree problem is to find a Steiner Tree of minimum weight (cost).

Steiner Tree is an important NP-hard problem that is often encountered in practice. Examples include design of multicast trees, design of signal distribution networks, etc. Since it is NP-hard, we cannot expect to design a polynomial-time algorithm that solves it to optimality. In this lecture we will describe a simple polynomial time algorithm that produces an answer that is within a factor 2 of optimum.

Since a minimum cost spanning tree (MST) of the given graph is a Steiner tree, the intuitively easiest approximation is to find the MST of the given graph. Unfortunately, there are graphs for which this is a bad choice, as illustrated in fig. (1).

Figure 1: An example showing that MST is not a good heuristic for the minimum Steiner tree problem. (a) The original graph. Circled nodes are the special nodes. (b) The minimum cost spanning tree. (c) The minimum cost Steiner tree.

In this graph, the MST has weight (n − 1). Also, in this case, the MST is minimal in the Steiner tree sense, that is, removing any edge from it will cause it to not be a Steiner tree. The minimum cost Steiner tree, however, has weight 2. The approximation is therefore only good to a factor of (n − 1)/2, which is quite bad.

Next we will describe a better approximation algorithm, which is guaranteed to produce a Steiner tree no worse than 2 times the optimal solution. The algorithm to find an approximate optimal Steiner tree for a graph G = (V, E) and special nodes S is as follows:


1. Compute the complete graph of distances between all of the special nodes. The distance between two nodes is the length of the shortest path between them. We construct a complete graph G' = (V', E'), where V' corresponds to the special nodes S, and the weight of each edge in E' is the distance between the two corresponding nodes in G, i.e., the length of the shortest path between them.

2. Find the MST on the complete graph of distances. Let T' be the MST of G'.

3. Reinterpret the MST in the original graph. For each edge in T', add the corresponding path in G to a new graph G''. This corresponding path would be a shortest path between the two corresponding special nodes. G'' is not necessarily a tree, yet it is a subgraph of the original graph G.

4. Find a spanning tree in the resulting subgraph. Tapprox, a spanning tree of G'', is an approximately optimal Steiner tree.
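To make the four steps concrete, here is a minimal Python sketch using the networkx library (the function name and graph representation are illustrative assumptions, not part of the original notes; G is a weighted networkx graph and S is the set of special nodes):

    import networkx as nx

    def steiner_2approx(G, S):
        # Step 1: complete graph of distances G' between the special nodes.
        info = dict(nx.all_pairs_dijkstra(G, weight="weight"))  # node -> (dists, paths)
        Gp = nx.Graph()
        for u in S:
            for v in S:
                if u != v:
                    Gp.add_edge(u, v, weight=info[u][0][v])
        # Step 2: T', the MST of G'.
        Tp = nx.minimum_spanning_tree(Gp, weight="weight")
        # Step 3: reinterpret each edge of T' as a shortest path in G, forming G''.
        Gpp = nx.Graph()
        for u, v in Tp.edges():
            path = info[u][1][v]
            for a, b in zip(path, path[1:]):
                Gpp.add_edge(a, b, weight=G[a][b]["weight"])
        # Step 4: any spanning tree of G'' is the approximate Steiner tree.
        return nx.minimum_spanning_tree(Gpp, weight="weight")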

Figure 2: The Steiner tree approximation algorithm run on the graph in (a). Circled nodes are the special nodes. (b) The corresponding complete graph of distances. The minimum spanning tree is drawn in bold. (c) The minimum spanning tree of the graph of distances reinterpreted as a subgraph of the original graph (drawn in bold). In this example, it is its own spanning tree.

    Theorem 1.1. The above algorithm produces a Steiner tree.

Proof. The algorithm returns a tree which is a subgraph of G, because step 4 returns a tree Tapprox which is a subgraph of the reinterpreted MST G'', which is in turn a subgraph of G.

To see that Tapprox spans all the nodes in S, note that V' has a node corresponding to each node in S, and thus that the MST of G', T', spans nodes corresponding to all the nodes in S. Thus, after reinterpretation in step 3, G'' spans all the nodes in S. Since Tapprox is a spanning tree of G'', it too spans all the nodes in S.

Thus, Tapprox is a tree, a subgraph of G, and spans S. Thus, by definition, it is a Steiner tree.

    Theorem 1.2. The Steiner tree returned by the above algorithm has weight at most twice the optimum.

    7

  • 7/21/2019 Cs 261 Notes

    8/120

Figure 3: (a) A depth-first search walk on an example optimal tree Topt. (b) The effect of taking only special nodes to form W'. (This is not W', but the special nodes corresponding to W', in order.) (c) The effect of dropping repeated nodes to form W''. (These are the special nodes corresponding to W'', in order.)

Proof. To show this, consider a minimum cost Steiner tree, Topt, with weight OPT.

Now, do a depth-first search walk on this optimum tree Topt. This can be described recursively by starting from the root, moving to a child, performing a depth-first search walk on that subtree, returning to the root, and continuing to the next child until there are no children left. Call this walk W.

Each edge in the tree is followed exactly twice: once on the path away from a node, and once on the path back to the node. The total weight of the walk is w(W) = 2·OPT.

Next, we relate this to the complete graph of distances G'. We replace the walk W in G by a walk W' in G', the complete graph of distances, in the following manner. We follow the original walk, and every time a special node is encountered we put this node in the walk in the distance graph. One can view this operation as shortcutting.

Consider a part of W between two successive special nodes. This is a path in G. The corresponding edge in W' has the weight of the minimum path in G between the two special nodes, by construction in step 1. Thus, each edge of W' is of weight no greater than the corresponding path in W. Thus, the total weight of the new walk satisfies w(W') ≤ w(W) = 2·OPT.

Next, remove duplicate nodes, if any, from W' to get W''. Obviously, w(W'') ≤ 2·OPT. Now, since Topt spanned all the special nodes, so did W'. Thus, W' and hence W'' span all the nodes in G'. Thus, W'' is a spanning tree of G'. Since T' is the minimum spanning tree of G',

w(T') ≤ w(W'') ≤ 2·OPT.

The reinterpretation of T' in step 3 adds to G'', for each edge of T' being considered, edges of cumulative weight exactly equal to that edge. Thus, the total weight of G'' cannot be greater than that of T'. It could be less, though, because an edge might be present in G'' due to more than one edge of T' and yet contribute its weight only once to G''. Thus, w(G'') ≤ w(T') ≤ 2·OPT.

Step 4 will just delete some edges from G'', possibly reducing its weight further, to give Tapprox. Thus, w(Tapprox) ≤ w(G'') ≤ 2·OPT.


To summarize, a good way to compute the approximation bounds of an algorithm is to manipulate the optimal solution somehow in order to relate it to the output of the algorithm. In the present case, we related the weight of the Steiner tree given by the algorithm to the optimal Steiner tree, through a depth-first search walk of the optimal Steiner tree.

Figure 4: This example shows that the above algorithm sometimes indeed produces a suboptimal solution. In this example, black nodes are special and the red node is not. The algorithm described above will produce a tree of weight 6, while the optimum is clearly 4.

Notes The algorithm presented in this section is from [8]. The best approximation factor for the Steiner Tree problem is 1.39, presented in [2]. See also [13]. It is known that unless P=NP, it is impossible to produce an approximation factor better than (1 + ε) for some constant ε > 0.


    2 The Traveling Salesman Problem

In this section we discuss several variants of the Travelling Salesman Problem (TSP). For the general TSP, we know that the problem is NP-hard. So we ask the natural question: can we find a reasonable approximation method? In this section we prove that the answer is no: there is no polynomial-time algorithm that solves the general TSP to within a given constant factor of the optimal solution.

If we constrain the general TSP by demanding that the edge weights obey the triangle inequality, we still have an NP-hard problem. However, now we can find reasonable approximation methods. Two are presented: one which obtains an approximation factor of 2, and a more complicated one that achieves an approximation factor of 1.5.

    2.1 General TSP

    2.1.1 Definitions

    Before defining TSP, it is useful to consider a related problem:

Definition 2.1. A Hamiltonian Cycle is a cycle in an undirected graph that passes through each node exactly once.

Definition 2.2. Given an undirected complete weighted graph, TSP is the problem of finding a minimum-cost Hamiltonian Cycle.

Let G = (V, E) be a complete undirected graph. We are also given a weight function wij defined on each edge joining nodes i and j in V, to be interpreted as the cost or weight (of traveling) directly between the nodes. The general Travelling Salesman Problem (TSP) is to find a minimal weight tour through all the nodes of G. A tour is a cycle in the graph that goes through every node once and only once. We define the weight of a tour as the sum of the weights of the edges contained in the tour. Note that in general the function wij need not be a distance function. In particular, it need not satisfy the triangle inequality. For this section, we constrain wij to be nonnegative to make the proofs less complicated. All the results we show can also be proved when negative weights are allowed.

    2.1.2 Example of a TSP

Here is a specific example of where the TSP arises. Suppose we are drilling some holes in a board in our favorite machine shop. Suppose it takes time pj to drill the hole at position j and time wij to move the drill head from position i to position j. We wish to minimize the time needed to drill all of the holes and return to the initial reference position.

The solution to this problem can be represented as a permutation π, which represents the order in which the holes are drilled. The cost of the solution (the time required for drilling all holes) is

time = Σ_i (w_{i,π(i)} + p_{π(i)}) = Σ_i w_{i,π(i)} + Σ_i p_{π(i)}.

Since p_i is fixed for each i, Σ_i p_{π(i)} is the same for any permutation of the holes. This part of the cost is fixed, or may be considered a sunk cost, so the objective is to minimize the sum of the w_{i,π(i)} terms, which is done by formulating the problem as a TSP and solving it.
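To see the sunk-cost argument numerically, here is a small Python sketch (the instance data is made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    p = rng.uniform(1.0, 2.0, size=n)       # drilling time at each hole
    w = rng.uniform(0.0, 5.0, size=(n, n))  # movement time between holes
    np.fill_diagonal(w, 0.0)

    def total_time(order):
        # Visit the holes cyclically in the given order. The drilling term
        # p.sum() does not depend on the order; only the movement term does.
        move = sum(w[order[i], order[(i + 1) % n]] for i in range(n))
        return move + p.sum()

    # Different permutations change only the movement (TSP) part of the cost.
    print(total_time([0, 1, 2, 3]), total_time([2, 0, 3, 1]))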


    2.1.3 Computational Complexity of General TSP

Recall that algorithmic complexity classes apply to decision problems: those with a yes/no type of answer. An algorithm to solve the TSP problem might return a minimum cost tour of an input graph, or the cost of such a tour. We can formulate a decision problem version of TSP as follows: we ask "Does the minimum cost tour of G have a cost less than k?" where k is some constant we specify.

The complexity class NP may be casually defined as the set of all problems such that solutions to specific instances of these problems can be verified in polynomial time. We can see that TSP is in NP, because we can verify whether the cost of a proposed tour is below a specified cost threshold k in polynomial time. Given any tour of the graph G, we can verify that the tour contains all of the nodes, can sum up the cost of the tour, and then compare this cost to k. Clearly these operations can be performed in polynomial time. Thus, TSP is in NP.

It turns out that the general TSP is, in fact, NP-complete (that is, it is in NP and all problems of class NP polynomially transform to it). Recall that since all NP-complete problems have polynomial transformations to TSP, if we could solve TSP in polynomial time we could solve all NP-complete problems in polynomial time. Since NP-complete problems have been studied for a long time and no poly-time solutions have been found, seeking a poly-time solution to TSP is very likely to be fruitless.

The proof of NP-completeness is not presented here but is similar to the proof in the next section and can be found in [4].

    2.1.4 Approximation Methods of General TSP?

First we show that one cannot expect to find good approximation algorithms for general TSP. More formally:

Theorem. No polynomial time algorithm that approximates TSP within a constant factor exists unless P=NP.

A general strategy to prove theorems of this type is to prove a polynomial-time reduction from a hard problem to our problem of interest. If we can quickly convert any instance of the hard problem to an instance of our problem, and can quickly solve any instance of our problem, then we can solve the hard problem quickly. This produces a contradiction if we know (or assume) that the hard problem cannot be solved quickly.

Proof. Suppose that we have a polynomial-time algorithm A that approximates TSP within a constant factor r. We show that A can be used to decide the Hamiltonian Cycle problem, which is known to be NP-complete, in polynomial time. But unless P=NP, this is impossible, leading us to conclude that A doesn't exist.

Consider an instance of the Hamiltonian cycle problem in a graph G. As defined above, the Hamiltonian cycle problem is the problem of finding a cycle in a graph G = (V, E) that contains every node of G once and only once (in other words, a tour of G). The catch is that, unlike in the TSP problem, G is not guaranteed to be a complete graph. Note that it is trivial to find a Hamiltonian Cycle in a complete graph: just number the nodes in some arbitrary way and go from one node to another in this order.

Modify G = (V, E) into a complete graph G' = (V, E') by adding all the missing edges. For the cost function, if an edge is in the original graph G (meaning e ∈ E), assign to it weight w(e) = 1. If not (i.e. we had to add it to G to make G' complete, so e ∉ E), give it weight w(e) = r·n + 1.


If G has a Hamiltonian cycle, then this cycle contains the same edges as the minimum cost TSP solution in G'. In this case, the minimum cost tour's length is equal to n, and the algorithm A on G' will give an approximate solution of cost at most r·n.

If G does not have a Hamiltonian cycle, then any TSP solution of G' must include one of the edges we added to G to make it complete. So, an optimal solution must have weight at least r·n + 1. Clearly, the weight of the cycle produced by A on G' must be at least r·n + 1, since the approximate solution cannot beat the optimal one.

Now notice that we can use A on G' to find whether G contains a Hamiltonian cycle. If A gives a solution of at most r·n, then G has a Hamiltonian cycle. If not, then we can conclude G does not have a Hamiltonian cycle. Since constructing G' and running A can be done in polynomial time, we have a polynomial-time algorithm which tests for the existence of a Hamiltonian cycle. Since the Hamiltonian cycle problem is known to be NP-complete, this is a contradiction unless P=NP, so such an A does not exist.
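The weight assignment used in this reduction is easy to express in code. A minimal sketch (the function name and input format are assumptions for illustration):

    def tsp_instance_from_graph(n, edges, r):
        # Complete the graph: weight 1 on original edges, r*n + 1 on the
        # edges we add to make the graph complete.
        big = r * n + 1
        w = [[0 if i == j else big for j in range(n)] for i in range(n)]
        for u, v in edges:
            w[u][v] = w[v][u] = 1
        return w

    # Running an r-approximation algorithm A on this instance decides
    # Hamiltonicity: A returns a tour of cost at most r*n iff the original
    # graph has a Hamiltonian cycle.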

    2.2 TSP with triangle inequality

One tactic for dealing with hard problems is to use the following rule: if you can't solve a problem, change it.

Let us consider the following modification to the TSP problem: we require that for any three nodes i, j, and k, the triangle inequality holds for the edges between them:

wij ≤ wik + wkj   for all i, j, k ∈ V;

in other words, it is not faster to go around. It turns out that with this constraint we can find good approximations to the TSP.

First, let's consider the usefulness of solving this modified TSP problem. Consider the drill press example described earlier. Upon initial examination, this problem seems to obey the triangle inequality because the drill moves between points on a plane. Suppose we want to move the drill from point A to point C. Further suppose there is an obstruction on the straight path from A to C, so we go through point B on the way from A to C (where B is not collinear with line AC). If the cost of going from A to C through B is equal to wAB + wBC, then the triangle inequality is obeyed with equality. However, if this were a real scenario, the drill might need to spend some time at B changing direction. In this case, moving from A to B to C would cost more than the summed cost of moving from A to B and moving from B to C, so the triangle inequality would not hold. In the majority of problems for which TSP is used the triangle inequality holds, but it is important to recognize cases in which it does not hold.

    2.2.1 Computational Complexity of TSP with Triangle Inequality

Theorem. TSP with triangle inequality is NP-hard.

Proof. Suppose we have a TSP problem without the triangle inequality. Let us add some fixed value Q to the weight of each edge in the graph. If we do so, assuming the graph has n nodes, the total solution weight will increase by n·Q for all solutions.


Clearly, this modification cannot change which path the TSP solution takes, since we increase the weights of all paths equally.

Now let us consider all triples of edges ({i, j}, {i, k}, and {k, j}) in the graph that violate the triangle inequality, and let us define

qikj = wij − wik − wkj.

We choose the largest value of qikj over all edge triples as the value Q that we will use.

What happens when we add this value of Q to the weight of each edge in the graph? Assuming that w'ij denotes wij + Q for an arbitrary edge {i, j}, we have:

w'ij = wij + Q, w'ik = wik + Q, w'kj = wkj + Q.

Now let us substitute the value of Q in the sum w'ik + w'kj:

w'ik + w'kj = wik + Q + wkj + Q ≥ wij + Q = w'ij,

where the inequality holds because Q ≥ qikj = wij − wik − wkj.

We have just demonstrated that adding a certain value Q to the weights of all edges removes triangle inequality violations in any graph. This transformation can be done in polynomial time since the number of triangles in a graph is polynomial in the graph size. So, we can reduce a TSP without triangle inequality (the general TSP problem) to a TSP with triangle inequality in polynomial time. Since general TSP is NP-complete, this completes the proof.
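The shift Q from the proof can be computed directly from the weight matrix. A sketch, assuming w is a symmetric n × n matrix of nonnegative weights:

    import itertools

    def metric_shift(w):
        # Q is the largest triangle-inequality violation q_ikj = w_ij - w_ik - w_kj
        # (0 if there are no violations).
        n = len(w)
        Q = max((w[i][j] - w[i][k] - w[k][j]
                 for i, j, k in itertools.permutations(range(n), 3)),
                default=0)
        Q = max(Q, 0)
        # Adding Q to every edge removes all violations and shifts the cost
        # of every tour by exactly n*Q, so the optimal tour is unchanged.
        return Q, [[w[i][j] + Q if i != j else 0 for j in range(n)]
                   for i in range(n)]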

Notice how our proof that constant-factor approximation of TSP is NP-hard does not apply to TSP with the triangle inequality restriction. This is because the triangle inequality may be violated in the graphs G' constructed by the procedure specified in the proof. Consider, for example,

wij = wik = 1, and wkj = r·n + 1.

This would occur if edges {i, j} and {i, k} were in G but edge {k, j} was not. Since our graph construction procedure does not in general produce graphs that satisfy the triangle inequality, we cannot use this procedure to reformulate any instance of the Hamiltonian cycle problem as an instance of the TSP approximation problem with triangle inequality.

In fact, there is no such polynomial time reformulation of the Hamiltonian cycle problem, because polynomial time approximation methods for TSP with triangle inequality do exist.

    2.2.2 Approximation Methods for TSP with Triangle Inequality

Approximation Method within a Factor of 2 We present a polynomial time algorithm that computes a tour within a factor of 2 of the optimal:

    1. Compute the MST (minimum spanning tree) of the given graph G.


Figure 5: Example MST Solution

Figure 6: Pre-order Walk of MST

2. Walk on the MST: first do a pre-order walk of the MST. Transcribe this walk by writing out the nodes in the order they were visited in the pre-order walk; e.g., in the above tree the pre-order walk would be:

1 2 3 4 3 5 6 5 7 5 3 2 1

3. Shortcut: now compute a TSP tour from the walk using shortcuts for nodes that are visited several times; e.g., in the example above, the walk

1 2 3 4 3 5 ...

will be transformed into

1 2 3 4 5 ...,

because in the subpath

... 4 3 5 ...


Figure 7: TSP tour with shortcuts

    of the original walk, node 3 has already been visited.

Shortcutting is possible because we only consider complete graphs. Notice that after shortcutting, the length of the walk cannot increase because of the triangle inequality. We can show this by induction: as a base case, if we shortcut a two-edge path by taking a single edge instead, then the triangle inequality assures us that the shortcut cannot be longer than the original path. In the inductive case, consider the following figure in which we want to shortcut m edges:

Figure 8: Triangle inequality induction

We can see that the length of edge e is less than or equal to the summed lengths of edges a and b, and the result of substituting e for a and b is a case in which d shortcuts a path of m − 1 edges. By repeatedly constructing edges of type e, we can reach the base case, because our inductive shortcutting procedure is only applied to paths of finite length.

Theorem 2.3. The TSP tour constructed by the algorithm above is within a factor of 2 of the optimal TSP tour.

Proof. The weight of the path constructed in step 2 above is no more than twice the sum of MST edge weights, since our walk of the tree traverses each edge of the MST at most twice. Also, as argued above, the shortcutting performed in step 3 cannot increase the summed weight.


Thus, weight(TSPapprox) ≤ 2 · weight(MST), where TSPapprox is the approximate TSP solution produced by our algorithm.

If we delete one edge from the optimal TSP tour, we get a tree (all paths are trees) that connects all vertices in G. The sum of edge weights for this tree cannot be smaller than the sum of edge weights in the MST, by definition of MST. Thus weight(MST) ≤ weight(TSPOPT), where TSPOPT is a minimum weight tour of the graph.

Thus, we've shown that weight(TSPapprox) ≤ 2 · weight(TSPOPT).
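The entire factor-2 algorithm is only a few lines; here is a sketch in Python using networkx (assuming G is a complete graph whose 'weight' attributes satisfy the triangle inequality):

    import networkx as nx

    def tsp_2approx(G):
        T = nx.minimum_spanning_tree(G, weight="weight")   # step 1
        root = next(iter(T.nodes))
        # Steps 2-3: a DFS pre-order of the tree is exactly the pre-order
        # walk with every repeated node shortcut away.
        order = list(nx.dfs_preorder_nodes(T, source=root))
        tour = order + [root]
        cost = sum(G[u][v]["weight"] for u, v in zip(tour, tour[1:]))
        return tour, cost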

Approximation Method Within a Factor of 1.5 A Eulerian walk on a graph is a walk that includes each edge of the graph exactly once. Our next algorithm depends on the following elementary theorem in graph theory: every vertex of a connected graph G has even degree iff G has a Eulerian walk.

As an aside, it's easy to see that a Eulerian walk can only exist if all nodes of a graph have even degree: every time the walk passes through a node it must use two edges (one to enter the node and one to exit). No edges are traversed twice in the walk, so if a node is visited c times it must have degree 2c, an even number.

Using the Eulerian walk theorem, we can get a factor 1.5 approximation. This approach is called the Christofides algorithm:

1. Find the MST of the given graph G.

2. Identify all odd-degree nodes in the MST.

Another elementary theorem in graph theory says that the number of odd-degree nodes in a graph is even. It's easy to see why this is the case: the sum of the degrees of all nodes in a graph is twice the number of edges in the graph, because each edge increases the degree of both its attached nodes by one. Thus, the sum of degrees of all nodes is even. For a sum of integers to be even it must have an even number of odd terms, so we have an even number of odd-degree nodes.

3. Do minimum cost perfect matching on the odd-degree nodes in the MST.

A matching is a subset of a graph's edges that do not share any nodes as endpoints. A perfect matching is a matching containing all the nodes in a graph (a graph may have many perfect matchings). A minimum cost perfect matching is a perfect matching for which the sum of edge weights is minimum. A minimum cost perfect matching of a graph can be found in polynomial time.

4. Add the matching edges to the MST.

This may produce doubled edges, which are pairs of edges joining the same pair of nodes. We will allow doubled edges for now. Observe that in the graph produced at this step, all nodes are of even degree.

5. Do a Eulerian walk on the graph from the previous step.

By the Eulerian walk theorem stated earlier, a Eulerian walk exists on this graph. Moreover, we claim without proof that it can be found in polynomial time.


6. Shortcut the Eulerian walk.

Since the Eulerian walk traverses all nodes in the graph, we can shortcut this walk to produce a tour of the graph of total weight less than or equal to that of the Eulerian walk. The proof of this is the same as the one used for shortcutting in the previous triangle inequality TSP approximation algorithm.

    A graphical example of the Christofides algorithm is included at the end of this section.
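Before the graphical example, here is a compact Python sketch of the six steps using networkx (it assumes a complete metric graph with 'weight' attributes, and that nx.min_weight_matching returns a minimum cost perfect matching on the complete subgraph of odd-degree nodes):

    import networkx as nx

    def christofides(G):
        T = nx.minimum_spanning_tree(G, weight="weight")        # step 1
        odd = [v for v, d in T.degree() if d % 2 == 1]          # step 2
        M = nx.min_weight_matching(G.subgraph(odd), weight="weight")  # step 3
        H = nx.MultiGraph(T)                                    # step 4 (doubled
        H.add_edges_from(M)                                     # edges allowed)
        tour, seen = [], set()
        for u, _ in nx.eulerian_circuit(H):                     # step 5
            if u not in seen:                                   # step 6: shortcut
                seen.add(u)
                tour.append(u)
        return tour + [tour[0]]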

Theorem 2.4. The Christofides algorithm achieves a 1.5 factor approximation.

Proof. Clearly, the cost of the MST found in step 1 above is at most weight(TSPOPT), for we can convert TSPOPT into a spanning tree by eliminating any one edge. We now show that the process in steps 2-5 of converting this MST into a TSP solution adds weight no more than 0.5 · weight(TSPOPT).

Assume we have a solution TSPOPT to the TSP problem on G. Mark all odd-degree nodes found in step 2 of the approximation algorithm in TSPOPT. This is possible because TSPOPT contains all nodes in G. As has been shown above, there is now an even number of marked nodes.

Now let us build a cycle through only the marked nodes in TSPOPT. The length of the resulting tour is still at most weight(TSPOPT), since all we have done is shortcut a minimal tour and the triangle inequality holds.

Now consider matchings of the marked nodes. We can construct two perfect matchings from the cycle through the marked nodes, because the number of marked nodes is even. For example, if we have a cycle as in Figure 9,

Figure 9: Cycle through 6 nodes

the two matchings will be as in Figure 10.

Since by combining these two matchings one obtains the original marked cycle, the length of the lightest of these matchings is at most 0.5 · weight(TSPOPT). We have just shown that

mincost matching ≤ smaller matching ≤ 0.5 · (marked cycle) ≤ 0.5 · weight(TSPOPT).


Figure 10: Two matchings from the cycle

Thus, 0.5 · weight(TSPOPT) is the maximum weight that we add to the weight of the MST in the algorithm to convert it to a tour of the graph. Therefore, the total weight of the solution found by the algorithm is at most 1.5 · weight(TSPOPT), with at most weight(TSPOPT) for the MST and at most 0.5 · weight(TSPOPT) to convert the MST to a tour.

Notes There is a huge body of literature on TSP and its variants. One of the best sources for further reading is the book by Lawler et al. [9].


    The Christofides Algorithm - Example

Figure 11: Step 1: MST Solution

Figure 12: Step 2: Find odd-degree nodes


Figure 13: Step 3: Matching

Figure 14: Step 4: Add matching edges


Figure 15: Steps 5 and 6: Eulerian walk with shortcuts


    3 Matchings, Edge Covers, Node Covers, and Independent Sets

A graph G = (V, E) consists of the set of vertices (nodes) V and the set of edges E. |S| denotes the cardinality (size) of a set S, and S* denotes a set which is optimal in some sense to be defined. We consider only unweighted graphs here.

A matching, M, on a graph G is a subset of E which does not contain any edges with a node in common. A maximum matching, M*, on a graph G is a matching on G with the highest possible cardinality. A perfect matching consists of edges which cover all the nodes of a graph. A perfect matching has cardinality n/2, where n = |V|.

    Figure 16: A matching and a maximum (and perfect) matching.

Matchings have applications in routing (matching between input and output ports) and job assignment (pairing workers to tasks). Besides direct applications, matching often arises as a subroutine in various combinatorial optimization algorithms (e.g. Christofides' TSP approximation).

An edge cover, ρ, of a graph G is a subset of E which contains edges covering all nodes of G. That is, for each u ∈ V there is a v ∈ V such that (u, v) ∈ ρ. We assume there are no isolated nodes, but if there are they can easily be handled separately, so this assumption does not have any cost in practice. A minimum edge cover, ρ*, of a graph G is an edge cover of G with the smallest possible cardinality.

A node cover, S, of a graph G is a subset of V which contains nodes covering all edges of G. That is, for each (u, v) ∈ E, u ∈ S or v ∈ S. A minimum node cover, S*, of a graph G is a node cover of G with the smallest possible cardinality. An application of node cover is in placing equipment which tests connectivity on a network, where there must be a device on an end of each link.

An independent set, α, in a graph G is a subset of V which contains only unconnected nodes. That is, for each v, w ∈ α, (v, w) ∉ E. A maximum independent set, α*, in G is an independent set in G with the highest possible cardinality. Independent set algorithms are useful as a subroutine in solving more complicated problems and in applications where edges represent conflicts or incompatibilities.

3.1 Minimum Edge Cover and Maximum Matching

For any graph G, the minimum edge cover size and the maximum matching size sum to the total number of nodes.

Theorem 3.1. |ρ*| + |M*| = n.


Proof. Consider some maximum matching, M*. Let S be the set of nodes that are not covered by M*, that is, those nodes which are not endpoints of any edge in M*. S is an independent set. To see this, suppose v, w ∈ S and (v, w) ∈ E. By our construction of S, neither v nor w is matched by M*, so M* ∪ {(v, w)} is a strictly larger matching than M* itself. But this is a contradiction, since we started with a maximum matching. Thus, S must be an independent set, as claimed.

Now we can construct a valid edge cover, ρ, by taking the edges of M* and adding an additional edge to cover each node in S. Clearly this accounts for all nodes, though it may contain more edges than needed to do so. But since ρ is a valid edge cover, we know it contains at least as many edges as the minimum edge cover, ρ*. Thus,

|ρ*| ≤ |ρ| = |M*| + |S| = |M*| + (n − 2|M*|) = n − |M*|,

and therefore

|ρ*| + |M*| ≤ n.

We proceed by symmetry to obtain the opposite bound. We now start by considering a minimum edge cover, ρ*. A minimum edge cover cannot have chains of more than two edges, or some edges in the middle would be redundant. Therefore ρ* induces a subgraph of G that consists of stars (trees of depth one), as in Figure 17.

    Figure 17: A minimum edge cover induces a star graph.

Since a star with k edges covers k + 1 nodes, we can conclude that the number of stars is n − |ρ*|. We can construct a (not necessarily maximum) matching, M, by taking one edge from each star. By the same sort of reasoning used in the previous case, we see here that

|M*| ≥ |M| = n − |ρ*|,

or,

|ρ*| + |M*| ≥ n.

    Combining the two inequalities gives us the equality we claim.
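The first half of the proof is constructive and can be phrased directly as code; a sketch with networkx, assuming G has no isolated nodes:

    import networkx as nx

    def min_edge_cover_from_matching(G):
        # Maximum (cardinality) matching M*.
        M = nx.max_weight_matching(G, maxcardinality=True)
        cover = set(M)
        matched = {v for e in M for v in e}
        # Add one arbitrary incident edge for each node left uncovered by M*.
        for v in G:
            if v not in matched:
                cover.add(next(iter(G.edges(v))))
        return cover  # |cover| = |M*| + (n - 2|M*|) = n - |M*|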


    3.2 Maximum Independent Set and Minimum Node Cover

For any graph G, the maximum independent set size and the minimum node cover size sum to the total number of nodes.

Theorem 3.2. |α*| + |S*| = n.

Proof. Consider some minimum node cover, S*. Let α be the set of vertices of G not included in the cover. Observe that α is an independent set, since (v, w) ∈ E implies v ∈ S* or w ∈ S*. As in the previous proof, since the maximum independent set is at least as large as this independent set, we can conclude that |α*| ≥ |α| = n − |S*|, and therefore

|α*| + |S*| ≥ n.

Next, consider some maximum independent set α*. Let S be all nodes not in α*. The set S constitutes a node cover since no edge has both endpoints in α*, by definition of independence, and so each edge has at least one endpoint in the set of nodes not in α*. This node cover is no smaller than the minimum node cover, so |S*| ≤ |S| = n − |α*|, and therefore

|α*| + |S*| ≤ n.

    Again, combining the two inequalities gives us the equality we claim.

    3.3 Minimum Node Cover Approximation

    Our goal is to find a minimum node cover for a graph G. Or, if we could find a maximum independentset, we could simply take the nodes not in it. But these problems are NP-hard, so we will develop an

    approximation algorithm for minimum node cover. In other words, instead of looking forS, we will tryto construct a node cover, S, that is not much larger than S, and prove some bound on how close tooptimal it is.

    Consider this greedy approach:

1. Let V' = ∅.

2. Take the node v of highest degree, and let V' = V' ∪ {v}.

    3. Delete v from the graph (along with any edges (v, w)).

    4. If there are no edges left, stop, else go back to step 2.

Clearly, the resulting set V' constitutes a node cover. We are going to prove later that this approach guarantees a log n approximation, i.e., the node cover produced is no more than a factor of log n larger than the minimum one. Moreover, it is possible to show that there are cases where this approach indeed produces a cover log n times optimal.

    We can do much better. Consider the following algorithm:

1. Compute a maximum matching for G.


    2. For each edge (u, v) in the maximum matching, add u and v to the node cover.

This is a node cover because any edge not in the maximum matching shares an endpoint with some edge in the maximum matching (or else this edge could have been added to create a larger matching).

    For an illustration, see Figure 18.

Figure 18: Converting a maximum matching into a node cover. Matching represented by solid lines. Two marked nodes are not covered by the node cover, and thus can be added to the matching, proving that it was not maximum.

Theorem 3.3. A node cover for a graph G produced by the above algorithm is no more than twice as large as a minimum node cover of G.

Proof. Assume |M*| > |S*|. Then some v ∈ S* must touch two edges in M*, but this is a contradiction, by construction of M*. Thus, |M*| ≤ |S*|. Therefore, our node cover (constructed in the above algorithm, which finds a node cover of size 2|M*|) is within a factor of 2 of the minimum node cover.

Although one can find a maximum matching in polynomial time, the algorithm is far from trivial. We can greatly simplify the above algorithm by using maximal instead of maximum matching. A maximal matching is a matching that cannot be augmented without first deleting some edges from it. Here is an algorithm for computing a maximal matching:

1. Let M = ∅.

2. Pick an edge (v, w) ∈ E.

3. Add (v, w) to M, and delete nodes v and w and all edges adjacent to them from the graph.


    4. If there are remaining edges, go back to step 2.

The set M constitutes a matching because no two edges in the set share an endpoint: once (v, w) is added to M, v and w are deleted, so no other edges with these endpoints will be considered. It is maximal (meaning we can't add edges to it and still have a matching) because it continues until no remaining edges can be added. Observe that, by construction, every edge in the graph touches at least one of the nodes that are touched by the maximal matching. (If not, then this edge can be added to the matching, contradicting its maximality.) Thus, by taking both endpoints of the edges in the maximal matching we get a node cover. Moreover, we have |M| ≤ |M*| ≤ |S*|, and so this much simpler algorithm not only still comes within a factor of 2 of the minimum node cover size, but also never yields a worse approximation (in fact, usually better).
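Putting the two pieces together, the maximal-matching construction gives the following one-pass 2-approximation for node cover (a sketch in plain Python; edges is any iterable of (u, v) pairs):

    def node_cover_2approx(edges):
        # Greedily build a maximal matching and take both endpoints of
        # every matched edge.
        cover = set()
        for u, v in edges:
            if u not in cover and v not in cover:  # (u, v) extends the matching
                cover.add(u)
                cover.add(v)
        return cover  # size 2|M| <= 2|S*|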


    4 Intro to Linear Programming

    4.1 Overview

    LP problems come in different forms. We will start with the following form:

Ax = b
x ≥ 0
min cx

Here, x and b are column vectors, and c is a row vector. A feasible solution of this problem is a vector x which satisfies the constraints represented by the equality Ax = b and the inequality x ≥ 0. An optimal solution is a feasible solution x that minimizes the value of cx.

It is important to note that not all LP problems have an optimal solution. Sometimes, an LP can have no solution, or the solution can be unbounded, i.e. for all λ there exists a feasible solution x such that cx ≤ λ. In the first case we say the problem is over-constrained and in the second case that it is under-constrained.

    4.2 Transformations

The LP that we would like to solve might be given to us in many different forms, with some of the constraints being equalities, some inequalities, some of the variables unrestricted, some restricted to be non-negative, etc. There are several elementary transformations that allow us to rewrite any LP formulation into an equivalent one.

Maximization vs minimization: Maximizing the linear objective cx is equivalent to minimizing the objective −cx.

Equality constraints to inequality constraints: Suppose we are given an LP with equality constraints and would like to transform it into an equivalent LP with only inequality constraints. This is done by replacing the Ax = b constraints by Ax ≤ b and Ax ≥ b constraints (doubling the number of constraints).

Inequality constraints to equality constraints: To translate an inequality constraint ai x ≤ bi (where ai is the i-th row of matrix A and bi is the i-th coordinate of b) into an equality constraint, we can introduce a new slack variable si and replace ai x ≤ bi with

ai x + si = bi
si ≥ 0

It is noted that as the number of slack variables introduced equals the number of the original inequalities, the cost of this transformation can be high (especially in cases where there exist many inequalities relative to the original number of variables).


Figure 19: A simple LP with three constraints. (a) Three inequalities. (b) Three equalities.

The alternative approach, where all inequalities are directly transformed to equalities (which is equivalent to obtaining a solution with all slack variables equal to zero), is incorrect, since there can exist an LP for which none of its optimum solutions satisfies all of its inequality constraints as equalities.

Consider for example the simple LP of figure 19. While it is straightforward to detect an optimum solution if the three constraints are inequalities, there exists no feasible solution if the three inequalities become equalities, since such a feasible solution would have to lie on the intersection of all three lines.

From unrestricted to non-negative variables: Given an LP with no non-negativity constraints x ≥ 0, we would like to translate it into an LP that has only non-negative variable assignments in its solutions. To do so, we will introduce two variables xi+ and xi− for every original variable xi, and require

xi = xi+ − xi−
xi+ ≥ 0
xi− ≥ 0

We observe that multiple solutions of the resulting LP may correspond to a single solution of the original LP. Moreover, this transformation doubles the number of variables in the LP.

Example: Using the above described transformations, we can transform

Ax ≤ b
max cx

into the form

A'x' = b'
x' ≥ 0
min c'x'

Assuming the dimension of A is m × n, the result is as follows:

x' = [x+; x−; s]
A' = [A, −A, I]
b' = b
c' = [−c, c, 0m]


where 0m represents a vector of m zeros.
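A small numeric sketch of this combined transformation in Python (the instance numbers are invented for illustration):

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 1.0]])
    b = np.array([4.0, 5.0])
    c = np.array([1.0, 1.0])                        # original: max cx s.t. Ax <= b

    A2 = np.hstack([A, -A, np.eye(len(b))])         # A' = [A, -A, I]
    c2 = np.concatenate([-c, c, np.zeros(len(b))])  # min c'x' corresponds to max cx

    # A feasible x of the original LP maps to x' = (x+, x-, s) >= 0 with A'x' = b.
    x = np.array([1.0, 1.0])
    xp = np.concatenate([np.clip(x, 0, None), np.clip(-x, 0, None), b - A @ x])
    assert np.allclose(A2 @ xp, b) and (xp >= 0).all()
    assert np.isclose(c2 @ xp, -(c @ x))            # objectives agree up to sign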

    4.3 Geometric interpretation of LP

If we consider an LP problem with an inequality constraint Ax ≤ b, the set of feasible solutions can be seen as the intersection of half planes. Each constraint cuts the space into two halves and states that one of these halves is outside the feasible region. This can lead to various kinds of feasible spaces, as shown in figure 20 for a two-dimensional problem.

Figure 20: Feasible space for inequality constraints. (a) An empty feasible space. (b) A bounded feasible space. (c) An unbounded feasible space.

The feasible set is convex, i.e. whenever the feasible space contains two points, it must contain all the segment joining these two points. This can be shown as follows: if x and y both belong to the feasible set, then Ax ≤ b and Ay ≤ b must hold. Any point z on the segment [x, y] can be expressed as λx + (1 − λ)y for some λ such that 0 ≤ λ ≤ 1. Thus, Az = λAx + (1 − λ)Ay ≤ λb + (1 − λ)b = b. Hence, z is a feasible solution.

If we consider an LP problem with an equality constraint Ax = b, the feasible set is still a convex portion of space, of a lower dimension. This is the case for example with the space defined by the equality x + y + z = 1 in 3-dimensional space, as shown in figure 21, where the solution space is the 2-dimensional triangular surface depicted.

Figure 21: Feasible space for an equality constraint.

    29

  • 7/21/2019 Cs 261 Notes

    30/120

    4.4 Existence of optimum at a vertex of the polytope

Our intuition is that an optimal solution can always be found at a vertex of the polytope of feasible solutions. In the 2-dimensional plane, the equation cx = k defines a line that is orthogonal to the vector c. If we increase k, we get another line parallel to the first one. Intuitively, we can keep on increasing k until we reach a vertex. Then, we can no longer increase k, otherwise we go out of the feasible set (see figure 22).

Figure 22: Location of the Optimum.

We can also observe that if we were given a c that is orthogonal to an edge of the feasible set in the example, any point of that edge would have been an optimal solution. So our intuition is that, in general, optimal solutions are vertices, but in certain circumstances optimal solutions can also exist outside of vertices (e.g. on the faces of the polytope). The observation is that, even if every optimal solution is not necessarily a vertex, there exists a vertex that is an optimal solution of the LP problem. Now we will try to prove this formally. First, we need a formal definition of a vertex.

Definition 4.1. Let P be the set of feasible points defined by P = {x | Ax = b, x ≥ 0}. A point x is a vertex of the polytope iff there is no y ≠ 0 such that (x + y ∈ P) and (x − y ∈ P).

    The following theorem implies that there is always an optimal solution at a vertex of the polytope.

Theorem 4.2. Consider the LP min{cx | Ax = b, x ≥ 0} and the associated feasible region P = {x | Ax = b, x ≥ 0}. If a minimum exists, then given any point x that is not a vertex of P, there exists a vertex x' of P such that cx' ≤ cx.

Proof: The proof works by moving us around the polytope and showing that moving towards an optimal solution moves us towards a vertex. More formally, the proof constructs a sequence of points beginning with x, such that the value of the objective function is non-increasing along this sequence and the last point in this sequence is a vertex of the polytope.

Consider a point x in the polytope such that x is not a vertex. This implies that there exists y ≠ 0 such that (x + y ∈ P) and (x − y ∈ P). We will travel in the polytope along this y. By the definition of P, A(x + y) = b = A(x − y). Thus, Ay = 0. Also, assume cy ≤ 0; otherwise we can replace y by −y.

We will try to travel as far as possible in the direction of y. To see that this does not increase the value of the objective function, consider the points x + λy, λ ≥ 0. Substituting, we get:

c(x + λy) = cx + λcy ≤ cx.


To check that the new point is still feasible, we need to verify first the equality constraint:

A(x + λy) = Ax + λAy = Ax = b

So any point of the form x + λy satisfies the first constraint. However, we must be more careful when choosing λ so that we do not violate the non-negativity of x. Let xj denote the j-th coordinate of x and yj the j-th coordinate of y. Notice that xi = 0 implies that xi + yi = yi ≥ 0 and xi − yi = −yi ≥ 0. Hence xi = 0 implies yi = 0.

There are two cases to consider:

1. There exists j with yj < 0. Choosing λ as the minimum of xj/(−yj) over all j with yj < 0, we get xj + λyj ≥ 0 for every j, so x + λy is feasible.

Also, necessarily there exists k with xk + λyk = 0 but xk ≠ 0. Intuitively, k corresponds to the direction in which we moved until we reached a border.

Finally, notice that for all i, xi = 0 implies yi = 0, which implies xi + λyi = 0. Hence the new point x + λy must have at least one more zero coordinate than x and is still inside the feasible set.

2. For all j, yj ≥ 0. This implies that for all λ ≥ 0, x + λy ≥ 0. We don't have to worry about non-negativity being violated. We can move in y's direction as far as we want. There are two further cases.

(a) cy < 0. Then c(x + λy) = cx + λcy can be made arbitrarily small by increasing λ, so the LP is unbounded, contradicting the assumption that a minimum exists.

(b) cy = 0. Then we can replace y by −y without changing the objective; since y ≠ 0 and y ≥ 0, the vector −y has a negative coordinate, and case 1 applies.

In each remaining case the new point has at least one more zero coordinate than x and an objective value no larger than cx. Repeating the argument, the process terminates after at most n steps, and it can only terminate at a point for which no suitable y ≠ 0 exists, that is, at a vertex x' of P with cx' ≤ cx.


    4.5 Primal and Dual

    Consider the following LP:

Ax ≤ b
x ≥ 0
maximize cx

We will call it the primal LP. The dual LP is composed by the following syntactic transformation.¹

The dual LP is described by a new vector y of variables and a new set of constraints. Each new variable corresponds to a distinct constraint in the primal, and each new constraint corresponds to a distinct variable in the primal.

y^t A ≥ c
y ≥ 0
minimize y^t b

Note that the variables and constraints in the dual LP may not have any useful or intuitive meaning, though they often do.

Calling the maximization problem the primal LP turns out to be rather arbitrary. In practice, the LP formulation of the original problem is called the primal LP regardless of whether it maximizes or minimizes. We follow this practice for the rest of this presentation.

We say that x is feasible if it satisfies the constraints of the LP. In our case, this means that Ax ≤ b and x ≥ 0. Feasible y can be defined in a similar way. Given any feasible x and y for the primal and dual LPs respectively, we can easily derive the following inequality:

cx ≤ (y^t A)x = y^t (Ax) ≤ y^t b

The inequality cx ≤ y^t b is called the weak duality theorem. Weak duality is useful in order to prove the quality of a particular feasible solution: given any maximization LP, if we can somehow compute a feasible solution to the dual LP, we can establish an upper bound on the primal LP solution. Similarly, given any minimization LP, if we can somehow compute a solution to the dual LP, we can establish a lower bound on the primal LP.
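A tiny numeric check of weak duality in Python (all numbers invented for illustration):

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    b = np.array([8.0, 9.0])
    c = np.array([3.0, 2.0])

    x = np.array([2.0, 1.0])   # primal feasible: Ax = (5, 5) <= b, x >= 0
    y = np.array([1.5, 0.5])   # dual feasible: y^t A = (3.5, 3.0) >= c, y >= 0
    assert (A @ x <= b).all() and (y @ A >= c).all()
    print(c @ x, "<=", y @ b)  # 8.0 <= 16.5, as weak duality guarantees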

The weak duality inequality applies to any feasible primal and dual solutions x, y. If we consider optimum primal and dual solutions x* and y*, one can prove the strong duality theorem that says

cx* = (y*)^t b

Combining these two theorems, we have that

cx ≤ cx* = (y*)^t b ≤ y^t b

¹It is important to note that this dual corresponds to the primal LP form above. Changing to a different form, e.g. minimization instead of maximization, results in a different dual.


Figure 23: An example showing how the polygonal structure can be obtained from a series of inequalities.

With this property, for any feasible x and y, the optimum values cx* and (y*)^t b lie between cx and y^t b. This observation is very useful for many approximation algorithms. We will come back to strong duality later in the course. The approximation algorithms presented below use only the weak duality claim.

Note that the above discussion is incomplete. In particular, it does not address cases where the primal or dual are infeasible or where the optimum value is infinite.

    4.6 Geometric view of Linear Programming duality in two dimensions

To better understand LP duality, we will consider the two-dimensional case. In two dimensions the LP can be re-written as

    x= (x1, x2), x10, x20a11x1+ a12x2b1a21x1+ a22x2b2

    ...an1x1+ an2x2bn

    maximizec1x1+ c2x2

Here, A is an n × 2 matrix and b is an n-tuple. If we draw a line for each inequality in the x1-x2 coordinate system, we obtain a convex polygonal structure like fig. (23). Suppose we pick an arbitrary intercept (0, x2i) and draw the line orthogonal to the vector c = (c1, c2) going through that point. If the line meets the polygon, then any x = (x1, x2) that lies on the line and inside the polygon is feasible. We can then increase x2i and move the line up for as long as it still meets the polygon. When x2i is maximized, c1 x1 + c2 x2 is also maximized. At that moment the line touches the polygon at a single vertex (or along an edge, when the edge is parallel to the line).
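This picture also suggests a brute-force method for a bounded, feasible two-dimensional LP: since the optimum is attained at a vertex, we can intersect every pair of constraint lines, discard the infeasible intersection points, and keep the best of the rest. The sketch below (ours, assuming numpy is available) does exactly that for the small example LP of the previous subsection:

import itertools
import numpy as np

# Constraints a . x <= b; non-negativity is encoded as -x1 <= 0, -x2 <= 0.
# (The instance is the small example LP used above.)
A = np.array([[1.0, 2.0], [3.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([4.0, 6.0, 0.0, 0.0])
c = np.array([1.0, 1.0])  # maximize c . x

best_val, best_x = -np.inf, None
for i, j in itertools.combinations(range(len(b)), 2):
    M = A[[i, j]]
    if abs(np.linalg.det(M)) < 1e-12:
        continue  # parallel constraint lines: no unique intersection
    x = np.linalg.solve(M, b[[i, j]])   # candidate vertex of the polygon
    if np.all(A @ x <= b + 1e-9):       # keep it only if it is feasible
        if c @ x > best_val:
            best_val, best_x = c @ x, x

print(best_x, best_val)  # expected: [1.6 1.2] 2.8

Of course, the number of candidate vertices grows rapidly with the dimension, which is one reason practical algorithms such as Simplex (Section 4.7) walk from vertex to vertex instead of enumerating them all.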


Figure 24: Pipe and ball example. (Two panels in the x1-x2 plane show the constraint normals a3 and a5, the objective vector c, and the decomposition c = a3 y3* + a5 y5*.)


Now, suppose that there is a pipe parallel to this line inside the polygon, and a ball is moving inside the area bounded by the polygon and the pipe, as in fig. (24). If we move the pipe up, at a certain point the ball gets stuck in a corner and cannot move any more. At this point, cx is maximized. Suppose that the corner is formed by the lines orthogonal to a3 = (a31, a32) and a5 = (a51, a52).

The ball is subject to three forces: one from the pipe and one from each of the two edges of the polygon. These forces achieve an equilibrium, so the sum of the three forces upon the ball is zero:

c − a3 y3* − a5 y5* = 0, i.e., c = a3 y3* + a5 y5* for some y3*, y5* ≥ 0.

Let y* be the vector with all coordinates 0 except y3* and y5*. Observe that y* defined this way is indeed a feasible dual solution. Moreover, we have c = (y*)^t A, and y* ≥ 0.

Using this observation, we will prove that y* is not only feasible but optimum.

cx* = ((y*)^t A) x*
    = (y*)^t (A x*)
    = y3* (a3 x*) + y5* (a5 x*)
    = y3* b3 + y5* b5
    = (y*)^t b,

where the fourth line uses the fact that constraints 3 and 5 are tight at the corner x*, i.e. a3 x* = b3 and a5 x* = b5.

Weak duality claims that cx* ≤ y^t b for the optimum primal x* and for any feasible y. Since we proved that cx* = (y*)^t b and our y* is feasible, we have proved that our y* indeed minimizes y^t b. This argument can be viewed as an informal proof of the strong duality theorem in the 2-dimensional case.

    4.7 Historical comments

Linear Programming (or LP) was first introduced in the 1940s for military applications. The term programming here is different from the interpretation that we normally use today: it refers to a conceptually principled planning approach, as opposed to software development. The initial algorithm, called Simplex, was shown to require exponential time in the worst case.

In the 1970s, the first provably polynomial time algorithm for solving linear programs was invented: the ellipsoid algorithm. This algorithm, however, had huge constant factors and was much slower than existing implementations of Simplex, which caused bad publicity for polynomial time algorithms in general.

In the 1980s a better polynomial time algorithm, called the Interior Point Method, was developed. Nowadays, commercial packages (e.g. CPLEX) implement both Simplex and interior point methods and leave the decision of which method to use to the user. The quality of these packages is such that they are even able to find solutions to Integer Linear Programs (ILPs) within acceptable time margins, even though they do behave badly under some conditions.

The Simplex algorithm is illustrated in Figure 25. Roughly speaking, the algorithm starts by transforming the problem into an equivalent one with an obvious (non-optimum) vertex, and iteratively moves from one vertex to another, always trying to improve the value of the objective.


Figure 25: Simplex method example. (The figure shows a walk along the polygon's vertices in the x1-x2 plane: start from an initial vertex, repeatedly move along a line where cx increases, and stop at x*, where no such line exists.)

It is important to note that the above description is quite incomplete. For example, sometimes it is not possible to improve the objective value at every step, and the algorithm has to be able to deal with this.
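In practice one rarely codes an LP solver from scratch. As a small sketch of how such packages are used (assuming a recent version of the SciPy library; scipy.optimize.linprog minimizes, so we negate the objective to maximize), the example LP from Section 4.5 can be solved as follows:

from scipy.optimize import linprog

# maximize x1 + x2  subject to  x1 + 2*x2 <= 4,  3*x1 + x2 <= 6,  x >= 0.
# linprog minimizes, so pass -c and negate the reported optimum value.
res = linprog(c=[-1, -1],
              A_ub=[[1, 2], [3, 1]],
              b_ub=[4, 6],
              bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, -res.fun)  # expected: [1.6 1.2] 2.8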


    5 Approximating Weighted Node Cover

    5.1 Overview and definitions

Given any graph G = (V, E) with non-negative weights w_i, 1 ≤ i ≤ n, associated with the nodes, a node cover is defined as a set of nodes S ⊆ V such that for every edge (ij) ∈ E, either i ∈ S or j ∈ S, or both of them are in S (i.e. {i, j} ∩ S ≠ ∅). The minimum cardinality node cover problem is to find a cover that consists of the minimum possible number of nodes. We presented a 2-approximation algorithm for this problem.

In the weighted node cover problem, each node is associated with a weight (cost) and the goal is to compute a cover of minimum possible weight. The weight of a node cover is the sum of the weights of all the chosen nodes.

Since min-cardinality node cover is a special case, weighted node cover is NP-hard as well. We will present two approaches to this problem, both based on the ideas of Linear Programming (LP).

    5.2 Min Weight Node Cover as an Integer Program

Our first step is to write the Minimum Node Cover problem as an Integer Program (IP), that is, a linear program where some of the variables are restricted to integral values. In our IP, there is a variable x_i corresponding to each node i: x_i = 1 if the vertex i is chosen in the node cover and x_i = 0 otherwise. The formulation is as follows:

∀(ij) ∈ E: x_i + x_j ≥ 1
∀i ∈ V: x_i ∈ {0, 1}
minimize Σ_{i∈V} x_i w_i

It is easy to see that there is a one-to-one correspondence between node covers and feasible solutions to the above problem. Thus, the optimum (minimum) solution corresponds to a minimum node cover. Unfortunately, since node cover is an NP-hard problem, we cannot hope to find a polynomial time algorithm for the above IP.

    5.3 Relaxing the Linear Program

General LPs with no integrality constraints are solvable in polynomial time. Our first approximation will come from relaxing the original IP, allowing the x_i to take on non-integer values²:

²There is no need to require x_i ≤ 1; an optimal solution will have this property anyway. A quick proof: consider an optimal solution x with some x_i > 1. Construct another solution x′, identical to x except with x′_i = 1. Clearly x′ still satisfies the problem constraints, but we have reduced the weight of a supposedly optimal answer. Thus, no optimal solution will have any x_i > 1.


∀(ij) ∈ E: x_i + x_j ≥ 1
∀i ∈ V: x_i ≥ 0
minimize Σ_{i∈V} x_i w_i

Suppose we have an optimum solution x* to the relaxed LP. Some x*_i may have non-integer values, so we must do some adaptation before we can use x* as a solution to the Minimum Weighted Node Cover problem.
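Fractional values genuinely can occur. As a small example (ours), take a triangle with unit weights: every node cover must contain two nodes, so the best integral value is 2, while x_i = 1/2 for all three nodes is feasible for the relaxed LP with value 3/2. This gap between the LP and IP optima (which can approach a factor of 2 on larger graphs) is also why no rounding scheme based on this LP alone can guarantee better than a factor-2 approximation.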

The immediate guess is to use rounding. That is, we create a solution x̄ by rounding all values x*_i to the nearest integer: ∀i ∈ V: x̄_i = min(⌊2 x*_i⌋, 1).

We should prove, first, that this is a solution to the problem and, second, that it is a good solution to the problem.

Feasibility Proof. The optimum fractional solution x* is feasible with respect to the constraints of the relaxed LP. We constructed x̄ from x* by rounding all values x*_i to the nearest integer. So we must show that rounding these values did not cause the two feasibility constraints to be violated.

In the relaxed LP solution x*, we have ∀(ij) ∈ E: x*_i + x*_j ≥ 1. In order for this to be true, at least one of x*_i or x*_j must be ≥ 1/2 and must therefore round up to 1. Thus, in our adapted solution x̄, at least one of x̄_i or x̄_j must be 1, so ∀(ij) ∈ E: x̄_i + x̄_j ≥ 1. Thus, x̄ satisfies the first constraint.

Clearly, since ∀i ∈ V: x*_i ≥ 0, and x̄_i is obtained by rounding x*_i to the nearest integer, ∀i ∈ V: x̄_i ≥ 0. So x̄ satisfies the second constraint as well and is thus feasible.

Relationship to the Optimum Solution. We proved that x̄ is feasible, but how good is it? We can prove that it is within a factor of two of the optimum.

Consider OPT_int, the value of the optimum solution to the Minimum Weighted Node Cover problem (and to the associated IP), and OPT_frac, the value of the optimum solution to our relaxed LP. Since any feasible solution to the IP is feasible for the relaxed LP as well, we have OPT_frac ≤ OPT_int.

Now, the weight of our solution x̄ is:

Σ_{i=1}^{n} x̄_i w_i = Σ_{i=1}^{n} min(⌊2 x*_i⌋, 1) w_i ≤ Σ_{i=1}^{n} 2 x*_i w_i = 2 OPT_frac ≤ 2 OPT_int

Thus, we have an approximation algorithm within a factor of 2.
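A minimal sketch of this relax-and-round algorithm, assuming SciPy is available (the graph, weights, and helper name below are ours, chosen for illustration):

from scipy.optimize import linprog

def weighted_node_cover_by_rounding(n, edges, w):
    """2-approximate weighted node cover: solve the LP relaxation, then
    pick every node whose fractional value is at least 1/2.

    n: number of nodes 0..n-1; edges: list of (i, j); w: list of weights.
    """
    # The constraint x_i + x_j >= 1 becomes -x_i - x_j <= -1 for linprog.
    A_ub = [[-1 if k in (i, j) else 0 for k in range(n)] for (i, j) in edges]
    res = linprog(c=w, A_ub=A_ub, b_ub=[-1] * len(edges),
                  bounds=[(0, None)] * n, method="highs")
    return {i for i in range(n) if res.x[i] >= 0.5 - 1e-9}

# Triangle with unit weights: the LP optimum is (1/2, 1/2, 1/2), so rounding
# picks all three nodes, at cost 3 <= 2 * OPT_int = 4.
print(weighted_node_cover_by_rounding(3, [(0, 1), (1, 2), (0, 2)], [1, 1, 1]))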

    5.4 Primal/Dual Approach

The above approach is simple, but the disadvantage is that it requires us to (optimally) solve a linear program. Although, theoretically, solving LPs can be done in polynomial time, it takes quite a lot of time in practice. Thus, it is advantageous to try to develop an algorithm that is faster and does not require us to solve an LP.


The variables x_i in the primal correspond to nodes, so the dual will have variables l_ij corresponding to the edges. The primal is a minimization problem, so the dual will be a maximization problem. The dual formulation is as follows:

∀i ∈ V: Σ_{j:(ij)∈E} l_ij ≤ w_i
∀(ij) ∈ E: l_ij ≥ 0
maximize Σ_{(ij)∈E} l_ij

Let us compare the primal and dual LPs we have to the standard form. A is an n × m matrix, where n is the number of nodes and m is the number of edges; the rows and columns of A correspond to nodes and edges respectively. b is the vector of weights, that is (w1, w2, ..., wn)^t, and c is (1, 1, ..., 1). We cannot say what l_ij is, exactly, because it is the result of a syntactic transformation. However, we can think of it as a certain value associated with each edge.

We will now construct an algorithm that tries to find both primal and dual approximate solutions at the same time. The intuition is as follows: we start from a feasible dual solution and an infeasible primal one. The algorithm iteratively updates the dual (keeping it feasible) and the primal (trying to make it more feasible), while always keeping a small "distance" between the current values of the dual and the primal. In the end, we get a feasible dual and a feasible primal that are close to each other in value. We conclude that the primal solution we get at the end of the algorithm is close to optimum.

1. Initially set all l_ij to 0 and S = ∅ (i.e. ∀i: x_i = 0). Unfreeze all edges.

2. Uniformly increment all unfrozen l_ij until for some node i the dual constraint Σ_{j:(ij)∈E} l_ij ≤ w_i becomes tight.³ We call node i saturated.

3. Freeze all edges adjacent to the newly saturated node i.

4. Set S = S ∪ {i} (i.e. x_i = 1).

5. While there are still unfrozen edges, go back to step 2.

6. Output S.
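A minimal sketch of this primal-dual algorithm (ours; following the idea in the footnote below, it raises all unfrozen edge variables by exactly the amount that saturates the next node, rather than by a series of small increments):

def primal_dual_node_cover(n, edges, w):
    """2-approximate weighted node cover without solving an LP.

    n: number of nodes 0..n-1; edges: list of (i, j) pairs; w: node weights.
    """
    slack = list(w)          # remaining slack w_i minus sum of l_ij at node i
    unfrozen = set(edges)    # edges whose dual variable l_ij is still rising
    cover = set()
    while unfrozen:
        # Degree of each node, counted over unfrozen edges only.
        deg = [0] * n
        for (i, j) in unfrozen:
            deg[i] += 1
            deg[j] += 1
        # Raising every unfrozen l_ij by t consumes deg[i] * t of node i's
        # slack; the first node to saturate determines the step size t.
        t = min(slack[i] / deg[i] for i in range(n) if deg[i] > 0)
        for i in range(n):
            slack[i] -= deg[i] * t
        sat = next(i for i in range(n) if deg[i] > 0 and slack[i] <= 1e-9)
        cover.add(sat)       # x_sat = 1: add the saturated node to S
        unfrozen = {(i, j) for (i, j) in unfrozen if sat not in (i, j)}
    return cover

# Path a-b-c with weights 1, 10, 1: the algorithm returns {0, 2} (weight 2),
# which here happens to be optimal; in general it is within 2x of optimum.
print(primal_dual_node_cover(3, [(0, 1), (1, 2)], [1.0, 10.0, 1.0]))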

Analysis. First we claim that the algorithm computes a feasible primal solution, i.e. the set S output by the algorithm is indeed a node cover. This is due to the fact that, by construction, the algorithm continues until all edges are frozen, and an edge is frozen only when at least one of its endpoints is added to S. The claim follows.

Now we claim that this algorithm achieves a factor 2 approximation. To see this, observe that, by construction, every node in S is saturated, so:

Σ_{i∈S} w_i = Σ_{i∈S} Σ_{j:(ij)∈E} l_ij

³In a real-life implementation of this algorithm, we can simply compute which node's constraint will become tight first and raise the unfrozen l_ij by that amount at once in step 2, instead of going through a series of small increments.


For each edge (ij) ∈ E, the term l_ij can occur at most twice in the expression on the right side, once for i and once for j. So,

Σ_{i∈S} Σ_{j:(ij)∈E} l_ij ≤ 2 Σ_{(ij)∈E} l_ij

According to the weak duality theorem, any feasible solution to the dual (packing) LP here is at most the optimum solution of the primal (covering) LP. That is,

Σ_{(ij)∈E} l_ij ≤ OPT_frac

where OPT_frac is the value of the optimum (fractional) solution to the LP. Let OPT_int be the value of the optimum (integral) solution to the IP. Because OPT_frac ≤ OPT_int, we get

Σ_{i∈S} w_i = Σ_{i∈S} Σ_{j:(ij)∈E} l_ij ≤ 2 Σ_{(ij)∈E} l_ij ≤ 2 OPT_frac ≤ 2 OPT_int

    Thus, this algorithm produces a factor 2 approximation for the minimum node cover problem.

    5.5 Summary

These notes describe a method that is commonly used to solve LP problems. In general, the strategy involves:

    - Start with a feasible solution to the dual.

    - Tweak the dual to get it closer to the optimum solution, while making related changes to the primal.

    - Stop when the primal becomes feasible.

Notes. Further reading: Chapter 3 in [7].


    6 Approximating Set Cover

    6.1 Solving Minimum Set Cover

Set cover is an extremely useful problem of core importance to the study of approximation algorithms. The study of this problem led to the development of fundamental design techniques for the entire field, and due to its generality, it is applicable to a wide variety of problems. In much the same manner as the approximation technique used in the Minimum Node Cover problem, we will look at the primal and dual LP formulations of the minimum set cover problem and derive an approximation algorithm from there. Using the weak duality theorem, we will prove that our approximation is within a factor of ln(n), where n is the cardinality of the set we are covering. Somewhat surprisingly, this is asymptotically the best algorithm that we can get for this problem.

Motivation. To motivate this problem, consider a city with several locations where hospitals could be built. We want any given inhabitant to be within, say, ten minutes of a hospital, so that we can deal with any emergency effectively. For each location where we could put a hospital, we can consider all inhabitants within ten minutes' range of this location. The problem is then to choose the locations at which to build hospitals, minimizing the total number of hospitals built, subject to the constraint that every inhabitant is within ten minutes of (covered by) at least one hospital. Strictly speaking, this is an example of a more specific sub-problem of general set cover, for which there exist better algorithms than in the general case, because it possesses a metric (the distances between customers and hospitals satisfy the triangle inequality). Nevertheless, it provides some motivation for a solution to this problem.

The Minimum Set Cover problem. Let S1, S2, ..., Sm be subsets of V = {1, ..., n}. The set cover problem is the following: choose the minimum number of subsets such that they cover V. More formally, we are required to choose I ⊆ {1, ..., m} such that |I| is minimum and

∪_{j∈I} S_j = V.

In the more general problem, each subset can be associated with some weight, and we may seek to minimize the total weight across I instead of the cardinality of I. We can also view this as an edge cover problem in a hypergraph, where each hyperedge can connect any number of nodes (a hyperedge just corresponds to some S_i ⊆ V).

Solving Minimum Set Cover as a Linear Program. As before, we begin by trying to write an LP. Let x_j, a 0-1 variable, indicate whether the j-th subset is chosen into the candidate solution (1 corresponds to chosen). Throughout the rest of this section, we will use the index i to denote elements and the index j to denote sets. The LP includes the above integrality constraints on x_j and the following:

∀i ∈ {1, ..., n}: Σ_{j:i∈S_j} x_j ≥ 1
minimize Σ_{j=1}^{m} x_j


The above constraint states: for every element i in our set, we sum the indicator variables x_j corresponding to the subsets S_j that contain i. Constraining this sum to be greater than or equal to one guarantees that at least one of the chosen subsets contains this element i.

Observe that the vertex cover problem is a special case of the set cover problem. There, the sets correspond to vertices and the elements to edges. The set corresponding to a vertex contains the edges incident on that vertex. Every edge is incident on two vertices, so in the corresponding set cover instance every element is contained in exactly two sets.

We could try to solve the set cover problem in the same way as we solved the vertex cover problem: begin by relaxing the integrality constraint in order to obtain a general LP, then utilize a similar rounding procedure. This time, however, there may be multiple sets containing any element i. Specifically, if S1, S2, and S3 are the only sets that contain the element 17, then one of our inequalities is x1 + x2 + x3 ≥ 1. What if our solver says that x1 = x2 = x3 = 1/3?

We can no longer apply the same rounding procedure used in the vertex cover problem, since our solution may not be feasible afterwards. This is shown by the above example: using the previous rounding procedure, we would end up setting x1 = x2 = x3 = 0, causing the constraint x1 + x2 + x3 ≥ 1 to be violated. We can, however, use a 1/k rounding scheme, where k is the maximum number of sets that any element appears in; that is, round up to 1 every variable of value at least 1/k. This is good if k is very small (e.g., if k = 2, this is just the vertex cover problem). Unfortunately, in general k could be an appreciable fraction of m, in which case this results in a factor-k approximation that is hardly better than the trivial solution of simply picking all m sets.

A Greedy (and Better) Algorithm. As before, we take the LP and formulate the dual LP. We now have a dual variable corresponding to each constraint in the primal. Let y_i be the variable corresponding to the constraint for element i ∈ V.

∀j ∈ {1, ..., m}: Σ_{i∈S_j} y_i ≤ 1
∀i ∈ {1, ..., n}: y_i ≥ 0
maximize Σ_{i∈V} y_i

    Consider the following greedy algorithm:

1. Let V′ = V and I = ∅.

2. Find the S_j with the largest |S_j ∩ V′|.

3. Set I = I ∪ {j} and V′ = V′ − S_j.

4. While V′ ≠ ∅, go back to step 2.

5. Output I.
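A direct sketch of the greedy algorithm above (ours; the instance at the bottom is made up for illustration, and ties in step 2 are broken arbitrarily):

def greedy_set_cover(V, sets):
    """Greedy set cover: returns indices I such that the union of
    sets[j] for j in I equals V. Requires that the sets cover V.
    """
    uncovered = set(V)
    I = []
    while uncovered:
        # Pick the set that covers the most not-yet-covered elements.
        j = max(range(len(sets)), key=lambda k: len(sets[k] & uncovered))
        I.append(j)
        uncovered -= sets[j]
    return I

V = {1, 2, 3, 4, 5, 6}
sets = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6}, {1, 6}]
print(greedy_set_cover(V, sets))  # [0, 2]: two sets suffice here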

We will now construct a lower bound for the optimum as we proceed through the algorithm. We will prove the following lemma:


Lemma. Let I be the set produced by the greedy algorithm. Then there exists y ≥ 0 such that

Σ_{i=1}^{n} y_i = |I|  and  ∀j ∈ {1, ..., m}: Σ_{i∈S_j} y_i ≤ H(|S_j|),

where H(x) = 1 + 1/2 + ... + 1/x denotes the harmonic function. Since |S_j| ≤ n, we have H(|S_j|) ≤ H(n) = Θ(log n).

First assume that the lemma is true (we will prove it later). Observe that by dividing each variable y_i by H(n), we get a feasible solution to the dual LP, since

∀j ∈ {1, ..., m}: Σ_{i∈S_j} y_i / H(n) ≤ 1.

Using the weak duality theorem, we get

OPT_integer ≥ OPT_fractional ≥ (feasible dual value) = Σ_i y_i / H(n) = |I| / H(n) = (our greedy solution value) / H(n).

Thus, the lemma implies that we have an O(log n) approximation for our problem.

Proof of the Lemma. In the algorithm given above, consider the step that chooses S_j. Assign:

∀i ∈ S_j ∩ V′: y_i = 1 / |S_j ∩ V′|,

where V′ corresponds to its value during that iteration (since V′ changes with every iteration). We will show that this assignment of y values satisfies the conditions of the lemma.

Choose a set and look at the elements that are not yet covered. Because of the way we assign the values, we end up satisfying the first property above: in each iteration, each newly covered element is assigned weight equal to the inverse of the number of newly covered elements, so the total increase in Σ y_i is 1, and the increase in |I| is also 1. That is, each time |I| increases by one, we divide this 1 up equally amongst all new elements that we are covering. The main difficulty is to show that we can satisfy the second property as well.

Consider some set S after we have already chosen one other set. Some of the elements belonging to S might have been part of the first subset chosen, but these have already been covered. Denote the set of elements of S covered by the first chosen subset by Q1. Similarly, we have a set Q2, and so on. Define Q0 as the empty set, since no elements are covered before the algorithm starts.

Observe that S = Q1 ∪ Q2 ∪ Q3 ∪ ... ∪ Qk, where k is the number of the iteration at which all elements of S were first covered (at the latest, when the algorithm terminates).


Denote the set that was chosen by the algorithm at iteration j by S_j, and let V_j be the set of elements that were not yet covered when S_j was chosen (i.e., the value of V′ at the beginning of the iteration). This set covers Q_j = S ∩ S_j ∩ V_j elements of S. The following elements of S are not yet covered at the beginning of this iteration:

S ∩ V_j = Q_j ∪ Q_{j+1} ∪ ... ∪ Q_k.    (1)

Thus, there are |S| − Σ_{i=1}^{j−1} |Q_i| such elements.

Now, here is the crucial observation which is the key to this analysis: the greedy algorithm chose S_j instead of S at this iteration because S_j is locally optimal, which means that it covers at least as many elements not covered by the sets chosen thus far by the greedy algorithm as S does; i.e. at the beginning of the iteration we have:

|S_j ∩ V_j| ≥ |S ∩ V_j|.    (2)

During this iteration we increment the coordinates of the y vector corresponding to the elements of S that are covered during this iteration (the set Q_j). Each increment is by one over the total number of elements covered during this iteration, which is |S_j ∩ V_j|. Thus, equation (2) implies that the total increase in Σ_{i∈S} y_i can be bounded by:

|Q_j| / |S_j ∩ V_j| ≤ |Q_j| / |S ∩ V_j|

Summing up all the increments and using equation (1), we can see that, at the end of the algorithm, we get:

Σ_{i∈S} y_i ≤ Σ_{l=1}^{k} |Q_l| / |S − ∪_{q=0}^{l−1} Q_q|

At the first iteration, V_1 = V and S ∩ V_1 = S, which satisfies the above. At step 2, the remaining set is S − Q_1, and so on. Let us look at the worst-case scenario: since all the Q_i together sum up to S, we get the worst case when all Q_i are of size 1. That gives us the value

Σ_{k=0}^{|S|−1} 1/(|S| − k) = H(|S|),

    and we have a ln(n) approximation.

To give a little intuition for this last proof, suppose that we were looking at the sets S1, S2, and S3, which happened to be the first, second, and third sets chosen by the algorithm, respectively. For this example we will focus the analysis on set S3. Suppose that S1 has 10 elements, of which two overlap with S2 and four overlap with S3. We give those four elements in S3 the value 1/10. Suppose S2 has 8 elements (this cannot be more than 10, otherwise it would have been chosen first). Two of its elements were already covered in the first round, and of the remaining six elements, one is in S3 as well. This common element gets the value 1/6. Finally, S3 gets chosen, and suppose that it has 9 elements, of which five were assigned values in previous rounds of the algorithm (notice that because S2 was chosen before S3, the number of elements in S3 is no more than 4 + 6). The remaining four elements each get the value 1/4. So the sum of the y_i in S3 is

    1/10 + 1/10 + 1/10 + 1/10 + 1/6 + 1/4 + 1/4 + 1/4 + 1/4


But in the first round |S3 ∩ V′| = 9, in the second round |S3 ∩ V′| = 5, and in the final round |S3 ∩ V′| = 4; thus the sum is bounded by:

1/9 + 1/9 + 1/9 + 1/9 + 1/5 + 1/4 + 1/4 + 1/4 + 1/4
≤ 1/9 + 1/8 + 1/7 + 1/6 + 1/5 + 1/4 + 1/3 + 1/2 + 1/1,

which is just H(|S3|), the desired result.

It is easy to check whether this algorithm is useful for a given instance of the problem. We simply need to compute H(n) for that instance and see if the approximation factor is acceptable to us. For example, if all subsets have on the order of 20 elements, we get H(20) ≈ 3.6. If the problem has additional structure, like a metric (as in the hospital example), we can do better than this; but in the general case, this is basically the best algorithm that can be hoped for.

Notes. For further reading see Chapter 3 in [7] and Chapters 2, 13, 14 in [15].


    7 Randomized Algorithms

    7.1 Maximum Weight Crossing Edge Set

Randomization is a powerful and simple technique that can be used to construct simple algorithms for seemingly complex problems. These algorithms perform well in the expected sense, but we need to use inequalities from statistics (e.g. the Markov inequality, the Chernoff bound, etc.) to study the deviations of the results produced by these algorithms from the mean and to impose bounds on performance. We will illustrate the application of this technique with a simple example.

Consider a connected undirected graph G = (V, E) with an associated weight function w: E → R⁺. Consider any cut of the graph, say (S, V − S), and let E_c denote the corresponding set of crossing edges induced by this cut. The cumulative weight of the crossing edge set is given by

W_c = Σ_{(i,j)∈E_c} w(i, j).    (3)

The objective is to find a cut such that W_c is maximized.

Now consider the following randomized algorithm: for each vertex v ∈ V, toss an unbiased coin, and assign v to the set S if the outcome is a head, else assign v to the set V − S. This produces a solution to the problem. The coin tosses are fair and independent, so each edge belongs to the set of crossing edges E_c with probability 0.5. Thus, the expected cumulative weight produced by the algorithm is half the total cumulative weight of all the edges of G. Formally,

E[ Σ_{(i,j)∈E_c} w(i, j) ] = (1/2) Σ_{(i,j)∈E} w(i, j).    (4)

Since the expected weight is half the total weight of all edges, there exists at least one solution with weight equal to (or more than) half the total weight of all edges. This follows because the mean of a random variable can be at most equal to its maximum value. Now, it is obvious that even the optimum weight W_c* cannot exceed the sum of the weights of all edges, i.e.,

W_c* ≤ Σ_{(i,j)∈E} w(i, j).    (5)

Thus, the presented approach produces a 2-approximate solution to our problem in expectation.

How often will this simple randomized algorithm produce a good solution? In other words, can we convert this approach into an algorithm with some reasonable guarantees on performance? First, consider a situation where all edges in the graph, except one, have zero weight. Since the randomized algorithm will pick the edge with non-zero weight with probability 0.5, we will obtain a meaningful solution only half the time, on average.

One possible bound on the deviation of a random variable from its mean can be obtained using Markov's inequality: if Y is a non-negative random variable with E(Y) = μ, then for any Δ ≥ 1,

P(Y ≥ Δμ) ≤ 1/Δ.    (6)

The disadvantage of Markov's inequality is that it produces very loose bounds. The advantage is that we need to know only the first-order statistics of the random variable in order to bound its deviation from the mean.


Figure 26: The figure shows the crossing edge set E_c between S and V − S, whose weight we are interested in maximizing.

As discussed below, Markov's inequality can be used to derive tighter bounds, like the Chebyshev bound and the Chernoff bound, which are based on higher-order statistics.

A simple proof of Markov's inequality for discrete, non-negative Y is as follows. Assume the inequality does not hold, i.e. P(Y ≥ Δμ) > 1/Δ. Then

E(Y) = Σ_y y · P(Y = y) ≥ Σ_{y ≥ Δμ} y · P(Y = y) ≥ Δμ · P(Y ≥ Δμ) > Δμ · (1/Δ) = μ,    (7)

which is a contradiction, since E(Y) = μ. QED.

To use Markov's inequality, define Y as the random variable equal to the sum of the weights of the edges that are not crossing the cut. As before, it is easy to see that E(Y) = W/2, where W is the sum of the weights of all the edges. Using Markov's inequality, we see that the probability of Y exceeding (say) 1.2 E(Y) = 0.6W is at most 1/1.2, so with probability at least 1 − 1/1.2 ≈ 0.17 we get a cut with weight at least W − 0.6W = 0.4W. Consider the following algorithm:

- Randomly assign nodes to S and V − S.
- Compute the weight of the crossing edges and compare it with W.
- Repeat until the weight of the crossing edges is at least 0.4W.

Our calculation above implies that the expected number of iterations until we get a cut that includes crossing edges of weight at least 0.4W is at most 1/0.17 ≈ 6; in other words, we have a 2.5-approximation. Note that the constants were chosen to make the example specific; we could choose alternative constants. For example, choosing 1.1 instead of 1.2 would result in a slightly better approximation and a slower (expected) running time of the resulting algorithm. Note that this approach inherently cannot improve the approximation factor beyond 2.
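A minimal sketch of this restart scheme (ours, using Python's random module; the small weighted graph at the bottom is made up for illustration):

import random

def randomized_cut(n, edges):
    """Sample random cuts of nodes 0..n-1 until the crossing weight is at
    least 0.4 * W; by the Markov argument above, the expected number of
    attempts is a small constant.  edges: list of (i, j, weight) triples.
    """
    W = sum(w for (_, _, w) in edges)
    while True:
        side = [random.random() < 0.5 for _ in range(n)]  # fair coin per node
        crossing = sum(w for (i, j, w) in edges if side[i] != side[j])
        if crossing >= 0.4 * W:
            return side, crossing

edges = [(0, 1, 3.0), (1, 2, 1.0), (2, 3, 2.0), (0, 3, 1.0), (0, 2, 5.0)]
side, crossing = randomized_cut(4, edges)
print(side, crossing)  # crossing >= 0.4 * 12 = 4.8 is guaranteed on return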

It is useful to note that one can use tighter bounds (e.g. the Chernoff bound, discussed in the next section) that result in better performance. We are going to skip this, due to the fact that the local search algorithm presented below is better and simpler anyway.


Next we consider the Local Search approach to the above problem. Conceptually, this approach can be viewed as a walk on a graph whose vertices are the feasible solutions to the problem at hand. Transforming one solution into another is then equivalent to moving from a vertex of this graph to one of its neighboring vertices. Typically, we start with a reasonable guess, and apply our transformation iteratively until the algorithm converges; this occurs when we reach a local optimum. However, for this method to be useful, it is critical that the number of neighboring vertices (or solutions) is relatively small, in particular not exponential. We will illustrate this technique for the maximum cardinality crossing edge set problem.

In the maximum cardinality crossing edge set problem we can construct a neighboring solution by moving one node from set S to V − S, or vice versa. In particular,