
Optimization
Part I: Foundations of Discrete Optimization - MQANT1329

Daniele Catanzaro

Center for Operations Research and Econometrics (CORE)
Louvain School of Management - Université Catholique de Louvain

https://perso.uclouvain.be/daniele.catanzaro/Index.html


One of the crucial open questions in the foundations of mathematics and computer science is whether mathematical optimization may provide us with anything more powerful than just deterministic algorithms when tackling classes of problems for which no efficient solution algorithm is currently known. Answering this question is the ultimate mission of an operational researcher.

D. Catanzaro
Louvain-la-Neuve, October 8, 2020.

1 | 296


[Figure: a taxonomy of optimization. Optimization is split into optimization under uncertainty (Robust Optimization, Stochastic Optimization), deterministic optimization, and multiobjective optimization. Deterministic optimization is further split into continuous (unconstrained and constrained) and discrete (Integer Programming, Combinatorial Optimization) optimization. Other leaves shown include Nonlinear Equations, Nonlinear Least Squares, Nondifferentiable Optimization, Global Optimization, Derivative-Free Optimization, Nonlinear Programming, Network Optimization, Bound Constrained, Linearly Constrained, Linear Programming, Quadratic Programming, Semidefinite Programming, Second-Order Cone Programming, Quadratically Constrained Quadratic Programming, Semi-infinite Programming, Mathematical Programming with Equilibrium Constraints, Complementarity Problems, and Mixed Integer Nonlinear Programming.]


Daniele Catanzaro | CORE, Université Catholique de Louvain

Preface

Optimization is an area of mathematics and computer science that investigates the criteria and the methods that allow one to maximize (or minimize) a function of many variables subject to constraints.

Discrete Optimization is a branch of Optimization that studies how to maximize (or minimize) a given function when some (or all) of the involved variables are required to belong to a discrete set (typically a subset of the integers or of the naturals). These discrete restrictions enable the description of phenomena or alternatives where indivisibility is required or where there is no continuum of alternatives.

Many discrete optimization problems have an underlying combinatorial structure, i.e., they can also be defined as optimization problems over graphs, matroids, permutations, combinations, or subsets; hence, they are usually referred to as Combinatorial Optimization Problems (COPs). Combinatorial Optimization is a branch of Discrete Optimization that investigates the nature and the properties of such combinatorial structures with a view to deriving specific solution approaches for the corresponding COPs.

Because many COPs can be modeled as integer programs, and many integer programs can often be given a combinatorial interpretation, there exists a close relation between Combinatorial Optimization and Integer Programming (also referred to as Optimization over Integers).

This course introduces the mathematical foundations of combinatorial optimization and integer programming, covers the computational aspects involved in discrete optimization, and places particular emphasis on the techniques that have proved successful in current solvers, including convexification and enumeration. Part of this course follows the diagram below, inspired by A. Schrijver, Theory of Linear and Integer Programming, John Wiley & Sons, 1986.

[Figure: Problem A, Linear Algebra (linear equations); Problem B, Integer Linear Algebra (linear Diophantine equations); Problem C, Linear Programming (linear inequalities); Problem D, Integer Linear Programming (linear Diophantine inequalities). The diagram separates the problems solvable in polynomial time (A, B, C) from the NP-complete one (D), and also shows the topics Fundamental COPs, Special Cases Solvable in Polynomial Time, General Solution Approaches, and Computational Complexity.]




The vast majority of this course, instead, closely follows L. Wolsey, Integer Programming, Wiley (1998), which has been chosen as the reference textbook. This course particularly emphasizes the ability to develop and implement models and algorithms to solve practical discrete optimization problems. Participants in this course are expected to be already familiar with the theory and algorithms of linear programming, as well as with the rudiments of both graph theory and a general-purpose programming language such as C/C++, Python, Java or Mosel. In particular, Mosel will be the programming language of reference. All of the source codes and data used in this course can be downloaded as a single zip file via the following link:

• Implementations: http://perso.uclouvain.be/daniele.catanzaro/SupportingMaterial/UCL/Teaching/Optimization/Implementations/Implementations.zip



Course Content

• Mathematical Preliminaries

• Some Fundamental Problems in Linear Algebra and Number Theory

• Optimizing over Diophantine Inequalities with Positivity Constraints

• Optimality, Relaxations & Bounds

• Efficiently Solvable COPs

• Computational Complexity

• General Solution Approach to Optimization over Integers





Course Material

1. Main source: blackboard lectures; the slides are meant to complement the blackboard lectures. Participants are strongly encouraged to complement the blackboard lectures with at least the primary textbook of the course listed below.

2. Textbooks:
2.1 Primary: L. Wolsey. Integer Programming, Wiley (1998) (Wolsey, 1998);
2.2 Secondary: M. Conforti, G. Cornuéjols, and G. Zambelli. Integer Programming, Springer International Publishing (2014) (Conforti et al., 2014).

3. Supplementary Sources:
3.1 A. Schrijver. Theory of Linear and Integer Programming, John Wiley & Sons (1986) (Schrijver, 1998);
3.2 A. Schrijver. Combinatorial Optimization, Springer (2003) (Schrijver, 2003);
3.3 G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization, Wiley (1999) (Nemhauser and Wolsey, 1999);
3.4 C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity, Dover (1998) (Papadimitriou and Steiglitz, 1998).

4. The exercise sessions provide, among other things, an introduction to the FICO Xpress Optimization Suite.




The Lecturer

• Specialties: Discrete Optimization & Theoretical Computer Science.

• Affiliation: Center for Operations Research and Econometrics (CORE), Université Catholique de Louvain.

• Address:
  • Campus LLN: Voie du Roman Pays 34, L1.03.01, B-1348, Louvain-la-Neuve. Floor 0, Bureau: B-003.
  • Campus Mons: Chaussée de Binche 151, M1.01.01, 7000 Mons, Belgium. Floor 3, Bureau: B-302.

• Webpage: http://perso.uclouvain.be/daniele.catanzaro/Index.html.

• Office Hours: Every Wednesday from 14:00 to 16:30. You may book an appointment either “on the fly”, just after the lecture, or by sending an email [email protected].



Exam

1. Mosel exercise (usually 15-20% of the overall grade, unless otherwise specified, and usually carried out during the last exercise session). This part is carried out only once per year, and participation is mandatory for all students. A poor score on this part precludes access to the written exam.

2. Written exam.




Final Remark

This course is a standard 5-ECTS course; hence, it is designed to be delivered in 30 hours of face-to-face teaching. This limited amount of time does not allow proper coverage of classic topics such as Matroids, Column Generation, Heuristics, and many others. Some of these topics will be studied in more advanced master courses, such as Quantitative Decision Making; others can be found in more comprehensive sources, such as any of the alternative textbooks mentioned in the previous slides.

These slides are under permanent revision for the removal of typos and other small errors. If you spot any of them, please do not hesitate to contact me.



Mathematical Preliminaries




Mathematical Preliminaries: Basic Notation

• By $\mathbb{R}$ ($\mathbb{Q}$, $\mathbb{Z}$, $\mathbb{N}$) we denote the set of real (rational, integral, natural) numbers.

• The set $\mathbb{N}$ does not contain zero. The set $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$ does.

• $\mathbb{R}_+$ ($\mathbb{Q}_+$, $\mathbb{Z}_+$) denotes the set of nonnegative real (rational, integral) numbers.

• For a fixed $n \in \mathbb{N}$, the symbol $\mathbb{R}^n$ ($\mathbb{Q}^n$, $\mathbb{Z}^n$, $\mathbb{N}^n$) denotes the set of real (rational, integral, natural) $n$-vectors, i.e., the set of vectors having $n$ components with entries in $\mathbb{R}$ ($\mathbb{Q}$, $\mathbb{Z}$, $\mathbb{N}$).

• Addition of vectors and multiplication of vectors by scalars are defined as usual. With these operations, $\mathbb{R}^n$ and $\mathbb{Q}^n$ are vector spaces over the fields $\mathbb{R}$ and $\mathbb{Q}$, respectively, while $\mathbb{Z}^n$ is a module over the ring $\mathbb{Z}$.




• A vector is always considered a column vector, unless otherwise stated. The superscript “T” denotes transposition. So, for any vector $x \in \mathbb{R}^n$, $x^T$ is a row vector.

• $\mathbb{R}^n$ is endowed with the (Euclidean) inner product defined as
$$x^T y = \sum_{i=1}^{n} x_i y_i, \qquad \forall\, x, y \in \mathbb{R}^n.$$

• We denote by $e_j$ the $j$-th unit vector in $\mathbb{R}^n$, i.e., the $n$-vector whose $j$-th entry is equal to 1 and whose remaining entries are all equal to zero.

• For any given set $R$, we denote by $R^n$ the set of the $n$-vectors having entries in $R$.




• Given any two $n$-vectors $x, y \in R^n$, for some given set $R$, we write
$$x \le y$$
if $x_i \le y_i$ holds for all $i \in \{1, \dots, n\}$.




• For any given set $R$, we denote by $R^{m \times n}$ the set of the $m \times n$ matrices having entries in $R$.

• We will usually assume that the row index set of a matrix $A \in R^{m \times n}$ is $\{1, \dots, m\}$ and that the column index set is $\{1, \dots, n\}$.

• Unless specified otherwise, the elements or entries of $A \in R^{m \times n}$ are denoted by $a_{ij}$, $i \in \{1, \dots, m\}$, $j \in \{1, \dots, n\}$, and for ease of notation we write $A = \{a_{ij}\}$.

• Vectors with $n$ components are also considered as $n \times 1$ matrices.

• Let $A \in R^{m \times n}$, $I \subseteq \{1, \dots, m\}$, and $J \subseteq \{1, \dots, n\}$. Then, $A_{IJ}$ denotes the submatrix of $A$ induced by those rows and columns of $A$ whose indices belong to $I$ and $J$, respectively.

• A submatrix of the form $A_{II}$ is called a principal submatrix of $A$. If $K = \{1, \dots, k\}$, then $A_{KK}$ is called the $k$-th leading principal submatrix of $A$.




• We denote by $\det(A)$ the determinant of any matrix $A \in \mathbb{R}^{n \times n}$ and by $\mathrm{tr}(A)$ its trace, i.e., the sum of the elements on its main diagonal.

• We denote by $A^{-1}$ the inverse of a matrix $A \in \mathbb{R}^{n \times n}$. If a matrix has an inverse, then it is called nonsingular, and otherwise singular. We recall that a matrix $A \in \mathbb{R}^{n \times n}$ is nonsingular if and only if $\det(A) \neq 0$.

• We denote by $I$ the identity matrix, i.e., an appropriately sized diagonal matrix having all of its diagonal entries equal to 1.

• We denote by $0$ any appropriately sized matrix having all entries equal to zero, and similarly any zero vector.

• We denote by $\mathbf{1}$ a vector or a matrix having all entries equal to 1.

• Whenever we do not explicitly state whether a number, vector, or matrix is integral, rational, or real, it is implicitly assumed to be rational. Moreover, we often do not specify the dimensions of vectors and matrices explicitly: when operating with them, we always assume that their dimensions are compatible.




• A set is a collection of distinct objects. The vast majority of the sets considered in this course are finite.

• The term family denotes a set in which elements may occur more than once; more precisely, each element has an associated multiplicity.

• The term collection is a synonym of set, but is usually used to denote a set whose elements are themselves sets.

• The terms class and system are also synonyms of set, and are usually used to denote sets of structures (e.g., graphs, inequalities, curves, and so on).

• Given any two sets, say $M$ and $N$, the expression $M \subseteq N$ means that $M$ is a (possibly improper) subset of $N$, while $M \subset N$ denotes strict containment, i.e., $M \subseteq N$ and $M \neq N$.

• We denote by $M \setminus N$ the set-theoretical difference $\{x \in M : x \notin N\}$.

• We denote by $2^M$ the power set of $M$, i.e., the set of all the subsets of $M$.




• Given any two sets $M, N \subseteq \mathbb{R}^n$ and $\alpha \in \mathbb{R}$, we denote:

• $M + N = \{x + y \in \mathbb{R}^n : x \in M,\ y \in N\}$;

• $\alpha M = \{\alpha x \in \mathbb{R}^n : x \in M\}$;

• $-M = \{-x \in \mathbb{R}^n : x \in M\}$;

• $M - N = M + (-N)$.

• Given a number $\alpha \in \mathbb{R}$, we denote by $\lfloor \alpha \rfloor$ the largest integer not larger than $\alpha$ (i.e., the floor or lower integer part of $\alpha$).

• Analogously, we denote by $\lceil \alpha \rceil$ the smallest integer not smaller than $\alpha$ (i.e., the ceiling or upper integer part of $\alpha$).




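Mathematical Preliminaries: Hulls, Independence, Dimension

• We define an integer vector as an $n$-vector having all entries in $\mathbb{Z}$.

• We define a 0-1 vector as an $n$-vector having all entries in $\{0, 1\}$.

• A vector $x \in \mathbb{R}^n$ is called a linear combination of the vectors $x_1, x_2, \dots, x_k \in \mathbb{R}^n$ if, for some $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_k)^T \in \mathbb{R}^k$,
$$x = \sum_{i=1}^{k} \lambda_i x_i.$$

• If, in addition,
$$\begin{cases} \lambda \ge 0 \\ \sum_{i=1}^{k} \lambda_i = 1 \\ \lambda \ge 0,\ \sum_{i=1}^{k} \lambda_i = 1 \end{cases} \quad \text{we call } x \text{ a} \quad \begin{cases} \text{conic combination} \\ \text{affine combination} \\ \text{convex combination} \end{cases} \tag{1}$$
of the vectors $x_1, x_2, \dots, x_k$.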
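The kinds of combination in (1) differ only in the conditions imposed on $\lambda$. The following minimal Python sketch classifies a coefficient vector accordingly and evaluates the combination with exact rational arithmetic. It is not part of the course material, which uses Mosel, and the function names `combination_types` and `combine` are hypothetical.

```python
from fractions import Fraction


def combination_types(lam):
    """Return the kinds of combination, in the sense of (1), obtained with
    the coefficient vector lam = (lambda_1, ..., lambda_k).
    Any coefficient vector yields a linear combination."""
    kinds = {"linear"}
    nonnegative = all(coef >= 0 for coef in lam)
    sums_to_one = sum(lam) == 1
    if nonnegative:
        kinds.add("conic")
    if sums_to_one:
        kinds.add("affine")
    if nonnegative and sums_to_one:
        kinds.add("convex")
    return kinds


def combine(lam, vectors):
    """Compute x = sum_i lambda_i * x_i, component by component."""
    n = len(vectors[0])
    return [sum(coef * v[j] for coef, v in zip(lam, vectors)) for j in range(n)]


half = Fraction(1, 2)
print(combine([half, half], [[0, 0], [2, 4]]))   # the midpoint (1, 2), as Fractions
print(sorted(combination_types([half, half])))   # ['affine', 'conic', 'convex', 'linear']
print(sorted(combination_types([2, -1])))        # ['affine', 'linear']
```

Every convex combination is also conic, affine and linear, which is why the sets returned are nested.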

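• These combinations are called proper if neither $\lambda = 0$ nor $\lambda = e_j$, for some $j \in \{1, 2, \dots, k\}$.

• For a nonempty subset $S \subseteq \mathbb{R}^n$, we denote by
$$\begin{cases} \mathrm{lin}(S) \\ \mathrm{cone}(S) \\ \mathrm{aff}(S) \\ \mathrm{conv}(S) \end{cases} \quad \text{the} \quad \begin{cases} \text{linear} \\ \text{conic} \\ \text{affine} \\ \text{convex} \end{cases} \tag{2}$$
hull of the elements of $S$, i.e., the set of all vectors that are linear (conic, affine, convex) combinations of finitely many vectors of $S$.

• For the empty set $\emptyset$, we define $\mathrm{lin}(\emptyset) = \mathrm{cone}(\emptyset) = \{0\}$ (the empty linear or conic combination is the zero vector) and $\mathrm{aff}(\emptyset) = \mathrm{conv}(\emptyset) = \emptyset$.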




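• A subset $S \subseteq \mathbb{R}^n$ is called
$$\begin{cases} \text{a linear subspace} \\ \text{a cone} \\ \text{an affine subspace} \\ \text{a convex set} \end{cases} \quad \text{if} \quad \begin{cases} S = \mathrm{lin}(S) \\ S = \mathrm{cone}(S) \\ S = \mathrm{aff}(S) \\ S = \mathrm{conv}(S). \end{cases} \tag{3}$$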




• A subset $S \subseteq \mathbb{R}^n$ is called linearly (affinely) independent if none of its members is a proper linear (affine) combination of elements of $S$; otherwise, $S$ is called linearly (affinely) dependent.

• It is well known that a linearly (affinely) independent subset of $\mathbb{R}^n$ contains at most $n$ elements ($n + 1$ elements).

• For any subset $S \subseteq \mathbb{R}^n$, the rank of $S$ (affine rank of $S$), denoted by $\mathrm{rank}(S)$ ($\mathrm{arank}(S)$), is the cardinality of the largest linearly (affinely) independent subset of $S$.

• For any subset $S \subseteq \mathbb{R}^n$, the dimension of $S$, denoted by $\dim(S)$, is the cardinality of the largest affinely independent subset of $S$ minus one, i.e.,
$$\dim(S) = \mathrm{arank}(S) - 1.$$

• A subset $S \subseteq \mathbb{R}^n$ with $\dim(S) = n$ is called full-dimensional.

• The rank of a matrix $A$, denoted by $\mathrm{rank}(A)$, is the rank of the set of its row vectors. This is known to be equal to the rank of the set of its column vectors. An $n \times m$ matrix $A$ is said to have full row rank (full column rank) if $\mathrm{rank}(A) = n$ ($\mathrm{rank}(A) = m$).
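The rank, affine rank and dimension can all be computed with exact Gaussian elimination. The Python sketch below relies on a standard fact that is not stated above: $S$ is affinely independent if and only if the vectors $(x, 1)$, $x \in S$, are linearly independent. Hence $\mathrm{arank}(S) = \mathrm{rank}(\{(x, 1) : x \in S\})$. The function names are illustrative only.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a finite set of rational vectors (exact Gaussian elimination)."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0                                   # number of pivots found so far
    n_cols = len(rows[0]) if rows else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):   # eliminate column c below the pivot
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def affine_rank(vectors):
    """arank(S) = rank({(x, 1) : x in S})."""
    return rank([list(v) + [1] for v in vectors])


def dim(vectors):
    """dim(S) = arank(S) - 1 (so the empty set gets dimension -1)."""
    return affine_rank(vectors) - 1


square = [[0, 0], [1, 0], [0, 1], [1, 1]]   # corners of the unit square in R^2
print(rank(square), affine_rank(square), dim(square))  # 2 3 2: full-dimensional
```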

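Mathematical Preliminaries: Functions and Orders of Functions

• A function is a relation between two sets (called domain and co-domain) that associates to each element of the domain one and only one element of the co-domain.

• A function $f : X \to Y$ is injective if, for all $x, x' \in X$ with $x \neq x'$, it holds that $f(x) \neq f(x')$.

• A function $f : X \to Y$ is surjective if, for all $y \in Y$, there exists $x \in X$ such that $f(x) = y$.

• A function $f : X \to Y$ is bijective if it is both injective and surjective.

• Consider a function $g : \mathbb{N} \to \mathbb{R}_+$; we define the following sets of functions:
$$O(g(n)) = \{h : \mathbb{N} \to \mathbb{R}_+ \;:\; \exists\, c_2 > 0,\ n_0 \in \mathbb{N} \text{ such that } h(n) \le c_2\, g(n)\ \ \forall\, n \ge n_0\},$$
$$\Theta(g(n)) = \{h : \mathbb{N} \to \mathbb{R}_+ \;:\; \exists\, c_1, c_2 > 0,\ n_0 \in \mathbb{N} \text{ such that } 0 \le c_1\, g(n) \le h(n) \le c_2\, g(n)\ \ \forall\, n \ge n_0\}.$$
In other words, a function $f : \mathbb{N} \to \mathbb{R}_+$ belongs to $O(g(n))$ (i.e., $f(n) \in O(g(n))$) if there exists a positive constant $c_2$ such that $f(n) \le c_2\, g(n)$ for all sufficiently large $n$. It belongs to $\Theta(g(n)) \subseteq O(g(n))$ if, in addition, there exists a positive constant $c_1$ such that $f(n)$ can be “sandwiched” between $c_1 g(n)$ and $c_2 g(n)$.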

• Example 1. The function $f(n) = 2n^3 + n^2$ belongs to $O(n^3)$.
In order to prove this claim, it is sufficient to find constants $c_1$, $c_2$ and $n_0$ that satisfy the two-sided bound of the definition. One possibility is to set $c_1 = 1$, $c_2 = 3$ and $n_0 = 1$. In such a case, it is easy to see that
$$\forall\, n \ge n_0, \quad c_1 g(n) \le f(n) \iff n^3 \le 2n^3 + n^2.$$
Concerning the upper bound,
$$\begin{aligned} \forall\, n \ge n_0, \quad f(n) \le c_2 g(n) &\iff 2n^3 + n^2 \le 3n^3 \\ &\iff 2n^3 + n^2 - 2n^3 \le 3n^3 - 2n^3 \\ &\iff n^2 \le n^3, \end{aligned}$$
which holds for every $n \ge 1$.





Note that the function $f(n) = 2n^3 + n^2$ trivially belongs also to $O(n^4)$, $O(n^5)$, and so on, but not to $O(n^2)$ or $O(n)$.
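The constants of Example 1 can be sanity-checked numerically. The Python sketch below tests the two-sided bound on a finite range of $n$ only. It can therefore expose a wrong choice of constants, but it cannot replace the proof above. The helper `sandwiched` is hypothetical.

```python
def sandwiched(f, g, c1, c2, n0, n_max=10_000):
    """Check c1*g(n) <= f(n) <= c2*g(n) for every n0 <= n <= n_max.
    A finite check is only evidence, not a proof: the algebraic
    argument of Example 1 is what covers every n >= n0."""
    return all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, n_max + 1))


def f(n):
    return 2 * n**3 + n**2


print(sandwiched(f, lambda n: n**3, c1=1, c2=3, n0=1))  # True
print(sandwiched(f, lambda n: n**2, c1=1, c2=3, n0=1))  # False: 2n^3 + n^2 > 3n^2 for n >= 2
```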

Mathematical Preliminaries: Maxima, Minima, and Infinity

• In this course, when speaking of a maximum or minimum, we will often implicitly assume that the optimum is finite. If the optimum is not finite, consistency in min-max relations can usually be obtained by setting:

• a minimum over the empty set to $+\infty$;

• a maximum over a set without upper bound to $+\infty$;

• a maximum over the empty set to $0$ or $-\infty$ (depending on the universe);

• a minimum over a set without lower bound to $-\infty$.
This usually leads to trivial, or previously proved, statements.

• If we consider maximizing a function $f(x)$ over $x \in X$, we call any $x \in X$ a feasible solution, and any $x \in X$ maximizing $f(x)$ an optimal solution. Similarly for minimizing.

Mathematical Preliminaries: Graph Theory

A graph $G = (V, E)$ consists of a finite nonempty set $V$ of nodes (or vertices) and a finite set $E$ of edges. With every edge is associated an unordered pair of nodes, called its endnodes and usually denoted as $(u, v)$, with $u, v \in V$, and we say that the edge is incident to its endnodes.




When an ordering is explicitly given on the pair of endnodes $(u, v)$, $u, v \in V$, we say that such a pair $(u, v)$ defines an arc and that $G$ is directed. To stress this fact, we replace the letter $E$ (i.e., the edge set) with the letter $A$ (i.e., the arc set) and write $G = (V, A)$. Otherwise, we say that $G = (V, E)$ is undirected.

[Figure: an example graph on nodes u, v, z, x, y.]




Graphs: Fundamental Notation & Definitions

The following notation and definitions are fundamental in graph theory:

• The number of nodes of $G$ is called the order of $G$.

• A node that is not incident to any edge is called isolated. Two nodes that are joined by an edge are called adjacent or neighbors. The neighborhood of a node $u$ is the set of its neighbors.

• The set of edges having a node $u \in V$ as one of their endnodes is denoted by $\delta(u)$. The number $|\delta(u)|$ is the degree of the node $u \in V$.

• Consider a graph $G = (V, E)$. We say that $G$ is weighted if there exists a function $w : E \to \mathbb{R}$ that associates a real number with each edge of $G$. In this situation, we say that $w_{ij} = w((i, j))$ is the weight associated with the edge $(i, j) \in E$.




• More generally, if $S \subseteq V$, then $\delta(S)$ denotes the set of edges with one endnode in $S$ and the other endnode in $V \setminus S$. Any edge set of the form $\delta(S)$, where $\emptyset \neq S \neq V$, is called a cut.

[Figure: an example graph on nodes 1 to 7.]

• If $s$ and $t$ are two different nodes of $G$, then an edge set $F \subseteq E$ is called an $(s, t)$-cut if there exists a node set $S \subseteq V$, with $s \in S$, $t \notin S$, such that $F = \delta(S)$.
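To illustrate $\delta(S)$, degrees and $(s, t)$-cuts, here is a minimal Python sketch on a small hypothetical graph. It assumes that edges are stored as a list of unordered pairs $(u, v)$. The names `delta`, `degree` and `is_st_cut_for` are illustrative only.

```python
def delta(edges, S):
    """delta(S): the edges with exactly one endnode in S.
    edges is a list of (u, v) pairs of an undirected graph."""
    return [(u, v) for (u, v) in edges if (u in S) != (v in S)]


def degree(edges, u):
    """|delta(u)| for a loopless graph."""
    return len(delta(edges, {u}))


def is_st_cut_for(edges, F, s, t, S):
    """True if the given node set S certifies that F is an (s, t)-cut,
    i.e., s in S, t not in S and F = delta(S)."""
    return s in S and t not in S and set(F) == set(delta(edges, S))


E = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]   # hypothetical example graph
print(delta(E, {1, 2}))                          # [(1, 3), (2, 4)]
print(degree(E, 4))                              # 3
print(is_st_cut_for(E, [(1, 3), (2, 4)], 1, 5, {1, 2}))   # True
```

The definition only asks for the existence of a suitable $S$. The last function merely checks a given candidate and does not search over all node sets.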

• A simple graph is called complete if every two of its nodes are joined by an edge. The complete graph of order $n$ is denoted by $K_n$.

[Figure: a complete graph with nodes labeled 1 to 10.]




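• A graph $G$ whose node set $V$ can be partitioned into two nonempty disjoint sets $V_1, V_2$, with $V_1 \cup V_2 = V$, such that every edge has one endnode in $V_1$ and the other in $V_2$, is called bipartite.

[Figure: a bipartite graph with node classes $V_1$ and $V_2$, on nodes $u_0, \dots, u_4$ and $v_0, \dots, v_3$.]

When, in addition, every node of $V_1$ is adjacent to every node of $V_2$, with $|V_1| = m$ and $|V_2| = n$ (hence $|V| = m + n$), we denote this (complete) bipartite graph as $K_{m,n}$.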




• The complete bipartite graph having $|V_1| = m$ and $|V_2| = n$ is denoted as $K_{m,n}$. The complete bipartite graph $K_{1,n}$ is called a star, and the star $K_{1,3}$ is called a claw.

• Given a graph $G$, the complement of $G$, denoted by $\bar{G}$, is the simple graph which has the same node set as $G$ and in which two nodes are adjacent if and only if they are nonadjacent in $G$.

• A graph is planar if it can be drawn in the plane in such a way that no two edges intersect, except possibly at their endpoints.

• A directed graph (or digraph) $D = (V, A)$ consists of a finite nonempty set $V$ of nodes and a set $A$ of arcs.

• With every arc $a$ is associated an ordered pair $(u, v)$ of nodes, called its endnodes; $u$ is the initial endnode and $v$ is the terminal endnode.

• If $u \in V$, then the set of arcs having $u$ as initial (terminal) node is denoted by $\delta^+(u)$ ($\delta^-(u)$). We set $\delta(u) = \delta^+(u) \cup \delta^-(u)$. The numbers $|\delta^+(u)|$, $|\delta^-(u)|$, and $|\delta(u)|$ are called the outdegree, indegree, and degree of $u$, respectively.

• For any set $W \subseteq V$, $\emptyset \neq W \neq V$, we set $\delta^+(W) = \{(i, j) \in A : i \in W,\ j \notin W\}$, $\delta^-(W) = \delta^+(V \setminus W)$, and $\delta(W) = \delta^+(W) \cup \delta^-(W)$.
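The directed versions $\delta^+$ and $\delta^-$ translate almost literally into code. The Python sketch below assumes that arcs are stored as ordered pairs $(i, j)$ and uses a small hypothetical loopless digraph. It computes $\delta^-(W)$ through the identity $\delta^-(W) = \delta^+(V \setminus W)$ given above.

```python
def delta_out(arcs, W):
    """delta^+(W): arcs leaving W, i.e., (i, j) with i in W and j not in W."""
    return [(i, j) for (i, j) in arcs if i in W and j not in W]


def delta_in(arcs, W, nodes):
    """delta^-(W), computed as delta^+(V minus W)."""
    return delta_out(arcs, set(nodes) - set(W))


V = {1, 2, 3, 4}
A = [(1, 2), (2, 3), (3, 1), (3, 4)]   # hypothetical digraph

u = 3
outdeg = len(delta_out(A, {u}))
indeg = len(delta_in(A, {u}, V))
print(outdeg, indeg, outdeg + indeg)   # 2 1 3
```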

• If $s$ and $t$ are two different nodes of a digraph $D = (V, A)$, then an arc set $F \subseteq A$ is called an $(s, t)$-cut in $D$ if there is a node set $W$ with $s \in W$, $t \notin W$, such that $F = \delta^+(W)$.

• In a graph (or a digraph), a walk is a finite alternating sequence
$$W = v_0, e_1, v_1, e_2, v_2, \dots, e_k, v_k,$$
for some $k \ge 0$, of nodes and edges (or arcs), beginning and ending with a node, in which each edge (or arc) $e_i$ has endnodes $v_{i-1}$ and $v_i$, for $i \in \{1, \dots, k\}$.

• An edge (arc) connecting two nodes of a walk but not contained in the walk is called a chord of the walk.

• A walk in which all nodes are distinct is called a path; a walk in which all edges (arcs) are distinct is called a trail. A path in a directed graph is called a directed path or dipath.

• Two nodes $s, t$ of a graph $G$ are said to be connected if $G$ contains an $(s, t)$-path, i.e., a path beginning in $s$ and ending in $t$, or vice versa.

• $G$ is called connected if every two nodes of $G$ are connected.
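Connectivity can be tested with breadth-first search (BFS), a standard technique that these slides do not describe. A node $t$ is connected to $s$ exactly when a BFS started at $s$ reaches $t$. The minimal Python sketch below assumes an undirected graph given by its node set and edge list, and the function names are hypothetical.

```python
from collections import deque


def reachable(adj, s):
    """Set of nodes reachable from s, by breadth-first search."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_connected(nodes, edges):
    """G is connected iff all nodes are reachable from one (arbitrary) node."""
    adj = {u: [] for u in nodes}
    for u, v in edges:           # undirected graph: store both directions
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(nodes))
    return reachable(adj, start) == set(nodes)


print(is_connected({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 4)]))  # True
print(is_connected({1, 2, 3, 4}, [(1, 2), (3, 4)]))          # False
```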

• A digraph $D$ is strongly connected if, for every two nodes $s, t$ of $D$, there are an $(s, t)$-dipath and a $(t, s)$-dipath in $D$.

[Figure: an example digraph on nodes 1, 2, 3.]
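Strong connectivity can be tested with two searches from an arbitrary root $r$. Every node must be reachable from $r$, and $r$ must be reachable from every node (equivalently, every node is reachable from $r$ once all arcs are reversed). Concatenating an $(s, r)$-dipath and an $(r, t)$-dipath gives a walk from $s$ to $t$, which contains an $(s, t)$-dipath. A Python sketch under these assumptions, with hypothetical helper names:

```python
from collections import deque


def reachable(adj, s):
    """Set of nodes reachable from s along dipaths (breadth-first search)."""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_strongly_connected(nodes, arcs):
    """True iff every node is reachable from a root r and r is reachable from
    every node (i.e., every node is reachable from r in the reversed digraph)."""
    forward = {u: [] for u in nodes}
    backward = {u: [] for u in nodes}
    for u, v in arcs:
        forward[u].append(v)
        backward[v].append(u)
    r = next(iter(nodes))
    everything = set(nodes)
    return reachable(forward, r) == everything and reachable(backward, r) == everything


print(is_strongly_connected({1, 2, 3}, [(1, 2), (2, 3), (3, 1)]))  # True
print(is_strongly_connected({1, 2, 3}, [(1, 2), (2, 3)]))          # False
```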

• A digraph $D$ is weakly connected if, for every two nodes $s, t$ of $D$, there is either an $(s, t)$-dipath or a $(t, s)$-dipath in $D$.

[Figure: an example digraph on nodes 1 to 6.]




Graphs: Very Important Definitions

• A walk is called closed if it has nonzero length and its origin and terminus are identical. A closed walk in which the origin, all internal nodes, and all edges are different is called a circuit or cycle.

• A walk that traverses every edge (arc) of a graph (digraph) exactly once is called an Eulerian trail. A closed Eulerian trail is called an Eulerian tour. An Eulerian graph is a graph that contains an Eulerian tour.

• A circuit of length $n$ in a graph of order $n$ is called a Hamiltonian circuit. A graph $G$ that contains a Hamiltonian circuit is called Hamiltonian.
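The slides define Eulerian graphs but do not say how to recognize them. A classical theorem due to Euler, not stated above, gives the criterion: a graph is Eulerian if and only if every node has even degree and all edges belong to a single connected component. The Python sketch below tests these two conditions. It assumes a loopless graph with at least one edge, given as a list of (possibly repeated) edges, and the function name is hypothetical.

```python
from collections import Counter, deque


def is_eulerian(edges):
    """Euler's criterion (a classical theorem, not stated in the slides):
    a loopless graph with at least one edge has an Eulerian tour iff every
    node has even degree and all edges lie in one connected component.
    `edges` is a list of (u, v) pairs; repeated pairs model parallel edges."""
    degree = Counter()
    adj = {}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    if any(d % 2 for d in degree.values()):
        return False
    # breadth-first search over the non-isolated nodes only
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == set(adj)


print(is_eulerian([(1, 2), (2, 3), (3, 1)]))                  # True: a triangle
print(is_eulerian([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]))  # False: nodes 1 and 3 have degree 3
```

No comparably simple test is known for Hamiltonian graphs. This contrast is one motivation for the complexity theory discussed later in the course.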

• A forest is an edge set in a graph which does not contain a circuit.

[Figure: an example forest on nodes 1 to 4.]

• A connected forest is called a tree. In other words, a tree is a connected graph without circuits.

[Figure: an example tree on nodes 1 to 4.]




• Let $T = (V, E)$ be a finite graph of order $n$. Then, the following statements are equivalent (and each of them characterizes $T$ as a tree):

• $T$ is connected and has no circuits;

• $T$ has $n - 1$ edges and has no circuits;

• $T$ is connected and has $n - 1$ edges;

• $T$ is connected and the removal of any one edge makes $T$ disconnected;

• Any two nodes of $T$ are connected by exactly one path;

• $T$ has no circuits, and adding any new edge to $T$ gives rise to a circuit.
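The third characterization above (connected with $n - 1$ edges) leads to a simple test. The Python sketch below checks the edge count and counts connected components with a union-find structure, a standard data structure that these slides do not cover. The name `is_tree` is hypothetical.

```python
def is_tree(nodes, edges):
    """Test 'T is connected and has n - 1 edges' (one of the equivalent
    statements above), counting components with union-find."""
    nodes = list(nodes)
    if len(edges) != len(nodes) - 1:
        return False
    parent = {u: u for u in nodes}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    components = len(nodes)
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the edge joins two components
            parent[root_u] = root_v
            components -= 1
    return components == 1


print(is_tree({1, 2, 3, 4}, [(1, 2), (1, 3), (1, 4)]))  # True: the star K_{1,3}
print(is_tree({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 1)]))  # False: a circuit plus an isolated node
```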

• A spanning tree of a graph $G$ of order $n$ is a tree, formed by edges of $G$, that contains all of the $n$ nodes of $G$.

[Figure: a spanning tree of an example graph on nodes 1 to 7.]
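A spanning tree of a connected graph can be obtained by recording, during a breadth-first search, the edge through which each node is first reached. The minimal Python sketch below follows this construction, which is only one of several possibilities. The function name and the example graph are hypothetical.

```python
from collections import deque


def bfs_spanning_tree(nodes, edges):
    """Edges of a spanning tree of a connected graph, built by breadth-first
    search from the first node; returns None if the graph is disconnected."""
    nodes = list(nodes)
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    root = nodes[0]
    seen, queue, tree = {root}, deque([root]), []
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:         # v is reached for the first time via (u, v)
                seen.add(v)
                tree.append((u, v))
                queue.append(v)
    return tree if len(seen) == len(nodes) else None


E = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]   # hypothetical example graph
print(bfs_spanning_tree([1, 2, 3, 4], E))      # [(1, 2), (1, 3), (2, 4)]
```

Each node other than the root contributes exactly one edge, so the result has $n - 1$ edges and is connected. By the characterizations above, it is therefore a tree.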

Linear Diophantine Inequalities



Some Basic Problems

There are some basic problems concerning linear spaces which play an important role in applications and which occur frequently in this course. Specifically, given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$, consider the system of linear equations
$$Ax = b. \tag{4}$$
Then, we can formulate the following four problems:

• Problem 1: Find a solution to (4).

• Problem 2: Find an integral solution to (4).

• Problem 3: Find a nonnegative solution to (4).

• Problem 4: Find a nonnegative integral solution to (4).

Borrowing a term from number theory, we shall call Problems 2 and 4 the Diophantine versions of Problems 1 and 3, respectively.




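We can ask similar questions about the solvability of the system of linear inequalities
$$Ax \le b, \tag{5}$$
namely,

• Problem 5: Find a solution to (5).

• Problem 6: Find an integral solution to (5).

• Problem 7: Find a nonnegative solution to (5).

• Problem 8: Find a nonnegative integral solution to (5).

Obviously, the nonnegativity conditions in Problems 7 and 8 could be included in (5). Hence, Problem 7 is equivalent to Problem 5 and Problem 8 is equivalent to Problem 6.

Furthermore, it is rather easy to see that Problem 5 is equivalent to Problem 3, and, if $A$ and $b$ are rational, Problem 6 is equivalent to Problem 4. Indeed, $Ax \le b$ has a solution if and only if $Ax' - Ax'' + s = b$ has a nonnegative solution $(x', x'', s)$. Conversely, $Ax = b$, $x \ge 0$ can be written as the system of inequalities $Ax \le b$, $-Ax \le -b$, $-x \le 0$. When $A$ and $b$ are rational, they can be scaled to be integral, so that the slack vector $s = b - Ax$ of an integral $x$ is integral as well.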




Systems of Linear Equations

The issues related to the solvability of these problems are fundamental to understanding how to optimize over the sets $\{x \in \mathbb{R}^n_+ : Ax \le b\}$ and $\{x \in \mathbb{Z}^n_+ : Ax \le b\}$. We will come back to such aspects later in this course. For the moment, we briefly recall the following classical results that characterize the feasibility of Problems 1, 2 and 3. Let us start with Problem 1.

Theorem 1 (Solvability of Linear Equations – Gauss, 1809; Fredholm, 1903)
There exists a vector $x \in \mathbb{R}^n$ such that $Ax = b$ (i.e., the system $Ax = b$ is consistent) if and only if there does not exist a vector $y \in \mathbb{R}^m$ such that $y^T A = 0$ and $y^T b \neq 0$.




Proof.
Necessity. Suppose that $Ax = b$ for some $x \in \mathbb{R}^n$ and, by contradiction, that there exists a vector $y \in \mathbb{R}^m$ such that $y^T A = 0$ and $y^T b \neq 0$. Premultiplying both sides of $Ax = b$ by $y^T$, we obtain $0 = y^T A x = y^T b \neq 0$, a contradiction.

Sufficiency. Let $L = \{z \in \mathbb{R}^m : Ax = z \text{ for some } x \in \mathbb{R}^n\}$ be the column space of $A$, which is a linear subspace of $\mathbb{R}^m$. We recall from linear algebra that a linear subspace is generated by finitely many vectors, and is also the intersection of finitely many linear hyperplanes. So, for the linear subspace $L$ there exists a matrix $C$ such that $L = \{z \in \mathbb{R}^m : Cz = 0\}$. Since every column of $A$ belongs to $L$, $CA$ is an all-zero matrix, i.e., every row $y^T$ of $C$ satisfies $y^T A = 0$. Now, if $y^T b = 0$ for each vector $y \in \mathbb{R}^m$ such that $y^T A = 0$, then in particular $Cb = 0$, i.e., $b \in L$; hence, there exists some $x \in \mathbb{R}^n$ such that $Ax = b$.

In other words, a solution to $Ax = b$ gives a way of writing $b$ as a sum of scalar multiples (i.e., a linear combination) of the columns of $A$. When the system is inconsistent, Theorem 1 states that there is a vector $y \in \mathbb{R}^m$ that is orthogonal to each column of $A$, but not orthogonal to $b$.




Example: consider the following linear system
$$\begin{aligned} 2x_1 + x_2 &= 1 \\ x_1 - 3x_2 &= 11 \\ 3x_1 + 3x_2 &= -3. \end{aligned}$$
The column vectors $(2, 1, 3)^T$ and $(1, -3, 3)^T$ span the plane $12x - 3y - 7z = 0$ in $\mathbb{R}^3$. If $b = (1, 11, -3)^T$ is in the plane, then the system has a solution. In our case, $b$ is in the plane (indeed, $12 \cdot 1 - 3 \cdot 11 - 7 \cdot (-3) = 0$), and the solution of the system is $x_1 = 2$ and $x_2 = -3$.

In contrast, $b = (2, 11, -3)^T$ is not in the plane, and the corresponding system is inconsistent. Specifically, the equation of the plane gives us a normal vector $(12, -3, -7)^T$ that is orthogonal to every vector in the plane. Take the above system with right-hand side $(2, 11, -3)^T$, multiply the first equation by 12, the second by $-3$ and the third by $-7$, and add the resulting equations. This gives $0x_1 + 0x_2 = 12$. As $0 \neq 12$, the system must be inconsistent; the vector $y = (12, -3, -7)^T$ is exactly a vector of the kind whose existence is guaranteed by Theorem 1.
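The certificates in this example are easy to check mechanically. The short Python sketch below verifies two things. First, $x = (2, -3)^T$ solves the first system. Second, $y = (12, -3, -7)^T$ satisfies $y^T A = 0$ and $y^T b = 12 \neq 0$ for $b = (2, 11, -3)^T$. The helper names are hypothetical.

```python
def mat_vec(A, x):
    """Matrix-vector product A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def vec_mat(y, A):
    """Row vector times matrix, y^T A."""
    return [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


A = [[2, 1], [1, -3], [3, 3]]
y = [12, -3, -7]                     # normal vector of the plane 12x - 3y - 7z = 0

print(mat_vec(A, [2, -3]))           # [1, 11, -3]: x = (2, -3) solves Ax = b
print(vec_mat(y, A))                 # [0, 0]: y^T A = 0
print(sum(y_i * b_i for y_i, b_i in zip(y, [2, 11, -3])))  # 12: y^T b != 0
```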

Problem 1 can be solved in polynomial time by what is possibly the best-known algorithm in linear algebra: the Gaussian elimination method. In particular, this algorithm requires $O(n^3)$ arithmetic operations for a square matrix $A$ of order $n$. Reviewing the method is beyond the scope of the present course. The reader interested in this topic is invited to read pp. 31-37 of (Schrijver, 1998) or, alternatively, any other text on linear algebra or numerical analysis.
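For illustration only, the Python sketch below implements Gauss-Jordan elimination, a variant of Gaussian elimination that also clears the entries above each pivot. It uses exact rational arithmetic (`fractions.Fraction`) to avoid floating-point round-off. It returns one solution of $Ax = b$ with the free variables set to zero, or `None` when a row of the form $0 = \beta$ with $\beta \neq 0$ appears. It does not return the vector $y$ of Theorem 1, and the name `solve` is hypothetical.

```python
from fractions import Fraction


def solve(A, b):
    """Gauss-Jordan elimination over the rationals on [A | b].
    Returns one solution x of Ax = b (free variables set to 0),
    or None if the system is inconsistent."""
    m, n = len(A), len(A[0])
    M = [[Fraction(a) for a in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                          # no pivot: x_c is a free variable
        M[r], M[p] = M[p], M[r]
        pivot = M[r][c]
        M[r] = [v / pivot for v in M[r]]      # scale the pivot row
        for i in range(m):                    # clear column c in all other rows
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [vi - factor * vr for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    # the remaining rows read 0 = M[i][n]; any nonzero right-hand side is a contradiction
    if any(M[i][n] != 0 for i in range(r, m)):
        return None
    x = [Fraction(0)] * n
    for i, c in enumerate(pivots):
        x[c] = M[i][n]
    return x


print(solve([[2, 1], [1, -3], [3, 3]], [1, 11, -3]))   # x = (2, -3), as Fractions
print(solve([[2, 1], [1, -3], [3, 3]], [2, 11, -3]))   # None
```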

Let us now consider the Diophantine version of Problem 1, i.e., the problem of finding an integral solution to $Ax = b$. We will show that, also for this problem, it is possible to characterize the feasibility of a system of linear Diophantine equations. Such a characterization exploits an analogue, for matrices over the integers, of the reduced echelon form in the Gaussian elimination method, known as the Hermite Normal Form (HNF). Before proceeding, we introduce some preliminary definitions and theorems that will prove useful to characterize the feasibility of Problem 2.

Definition 1 (The Greatest Common Divisor)
Let $a, b \in \mathbb{Z}$, such that $a \neq 0$ and $b \neq 0$. The Greatest Common Divisor (GCD) of $a$ and $b$, denoted by $\gcd(a, b)$, is the largest positive integer $d$ that divides both $a$ and $b$, i.e., $a = dx$ and $b = dy$ for some $x, y \in \mathbb{Z}$.

Example: Consider the integers 24 and 36. The divisors of 24 are
$$\{1, 2, 3, 4, 6, 8, 12, 24\};$$
similarly, the divisors of 36 are
$$\{1, 2, 3, 4, 6, 9, 12, 18, 36\}.$$
The common divisors are
$$\{1, 2, 3, 4, 6, 12\}$$
and the largest is 12. Thus, 12 is the GCD of 24 and 36.
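Listing all divisors, as in the example, is fine for small numbers. In practice, the GCD is computed with Euclid's algorithm, which rests on the identity $\gcd(a, b) = \gcd(b, a \bmod b)$ and is not described in these slides. A minimal Python sketch of both approaches, with hypothetical function names:

```python
def gcd(a, b):
    """Euclid's algorithm: gcd(a, b) = gcd(b, a mod b) and gcd(a, 0) = |a|."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def divisors(a):
    """All positive divisors of a nonzero integer a (brute force, as in the example)."""
    a = abs(a)
    return {d for d in range(1, a + 1) if a % d == 0}


print(sorted(divisors(24) & divisors(36)))   # [1, 2, 3, 4, 6, 12]
print(gcd(24, 36))                           # 12
```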

The following theorem is a well-known result from number theory. We omit its proof and just focus on its statement. The reader interested in its proof is referred to Conforti et al. (2014), pp. 26-27.

Theorem 2
Let $a, b \in \mathbb{Z}$, such that $a \neq 0$ and $b \neq 0$, and let $g = \gcd(a, b)$. Then, the equation
$$ax + by = c$$
admits an integral solution if and only if $c$ is an integer and $g$ divides $c$. Furthermore, there exists a $2 \times 2$ integral matrix $T$ whose inverse is also integral such that
$$(a, b)\,T = (g, 0).$$
All integral solutions of $ax + by = c$ are of the form
$$T \begin{pmatrix} c/g \\ z \end{pmatrix}, \quad \text{for all } z \in \mathbb{Z}.$$
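The matrix $T$ of Theorem 2 can be built with the extended Euclidean algorithm, which computes integers $s, t$ with $as + bt = g$. One possible choice, assumed here and not necessarily the one in Conforti et al. (2014), is
$$T = \begin{pmatrix} s & -b/g \\ t & a/g \end{pmatrix}.$$
For this choice, $\det(T) = (as + bt)/g = 1$ (so $T^{-1}$ is integral) and $(a, b)\,T = (g, 0)$. A Python sketch with hypothetical function names:

```python
def ext_gcd(a, b):
    """Extended Euclid: returns (g, s, t) with g = gcd(a, b) >= 0 and a*s + b*t = g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:                      # normalize so that g >= 0
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


def unimodular_T(a, b):
    """For nonzero integers a, b: g = gcd(a, b) and T = [[s, -b/g], [t, a/g]],
    an integral matrix with det(T) = 1 and (a, b) T = (g, 0)."""
    g, s, t = ext_gcd(a, b)
    return g, [[s, -b // g], [t, a // g]]


def integral_solution(a, b, c, z):
    """The integral solution T (c/g, z)^T of a x + b y = c for a given
    integer z, or None if g does not divide c."""
    g, T = unimodular_T(a, b)
    if c % g != 0:
        return None
    w = (c // g, z)
    return (T[0][0] * w[0] + T[0][1] * w[1], T[1][0] * w[0] + T[1][1] * w[1])


g, T = unimodular_T(24, 36)
print(g, T)                               # 12 [[-1, -3], [1, 2]]
print(integral_solution(24, 36, 60, 0))   # (-5, 5), since 24*(-5) + 36*5 = 60
```

Letting $z$ range over $\mathbb{Z}$ enumerates all integral solutions, as stated in Theorem 2.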

The following result generalizes Theorem 2. The proof of Theorem 3 can be found in Conforti et al. (2014), pp. 27-28.

Theorem 3
Given a row vector $a \in \mathbb{Z}^n \setminus \{0\}$, let $g = \gcd(a_1, a_2, \dots, a_n)$. Then, the equation
$$ax = c$$
admits an integral solution if and only if $c$ is an integer and $g$ divides $c$. Furthermore, there exists an $n \times n$ integral matrix $T$ whose inverse is also integral, such that
$$aT = (g, 0, 0, \dots, 0).$$
All integral solutions of $ax = c$ are of the form
$$T \begin{pmatrix} c/g \\ z \end{pmatrix}, \quad \text{for all } z \in \mathbb{Z}^{n-1}.$$
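Theorem 3 can be made constructive. The idea is to apply the $2 \times 2$ step of Theorem 2 repeatedly to the first column and to column $j$, for $j = 2, \dots, n$ (indices $1, \dots, n-1$ in the 0-based code), while accumulating the column operations in $T$. The Python sketch below follows this idea. It is an illustration under these assumptions, not the proof given in Conforti et al. (2014), and all function names are hypothetical.

```python
def ext_gcd(a, b):
    """Extended Euclid: (g, s, t) with g = gcd(a, b) >= 0 and a*s + b*t = g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


def reduce_row(a):
    """For an integral row vector a != 0, return (g, T) with T integral,
    det(T) = +-1 (so T^{-1} is integral) and a T = (g, 0, ..., 0)."""
    n = len(a)
    T = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    r = list(a)                                              # invariant: r = a T
    for j in range(1, n):
        if r[0] == 0 and r[j] == 0:
            continue
        g, s, t = ext_gcd(r[0], r[j])
        p, q = r[0] // g, r[j] // g
        # 2x2 unimodular step on columns 0 and j (det = s*p + t*q = 1):
        # col_0 <- s*col_0 + t*col_j,  col_j <- -q*col_0 + p*col_j
        for row in T:
            c0, cj = row[0], row[j]
            row[0], row[j] = s * c0 + t * cj, -q * c0 + p * cj
        r[0], r[j] = g, 0
    if r[0] < 0:                       # only possible when n == 1
        for row in T:
            row[0] = -row[0]
        r[0] = -r[0]
    return r[0], T


def diophantine_solution(a, c, z):
    """The solution T (c/g, z)^T of a x = c for a given z in Z^{n-1},
    or None if g does not divide c."""
    g, T = reduce_row(a)
    if c % g != 0:
        return None
    w = [c // g] + list(z)
    return [sum(T[i][k] * w[k] for k in range(len(w))) for i in range(len(a))]


a = [6, 10, 15]
x = diophantine_solution(a, 7, [0, 0])
print(x, sum(ai * xi for ai, xi in zip(a, x)))   # the second value is 7
```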

The following definition is an integer analogue of the reduced echelon form in the Gaussian elimination method. We will exploit it to characterize the feasibility of a system of linear Diophantine equations.

Definition 2 (Hermite Normal Form)
Let $A \in \mathbb{Q}^{m \times n}$ be a rational matrix. We say that $A$ is in Hermite Normal Form (HNF) if it can be decomposed into $[B \;\; 0]$, such that
• $0$ is an $m \times (n - m)$ zero matrix;
• $B$ is an $m \times m$ nonnegative lower triangular matrix such that $b_{ii} > 0$, for $i \in \{1, \dots, m\}$, and $b_{ij} < b_{ii}$, for all $1 \le j < i \le m$.

For example, the following matrix is in HNF:
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

It is worth noting that any matrix in HNF has full row rank: since $B$ is triangular with positive diagonal entries, $\det(B) = \prod_{i=1}^{m} b_{ii} > 0$.
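Definition 2 can be checked entry by entry. A minimal Python sketch, in which the name `is_hnf` is hypothetical and a matrix is represented as a list of rows:

```python
def is_hnf(M):
    """Check Definition 2 for a matrix given as a list of m rows of length n:
    M = [B 0] with B lower triangular, b_ii > 0 and 0 <= b_ij < b_ii (j < i)."""
    m, n = len(M), len(M[0])
    if n < m:
        return False
    for i in range(m):
        if any(M[i][j] != 0 for j in range(i + 1, n)):   # above the diagonal and the 0 block
            return False
        if M[i][i] <= 0:
            return False
        if any(not 0 <= M[i][j] < M[i][i] for j in range(i)):
            return False
    return True


print(is_hnf([[1, 0, 0, 0], [1, 2, 0, 0], [0, 0, 1, 0]]))  # True: the example above
print(is_hnf([[1, 0, 0, 0], [3, 2, 0, 0], [0, 0, 1, 0]]))  # False: b_21 = 3 is not < b_22 = 2
```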





Definition 3 (Unimodular operations). The following operations on a matrix are called elementary (unimodular) column operations:

• exchanging two columns;
• multiplying a column by $-1$;
• adding an integral multiple of one column to another column.

Theorem 4 (HNF Theorem). Every rational matrix of full row rank can be brought into HNF by a finite sequence of unimodular column operations.





Proof. Let $A \in \mathbb{Q}^{m \times n}$ be a rational matrix of full row rank, and let $\mu$ be a positive integer such that $A' = \mu A$ is an integral matrix (since $A$ is rational, such a $\mu$ always exists, e.g., the least common multiple of the denominators of its entries). Assume that $A'$ has been transformed, by a sequence of unimodular column operations, into the form

$$\begin{pmatrix} B & 0 \\ D & C \end{pmatrix},$$

where $[B\ 0]$ is in HNF. Note that this transformation is always possible, due to Theorem 3. By permuting columns and multiplying columns by $-1$, we want to transform $C$ so that $c_{11} \ge c_{12} \ge \cdots \ge c_{1k} \ge 0$. To this end, we perform the following steps. If $c_{1j} > 0$ for some $j \in \{2, \ldots, k\}$, we subtract the $j$th column of $C$ from the first one and repeat the previous step. Note that, at each iteration, the sum $c_{11} + c_{12} + \cdots + c_{1k}$ decreases by at least $1/\mu$; thus, after a finite number of iterations, $c_{12} = c_{13} = \cdots = c_{1k} = 0$ will hold. When $c_{12} = c_{13} = \cdots = c_{1k} = 0$, we add or subtract integer multiples of the first column of $C$ to the columns of $D$ so that $0 \le d_{1j} < c_{11}$, for $j \in \{1, \ldots, n - k\}$. Then, the matrix $[B\ 0]$ can be extended by one row, by adding the first row of $[D\ C]$. These steps can be iterated until a matrix in HNF is obtained. Finally, applying the same operations to $A$ yields $1/\mu$ times this matrix, which is again in HNF.
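The proof is constructive. Below is a minimal Python sketch of a column-operation HNF procedure in the same spirit. It is a simple Euclid-style variant, not necessarily the exact order of steps used in the proof, and it makes no attempt to control the growth of intermediate numbers. The sketch assumes an `int`- or `Fraction`-valued matrix of full row rank, and it also records the performed operations in a square matrix $U$, so that $H = AU$. The helper name `hnf` is hypothetical:

```python
from fractions import Fraction


def hnf(A):
    """Hermite normal form of an m x n matrix A of full row rank.

    Entries may be int or Fraction. Returns (H, U) with H = A U, where U records
    the unimodular column operations applied to the identity matrix."""
    m, n = len(A), len(A[0])
    H = [list(row) for row in A]
    U = [[int(i == j) for j in range(n)] for i in range(n)]

    def add_col(dst, src, q):            # column dst += q * column src (q integer)
        for M in (H, U):
            for row in M:
                row[dst] += q * row[src]

    def swap_cols(i, j):
        for M in (H, U):
            for row in M:
                row[i], row[j] = row[j], row[i]

    def negate_col(j):
        for M in (H, U):
            for row in M:
                row[j] = -row[j]

    for i in range(m):
        # Euclid-style reduction of row i on columns i..n-1 until only column i is nonzero.
        while True:
            nonzero = [j for j in range(i, n) if H[i][j] != 0]
            if not nonzero:
                raise ValueError("A does not have full row rank")
            p = min(nonzero, key=lambda j: abs(H[i][j]))     # smallest |entry| as pivot
            swap_cols(i, p)
            for j in range(i + 1, n):
                add_col(j, i, -(H[i][j] // H[i][i]))          # entry becomes a remainder
            if all(H[i][j] == 0 for j in range(i + 1, n)):
                break
        if H[i][i] < 0:
            negate_col(i)
        for j in range(i):                                    # enforce 0 <= h_ij < h_ii
            add_col(j, i, -(H[i][j] // H[i][i]))
    return H, U


A = [[10, 4, 3, 0], [58, 24, 19, 2], [3, 2, 0, 0]]           # the matrix A' of the example
H, U = hnf(A)
print(H)                                                      # [[1, 0, 0, 0], [1, 2, 0, 0], [0, 0, 1, 0]]

A_rat = [[Fraction(v, 66120) for v in row] for row in A]      # the rational A = A' / mu
H_rat, _ = hnf(A_rat)                                         # equals H / 66120 by uniqueness
```

Rows that are already processed are never modified again, because every later operation only combines columns whose entries in those rows are zero. The final matrix is therefore in HNF, and by the uniqueness of the HNF it does not depend on the pivoting choices.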





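Systems of Linear Diophantine Equations

Example: Consider the following rational matrix:

$$A = \begin{pmatrix} \frac{1}{6612} & \frac{1}{16530} & \frac{1}{22040} & 0 \\ \frac{1}{1140} & \frac{1}{2755} & \frac{1}{3480} & \frac{1}{33060} \\ \frac{1}{22040} & \frac{1}{33060} & 0 & 0 \end{pmatrix}.$$

It is easy to see that, by multiplying $A$ by $\mu = 66120$, we get the following integral matrix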

$$A' = \mu A = \begin{pmatrix} 10 & 4 & 3 & 0 \\ 58 & 24 & 19 & 2 \\ 3 & 2 & 0 & 0 \end{pmatrix},$$

which has full row rank. Let’s transform A′ into HNF by performing unimodular column operations.





1. First, we subtract 3 times column 3 from column 1. We get

$$A_1 = \begin{pmatrix} 1 & 4 & 3 & 0 \\ 1 & 24 & 19 & 2 \\ 3 & 2 & 0 & 0 \end{pmatrix}.$$

2. Now, we use column 1 of $A_1$ to cancel out the nonzero entries in the first row of columns 2 and 3. We get

$$A_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 20 & 16 & 2 \\ 3 & -10 & -9 & 0 \end{pmatrix}.$$





3. Now, let's interchange columns 2 and 4 of $A_2$. We get

$$A_3 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 16 & 20 \\ 3 & 0 & -9 & -10 \end{pmatrix}.$$

4. Next, we cancel out the nonzero entries in the second row of columns 3 and 4 using column 2 of $A_3$. We get

$$A_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 3 & 0 & -9 & -10 \end{pmatrix}.$$





5. We subtract column 4 of $A_4$ from column 3 and then add 10 times column 3 to column 4. We get

$$A_5 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 3 & 0 & 1 & 0 \end{pmatrix}.$$

6. Observe that $A_5$ is already in lower triangular form. To complete, we need to satisfy the condition $0 \le a_{ij} < a_{ii}$, for all $j < i$. To this end, we subtract 3 times column 3 of $A_5$ from column 1. We get

$$A_6 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

Thus, matrix $A_6$ is in HNF.

Observation: The rationality assumption on $A$ is critical in Theorem 4. For example, the matrix $A = (\sqrt{5}\ \ 1)$ cannot be brought into Hermite normal form using unimodular operations. Why?
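The six steps can be replayed mechanically to double-check the arithmetic, including the choice of $\mu$ as the least common multiple of the denominators of $A$. The minimal sketch below uses only the standard library (`math.lcm` with several arguments requires Python 3.9 or later). `add_col` and `swap_cols` are hypothetical helpers, and columns are indexed from 0, so column 3 of the text is index 2:

```python
from fractions import Fraction
from math import lcm


def add_col(M, dst, src, q):
    """Copy of M with column dst replaced by (column dst) + q * (column src)."""
    return [row[:dst] + [row[dst] + q * row[src]] + row[dst + 1:] for row in M]


def swap_cols(M, i, j):
    """Copy of M with columns i and j exchanged."""
    out = [row[:] for row in M]
    for row in out:
        row[i], row[j] = row[j], row[i]
    return out


# The rational matrix A of the example and the scaling factor mu (lcm of denominators).
A = [[Fraction(1, 6612), Fraction(1, 16530), Fraction(1, 22040), Fraction(0)],
     [Fraction(1, 1140), Fraction(1, 2755), Fraction(1, 3480), Fraction(1, 33060)],
     [Fraction(1, 22040), Fraction(1, 33060), Fraction(0), Fraction(0)]]
mu = lcm(*(v.denominator for row in A for v in row))
A0 = [[int(mu * v) for v in row] for row in A]     # A' = mu * A

A1 = add_col(A0, 0, 2, -3)                          # step 1
A2 = add_col(add_col(A1, 1, 0, -4), 2, 0, -3)       # step 2
A3 = swap_cols(A2, 1, 3)                            # step 3
A4 = add_col(add_col(A3, 2, 1, -8), 3, 1, -10)      # step 4
A5 = add_col(add_col(A4, 2, 3, -1), 3, 2, 10)       # step 5
A6 = add_col(A5, 0, 2, -3)                          # step 6
print(mu, A0)
print(A6)                                           # [[1, 0, 0, 0], [1, 2, 0, 0], [0, 0, 1, 0]]
```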





An important corollary of Theorem 4 states (Schrijver, 1998, p. 48):

Corollary 1 (Uniqueness of the HNF). Every rational matrix of full row rank admits a unique HNF.

Thus, given a rational matrix $A \in \mathbb{Q}^{m \times n}$ of full row rank, we can speak of the Hermite Normal Form of $A$. Now, let's come back to Problem 2, i.e., the problem of finding an integral solution to $Ax = b$. The following theorem characterizes the feasibility of such a system:

Theorem 5 (Integer Solvability of Linear Equations – Kronecker (1884)). Assume that $A$ and $b$ are rational. Then there exists a vector $x \in \mathbb{Z}^n$ such that $Ax = b$ if and only if there does not exist a vector $y \in \mathbb{Q}^m$ such that $y^T A$ is integral and $y^T b$ is not an integer.





Proof. Necessity. Let $x \in \mathbb{Z}^n$ satisfy $Ax = b$, and suppose by contradiction that there exists a rational vector $y \in \mathbb{Q}^m$ such that $y^T A$ is integral and $y^T b$ is not integral. Premultiply $Ax = b$ by $y^T$. As $x$ and $y^T A$ are integral, $y^T A x = y^T b$ is an integer, which is a contradiction.

Sufficiency. Suppose that $y^T b$ is an integer whenever $y^T A$ is integral. Then $Ax = b$ has a (possibly fractional) solution, since otherwise, by Theorem 1, $y^T A = 0$ and $y^T b = 1/2$ for some rational vector $y$. As the system is consistent, we may assume that the rows of $A$ are linearly independent, i.e., that $A$ has full row rank. Now, observe that both the hypothesis and the conclusion of the theorem are invariant under unimodular column operations on $A$ and, due to Theorem 4, every rational matrix of full row rank can be brought into HNF. Hence, we may assume that $A = [B\ 0]$ is in HNF. Since $B^{-1}[B\ 0] = [I\ 0]$ is an integral matrix, it follows from our assumption (applied with each row of $B^{-1}$ as $y^T$) that $B^{-1}b$ is also an integral vector. Now, because

$$[B\ 0]\begin{pmatrix} B^{-1}b \\ 0 \end{pmatrix} = b,$$

the vector $x = \begin{pmatrix} B^{-1}b \\ 0 \end{pmatrix}$ is an integral solution of $Ax = b$.
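The certificate in Theorem 5 is easy to check mechanically: a rational vector $y$ with integral $y^T A$ and fractional $y^T b$ rules out integral solutions. The minimal sketch below uses a hypothetical helper `is_kronecker_certificate` with exact `Fraction` arithmetic. The toy equation $2x_1 + 4x_2 = 3$ is chosen for illustration; it has no integral solution by Theorem 2, because $\gcd(2, 4) = 2$ does not divide $3$:

```python
from fractions import Fraction


def is_kronecker_certificate(A, b, y):
    """True if y^T A is integral while y^T b is not an integer (Theorem 5).

    Such a rational y proves that A x = b has no integral solution."""
    m, n = len(A), len(A[0])
    yA = [sum(Fraction(y[i]) * A[i][j] for i in range(m)) for j in range(n)]
    yb = sum(Fraction(y[i]) * b[i] for i in range(m))
    return all(v.denominator == 1 for v in yA) and yb.denominator != 1


# 2 x1 + 4 x2 = 3: y = 1/2 gives y^T A = (1, 2) and y^T b = 3/2.
print(is_kronecker_certificate([[2, 4]], [3], [Fraction(1, 2)]))   # True
```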





Corollary 2 (Computing the HNF - Schrijver (1998), pp. 56-59). Every rational matrix of full row rank can be brought into its unique HNF by using a polynomial number of unimodular operations.

Interestingly, sequences of unimodular column operations can be described by specific types of matrices called unimodular matrices. Unimodular matrices allow us to obtain the HNF of a rational matrix $A$ of full row rank by matrix multiplications. To see this, let's introduce the following definition.

Definition 4 (Unimodular Matrix). An $m \times n$ matrix $U$ is unimodular if
1. it is integral;
2. it has rank $m$;
3. every $m \times m$ submatrix $B$ of $U$ has determinant $\det(B) \in \{-1, 0, 1\}$.

In particular, a square matrix $U$ is unimodular if and only if it is integral and $\det(U) = \pm 1$.
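Definition 4 can be tested directly for small matrices by enumerating all $m \times m$ minors. The minimal sketch below uses hypothetical helpers `det` and `is_unimodular`. `det` applies the Leibniz formula, whose cost is exponential in the order, so the sketch is only suitable for tiny matrices:

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import prod


def det(M):
    """Determinant by the Leibniz formula (adequate only for small matrices)."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
        total += (-1) ** inversions * prod(M[i][p[i]] for i in range(n))
    return total


def is_unimodular(U):
    """Definition 4 for an m x n matrix U given as a list of rows."""
    m, n = len(U), len(U[0])
    if any(Fraction(v).denominator != 1 for row in U for v in row):
        return False                                   # not integral
    minors = [det([[row[j] for j in cols] for row in U])
              for cols in combinations(range(n), m)]
    # rank m holds exactly when some m x m minor is nonzero
    return any(d != 0 for d in minors) and all(d in (-1, 0, 1) for d in minors)


E = [[1, 0, 0, 0], [0, 1, 0, 0], [-3, 0, 1, 0], [0, 0, 0, 1]]
print(is_unimodular(E), is_unimodular([[2, 0], [0, 1]]))   # True False
```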





Theorem 6. If $U$ is a square matrix of order $n$ obtained from the identity matrix by performing a single unimodular column operation, then $U$ is unimodular.

Proof. The statement follows by observing that if $U$ is obtained by interchanging two columns of the identity matrix $I$ or by multiplying a column of $I$ by $-1$, then $\det(U) = -1$. Similarly, if $U$ is obtained from $I$ by adding to a column an integer multiple of another column, then $U$ differs from the identity in only one off-diagonal entry; it is therefore a triangular matrix with all ones on the diagonal, and thus $\det(U) = 1$.

Theorem 6 is interesting because it leads to the following observation: if a matrix $H$ is obtained from an $m \times n$ matrix $A$ by a single unimodular column operation, then $H = AU$, where $U$ is the square matrix of order $n$ obtained from the identity matrix by performing the same unimodular column operation. In particular, if $A$ is an $m \times n$ rational matrix of full row rank and $H$ is its unique HNF, then $H = AU$ for some square unimodular matrix $U$ (a product of such matrices, whose determinant is therefore $\pm 1$).
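The observation after Theorem 6 can be checked on the first step of the HNF example. In the minimal Python sketch below, each unimodular column operation is applied to the identity matrix, and multiplying $A'$ on the right by the result reproduces the operation. The sketch uses plain nested lists; `elem_add`, `elem_swap`, `elem_neg`, `matmul` and `det` are hypothetical helpers, and columns are indexed from 0:

```python
from itertools import permutations
from math import prod


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def det(M):
    """Determinant by the Leibniz formula (fine for the small matrices used here)."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
        total += (-1) ** inversions * prod(M[i][p[i]] for i in range(n))
    return total


def elem_add(n, dst, src, q):
    """Identity after 'column dst += q * column src' (dst != src): entry (src, dst) = q."""
    E = identity(n)
    E[src][dst] = q
    return E


def elem_swap(n, i, j):
    """Identity after exchanging columns i and j."""
    E = identity(n)
    for row in E:
        row[i], row[j] = row[j], row[i]
    return E


def elem_neg(n, j):
    """Identity after multiplying column j by -1."""
    E = identity(n)
    E[j][j] = -1
    return E


A = [[10, 4, 3, 0], [58, 24, 19, 2], [3, 2, 0, 0]]
E = elem_add(4, 0, 2, -3)            # "subtract 3 times column 3 from column 1"
print(matmul(A, E))                  # [[1, 4, 3, 0], [1, 24, 19, 2], [3, 2, 0, 0]]
print(det(E), det(elem_swap(4, 1, 3)), det(elem_neg(4, 2)))   # 1 -1 -1
```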





Example: Consider the integral matrix of the previous example:

$$A' = \begin{pmatrix} 10 & 4 & 3 & 0 \\ 58 & 24 & 19 & 2 \\ 3 & 2 & 0 & 0 \end{pmatrix}.$$

At the first step of the procedure used to bring $A'$ into its HNF, we subtracted 3 times column 3 from column 1. It is easy to see that this unimodular column operation can be performed by the following matrix multiplication

$$A_1 = \begin{pmatrix} 1 & 4 & 3 & 0 \\ 1 & 24 & 19 & 2 \\ 3 & 2 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 10 & 4 & 3 & 0 \\ 58 & 24 & 19 & 2 \\ 3 & 2 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -3 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

and that the last matrix

$$U = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -3 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

is unimodular, as stated in Theorem 6.

Theorem 7. Let $U$ be a square nonsingular matrix of order $n$. Then the following claims are equivalent:

1. $U$ is unimodular;
2. $U$ and $U^{-1}$ are both integral;
3. $U^{-1}$ is unimodular;
4. for all $x \in \mathbb{R}^n$, $Ux$ is integral if and only if $x$ is integral;
5. $U$ is obtained from the identity matrix by a sequence of unimodular column operations.
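Claims (1) and (2) of Theorem 7 can be checked numerically with exact arithmetic. The minimal sketch below uses Python's `fractions` module and two hypothetical helpers: `inverse`, which performs Gauss-Jordan elimination, and `is_integral`. It is applied to the elementary matrix $U$ of the previous example and, for contrast, to a matrix that is integral but not unimodular:

```python
from fractions import Fraction


def inverse(U):
    """Exact inverse of a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(U)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(U)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def is_integral(M):
    return all(Fraction(v).denominator == 1 for row in M for v in row)


U = [[1, 0, 0, 0], [0, 1, 0, 0], [-3, 0, 1, 0], [0, 0, 0, 1]]   # unimodular, det(U) = 1
print(is_integral(U), is_integral(inverse(U)))                   # True True
V = [[2, 0], [0, 1]]                                               # integral, det(V) = 2
print(is_integral(V), is_integral(inverse(V)))                     # True False
```

For $V$, claim (4) fails as well: $x = (1/2, 0)^T$ is fractional, yet $Vx = (1, 0)^T$ is integral.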





Proof. (1)→(2). Assume $U$ is unimodular. By standard linear algebra, $U^{-1}$ is the adjoint matrix of $U$ divided by $\det(U)$. Since $U$ is integral, its adjoint matrix is integral. Since $U$ is unimodular, $\det(U) = \pm 1$. Thus, $U^{-1}$ is integral.
(2)→(1). If $U$ and $U^{-1}$ are both integral, then $\det(U)$ and $\det(U^{-1})$ are both integers. Since $\det(U) = 1/\det(U^{-1})$, it follows that $\det(U) = \pm 1$.
(1)↔(3) follows from (1)↔(2), applied to $U^{-1}$.
(2)→(4). Assume that $U$ and $U^{-1}$ are both integral matrices and let $x \in \mathbb{R}^n$. If $x$ is an integral vector, then $Ux$ is integral because $U$ is integral. Conversely, if $y = Ux$ is integral, then $x = U^{-1}y$ is integral because $U^{-1}$ is integral.
(4)→(2). This is immediate: taking $x = e_i$ shows that every column of $U$ is integral, and taking $x = U^{-1}e_i$ shows that every column of $U^{-1}$ is integral.
(5)→(1). Suppose that $U$ is obtained from the identity matrix by a sequence of unimodular column operations. It follows that $U = U_1 U_2 \cdots U_k$, where $U_1, U_2, \ldots, U_k$ are matrices obtained from the identity matrix by performing a single unimodular column operation. By Theorem 6, such matrices are unimodular; therefore $U$ is unimodular, since it is integral and $\det(U) = \det(U_1)\det(U_2) \cdots \det(U_k) = \pm 1$.





Proof (continuation). (1)→(5). Suppose that $U$ is unimodular. Let $H$ be the Hermite normal form of $U^{-1}$. By Theorem 4 and the observation following Theorem 6, $H = U^{-1}U'$, where $U'$ is obtained from the identity matrix by a sequence of unimodular column operations. It follows from implication (5)→(1) that $U'$ is unimodular, and it follows from (1)→(3) that $U^{-1}$ is unimodular. Therefore $H$ is unimodular as well. Since $H$ is a square lower triangular integral matrix with positive diagonal entries and $\det(H) = \pm 1$, all of its diagonal entries are equal to 1; the HNF condition $0 \le h_{ij} < h_{ii} = 1$ then forces $h_{ij} = 0$ for $j < i$, so $H$ is the identity matrix. Therefore $U = U'$, which shows that $U$ can be obtained from the identity matrix by a sequence of unimodular operations.





Corollary 3. For each rational matrix $A$ of full row rank there is a unimodular matrix $U$ such that $AU$ is the HNF of $A$. If $A$ is nonsingular, $U$ is unique.

Theorem 8. Let $A$ be a rational $m \times n$ matrix with full row rank and let $b \in \mathbb{R}^m$. Let $H = [B\ 0]$ be the HNF of $A$ and $U$ a unimodular matrix such that $H = AU$. Then Problem 2, i.e., $Ax = b$, $x \in \mathbb{Z}^n$, has a solution if and only if $\bar{y} = B^{-1}b \in \mathbb{Z}^m$. In this case, all integral solutions of $Ax = b$ are of the form

$$\left\{ U\begin{pmatrix} \bar{y} \\ k \end{pmatrix} : k \in \mathbb{Z}^{n-m} \right\}.$$





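Proof. By Theorem 7, for all $y \in \mathbb{R}^n$ we have that $y \in \mathbb{Z}^n$ if and only if $Uy \in \mathbb{Z}^n$. Therefore, $y$ is an integral solution of $Hy = AUy = b$ if and only if $x = Uy$ is an integral solution of $Ax = b$.

Now, write $y = \begin{pmatrix} y' \\ y'' \end{pmatrix}$ with $y' \in \mathbb{R}^m$. Then $Hy = By' = b$ and, since $B$ is nonsingular, $y' = B^{-1}b = \bar{y}$, while $y'' \in \mathbb{R}^{n-m}$ is arbitrary. Hence $Hy = b$ has an integral solution if and only if $\bar{y} \in \mathbb{Z}^m$, and in this case all integral solutions of $Hy = b$ are of the form

$$\begin{pmatrix} \bar{y} \\ k \end{pmatrix} = \begin{pmatrix} B^{-1}b \\ k \end{pmatrix}, \quad \forall\, k \in \mathbb{Z}^{n-m}.$$

Since $x = Uy$,

$$\{x \in \mathbb{Z}^n : Ax = b\} = \left\{ U\begin{pmatrix} \bar{y} \\ k \end{pmatrix} : k \in \mathbb{Z}^{n-m} \right\},$$

and the statement follows.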
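The proof of Theorem 8 translates into a few lines of code once $H = [B\ 0]$ and $U$ are known. The minimal sketch below uses a hypothetical helper `solutions_from_hnf` with exact `Fraction` arithmetic. It solves $By' = b$ by forward substitution, checks that $\bar{y}$ is integral, and returns a particular solution together with the last $n - m$ columns of $U$, which form a basis of the integral solutions of $Ax = 0$. The toy instance $A = (2\ \ 4)$ is an illustration and is not taken from the text:

```python
from fractions import Fraction


def solutions_from_hnf(B, U, b):
    """Theorem 8: given H = [B 0] = A U with U unimodular, return (x0, K) such that
    the integral solutions of A x = b are x0 + K k, k in Z^(n-m); None if none exist."""
    m, n = len(B), len(U)
    y = []
    for i in range(m):                     # forward substitution, B lower triangular
        s = Fraction(b[i]) - sum(B[i][j] * y[j] for j in range(i))
        y.append(s / B[i][i])
    if any(v.denominator != 1 for v in y):
        return None                        # ybar = B^{-1} b is not integral
    ybar = [int(v) for v in y]
    x0 = [sum(U[r][j] * ybar[j] for j in range(m)) for r in range(n)]
    K = [[U[r][j] for j in range(m, n)] for r in range(n)]   # last n - m columns of U
    return x0, K


# Toy instance: A = (2 4), and (2 4) U = (2 0) for the U below, so B = (2).
B, U = [[2]], [[1, -2], [0, 1]]
print(solutions_from_hnf(B, U, [6]))   # ([3, 0], [[-2], [1]])
print(solutions_from_hnf(B, U, [5]))   # None
```

In the toy instance, the integral solutions of $2x_1 + 4x_2 = 6$ are therefore $(3 - 2k, k)$, $k \in \mathbb{Z}$.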

Theorem 8 suggests the following approach to solve systems of linear Diophantine equations.

Algorithm for solving Problem 2.





Step 1. Check whether $A$ has full row rank. If not, either the system $Ax = b$ is infeasible (so Problem 2 has no solution) or it contains redundant equations that can be removed. This step can be performed by Gaussian elimination. Specifically, if an equation $0x = \alpha$ is produced and $\alpha \neq 0$, then $Ax = b$ is infeasible; in contrast, if $\alpha = 0$, the equation is redundant.

Step 2. Transform $A$ into its HNF $H$ by a sequence of unimodular column operations. By the previous theorems, $H = AU$ for some square unimodular matrix $U$.

Step 3. Solve $Hy = b$, $y \in \mathbb{Z}^n$. If this problem is infeasible, then Problem 2 has no solution. Otherwise, denoting by $y^*$ an integral solution of $Hy = b$, the vector $x = Uy^*$ is a solution of Problem 2.

Example: Solve the system of linear Diophantine equations $Ax = b$, $x \in \mathbb{Z}^4$, where

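$$A = \begin{pmatrix} 10 & 4 & 3 & 0 \\ 58 & 24 & 19 & 2 \\ 3 & 2 & 0 & 0 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 3 \\ 5 \\ 5 \end{pmatrix}.$$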





As matrix $A$ is the same as in the previous examples, we know that it has full row rank, so Step 1 is trivial in this case. Then, we can directly proceed to Step 2, i.e., transforming $A$ into HNF. To this end, we subtract 3 times column 3 from column 1. This unimodular column operation can be performed by the following matrix multiplication

$$A_1 = \begin{pmatrix} 1 & 4 & 3 & 0 \\ 1 & 24 & 19 & 2 \\ 3 & 2 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 10 & 4 & 3 & 0 \\ 58 & 24 & 19 & 2 \\ 3 & 2 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -3 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

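We now use column 1 of $A_1$ to cancel out the nonzero entries in the first row of columns 2 and 3. This can be done as follows:

$$A_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 20 & 16 & 2 \\ 3 & -10 & -9 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 4 & 3 & 0 \\ 1 & 24 & 19 & 2 \\ 3 & 2 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & -4 & -3 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$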





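Next, we interchange columns 2 and 4 of $A_2$:

$$A_3 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 16 & 20 \\ 3 & 0 & -9 & -10 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 20 & 16 & 2 \\ 3 & -10 & -9 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$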

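Then, we cancel out the nonzero entries in the second row of columns 3 and 4 using column 2 of $A_3$. We get

$$A_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 3 & 0 & -9 & -10 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 16 & 20 \\ 3 & 0 & -9 & -10 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -8 & -10 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$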





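Now, in order to get the matrix in lower triangular form, we subtract column 4 of $A_4$ from column 3, and then add 10 times column 3 to column 4. This gives

$$A_5 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 3 & 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 3 & 0 & -9 & -10 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 10 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$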

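Finally, in order to satisfy the condition $0 \le b_{ij} < b_{ii}$, for all $j < i$, we subtract 3 times column 3 of $A_5$ from column 1:

$$A_6 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 3 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -3 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$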





Matrix $H = A_6$ is the HNF of $A$. Note that $H = AU$, where

$$U = \begin{pmatrix} -2 & 0 & 1 & 6 \\ 3 & 0 & -1 & -9 \\ 3 & 0 & -2 & -8 \\ -6 & 1 & 2 & 10 \end{pmatrix}$$

is obtained by multiplying the $4 \times 4$ matrices that were used above to perform the unimodular column operations.

Now we can proceed to Step 3. We need to solve $HU^{-1}x = b$, with $x \in \mathbb{Z}^4$. To this end, set $y = U^{-1}x$. Since $U$ is unimodular, this is equivalent to solving $Hy = b$, $y \in \mathbb{Z}^4$. We know from Theorem 8 that the solutions of this system are given by

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = B^{-1}b, \quad y_4 \in \mathbb{Z},$$





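where

$$B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad B^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -1/2 & 1/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

In particular, $B^{-1}b = (3, 1, 5)^T$ is integral. As $x = Uy$, we get $x = \hat{U}B^{-1}b + \bar{U}k$, with $k \in \mathbb{Z}$, where $\hat{U}$ and $\bar{U}$ are, respectively, the first three columns and the last column of $U$:

$$\hat{U} = \begin{pmatrix} -2 & 0 & 1 \\ 3 & 0 & -1 \\ 3 & 0 & -2 \\ -6 & 1 & 2 \end{pmatrix} \quad \text{and} \quad \bar{U} = \begin{pmatrix} 6 \\ -9 \\ -8 \\ 10 \end{pmatrix}.$$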

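Therefore, the solutions of the considered instance of Problem 2 are

$$x = \begin{pmatrix} -1 \\ 4 \\ -1 \\ -7 \end{pmatrix} + \begin{pmatrix} 6 \\ -9 \\ -8 \\ 10 \end{pmatrix} k, \quad \text{with } k \in \mathbb{Z}.$$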





Systems of Linear Equations with Nonnegativity Constraints

Now, let's move to Problem 3, i.e., the problem of solving $Ax = b$, $x \ge 0$. For this system, we would like to find a characterization similar to the one previously seen for Problems 1 and 2. To this end, we introduce some preliminary definitions and results.

Definition 5

1. A nonempty subset $C \subseteq \mathbb{R}^n$ is a cone if $0 \in C$ and, for each $x \in C$ and positive scalar $\lambda$, the product $\lambda x$ is in $C$. In other words, $C$ is a cone if and only if $0 \in C$ and, for every $x \in C \setminus \{0\}$, $C$ contains the half-line starting from the origin in the direction $x$.

2. A cone $C$ is a convex cone if, for any positive scalars $\lambda$ and $\mu$ and any pair of vectors $x, y \in C$, the sum $\lambda x + \mu y$ belongs to $C$. Here, the adjective "convex" is used to remark the fact that any convex combination of points is also a conic combination. Hence, a convex cone is a convex set, i.e., it contains every convex combination of its points.

3. A convex cone $C$ is said to be polyhedral if there exists a matrix $A \in \mathbb{R}^{m \times n}$ such that $C = \{x \in \mathbb{R}^n : Ax \le 0\}$. In other words, a polyhedral cone is the intersection of finitely many linear half-spaces of the form $\{x : a_i x \le 0\}$, for some nonzero row vector $a_i$.





Theorem 9 (Fundamental theorem of linear inequalities (Schrijver, 1998, pp. 85-86)). Let $a_1, a_2, \ldots, a_m$ and $b$ be vectors in an $n$-dimensional space. Then, either

1. $b$ is a nonnegative linear combination of linearly independent vectors from $a_1, a_2, \ldots, a_m$,

or

2. there exists a vector $d \in \mathbb{R}^n$ such that $d^T b < 0$ and $d^T a_i \ge 0$, for $i \in \{1, 2, \ldots, m\}$.

Note that alternative 1 states that $b$ lies in the convex cone generated by the vectors $a_i$, i.e.,

$$b \in \operatorname{cone}(a_1, a_2, \ldots, a_m) = \left\{ \sum_{i=1}^{m} \lambda_i a_i : \lambda_i \ge 0,\ \forall\, i \right\}.$$





In contrast, alternative 2 states that the hyperplane $d^{\perp} = \{x \in \mathbb{R}^n : d^T x = 0\}$ strictly separates $b$ from $\operatorname{cone}(a_1, a_2, \ldots, a_m)$.

Thus, Theorem 9 states an intuitive, yet fundamental, result about convex separation: either $b$ is a member of $\operatorname{cone}(a_1, a_2, \ldots, a_m)$ or there exists a hyperplane that strictly separates the two objects.





We can now state the following result for Problem 3:

Theorem 10 (Farkas' Lemma). The system of linear equalities $Ax = b$ admits a nonnegative solution $x$ if and only if $y^T b \ge 0$ for each vector $y$ such that $y^T A \ge 0$.

Proof. Necessity is trivial, as $y^T b = y^T A x \ge 0$ for all $x$ and $y$ with $x \ge 0$, $y^T A \ge 0$ and $Ax = b$. To prove sufficiency, suppose that the linear system $Ax = b$ admits no nonnegative solution. This fact implies that $b \notin \operatorname{cone}(a_1, a_2, \ldots, a_n)$, where $a_1, a_2, \ldots, a_n$ are the column vectors of $A$. Then, by Theorem 9, there exists some vector $y$ such that $y^T b < 0$ and $y^T A \ge 0$.

Geometrically, Farkas' Lemma states that the linear system of equalities $Ax = b$ admits a nonnegative solution $x$ if and only if the vector $b$ belongs to the cone generated by the column vectors of $A$. Farkas' Lemma also proves fundamental to characterize the solvability of Problems 4 and 7, i.e., the solvability of the systems of linear inequalities $Ax \le b$ with and without nonnegativity constraints:
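Certificates of infeasibility in Farkas' Lemma are equally easy to check. A vector $y$ with $y^T A \ge 0$ and $y^T b < 0$ shows, by the easy (necessity) direction of Theorem 10, that $Ax = b$, $x \ge 0$ has no solution. The minimal sketch below uses a hypothetical helper `is_farkas_certificate` with exact `Fraction` arithmetic, on a toy system chosen for illustration:

```python
from fractions import Fraction


def is_farkas_certificate(A, b, y):
    """True if y^T A >= 0 and y^T b < 0, which certifies that A x = b, x >= 0
    has no solution (necessity direction of Theorem 10)."""
    m, n = len(A), len(A[0])
    yA = [sum(Fraction(y[i]) * A[i][j] for i in range(m)) for j in range(n)]
    yb = sum(Fraction(y[i]) * b[i] for i in range(m))
    return all(v >= 0 for v in yA) and yb < 0


# x1 + 2 x2 = 1, x1 - x2 = 2 forces x2 = -1/3 < 0; y = (1, -1) certifies this.
print(is_farkas_certificate([[1, 2], [1, -1]], [1, 2], [1, -1]))   # True
```

Here $y^T A = (0, 3) \ge 0$ and $y^T b = -1 < 0$.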





Theorem 11 (Farkas' Lemma - Solvability of linear inequalities). The system of linear inequalities $Ax \le b$ admits a solution $x$ if and only if $y^T b \ge 0$ for each vector $y \ge 0$ such that $y^T A = 0$.

Theorem 12 (Farkas' Lemma - Solvability of linear inequalities with nonnegativity constraints). The system of linear inequalities $Ax \le b$ admits a nonnegative solution $x$ if and only if $y^T b \ge 0$ for each vector $y \ge 0$ such that $y^T A \ge 0$.

Fourier's elimination method (similar to Gaussian elimination) is a possible approach to solve Problem 3. Similarly, the Simplex Algorithm is a possible method to solve Problems 4 and 7. Readers interested in such methods are referred to Schrijver (1998) and Papadimitriou and Steiglitz (1998).
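Fourier's elimination method, commonly called Fourier-Motzkin elimination, decides whether a system $Ax \le b$ has a real solution by eliminating one variable at a time. Every inequality with a positive coefficient on $x_k$ is combined with every inequality with a negative one, using positive multipliers that cancel $x_k$. The minimal sketch below assumes exact rational data; `fm_feasible` and `problem3_feasible` are hypothetical helper names. Problem 3 is handled by the standard rewriting $Ax \le b$, $-Ax \le -b$, $-x \le 0$. The number of inequalities can grow quadratically with each eliminated variable, so the sketch is only meant for tiny systems:

```python
from fractions import Fraction


def fm_feasible(A, b):
    """Decide whether {x : A x <= b} is nonempty by Fourier-Motzkin elimination."""
    rows = [([Fraction(v) for v in a], Fraction(beta)) for a, beta in zip(A, b)]
    n = len(A[0]) if A else 0
    for k in range(n):                                   # eliminate x_k
        pos = [(a, beta) for a, beta in rows if a[k] > 0]
        neg = [(a, beta) for a, beta in rows if a[k] < 0]
        rows = [(a, beta) for a, beta in rows if a[k] == 0]
        for ap, bp in pos:
            for an, bn in neg:
                lam, mu = -an[k], ap[k]                  # both positive; x_k cancels
                rows.append(([lam * u + mu * v for u, v in zip(ap, an)],
                             lam * bp + mu * bn))
    return all(beta >= 0 for _, beta in rows)           # every row now reads 0 <= beta


def problem3_feasible(A, b):
    """Problem 3 (A x = b, x >= 0) rewritten as A x <= b, -A x <= -b, -x <= 0."""
    n = len(A[0])
    ineqs = ([list(r) for r in A] + [[-v for v in r] for r in A]
             + [[-int(i == j) for j in range(n)] for i in range(n)])
    rhs = list(b) + [-v for v in b] + [0] * n
    return fm_feasible(ineqs, rhs)


A = [[1, 1], [1, -1], [-1, 1], [-1, -1]]          # the system of the closing example
b = [Fraction(7, 2), Fraction(1, 2), Fraction(1, 2), Fraction(-5, 2)]
print(fm_feasible(A, b))                          # True (e.g. x = y = 1.5)
print(problem3_feasible([[1, 2], [1, -1]], [1, 2]))   # False
```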





Systems of Linear Diophantine Equations with Nonnegativity Constraints

The essence of the previous theorems is that, if $Ax = b$ does not have a solution of a certain type, then we can "infer" one single linear equation from $Ax = b$ (i.e., we can take a linear combination of the given equations) for which it is obvious that it does not have a solution of this type. In the case of Farkas' Lemma, the linear combination must be nonnegative (i.e., conic). Characterizing the solvability of Problems 4, 6 and 8 is instead more complicated. A characterization of this kind was obtained by Chvátal (1973) and Schrijver (1980), based on the work of Gomory (1960). This criterion, formulated here just for Problem 8, is based on the following two inference rules:

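• Rule 1. Given the inequalities $a_1^T x \le \beta_1, a_2^T x \le \beta_2, \ldots, a_m^T x \le \beta_m$ and $\lambda_1, \lambda_2, \ldots, \lambda_m \ge 0$, infer the inequality

$$\left( \sum_{i=1}^{m} \lambda_i a_i^T \right) x \le \sum_{i=1}^{m} \lambda_i \beta_i.$$

• Rule 2. Given the inequality $\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n \le \beta$, infer the inequality $\lfloor \alpha_1 \rfloor x_1 + \lfloor \alpha_2 \rfloor x_2 + \cdots + \lfloor \alpha_n \rfloor x_n \le \lfloor \beta \rfloor$.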
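Both inference rules are mechanical, so they are easy to sketch in code. The minimal Python version below uses hypothetical function names `rule1` and `rule2`, stores each inequality as a `(coefficients, rhs)` pair, and relies on exact `Fraction` arithmetic. It illustrates the two rules only; it is not a complete cutting-plane procedure:

```python
from fractions import Fraction
from math import floor


def rule1(ineqs, lambdas):
    """Rule 1: from a_i^T x <= beta_i and lambda_i >= 0,
    infer (sum_i lambda_i a_i)^T x <= sum_i lambda_i beta_i."""
    if any(lam < 0 for lam in lambdas):
        raise ValueError("multipliers must be nonnegative")
    n = len(ineqs[0][0])
    coeffs = [sum(Fraction(lam) * a[j] for lam, (a, _) in zip(lambdas, ineqs))
              for j in range(n)]
    rhs = sum(Fraction(lam) * beta for lam, (_, beta) in zip(lambdas, ineqs))
    return coeffs, rhs


def rule2(ineq):
    """Rule 2: round every coefficient and the right-hand side down
    (valid for nonnegative integral x)."""
    a, beta = ineq
    return [floor(Fraction(v)) for v in a], floor(Fraction(beta))


half = rule1([([2, 2], 3)], [Fraction(1, 2)])   # 2x + 2y <= 3 scaled by 1/2: x + y <= 3/2
print(rule2(half))                              # ([1, 1], 1)
```

Rule 2 is the only step that uses integrality. Rule 1 alone can only derive inequalities that are valid for all real nonnegative solutions.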





It is obvious that, if a nonnegative integral vector $x$ satisfies the given inequalities in Rule 1 or Rule 2, then it also satisfies the inferred inequality (for Rule 2, note that $\sum_j \lfloor \alpha_j \rfloor x_j \le \sum_j \alpha_j x_j \le \beta$ when $x \ge 0$, and the left-hand side is an integer, so it is at most $\lfloor \beta \rfloor$). This observation suggests the following result:

Theorem 13 (Nonnegative Integer Solvability of Linear Equations). Assume that $A$ and $b$ are rational. Then there exists a vector $x \in \mathbb{Z}^n$, $x \ge 0$, such that $Ax \le b$ if and only if we cannot infer from $Ax \le b$, by repeated applications of Rule 1 and Rule 2, the inequality $0^T x \le -1$.

It is important to note that, to derive the inequality $0^T x \le -1$, it may be necessary to apply Rules 1 and 2 a large number of times.

Example: Consider the following inequality system in $\mathbb{R}^2$:

$$\begin{aligned} x + y &\le 3.5 \\ x - y &\le 0.5 \\ -x + y &\le 0.5 \\ -x - y &\le -2.5 \end{aligned}$$





This system does not have a nonnegative integral solution. To see this, we first apply Rule 2 to each of the given inequalities to obtain

$$\begin{aligned} x + y &\le 3 \\ x - y &\le 0 \\ -x + y &\le 0 \\ -x - y &\le -3 \end{aligned}$$

Adding the first two inequalities above with multipliers $\lambda_1 = \lambda_2 = 1/2$, we infer by Rule 1 that

x ≤ 1.5.

Similarly, the last two inequalities (again with multipliers $1/2$) yield, by Rule 1, that

−x ≤ −1.5.
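For completeness, the derivation can be closed with two more applications of Rule 2 and one application of Rule 1 (with $\lambda_1 = \lambda_2 = 1$). The steps below are a straightforward completion of the argument above:

$$x \le 1.5 \;\xrightarrow{\text{Rule 2}}\; x \le 1, \qquad -x \le -1.5 \;\xrightarrow{\text{Rule 2}}\; -x \le -2, \qquad (x \le 1) + (-x \le -2) \;\xrightarrow{\text{Rule 1}}\; 0x + 0y \le -1.$$

Every nonnegative integral solution of the original system would also satisfy $0^T x \le -1$, which is impossible, so no such solution exists.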

