polytopes and the simplex method - university of british ...cass/research/pdf/simplex.pdf ·...
TRANSCRIPT
Last revised 3:49 p.m. January 11, 2017
Polytopes and the simplex method
Bill Casselman
University of British [email protected]
A region in Euclidean space is called convex if the line segment between any two points in the region isalso in the region. For example, disks, cubes, lines, single points, and the empty set are convex but circles,
doughnuts, and coffee cups are not.
Convexity is the core of much mathematics. Whereas arbitrary subsets of Euclidean space can be extremely
complicated, convex regions are relatively simple. For one thing, any closed convex region is the intersectionof all affine halfspaces f ≥ 0 containing it. One can therefore try to approximate it by the intersection of
finite sets of halfspaces—in other words, by polytopes .
A polytope is by definition the intersection of a finite number of halfspaces in a Euclidean space. Thus cubes
are polytopes but spheres are not. The simplest example, and the model in some sense for all others, is the
coordinate octant xi ≥ 0. Another simple example is the simplex xi ≥ 0,∑
xi ≤ 1. This essay will tryto explain how to deal with polytopes in practical terms. The literature on this is extensive, and I’ll not be
innovative. Since linear programming is commercially applicable, little of the literature is aimed at quite thesame audience as this note is. (But compare §3.2 in [Ziegler:2007], titled ‘Linear programming for geometers’,
or [Knuth:2005])
The major consideration in the theory of convex regions is that there are two basic ways to describe one. If
Ω is any set of points in Rn, its convex hull is the set of all points which can be represented as a finite sum∑
cixi with ci ≥ 0,∑
ci = 1, and all xi in Ω. For example, the line segment between two points x and y isthe convex hull of those two points, and the convex hull of a circle is a disk. A convex hull is clearly convex.
On the one hand, a closed convex region is the intersection of a set of halfspaces. On the other, as we shallsee, it is the convex hull of its extremal points. The basic problem in dealing with polytopes is how to transfer
back and forth between these two characterizations.
An extremal point or of a closed convex region is one that does not lie on the interior of a line segment in the
region. In some sense dual to the result that any closed convex region is the intersection of all halfspaces
containing it, any bounded closed convex region is the convex hull of its extremal points.
For polytopes, an extremal point is a vertex . A bounded polytope is therefore the convex hull of its vertices.
For unbounded polytopes, this has to be modified. First of all, a polytope in Rn may be invariant under
a linear subspace U of translations. It then projects onto a convex region in the quotient of Rn by U that
is nondegenerate, in the sense that the subspace of translations leaving it invariant is trivial. The original
region is thne isomorphic to the product of U and the projection. A nondegenerate polytope is the convexhull of its vertices together with a finite number of extremal rays.
Polytopes and the simplex method 2
Defining a polytope by affine inequalities and defining it as a convex hull are dual modes of specification.
Thus we arrive immediately at the most basic computational problem concerned with convex polytopes—given a polytope specified in one of theseways, to find an explicit specification in the other. More specifically,
if a polytope is specified as the intersection of a finite number of halfspaces fi(x) ≥ 0, how can one find its
vertices and extremal rays? If a region is specified as the convex hull of a finite set of points, how can onefind its vertices among this finite set? How can one find a finite number of halfspaces whose intersection is
the polytope?
In the first part of this essay I’ll prove the very basic result that the convex hull of any finite set of points is
a polytope—i.e. it can also be specified in terms of a finite set of affine inequalities. In this, I shall follow
[Weyl:1935]. This simple result has many consequences.
There are several other questions onemight pose. A face of a polytope is any affine polytope on its boundary.
A facet is a proper face of maximal dimension.
• Is the region bounded?
• If it is bounded: What are its vertices?
What is its volume?
What is the number of lattice points it contains?• Given a finite set of points, which among them are the vertices of its convex hull?
• What is its dimension (taken to be−1 if the region is empty)?
• What is a minimal set of affine inequalities defining it? Or, in other words, what are its facets?• What is the maximum value of a linear function on the polytope (which might be infinite)?
• How to describe the face where a linear function takes its maximum value?• What is its facial structure? The answer to this can be extremely complex.
• What is the dual cone of linear functions bounded on the polytope?
• How to find the intersection of two convex sets?• How to find the convex hull of the union of two convex sets?
• Given two convex sets, is one contained in the other?
Answering these questions reduces to answering a few basic ones. For example, to find out whether one
region is contained in the other it suffices that you know the characterization of one in terms of affine
inequalities, but the other in terms of extremal points. Finding the intersection of two convex bodies is trivialif they are specified by affine inequalities, and finding the convex hull of their union is easiest if the regions
are given as convex hulls.
These problems are well known to arise in applications of mathematics. Among other things, convexity
is a crucial feature in the way we perceive things in 3D, and hence problems related to convexity arise in
computer graphics. But in fact they also arise in trying to understand the geometry of configurations in highdimensions, which comes up surprisingly often even in the most theoretical mathematical investigations.
For one thing, acquiring a picture of an object in high dimensions, or navigating regions of high dimensionalspace, often amounts to understanding the structure of convex regions.
Answering these questions can be daunting, since the complexity of problems increases drastically as di
mension increases. There are now many algorithms known to answer such questions, but for most practical
Polytopes and the simplex method 3
purposes the one of choice is still the simplex method . This is known to require time exponential in thenumber of inequalities for certain somewhat artificial problems. But on the average, as is shown in a series
of papers beginning apparently with [Smale:1983] and including most spectacularly [SpielmanTeng:2004],
it works well. It is also more flexible than competitors.
Contents
I. Geometry1. Convex bodies
2. Proof of the basic theoremII. Basic procedures
3. Introduction
4. Column reduction5. Vertex frames
6. Finding a vertex basis to start
7. Affine coordinate transformations8. Pivoting
9. Tracking coordinate changes10. An example in some detail
III. Applications
11. Finding maximum values12. Finding all vertices of a polygon
13. Nearest point
14. Finding the dimension of a convex region15. Implementation
IV. References
As far as I know, convex bodies were first systematically examined by Minkowski, who used them in
spectacular ways in [Minkowski:1910]. [Weyl:1935] is a pleasant account of the basic facts, and I follow his
exposition in the first part.
Mymain reference for the simplexmethodhas been thewell known text [Chvatal:1983], althoughmynotation
is more geometric than his. The audience for expositions of linear programming is not, by and large, madeup of mathematicians, and the standard notation in linear programming is, for most mathematicians, rather
bizarre. I follow more geometric conventions.
Eventually, this essaywill be accompanied a small suite of Pythonprograms. Because I am almost exclusively
interested in theoretical questions, I’ll implementmy routineswith exact rational arithmetic, which eliminates
technical difficulties in dealing with floating point numbers. This is the concern of much of the literature,that does not affect us.
Part I. Geometry
1. Convex bodies
It is a little troublesome to say much about arbitrary convex regions. For example, the union of an open disk
with any subset of its circular boundary is convex. The following is elementary, however.
1.1. Proposition. The closure of a convex region is convex.
I begin now with some observations on arbitrary closed convex regions. Throughout, let V be a real vectorspace of dimension n.
The following result is the basic fact about convex regions:
1.2. Theorem. Duality) Any closed convex set in V is the intersection of all halfspaces containing it.
Polytopes and the simplex method 4
For example, the unit disk in the plane is the intersection of all halfplanes x cos θ + y sin θ ≤ 1 as θ rangesover [0, 2π).
This result remains true (as a consequence of the HahnBanach Theorem) even for locally convex topological
vector spaces of infinite dimension, although the proof I give will remain valid only for Hilbert spaces.
Proof. Assign V a Euclidean metric. Let Ω be a closed convex set, P an arbitrary point not in Ω. It is to be
proved that there exists a hyperplane f(x) = 0 such that f(P ) < 0, f(x) ≥ 0 for all x in Ω.
Since Ω is closed, there exists a point Q in Ω at minimal distance from P . I may as well take Q to be the
origin of the coordinate system. I claim that the hyperplane passing through Q and perpendicular to the line
through P and Q will do. The equation of the corresponding open half space that contains P is
f(x) = −x •P < 0 .
It is to be shown if R lies in Ω then R •P ≤ 0. If R lies in Ω so does tR for all small t. We now have
‖P − tR‖2 = ‖P‖2 − t (R •P ) + t2 (R •R) ,
so that if R •P > 0 the point tR is closer than the origin to P for small t, a contradiction.
If Ω is a closed convex subset of V , its dual Ω∨ is the set of all affine functions f on V such that f(Ω) ≥ 0. Iff lies in Ω∨ and c ≥ 0 then so is cf . Hence Ω∨ is a closed cone in the space of affine functions, which has
dimension n + 1. It is also convex.
Let V be the vector space of linear functions on V . The affine functions on V may be identified with V ⊕ R.
If A is a closed cone in the vector space of affine functions, I’ll define its point dual A∧ to be the subset of allx in V such that f(x) ≥ 0 for all f in A. It is also closed, and it is convex since
f((1 − t)x + ty) = (1 − t)f(x) + tf(y)
for any affine function f . Any x in Ω is of course a point in the point dual of Ω∨.
The following is just a rewording of Theorem 1.2.
1.3. Corollary. If Ω is a closed convex subset of V , then the embedding of Ω into the point dual of Ω∨ is abijection.
There is a special case of the theorem worth mentioning. If Ω is a cone, its dual cone Ω is the set of all linear
functions f with f(Ω) ≥ 0. It is a subset of the linear dual V . For example, if (ei) is a basis of V , (ei) the dual
basis, and Ω the cone generated by the ei for 1 ≤ i ≤ d, then Ω is the set of all sums
d∑
i=1
ciei +
n∑
d+1
ciei (ci ≥ 0 for i ≤ d, arbitrary for i > d) .
1.4. Lemma. If Ω is a cone, then Ω∨ is the set of all f + c with f in Ω and c ≥ 0.
Polytopes and the simplex method 5
Proof. One way is trivial. For the other, suppose f + c to lie in Ω∨. This means that f(x) + c ≥ 0 for all x inΩ. Since 0 lies in Ω, c ≥ 0. But now say x lies in Ω. Then so does λx for all λ > 0, so that f(x) + (c/λ) ≥ 0
for all λ > 0. Therefore f(x) ≥ −ε for all ε > 0, so that f(x) ≥ 0 and f lies in Ω.
1.5. Proposition. If Ω is a closed convex cone in V then the canonical map from Ω toΩ is a bijection.
1.6. Corollary. The map taking Ω to Ω is a bijection between closed convex cones in V and those in V .
It may happen that a convex region Ω is invariant under translation by vectors in some linear subspace.Define Inv(Ω) to be the linear space of all such vectors, the invariance subspace associated to Ω. If Ω is a
closed cone then Inv(Ω) ⊆ Ω since 0 lies in Ω.
If Ω is any convex subset of V , its affine support is the smallest affine subspace containing it. I’ll thedimension of that subspace the dimension of Ω. The dimension will be m if and only if Ω contains a simplex
of dimension m. This will be n if and only if its interior is nonempty.
There is a strong relationship between the dimension of a convex set and the invariance subspace of its dual.
I’ll look just at what happens for cones, which is fairly simple. If Ω is a closed convex cone, its affine support
is a linear subspace, which I’ll also call its linear support..
1.7. Proposition. Suppose Ω to be a closed convex cone. If U is its linear support, then Inv(Ω) is U⊥.
Proof. This depends on a series of elementary claims. () It is trivial to see that U⊥ is contained in Inv(Ω).() The cone contains m linearly dependent vectors ui (1 ≤ i ≤ d) spanning the linear support U of Ω. These
generate a cone Ω∗. () If we extend these to a basis of V and then let ui be the dual basis, the cone Ω∗ is
that spanned by the ui for i > d, and its invariance subspace is clearly U⊥. However, Ω∗ ⊆ Ω if and only if
Ω ⊆ Ω∗.
A cone Ω is called non-degenerate if Inv(Ω) = 0 and it has dimension n.
1.8. Corollary. The duality of closed convex cones in V with those in V induces a bijection of nondegenerate
cones in V with those in V .
1.9. Proposition. If Ω is a closed convex cone then 0 is a vertex if and only if Inv(Ω) = 0.
Proof. If 0 = x + y with x, y in Ω then x = −y and the line through them is contained in Ω as well as Inv(Ω).
Not every cone in the space of affine functions is equal to some Ω∨. This space is not just a vector space,
but one with a particular coordinate, taking f to its constant term, and the cone Ω∨ is a special cone, since ifc ≥ 0 then Ω∨ + c ⊆ Ω∨. I’ll call such a cone upwardly stable . I’ll call a cone in V ⊕ R positive if its support
lies in the half space xn+1 ≥ 0. A closed convex cone is positive if and only if it is equal to the cone over its
slice at level 1.
The following is elementary.
1.10. Lemma. The cone Ω in V ⊕ R is upwardly stable if and only if Ω is positive.
1.11. Proposition. A closed convex cone C in V ⊕ R is equal to Ω∨ for some Ω ⊆ Rn if and and only if it is
upwardly stable. In these circumstances, Ω is the set of all x in V such that (x, 1) lies in C.
If Ω is a closed convex region with U = Inv(Ω and W its affine support, its image Ω in W/U will benondegenerate and Ω will be isomorphic to U ×Ω. The analysis of convex regions thus reduces easily to that
of nondegenerate ones.
CARATHEODORY’S THEOREM. The following first appeared in [Steinitz:1913] (p. 155), after an earlier specialcase had been dealt with in [Caratheodory:1907]. Both these works were primarily interested in questions
about conditional convergence!
1.12. Proposition. Caratheodory’s Theorem) Suppose X to be a nonempty subset of Rn. Then:
Polytopes and the simplex method 6
(a) if y 6= 0 is a nonnegative linear combination of points in X , it may be expressed as a positive linearcombination of linearly independent points of X ;
(b) any point in the convex hull of X lies in the convex hull of some subset of points of X of size ≤ n + 1.
In other words, in (a) every point in the convex cone generated by X is contained in a simplicial conegenerated by elements ofX , and in (b) every element in the convex hull ofX is contained in a simplex whose
vertices are in X .
Proof. The proof is an algorithm.
(•) Suppose v =∑
cixi with ci > 0, xi in X . Let N be the number of xi. If the xi are linearly independent,exit.
Otherwise, there exists an equation ∑aixi = 0 .
Negating if necessary, we may assume one ai > 0. Then for any b
v =∑
(ci − b ai)xi .
The line y = ci − xai in R2 starts off at (0, ci), and has slope −ai. At least one of them crosses the positive
xaxis; suppose the first one crosses at (b, 0). Then all ci − bai are nonnegative, and at least one is 0. Thisgives an expression for y involving a smaller number of xi. Loop to (•).
The procedure cannot go on forever, since y 6= 0 and a single element of X is a linearly independent set.
For (b), let Ω be the convex hull of X . The claim follows from (a) applied to the cone over (Ω, 1) in Rn+1.
1.13. Corollary. In a finitedimensional real vector space
(a) the convex hull Ω of a compact set X is compact;(b) the convex cone generated by a finite set is closed.
Proof. For (a), let Σ be the simplex of all (ci) in Rn+1 with
∑ci = 1. According to the proposition, the set Ω
is the image of Σ × Xn+1 with respect to the map
(c, x) 7−→∑
cixi .
For (b), the claim is certainly true when the points are linearly independent, and the cone is simplicial. ButCaratheodory’s Theorem reduces the claim to that case.
1.14. Lemma. Suppose f to be a linear function, Ω a bounded, closed, convex subset of Rn. There exists an
extremal point of Ω on which the restriction of f to Ω takes its maximum value.
Proof. By induction on n. The case n = 0 or 1 are trivial. Otherwise, suppose that c is the maximum value off on Ω. Since Ω is bounded it is also compact, and the interection ofΩ with the hyperplane H onwhich f = cis not empty. It is also closed and convex. Wemay choose a nontrivial linear function on H . By the induction
hypothesis it achieves its maximum value at an extremal point x of H ∩Ω. But I claim that an extremal pointof H ∩ Ω muts also be an extremal point of Ω. Why is this? Suppose x = (1 − t)y + tz with y, z in Ω, t > 0.But then f(x) = (1 − t)f(y) + tf(z). If either f(y) or f(z) is > c then the other must be < c, which is notpossible. So y and z both lie in H ∩ Ω. .
1.15. Proposition. Any bounded, closed, convex set subset Ω of Rn is the convex hull of its extremal points.
Proof. Let C be the convex hull of the extremal points. The extremal points are a compact set, so C is closed.
The Lemma implies that C∨ and Ω∨ are the same, so the Corollary to the Theorem implies that C = Ω.
POLYTOPES. In the rest of this essay I’ll be concerned with a limited class of convex sets, those which
are the intersections of a finite number of halfspaces. These are convex polytopes . I’ll be mainly interested
Polytopes and the simplex method 7
in algorithms to deal with these, but I’ll begin with a few basic theorems. In fact, in the theory of convexpolytopes there is one really basic theorem, from which many others follow without trouble.
Suppose X to be a finite set of V of dimension n. Suppose f an affine function and let H be the hyperplane
f = 0. The linear halfspace f ≥ 0 will be called extremal for X if X is contained in it, and the dimensionof X ∩ H is n − 1. In these circumstances the boundary of the extremal halfspace will contain a facet of the
convex hull of X . Of course there can be only a finite number of these extremal halfspaces. An extremalfunction is one defining an extremal halfspace. The meaning of ‘extremal’ should be made clearer by:
1.16. Lemma. Suppose f to be an extremal function for X . If f = (1 − c)f1 + cf2 with 0 < c < 1 and fi ≥ 0on X , then the fi are both multiples of f .
Proof. Suppose f = 0 on the set Y of n − 1 linearly independent points of X . If fi > 0 on Y then f > 0 on
Y , a contradiction. So fi = 0 on Y and must be a multiple of f .
The convex hull of the cone generated by X ⊆ V is the set of all finite sums∑
X cxx with all cx ≥ 0. Forexample, if X is a collection of n linearly independent points its convex hull is a simplicial cone, and everysubset of n − 1 points determines an extremal halfspace.
For closed convex cones that are polytopes, we have the following refinement of Theorem 1.2:
1.17. Theorem. If X is a finite set of points in V containing a set of n linearly independent points, itshomogeneous convex hull is the intersection of all linear halfspaces extremal for X .
In particular, it is a polytope. The assumption on X is necessary, but more generally this amy be applied to
X embedded in its linear support.
I’ll postpone the proof to the next section, but I’ll now exhibit here a number of consequences.
1.18. Proposition. Suppose F to be a finite set of linear functions on V , and
Ω =⋂
f∈F
x
∣∣ f(x) ≥ 0
.
Then any f in V with f(Ω) ≥ 0 is in the convex cone generated by F .
Proof. The convex cone generated by F is contained in Ω, and the duals of both are equal to Ω. Hence they
are the same.
The following is a converse to the Theorem. A ray R in a cone is called extremal if whenever x in R is equal
to a sum (1 − t)x + ty with x, y in Ω and 0 < t < 1 we must have both x and y in R.
1.19. Corollary. Suppose Ω to be a closed, convex, nondegenerate cone defined by a finite set of linearinequalities fi(x) ≥ 0. There exists a finite set X of extremal rays such that Ω is the convex hull of X .
Proof. Since Ω is nondegenerate, so is Ω. Proposition 1.18 tells us that Ω is the convex hull of F in V . The
Theorem tells us that Ω is the intersection of a finite number of extremal halfspaces 〈x, u〉 ≥ 0 for x in V .
These x will generate the extremal rays.
If Ω is nondegenerate, so is Ω. Since Ω contains n linearly independent points, Ω is contained in a simplicial
cone. There exists a compact slice through the simplicial cone, which is also a compact slice through Ω. Theextremal rays of Ω will intersect the slice in its vertices.
1.20. Corollary. Any polytope X in V is the convex hull of a finite number of points and rays.
Proof. It suffices to assume Inv(X) = 0 and X of dimension n. I begin with a construction that has already
implicitly appeared. The space V may be embedded into Rn+1 as an affine subspace: x 7→ (x, 1). Rays in
V ⊕ R that intersect xn+1 = 1 correspond to points in V , while rays in V ⊕ R in the hyperplane xn+1 = 0correspond to rays in V . If X is a closed convex subset of V , let X be the unique closed cone generated by
the image of X . It is the homogenization of X .
This Corollary now follows from Corollary 1.19 applied to X .
Polytopes and the simplex method 8
I’ll say more about the ‘homogenization’ used in this argument later on. It will be a basic tool in my accountof algorithms.
Polytopes and the simplex method 9
2. Proof of the basic theorem
Suppose given a finite set X of points in Rn, and let Ω be the homogeneous convex hull of the set X . We
are assuming it to contain an open subset of Rn. It is to be shown that Ω is the intersection of all extremal
halfspaces of X .
One half of this is trivial, since Ω is certainly contained in the intersection of all extremal halfspaces. Itremains to be shown that if a point lies in the intersection of extremal halfspaces then it lies in Ω.
It may very well happen that there are no extremal halfspaces for X . In this case, the theorem asserts thatthe convex hull of the cone generated by X is all of R
n.
The proof proceeds by induction on n ≥ 1. If n = 1 then the points of X span a nontrivial segment. If this
segment contains 0 in its interior, then the homogeneous convex hull is all of Rn, and there are no linear
functions cx taking a nonzero value on all of X except 0 itself. Otherwise, the homogeneous convex hull is
a halfline with 0 on its boundary, and again the result is plain.
Now assume the Theorem to be true for dimension up to n − 1, and suppose X to have dimension n. Thereare two steps to the argument, which is constructive. The first either produces an extremal halfspace ordemonstrates that the homogeneous convex hull of X is all of R
n. The second proves the theoremwhen it is
known that extremal halfspaces exist.
Step 1. Suppose α = 0 to be any hyperplane containing n − 1 linearly independent points of X . Eitherall points of X lie on one side of this hyperplane, say the side α ≥ 0, or they do not. If they do, then
this hyperplane is extremal, and we can pass on to the second step. Suppose they do not. A new extremalhalfspace β ≥ 0 will be found that contains at least one more point of x, so we can set α = β and loop.
Let e be a point of X with 〈α, e〉 < 0. By changing coordinates if necessary, we may assume α to be thecoordinate function xn, and e = (0, 0, . . . ,−1). Let Y be the projection onto the plane xn = 0 of the points
X≥0 ofX with xn ≥ 0. The new system Y is nondegenerate of dimension n− 1, by assumption on α. Eitherit possesses an extremal halfspace 〈β, x〉 ≥ 0 with 〈β, x〉 = b1x1 + · · · + bn−1xn−1 or it does not. If it does,the halfspace 〈β, x〉 ≥ 0 contains the point e in addition to the points in X≥. The hyperplane β = 0 contains
n − 2 linearly independent points on the hyperplane α = 0 together with e, and also contains n− 1 linearly
independent points, so induction is allowable.
e
α = 0
In the second case, when Y does not possess an extremal halfspace, we can apply the induction assumption.
Every point in the hyperplane 〈α, x〉 = 0 must be in the convex hull of Y . Adding positive multiples of e, we
see that every point where xn ≤ 0 is in the convex hull. Since we have eliminated the case where all pointsof X lie on one side of α = 0, we can find a point with xn > 0; a similar argument implies that every point
where xn ≥ 0 is also there.
Step 2. At this point we know there exist extremal halfspaces containing Ω. Let ∆ 6= ∅ be a set of affine
functions α such that the inequalities α ≥ 0 define them.
Choose a point e in the interior of the homogeneous convex hull of X = xi. Thus 〈α, e〉 > 0 for every α in
∆. Let p be an arbitrary point in the intersection of extremal halfspaces, so that also 〈α, p〉 ≥ 0 for all α in ∆.
Polytopes and the simplex method 10
We want to show that p is in Ω, or that
p =∑
cixi
with all ci ≥ 0. We may assume that p 6= e.
The set of all
λe + µp
with λ, µ ≥ 0 is a twodimensional wedge. Any point of the same formwith λ < 0, µ ≥ 0 will be in the plane
spanned by this wedge, outside this wedge on the far side of the line through p. Since
〈α, p − λe〉 = 〈α, p〉 − λ〈α, e〉
and 〈α, e〉 < 0, the ray of all q = p − λe with λ > 0 will find itself eventually in the region where α < 0. Itwill cross the hyperplane α = 0 when
λ =〈α, p〉
〈α, e〉.
If λ is chosen to be the least value of 〈α, p〉/〈α, e〉 as α ranges over ∆, then
(1) p = q + λe where λ ≥ 0 and e is in the homogeneous convex hull of X ;
(2) 〈λ, q〉 ≥ 0 for all α in ∆;
(3) 〈α, q〉 = 0 for some α in ∆.
In order to prove that p is in the homogeneous convex hull of X , it suffices to show that q is in it.
We want to apply the induction hypothesis to q. Let Y be the set of all x in X on the hyperplane Hα where
α = 0, which contains q. Because α is an extremal support, there exist at least n − 1 linearly independent
points in Y , so that it is nondegenerate in n − 1 dimensions. Choose coordinates so that α = xn. By theinduction assumption, the convex hull of Y is the intersection of all its extremal halfspaces in Hα. That q lies
in this intersection will follow from:
2.1. Lemma. Any extremal hyperplane for Y = X ∩ Hα in Hα can be extended to an extremal hyperplanefor X in R
n.
Proof of the Lemma. Suppose
〈β, y〉 = b1x1 + · · · + bn−1xn−1 ≥ 0
be one of the extremal halfspaces for Y in Hα. The function β will vanish on a subset Υ ⊆ Y of essential
dimension n − 2. Any extension of β to X will be of the form
〈B, x〉 = b1x1 + · · · + bn−1xn−1 + cxn
for some constant c.
I claim that c can be chosen so that the B ≥ 0 on X and meets at least one point x in Y − X . This means no
more and no less than that it defines an extremal half space forX . The conditions on B are that
〈B, x〉 = b1x1 + · · · + bn−1xn−1 + cxn ≥ 0
for all x in X − Y , and that 〈B, x〉 = 0 for at least one of these x. These require that
−c ≤b1x1 + · · · + bn−1xn−1
xn
for all x in X − Y , with equality in at least one case. But X is finite and xn > 0 on X − Y , so the right hand
side will achieve a minimum for some c < 0 as x varies.
Part II. Basic procedures
Polytopes and the simplex method 11
3. Introduction
In the remainder of this essay I’ll explain one of the two techniques for implementing the basic theorems
about polytopes. Such technqiues seem to be roughly divided into two general categories. The incrementalmethods extend the idea behind the proof of the basic theorem that I gave in the last part. The other is the
simplex method, which is what I shall be concerned with. The problem I am most interested in is to find allvertices of a bounded polytope, given a finite number of affine inequalities that define it. But this will involve
a number of related tasks: Given an arbitrary set of inequalities, how do we determine the subspace leavingthem invariant? How do we find the nondegenerate quotient? All searches on the polytope begin from agiven vertex. Given just the inequalities, how to locate a starting vertex? This will involve in turn another
very basic question: Given a linear function, how to locate a vertex at which it achieves a maximum value?
At the very start we are given a finite set of inequalities such as
x ≥ 1
y ≥ −1
x + y ≤ 3 .
We rearrange them so as to be in a uniform format:
x − 1 ≥ 0
y + 1 ≥ 0
−x − y + 3 ≥ 0 .
I. e. from now on, I shall assume that we are given a finite set of affine functions fi of the form
fi(x) =n∑
j=1
ai,jxj + ai,n+1
defining the region Ω in which fi ≥ 0. We can even take an equation f(x) = 0 into account, replacing it byinequalities f(x) ≥ 0 and −f(x) ≥ 0. I should say immediately that I shall work only with functions whose
coefficients are rational. Otherwise, random adjustments of floating point numbers in computer programs
make the problem meaningless.
Suppose that we now have at hand m functions of n variables expressing the inequalities defining Ω. What
next? There are a few steps: (i) find U = Inv(Ω) and the corresponding inequalities defined on Rn/U ; (ii)
find an independent subset of the inequalities in terms of which all of them are expressed (an adapted affine
basis); (iii) either find a vertex of Ω or determine that Ω is empty.
There is one thing that will answer some of these questions immediately, and set things up to answer the
other—replace the region by its homogenization. This means (1) interpreting the constants in the given
inequalities as coefficients of a new variable, and (2) adding a final inequality to the set, defining a ‘bottom’to the homogenization. For example, the equations above now become
x − z ≥ 0
y + z ≥ 0
−x − y + 3z ≥ 0 ,
and we add to them
z ≥ 0 .
If f is one of an affine function, I’ll write f for its homogenization. The inequalities f(x) ≥ 0 describe a
homogeneous cone CΩ in Rn+1. The original region may be identified with the slice xn+1 = 1 of CΩ, and a
description of the cone will allow us to describe the original region.
What does homogenization accomplish? It will be a standard exercise in linear algebra to find a coordinate
system well adapted to the homogeneous system of inequalities. More precisely, we shall be able to find a setB ⊆ [1, m] with these properties:
Polytopes and the simplex method 12
• the f i for i in B are linearly independent;• all the other f i can be expressed in terms of these.
Let U be the linear subspace⋂
f i = 0 of Rn.
3.1. Lemma. The linear subspace U is the same as Inv(Ω).
Proof. Because xn+1 is among the fi.
Say |B| = d + 1. In the best situation, d = n. The functions in B are coordinates on the quotient of Rn+1
by U . The quotient cone therefore has dimension d + 1. In addition, the projection of CΩ will be a cone withthe origin as vertex. Similarly, the quotientΩ/Inv(Ω) will have dimension d and will possess vertices. This is
important, because all of the basic calculations to comewill work with vertices of the regionwe are interested
in.
Let’s look at the example again. First of all, we can picture the region:
y + 1 ≥ 0
x−
1≥
0
−x−
y+
3≥
0
(1,−1)
(1, 2)
(4,−1)
Homogenization gives us the matrix
1 0 −10 1 1
−1 −1 30 0 1
.
The standard column reduction algorithm (recalled in the next section) multiplies this on the right by
4/3 1/3 1/3
−1/3 2/3 −1/31/3 1/3 1/3
to get
1 0 00 1 00 0 1
1/3 1/3 1/3
.
This reduction is adequate, but it is convenient to have xn+1 as part of the basis for the cone. In general, the
function xn+1 will now be expressed as a linear combination of the fi in B. We can swap xn+1 with any ofthe fi such that the corresponding coefficient of xn+1 6= 0. In our case, we can swap with any of f1, f2, or f3.
What I do is pivot the first row out of the basis and the fourth into it to get
3 −1 −10 1 00 0 11 0 0
=
1 0 −10 1 1
−1 −1 30 0 1
4 −1 −1
−1 1 01 0 0
.
Polytopes and the simplex method 13
How is this to be interpreted? First of all, after column reduction the functions f1, f2, f3 make up the newcoordinates, The function z is expressed as (1/3)(f1 + f2 + f3). After the pivot, the basis is f2, f3, and z. We
also have the equations
x = 4z − f2 − f3
y = −z + f2
z = z .
\equationz − exp
Why is it convenient to have xn+1 in the final basis? Because we can then read off coordinates on the slice
very easily. In this example, on the plane z = 1 the origin becomes the vertex (4,−1) of the original region,as we can reda off immediately from . (Here, this is a vertex of Ω, but in general this trick is guaranteed only
to give us a kind of pseudovertex, which might not be a true vertex.)
In this example, since all the vertices of our region lie in the halfspace z ≥ 0, the addition of the inequality
z ≥ 0 has no effect. It will have an effect if and only if the convex region at hand is unbounded. In that case,
the vertices that we find on the hyperplane xn+1 = 0 will correspond to the directions of extremal rays. I’llsay very little about that.
So the homogenization simplifies the system by eliminating the invariance subspace. It also does somethingelse—it gives us a region that is closely related to the regionwe are really interested in, for whichwe do know
one vertex—the origin, which is the vertex of the cone.
FRom now on, assume Ω to be nondegenerate.
In the next section I’ll describe how to apply a standard calculation in linear algebra to find a coordinate
system well adapted to the cone.
4. Column reduction
We start with anm×(n+1)matrixF whose rows represent the affine functions involved in the given systemof inequalities. In this section, I’ll explain how to choose a coordinate system well adapted to the system of
homogeneous inequalities determined by F . In this situation, we interpret the rows of F as linear, ratherthan affine, functions.
We’ll apply elementary column operations to F . The output will be
(a) a columnreduced form of F ;(b) an (n + 1) × (n + 1) matrix R determining the coordinate transformation involved;
(c) a list B of d rows of F that are to be taken as the new coordinate functions on Rn/U (with U the
invariance subspace of the system);(d) an index array I ;
I’ll explain these as we go along.
A matrix E is weakly column reduced if has these properties characterized recursively: (a) the first nonzero
row of E has exactly one nonzero entry, which is 1; (b) the submatrix of E made up of the ei,j with i ≥ 1,j ≥ 1 is weakly column reduced. Here is a sample pattern:
1 0 0 0 0∗ 1 0 0 0∗ ∗ 0 0 0∗ ∗ 1 0 0∗ ∗ ∗ 1 0∗ ∗ ∗ ∗ 0∗ ∗ ∗ ∗ 0
.
If E is weakly column reduced of rank d, there is associated to it a unique sequence
r1 < r2 < · · · < rd
Polytopes and the simplex method 14
with the property that ei,j = 0 if i < rj but erj ,j = 1. In the example above, the rank is 4 and this sequenceis 1, 2, 4, 5. A matrix that is weakly column reduced is called column reduced if in each row ri there is only
one nonzero entry. The following is a typical pattern:
1 0 0 0 00 1 0 0 0∗ ∗ 0 0 00 0 1 0 00 0 0 1 0∗ ∗ ∗ ∗ 0∗ ∗ ∗ ∗ 0
.
Given anym×nmatrix F of rank d, a standard algorithm in linear algebra will find a sequence of elementarycolumn operations that will make F column reduced. This finds a matrix E that is columnreduced and an
n× n invertible matrix R such that E = F ·R. In these circumstances the first d columns of R will be a basis
for the subspace spanned by the columns of F , the rows ri of F will be a basis for space of functions spannedby all the rows of F , and the rows of E will be the coordinates of the rows in terms of these special rows.
I can say a little more about this by recalling how linear coordinate transformations work on linear functions.I think of one as a substitution: we have an expression for the current coordinates xi in terms of the new ones
yi (rather than viceversa), by a matrix formula:
x = Ay .
Write linear functions as row vectors, so
f(x) =∑
fixi = [ f1 . . . fn ]
x1
. . .xn
= f ·x .
Then with the coordinate change we get
f ·x = f ·Ay = fA ·y
so that its expression of the linear function in y coordinates is fA.
In our case, the number d is what I call the nominal dimension of the system (only nominal because the
convex body at hand might still be empty). The rows of this matrix are the rows corresponding to the subsetof the fi we have found to form part of a coordinate system. The coordinates of others with respect to this
basis are laid out in the other rows.
The transformation matrixR records the accumulated coordinate changes. Thus fx = fy ·R, if x and y are the
original and new coordinates. Columns d+1 throughn+1 ofR record a basis of vectors (in the original affine
coordinate system) whose translations leave the inequalities invariant. The region defined by the originalinequalities is the product of this space with a region in R
d defined by the matrix we get by eliminating these
columns, which we do. Wemay assume from now on that d = n, or equivalently that the convex region does
not contain a line.
In the end we shall be locating certain points specified in the new coordinate system. But we shall also need
to know their coordinates in the original coordinates. We have
xold = Axnew .
Polytopes and the simplex method 15
5. Vertex frames
As a result of column reduction and a bit of fiddling, we are given an m× (n+1)matrix in column reduction
form defining a closed convex cone CΩ, the cone in Rn+1 generated the copy of the original region Ω in n
dimensions in the slice at level 1.
The next step is to find a vertex ω of the region as well as a coordinate system adapted to it. By this I mean acoordinate system such that
(a) the point ω is the origin of the system;
(b) the coordinates are among the affine functions in the inequalities defining Ω.
In particular, Ω is contained in the coordinate octant.
I’ll call this a vertex-adapted coordinate system . The unit vectors in such a coordinate system make up what
I’ll call a vertex frame , which is easier to picture. This is the basic structure in the simplexmethod, and finding
one is almost always the first step in any exploration of a convex polytope. The point is that any vertex ofthe polytope may be reached by progressing through a succession of vertex frames by following edges of the
frames one encounters. One does this by pivoting, as I’ll explain later.
In a program, a vertex frame amounts to (a) an m× (n + 1) matrix F , each row of which is an affine function
defining the region involved, expressed in the current coordinate system; (b) a list B of the rows of this
matrix that represent the current coordinates; (c) for each of these, an index specifying which column in Fcorresponds to it. For convenience it is vest to incliude, although redundant, (d) a list O of the rows that are
not the coordinates. It is also a good idea to maintain (e) an (n + 1) × (n + 1) matrix R specifying the affine
transformation from current to original coordinates (you could call it the HanselandGretel matrix).
In the standard literature on linear programming, the set of functions that are not coordinates makeup what is called the basis. I am not sure where or why this usage originated. I follow more standardmathematical terminology and call the basis the set of functions that are coordinate functions. They are,after all, a basis for the space of all linear functions. This is consistent with my geometrical approach,whereas the literature often seems to shun geometrical notions.
The simplest vertices are those for which where the region is locally a simplicial cone—given locally at the
origin exactly by inequalities f ≥ 0 for f in the basis. But this does not always occur. This simple situationis the generic case, in the sense that perturbing a given system almost always turns any given polytope into
one all of whose vertices are simplicial. A vertex not of this form is called singular . Keep in mind that it isthe system that is singular, not necessarily the polytope itself. In two dimensions, for example, all vertices
are geometrically simplicial, but systems of inequalities may nonetheless be singular.
For example, suppose given the system of inequalities
2x + y ≥ 1
x + 2y ≥ 1
x + y ≥ 1
x − y ≥ −1
3x + 2y ≤ 6 .
We have the following picture, where the shading along a line f = c indicates the side where f − c ≥ 0. Theconvex region defined by the inequalities is outlined in heavy lines.
Polytopes and the simplex method 16
x−
y=−1
−3x
−2y
=−6
x + 2y = 1
x+
y=
1
2x+
y=
1 (4/5, 9/5)(0, 1)
(1, 0)(5/2,−3/4)
In that example, the system is singular at (0, 1) but not at the other three vertices of the region. The singularityat (0, 1) arises because the three bounding lines
2x + y − 1 = 0
x + y − 1 = 0
x − y + 1 = 0
all pass through that point. Any two of the corresponding affine functions will form a vertex basis. Only thelast two lines actually bound the region. But any two of these three might be the local coordinate system of a
vertex frame.
x−
y=−1
x+
y=
1
2x+
y=
1
Being a vertex basis is a local property—it depends purely on the geometry of the region fi ≥ 0 in a
neighbourhood of the origin.
Suppose we choose
X = 2x + y − 1, Y = x + y − 1 .
We can express this as:
2 1 −11 1 −10 0 1
xy1
=
XY1
,
1 −1 0
−1 2 10 0 1
XY1
=
xy1
.
The pair (X, Y ) make up a vertex frame. In the new system, the inequalities become
X ≥ 0
Y ≥ 0
−X + 3Y + 1 ≥ 0
2X − 3Y ≥ 0
−X − Y + 4 ≥ 0 .
Polytopes and the simplex method 17
6. Finding a vertex basis to start
As we shall see later, and as I have already suggested, nearly all analysis of a polytope starts with a vertex
frame. But if we are given just a set of defining inequalities, it might not be evident how to find one. As Ihave mentioned, we must deal locally. We cannot get an overall view of our region, but just inspect what is
going on in the neighbourhood of the current origin.
At this point, we have at hand a vertex frame for the homogenized version of the original system of affine
inequalities. It is from this that we shall find a vertex frame for the original system (or find that the region is
in fact empty).
One of the more elegant features of the simplex method is that this will turn out to be a special case of
the original problem that originally motivated the simplex method, that of finding the maximum value ofa linear function on a polytope, given an initial vertex frame. I postpone the detailed explanation of how
this works. At any rate, in our situation we are given linear coordinates of the function that was our original
homogeneous variable xn+1. Our given vertex is the origin, where xn+1 = 0. What we want to locate is avertex of the original inhomogeneous system, which corresponds to the slice of our cone at xn+1. So we add
to system the affine inequality xn+1 ≤ 1, and find themaximum value of xn+1 subject to the new conditions.If the maximum value is 1, we have located a good vertex frame for the original region. If not, the solution
set of the original inequalities is empty.
It’s a little difficult to picturewhat is going on, except in very low dimensions. Suppose the original equationswere
x ≥ 1
x ≥ 1
x ≤ 2
in one dimension. The important point is to realize that the origin of this coordinate system is not a vertex of
the polytope—the function x − 1 takes a negative value at x = 0.
0 1 2
x ≥ 0 x ≥ 1 x ≤ 2
These become the system
x ≥ 0
x − y ≥ 0
−x + 2y ≥ 0
y ≥ 0
−y + 1 ≥ 0 .
in two dimensions, after adding the new equations in y.
x ≥ 0
y ≥ 0
−y + 1 ≥ 0
−x + 2y≥ 0
x−
y≥
0
For a vertex frame at the origin we have six choices—any two functions out of the four that vanish at 0:x, x − y, −x + 2y, y. What we get by elimination depends on the order in which we write the equations.
Polytopes and the simplex method 18
Assuming y ≥ 0, −y + 1 ≥ 0 to be last, we are reduced to three choices. The choice X = x, Y = x − y (soy = X − Y ) gives us the new system
X ≥ 0
Y ≥ 0
X − 2Y ≥ 0
X − Y ≥ 0
−X + Y + 1 ≥ 0 .
In this case, the simplex method finds that there is indeed a vertex frame satisfying the original inequalities.
This process will be a bit unusual from the standpoint of linear programming, as we shall see later. All movesexcept the final one will be what I call virtual. What we are doing, roughly, is moving around in the ‘web’ of
vertex frames defined by the homogenized functions fi.
ANOTHER EXAMPLE. An example in which Ω = ∅might be instructive. Start with the inequalities
x ≥ 1
x ≤ 0 .
This gives us the homogeneous matrix
F =
1 −1
−1 00 1
,
and then the affine system
1 0 0−1 −1 0
0 1 00 −1 1
with basis f1, z in 3 dimensions. In this case, maximizing z doesn’t do all that well. The variable z is the only
choice to exit the basis. The only choice for one to enter it is f2, but (of course) the move is only virtual. It
results in the matrix
1 0 00 1 0
−1 −1 01 1 1
.
But now all coefficients of z are negative,, telling us that Ω = ∅.
7. Affine coordinate transformations
The essence of the simplex method is to move from one vertex of a polytope to another. Actually, this isn’t
quite what it does, as we shall see, but it is what it would like to do. What it actually does is to move from
one vertex frame to another, and it might do this without changing the vertex itself. That is to say, it involvesa change of coordinates by an affine transformation. In this section I’ll say something about this.
An affine coordinate change from (xi) to (yi) is one of the form
[x1
]=
[A a0 1
] [y1
].
Write affine functions as row vectors, so
f(x) = [∇f c ]
[x1
],
Polytopes and the simplex method 19
in which ∇f is the linear part of f , and the vector representing it is made up of the coordinates of f in thiscoordinate system. Then
[∇f c ]
[x1
]= [∇f c ]
[A a0 1
] [y1
]
so that its expression in y coordinates is
[∇f c ]
[A a0 1
].
In our case, we are going to be making very special coordinate changes by a process known as pivoting . Weare given a coordinate system (xi) and a set of linear functions fj , and we are going to swap some fℓ for a
variable xk. At the same time, we want to express all of the fj , which are given to us in a matrix, in terms of
the new variables. This reduces to the following problem: we are given
fℓ =
n∑
1
aℓ,jxj + aℓ,n+1 ,
in which aℓ,k 6= 0 (so that the coordinate transformation is legitimate). We are also given the
fi =∑
1,n
ai,jxj + ai,n+1 (j 6= ℓ) ,
each of which we wish to convert to an expression with fℓ replacing xk . We get
xk = (1/aℓ,k)(fℓ −
∑
j 6=k
aℓ,jxj − aℓ,n+1
)
fi =∑
j 6=k
ai,jxj + (ai,k/aℓ,k)(fℓ −
∑
j 6=k
aℓ,jxj − aℓ,n+1
)+ ai,n+1
=∑
j 6=k
(ai,j − (aℓ,j/aℓ,k)ai,k
)xj + (ai,k/aℓ,k)fℓ +
(ai,n+1 − (aℓ,n+1/aℓ,k)ai,k
).
In effect, we are
(a) dividing the kth column of F by aℓ,k,(b) for each j 6= k (including j = n), replacing its jth column cj of F by cj − pck, with p = aℓ,j/aℓ,k.
This is a simple matrix operation. We are given two rows in a matrix (ai,j), say ℓ and m. Row m has a singlenonzero entry in it, which is 1. Say it occurs in column k. In the other row, aℓ,k 6= 0. We apply elementary
column operations to make the new row ℓ look like the old row m. Since column operations are equivalent
to multiplication by certain (n + 1) × (n + 1) matrices on the right, this is consistent with earlier remarkson coordinate change. Explicitly, the matrix by which we finally multiply is that obtained by performing the
same column operations on the identity matrix In+1. These effects accumulate.
This operation is called pivoting on the pair (m, ℓ).
Polytopes and the simplex method 20
8. Pivoting
The basic step in using the simplex method is moving from one vertex basis to another—and, hopefully, from
one vertex to another—along a single edge of the polygon. (Or rather, to be precise, along what you mightcall a virtual edge.) Geometrically we are moving from one vertex to a neighbouring one, except that in the
case of a singular vertex where more than the generic number of faces meet the move might travel along an
edge of length 0. In all cases, this basic step is called a pivot. It is executed by pivoting the matrix F .
The explicit problem encountered in pivoting is this: we are given a vertex basis and a variable xc in the
basis. The xcaxis represents an edge (or possibly a virtual edge, as I’ll explain in a moment) of the polygon.We want to locate the vertex of the polygon next along that edge, and replace the old vertex frame by one
adapted to that new vertex. The variable xc is part of the original basis. The new basis will be obtained by
replacing that variable by one of the affine functions fr in the system of inequalities defining the polygon.In applications, the exiting variable xc is chosen according to various criteria which depend on the goal in
mind, as we shall see in the next section. Sometimes there are several choices for fr as well.
There are conditions on the possible fr . For one thing, the new vertex must be also actually a vertex of the
region Ω concerned. This puts a bound on how far out along the xcaxis we can go. Suppose
f(x) =∑
aixi + an+1 .
We know already that an+1 ≥ 0, because our vertex does lie in Ω. On the xcaxis
f(x) = acxc + an+1 .
If ac ≥ 0, there is no limit to xc. But if ac < 0, we must have
acxc + an+1 ≥ 0, xc ≤an+1
−ac
.
This has to hold for all the functions fr , so the new value of xc is the minimum value of −ar,n+1/ar,c forthose r for which ar,c < 0. Given this choice of variable xc, and without further conditions imposed, the
possible choices for the new basis variable will be any of the fr producing this minimum value of xc.
In the example we have already looked at, we could start with coordinates x+2y = 1, x+y = 1 at the vertex
(1, 0) and move along the edge x + y = 1, with xc = x + 2y − 1. We can only go as far as the vertex (0, 1),because beyond there both of the affine functions x − y + 1 and 2x + y − 1 become negative. We can choose
either one of them to replace x + 2y − 1 in the vertex coordinates.
When a pivot is made, the current coordinate frame is the given vertex frame, and in the course of the pivot
it is changed to the new vertex frame. This involves expressing all the fi in terms of the new basis.
Polytopes and the simplex method 21
How does it work in general? Given a vertex frame, any of the fi may be expressed as
fi = ai +∑
j
ai,jxj
with the ai ≥ 0 since the origin is assumed, by definition of a vertex frame, to lie in the region fi ≥ 0. We
propose to move out along the directed edge where all basic variables but one of them, say xc, are equal to
0, and in the direction where xc takes nonnegative values. This amounts to moving along an edge of thepolytope. We shall move along this edge until we come to the vertex at its other end, if there is one, or take
into account the fact that this edge extends to infinity. So we must find the maximum value that xc can take
along the line it is moving on. Along this edge the other xj vanish, and we have each
fi = ai + ai,cxc
There are two cases to deal with:
(a) If ai,c ≥ 0 there is no restriction on how large xc can be;
(b) if ai,c < 0 then we can only go as far as xc = −ai/ai,c.
Therefore if all the ai,c are nonnegative then the edge goes off to infinity, but otherwise the value of xc isbounded by all of the−ar/ar,c. In that case, we choose r such that ar,c < 0 and−ar/ar,c takes that minimum
value among those where ai,c < 0, and set xc equal to it.
fi = 0
xc = 0
xc
fr = 0
We also change bases—replacing xc by fr. The minimum value might in fact be 0, in which case we change
vertex frames (i.e. virtual vertices) without changing the geometric vertex. This sort of virtual move willhappen only if the vertex is singular, which means as I have said that more than the generic number of faces
meet in that vertex, and the maximal value of xc will be 0.
Thus our choice of r is made so that
ai − (ar/ar,c)ai,c ≥ 0
for all i, since if ai,c ≥ 0 there is no condition (ai, ai,c and −ar/ar,c are all nonnegative) and if ai,c < 0 thenby choice of r
−ai
ai,c
≥ −ar
ar,c
.
When we change frame, we have to change all occurrences of xc (which becomes just a function fc in our
system) by substitution. In short, we pivot.
Sometimes, as in the example, there will be several possible choices of fr , if the target vertex isn’t simplicial.
Indeed, in practice when pivoting is done in an application there may be several possible edges to move
out along—i.e. which variable xc to exit from the basis. The most important thing in all cases is to preventlooping. There are two commonly used methods to do this. One is a method of choosing both xc and fr,
while the other is fussy about only fr . I’ll explain these in the next section.
Polytopes and the simplex method 22
9. Tracking coordinate changes
As wemove among vertices, changing from one vertex basis to another, we’ll want to keep track of wherewe
are in the original coordinate system. We can recover the location of a vertex from the basis, since it amountsto solving a system of linear equations. But this is a lot of unnecessary work. We can instead maintain some
data as we go along that make this a much simpler task.What data do we have to maintain in order to locateeasily the current vertex? A slightly better question: What data do we have to maintain in order to changeback and forth between the current basis and the original one?
Keeping track of location comes in two halves: changing back and forth between the original coordinatesystem and the one associated to the homogenized problem, in which we located the invariance subspace
and wound up with the cone; changing back and forth between that frame and the current vertex frame. The
first is done by a linear coordinate change, and after that, by affine changes, substituting fr for xc. Thus isdone by exactly the same column operations that are performed on F , and in programs R is often tacked
onto the bottom of F to make this convenient. Its rows are not listed in either of the two lists accompanyingthe data structure, of course.
Polytopes and the simplex method 23
10. An example in some detail
In this section, I’ll summarize the process I have outlined, continuing with an earlier example.
Step 1. Start with affine inequalities fi(x) ≥ 0 defining the region Ω in Rn.
In the example at hand, these are
2x + y ≥ 1
x + 2y ≥ 1
x + y ≥ 1
x − y ≥ −1
3x + 2y ≤ 6 .
In pictures:
x−
y=−1
−3x
−2y
=−6
x + 2y = 1
x+
y=
1
2x+
y=
1 (4/5, 9/5)(0, 1)
(1, 0)(5/2,−3/4)
Step 2. We interpret these as homogeneous inequalities tomakeupamatrixF whose rows are linear functions
in dimension n + 1. We do this by interpreting the constants in the affine functions as coefficients of the newvariable xn+1. We add the inequality xn+1 ≥ 0 to get Fh. This corresponds to a system of linear inequalities
defining the cone CΩ over (Ω, 1).
Here, this gives the matrix
Fh =
2 1 −11 2 −11 1 −11 −1 1
−3 −2 60 0 1
,
corresponding to inequalities
2x + y − z ≥ 0
x + 2y − z ≥ 0
x + y − z ≥ 0
x − y + z ≥ 0
−3x − 2y + 6z ≥ 0
z ≥ 0 .
Step 3. We now apply column reduction to get a matrixG as the column reduction of Fh and a matrixR suchthat Fh ·R = G
Polytopes and the simplex method 24
Here
G =
1 0 00 1 00 0 12 0 −33 4 −131 1 −3
, R =
1 0 −10 1 −11 1 −3
.
In general, the columns ofR will be partitioned into ranges [1, d+1] and [d+2, n+1]. The first fewmake up
a (m+1)× (d+1)matrix whose columns amount to a basis of a section of the projection onto Rn+1/Inv(Ω),
the remaining ones give a basis of the null space of F . In this example d = n.
Step 4. It is convenient to perform one more sequence of column operations, pivoting the function z into thebasis of coordinates.
In this case, getting
G =
1 −1 30 1 00 0 12 −2 33 1 −41 0 0
, R =
1 −1 20 1 −11 0 0
.
This R should be stored as R0 before proceeding, and start afresh.
Step 5. Wemake this into a vertex frame in dimension n + 1 (representing affine inequalities) with an added
inequality z ≤ 1:
G =
1 −1 3 00 1 0 00 0 1 02 −2 3 03 1 −4 01 0 0 0
−1 0 0 1
.
Step 6. We now follow the basic simplex algorithm that starts out with a vertex frame to maximize z. Eitherthis demonstrates that the Ω is empty or it produces a vertex frame for Ω. in this example we pivot just
once (exchanging x with −z + 1) to maximize z. This gives us the vertex frame on the extended region in
dimension n + 1. Here
G =
−1 −1 3 10 1 0 00 0 1 0
−2 −2 3 2−3 1 −4 3−1 0 0 1
1 0 0 0
.
Step 7. This is a point on the slice z = 1, and corresponds to a vertex of the original region.
In the example, it is in fact the intersection of the two lines x + 2y − 1 ≥ 0, x + y − 1 ≥ 0, the point (1, 0). We
now want to drop all the extra apparatus we needed to get to this point, so as to be dealing only with Ω. To
do this, we drop rows and columns referring to z. Most obviously, we remove the last two rows, referring to
Polytopes and the simplex method 25
z and −z + 1. At the end, the variable−z + 1 will be part of the basis. It will be correspond to some columnm. We remove columns m from both G and R.
G =
−1 3 11 0 00 1 0
−2 3 21 −4 3
.
What is the meaning of these matrices? We are looking at the vertex (1, 0) of Ω, the intersection of the lines
f2 = f3 = 0. The other inequalities are expressed in terms of these:
−f2 + 3f3 + 1 ≥ 0
f2 ≥ 0
f3 ≥ 0
−2f2 + 3f3 + 2 ≥ 0
f2 − 4f3 + 3 ≥ 0 .
Step 8. While the computation of G has been going on, simultaneously a matrix R has also been calculatedby applying the same column operations to it. Each stage produces a matrix R, and the final one is obtained
by multiplying them all together. The final matrix R tells us how to get from the current coordinate system
to the original one.
In this example
R =
−1 2 1
1 −1 00 0 1
.
For example, the point (0, 0) in the current system is the point (1, 0) in the original, as I have already
mentioned. We can therefore check how all this has gone by calculating
2 1 −11 2 −11 1 −11 −1 1
−3 −2 6
−1 2 1
1 −1 00 0 1
=
−1 3 11 0 00 1 0
−2 3 21 −4 3
.
We can also verify that this is a vertex frame by observing that all constants (i.e. last column) in the final
matrix are nonnegative.
At any rate, given this vertex frame one is ready to start serious exploration.
Part III. Applications
Polytopes and the simplex method 26
11. Finding maximum values
The original problem that motivated the introduction of the simplex method in linear programming was
to maximize an affine function z = c +∑
cixi subject to a set of conditions fi = ai +∑
ai,jxj ≥ 0 thatdetermine a convex region Ω. In doing this, one proceeds from one vertex of Ω to another, always increasing
the value of z as one goes. This is kept up until it is seen either that z is unbounded on Ω, or that no choice
of maximizing edge is possible and the maximal value of z has been found.
Well, not quite. What one really does is proceed, not from one vertex to another, but from one vertex frame to
another. This takes place through a pivot operation. As one changes vertex frames one also keeps track of theexpression for z in the current coordinate system. If the edge has length 0, the vertex itself does not change
although the frame does, and the value of z does not in fact change. If the move is virtual—does not change
vertices—then something has to be done to avoid going around in circles.
The problem is that in the simplex method one is always proceeding with only local information. There is no
way to peek ahead to see where it is going. So one must find a way to pivot, based only on local information,that somehow avoids cycling. There have been a number of methods devised to do this, of which I shall
discuss two.
BLAND’S RULE. At each step, one affine function fi in the current basis is replaced by another. One valid
rule to follow is to choose the exiting and entering functions to have the least possible subscript. This rule
was introduced in [Bland:1977]. It is covered in many places in the literature, but most of them are simplevariations of one another. I’ll follow the explanation in [Camarena:2016], which is a variant of the treatment on
pages 37–38 of Chvatal’s book.A slightly different treatment can be found in §5.8 of [Matousek–Gartner:2007].
11.1. Proposition. Following Bland’s rule, the simplex method will terminate.
Proof. If the process does not terminate, it must cycle; and if it cycles, it does so at a fixed vertex. We maytherefore forget about those functions that do not vanish at that vertex. For convenience, wemay also assume
that the function z to be maximized is 0 at the given vertex.
Step 1. Let S be the set of all indices of the given affine functions. At any stage, let B be the indices of the
functions fi in the current basis, O its complement in S. Then write each affine function f as
f =∑
Scf,jfj =
∑B
cf,jfj
in which the first sum is over all of S, but cf,j = 0 unless j is in B. In particular we write
fi =∑
Sfi,jfj =
∑B
fi,jfj .
With these assumptions, here are the rules for pivoting. Suppose at a given stage
z =∑
Bζifi
fj =∑
Bfj,ifi
Then: (1) we choose i minimal with ζi > 0; (2) we choose j minimal such that fj,i < 0; (3) we replace fi inthe basis by fj .
Step 2. Now suppose that a cycle does occur while following Bland’s rule. Call s in S volatile if in the cyclefs is at one stage in the basis and at another stage is not.
11.2. Lemma. In order for s to be volatile, it suffices that at some stage it either leaves the basis or enters it.
Proof. Because it must reverse the move later in the cycle in order to restore the state.
Polytopes and the simplex method 27
Step 3. Let a be greatest among the volatile indices. Suppose it enters the basis at some stage p (i.e. in goingfrom stage p to the next). Let b be such that fb exits the basis in that step. Then b is also volatile, so that by
assumption b < a. Write at stage p
z =∑
S
ζifi .
Thus b is least with ζb > 0, and a is least with fa,b < 0. Since a is also maximal among volatile indices,
fc,b ≥ 0 for all other c.
Step 4. Set Pt to be the point whose coordinates at stage p are
fi =
t if i = b0 otherwise.
It is on the fb axis. Then
fj(Pt) = fj,bt
for any j in S and
z(Pt) = ζbt .
Now suppose at some later stage q that fa leaves the basis. Suppose that at this stage
z =∑
Sζ∗i fi .
Then
z(Pt) = ζ∗b t +∑
Op
ζ∗i fi,bt
so
(11.3) ζb = ζ∗b +∑
Op
ζ∗i fi,b .
This equation is the essential link between two stages.
Step 5. We now have ζ∗a > 0 because a exits the basis at stage q; ζ∗s ≤ 0 for all other s in S. Why? If s does
not lie in Bq then c∗b vanishes by definition, while if b does lie in Bq then according to Bland’s rule ζ∗s > 0 forall s 6= a in Bq.
Step 6. Since ζb > 0 but ζ∗b ≤ 0, (11.3) implies that ζ∗c fc,b > 0 for some c. Let’s look at fc more closely.
It is not in the basis Bp because the sum in (11.3) ranges only over Op. It is in Bq because ζ∗c 6= 0. Thereforec is volatile.
This implies that c ≤ a. But c 6= a. Why? Because ζ∗c fc,b > 0 while ζ∗afa,b < 0. This implies in turn that
fc,b > 0 because among volatile indices only fa,b < 0. Since ζ∗c fc,b > 0, we deduce that ζ∗c > 0. But this is acontradiction, since according to Bland’s rule only ζ∗a > 0.
PERTURBATION. Bland’s rule is very simple to implement. Not so simple is the perturbation method, which
has as variant the lexicographic method. This also is explained in Chvatal’s book. The variant is the practicalimplementation of the first, which is simpler to understand. In this, as soon as we have found an affine basis,
we add to each of the ‘slack’ variables fi a symbolic constant εi with these subject to conditions
1 ≫ ε1 ≫ ε2 ≫ · · · ≫ εℓ > 0 ,
in which ℓ is the number of nonbasic inequalities. Then whenever we pivot, the constant terms in all thevariables and the objective function will be linear combinations of numbers and these εi. In effect, we are
setting the constants to be some kind of εnumbers c0 +∑
ciεi, which we can add, subtract, and multiply
by scalars. Such numbers are compared in the natural way, which amounts to lexicographic ordering of thearrays. Furthermore, with these adjustments th system becomes simplicial—none of the functions fi (i.e.
Polytopes and the simplex method 28
no variable other than the basis variables) will ever evaluate to 0 at the vertex, because the different linearcombinations of the εi will always be linearly independent (pivoting preserves this property, and the matrix
starts with I). Thus, no pivots will ever be virtual.
At the end, set all εi = 0. This method has the property that we are allowed to go out along any coordinateedge, and need only make a choice among the functions that will become part of the basis. This happens
only when there are ties. This feature plays a role in the process discovered by Avis and Fukuda that I shallmention later on.
There is an equivalent description common in the literature, which also uses lexicographic ordering of arrays.The best account I know is in [Apte:2012], although more cryptic accounts are easy enough to find. It does
not use εnumbers, but expands the matrices of dictionaries instead. I prefer the version given here, which
seems to be more intuitive even if slightly less efficient.
The advantage of either one of these lexicographic strategies is that it is only the choice of new variable to set
as coordinate that uses it. This makes it exactly suitable for the lrsmethod of Avis and Fukuda for traversingall vertices of a conves polytope from top down.
12. Finding all vertices of a polygon
The traditional method from this point on is to use a stack to index and locate vertex bases, collecting them
into equivalence classes, each corresponding to a vertex. Note that each vertex will correspond to a subsetof the original set of functions, so our search is finite. Indeed, a good way to specify a vertex, at least when
using exact arithmetic, is to tag a vertex by the set of functions vanishing there. On the other hand, it mayhappen that we have a singular vertex where several different sets of basic variables give rise to the same
geometrical vertex. If we are doing floating point arithmetic, because of possible floating point roundoff we
should ignore this possibility, and proceed as if it doesn’t occur. It shouldn’t cause trouble—a single pointmight decompose into several, but they will all be close together and effectively indistinguishable.
Of course the regionmay be unbounded, in which case we should find the infinite edges of the polytope, too.
Thismethod locates all edges and vertices by the simplexmethod, indexing the vertices by the set of vanishing
variables, edges by the coordinate of the edge. The trouble is that obvious methods pass from one vertexframe to another, which means an apparent huge waste of time. Anyway, vertex frames are pushed onto the
stack as they are met. They are distinguished by the subset of basic functions.
[AvisFukuda:1992] and [Avis:2000] describe a better method, called in it current version lexicographicreverse search . I’ll explain this in detail in a later version of this essay.
13. Nearest point
There is another problem I’d like to also take into consideration—that of finding for a point of space thenearest point of the polytope. The first step is to scan all the inequalities, to see which the given point satisfies.
If it satisfies all of them, then the point lies in the region, and we are through. Otherwise, throw away those
satisfied by the point, and calculating perpendicular projections onto the remaining facial supports. Finally,throw away the projections that do not lie strictly inside the corresponding face. This is not very efficient, but
is there a better way?
Polytopes and the simplex method 29
14. Finding the dimension of a convex region
The dimension of a polytope makes sense in computation only when exact arithmetic is used, because the
smallest of perturbations can change a set of low dimension to an open one.
Start with trying to find a vertex frame. If there is none, the dimension is−1. Next, throw away all but the fi
vanishing at the origin, so we are looking at the tangent cone of the vertex. Take as function to maximize thelast basis variable; then find its maximum. If this is 0, we reduce the problem by eliminating that variable.
Or we come across an infinite edge (axis of some xc) out. In that case, throw away all those containing the
variable xc, in effect looking now at the tangent cone of that edge. Recursion: the dimension is one more thanthe dimension of that tangent cone. The whole process amounts to a succession of one or the other of these
two strategies, applied recursing on rank. The case of rank one is straightforward—if the rank is 1, then in
case (1) the dimension is 0 while in case (2) it is 1.
15. Implementation
I follow [Chvatal:1983] closely. Classically one uses tableaux , but Chvatal replaces them by what he calls
dictionaries . All the operations I have described amount to constructing an initial dictionary, and thenproceeding through a sequence of them according to what one is looking for.
So in my programs, an LP dictionary is a Python class with data an m × (n + 1) matrix whose rowsrepresent affine inequalities fi ≥ 0. In initialization, we pass an arbitrary matrix of this type, and from it we
immediately turn it to an affine basis togetherwithR, ρ. After the initializationwehave lists of basis variablesand dependent variables . Through all subsequent operations we maintain these structures. In effect, eachinequality represents one of the possible basic variables (these are called slack variables in the literature).
The next step is to turn this affine basis into a vertex basis. After that, what happens on what the ultimategoal is. The basic routine is a pivot, whose argument is one of the basis variables. This variable is removed
from the basis variables, and replaced by a new one of the affine functions, which is chosen according to the
minimal subscript rule. In the pivot, the matrix of affine functions is expressed in terms of the new basis.
I have said that the conventional terminology is bizarre. The most bizarre usage is that the affine functions
among the inequalities that are not coordinates in a frame are said to make up the basis of that frame. Thus,when a function f(x) ceases to be one of the coordinate system it is said to ‘enter the basis’, and when another
is chosen as the new coordinate, it is said to ‘leave the basis’. This terminology is not aimed atmathematicians,but then most consumers of linear programming are not mathematicians, either.
Part IV. References
1. David Avis, ‘lrs: a revised implementation of the reverse search vertex enumeration algorithm’, pp.177–198 in Polytopes—Combinatorics and Computation , edited byGil Kalai and Gunter Ziegler, Birkhauser
Verlag, 2000.
2. ——andKomei Fukuda, ‘A pivoting algorithm for convex hulls and vertex enumeration of arrangements
and polyhedra’, Discrete and Computational Geometry 8 (1992), 295–313.
3. Robert G. Bland, ‘Newfinite pivoting rules for the simplex method’,Mathematics of Operations Research2 (1977), 103–107.
4. Omar Antolin Camarena, ‘Bland’s rule guarantees termination’, course notes for the spring of 2016,
available at
http://www.math.ubc.ca/~oantolin/math340/blands-rule.html
5. ConstantinCaratheodory, ‘UberdenVariabilitatsbereichderKoeffizientenvonPotenzreihen, die gegebene
Werte nicht annehmen’,Mathematische Annalen 64 (1907), 95–115.
6. Vasek Chvatal, Linear programming , W. H. Freeman, 1983.
Polytopes and the simplex method 30
7. Donald E. Knuth,lp.w, a CWEBprogram inwhichKnuth lays out an introduction to linear programming.Posted August, 2005 at
http://www-cs-faculty.stanford.edu/ uno/programs.html
8. Jiri Matousek and Bernd Gartner, Understanding and using linear programming , SpringerVerlag, 2007.
9. HermannMinkowski, ‘TheoriederkonvexenKorper, insbesondereBegrundung ihresOber flachen begriffs’.
This has been published only as item XXV in volume II of Gesammelte Abhandlungen , Leipzig, 1911.
10. ——, Geometrie der Zahlen , Teubner, Berin, 1910.
11. Stephen Smale, ‘On the average number of steps of the simplex method of linear programming’,Mathematical programming 27 (1983), 241–262.
12. Daniel Spielman and ShangHua Teng, ‘Smoothed analysis: why the simplex method usually takes
polynomial time’, Journal of the ACM 51 (2004), 385–463.
13. Ernst Steinitz, ‘Bedingt konvergenteReihenundkonvexe Systeme’, Journal fur dieReine andAngewandteMathematik 143 (1913), 128â^175.
14. Hermann Weyl, ‘Elementare Theorie der konvexen Polyeder’, Commentarii Mathematici Helvetici 7,1935, pp. 290–306.
There is a translation of this into English by H. Kuhn: ‘The elementary theory of convex polytopes’, pp. 3–18
in volume I of Contributions to the theory of games , Annals of Mathematics Studies 21, Princeton Press,
1950.
15. Gunter Ziegler, Lectures on polytopes , Springer, 2007.