polytopes and the simplex method - university of british ...cass/research/pdf/simplex.pdf ·...

Last revised 3:49 p.m. January 11, 2017

Polytopes and the simplex method

Bill Casselman

University of British [email protected]

A region in Euclidean space is called convex if the line segment between any two points in the region isalso in the region. For example, disks, cubes, lines, single points, and the empty set are convex but circles,

doughnuts, and coffee cups are not.

Convexity is the core of much mathematics. Whereas arbitrary subsets of Euclidean space can be extremely

complicated, convex regions are relatively simple. For one thing, any closed convex region is the intersectionof all affine halfspaces f ≥ 0 containing it. One can therefore try to approximate it by the intersection of

finite sets of halfspaces—in other words, by polytopes .

A polytope is by definition the intersection of a finite number of halfspaces in a Euclidean space. Thus cubes

are polytopes but spheres are not. The simplest example, and the model in some sense for all others, is the

coordinate octant xi ≥ 0. Another simple example is the simplex xi ≥ 0,∑

xi ≤ 1. This essay will tryto explain how to deal with polytopes in practical terms. The literature on this is extensive, and I’ll not be

innovative. Since linear programming is commercially applicable, little of the literature is aimed at quite thesame audience as this note is. (But compare §3.2 in [Ziegler:2007], titled ‘Linear programming for geometers’,

or [Knuth:2005])

The major consideration in the theory of convex regions is that there are two basic ways to describe one. If

Ω is any set of points in Rn, its convex hull is the set of all points which can be represented as a finite sum∑

cixi with ci ≥ 0,∑

ci = 1, and all xi in Ω. For example, the line segment between two points x and y isthe convex hull of those two points, and the convex hull of a circle is a disk. A convex hull is clearly convex.

On the one hand, a closed convex region is the intersection of a set of halfspaces. On the other, as we shallsee, it is the convex hull of its extremal points. The basic problem in dealing with polytopes is how to transfer

back and forth between these two characterizations.

An extremal point or of a closed convex region is one that does not lie on the interior of a line segment in the

region. In some sense dual to the result that any closed convex region is the intersection of all halfspaces

containing it, any bounded closed convex region is the convex hull of its extremal points.

For polytopes, an extremal point is a vertex . A bounded polytope is therefore the convex hull of its vertices.

For unbounded polytopes, this has to be modified. First of all, a polytope in Rn may be invariant under

a linear subspace U of translations. It then projects onto a convex region in the quotient of Rn by U that

is nondegenerate, in the sense that the subspace of translations leaving it invariant is trivial. The original

region is thne isomorphic to the product of U and the projection. A nondegenerate polytope is the convexhull of its vertices together with a finite number of extremal rays.

Polytopes and the simplex method 2

Defining a polytope by affine inequalities and defining it as a convex hull are dual modes of specification.

Thus we arrive immediately at the most basic computational problem concerned with convex polytopes—given a polytope specified in one of theseways, to find an explicit specification in the other. More specifically,

if a polytope is specified as the intersection of a finite number of halfspaces fi(x) ≥ 0, how can one find its

vertices and extremal rays? If a region is specified as the convex hull of a finite set of points, how can onefind its vertices among this finite set? How can one find a finite number of halfspaces whose intersection is

the polytope?

In the first part of this essay I’ll prove the very basic result that the convex hull of any finite set of points is

a polytope—i.e. it can also be specified in terms of a finite set of affine inequalities. In this, I shall follow

[Weyl:1935]. This simple result has many consequences.

There are several other questions onemight pose. A face of a polytope is any affine polytope on its boundary.

A facet is a proper face of maximal dimension.

• Is the region bounded?

• If it is bounded: What are its vertices?

What is its volume?

What is the number of lattice points it contains?• Given a finite set of points, which among them are the vertices of its convex hull?

• What is its dimension (taken to be−1 if the region is empty)?

• What is a minimal set of affine inequalities defining it? Or, in other words, what are its facets?• What is the maximum value of a linear function on the polytope (which might be infinite)?

• How to describe the face where a linear function takes its maximum value?• What is its facial structure? The answer to this can be extremely complex.

• What is the dual cone of linear functions bounded on the polytope?

• How to find the intersection of two convex sets?• How to find the convex hull of the union of two convex sets?

• Given two convex sets, is one contained in the other?

Answering these questions reduces to answering a few basic ones. For example, to find out whether one

region is contained in the other it suffices that you know the characterization of one in terms of affine

inequalities, but the other in terms of extremal points. Finding the intersection of two convex bodies is trivialif they are specified by affine inequalities, and finding the convex hull of their union is easiest if the regions

are given as convex hulls.

These problems are well known to arise in applications of mathematics. Among other things, convexity

is a crucial feature in the way we perceive things in 3D, and hence problems related to convexity arise in

computer graphics. But in fact they also arise in trying to understand the geometry of configurations in highdimensions, which comes up surprisingly often even in the most theoretical mathematical investigations.

For one thing, acquiring a picture of an object in high dimensions, or navigating regions of high dimensionalspace, often amounts to understanding the structure of convex regions.

Answering these questions can be daunting, since the complexity of problems increases drastically as di

mension increases. There are now many algorithms known to answer such questions, but for most practical


purposes the one of choice is still the simplex method . This is known to require time exponential in thenumber of inequalities for certain somewhat artificial problems. But on the average, as is shown in a series

of papers beginning apparently with [Smale:1983] and including most spectacularly [SpielmanTeng:2004],

it works well. It is also more flexible than competitors.

Contents

I. Geometry1. Convex bodies

2. Proof of the basic theoremII. Basic procedures

3. Introduction

4. Column reduction5. Vertex frames

6. Finding a vertex basis to start

7. Affine coordinate transformations8. Pivoting

9. Tracking coordinate changes10. An example in some detail

III. Applications

11. Finding maximum values12. Finding all vertices of a polygon

13. Nearest point

14. Finding the dimension of a convex region15. Implementation

IV. References

As far as I know, convex bodies were first systematically examined by Minkowski, who used them in

spectacular ways in [Minkowski:1910]. [Weyl:1935] is a pleasant account of the basic facts, and I follow his

exposition in the first part.

Mymain reference for the simplexmethodhas been thewell known text [Chvatal:1983], althoughmynotation

is more geometric than his. The audience for expositions of linear programming is not, by and large, madeup of mathematicians, and the standard notation in linear programming is, for most mathematicians, rather

bizarre. I follow more geometric conventions.

Eventually, this essaywill be accompanied a small suite of Pythonprograms. Because I am almost exclusively

interested in theoretical questions, I’ll implementmy routineswith exact rational arithmetic, which eliminates

technical difficulties in dealing with floating point numbers. This is the concern of much of the literature,that does not affect us.

Part I. Geometry

1. Convex bodies

It is a little troublesome to say much about arbitrary convex regions. For example, the union of an open disk

with any subset of its circular boundary is convex. The following is elementary, however.

1.1. Proposition. The closure of a convex region is convex.

I begin now with some observations on arbitrary closed convex regions. Throughout, let V be a real vectorspace of dimension n.

The following result is the basic fact about convex regions:

1.2. Theorem. Duality) Any closed convex set in V is the intersection of all halfspaces containing it.


For example, the unit disk in the plane is the intersection of all halfplanes x cos θ + y sin θ ≤ 1 as θ rangesover [0, 2π).

This result remains true (as a consequence of the HahnBanach Theorem) even for locally convex topological

vector spaces of infinite dimension, although the proof I give will remain valid only for Hilbert spaces.

Proof. Assign V a Euclidean metric. Let Ω be a closed convex set, P an arbitrary point not in Ω. It is to be

proved that there exists a hyperplane f(x) = 0 such that f(P ) < 0, f(x) ≥ 0 for all x in Ω.

Since Ω is closed, there exists a point Q in Ω at minimal distance from P . I may as well take Q to be the

origin of the coordinate system. I claim that the hyperplane passing through Q and perpendicular to the line

through P and Q will do. The equation of the corresponding open half space that contains P is

f(x) = −x •P < 0 .

It is to be shown if R lies in Ω then R •P ≤ 0. If R lies in Ω so does tR for all small t. We now have

‖P − tR‖2 = ‖P‖2 − t (R •P ) + t2 (R •R) ,

so that if R •P > 0 the point tR is closer than the origin to P for small t, a contradiction.

If Ω is a closed convex subset of V , its dual Ω∨ is the set of all affine functions f on V such that f(Ω) ≥ 0. Iff lies in Ω∨ and c ≥ 0 then so is cf . Hence Ω∨ is a closed cone in the space of affine functions, which has

dimension n + 1. It is also convex.

Let V be the vector space of linear functions on V . The affine functions on V may be identified with V ⊕ R.

If A is a closed cone in the vector space of affine functions, I’ll define its point dual A∧ to be the subset of allx in V such that f(x) ≥ 0 for all f in A. It is also closed, and it is convex since

f((1 − t)x + ty) = (1 − t)f(x) + tf(y)

for any affine function f . Any x in Ω is of course a point in the point dual of Ω∨.

The following is just a rewording of Theorem 1.2.

1.3. Corollary. If Ω is a closed convex subset of V , then the embedding of Ω into the point dual of Ω∨ is abijection.

There is a special case of the theorem worth mentioning. If Ω is a cone, its dual cone Ω is the set of all linear

functions f with f(Ω) ≥ 0. It is a subset of the linear dual V . For example, if (ei) is a basis of V , (ei) the dual

basis, and Ω the cone generated by the ei for 1 ≤ i ≤ d, then Ω is the set of all sums

d∑

i=1

ciei +

n∑

d+1

ciei (ci ≥ 0 for i ≤ d, arbitrary for i > d) .

1.4. Lemma. If Ω is a cone, then Ω∨ is the set of all f + c with f in Ω and c ≥ 0.


Proof. One way is trivial. For the other, suppose f + c to lie in Ω∨. This means that f(x) + c ≥ 0 for all x inΩ. Since 0 lies in Ω, c ≥ 0. But now say x lies in Ω. Then so does λx for all λ > 0, so that f(x) + (c/λ) ≥ 0

for all λ > 0. Therefore f(x) ≥ −ε for all ε > 0, so that f(x) ≥ 0 and f lies in Ω.

1.5. Proposition. If Ω is a closed convex cone in V then the canonical map from Ω toΩ is a bijection.

1.6. Corollary. The map taking Ω to Ω is a bijection between closed convex cones in V and those in V .

It may happen that a convex region Ω is invariant under translation by vectors in some linear subspace.Define Inv(Ω) to be the linear space of all such vectors, the invariance subspace associated to Ω. If Ω is a

closed cone then Inv(Ω) ⊆ Ω since 0 lies in Ω.

If Ω is any convex subset of V , its affine support is the smallest affine subspace containing it. I’ll thedimension of that subspace the dimension of Ω. The dimension will be m if and only if Ω contains a simplex

of dimension m. This will be n if and only if its interior is nonempty.

There is a strong relationship between the dimension of a convex set and the invariance subspace of its dual.

I’ll look just at what happens for cones, which is fairly simple. If Ω is a closed convex cone, its affine support

is a linear subspace, which I’ll also call its linear support..

1.7. Proposition. Suppose Ω to be a closed convex cone. If U is its linear support, then Inv(Ω) is U⊥.

Proof. This depends on a series of elementary claims. () It is trivial to see that U⊥ is contained in Inv(Ω).() The cone contains m linearly dependent vectors ui (1 ≤ i ≤ d) spanning the linear support U of Ω. These

generate a cone Ω∗. () If we extend these to a basis of V and then let ui be the dual basis, the cone Ω∗ is

that spanned by the ui for i > d, and its invariance subspace is clearly U⊥. However, Ω∗ ⊆ Ω if and only if

Ω ⊆ Ω∗.

A cone Ω is called non-degenerate if Inv(Ω) = 0 and it has dimension n.

1.8. Corollary. The duality of closed convex cones in V with those in V induces a bijection of nondegenerate

cones in V with those in V .

1.9. Proposition. If Ω is a closed convex cone then 0 is a vertex if and only if Inv(Ω) = 0.

Proof. If 0 = x + y with x, y in Ω then x = −y and the line through them is contained in Ω as well as Inv(Ω).

Not every cone in the space of affine functions is equal to some Ω∨. This space is not just a vector space,

but one with a particular coordinate, taking f to its constant term, and the cone Ω∨ is a special cone, since ifc ≥ 0 then Ω∨ + c ⊆ Ω∨. I’ll call such a cone upwardly stable . I’ll call a cone in V ⊕ R positive if its support

lies in the half space xn+1 ≥ 0. A closed convex cone is positive if and only if it is equal to the cone over its

slice at level 1.

The following is elementary.

1.10. Lemma. The cone Ω in V ⊕ R is upwardly stable if and only if Ω is positive.

1.11. Proposition. A closed convex cone C in V ⊕ R is equal to Ω∨ for some Ω ⊆ Rn if and and only if it is

upwardly stable. In these circumstances, Ω is the set of all x in V such that (x, 1) lies in C.

If Ω is a closed convex region with U = Inv(Ω and W its affine support, its image Ω in W/U will benondegenerate and Ω will be isomorphic to U ×Ω. The analysis of convex regions thus reduces easily to that

of nondegenerate ones.

CARATHEODORY’S THEOREM. The following first appeared in [Steinitz:1913] (p. 155), after an earlier specialcase had been dealt with in [Caratheodory:1907]. Both these works were primarily interested in questions

about conditional convergence!

1.12. Proposition. Caratheodory’s Theorem) Suppose X to be a nonempty subset of Rn. Then:


(a) if y 6= 0 is a nonnegative linear combination of points in X , it may be expressed as a positive linearcombination of linearly independent points of X ;

(b) any point in the convex hull of X lies in the convex hull of some subset of points of X of size ≤ n + 1.

In other words, in (a) every point in the convex cone generated by X is contained in a simplicial conegenerated by elements ofX , and in (b) every element in the convex hull ofX is contained in a simplex whose

vertices are in X .

Proof. The proof is an algorithm.

(•) Suppose v =∑

cixi with ci > 0, xi in X . Let N be the number of xi. If the xi are linearly independent,exit.

Otherwise, there exists an equation ∑aixi = 0 .

Negating if necessary, we may assume one ai > 0. Then for any b

v =∑

(ci − b ai)xi .

The line y = ci − xai in R2 starts off at (0, ci), and has slope −ai. At least one of them crosses the positive

xaxis; suppose the first one crosses at (b, 0). Then all ci − bai are nonnegative, and at least one is 0. Thisgives an expression for y involving a smaller number of xi. Loop to (•).

The procedure cannot go on forever, since y 6= 0 and a single element of X is a linearly independent set.

For (b), let Ω be the convex hull of X . The claim follows from (a) applied to the cone over (Ω, 1) in Rn+1.

1.13. Corollary. In a finitedimensional real vector space

(a) the convex hull Ω of a compact set X is compact;(b) the convex cone generated by a finite set is closed.

Proof. For (a), let Σ be the simplex of all (ci) in Rn+1 with

∑ci = 1. According to the proposition, the set Ω

is the image of Σ × Xn+1 with respect to the map

(c, x) 7−→∑

cixi .

For (b), the claim is certainly true when the points are linearly independent, and the cone is simplicial. ButCaratheodory’s Theorem reduces the claim to that case.

1.14. Lemma. Suppose f to be a linear function, Ω a bounded, closed, convex subset of Rn. There exists an

extremal point of Ω on which the restriction of f to Ω takes its maximum value.

Proof. By induction on n. The case n = 0 or 1 are trivial. Otherwise, suppose that c is the maximum value off on Ω. Since Ω is bounded it is also compact, and the interection ofΩ with the hyperplane H onwhich f = cis not empty. It is also closed and convex. Wemay choose a nontrivial linear function on H . By the induction

hypothesis it achieves its maximum value at an extremal point x of H ∩Ω. But I claim that an extremal pointof H ∩ Ω muts also be an extremal point of Ω. Why is this? Suppose x = (1 − t)y + tz with y, z in Ω, t > 0.But then f(x) = (1 − t)f(y) + tf(z). If either f(y) or f(z) is > c then the other must be < c, which is notpossible. So y and z both lie in H ∩ Ω. .

1.15. Proposition. Any bounded, closed, convex set subset Ω of Rn is the convex hull of its extremal points.

Proof. Let C be the convex hull of the extremal points. The extremal points are a compact set, so C is closed.

The Lemma implies that C∨ and Ω∨ are the same, so the Corollary to the Theorem implies that C = Ω.

POLYTOPES. In the rest of this essay I’ll be concerned with a limited class of convex sets, those which

are the intersections of a finite number of halfspaces. These are convex polytopes . I’ll be mainly interested


in algorithms to deal with these, but I’ll begin with a few basic theorems. In fact, in the theory of convexpolytopes there is one really basic theorem, from which many others follow without trouble.

Suppose X to be a finite set of V of dimension n. Suppose f an affine function and let H be the hyperplane

f = 0. The linear halfspace f ≥ 0 will be called extremal for X if X is contained in it, and the dimensionof X ∩ H is n − 1. In these circumstances the boundary of the extremal halfspace will contain a facet of the

convex hull of X . Of course there can be only a finite number of these extremal halfspaces. An extremalfunction is one defining an extremal halfspace. The meaning of ‘extremal’ should be made clearer by:

1.16. Lemma. Suppose f to be an extremal function for X . If f = (1 − c)f1 + cf2 with 0 < c < 1 and fi ≥ 0on X , then the fi are both multiples of f .

Proof. Suppose f = 0 on the set Y of n − 1 linearly independent points of X . If fi > 0 on Y then f > 0 on

Y , a contradiction. So fi = 0 on Y and must be a multiple of f .

The convex hull of the cone generated by X ⊆ V is the set of all finite sums∑

X cxx with all cx ≥ 0. Forexample, if X is a collection of n linearly independent points its convex hull is a simplicial cone, and everysubset of n − 1 points determines an extremal halfspace.

For closed convex cones that are polytopes, we have the following refinement of Theorem 1.2:

1.17. Theorem. If X is a finite set of points in V containing a set of n linearly independent points, itshomogeneous convex hull is the intersection of all linear halfspaces extremal for X .

In particular, it is a polytope. The assumption on X is necessary, but more generally this amy be applied to

X embedded in its linear support.

I’ll postpone the proof to the next section, but I’ll now exhibit here a number of consequences.

1.18. Proposition. Suppose F to be a finite set of linear functions on V , and

Ω =⋂

f∈F

x

∣∣ f(x) ≥ 0

.

Then any f in V with f(Ω) ≥ 0 is in the convex cone generated by F .

Proof. The convex cone generated by F is contained in Ω, and the duals of both are equal to Ω. Hence they

are the same.

The following is a converse to the Theorem. A ray R in a cone is called extremal if whenever x in R is equal

to a sum (1 − t)x + ty with x, y in Ω and 0 < t < 1 we must have both x and y in R.

1.19. Corollary. Suppose Ω to be a closed, convex, nondegenerate cone defined by a finite set of linearinequalities fi(x) ≥ 0. There exists a finite set X of extremal rays such that Ω is the convex hull of X .

Proof. Since Ω is nondegenerate, so is Ω. Proposition 1.18 tells us that Ω is the convex hull of F in V . The

Theorem tells us that Ω is the intersection of a finite number of extremal halfspaces 〈x, u〉 ≥ 0 for x in V .

These x will generate the extremal rays.

If Ω is nondegenerate, so is Ω. Since Ω contains n linearly independent points, Ω is contained in a simplicial

cone. There exists a compact slice through the simplicial cone, which is also a compact slice through Ω. Theextremal rays of Ω will intersect the slice in its vertices.

1.20. Corollary. Any polytope X in V is the convex hull of a finite number of points and rays.

Proof. It suffices to assume Inv(X) = 0 and X of dimension n. I begin with a construction that has already

implicitly appeared. The space V may be embedded into Rn+1 as an affine subspace: x 7→ (x, 1). Rays in

V ⊕ R that intersect xn+1 = 1 correspond to points in V , while rays in V ⊕ R in the hyperplane xn+1 = 0correspond to rays in V . If X is a closed convex subset of V , let X be the unique closed cone generated by

the image of X . It is the homogenization of X .

This Corollary now follows from Corollary 1.19 applied to X .


I’ll say more about the ‘homogenization’ used in this argument later on. It will be a basic tool in my accountof algorithms.


2. Proof of the basic theorem

Suppose given a finite set X of points in Rn, and let Ω be the homogeneous convex hull of the set X . We

are assuming it to contain an open subset of Rn. It is to be shown that Ω is the intersection of all extremal

halfspaces of X .

One half of this is trivial, since Ω is certainly contained in the intersection of all extremal halfspaces. Itremains to be shown that if a point lies in the intersection of extremal halfspaces then it lies in Ω.

It may very well happen that there are no extremal halfspaces for X . In this case, the theorem asserts thatthe convex hull of the cone generated by X is all of R

n.

The proof proceeds by induction on n ≥ 1. If n = 1 then the points of X span a nontrivial segment. If this

segment contains 0 in its interior, then the homogeneous convex hull is all of Rn, and there are no linear

functions cx taking a nonzero value on all of X except 0 itself. Otherwise, the homogeneous convex hull is

a halfline with 0 on its boundary, and again the result is plain.

Now assume the Theorem to be true for dimension up to n − 1, and suppose X to have dimension n. Thereare two steps to the argument, which is constructive. The first either produces an extremal halfspace ordemonstrates that the homogeneous convex hull of X is all of R

n. The second proves the theoremwhen it is

known that extremal halfspaces exist.

Step 1. Suppose α = 0 to be any hyperplane containing n − 1 linearly independent points of X . Eitherall points of X lie on one side of this hyperplane, say the side α ≥ 0, or they do not. If they do, then

this hyperplane is extremal, and we can pass on to the second step. Suppose they do not. A new extremalhalfspace β ≥ 0 will be found that contains at least one more point of x, so we can set α = β and loop.

Let e be a point of X with 〈α, e〉 < 0. By changing coordinates if necessary, we may assume α to be thecoordinate function xn, and e = (0, 0, . . . ,−1). Let Y be the projection onto the plane xn = 0 of the points

X≥0 ofX with xn ≥ 0. The new system Y is nondegenerate of dimension n− 1, by assumption on α. Eitherit possesses an extremal halfspace 〈β, x〉 ≥ 0 with 〈β, x〉 = b1x1 + · · · + bn−1xn−1 or it does not. If it does,the halfspace 〈β, x〉 ≥ 0 contains the point e in addition to the points in X≥. The hyperplane β = 0 contains

n − 2 linearly independent points on the hyperplane α = 0 together with e, and also contains n− 1 linearly

independent points, so induction is allowable.

e

α = 0

In the second case, when Y does not possess an extremal halfspace, we can apply the induction assumption.

Every point in the hyperplane 〈α, x〉 = 0 must be in the convex hull of Y . Adding positive multiples of e, we

see that every point where xn ≤ 0 is in the convex hull. Since we have eliminated the case where all pointsof X lie on one side of α = 0, we can find a point with xn > 0; a similar argument implies that every point

where xn ≥ 0 is also there.

Step 2. At this point we know there exist extremal halfspaces containing Ω. Let ∆ 6= ∅ be a set of affine

functions α such that the inequalities α ≥ 0 define them.

Choose a point e in the interior of the homogeneous convex hull of X = xi. Thus 〈α, e〉 > 0 for every α in

∆. Let p be an arbitrary point in the intersection of extremal halfspaces, so that also 〈α, p〉 ≥ 0 for all α in ∆.


We want to show that p is in Ω, or that

p =∑

cixi

with all ci ≥ 0. We may assume that p 6= e.

The set of all

λe + µp

with λ, µ ≥ 0 is a twodimensional wedge. Any point of the same formwith λ < 0, µ ≥ 0 will be in the plane

spanned by this wedge, outside this wedge on the far side of the line through p. Since

〈α, p − λe〉 = 〈α, p〉 − λ〈α, e〉

and 〈α, e〉 < 0, the ray of all q = p − λe with λ > 0 will find itself eventually in the region where α < 0. Itwill cross the hyperplane α = 0 when

λ =〈α, p〉

〈α, e〉.

If λ is chosen to be the least value of 〈α, p〉/〈α, e〉 as α ranges over ∆, then

(1) p = q + λe where λ ≥ 0 and e is in the homogeneous convex hull of X ;

(2) 〈λ, q〉 ≥ 0 for all α in ∆;

(3) 〈α, q〉 = 0 for some α in ∆.

In order to prove that p is in the homogeneous convex hull of X , it suffices to show that q is in it.

We want to apply the induction hypothesis to q. Let Y be the set of all x in X on the hyperplane Hα where

α = 0, which contains q. Because α is an extremal support, there exist at least n − 1 linearly independent

points in Y , so that it is nondegenerate in n − 1 dimensions. Choose coordinates so that α = xn. By theinduction assumption, the convex hull of Y is the intersection of all its extremal halfspaces in Hα. That q lies

in this intersection will follow from:

2.1. Lemma. Any extremal hyperplane for Y = X ∩ Hα in Hα can be extended to an extremal hyperplanefor X in R

n.

Proof of the Lemma. Suppose

〈β, y〉 = b1x1 + · · · + bn−1xn−1 ≥ 0

be one of the extremal halfspaces for Y in Hα. The function β will vanish on a subset Υ ⊆ Y of essential

dimension n − 2. Any extension of β to X will be of the form

〈B, x〉 = b1x1 + · · · + bn−1xn−1 + cxn

for some constant c.

I claim that c can be chosen so that the B ≥ 0 on X and meets at least one point x in Y − X . This means no

more and no less than that it defines an extremal half space forX . The conditions on B are that

〈B, x〉 = b1x1 + · · · + bn−1xn−1 + cxn ≥ 0

for all x in X − Y , and that 〈B, x〉 = 0 for at least one of these x. These require that

−c ≤b1x1 + · · · + bn−1xn−1

xn

for all x in X − Y , with equality in at least one case. But X is finite and xn > 0 on X − Y , so the right hand

side will achieve a minimum for some c < 0 as x varies.

Part II. Basic procedures


3. Introduction

In the remainder of this essay I’ll explain one of the two techniques for implementing the basic theorems

about polytopes. Such technqiues seem to be roughly divided into two general categories. The incrementalmethods extend the idea behind the proof of the basic theorem that I gave in the last part. The other is the

simplex method, which is what I shall be concerned with. The problem I am most interested in is to find allvertices of a bounded polytope, given a finite number of affine inequalities that define it. But this will involve

a number of related tasks: Given an arbitrary set of inequalities, how do we determine the subspace leavingthem invariant? How do we find the nondegenerate quotient? All searches on the polytope begin from agiven vertex. Given just the inequalities, how to locate a starting vertex? This will involve in turn another

very basic question: Given a linear function, how to locate a vertex at which it achieves a maximum value?

At the very start we are given a finite set of inequalities such as

x ≥ 1

y ≥ −1

x + y ≤ 3 .

We rearrange them so as to be in a uniform format:

x − 1 ≥ 0

y + 1 ≥ 0

−x − y + 3 ≥ 0 .

I. e. from now on, I shall assume that we are given a finite set of affine functions fi of the form

fi(x) =n∑

j=1

ai,jxj + ai,n+1

defining the region Ω in which fi ≥ 0. We can even take an equation f(x) = 0 into account, replacing it byinequalities f(x) ≥ 0 and −f(x) ≥ 0. I should say immediately that I shall work only with functions whose

coefficients are rational. Otherwise, random adjustments of floating point numbers in computer programs

make the problem meaningless.

Suppose that we now have at hand m functions of n variables expressing the inequalities defining Ω. What

next? There are a few steps: (i) find U = Inv(Ω) and the corresponding inequalities defined on Rn/U ; (ii)

find an independent subset of the inequalities in terms of which all of them are expressed (an adapted affine

basis); (iii) either find a vertex of Ω or determine that Ω is empty.

There is one thing that will answer some of these questions immediately, and set things up to answer the

other—replace the region by its homogenization. This means (1) interpreting the constants in the given

inequalities as coefficients of a new variable, and (2) adding a final inequality to the set, defining a ‘bottom’to the homogenization. For example, the equations above now become

x − z ≥ 0

y + z ≥ 0

−x − y + 3z ≥ 0 ,

and we add to them

z ≥ 0 .

If f is one of an affine function, I’ll write f for its homogenization. The inequalities f(x) ≥ 0 describe a

homogeneous cone CΩ in Rn+1. The original region may be identified with the slice xn+1 = 1 of CΩ, and a

description of the cone will allow us to describe the original region.

What does homogenization accomplish? It will be a standard exercise in linear algebra to find a coordinate

system well adapted to the homogeneous system of inequalities. More precisely, we shall be able to find a setB ⊆ [1, m] with these properties:


• the f i for i in B are linearly independent;• all the other f i can be expressed in terms of these.

Let U be the linear subspace⋂

f i = 0 of Rn.

3.1. Lemma. The linear subspace U is the same as Inv(Ω).

Proof. Because xn+1 is among the fi.

Say |B| = d + 1. In the best situation, d = n. The functions in B are coordinates on the quotient of Rn+1

by U . The quotient cone therefore has dimension d + 1. In addition, the projection of CΩ will be a cone withthe origin as vertex. Similarly, the quotientΩ/Inv(Ω) will have dimension d and will possess vertices. This is

important, because all of the basic calculations to comewill work with vertices of the regionwe are interested

in.

Let’s look at the example again. First of all, we can picture the region:

y + 1 ≥ 0

x−

1≥

0

−x−

y+

3≥

0

(1,−1)

(1, 2)

(4,−1)

Homogenization gives us the matrix

1 0 −10 1 1

−1 −1 30 0 1

.

The standard column reduction algorithm (recalled in the next section) multiplies this on the right by

4/3 1/3 1/3

−1/3 2/3 −1/31/3 1/3 1/3

to get

1 0 00 1 00 0 1

1/3 1/3 1/3

.

This reduction is adequate, but it is convenient to have xn+1 as part of the basis for the cone. In general, the

function xn+1 will now be expressed as a linear combination of the fi in B. We can swap xn+1 with any ofthe fi such that the corresponding coefficient of xn+1 6= 0. In our case, we can swap with any of f1, f2, or f3.

What I do is pivot the first row out of the basis and the fourth into it to get

3 −1 −10 1 00 0 11 0 0

=

1 0 −10 1 1

−1 −1 30 0 1

4 −1 −1

−1 1 01 0 0

.


How is this to be interpreted? First of all, after column reduction the functions f1, f2, f3 make up the newcoordinates, The function z is expressed as (1/3)(f1 + f2 + f3). After the pivot, the basis is f2, f3, and z. We

also have the equations

x = 4z − f2 − f3

y = −z + f2

z = z .

\equationz − exp

Why is it convenient to have xn+1 in the final basis? Because we can then read off coordinates on the slice

very easily. In this example, on the plane z = 1 the origin becomes the vertex (4,−1) of the original region,as we can reda off immediately from . (Here, this is a vertex of Ω, but in general this trick is guaranteed only

to give us a kind of pseudovertex, which might not be a true vertex.)

In this example, since all the vertices of our region lie in the halfspace z ≥ 0, the addition of the inequality

z ≥ 0 has no effect. It will have an effect if and only if the convex region at hand is unbounded. In that case,

the vertices that we find on the hyperplane xn+1 = 0 will correspond to the directions of extremal rays. I’llsay very little about that.

So the homogenization simplifies the system by eliminating the invariance subspace. It also does somethingelse—it gives us a region that is closely related to the regionwe are really interested in, for whichwe do know

one vertex—the origin, which is the vertex of the cone.

FRom now on, assume Ω to be nondegenerate.

In the next section I’ll describe how to apply a standard calculation in linear algebra to find a coordinate

system well adapted to the cone.

4. Column reduction

We start with anm×(n+1)matrixF whose rows represent the affine functions involved in the given systemof inequalities. In this section, I’ll explain how to choose a coordinate system well adapted to the system of

homogeneous inequalities determined by F . In this situation, we interpret the rows of F as linear, ratherthan affine, functions.

We’ll apply elementary column operations to F . The output will be

(a) a columnreduced form of F ;(b) an (n + 1) × (n + 1) matrix R determining the coordinate transformation involved;

(c) a list B of d rows of F that are to be taken as the new coordinate functions on Rn/U (with U the

invariance subspace of the system);(d) an index array I ;

I’ll explain these as we go along.

A matrix E is weakly column reduced if has these properties characterized recursively: (a) the first nonzero

row of E has exactly one nonzero entry, which is 1; (b) the submatrix of E made up of the ei,j with i ≥ 1,j ≥ 1 is weakly column reduced. Here is a sample pattern:

1 0 0 0 0∗ 1 0 0 0∗ ∗ 0 0 0∗ ∗ 1 0 0∗ ∗ ∗ 1 0∗ ∗ ∗ ∗ 0∗ ∗ ∗ ∗ 0

.

If E is weakly column reduced of rank d, there is associated to it a unique sequence

r1 < r2 < · · · < rd


with the property that ei,j = 0 if i < rj but erj ,j = 1. In the example above, the rank is 4 and this sequenceis 1, 2, 4, 5. A matrix that is weakly column reduced is called column reduced if in each row ri there is only

one nonzero entry. The following is a typical pattern:

1 0 0 0 00 1 0 0 0∗ ∗ 0 0 00 0 1 0 00 0 0 1 0∗ ∗ ∗ ∗ 0∗ ∗ ∗ ∗ 0

.

Given anym×nmatrix F of rank d, a standard algorithm in linear algebra will find a sequence of elementarycolumn operations that will make F column reduced. This finds a matrix E that is columnreduced and an

n× n invertible matrix R such that E = F ·R. In these circumstances the first d columns of R will be a basis

for the subspace spanned by the columns of F , the rows ri of F will be a basis for space of functions spannedby all the rows of F , and the rows of E will be the coordinates of the rows in terms of these special rows.

I can say a little more about this by recalling how linear coordinate transformations work on linear functions.I think of one as a substitution: we have an expression for the current coordinates xi in terms of the new ones

yi (rather than viceversa), by a matrix formula:

x = Ay .

Write linear functions as row vectors, so

f(x) =∑

fixi = [ f1 . . . fn ]

x1

. . .xn

= f ·x .

Then with the coordinate change we get

f ·x = f ·Ay = fA ·y

so that its expression of the linear function in y coordinates is fA.

In our case, the number d is what I call the nominal dimension of the system (only nominal because the

convex body at hand might still be empty). The rows of this matrix are the rows corresponding to the subsetof the fi we have found to form part of a coordinate system. The coordinates of others with respect to this

basis are laid out in the other rows.

The transformation matrixR records the accumulated coordinate changes. Thus fx = fy ·R, if x and y are the

original and new coordinates. Columns d+1 throughn+1 ofR record a basis of vectors (in the original affine

coordinate system) whose translations leave the inequalities invariant. The region defined by the originalinequalities is the product of this space with a region in R

d defined by the matrix we get by eliminating these

columns, which we do. Wemay assume from now on that d = n, or equivalently that the convex region does

not contain a line.

In the end we shall be locating certain points specified in the new coordinate system. But we shall also need

to know their coordinates in the original coordinates. We have

xold = Axnew .


5. Vertex frames

As a result of column reduction and a bit of fiddling, we are given an m× (n+1)matrix in column reduction

form defining a closed convex cone CΩ, the cone in Rn+1 generated the copy of the original region Ω in n

dimensions in the slice at level 1.

The next step is to find a vertex ω of the region as well as a coordinate system adapted to it. By this I mean acoordinate system such that

(a) the point ω is the origin of the system;

(b) the coordinates are among the affine functions in the inequalities defining Ω.

In particular, Ω is contained in the coordinate octant.

I’ll call this a vertex-adapted coordinate system . The unit vectors in such a coordinate system make up what

I’ll call a vertex frame , which is easier to picture. This is the basic structure in the simplexmethod, and finding

one is almost always the first step in any exploration of a convex polytope. The point is that any vertex ofthe polytope may be reached by progressing through a succession of vertex frames by following edges of the

frames one encounters. One does this by pivoting, as I’ll explain later.

In a program, a vertex frame amounts to (a) an m× (n + 1) matrix F , each row of which is an affine function

defining the region involved, expressed in the current coordinate system; (b) a list B of the rows of this

matrix that represent the current coordinates; (c) for each of these, an index specifying which column in Fcorresponds to it. For convenience it is vest to incliude, although redundant, (d) a list O of the rows that are

not the coordinates. It is also a good idea to maintain (e) an (n + 1) × (n + 1) matrix R specifying the affine

transformation from current to original coordinates (you could call it the HanselandGretel matrix).

In the standard literature on linear programming, the set of functions that are not coordinates makeup what is called the basis. I am not sure where or why this usage originated. I follow more standardmathematical terminology and call the basis the set of functions that are coordinate functions. They are,after all, a basis for the space of all linear functions. This is consistent with my geometrical approach,whereas the literature often seems to shun geometrical notions.

The simplest vertices are those for which where the region is locally a simplicial cone—given locally at the

origin exactly by inequalities f ≥ 0 for f in the basis. But this does not always occur. This simple situationis the generic case, in the sense that perturbing a given system almost always turns any given polytope into

one all of whose vertices are simplicial. A vertex not of this form is called singular . Keep in mind that it isthe system that is singular, not necessarily the polytope itself. In two dimensions, for example, all vertices

are geometrically simplicial, but systems of inequalities may nonetheless be singular.

For example, suppose given the system of inequalities

2x + y ≥ 1

x + 2y ≥ 1

x + y ≥ 1

x − y ≥ −1

3x + 2y ≤ 6 .

We have the following picture, where the shading along a line f = c indicates the side where f − c ≥ 0. Theconvex region defined by the inequalities is outlined in heavy lines.


x−

y=−1

−3x

−2y

=−6

x + 2y = 1

x+

y=

1

2x+

y=

1 (4/5, 9/5)(0, 1)

(1, 0)(5/2,−3/4)

In that example, the system is singular at (0, 1) but not at the other three vertices of the region. The singularityat (0, 1) arises because the three bounding lines

2x + y − 1 = 0

x + y − 1 = 0

x − y + 1 = 0

all pass through that point. Any two of the corresponding affine functions will form a vertex basis. Only thelast two lines actually bound the region. But any two of these three might be the local coordinate system of a

vertex frame.

x−

y=−1

x+

y=

1

2x+

y=

1

Being a vertex basis is a local property—it depends purely on the geometry of the region fi ≥ 0 in a

neighbourhood of the origin.

Suppose we choose

X = 2x + y − 1, Y = x + y − 1 .

We can express this as:

2 1 −11 1 −10 0 1

xy1

=

XY1

,

1 −1 0

−1 2 10 0 1

XY1

=

xy1

.

The pair (X, Y ) make up a vertex frame. In the new system, the inequalities become

X ≥ 0

Y ≥ 0

−X + 3Y + 1 ≥ 0

2X − 3Y ≥ 0

−X − Y + 4 ≥ 0 .


6. Finding a vertex basis to start

As we shall see later, and as I have already suggested, nearly all analysis of a polytope starts with a vertex

frame. But if we are given just a set of defining inequalities, it might not be evident how to find one. As Ihave mentioned, we must deal locally. We cannot get an overall view of our region, but just inspect what is

going on in the neighbourhood of the current origin.

At this point, we have at hand a vertex frame for the homogenized version of the original system of affine

inequalities. It is from this that we shall find a vertex frame for the original system (or find that the region is

in fact empty).

One of the more elegant features of the simplex method is that this will turn out to be a special case of

the original problem that originally motivated the simplex method, that of finding the maximum value ofa linear function on a polytope, given an initial vertex frame. I postpone the detailed explanation of how

this works. At any rate, in our situation we are given linear coordinates of the function that was our original

homogeneous variable xn+1. Our given vertex is the origin, where xn+1 = 0. What we want to locate is avertex of the original inhomogeneous system, which corresponds to the slice of our cone at xn+1. So we add

to system the affine inequality xn+1 ≤ 1, and find themaximum value of xn+1 subject to the new conditions.If the maximum value is 1, we have located a good vertex frame for the original region. If not, the solution

set of the original inequalities is empty.

It’s a little difficult to picturewhat is going on, except in very low dimensions. Suppose the original equationswere

x ≥ 1

x ≥ 1

x ≤ 2

in one dimension. The important point is to realize that the origin of this coordinate system is not a vertex of

the polytope—the function x − 1 takes a negative value at x = 0.

0 1 2

x ≥ 0 x ≥ 1 x ≤ 2

These become the system

x ≥ 0

x − y ≥ 0

−x + 2y ≥ 0

y ≥ 0

−y + 1 ≥ 0 .

in two dimensions, after adding the new equations in y.

x ≥ 0

y ≥ 0

−y + 1 ≥ 0

−x + 2y≥ 0

x−

y≥

0

For a vertex frame at the origin we have six choices—any two functions out of the four that vanish at 0:x, x − y, −x + 2y, y. What we get by elimination depends on the order in which we write the equations.


Assuming y ≥ 0, −y + 1 ≥ 0 to be last, we are reduced to three choices. The choice X = x, Y = x − y (soy = X − Y ) gives us the new system

X ≥ 0

Y ≥ 0

X − 2Y ≥ 0

X − Y ≥ 0

−X + Y + 1 ≥ 0 .

In this case, the simplex method finds that there is indeed a vertex frame satisfying the original inequalities.

This process will be a bit unusual from the standpoint of linear programming, as we shall see later. All movesexcept the final one will be what I call virtual. What we are doing, roughly, is moving around in the ‘web’ of

vertex frames defined by the homogenized functions fi.

ANOTHER EXAMPLE. An example in which Ω = ∅might be instructive. Start with the inequalities

x ≥ 1

x ≤ 0 .

This gives us the homogeneous matrix

F =

1 −1

−1 00 1

,

and then the affine system

1 0 0−1 −1 0

0 1 00 −1 1

with basis f1, z in 3 dimensions. In this case, maximizing z doesn’t do all that well. The variable z is the only

choice to exit the basis. The only choice for one to enter it is f2, but (of course) the move is only virtual. It

results in the matrix

1 0 00 1 0

−1 −1 01 1 1

.

But now all coefficients of z are negative,, telling us that Ω = ∅.

7. Affine coordinate transformations

The essence of the simplex method is to move from one vertex of a polytope to another. Actually, this isn’t

quite what it does, as we shall see, but it is what it would like to do. What it actually does is to move from

one vertex frame to another, and it might do this without changing the vertex itself. That is to say, it involvesa change of coordinates by an affine transformation. In this section I’ll say something about this.

An affine coordinate change from (xi) to (yi) is one of the form

[x1

]=

[A a0 1

] [y1

].

Write affine functions as row vectors, so

f(x) = [∇f c ]

[x1

],


in which ∇f is the linear part of f , and the vector representing it is made up of the coordinates of f in thiscoordinate system. Then

[∇f c ]

[x1

]= [∇f c ]

[A a0 1

] [y1

]

so that its expression in y coordinates is

[∇f c ]

[A a0 1

].

In our case, we are going to be making very special coordinate changes by a process known as pivoting . Weare given a coordinate system (xi) and a set of linear functions fj , and we are going to swap some fℓ for a

variable xk. At the same time, we want to express all of the fj , which are given to us in a matrix, in terms of

the new variables. This reduces to the following problem: we are given

fℓ =

n∑

1

aℓ,jxj + aℓ,n+1 ,

in which aℓ,k 6= 0 (so that the coordinate transformation is legitimate). We are also given the

fi =∑

1,n

ai,jxj + ai,n+1 (j 6= ℓ) ,

each of which we wish to convert to an expression with fℓ replacing xk . We get

xk = (1/aℓ,k)(fℓ −

∑

j 6=k

aℓ,jxj − aℓ,n+1

)

fi =∑

j 6=k

ai,jxj + (ai,k/aℓ,k)(fℓ −

∑

j 6=k

aℓ,jxj − aℓ,n+1

)+ ai,n+1

=∑

j 6=k

(ai,j − (aℓ,j/aℓ,k)ai,k

)xj + (ai,k/aℓ,k)fℓ +

(ai,n+1 − (aℓ,n+1/aℓ,k)ai,k

).

In effect, we are

(a) dividing the kth column of F by aℓ,k,(b) for each j 6= k (including j = n), replacing its jth column cj of F by cj − pck, with p = aℓ,j/aℓ,k.

This is a simple matrix operation. We are given two rows in a matrix (ai,j), say ℓ and m. Row m has a singlenonzero entry in it, which is 1. Say it occurs in column k. In the other row, aℓ,k 6= 0. We apply elementary

column operations to make the new row ℓ look like the old row m. Since column operations are equivalent

to multiplication by certain (n + 1) × (n + 1) matrices on the right, this is consistent with earlier remarkson coordinate change. Explicitly, the matrix by which we finally multiply is that obtained by performing the

same column operations on the identity matrix In+1. These effects accumulate.

This operation is called pivoting on the pair (m, ℓ).


8. Pivoting

The basic step in using the simplex method is moving from one vertex basis to another—and, hopefully, from

one vertex to another—along a single edge of the polygon. (Or rather, to be precise, along what you mightcall a virtual edge.) Geometrically we are moving from one vertex to a neighbouring one, except that in the

case of a singular vertex where more than the generic number of faces meet the move might travel along an

edge of length 0. In all cases, this basic step is called a pivot. It is executed by pivoting the matrix F .

The explicit problem encountered in pivoting is this: we are given a vertex basis and a variable xc in the

basis. The xcaxis represents an edge (or possibly a virtual edge, as I’ll explain in a moment) of the polygon.We want to locate the vertex of the polygon next along that edge, and replace the old vertex frame by one

adapted to that new vertex. The variable xc is part of the original basis. The new basis will be obtained by

replacing that variable by one of the affine functions fr in the system of inequalities defining the polygon.In applications, the exiting variable xc is chosen according to various criteria which depend on the goal in

mind, as we shall see in the next section. Sometimes there are several choices for fr as well.

There are conditions on the possible fr . For one thing, the new vertex must be also actually a vertex of the

region Ω concerned. This puts a bound on how far out along the xcaxis we can go. Suppose

f(x) =∑

aixi + an+1 .

We know already that an+1 ≥ 0, because our vertex does lie in Ω. On the xcaxis

f(x) = acxc + an+1 .

If ac ≥ 0, there is no limit to xc. But if ac < 0, we must have

acxc + an+1 ≥ 0, xc ≤an+1

−ac

.

This has to hold for all the functions fr , so the new value of xc is the minimum value of −ar,n+1/ar,c forthose r for which ar,c < 0. Given this choice of variable xc, and without further conditions imposed, the

possible choices for the new basis variable will be any of the fr producing this minimum value of xc.

In the example we have already looked at, we could start with coordinates x+2y = 1, x+y = 1 at the vertex

(1, 0) and move along the edge x + y = 1, with xc = x + 2y − 1. We can only go as far as the vertex (0, 1),because beyond there both of the affine functions x − y + 1 and 2x + y − 1 become negative. We can choose

either one of them to replace x + 2y − 1 in the vertex coordinates.

When a pivot is made, the current coordinate frame is the given vertex frame, and in the course of the pivot

it is changed to the new vertex frame. This involves expressing all the fi in terms of the new basis.


How does it work in general? Given a vertex frame, any of the fi may be expressed as

fi = ai +∑

j

ai,jxj

with the ai ≥ 0 since the origin is assumed, by definition of a vertex frame, to lie in the region fi ≥ 0. We

propose to move out along the directed edge where all basic variables but one of them, say xc, are equal to

0, and in the direction where xc takes nonnegative values. This amounts to moving along an edge of thepolytope. We shall move along this edge until we come to the vertex at its other end, if there is one, or take

into account the fact that this edge extends to infinity. So we must find the maximum value that xc can take

along the line it is moving on. Along this edge the other xj vanish, and we have each

fi = ai + ai,cxc

There are two cases to deal with:

(a) If ai,c ≥ 0 there is no restriction on how large xc can be;

(b) if ai,c < 0 then we can only go as far as xc = −ai/ai,c.

Therefore if all the ai,c are nonnegative then the edge goes off to infinity, but otherwise the value of xc isbounded by all of the−ar/ar,c. In that case, we choose r such that ar,c < 0 and−ar/ar,c takes that minimum

value among those where ai,c < 0, and set xc equal to it.

fi = 0

xc = 0

xc

fr = 0

We also change bases—replacing xc by fr. The minimum value might in fact be 0, in which case we change

vertex frames (i.e. virtual vertices) without changing the geometric vertex. This sort of virtual move willhappen only if the vertex is singular, which means as I have said that more than the generic number of faces

meet in that vertex, and the maximal value of xc will be 0.

Thus our choice of r is made so that

ai − (ar/ar,c)ai,c ≥ 0

for all i, since if ai,c ≥ 0 there is no condition (ai, ai,c and −ar/ar,c are all nonnegative) and if ai,c < 0 thenby choice of r

−ai

ai,c

≥ −ar

ar,c

.

When we change frame, we have to change all occurrences of xc (which becomes just a function fc in our

system) by substitution. In short, we pivot.

Sometimes, as in the example, there will be several possible choices of fr , if the target vertex isn’t simplicial.

Indeed, in practice when pivoting is done in an application there may be several possible edges to move

out along—i.e. which variable xc to exit from the basis. The most important thing in all cases is to preventlooping. There are two commonly used methods to do this. One is a method of choosing both xc and fr,

while the other is fussy about only fr . I’ll explain these in the next section.


9. Tracking coordinate changes

As wemove among vertices, changing from one vertex basis to another, we’ll want to keep track of wherewe

are in the original coordinate system. We can recover the location of a vertex from the basis, since it amountsto solving a system of linear equations. But this is a lot of unnecessary work. We can instead maintain some

data as we go along that make this a much simpler task.What data do we have to maintain in order to locateeasily the current vertex? A slightly better question: What data do we have to maintain in order to changeback and forth between the current basis and the original one?

Keeping track of location comes in two halves: changing back and forth between the original coordinatesystem and the one associated to the homogenized problem, in which we located the invariance subspace

and wound up with the cone; changing back and forth between that frame and the current vertex frame. The

first is done by a linear coordinate change, and after that, by affine changes, substituting fr for xc. Thus isdone by exactly the same column operations that are performed on F , and in programs R is often tacked

onto the bottom of F to make this convenient. Its rows are not listed in either of the two lists accompanyingthe data structure, of course.


10. An example in some detail

In this section, I’ll summarize the process I have outlined, continuing with an earlier example.

Step 1. Start with affine inequalities fi(x) ≥ 0 defining the region Ω in Rn.

In the example at hand, these are

2x + y ≥ 1

x + 2y ≥ 1

x + y ≥ 1

x − y ≥ −1

3x + 2y ≤ 6 .

In pictures:

x−

y=−1

−3x

−2y

=−6

x + 2y = 1

x+

y=

1

2x+

y=

1 (4/5, 9/5)(0, 1)

(1, 0)(5/2,−3/4)

Step 2. We interpret these as homogeneous inequalities tomakeupamatrixF whose rows are linear functions

in dimension n + 1. We do this by interpreting the constants in the affine functions as coefficients of the newvariable xn+1. We add the inequality xn+1 ≥ 0 to get Fh. This corresponds to a system of linear inequalities

defining the cone CΩ over (Ω, 1).

Here, this gives the matrix

Fh =

2 1 −11 2 −11 1 −11 −1 1

−3 −2 60 0 1

,

corresponding to inequalities

2x + y − z ≥ 0

x + 2y − z ≥ 0

x + y − z ≥ 0

x − y + z ≥ 0

−3x − 2y + 6z ≥ 0

z ≥ 0 .

Step 3. We now apply column reduction to get a matrixG as the column reduction of Fh and a matrixR suchthat Fh ·R = G


Here

G =

1 0 00 1 00 0 12 0 −33 4 −131 1 −3

, R =

1 0 −10 1 −11 1 −3

.

In general, the columns ofR will be partitioned into ranges [1, d+1] and [d+2, n+1]. The first fewmake up

a (m+1)× (d+1)matrix whose columns amount to a basis of a section of the projection onto Rn+1/Inv(Ω),

the remaining ones give a basis of the null space of F . In this example d = n.

Step 4. It is convenient to perform one more sequence of column operations, pivoting the function z into thebasis of coordinates.

In this case, getting

G =

1 −1 30 1 00 0 12 −2 33 1 −41 0 0

, R =

1 −1 20 1 −11 0 0

.

This R should be stored as R0 before proceeding, and start afresh.

Step 5. Wemake this into a vertex frame in dimension n + 1 (representing affine inequalities) with an added

inequality z ≤ 1:

G =

1 −1 3 00 1 0 00 0 1 02 −2 3 03 1 −4 01 0 0 0

−1 0 0 1

.

Step 6. We now follow the basic simplex algorithm that starts out with a vertex frame to maximize z. Eitherthis demonstrates that the Ω is empty or it produces a vertex frame for Ω. in this example we pivot just

once (exchanging x with −z + 1) to maximize z. This gives us the vertex frame on the extended region in

dimension n + 1. Here

G =

−1 −1 3 10 1 0 00 0 1 0

−2 −2 3 2−3 1 −4 3−1 0 0 1

1 0 0 0

.

Step 7. This is a point on the slice z = 1, and corresponds to a vertex of the original region.

In the example, it is in fact the intersection of the two lines x + 2y − 1 ≥ 0, x + y − 1 ≥ 0, the point (1, 0). We

now want to drop all the extra apparatus we needed to get to this point, so as to be dealing only with Ω. To

do this, we drop rows and columns referring to z. Most obviously, we remove the last two rows, referring to


z and −z + 1. At the end, the variable−z + 1 will be part of the basis. It will be correspond to some columnm. We remove columns m from both G and R.

G =

−1 3 11 0 00 1 0

−2 3 21 −4 3

.

What is the meaning of these matrices? We are looking at the vertex (1, 0) of Ω, the intersection of the lines

f2 = f3 = 0. The other inequalities are expressed in terms of these:

−f2 + 3f3 + 1 ≥ 0

f2 ≥ 0

f3 ≥ 0

−2f2 + 3f3 + 2 ≥ 0

f2 − 4f3 + 3 ≥ 0 .

Step 8. While the computation of G has been going on, simultaneously a matrix R has also been calculatedby applying the same column operations to it. Each stage produces a matrix R, and the final one is obtained

by multiplying them all together. The final matrix R tells us how to get from the current coordinate system

to the original one.

In this example

R =

−1 2 1

1 −1 00 0 1

.

For example, the point (0, 0) in the current system is the point (1, 0) in the original, as I have already

mentioned. We can therefore check how all this has gone by calculating

2 1 −11 2 −11 1 −11 −1 1

−3 −2 6

−1 2 1

1 −1 00 0 1

=

−1 3 11 0 00 1 0

−2 3 21 −4 3

.

We can also verify that this is a vertex frame by observing that all constants (i.e. last column) in the final

matrix are nonnegative.

At any rate, given this vertex frame one is ready to start serious exploration.

Part III. Applications


11. Finding maximum values

The original problem that motivated the introduction of the simplex method in linear programming was

to maximize an affine function z = c +∑

cixi subject to a set of conditions fi = ai +∑

ai,jxj ≥ 0 thatdetermine a convex region Ω. In doing this, one proceeds from one vertex of Ω to another, always increasing

the value of z as one goes. This is kept up until it is seen either that z is unbounded on Ω, or that no choice

of maximizing edge is possible and the maximal value of z has been found.

Well, not quite. What one really does is proceed, not from one vertex to another, but from one vertex frame to

another. This takes place through a pivot operation. As one changes vertex frames one also keeps track of theexpression for z in the current coordinate system. If the edge has length 0, the vertex itself does not change

although the frame does, and the value of z does not in fact change. If the move is virtual—does not change

vertices—then something has to be done to avoid going around in circles.

The problem is that in the simplex method one is always proceeding with only local information. There is no

way to peek ahead to see where it is going. So one must find a way to pivot, based only on local information,that somehow avoids cycling. There have been a number of methods devised to do this, of which I shall

discuss two.

BLAND’S RULE. At each step, one affine function fi in the current basis is replaced by another. One valid

rule to follow is to choose the exiting and entering functions to have the least possible subscript. This rule

was introduced in [Bland:1977]. It is covered in many places in the literature, but most of them are simplevariations of one another. I’ll follow the explanation in [Camarena:2016], which is a variant of the treatment on

pages 37–38 of Chvatal’s book.A slightly different treatment can be found in §5.8 of [Matousek–Gartner:2007].

11.1. Proposition. Following Bland’s rule, the simplex method will terminate.

Proof. If the process does not terminate, it must cycle; and if it cycles, it does so at a fixed vertex. We maytherefore forget about those functions that do not vanish at that vertex. For convenience, wemay also assume

that the function z to be maximized is 0 at the given vertex.

Step 1. Let S be the set of all indices of the given affine functions. At any stage, let B be the indices of the

functions fi in the current basis, O its complement in S. Then write each affine function f as

f =∑

Scf,jfj =

∑B

cf,jfj

in which the first sum is over all of S, but cf,j = 0 unless j is in B. In particular we write

fi =∑

Sfi,jfj =

∑B

fi,jfj .

With these assumptions, here are the rules for pivoting. Suppose at a given stage

z =∑

Bζifi

fj =∑

Bfj,ifi

Then: (1) we choose i minimal with ζi > 0; (2) we choose j minimal such that fj,i < 0; (3) we replace fi inthe basis by fj .

Step 2. Now suppose that a cycle does occur while following Bland’s rule. Call s in S volatile if in the cyclefs is at one stage in the basis and at another stage is not.

11.2. Lemma. In order for s to be volatile, it suffices that at some stage it either leaves the basis or enters it.

Proof. Because it must reverse the move later in the cycle in order to restore the state.


Step 3. Let a be greatest among the volatile indices. Suppose it enters the basis at some stage p (i.e. in goingfrom stage p to the next). Let b be such that fb exits the basis in that step. Then b is also volatile, so that by

assumption b < a. Write at stage p

z =∑

S

ζifi .

Thus b is least with ζb > 0, and a is least with fa,b < 0. Since a is also maximal among volatile indices,

fc,b ≥ 0 for all other c.

Step 4. Set Pt to be the point whose coordinates at stage p are

fi =

t if i = b0 otherwise.

It is on the fb axis. Then

fj(Pt) = fj,bt

for any j in S and

z(Pt) = ζbt .

Now suppose at some later stage q that fa leaves the basis. Suppose that at this stage

z =∑

Sζ∗i fi .

Then

z(Pt) = ζ∗b t +∑

Op

ζ∗i fi,bt

so

(11.3) ζb = ζ∗b +∑

Op

ζ∗i fi,b .

This equation is the essential link between two stages.

Step 5. We now have ζ∗a > 0 because a exits the basis at stage q; ζ∗s ≤ 0 for all other s in S. Why? If s does

not lie in Bq then c∗b vanishes by definition, while if b does lie in Bq then according to Bland’s rule ζ∗s > 0 forall s 6= a in Bq.

Step 6. Since ζb > 0 but ζ∗b ≤ 0, (11.3) implies that ζ∗c fc,b > 0 for some c. Let’s look at fc more closely.

It is not in the basis Bp because the sum in (11.3) ranges only over Op. It is in Bq because ζ∗c 6= 0. Thereforec is volatile.

This implies that c ≤ a. But c 6= a. Why? Because ζ∗c fc,b > 0 while ζ∗afa,b < 0. This implies in turn that

fc,b > 0 because among volatile indices only fa,b < 0. Since ζ∗c fc,b > 0, we deduce that ζ∗c > 0. But this is acontradiction, since according to Bland’s rule only ζ∗a > 0.

PERTURBATION. Bland’s rule is very simple to implement. Not so simple is the perturbation method, which

has as variant the lexicographic method. This also is explained in Chvatal’s book. The variant is the practicalimplementation of the first, which is simpler to understand. In this, as soon as we have found an affine basis,

we add to each of the ‘slack’ variables fi a symbolic constant εi with these subject to conditions

1 ≫ ε1 ≫ ε2 ≫ · · · ≫ εℓ > 0 ,

in which ℓ is the number of nonbasic inequalities. Then whenever we pivot, the constant terms in all thevariables and the objective function will be linear combinations of numbers and these εi. In effect, we are

setting the constants to be some kind of εnumbers c0 +∑

ciεi, which we can add, subtract, and multiply

by scalars. Such numbers are compared in the natural way, which amounts to lexicographic ordering of thearrays. Furthermore, with these adjustments th system becomes simplicial—none of the functions fi (i.e.


no variable other than the basis variables) will ever evaluate to 0 at the vertex, because the different linearcombinations of the εi will always be linearly independent (pivoting preserves this property, and the matrix

starts with I). Thus, no pivots will ever be virtual.

At the end, set all εi = 0. This method has the property that we are allowed to go out along any coordinateedge, and need only make a choice among the functions that will become part of the basis. This happens

only when there are ties. This feature plays a role in the process discovered by Avis and Fukuda that I shallmention later on.

There is an equivalent description common in the literature, which also uses lexicographic ordering of arrays.The best account I know is in [Apte:2012], although more cryptic accounts are easy enough to find. It does

not use εnumbers, but expands the matrices of dictionaries instead. I prefer the version given here, which

seems to be more intuitive even if slightly less efficient.

The advantage of either one of these lexicographic strategies is that it is only the choice of new variable to set

as coordinate that uses it. This makes it exactly suitable for the lrsmethod of Avis and Fukuda for traversingall vertices of a conves polytope from top down.

12. Finding all vertices of a polygon

The traditional method from this point on is to use a stack to index and locate vertex bases, collecting them

into equivalence classes, each corresponding to a vertex. Note that each vertex will correspond to a subsetof the original set of functions, so our search is finite. Indeed, a good way to specify a vertex, at least when

using exact arithmetic, is to tag a vertex by the set of functions vanishing there. On the other hand, it mayhappen that we have a singular vertex where several different sets of basic variables give rise to the same

geometrical vertex. If we are doing floating point arithmetic, because of possible floating point roundoff we

should ignore this possibility, and proceed as if it doesn’t occur. It shouldn’t cause trouble—a single pointmight decompose into several, but they will all be close together and effectively indistinguishable.

Of course the regionmay be unbounded, in which case we should find the infinite edges of the polytope, too.

Thismethod locates all edges and vertices by the simplexmethod, indexing the vertices by the set of vanishing

variables, edges by the coordinate of the edge. The trouble is that obvious methods pass from one vertexframe to another, which means an apparent huge waste of time. Anyway, vertex frames are pushed onto the

stack as they are met. They are distinguished by the subset of basic functions.

[AvisFukuda:1992] and [Avis:2000] describe a better method, called in it current version lexicographicreverse search . I’ll explain this in detail in a later version of this essay.

13. Nearest point

There is another problem I’d like to also take into consideration—that of finding for a point of space thenearest point of the polytope. The first step is to scan all the inequalities, to see which the given point satisfies.

If it satisfies all of them, then the point lies in the region, and we are through. Otherwise, throw away those

satisfied by the point, and calculating perpendicular projections onto the remaining facial supports. Finally,throw away the projections that do not lie strictly inside the corresponding face. This is not very efficient, but

is there a better way?


14. Finding the dimension of a convex region

The dimension of a polytope makes sense in computation only when exact arithmetic is used, because the

smallest of perturbations can change a set of low dimension to an open one.

Start with trying to find a vertex frame. If there is none, the dimension is−1. Next, throw away all but the fi

vanishing at the origin, so we are looking at the tangent cone of the vertex. Take as function to maximize thelast basis variable; then find its maximum. If this is 0, we reduce the problem by eliminating that variable.

Or we come across an infinite edge (axis of some xc) out. In that case, throw away all those containing the

variable xc, in effect looking now at the tangent cone of that edge. Recursion: the dimension is one more thanthe dimension of that tangent cone. The whole process amounts to a succession of one or the other of these

two strategies, applied recursing on rank. The case of rank one is straightforward—if the rank is 1, then in

case (1) the dimension is 0 while in case (2) it is 1.

15. Implementation

I follow [Chvatal:1983] closely. Classically one uses tableaux , but Chvatal replaces them by what he calls

dictionaries . All the operations I have described amount to constructing an initial dictionary, and thenproceeding through a sequence of them according to what one is looking for.

So in my programs, an LP dictionary is a Python class with data an m × (n + 1) matrix whose rowsrepresent affine inequalities fi ≥ 0. In initialization, we pass an arbitrary matrix of this type, and from it we

immediately turn it to an affine basis togetherwithR, ρ. After the initializationwehave lists of basis variablesand dependent variables . Through all subsequent operations we maintain these structures. In effect, eachinequality represents one of the possible basic variables (these are called slack variables in the literature).

The next step is to turn this affine basis into a vertex basis. After that, what happens on what the ultimategoal is. The basic routine is a pivot, whose argument is one of the basis variables. This variable is removed

from the basis variables, and replaced by a new one of the affine functions, which is chosen according to the

minimal subscript rule. In the pivot, the matrix of affine functions is expressed in terms of the new basis.

I have said that the conventional terminology is bizarre. The most bizarre usage is that the affine functions

among the inequalities that are not coordinates in a frame are said to make up the basis of that frame. Thus,when a function f(x) ceases to be one of the coordinate system it is said to ‘enter the basis’, and when another

is chosen as the new coordinate, it is said to ‘leave the basis’. This terminology is not aimed atmathematicians,but then most consumers of linear programming are not mathematicians, either.

Part IV. References

1. David Avis, ‘lrs: a revised implementation of the reverse search vertex enumeration algorithm’, pp.177–198 in Polytopes—Combinatorics and Computation , edited byGil Kalai and Gunter Ziegler, Birkhauser

Verlag, 2000.

2. ——andKomei Fukuda, ‘A pivoting algorithm for convex hulls and vertex enumeration of arrangements

and polyhedra’, Discrete and Computational Geometry 8 (1992), 295–313.

3. Robert G. Bland, ‘Newfinite pivoting rules for the simplex method’,Mathematics of Operations Research2 (1977), 103–107.

4. Omar Antolin Camarena, ‘Bland’s rule guarantees termination’, course notes for the spring of 2016,

available at

http://www.math.ubc.ca/~oantolin/math340/blands-rule.html

5. ConstantinCaratheodory, ‘UberdenVariabilitatsbereichderKoeffizientenvonPotenzreihen, die gegebene

Werte nicht annehmen’,Mathematische Annalen 64 (1907), 95–115.

6. Vasek Chvatal, Linear programming , W. H. Freeman, 1983.


7. Donald E. Knuth,lp.w, a CWEBprogram inwhichKnuth lays out an introduction to linear programming.Posted August, 2005 at

http://www-cs-faculty.stanford.edu/ uno/programs.html

8. Jiri Matousek and Bernd Gartner, Understanding and using linear programming , SpringerVerlag, 2007.

9. HermannMinkowski, ‘TheoriederkonvexenKorper, insbesondereBegrundung ihresOber flachen begriffs’.

This has been published only as item XXV in volume II of Gesammelte Abhandlungen , Leipzig, 1911.

10. ——, Geometrie der Zahlen , Teubner, Berin, 1910.

11. Stephen Smale, ‘On the average number of steps of the simplex method of linear programming’,Mathematical programming 27 (1983), 241–262.

12. Daniel Spielman and ShangHua Teng, ‘Smoothed analysis: why the simplex method usually takes

polynomial time’, Journal of the ACM 51 (2004), 385–463.

13. Ernst Steinitz, ‘Bedingt konvergenteReihenundkonvexe Systeme’, Journal fur dieReine andAngewandteMathematik 143 (1913), 128â^175.

14. Hermann Weyl, ‘Elementare Theorie der konvexen Polyeder’, Commentarii Mathematici Helvetici 7,1935, pp. 290–306.

There is a translation of this into English by H. Kuhn: ‘The elementary theory of convex polytopes’, pp. 3–18

in volume I of Contributions to the theory of games , Annals of Mathematics Studies 21, Princeton Press,

1950.

15. Gunter Ziegler, Lectures on polytopes , Springer, 2007.

polytopes and the simplex method - university of british ...cass/research/pdf/simplex.pdf ·...

Documents