Lecture Notes in Mathematics, Volume 193: Symposium on the Theory of Numerical Analysis
Matrix Methods in Mathematical Programming
GENE GOLUB
Stanford University
I. Introduction
With the advent of modern computers, there has been a great development in
matrix algorithms. A major contributor to this advance is J. H. Wilkinson [30].
Simultaneously, a considerable growth has occurred in the field of mathematical
programming. However, in this field, until recently, very little analysis has been
carried out for the matrix algorithms involved.
In the following lectures, matrix algorithms will be developed which can be
efficiently applied in certain areas of mathematical programming and which give
rise to stable processes.
We consider problems of the following types:

    maximize φ(x) ,  where x = (x_1, x_2, ..., x_n)^T ,
    subject to Ax = b ,
               Gx ≥ h ,

where the objective function φ(x) is linear or quadratic.
2. Linear Programming
The linear programming problem can be posed as follows:
    maximize φ(x) = c^T x
    subject to Ax = b          (2.1)
               x ≥ 0           (2.2)
We assume that A is an m × n matrix, with m < n, which satisfies the Haar
condition (that is, every m × m submatrix of A is non-singular). The vector x is
said to be feasible if it satisfies the constraints (2.1) and (2.2).
Let I = {i_1, i_2, ..., i_m} be a set of m indices such that, on setting x_j = 0,
j ∉ I, we can solve the remaining m equations in (2.1) and obtain a solution such
that

    x_{i_j} > 0 ,   j = 1, 2, ..., m .

This vector x is said to be a basic feasible solution. It is well known that
the vector x which maximizes φ(x) = c^T x is a basic feasible solution, and this
suggests a possible algorithm for obtaining the optimum solution, namely, examine
all possible basic feasible solutions.
Such a process is generally inefficient. A more systematic procedure, due to
Dantzig, is the Simplex Algorithm. In this algorithm, a series of basic feasible
solutions is generated by changing one variable at a time in such a way that the
value of the objective function is increased at each step. There seems to be no
way of determining the rate of convergence of the simplex method; however, it works
well in practice.
The steps involved may be given as follows:
(i) Assume that we can determine a set of m indices I = {i_1, i_2, ..., i_m} such that
the corresponding x_{i_j} are the non-zero variables in a basic feasible solution.
Define the basis matrix

    B = [a_{i_1}, a_{i_2}, ..., a_{i_m}]

where the a_{i_j} are columns of A corresponding to the basic variables.
(ii) Solve the system of equations:

    B x̂ = b

where x̂^T = [x_{i_1}, x_{i_2}, ..., x_{i_m}].
(iii) Solve the system of equations:

    B^T w = ĉ

where ĉ^T = [c_{i_1}, c_{i_2}, ..., c_{i_m}] are the coefficients of the basic variables in the
objective function.
(iv) Calculate

    max_{j ∉ I} (c_j - a_j^T w) = c_r - a_r^T w ,  say.

If c_r - a_r^T w ≤ 0, then the optimum solution has been reached. Otherwise, a_r is to
be introduced into the basis.
(v) Solve the system of equations:

    B t = -a_r

If t_k ≥ 0, k = 1, 2, ..., m, then this indicates that the optimum solution is
unbounded. Otherwise determine the component s for which

    x_{i_s} / (-t_s) = min over {1 ≤ k ≤ m : t_k < 0} of x_{i_k} / (-t_k) .
Eliminate the column a_{i_s} from the basis matrix and introduce column a_r.
This process is continued from step (ii) until an optimum solution is obtained (or
shown to be unbounded).
We have defined the complete algorithm explicitly, provided a termination rule,
and indicated how to detect an unbounded solution. We now show how the simplex
algorithm can be implemented in a stable numerical fashion.
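As a concrete illustration, steps (i)-(v) can be sketched in a few lines of Python. This is a minimal sketch under our own naming, not the author's implementation: each system is solved afresh by dense Gaussian elimination with partial pivoting, whereas the point of the next section is to avoid exactly that by reusing and updating a factorization of B.

```python
def solve(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = (a[i][n] - sum(a[i][j] * y[j] for j in range(i + 1, n))) / a[i][i]
    return y

def simplex(A, b, c, basis):
    """Maximize c^T x subject to Ax = b, x >= 0, starting from the basic
    feasible index set `basis` (steps (i)-(v) of the text)."""
    m, n = len(A), len(A[0])
    while True:
        B = [[A[i][j] for j in basis] for i in range(m)]
        xB = solve(B, b)                                   # (ii)  B x = b
        w = solve([list(row) for row in zip(*B)],
                  [c[j] for j in basis])                   # (iii) B^T w = c_B
        r, best = None, 0.0                                # (iv)  entering column
        for j in range(n):
            if j not in basis:
                d = c[j] - sum(A[i][j] * w[i] for i in range(m))
                if d > best:
                    r, best = j, d
        if r is None:
            return basis, xB                               # optimum reached
        t = solve(B, [-A[i][r] for i in range(m)])         # (v)   B t = -a_r
        ratios = [(xB[k] / -t[k], k) for k in range(m) if t[k] < 0]
        if not ratios:
            raise ValueError("objective unbounded")
        _, s = min(ratios)
        basis[s] = r                                       # exchange a_{i_s} for a_r
```

Starting from any basic feasible index set, the loop repeats steps (ii)-(v) until no reduced cost is positive.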
3. A stable implementation of the simplex algorithm
Throughout the algorithm, there are three systems of linear equations to be
solved at each iteration. These are:
    B x̂ = b ,
    B^T w = ĉ ,
    B t = -a_r .
Assuming Gaussian elimination is used, this requires about m³/3 multiplications
for each system. However, if it is assumed that the triangular factors of B are
available, then only O(m²) multiplications are needed. An important consideration
is that only one column of B is changed in one iteration, and it seems reasonable
to assume that the number of multiplications can be reduced if use is made of this.
We would hope to reduce the m³/3 multiplications to O(m²) multiplications per step.
This is the basis of the classical simplex method. The disadvantage of this method
is that the pivoting strategy which is generally used does not take numerical
stability into consideration. We now show that it is possible to implement the
simplex algorithm in a more stable manner, the cost being that more storage is re-
quired.
Consider methods for the solution of a set of linear equations. It is well
known that there exists a permutation matrix Π such that

    Π B = L U

where L is a lower triangular matrix, and U is an upper triangular matrix.
If Gaussian elimination with partial (row) pivoting is used, then we proceed
as follows :
Choose a permutation matrix Π_1 such that the maximum modulus element of the
first column of B becomes the (1, 1)-element of Π_1 B.
Define an elementary lower triangular matrix Γ_k as the identity matrix with
additional non-zero elements γ_{ik}, i = k+1, ..., m, in the k-th column below the
diagonal.
Now Γ_1 can be chosen so that

    Γ_1 Π_1 B

has all elements below the diagonal in the first column set equal to zero.
Now choose Π_2 so that

    Π_2 Γ_1 Π_1 B

has the maximum modulus element in the second column in position (2, 2), and
choose Γ_2 so that

    Γ_2 Π_2 Γ_1 Π_1 B

has all elements below the diagonal in the second column set equal to zero. This
can be done without affecting the zeros already computed in the first column.
Continuing in this way we obtain:

    Γ_{m-1} Π_{m-1} ... Γ_2 Π_2 Γ_1 Π_1 B = U
where U is an upper triangular matrix.
Note that permuting the rows of the matrix B merely implies a re-ordering of
the right-hand-side elements. Thus, no actual permutation need be performed,
merely a record kept. Further any product of elementary lower triangular matrices
is a lower triangular matrix, as may easily be shown. Thus on the left-hand side
we have essentially a lower triangular matrix, and thus the required factorization.
The relevant elements of the successive matrices Γ_k can be stored in the
lower triangle of B, in the space where zeros have been introduced. Thus the
method is economical in storage.
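The elimination just described can be sketched compactly. The following is a minimal Python version (function and variable names are ours): it records the row interchanges and overwrites the lower triangle of the working array with the multipliers of the Γ_k, exactly where the zeros are introduced.

```python
def lu_partial_pivot(B):
    """Gaussian elimination with partial (row) pivoting.  Returns the
    interchange record `perm` and a packed array holding U on and above
    the diagonal, with the multipliers of the Gamma_k stored below it."""
    n = len(B)
    a = [row[:] for row in B]          # work on a copy of B
    perm = list(range(n))              # record of the Pi_k interchanges
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))   # Pi_k: pivot row
        a[k], a[p] = a[p], a[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):      # Gamma_k: zero column k below diagonal
            a[i][k] /= a[k][k]         # multiplier kept where the zero appears
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return perm, a
```

Since no separate arrays are kept for L and U, the storage cost is that of B itself plus the interchange record, as noted above.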
To return to the linear programming problem, we require to solve a system of
equations of the form

    B^(i) x = v          (3.1)

where B^(i) and B^(i-1) differ in only one column (although the columns may be
reordered).
Consider the first iteration of the algorithm. Suppose that we have obtained
the factorization:
    B^(0) = L^(0) U^(0)
where the right-hand-side vector has been re-ordered to take account of the permuta-
tions.
The solution to (3.1) with i = 0 is obtained by computing

    y = (L^(0))^{-1} v

and solving the triangular system

    U^(0) x = y ,

each of which requires m²/2 + O(m) multiplications.
Suppose that the column b^(0)_{s_0} is eliminated from B^(0) and the column a^(0) is
introduced as the last column; then

    B^(1) = [b^(0)_1, ..., b^(0)_{s_0 - 1}, b^(0)_{s_0 + 1}, ..., b^(0)_m, a^(0)] .

Therefore,

    (L^(0))^{-1} B^(1) = H^(1) ,

where H^(1) is upper triangular apart from non-zero subdiagonal elements in
columns s_0 through m - 1.
Such a matrix is called an upper Hessenberg matrix. Only the last column need be
computed, as all others are available from the previous step. We require to apply
a sequence of transformations to restore the upper triangular form. It is clear
that we have a particularly simple case of the LU factorization procedure as
previously described, where Γ_k^(1) is an elementary lower triangular matrix with a
single non-zero subdiagonal element,
only one element requiring to be calculated. On applying a sequence of transforma-
tion matrices and permutation matrices as before, we obtain

    Γ^(1)_{m-1} Π^(1)_{m-1} ... Γ^(1)_{s_0} Π^(1)_{s_0} H^(1) = U^(1)

where U^(1) is upper triangular.
Note that in this case to obtain Π_j^(1) it is only necessary to compare two
elements. Thus the storage required is very small: (m - s_0) multipliers γ_i^(1) and
(m - s_0) bits to indicate whether or not interchanges are necessary.
All elements in the computation are bounded, and so we have good numerical
accuracy throughout. The whole procedure compares favourably with standard forms,
for example, the product form of the inverse where no account of numerical accuracy
is taken. Further this procedure requires fewer operations than the method which
uses the product form of the inverse. If we consider the steps involved, forward
and backward substitution with L^(0) and U^(i) require a total of m² multiplications,
and the application of the remaining transformations in (L^(i))^{-1} requires at most
i(m - 1) multiplications. (If we assume that on the average the middle column of
the basis matrix is eliminated, then this will be closer to (i/2)(m - 1).) Thus
a total of m² + i(m - 1) multiplications are required to solve the system at each
stage, assuming an initial factorization is available. Note that if the matrix A
is sparse, then the algorithm can make use of this structure, as is done in the
method using the product form of the inverse.
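The column-update idea of this section can be sketched as follows. In this minimal version (our names), H = L^{-1} B_new is upper Hessenberg in its trailing columns and O(m²) eliminations restore triangular form; for brevity it works with a dense L and omits the row interchanges that the stable algorithm of the text would interleave.

```python
def forward_solve(L, v):
    """Solve the lower triangular system L y = v."""
    m = len(L)
    y = [0.0] * m
    for i in range(m):
        y[i] = (v[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    return y

def replace_column(L, U, s, new_col):
    """Given B = L U, drop column s of B and append `new_col`; return updated
    factors (Lnew, Unew) with Lnew Unew = B_new.  No pivoting (see text)."""
    m = len(U)
    h_last = forward_solve(L, new_col)
    # H = L^{-1} B_new: surviving columns of U shifted left, then L^{-1} new_col
    H = [[U[i][j] for j in range(m) if j != s] + [h_last[i]] for i in range(m)]
    Lnew = [row[:] for row in L]
    for k in range(s, m - 1):                  # eliminate the subdiagonal of H
        f = H[k + 1][k] / H[k][k]
        for j in range(k, m):
            H[k + 1][j] -= f * H[k][j]         # Gamma_k applied to H
        for i in range(m):
            Lnew[i][k] += f * Lnew[i][k + 1]   # fold Gamma_k^{-1} into L
    return Lnew, H
```

Only columns s through m - 1 are touched, which is the source of the O(m²) cost per exchange.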
4. Iterative refinement of the solution
Consider the set of equations

    B x = v

and suppose that x̃ is a computed approximation to x. Let

    x = x̃ + c .

Therefore,

    B(x̃ + c) = v ,

that is,

    B c = v - B x̃ .

We can now solve for c very efficiently, since the LU decomposition of B is
available. This process can be repeated until x is obtained to the required
accuracy. The algorithm can be outlined as follows:
(i) Compute r_j = v - B x_j
(ii) Solve B c_j = r_j
(iii) Compute x_{j+1} = x_j + c_j
It is necessary for r_j to be computed in double precision and then rounded to
single precision. Note that step (ii) requires O(m²) operations, since the LU de-
composition of B is available. This procedure can be used in the following sections.
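Steps (i)-(iii) can be sketched as below. This is only a structural sketch (names are ours): the correction in step (ii) is obtained by a fresh dense solve standing in for the O(m²) triangular solves with the stored LU factors, and the residual is accumulated in ordinary Python floats rather than in double the working precision.

```python
def solve(B, v):
    """Dense Gaussian elimination with partial pivoting (a stand-in for
    forward/back substitution with the stored LU factors)."""
    m = len(B)
    a = [row[:] + [r] for row, r in zip(B, v)]
    for k in range(m):
        p = max(range(k, m), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, m):
            f = a[i][k] / a[k][k]
            for j in range(k, m + 1):
                a[i][j] -= f * a[k][j]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        x[i] = (a[i][m] - sum(a[i][j] * x[j] for j in range(i + 1, m))) / a[i][i]
    return x

def refine(B, v, x, steps=3):
    """Iterative refinement of an approximate solution x of B x = v."""
    m = len(B)
    for _ in range(steps):
        r = [v[i] - sum(B[i][j] * x[j] for j in range(m)) for i in range(m)]  # (i)
        c = solve(B, r)                                                        # (ii)
        x = [x[i] + c[i] for i in range(m)]                                    # (iii)
    return x
```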
5. Householder Triangularization
Householder transformations have been widely discussed in the literature. In
this section we are concerned with their use in reducing a matrix A to upper-
triangular form, and in particular we wish to show how to update the decomposition
of A when its columns are changed one by one. This will open the way to implemen-
tation of efficient and stable algorithms for solving problems involving linear
constraints.
Householder transformations are symmetric orthogonal matrices of the form
P_k = I - β_k u_k u_k^T, where u_k is a vector and β_k = 2/(u_k^T u_k). Their utility in this
context is due to the fact that for any non-zero vector a it is possible to choose
u_k in such a way that the transformed vector P_k a is zero except for its first
element. Householder [15] used this property to construct a sequence of transfor-
mations to reduce a matrix to upper-triangular form. In [29], Wilkinson describes
the process and his error analysis shows it to be very stable.
Given any A, we can construct a sequence of transformations such that A is
reduced to upper triangular form. Premultiplying by P_0 annihilates (m - 1)
elements in the first column. Similarly, premultiplying by P_1 eliminates (m - 2)
elements in the second column, and so on.
Therefore,

    P_{n-1} P_{n-2} ... P_1 P_0 A = ( R )          (5.1)
                                    ( 0 )

where R is an upper triangular matrix.
Since the product of orthogonal matrices is an orthogonal matrix, we can
write (5.1) as

    Q A = ( R ) ,        A = Q^T ( R ) .
          ( 0 )                  ( 0 )
The above process is close to the Gram-Schmidt process in that it produces
a set of orthogonal vectors spanning E_n. In addition, the Householder transforma-
tion produces a complementary set of vectors which is often useful. Since this
process has been shown to be numerically stable, it does produce an orthogonal
matrix, in contrast to the Gram-Schmidt process.
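The triangularization reads as follows in Python. This is a bare sketch with our own names, without the column interchanges discussed below; Q is kept in factored form as the pairs (β_k, u_k), as suggested later for storage beneath R.

```python
import math

def house(x):
    """Return (beta, u) with (I - beta u u^T) x = (-+||x||, 0, ..., 0)."""
    sigma = math.sqrt(sum(v * v for v in x))
    u = x[:]
    u[0] += math.copysign(sigma, x[0])     # avoids cancellation in u[0]
    beta = 2.0 / sum(v * v for v in u) if sigma else 0.0
    return beta, u

def householder_qr(A):
    """Reduce A (m x n, m >= n) to upper triangular R; return the factored
    Q as a list of (k, beta, u) triples together with R."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    factors = []
    for k in range(min(n, m - 1)):
        beta, u = house([R[i][k] for i in range(k, m)])
        for j in range(k, n):              # apply P_k to the trailing columns
            s = beta * sum(u[i - k] * R[i][j] for i in range(k, m))
            for i in range(k, m):
                R[i][j] -= s * u[i - k]
        factors.append((k, beta, u))
    return factors, R
```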
If A = (a_1, ..., a_n) is an m × n matrix of rank r, then at the k-th stage of the
triangularization (k < r) we have

    A^(k) = P_{k-1} P_{k-2} ... P_0 A = ( R_k  S_k )
                                        ( 0    T_k )

where R_k is an upper-triangular matrix of order k.
The next step is to compute A^(k+1) = P_k A^(k), where P_k is chosen to reduce the
first column of T_k to zero except for the first component. This component becomes
the last diagonal element of R_{k+1}, and since its modulus is equal to the Euclidean
length of the first column of T_k it should in general be maximized by a suitable
interchange of the remaining columns. After r steps, T_r will be effectively zero
(the length of each of its columns will be smaller than some tolerance) and the
process stops.
Hence we conclude that if rank(A) = r then for some permutation matrix Π the
Householder decomposition (or "QR decomposition") of A is

    Q A Π = P_{r-1} P_{r-2} ... P_0 A Π = ( R  S )
                                          ( 0  0 )

where Q = P_{r-1} P_{r-2} ... P_0 is an m × m orthogonal matrix and R is upper-triangular
and non-singular.
We are now concerned with the manner in which Q should be stored and the
means by which Q, R, S may be updated if the columns of A are changed. We will
suppose that a column a_p is deleted from A and that a column a_q is added. It will
be clear what is to be done if only one or the other takes place.
Since the Householder transformations P_k are defined by the vectors u_k, the
usual method is to store the u_k's in the area beneath R, with a few extra words of
memory being used to store the β_k's and the diagonal elements of R. The product
Q a for some vector a is then easily computed in the form P_{r-1} P_{r-2} ... P_0 a where,
for example, P_0 a = (I - β_0 u_0 u_0^T) a = a - β_0 (u_0^T a) u_0. The updating is best
accomplished as follows. The first p - 1 columns of the new R are the same as before;
the other columns p through n are simply overwritten by columns a_{p+1}, ..., a_n, a_q
and transformed by the product P_{p-1} P_{p-2} ... P_0 to obtain a new

    ( S_{p-1} )
    ( T_{p-1} )

then T_{p-1} is triangularized as usual.
This method allows Q to be kept in product form always, and there is no accumula-
tion of errors. Of course, if p = 1 the complete decomposition must be re-done,
and since with m ≥ n the work is roughly proportional to (m - n/3)n² this can mean
a lot of work. But if p ≈ n/2 on the average, then only about 1/8 of the original
work must be repeated at each updating.
Assume that we have a matrix A which is to be replaced by a matrix Ā formed
from A by eliminating column a_p and inserting a new vector a as the last column.
As in the simplex method, we can produce an updating procedure using Householder
transformations. If Ā is premultiplied by Q, the resulting matrix Q Ā has upper
Hessenberg form, as before.
As before, this can be reduced to an upper triangular matrix in O(m²)
multiplications.
6. Projections
In optimization problems involving linear constraints it is often necessary
to compute the projections of some vector either into or orthogonal to the space
defined by a subset of the constraints (usually the current "basis"). In this
section we show how Householder transformations may be used to compute such pro-
jections. As we have shown, it is possible to update the Householder decomposi-
tion of a matrix when the number of columns in the matrix is changed, and thus we
will have an efficient and stable means of orthogonalizing vectors with respect to
basis sets whose component vectors are changing one by one.
Let the basis set of vectors a_1, a_2, ..., a_n form the columns of an m × n
matrix A, and let S_r be the sub-space spanned by {a_i}. We shall assume that the
first r vectors are linearly independent and that rank(A) = r. In general,
m > n > r, although the following is true even if m < n.
Given an arbitrary vector z we wish to compute the projections

    u = P z ,    v = (I - P) z

for some projection matrix P, such that
(a) z = u + v
(b) u^T v = 0
(c) u ∈ S_r (i.e., there exists x such that u = A x)
(d) v is orthogonal to S_r (i.e., A^T v = 0).
One method is to write P as A A⁺ where A⁺ is the n × m generalized inverse of A,
and in [7] Fletcher shows how A⁺ may be updated upon changes of basis. In contrast,
the method based on Householder transformations does not deal with A⁺ explicitly
but instead keeps A A⁺ in factorized form and simply updates the orthogonal matrix
required to produce this form. Apart from being more stable and just as efficient,
the method has the added advantage that there are always two orthonormal sets of
vectors available, one spanning S_r and the other spanning its complement.
As already shown, we can construct an m × m orthogonal matrix Q such that

           r   n-r
    Q A = ( R   S )
          ( 0   0 )

where R is an r × r upper-triangular matrix. Let

    w = Q z = ( w_1 )  r
              ( w_2 )  m-r          (6.1)

and define

    u = Q^T ( w_1 ) ,    v = Q^T ( 0   )          (6.2)
            ( 0   )              ( w_2 )

Then it is easily verified that u, v are the required projections of z, which is to
say they satisfy the above four properties. Also, the x in (c) is readily shown
to be given by x^T = ((R^{-1} w_1)^T, 0).
In effect, we are representing the projection matrices in the form

    P = Q^T ( I_r  0 ) Q          (6.3)
            ( 0    0 )

and

    I - P = Q^T ( 0  0       ) Q          (6.4)
                ( 0  I_{m-r} )

and we are computing u = P z, v = (I - P) z by means of (6.1), (6.2). The first r
columns of Q^T span S_r and the remaining m - r span its complement. Since Q and R
may be updated accurately and efficiently if they are computed using Householder
transformations, we have as claimed the means of orthogonalizing vectors with
respect to varying bases.
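The computation (6.1)-(6.2) can be sketched directly on top of the Householder triangularization. The following Python (our names; full column rank r = n assumed, no column interchanges) returns both projections, applying Q and Q^T only in factored form:

```python
import math

def house(x):
    """Return (beta, u) defining the Householder reflector for x."""
    sigma = math.sqrt(sum(v * v for v in x))
    u = x[:]
    u[0] += math.copysign(sigma, x[0])
    beta = 2.0 / sum(v * v for v in u) if sigma else 0.0
    return beta, u

def projections(A, z):
    """Split z into u in the column space of A and v orthogonal to it,
    via (6.1)-(6.2).  A is m x n with full column rank."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    factors = []
    for k in range(min(n, m - 1)):         # triangularize A, keeping Q factored
        beta, uk = house([R[i][k] for i in range(k, m)])
        for j in range(k, n):
            s = beta * sum(uk[i - k] * R[i][j] for i in range(k, m))
            for i in range(k, m):
                R[i][j] -= s * uk[i - k]
        factors.append((k, beta, uk))
    w = z[:]                               # (6.1): w = Q z = P_{r-1} ... P_0 z
    for k, beta, uk in factors:
        s = beta * sum(uk[i - k] * w[i] for i in range(k, m))
        for i in range(k, m):
            w[i] -= s * uk[i - k]
    def apply_qt(y):                       # Q^T y = P_0 ... P_{r-1} y
        y = y[:]
        for k, beta, uk in reversed(factors):
            s = beta * sum(uk[i - k] * y[i] for i in range(k, m))
            for i in range(k, m):
                y[i] -= s * uk[i - k]
        return y
    u_proj = apply_qt(w[:n] + [0.0] * (m - n))   # (6.2): u = Q^T (w_1, 0)
    v_proj = apply_qt([0.0] * n + w[n:])         # (6.2): v = Q^T (0, w_2)
    return u_proj, v_proj
```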
As an example of the use of the projection (6.4), consider the problem of
finding the stationary values of x^T A x subject to x^T x = 1 and C^T x = 0, where A is a
real symmetric matrix of order n and C is an n × p matrix of rank r, with r ≤ p < n.
It is shown in [12] that if the usual Householder decomposition of C is

            r   p-r
    Q C = ( R   S )
          ( 0   0 )

then the problem is equivalent to that of finding the eigenvalues and eigenvectors
of the matrix P̄ A, where

    P̄ = I - P = Q^T ( 0  0       ) Q
                    ( 0  I_{n-r} )

is the projection matrix in (6.4).
Note that, although P̄ A is not symmetric, since P̄² = P̄, then

    P̄ A = P̄² A

and further the eigenvalues of P̄² A are equal to the eigenvalues of the symmetric
matrix P̄ A P̄. The dimensionality of the problem is not reduced; some of the
eigenvalues will be zero.
7. Linear least-squares problem
The least-squares problem to be considered here is

    min_x || b - A x ||_2

where we assume that the rank of A is n.
Since length is invariant under an orthogonal transformation we have

    || b - A x ||_2² = || Q b - Q A x ||_2²

where Q A = ( R ) .  Let
            ( 0 )

    Q b = ( c_1 )  n
          ( c_2 )  m-n

Then,

    || b - A x ||_2² = || c_1 - R x ||² + || c_2 ||²

and the solution to the least-squares problem is given by

    x̂ = R^{-1} c_1 .

Thus it is easy to solve the least-squares problem using orthogonal transformations.
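Concretely, the orthogonal-transformation solution can be sketched by carrying b along as an extra column while A is triangularized, then back-substituting on R x = c_1. This is a minimal sketch in our own notation, assuming A is m × n with m ≥ n and rank n.

```python
import math

def lstsq_qr(A, b):
    """Solve min ||b - Ax||_2 by Householder triangularization of [A | b]
    followed by back substitution on R x = c_1."""
    m, n = len(A), len(A[0])
    R = [row[:] + [bi] for row, bi in zip(A, b)]   # work on [A | b]
    for k in range(min(n, m - 1)):
        x = [R[i][k] for i in range(k, m)]
        sigma = math.sqrt(sum(v * v for v in x))
        u = x[:]
        u[0] += math.copysign(sigma, x[0])
        beta = 2.0 / sum(v * v for v in u) if sigma else 0.0
        for j in range(k, n + 1):                  # transform A and b together
            s = beta * sum(u[i - k] * R[i][j] for i in range(k, m))
            for i in range(k, m):
                R[i][j] -= s * u[i - k]
    xhat = [0.0] * n                               # back substitution: R x = c_1
    for i in range(n - 1, -1, -1):
        xhat[i] = (R[i][n] - sum(R[i][j] * xhat[j]
                                 for j in range(i + 1, n))) / R[i][i]
    return xhat
```

The rows of the transformed right-hand side below row n are c_2, whose length is the residual norm.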
Alternatively, the least-squares problem can be solved by constructing the
normal equations

    A^T A x = A^T b .

However these are well known to be ill-conditioned.
Nevertheless the normal equations can be used in the following way.
Let the residual vector r be defined by:

    r = b - A x̂ .

Then,

    A^T r = A^T b - A^T A x̂ = 0 .
These equations can be written:

    ( I    A ) ( r )   ( b )
    ( A^T  0 ) ( x ) = ( 0 )          (7.1)

Thus, substituting A = Q^T (R; 0), where (R; 0) denotes R stacked above a zero
block, we obtain

    (    I      (R; 0) ) ( r̃ )   ( c )
    ( (R^T 0)     0    ) ( x ) = ( 0 )

where r̃ = Q r and c = Q b.
This system can easily be solved for r̃ and x. The method of iterative refine-
ment may be applied to obtain a very accurate solution.
This method has been analysed by Björck [2].
8. Least-squares problem with linear constraints
Here we consider the problem
    minimize || b - A x ||_2²
    subject to G x = h .

Using Lagrange multipliers λ, we may incorporate the constraints into
equation (7.1) and obtain

    ( 0    0    G ) ( λ )   ( h )
    ( 0    I    A ) ( r ) = ( b )
    ( G^T  A^T  0 ) ( x )   ( 0 )
The methods of the previous sections can be applied to obtain the solution of this
system of equations, without actually constructing the above matrix. The problem
simplifies and a very accurate solution may be obtained.
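As an illustration, the augmented system above can be assembled and solved directly. The following minimal sketch (our names) uses a dense Gaussian-elimination solve, where a careful implementation would instead exploit the structure via the orthogonalization techniques of the previous sections.

```python
def solve(M, rhs):
    """Dense Gaussian elimination with partial pivoting."""
    nn = len(M)
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    for k in range(nn):
        p = max(range(k, nn), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, nn):
            f = a[i][k] / a[k][k]
            for j in range(k, nn + 1):
                a[i][j] -= f * a[k][j]
    y = [0.0] * nn
    for i in range(nn - 1, -1, -1):
        y[i] = (a[i][nn] - sum(a[i][j] * y[j] for j in range(i + 1, nn))) / a[i][i]
    return y

def constrained_lstsq(A, b, G, h):
    """min ||b - Ax||_2 subject to Gx = h, via the augmented system
    [0 0 G; 0 I A; G^T A^T 0] (lambda, r, x) = (h, b, 0)."""
    m, n, p = len(A), len(A[0]), len(G)
    N = p + m + n
    M = [[0.0] * N for _ in range(N)]
    rhs = [0.0] * N
    for i in range(p):                       # block row 1:  G x = h
        for j in range(n):
            M[i][p + m + j] = G[i][j]
        rhs[i] = h[i]
    for i in range(m):                       # block row 2:  r + A x = b
        M[p + i][p + i] = 1.0
        for j in range(n):
            M[p + i][p + m + j] = A[i][j]
        rhs[p + i] = b[i]
    for j in range(n):                       # block row 3:  G^T lam + A^T r = 0
        for i in range(p):
            M[p + m + j][i] = G[i][j]
        for i in range(m):
            M[p + m + j][p + i] = A[i][j]
    y = solve(M, rhs)
    return y[p + m:]                         # the x-part of the solution
```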
Now we consider the problem
    minimize || b - A x ||_2²
    subject to G x ≥ h .

Such a problem might arise in the following manner. Suppose we wish to approximate
given data by the polynomial

    y(t) = αt³ + βt² + γt + δ

such that y(t) is convex. This implies

    y″(t) = 6αt + 2β ≥ 0 .

Thus, we require

    6αt_i + 2β ≥ 0

where the t_i are the data points. (This does not necessarily guarantee that the
polynomial will be convex throughout the interval.) Introduce slack variables w
such that

    G x - w = h

where w ≥ 0.
Introducing Lagrange multipliers as before, we may write the system as:

    ( 0    0    G   -I ) ( λ )   ( h )
    ( 0    I    A    0 ) ( r ) = ( b )
    ( G^T  A^T  0    0 ) ( x )   ( 0 )
                         ( w )

At the solution, we must have

    λ ≥ 0 ,   w ≥ 0 ,   λ^T w = 0 .
This implies that when a Lagrange multiplier is non-zero then the corresponding
constraint holds with equality.
Conversely, corresponding to a non-zero w_i the Lagrange multiplier must be
zero. Therefore, if we knew which constraints held with equality at the solution,
we could treat the problem as a linear least-squares problem with linear equality
constraints. A technique, due to Cottle and Dantzig [5], exists for solving the
problem in this way.
Bibliography
[1] Beale, E. M. L., "Numerical Methods", in Nonlinear Programming, J. Abadie (ed.),
John Wiley, New York, 1967; pp. 133-205.
[2] Björck, Å., "Iterative Refinement of Linear Least Squares Solutions II", BIT 8
(1968), pp. 8-30.
[3] Björck, Å., and G. H. Golub, "Iterative Refinement of Linear Least Squares
Solutions by Householder Transformations", BIT 7 (1967), pp. 322-37.
[4] Björck, Å., and V. Pereyra, "Solution of Vandermonde Systems of Equations",
Publication 70-02, Universidad Central de Venezuela, Caracas, Venezuela, 1970.
[5] Cottle, R. W., and G. B. Dantzig, "Complementary Pivot Theory of Mathematical
Programming", Mathematics of the Decision Sciences, Part 1, G. B. Dantzig and
A. F. Veinott (eds.), American Mathematical Society (1968), pp. 115-136.
[6] Dantzig, G. B., R. P. Harvey, R. D. McKnight, and S. S. Smith, "Sparse Matrix
Techniques in Two Mathematical Programming Codes", Proceedings of the Symposium
on Sparse Matrices and Their Applications, T. J. Watson Research Publication
RA1, no. 11707, 1969.
[7] Fletcher, R., "A Technique for Orthogonalization", J. Inst. Maths. Applics. 5
(1969), pp. 162-66.
[8] Forsythe, G. E., and G. H. Golub, "On the Stationary Values of a Second-Degree
Polynomial on the Unit Sphere", J. SIAM, 13 (1965), pp. 1050-68.
[9] Forsythe, G. E., and C. B. Moler, Computer Solution of Linear Algebraic Systems,
Prentice-Hall, Englewood Cliffs, New Jersey, 1967.
[10] Francis, J., "The QR Transformation. A Unitary Analogue to the LR Transforma-
tion," Comput. J. 4 (1961-62), pp. 265-71.
[11] Golub, G. H., and C. Reinsch, "Singular Value Decomposition and Least Squares
Solutions", Numer. Math., 14(1970), pp. 403-20.
[12] Golub, G. H., and R. Underwood, "Stationary Values of the Ratio of Quadratic
Forms Subject to Linear Constraints", Technical Report No. CS 142, Computer
Science Department, Stanford University, 1969.
[13] Hanson, R. J., "Computing Quadratic Programming Problems: Linear Inequality
and Equality Constraints", Technical Memorandum No. 240, Jet Propulsion
Laboratory, Pasadena, California, 1970.
[14] Hanson, R. J., and C. L. Lawson, "Extensions and Applications of the House-
holder Algorithm for Solving Linear Least Squares Problems", Math. Comp., 23
(1969), pp. 787-812.
[15] Householder, A. S., "Unitary Triangularization of a Nonsymmetric Matrix",
J. Assoc. Comp. Mach., 5 (1958), pp. 339-42.
[16] Lanczos, C., Linear Differential Operators, Van Nostrand, London, 1961,
Chapter 3.
[17] Leringe, Ö., and P. Wedin, "A Comparison Between Different Methods to Compute
a Vector x Which Minimizes ||Ax - b||_2 When Gx = h", Technical Report, Depart-
ment of Computer Sciences, Lund University, Sweden.
[18] Levenberg, K., "A Method for the Solution of Certain Non-Linear Problems in
Least Squares", Quart. Appl. Math., 2 (1944), pp. 164-68.
[19] Marquardt, D. W., "An Algorithm for Least-Squares Estimation of Non-Linear
Parameters", J. SIAM, 11 (1963), pp. 431-41.
[20] Meyer, R. R., "Theoretical and Computational Aspects of Nonlinear Regression",
P-1819, Shell Development Company, Emeryville, California.
[21] Penrose, R., "A Generalized Inverse for Matrices", Proceedings of the
Cambridge Philosophical Society, 51 (1955), pp. 406-13.
[22] Peters, G., and J. H. Wilkinson, "Eigenvalues of Ax = λBx with Band Symmetric
A and B", Comput. J., 12 (1969), pp. 398-404.
[23] Powell, M.J.D., "Rank One Methods for Unconstrained Optimization", T. P. 372,
Atomic Energy Research Establishment, Harwell, England, (1969).
[24] Rosen, J. B., "Gradient Projection Method for Non-linear Programming. Part
I. Linear Constraints", J. SIAM, 8 (1960), pp. 181-217.
[25] Shanno, D. C., "Parameter Selection for Modified Newton Methods for Function
Minimization", J. SIAM Numer. Anal., Ser. B, 7 (1970).
[26] Stoer, J., "On the Numerical Solution of Constrained Least Squares Problems",
(private communication), 1970.
[27] Tewarson, R. P., "The Gaussian Elimination and Sparse Systems", Proceedings
of the Symposium on Sparse Matrices and Their Applications, T. J. Watson
Research Publication RA1, no. 11707, 1969.
[28] Wilkinson, J. H., "Error Analysis of Direct Methods of Matrix Inversion",
J. Assoc. Comp. Mach., 8 (1961), pp. 281-330.
[29] Wilkinson, J. H., "Error Analysis of Transformations Based on the Use of
Matrices of the Form I - 2ww^H", in Error in Digital Computation, Vol. II,
L. B. Rall (ed.), John Wiley and Sons, Inc., New York, 1965, pp. 77-101.
[30] Wilkinson, J. H., The Algebraic Eigenvalue Problem, Clarendon Press, Oxford,
1965.
[31] Zoutendijk, G., Methods of Feasible Directions, Elsevier Publishing Company,
Amsterdam (1960), pp. 80-90.