Lecture Notes in Mathematics, Volume 193: Symposium on the Theory of Numerical Analysis
Matrix Methods in Mathematical Programming
GENE GOLUB
Stanford University
I. Introduction
With the advent of modern computers, there has been a great development in
matrix algorithms. A major contributor to this advance is J. H. Wilkinson [30].
Simultaneously, a considerable growth has occurred in the field of mathematical
programming. However, in this field, until recently, very little analysis has been
carried out for the matrix algorithms involved.
In the following lectures, matrix algorithms will be developed which can be
efficiently applied in certain areas of mathematical programming and which give
rise to stable processes.
We consider problems of the following types:

    maximize φ(x) ,  where x = (x_1, x_2, ..., x_n)^T ,
    subject to Ax = b ,
               Gx ≥ h ,

where the objective function φ(x) is linear or quadratic.
2. Linear Programming
The linear programming problem can be posed as follows:
    maximize φ(x) = c^T x
    subject to Ax = b          (2.1)
               x ≥ 0           (2.2)
We assume that A is an m × n matrix, with m < n, which satisfies the Haar
condition (that is, every m × m submatrix of A is non-singular). The vector x is
said to be feasible if it satisfies the constraints (2.1) and (2.2).
Let I = {i_1, i_2, ..., i_m} be a set of m indices such that, on setting x_j = 0,
j ∉ I, we can solve the remaining m equations in (2.1) and obtain a solution such
that

    x_{i_j} > 0 ,   j = 1, 2, ..., m .

This vector x is said to be a basic feasible solution. It is well known that
the vector x which maximizes φ(x) = c^T x is a basic feasible solution, and this
suggests a possible algorithm for obtaining the optimum solution, namely, examine
all possible basic feasible solutions.
Such a process is generally inefficient. A more systematic procedure, due to
Dantzig, is the Simplex Algorithm. In this algorithm, a series of basic feasible
solutions is generated by changing one variable at a time in such a way that the
value of the objective function is increased at each step. There seems to be no
way of determining the rate of convergence of the simplex method; however, it works
well in practice.
The steps involved may be given as follows:
(i) Assume that we can determine a set of m indices I = {i_1, i_2, ..., i_m} such that
the corresponding x_{i_j} are the non-zero variables in a basic feasible solution.
Define the basis matrix

    B = [a_{i_1}, a_{i_2}, ..., a_{i_m}]

where the a_{i_j} are columns of A corresponding to the basic variables.
(ii) Solve the system of equations:

    B x̂ = b

where x̂^T = [x_{i_1}, x_{i_2}, ..., x_{i_m}].
(iii) Solve the system of equations:

    B^T w = ĉ

where ĉ^T = [c_{i_1}, c_{i_2}, ..., c_{i_m}] are the coefficients of the basic variables in the
objective function.
(iv) Calculate

    max_{j ∉ I} (c_j - a_j^T w) = c_r - a_r^T w ,  say.

If c_r - a_r^T w ≤ 0, then the optimum solution has been reached. Otherwise, a_r is to
be introduced into the basis.
(v) Solve the system of equations:

    B t = -a_r

If t_k ≥ 0, k = 1, 2, ..., m, then this indicates that the optimum solution is
unbounded. Otherwise determine the component s for which

    x_{i_s} / (-t_s) = min over {1 ≤ k ≤ m : t_k < 0} of x_{i_k} / (-t_k) .
Eliminate the column a_{i_s} from the basis matrix and introduce column a_r.
This process is continued from step (ii) until an optimum solution is obtained (or
shown to be unbounded).
We have defined the complete algorithm explicitly, provided a termination rule,
and indicated how to detect an unbounded solution. We now show how the simplex
algorithm can be implemented in a stable numerical fashion.
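As a concrete illustration, steps (i)-(v) can be sketched in a few lines of Python. This is a minimal sketch under our own naming, not the author's implementation: each system is solved afresh by dense Gaussian elimination with partial pivoting, whereas the point of the next section is to avoid exactly that by reusing and updating a factorization of B.

```python
def solve(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = (a[i][n] - sum(a[i][j] * y[j] for j in range(i + 1, n))) / a[i][i]
    return y

def simplex(A, b, c, basis):
    """Maximize c^T x subject to Ax = b, x >= 0, starting from the basic
    feasible index set `basis` (steps (i)-(v) of the text)."""
    m, n = len(A), len(A[0])
    while True:
        B = [[A[i][j] for j in basis] for i in range(m)]
        xB = solve(B, b)                                   # (ii)  B x = b
        w = solve([list(row) for row in zip(*B)],
                  [c[j] for j in basis])                   # (iii) B^T w = c_B
        r, best = None, 0.0                                # (iv)  entering column
        for j in range(n):
            if j not in basis:
                d = c[j] - sum(A[i][j] * w[i] for i in range(m))
                if d > best:
                    r, best = j, d
        if r is None:
            return basis, xB                               # optimum reached
        t = solve(B, [-A[i][r] for i in range(m)])         # (v)   B t = -a_r
        ratios = [(xB[k] / -t[k], k) for k in range(m) if t[k] < 0]
        if not ratios:
            raise ValueError("objective unbounded")
        _, s = min(ratios)
        basis[s] = r                                       # exchange a_{i_s} for a_r
```

Starting from any basic feasible index set, the loop repeats steps (ii)-(v) until no reduced cost is positive.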
3. A stable implementation of the simplex algorithm
Throughout the algorithm, there are three systems of linear equations to be
solved at each iteration. These are:
    B x̂ = b ,
    B^T w = ĉ ,
    B t = -a_r .
Assuming Gaussian elimination is used, this requires about m³/3 multiplications
for each system. However, if it is assumed that the triangular factors of B are
available, then only O(m²) multiplications are needed. An important consideration
is that only one column of B is changed in one iteration, and it seems reasonable
to assume that the number of multiplications can be reduced if use is made of this.
We would hope to reduce the m³/3 multiplications to O(m²) multiplications per step.
This is the basis of the classical simplex method. The disadvantage of this method
is that the pivoting strategy which is generally used does not take numerical
stability into consideration. We now show that it is possible to implement the
simplex algorithm in a more stable manner, the cost being that more storage is re-
quired.
Consider methods for the solution of a set of linear equations. It is well
known that there exists a permutation matrix Π such that

    Π B = L U

where L is a lower triangular matrix, and U is an upper triangular matrix.
If Gaussian elimination with partial (row) pivoting is used, then we proceed
as follows :
Choose a permutation matrix Π_1 such that the maximum modulus element of the
first column of B becomes the (1, 1)-element of Π_1 B.
Define an elementary lower triangular matrix Γ_k as the identity matrix with
additional non-zero elements γ_{ik}, i = k+1, ..., m, in the k-th column below the
diagonal.
Now Γ_1 can be chosen so that

    Γ_1 Π_1 B

has all elements below the diagonal in the first column set equal to zero.
Now choose Π_2 so that

    Π_2 Γ_1 Π_1 B

has the maximum modulus element in the second column in position (2, 2), and
choose Γ_2 so that

    Γ_2 Π_2 Γ_1 Π_1 B

has all elements below the diagonal in the second column set equal to zero. This
can be done without affecting the zeros already computed in the first column.
Continuing in this way we obtain:

    Γ_{m-1} Π_{m-1} ... Γ_2 Π_2 Γ_1 Π_1 B = U
where U is an upper triangular matrix.
Note that permuting the rows of the matrix B merely implies a re-ordering of
the right-hand-side elements. Thus, no actual permutation need be performed,
merely a record kept. Further any product of elementary lower triangular matrices
is a lower triangular matrix, as may easily be shown. Thus on the left-hand side
we have essentially a lower triangular matrix, and thus the required factorization.
The relevant elements of the successive matrices Γ_k can be stored in the
lower triangle of B, in the space where zeros have been introduced. Thus the
method is economical in storage.
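The elimination just described can be sketched compactly. The following is a minimal Python version (function and variable names are ours): it records the row interchanges and overwrites the lower triangle of the working array with the multipliers of the Γ_k, exactly where the zeros are introduced.

```python
def lu_partial_pivot(B):
    """Gaussian elimination with partial (row) pivoting.  Returns the
    interchange record `perm` and a packed array holding U on and above
    the diagonal, with the multipliers of the Gamma_k stored below it."""
    n = len(B)
    a = [row[:] for row in B]          # work on a copy of B
    perm = list(range(n))              # record of the Pi_k interchanges
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))   # Pi_k: pivot row
        a[k], a[p] = a[p], a[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):      # Gamma_k: zero column k below diagonal
            a[i][k] /= a[k][k]         # multiplier kept where the zero appears
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return perm, a
```

Since no separate arrays are kept for L and U, the storage cost is that of B itself plus the interchange record, as noted above.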
To return to the linear programming problem, we require to solve a system of
equations of the form

    B^(i) x = v          (3.1)

where B^(i) and B^(i-1) differ in only one column (although the columns may be
reordered).
Consider the first iteration of the algorithm. Suppose that we have obtained
the factorization:
    B^(0) = L^(0) U^(0)
where the right-hand-side vector has been re-ordered to take account of the permuta-
tions.
The solution to (3.1) with i = 0 is obtained by computing

    y = (L^(0))^{-1} v

and solving the triangular system

    U^(0) x = y ,

each of which requires m²/2 + O(m) multiplications.
Suppose that the column b^(0)_{s_0} is eliminated from B^(0) and the column a^(0) is
introduced as the last column; then

    B^(1) = [b^(0)_1, ..., b^(0)_{s_0 - 1}, b^(0)_{s_0 + 1}, ..., b^(0)_m, a^(0)] .

Therefore,

    (L^(0))^{-1} B^(1) = H^(1) ,

where H^(1) is upper triangular apart from non-zero subdiagonal elements in
columns s_0 through m - 1.
Such a matrix is called an upper Hessenberg matrix. Only the last column need be
computed, as all others are available from the previous step. We require to apply
a sequence of transformations to restore the upper triangular form. It is clear
that we have a particularly simple case of the LU factorization procedure as
previously described, where Γ_k^(1) is an elementary lower triangular matrix with a
single non-zero subdiagonal element,
only one element requiring to be calculated. On applying a sequence of transforma-
tion matrices and permutation matrices as before, we obtain

    Γ^(1)_{m-1} Π^(1)_{m-1} ... Γ^(1)_{s_0} Π^(1)_{s_0} H^(1) = U^(1)

where U^(1) is upper triangular.
Note that in this case to obtain Π_j^(1) it is only necessary to compare two
elements. Thus the storage required is very small: (m - s_0) multipliers γ_i^(1) and
(m - s_0) bits to indicate whether or not interchanges are necessary.
All elements in the computation are bounded, and so we have good numerical
accuracy throughout. The whole procedure compares favourably with standard forms,
for example, the product form of the inverse where no account of numerical accuracy
is taken. Further this procedure requires fewer operations than the method which
uses the product form of the inverse. If we consider the steps involved, forward
and backward substitution with L^(0) and U^(i) require a total of m² multiplications,
and the application of the remaining transformations in (L^(i))^{-1} requires at most
i(m - 1) multiplications. (If we assume that on the average the middle column of
the basis matrix is eliminated, then this will be closer to (i/2)(m - 1).) Thus
a total of m² + i(m - 1) multiplications are required to solve the system at each
stage, assuming an initial factorization is available. Note that if the matrix A
is sparse, then the algorithm can make use of this structure, as is done in the
method using the product form of the inverse.
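The column-update idea of this section can be sketched as follows. In this minimal version (our names), H = L^{-1} B_new is upper Hessenberg in its trailing columns and O(m²) eliminations restore triangular form; for brevity it works with a dense L and omits the row interchanges that the stable algorithm of the text would interleave.

```python
def forward_solve(L, v):
    """Solve the lower triangular system L y = v."""
    m = len(L)
    y = [0.0] * m
    for i in range(m):
        y[i] = (v[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    return y

def replace_column(L, U, s, new_col):
    """Given B = L U, drop column s of B and append `new_col`; return updated
    factors (Lnew, Unew) with Lnew Unew = B_new.  No pivoting (see text)."""
    m = len(U)
    h_last = forward_solve(L, new_col)
    # H = L^{-1} B_new: surviving columns of U shifted left, then L^{-1} new_col
    H = [[U[i][j] for j in range(m) if j != s] + [h_last[i]] for i in range(m)]
    Lnew = [row[:] for row in L]
    for k in range(s, m - 1):                  # eliminate the subdiagonal of H
        f = H[k + 1][k] / H[k][k]
        for j in range(k, m):
            H[k + 1][j] -= f * H[k][j]         # Gamma_k applied to H
        for i in range(m):
            Lnew[i][k] += f * Lnew[i][k + 1]   # fold Gamma_k^{-1} into L
    return Lnew, H
```

Only columns s through m - 1 are touched, which is the source of the O(m²) cost per exchange.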
4. Iterative refinement of the solution
Consider the set of equations

    B x = v

and suppose that x̃ is a computed approximation to x. Let

    x = x̃ + c .

Therefore,

    B(x̃ + c) = v ,

that is,

    B c = v - B x̃ .

We can now solve for c very efficiently, since the LU decomposition of B is
available. This process can be repeated until x is obtained to the required
accuracy. The algorithm can be outlined as follows:
(i) Compute r_j = v - B x_j
(ii) Solve B c_j = r_j
(iii) Compute x_{j+1} = x_j + c_j
It is necessary for r_j to be computed in double precision and then rounded to
single precision. Note that step (ii) requires O(m²) operations, since the LU de-
composition of B is available. This procedure can be used in the following sections.
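Steps (i)-(iii) can be sketched as below. This is only a structural sketch (names are ours): the correction in step (ii) is obtained by a fresh dense solve standing in for the O(m²) triangular solves with the stored LU factors, and the residual is accumulated in ordinary Python floats rather than in double the working precision.

```python
def solve(B, v):
    """Dense Gaussian elimination with partial pivoting (a stand-in for
    forward/back substitution with the stored LU factors)."""
    m = len(B)
    a = [row[:] + [r] for row, r in zip(B, v)]
    for k in range(m):
        p = max(range(k, m), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, m):
            f = a[i][k] / a[k][k]
            for j in range(k, m + 1):
                a[i][j] -= f * a[k][j]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        x[i] = (a[i][m] - sum(a[i][j] * x[j] for j in range(i + 1, m))) / a[i][i]
    return x

def refine(B, v, x, steps=3):
    """Iterative refinement of an approximate solution x of B x = v."""
    m = len(B)
    for _ in range(steps):
        r = [v[i] - sum(B[i][j] * x[j] for j in range(m)) for i in range(m)]  # (i)
        c = solve(B, r)                                                        # (ii)
        x = [x[i] + c[i] for i in range(m)]                                    # (iii)
    return x
```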
5. Householder Triangularization
Householder transformations have been widely discussed in the literature. In
this section we are concerned with their use in reducing a matrix A to upper-
triangular form, and in particular we wish to show how to update the decomposition
of A when its columns are changed one by one. This will open the way to implemen-
tation of efficient and stable algorithms for solving problems involving linear
constraints.
Householder transformations are symmetric orthogonal matrices of the form
P_k = I - β_k u_k u_k^T, where u_k is a vector and β_k = 2/(u_k^T u_k). Their utility in this
context is due to the fact that for any non-zero vector a it is possible to choose
u_k in such a way that the transformed vector P_k a is zero except for its first
element. Householder [15] used this property to construct a sequence of transfor-
mations to reduce a matrix to upper-triangular form. In [29], Wilkinson describes
the process and his error analysis shows it to be very stable.
Given any A, we can construct a sequence of transformations such that A is
reduced to upper triangular form. Premultiplying by P_0 annihilates (m - 1)
elements in the first column. Similarly, premultiplying by P_1 eliminates (m - 2)
elements in the second column, and so on.
Therefore,

    P_{n-1} P_{n-2} ... P_1 P_0 A = ( R )          (5.1)
                                    ( 0 )

where R is an upper triangular matrix.
Since the product of orthogonal matrices is an orthogonal matrix, we can
write (5.1) as

    Q A = ( R ) ,        A = Q^T ( R ) .
          ( 0 )                  ( 0 )
The above process is close to the Gram-Schmidt process in that it produces
a set of orthogonal vectors spanning E_n. In addition, the Householder transforma-
tion produces a complementary set of vectors which is often useful. Since this
process has been shown to be numerically stable, it does produce an orthogonal
matrix, in contrast to the Gram-Schmidt process.
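The triangularization reads as follows in Python. This is a bare sketch with our own names, without the column interchanges discussed below; Q is kept in factored form as the pairs (β_k, u_k), as suggested later for storage beneath R.

```python
import math

def house(x):
    """Return (beta, u) with (I - beta u u^T) x = (-+||x||, 0, ..., 0)."""
    sigma = math.sqrt(sum(v * v for v in x))
    u = x[:]
    u[0] += math.copysign(sigma, x[0])     # avoids cancellation in u[0]
    beta = 2.0 / sum(v * v for v in u) if sigma else 0.0
    return beta, u

def householder_qr(A):
    """Reduce A (m x n, m >= n) to upper triangular R; return the factored
    Q as a list of (k, beta, u) triples together with R."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    factors = []
    for k in range(min(n, m - 1)):
        beta, u = house([R[i][k] for i in range(k, m)])
        for j in range(k, n):              # apply P_k to the trailing columns
            s = beta * sum(u[i - k] * R[i][j] for i in range(k, m))
            for i in range(k, m):
                R[i][j] -= s * u[i - k]
        factors.append((k, beta, u))
    return factors, R
```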
If A = (a_1, ..., a_n) is an m × n matrix of rank r, then at the k-th stage of the
triangularization (k < r) we have

    A^(k) = P_{k-1} P_{k-2} ... P_0 A = ( R_k  S_k )
                                        ( 0    T_k )

where R_k is an upper-triangular matrix of order k.
The next step is to compute A^(k+1) = P_k A^(k), where P_k is chosen to reduce the
first column of T_k to zero except for the first component. This component becomes
the last diagonal element of R_{k+1}, and since its modulus is equal to the Euclidean
length of the first column of T_k it should in general be maximized by a suitable
interchange of the remaining columns. After r steps, T_r will be effectively zero
(the length of each of its columns will be smaller than some tolerance) and the
process stops.
Hence we conclude that if rank(A) = r then for some permutation matrix Π the
Householder decomposition (or "QR decomposition") of A is

    Q A Π = P_{r-1} P_{r-2} ... P_0 A Π = ( R  S )
                                          ( 0  0 )

where Q = P_{r-1} P_{r-2} ... P_0 is an m × m orthogonal matrix and R is upper-triangular
and non-singular.
We are now concerned with the manner in which Q should be stored and the
means by which Q, R, S may be updated if the columns of A are changed. We will
suppose that a column a_p is deleted from A and that a column a_q is added. It will
be clear what is to be done if only one or the other takes place.
Since the Householder transformations P_k are defined by the vectors u_k, the
usual method is to store the u_k's in the area beneath R, with a few extra words of
memory being used to store the β_k's and the diagonal elements of R. The product
Q a for some vector a is then easily computed in the form P_{r-1} P_{r-2} ... P_0 a where,
for example, P_0 a = (I - β_0 u_0 u_0^T) a = a - β_0 (u_0^T a) u_0. The updating is best
accomplished as follows. The first p - 1 columns of the new R are the same as before;
the other columns p through n are simply overwritten by columns a_{p+1}, ..., a_n, a_q
and transformed by the product P_{p-1} P_{p-2} ... P_0 to obtain a new

    ( S_{p-1} )
    ( T_{p-1} )

then T_{p-1} is triangularized as usual.
This method allows Q to be kept in product form always, and there is no accumula-
tion of errors. Of course, if p = 1 the complete decomposition must be re-done,
and since with m ≥ n the work is roughly proportional to (m - n/3)n² this can mean
a lot of work. But if p ≈ n/2 on the average, then only about 1/8 of the original
work must be repeated at each updating.
Assume that we have a matrix A which is to be replaced by a matrix Ā formed
from A by eliminating column a_p and inserting a new vector a as the last column.
As in the simplex method, we can produce an updating procedure using Householder
transformations. If Ā is premultiplied by Q, the resulting matrix Q Ā has upper
Hessenberg form, as before.
As before, this can be reduced to an upper triangular matrix in O(m²)
multiplications.
6. Projections
In optimization problems involving linear constraints it is often necessary
to compute the projections of some vector either into or orthogonal to the space
defined by a subset of the constraints (usually the current "basis"). In this
section we show how Householder transformations may be used to compute such pro-
jections. As we have shown, it is possible to update the Householder decomposi-
tion of a matrix when the number of columns in the matrix is changed, and thus we
will have an efficient and stable means of orthogonalizing vectors with respect to
basis sets whose component vectors are changing one by one.
Let the basis set of vectors a_1, a_2, ..., a_n form the columns of an m × n
matrix A, and let S_r be the sub-space spanned by {a_i}. We shall assume that the
first r vectors are linearly independent and that rank(A) = r. In general,
m > n > r, although the following is true even if m < n.
Given an arbitrary vector z we wish to compute the projections

    u = P z ,    v = (I - P) z

for some projection matrix P, such that
(a) z = u + v
(b) u^T v = 0
(c) u ∈ S_r (i.e., there exists x such that u = A x)
(d) v is orthogonal to S_r (i.e., A^T v = 0).
One method is to write P as A A⁺ where A⁺ is the n × m generalized inverse of A,
and in [7] Fletcher shows how A⁺ may be updated upon changes of basis. In contrast,
the method based on Householder transformations does not deal with A⁺ explicitly
but instead keeps A A⁺ in factorized form and simply updates the orthogonal matrix
required to produce this form. Apart from being more stable and just as efficient,
the method has the added advantage that there are always two orthonormal sets of
vectors available, one spanning S_r and the other spanning its complement.
As already shown, we can construct an m × m orthogonal matrix Q such that

           r   n-r
    Q A = ( R   S )
          ( 0   0 )

where R is an r × r upper-triangular matrix. Let

    w = Q z = ( w_1 )  r
              ( w_2 )  m-r          (6.1)

and define

    u = Q^T ( w_1 ) ,    v = Q^T ( 0   )          (6.2)
            ( 0   )              ( w_2 )

Then it is easily verified that u, v are the required projections of z, which is to
say they satisfy the above four properties. Also, the x in (c) is readily shown
to be given by x^T = ((R^{-1} w_1)^T, 0).
In effect, we are representing the projection matrices in the form

    P = Q^T ( I_r  0 ) Q          (6.3)
            ( 0    0 )

and

    I - P = Q^T ( 0  0       ) Q          (6.4)
                ( 0  I_{m-r} )

and we are computing u = P z, v = (I - P) z by means of (6.1), (6.2). The first r
columns of Q^T span S_r and the remaining m - r span its complement. Since Q and R
may be updated accurately and efficiently if they are computed using Householder
transformations, we have as claimed the means of orthogonalizing vectors with
respect to varying bases.
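The computation (6.1)-(6.2) can be sketched directly on top of the Householder triangularization. The following Python (our names; full column rank r = n assumed, no column interchanges) returns both projections, applying Q and Q^T only in factored form:

```python
import math

def house(x):
    """Return (beta, u) defining the Householder reflector for x."""
    sigma = math.sqrt(sum(v * v for v in x))
    u = x[:]
    u[0] += math.copysign(sigma, x[0])
    beta = 2.0 / sum(v * v for v in u) if sigma else 0.0
    return beta, u

def projections(A, z):
    """Split z into u in the column space of A and v orthogonal to it,
    via (6.1)-(6.2).  A is m x n with full column rank."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    factors = []
    for k in range(min(n, m - 1)):         # triangularize A, keeping Q factored
        beta, uk = house([R[i][k] for i in range(k, m)])
        for j in range(k, n):
            s = beta * sum(uk[i - k] * R[i][j] for i in range(k, m))
            for i in range(k, m):
                R[i][j] -= s * uk[i - k]
        factors.append((k, beta, uk))
    w = z[:]                               # (6.1): w = Q z = P_{r-1} ... P_0 z
    for k, beta, uk in factors:
        s = beta * sum(uk[i - k] * w[i] for i in range(k, m))
        for i in range(k, m):
            w[i] -= s * uk[i - k]
    def apply_qt(y):                       # Q^T y = P_0 ... P_{r-1} y
        y = y[:]
        for k, beta, uk in reversed(factors):
            s = beta * sum(uk[i - k] * y[i] for i in range(k, m))
            for i in range(k, m):
                y[i] -= s * uk[i - k]
        return y
    u_proj = apply_qt(w[:n] + [0.0] * (m - n))   # (6.2): u = Q^T (w_1, 0)
    v_proj = apply_qt([0.0] * n + w[n:])         # (6.2): v = Q^T (0, w_2)
    return u_proj, v_proj
```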
As an example of the use of the projection (6.4), consider the problem of
finding the stationary values of x^T A x subject to x^T x = 1 and C^T x = 0, where A is a
real symmetric matrix of order n and C is an n × p matrix of rank r, with r ≤ p < n.
It is shown in [12] that if the usual Householder decomposition of C is

            r   p-r
    Q C = ( R   S )
          ( 0   0 )

then the problem is equivalent to that of finding the eigenvalues and eigenvectors
of the matrix P̄ A, where

    P̄ = I - P = Q^T ( 0  0       ) Q
                    ( 0  I_{n-r} )

is the projection matrix in (6.4).
Note that, although P̄ A is not symmetric, since P̄² = P̄, then

    P̄ A = P̄² A

and further the eigenvalues of P̄² A are equal to the eigenvalues of the symmetric
matrix P̄ A P̄. The dimensionality of the problem is not reduced; some of the
eigenvalues will be zero.
7. Linear least-squares problem
The least-squares problem to be considered here is

    min_x || b - A x ||_2

where we assume that the rank of A is n.
Since length is invariant under an orthogonal transformation we have

    || b - A x ||_2² = || Q b - Q A x ||_2²

where Q A = ( R ) .  Let
            ( 0 )

    Q b = ( c_1 )  n
          ( c_2 )  m-n

Then,

    || b - A x ||_2² = || c_1 - R x ||² + || c_2 ||²

and the solution to the least-squares problem is given by

    x̂ = R^{-1} c_1 .

Thus it is easy to solve the least-squares problem using orthogonal transformations.
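Concretely, the orthogonal-transformation solution can be sketched by carrying b along as an extra column while A is triangularized, then back-substituting on R x = c_1. This is a minimal sketch in our own notation, assuming A is m × n with m ≥ n and rank n.

```python
import math

def lstsq_qr(A, b):
    """Solve min ||b - Ax||_2 by Householder triangularization of [A | b]
    followed by back substitution on R x = c_1."""
    m, n = len(A), len(A[0])
    R = [row[:] + [bi] for row, bi in zip(A, b)]   # work on [A | b]
    for k in range(min(n, m - 1)):
        x = [R[i][k] for i in range(k, m)]
        sigma = math.sqrt(sum(v * v for v in x))
        u = x[:]
        u[0] += math.copysign(sigma, x[0])
        beta = 2.0 / sum(v * v for v in u) if sigma else 0.0
        for j in range(k, n + 1):                  # transform A and b together
            s = beta * sum(u[i - k] * R[i][j] for i in range(k, m))
            for i in range(k, m):
                R[i][j] -= s * u[i - k]
    xhat = [0.0] * n                               # back substitution: R x = c_1
    for i in range(n - 1, -1, -1):
        xhat[i] = (R[i][n] - sum(R[i][j] * xhat[j]
                                 for j in range(i + 1, n))) / R[i][i]
    return xhat
```

The rows of the transformed right-hand side below row n are c_2, whose length is the residual norm.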
Alternatively, the least-squares problem can be solved by constructing the
normal equations

    A^T A x = A^T b .

However these are well known to be ill-conditioned.
Nevertheless the normal equations can be used in the following way.
Let the residual vector r be defined by:

    r = b - A x̂ .

Then,

    A^T r = A^T b - A^T A x̂ = 0 .
These equations can be written:

    ( I    A ) ( r )   ( b )
    ( A^T  0 ) ( x ) = ( 0 )          (7.1)

Thus, substituting A = Q^T (R; 0), where (R; 0) denotes R stacked above a zero
block, we obtain

    (    I      (R; 0) ) ( r̃ )   ( c )
    ( (R^T 0)     0    ) ( x ) = ( 0 )

where r̃ = Q r and c = Q b.
This system can easily be solved for r̃ and x. The method of iterative refine-
ment may be applied to obtain a very accurate solution.
This method has been analysed by Björck [2].
8. Least-squares problem with linear constraints
Here we consider the problem
    minimize || b - A x ||_2²
    subject to G x = h .

Using Lagrange multipliers λ, we may incorporate the constraints into
equation (7.1) and obtain

    ( 0    0    G ) ( λ )   ( h )
    ( 0    I    A ) ( r ) = ( b )
    ( G^T  A^T  0 ) ( x )   ( 0 )
The methods of the previous sections can be applied to obtain the solution of this
system of equations, without actually constructing the above matrix. The problem
simplifies and a very accurate solution may be obtained.
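As an illustration, the augmented system above can be assembled and solved directly. The following minimal sketch (our names) uses a dense Gaussian-elimination solve, where a careful implementation would instead exploit the structure via the orthogonalization techniques of the previous sections.

```python
def solve(M, rhs):
    """Dense Gaussian elimination with partial pivoting."""
    nn = len(M)
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    for k in range(nn):
        p = max(range(k, nn), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, nn):
            f = a[i][k] / a[k][k]
            for j in range(k, nn + 1):
                a[i][j] -= f * a[k][j]
    y = [0.0] * nn
    for i in range(nn - 1, -1, -1):
        y[i] = (a[i][nn] - sum(a[i][j] * y[j] for j in range(i + 1, nn))) / a[i][i]
    return y

def constrained_lstsq(A, b, G, h):
    """min ||b - Ax||_2 subject to Gx = h, via the augmented system
    [0 0 G; 0 I A; G^T A^T 0] (lambda, r, x) = (h, b, 0)."""
    m, n, p = len(A), len(A[0]), len(G)
    N = p + m + n
    M = [[0.0] * N for _ in range(N)]
    rhs = [0.0] * N
    for i in range(p):                       # block row 1:  G x = h
        for j in range(n):
            M[i][p + m + j] = G[i][j]
        rhs[i] = h[i]
    for i in range(m):                       # block row 2:  r + A x = b
        M[p + i][p + i] = 1.0
        for j in range(n):
            M[p + i][p + m + j] = A[i][j]
        rhs[p + i] = b[i]
    for j in range(n):                       # block row 3:  G^T lam + A^T r = 0
        for i in range(p):
            M[p + m + j][i] = G[i][j]
        for i in range(m):
            M[p + m + j][p + i] = A[i][j]
    y = solve(M, rhs)
    return y[p + m:]                         # the x-part of the solution
```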
Now we consider the problem
    minimize || b - A x ||_2²
    subject to G x ≥ h .

Such a problem might arise in the following manner. Suppose we wish to approximate
given data by the polynomial

    y(t) = αt³ + βt² + γt + δ

such that y(t) is convex. This implies

    y″(t) = 6αt + 2β ≥ 0 .

Thus, we require

    6αt_i + 2β ≥ 0

where the t_i are the data points. (This does not necessarily guarantee that the
polynomial will be convex throughout the interval.) Introduce slack variables w
such that

    G x - w = h

where w ≥ 0.
Introducing Lagrange multipliers as before, we may write the system as:

    ( 0    0    G   -I ) ( λ )   ( h )
    ( 0    I    A    0 ) ( r ) = ( b )
    ( G^T  A^T  0    0 ) ( x )   ( 0 )
                         ( w )

At the solution, we must have

    λ ≥ 0 ,   w ≥ 0 ,   λ^T w = 0 .
This implies that when a Lagrange multiplier is non-zero then the corresponding
constraint holds with equality.
Conversely, corresponding to a non-zero w_i the Lagrange multiplier must be
zero. Therefore, if we knew which constraints held with equality at the solution,
we could treat the problem as a linear least-squares problem with linear equality
constraints. A technique, due to Cottle and Dantzig [5], exists for solving the
problem in this way.
Bibliography
[1] Beale, E. M. L., "Numerical Methods", in Nonlinear Programming, J. Abadie (ed.),
John Wiley, New York, 1967; pp. 133-205.
[2] Björck, Å., "Iterative Refinement of Linear Least Squares Solutions II", BIT 8
(1968), pp. 8-30.
[3] Björck, Å., and G. H. Golub, "Iterative Refinement of Linear Least Squares
Solutions by Householder Transformations", BIT 7 (1967), pp. 322-37.
[4] Björck, Å., and V. Pereyra, "Solution of Vandermonde Systems of Equations",
Publication 70-02, Universidad Central de Venezuela, Caracas, Venezuela, 1970.
[5] Cottle, R. W., and G. B. Dantzig, "Complementary Pivot Theory of Mathematical
Programming", Mathematics of the Decision Sciences, Part 1, G. B. Dantzig and
A. F. Veinott (eds.), American Mathematical Society (1968), pp. 115-136.
[6] Dantzig, G. B., R. P. Harvey, R. D. McKnight, and S. S. Smith, "Sparse Matrix
Techniques in Two Mathematical Programming Codes", Proceedings of the Symposium
on Sparse Matrices and Their Applications, T. J. Watson Research Publication
RA1, no. 11707, 1969.
[7] Fletcher, R., "A Technique for Orthogonalization", J. Inst. Maths. Applics. 5
(1969), pp. 162-66.
[8] Forsythe, G. E., and G. H. Golub, "On the Stationary Values of a Second-Degree
Polynomial on the Unit Sphere", J. SIAM, 13 (1965), pp. 1050-68.
[9] Forsythe, G. E., and C. B. Moler, Computer Solution of Linear Algebraic Systems,
Prentice-Hall, Englewood Cliffs, New Jersey, 1967.
[10] Francis, J., "The QR Transformation. A Unitary Analogue to the LR Transforma-
tion," Comput. J. 4 (1961-62), pp. 265-71.
[11] Golub, G. H., and C. Reinsch, "Singular Value Decomposition and Least Squares
Solutions", Numer. Math., 14(1970), pp. 403-20.
[12] Golub, G. H., and R. Underwood, "Stationary Values of the Ratio of Quadratic
Forms Subject to Linear Constraints", Technical Report No. CS 142, Computer
Science Department, Stanford University, 1969.
[13] Hanson, R. J., "Computing Quadratic Programming Problems: Linear Inequality
and Equality Constraints", Technical Memorandum No. 240, Jet Propulsion
Laboratory, Pasadena, California, 1970.
[14] Hanson, R. J., and C. L. Lawson, "Extensions and Applications of the House-
holder Algorithm for Solving Linear Least Squares Problems", Math. Comp., 23
(1969), pp. 787-812.
[15] Householder, A. S., "Unitary Triangularization of a Nonsymmetric Matrix",
J. Assoc. Comp. Mach., 5 (1958), pp. 339-42.
[16] Lanczos, C., Linear Differential Operators, Van Nostrand, London, 1961,
Chapter 3.
[17] Leringe, Ö., and P. Wedin, "A Comparison Between Different Methods to Compute
a Vector x Which Minimizes ||Ax - b||_2 When Gx = h", Technical Report, Depart-
ment of Computer Sciences, Lund University, Sweden.
[18] Levenberg, K., "A Method for the Solution of Certain Non-Linear Problems in
Least Squares", Quart. Appl. Math., 2 (1944), pp. 164-68.
[19] Marquardt, D. W., "An Algorithm for Least-Squares Estimation of Non-Linear
Parameters", J. SIAM, 11 (1963), pp. 431-41.
[20] Meyer, R. R., "Theoretical and Computational Aspects of Nonlinear Regression",
P-1819, Shell Development Company, Emeryville, California.
[21] Penrose, R., "A Generalized Inverse for Matrices", Proceedings of the
Cambridge Philosophical Society, 51 (1955), pp. 406-13.
[22] Peters, G., and J. H. Wilkinson, "Eigenvalues of Ax = λBx with Band Symmetric
A and B", Comput. J., 12 (1969), pp. 398-404.
[23] Powell, M.J.D., "Rank One Methods for Unconstrained Optimization", T. P. 372,
Atomic Energy Research Establishment, Harwell, England, (1969).
[24] Rosen, J. B., "Gradient Projection Method for Non-linear Programming. Part
I. Linear Constraints", J. SIAM, 8 (1960), pp. 181-217.
[25] Shanno, D. C., "Parameter Selection for Modified Newton Methods for Function
Minimization", J. SIAM Numer. Anal., Ser. B, 7 (1970).
[26] Stoer, J., "On the Numerical Solution of Constrained Least Squares Problems",
(private communication), 1970.
[27] Tewarson, R. P., "The Gaussian Elimination and Sparse Systems", Proceedings
of the Symposium on Sparse Matrices and Their Applications, T. J. Watson
Research Publication RA1, no. 11707, 1969.
[28] Wilkinson, J. H., "Error Analysis of Direct Methods of Matrix Inversion",
J. Assoc. Comp. Mach., 8 (1961), pp. 281-330.
[29] Wilkinson, J. H., "Error Analysis of Transformations Based on the Use of
Matrices of the Form I - 2ww^H", in Error in Digital Computation, Vol. II,
L. B. Rall (ed.), John Wiley and Sons, Inc., New York, 1965, pp. 77-101.
[30] Wilkinson, J. H., The Algebraic Eigenvalue Problem, Clarendon Press, Oxford,
1965.
[31] Zoutendijk, G., Methods of Feasible Directions, Elsevier Publishing Company,
Amsterdam (1960), pp. 80-90.