
SIAM J. OPTIMIZATION Vol. 2, No. 4, pp. 575-601, November 1992

© 1992 Society for Industrial and Applied Mathematics

ON THE IMPLEMENTATION OF A PRIMAL-DUAL INTERIOR POINT METHOD*

SANJAY MEHROTRA†

Abstract. This paper gives an approach to implementing a second-order primal-dual interior point method. It uses a Taylor polynomial of second order to approximate a primal-dual trajectory. The computations for the second derivative are combined with the computations for the centering direction. Computations in this approach do not require that primal and dual solutions be feasible. Expressions are given to compute all the higher-order derivatives of the trajectory of interest. The implementation ensures that a suitable potential function is reduced by a constant amount at each iteration.

There are several salient features of this approach. An adaptive heuristic for estimating the centering parameter is given. The approach used to compute the step length is also adaptive. A new practical approach to compute the starting point is given. This approach treats primal and dual problems symmetrically.

Computational results on a subset of problems available from netlib are given. On mutually tested problems the results show that the proposed method requires approximately 40 percent fewer iterations than the implementation proposed in Lustig, Marsten, and Shanno [Tech. Rep. TR J-89-11, Georgia Inst. of Technology, Atlanta, 1989]. It requires approximately 50 percent fewer iterations than the dual affine scaling method in Adler, Karmarkar, Resende, and Veiga [Math. Programming, 44 (1989), pp. 297-336], and 35 percent fewer iterations than the second-order dual affine scaling method in the same paper. The new approaches for estimating the centering parameter and finding the step length and the starting point have contributed to the reduction in the number of iterations. However, the contribution due to the use of the second derivative is most significant.

On the tested problems, on the average the implementation was found to be approximately two times faster than OB1 (version 02/90) described in Lustig, Marsten, and Shanno and 2.5 times faster than MINOS 5.3 described in Murtagh and Saunders [Tech. Rep. SOL 83-20, Dept. of Operations Research, Stanford Univ., Stanford, CA, 1983].

Key words. linear programming, interior point methods, primal-dual methods, power series methods, predictor-corrector methods

AMS(MOS) subject classifications. 90C05, 90C06, 90C20, 49M15, 49M35

1. Introduction. This paper considers interior point algorithms for simultaneously solving the primal linear program:

minimize c^T x
(P) s.t. Ax = b,
    x ≥ 0,

and its dual

maximize b^T π
(D) s.t. A^T π + s = c,
    s ≥ 0,

where c, x, s ∈ R^n, π, b ∈ R^m, and A ∈ R^{m×n}. It is assumed that A has full row rank. This can be ensured by removing the linearly dependent rows in the beginning. The

* Received by the editors June 18, 1990; accepted for publication (in revised form) August 30, 1991. This paper was presented at the Second Asilomar Workshop on Progress in Mathematical Programming, Monterey, CA, February 5-7, 1990. This research was supported in part by National Science Foundation research initiation grant CCR-8810107 and by a grant from the GTE Laboratories.

† Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208 ([email protected]).


primal-dual methods, which use solutions of both (P) and (D) in the scaling matrix, are of primary interest.
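The full-row-rank preprocessing mentioned above can be sketched as follows. This is a hypothetical helper (`drop_dependent_rows` is not part of the paper's code), and it uses a brute-force rank test purely for illustration; a production code would detect dependent rows during a sparse factorization instead.

```python
import numpy as np

def drop_dependent_rows(A, b, tol=1e-10):
    """Keep a maximal linearly independent subset of the rows of A
    (and the matching entries of b), so the retained system A x = b
    has full row rank as the paper assumes."""
    keep, rank = [], 0
    for i in range(A.shape[0]):
        # keep row i only if it increases the rank of the kept rows
        if np.linalg.matrix_rank(A[keep + [i]], tol=tol) > rank:
            keep.append(i)
            rank += 1
    return A[keep], b[keep]
```

For example, a constraint that is the sum of two earlier constraints is silently dropped together with its right-hand side entry.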

The primal-dual algorithms have their roots in Megiddo [21]. These were further developed and analyzed by Kojima, Mizuno, and Yoshise [15] and Monteiro and Adler [27]. They showed that the central trajectory can be followed to the optimal solution in O(√n L) iterations by taking "short steps." Kojima, Mizuno, and Yoshise [16] showed that the primal-dual potential function [31], which is a variant of Karmarkar's potential function [13], can also be reduced by a constant amount at each iteration, and therefore they developed a primal-dual large step potential reduction algorithm.

McShane, Monma, and Shanno [20] were the first to develop an implementation of this method. They found it to be a viable alternative to the then popular dual affine scaling method [1], [26] for solving large sparse problems. They also found that this method typically takes fewer iterations than the dual affine scaling method. However, it was found to be only competitive with the dual affine scaling method because of the additional computations that their implementation had to perform. This implementation created some artificial problems (by adding artificial variables/constraints to the original problem) and maintained primal and dual feasible solutions of these problems. Further developments on this implementation were reported in Choi, Monma, and Shanno [4].

Lustig, Marsten, and Shanno [18] implemented a variant of the primal-dual method which is based on an earlier work of Lustig [17]. This method is developed by considering the Newton direction on the optimality conditions for the logarithmic barrier problem. An important feature of their approach is that it did not explicitly require that feasible solutions for the primal or the dual problem be available. They showed that the resulting direction is a particular combination of the primal-dual affine scaling direction, a feasibility direction, and a centering direction. In support of their method they reported success in solving all the problems in the netlib [7] test set.

This paper builds on the work of Lustig, Marsten, and Shanno [18]. In doing so it makes use of the work of Monteiro, Adler, and Resende [28] and Karmarkar, Lagarias, Slutsman, and Wang [14]. The discussion in this paper assumes that direct methods are preferred over iterative methods for solving the linear equations arising at each iteration of the algorithm. In our view, the following accomplishments are reported in this paper:

It gives an algorithm and describes its implementation by using first and second derivatives of the primal-dual affine scaling trajectory to construct a Taylor polynomial, and by effectively combining the second derivative with a centering direction.

A comparison with the results reported in the literature (on mutually tested problems) shows that the method developed in this paper takes approximately 50 percent fewer iterations than the dual affine scaling method as implemented by Adler, Karmarkar, Resende, and Veiga [1], 40 percent fewer iterations than the primal-dual method implemented in Lustig, Marsten, and Shanno [18], and 55 percent fewer iterations than the logarithmic barrier function method implemented in Gill, Murray, and Saunders [8]. It requires 35 percent fewer iterations than the second-order dual affine scaling method implemented in Adler, Karmarkar, Resende, and Veiga [1] and 20 percent fewer iterations than the "optimal three-dimensional method" implemented by Domich, Boggs, Donaldson, and Witzgall [5].

An efficient preliminary implementation of the proposed approach was developed. On average, it was found to be two times faster than OB1 (version 02/1990) [18]. On average, it was also found to be 2.5 times faster than MINOS 5.3.


While developing our implementation we ensure that a suitable potential function is reduced by a constant amount. This is accomplished by taking a two-tier approach. Most of the work is performed at the first level, which uses extensive heuristic arguments. The second level ensures the robustness of the implementation.

It gives expressions for computing first and higher derivatives of a primal-dual trajectory. This trajectory starts from any positive point and goes to the optimum.

It gives an adaptive approach to computing the centering parameter.

It gives a modified heuristic for computing the step length at each iteration. The approach allows us to adaptively take steps much closer to the boundary.

It gives an approach to generating primal and dual starting points, which treats these problems symmetrically.

We find it convenient to outline the proposed approach first. This is done in the next section. The organization of this paper is given in that section. The following notation and terminology is used throughout this paper.

Notation and terminology. x^k, π^k, and s^k represent the estimates of the solutions of (P) and (D) at the beginning of iteration k. X^k and S^k are used to represent diagonal matrices whose elements are x_1^k, ..., x_n^k and s_1^k, ..., s_n^k, respectively. ξ_x^k ≡ Ax^k − b and ξ_s^k ≡ A^T π^k + s^k − c; ξ_x and ξ_s are referred to as error vectors. D^2 represents the matrix (S^k)^{-1} X^k. e is used to represent a vector of all ones, and e_i represents the ith column of an identity matrix. ‖·‖ is used to represent the Euclidean norm of a vector.

The term search direction in the primal space is used for a direction p_x, which is constructed from a combination of the directions (p_x1, p_x2). The term primal blocking variable is used for a variable that will first become negative when moving in a direction. The term step factor represents the fraction of the step length that makes the blocking variable zero. Similar terminology is used for directions in the dual space.
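The blocking-variable computation above is a ratio test. The following hypothetical helper (`step_factor` is an illustrative name, not the paper's code) assumes the convention used in this paper, where iterates move as z − t·p, so only components with p_i > 0 can become negative.

```python
import numpy as np

def step_factor(z, p):
    """Return the step t that drives the blocking variable of z - t*p
    to zero, together with the index of that blocking variable."""
    block = p > 0
    if not np.any(block):
        return np.inf, None              # no blocking variable
    ratios = z[block] / p[block]
    j = int(np.argmin(ratios))
    idx = int(np.flatnonzero(block)[j])  # index of the blocking variable
    return float(ratios[j]), idx
```

An implementation then takes a fraction of this step (the step factor) so the iterate stays strictly positive.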

The central trajectory is the set of feasible points in (P) and (D) satisfying x_i(μ)s_i(μ) = μ, for i = 1, ..., n. In all references to the central trajectory we assume that x(μ), s(μ) exists for all μ > 0.

2. Implementation of an interior point method. This section outlines the approach we take to implement an interior point method. We do this to provide a complete picture of this paper and to fix certain additional notation used throughout. Various steps in the development of this implementation are discussed in more detail in §§3-7. The procedure is outlined in Exhibit 2.1 and it is called AIPM (an interior point method). We now discuss the procedure.

Procedure AIPM
Input: Let x^0 > 0, s^0 > 0, and π^0 be the given starting points.
For k = 0, 1, ... until a stopping criterion is satisfied do:

Step 0
    ξ_s^k := A^T π^k + s^k − c,
    ξ_x^k := Ax^k − b,
    D^2 := (S^k)^{-1} X^k.

Step 1
c   find the first derivative of the primal-dual affine scaling trajectory.
    p_π1 := −(AD^2A^T)^{-1}(b − AD^2 ξ_s^k),
    p_s1 := ξ_s^k − A^T p_π1,
    p_x1 := x^k − D^2 p_s1.

Step 2
c   compute the centering parameter μ^k.
    CALL CENPAR(x^k, p_x1, s^k, p_s1, μ^k).

Step 3
c   compute the second derivative of the primal-dual trajectory with the centering direction.
    v_i := −2 * ((p_x1)_i * (p_s1)_i − μ^k)/s_i^k for i = 1, 2, ..., n,
    p_π2 := (AD^2A^T)^{-1} A v,
    p_s2 := A^T p_π2,
    p_x2 := v − D^2 p_s2.

Step 4
c   construct a Taylor polynomial and find maximum steps (ε_x, ε_s) using this polynomial.
    CALL SFSOP(x^k, p_x1, p_x2, ε_x, s^k, p_s1, p_s2, ε_s).

Step 5
c   construct a search direction.
    p_s := ε_s * p_s1 − .5 * ε_s^2 * p_s2,
    p_π := ε_s * p_π1 − .5 * ε_s^2 * p_π2,
    p_x := ε_x * p_x1 − .5 * ε_x^2 * p_x2.

Step 6
c   compute step factors (f_x, f_s).
    CALL GTSF(x^k, p_x, s^k, p_s, f_x, f_s).

Step 7
c   generate trial points.
    x̄ := x^k − f_x * p_x,
    s̄ := s^k − f_s * p_s,
    π̄ := π^k − f_s * p_π.

Step 8
c   test if the trial point is acceptable.
    If an appropriate potential function is reduced, then
        x^{k+1} := x̄,
        s^{k+1} := s̄,
        π^{k+1} := π̄,
    else
        perform a line search / if necessary compute additional vectors and ensure a reduction in the potential function.
    endif

EXHIBIT 2.1. A pseudo-code for implementing a second-order primal-dual method.


The approach used to generate a starting point is discussed in §7.

Given x^k, π^k, and s^k, Step 0 computes the error vectors ξ_x^k and ξ_s^k representing the amount by which primal and dual constraints are violated. D^2 is the primal-dual scaling matrix.

Step 1 computes the direction p_x1 in the primal space and the directions p_π1, p_s1 in the dual space. These directions are tangent to a primal-dual trajectory. This trajectory is discussed in §4. Expressions to compute all derivatives of this trajectory at a point are also developed in §4.

The primal and dual directions computed at Step 1 are used in Procedure CENPAR to estimate the centering parameter μ^k. Our approach to estimating the centering parameter is given in Exhibit 5.1. This approach is discussed in §5.

Step 3 computes the second derivative of the primal-dual trajectory and the centering direction. These directions could be computed separately. However, in the current implementation we prefer to combine their computation in order to save a forward and a back solve. We use the tangent direction and the direction in Step 3 to construct a Taylor polynomial and to find a maximum step to the boundary (in the primal and dual spaces separately) using this polynomial. This is done in Procedure SFSOP given in Exhibit 4.1. The computations performed in this procedure are also discussed in §4.

In Step 5 we use the maximum step in the Taylor polynomial to generate search directions p_x, p_π, and p_s. In Procedure GTSF (Exhibit 6.1) we compute a fraction (f_x, f_s) of the total step to the boundary in the search direction. This is discussed further in §6. Using the search directions and the step factors, a trial point x̄, π̄, s̄ is generated in the primal and dual spaces.

Step 8 ensures the robustness of the overall procedure. It is loosely defined here. It depends on the choice of the function used to measure the progress of the algorithm and the best possible theoretical results that could be proved for this function. The potential function we used to ensure the progress is developed in the next section (§3), and our motivations for using it are discussed there.

If the potential function is not reduced by the desired amount at the trial points, we may perform a line search and, if necessary, compute additional directions to ensure a reduction in this function. This actually happened for the potential function we discuss in the next section. If this happens, we generate an additional three trial points by using ε_x := ε_s := min(ε_x, ε_s); ε_x := ε_x, ε_s := 0; and ε_x := 0, ε_s := ε_s in Step 5 to compute p_x, p_π, p_s. The potential function was always reduced by the desired amount at one of the new trial points. Therefore, on the tested problems, additional vectors were never computed and explicit line searches were never performed.
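The overall loop of Exhibit 2.1 can be sketched in a few lines of dense linear algebra. The sketch below is illustrative, not the paper's implementation: `mehrotra_pc` and `step_to_boundary` are hypothetical names, the CENPAR heuristic is replaced by the simple (μ_aff/μ)^3 rule often associated with predictor-corrector methods, and SFSOP/GTSF are replaced by a fixed damping factor.

```python
import numpy as np

def step_to_boundary(z, dz, eta=0.9995):
    """Largest damped step a in (0, 1] keeping z + a*dz positive."""
    neg = dz < 0
    if not np.any(neg):
        return 1.0
    return float(min(1.0, eta * np.min(-z[neg] / dz[neg])))

def mehrotra_pc(A, b, c, max_iter=50, tol=1e-8):
    """Dense predictor-corrector sketch of the structure of AIPM.
    Starts from the (generally infeasible) point x = s = e, pi = 0."""
    m, n = A.shape
    x, s, pi = np.ones(n), np.ones(n), np.zeros(m)
    for _ in range(max_iter):
        xi_x = A @ x - b                 # primal error vector
        xi_s = A.T @ pi + s - c          # dual error vector
        mu = x @ s / n
        if max(np.linalg.norm(xi_x), np.linalg.norm(xi_s), mu) < tol:
            break
        d2 = x / s                       # diagonal of D^2 = (S^k)^{-1} X^k
        M = (A * d2) @ A.T               # normal-equations matrix A D^2 A^T

        def solve(r3):
            # Newton system with complementarity right-hand side r3,
            # reduced to the normal equations in dpi.
            dpi = np.linalg.solve(M, -xi_x - A @ (r3 / s + d2 * xi_s))
            ds = -xi_s - A.T @ dpi
            dx = r3 / s - d2 * ds
            return dx, dpi, ds

        # predictor: pure affine scaling (first-derivative) direction
        dx_a, dpi_a, ds_a = solve(-x * s)
        a_p = step_to_boundary(x, dx_a)
        a_d = step_to_boundary(s, ds_a)
        # adaptive centering, in the spirit of CENPAR
        mu_aff = (x + a_p * dx_a) @ (s + a_d * ds_a) / n
        sigma = (mu_aff / mu) ** 3
        # corrector: second-order term combined with centering
        dx, dpi, ds = solve(-x * s - dx_a * ds_a + sigma * mu)
        a_p = step_to_boundary(x, dx)
        a_d = step_to_boundary(s, ds)
        x, pi, s = x + a_p * dx, pi + a_d * dpi, s + a_d * ds
    return x, pi, s
```

Both solves reuse the same factored matrix A D^2 A^T, which is why the corrector costs only one extra forward and back solve, as the text emphasizes.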

3. A potential function. In our view the interior point methods generate one or more interesting search directions at each iteration and effectively combine these directions to ensure that sufficient progress in a suitable convergence function is made. Various proposed methods differ in the directions they compute, in how they combine these directions (implicitly or explicitly), and in the convergence function they use to measure the progress [12]. Unfortunately, to our knowledge, the current theoretical understanding of these methods has not reached a point where a clear superiority of one method is established. Hence, practical implementations [1], [5], [18], [19], [20], [22], [26] rely on heuristic arguments and empirical evidence obtained from performing experiments on a set of real problems.

Use of a suitable potential function is frequently ignored while developing fast implementations. In our experience, an important reason, among others, is that the


cost of performing line searches in one- or higher-dimensional subspaces is significant on sparse problems, and it is frequently not justified by the return.

In our opinion the use of heuristic arguments is justified, but not at the cost of the robustness of the solution procedures. Hence, even though in this paper several different heuristics are proposed and their use justified solely on the basis of empirical evidence, in the actual implementation we recommend the use of a potential function.

We now develop the potential function that was used to measure progress in our implementation. We find it instructive to go through some construction to motivate this function. Some steps used in this construction appeared in Karmarkar [13] and others in Goldfarb and Mehrotra [9] and Todd and Ye [31].

Let x^0 > 0, π^0, s^0 > 0 be any given point. Let ξ_x^0 ≡ Ax^0 − b and ξ_s^0 ≡ A^T π^0 + s^0 − c.

In order to solve (P) and (D), it is enough to find an optimal solution of

(3.1)
minimize λ
s.t. Ax − λ ξ_x^0 = b,
     A^T π + s − λ ξ_s^0 = c,
     c^T x − b^T π + λ(b^T π^0 − c^T x^0) = 0,
     x_i, s_i, λ ≥ 0, i = 1, 2, ..., n.

(x^0, π^0, s^0, 1) is a feasible interior solution of (3.1). Let Z be a matrix whose columns are a basis for the null space of A. Multiplying the second set of equations in (3.1) with [A^T : Z]^T and solving for the free variables π results in

(3.2) π = (AA^T)^{-1}(Ac − As + λ A ξ_s^0);

therefore, solving (3.1) is the same as:

minimize λ
s.t. Ax − λ ξ_x^0 = b,
(PDO) Z^T s − λ Z^T ξ_s^0 = Z^T c,
     c^T x + b^T(AA^T)^{-1}As + λ ξ_c = b^T(AA^T)^{-1}Ac,
     x_i, s_i, λ ≥ 0,

where ξ_c ≡ (b^T π^0 − c^T x^0 − b^T(AA^T)^{-1}A ξ_s^0). Consider the potential function

(3.3) F(x, s, λ) = ρ ln λ − Σ_{i=1}^n ln x_i s_i

for ρ = 2n + √(2n) + 1. In the Appendix we show that F(x, s, λ) can be reduced by a constant amount (.25) at any feasible solution of (PDO).

Let ξ^k ≡ ((b − Ax^k)^T, (c − s^k)^T Z, b^T(AA^T)^{-1}A(c − s^k) − c^T x^k)^T. Note that ξ^k is λ^k times the last column in (PDO). If x^k, π^k, s^k, λ^k and x^{k+1}, π^{k+1}, s^{k+1}, λ^{k+1} are feasible solutions of (PDO), then

F(x^{k+1}, s^{k+1}, λ^{k+1}) − F(x^k, s^k, λ^k) = ρ ln (λ^{k+1}/λ^k) − Σ_{i=1}^n ln (x_i^{k+1} s_i^{k+1} / x_i^k s_i^k)

= ρ ln (‖Qξ^{k+1}‖/‖Qξ^k‖) − Σ_{i=1}^n ln (x_i^{k+1} s_i^{k+1} / x_i^k s_i^k)


for any nonsingular matrix Q ∈ R^{(n+1)×(n+1)}. An important consequence of this observation is that it ensures that the potential function

(3.4) E(x, s, ξ, Q) = ρ ln ‖Qξ‖ − Σ_{i=1}^n ln x_i s_i

can be reduced by a constant amount at each iteration. We use the following potential function

(3.5) E(x, s, ξ) = ρ ln ‖(κ_x ξ_x^T, κ_s ξ_s^T, ξ_c)‖ − Σ_{i=1}^n ln x_i s_i,

where κ_x and κ_s are some prespecified constants. The potential function (3.5) is used for the following reasons: (i) We think that a potential function of the form (3.5) is superior for developing implementations because it allows for numerical errors [10]. (ii) It is easily computable without having to know Z. (iii) It allows us to separately update primal and dual solutions and the corresponding error vectors. (iv) There is no unknown that has to be determined during the algorithm. (v) Finally, it is possible to compute directions which ensure that (3.3), and therefore (3.5), is reduced by a constant amount.
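Reason (ii) is easy to see concretely: evaluating (3.5) needs only matrix-vector products and one solve with AA^T. The following is a hypothetical sketch (`potential_35` is an illustrative name; ρ is taken from (3.3), and the last residual component is the error in the final equality constraint of (PDO)):

```python
import numpy as np

def potential_35(A, b, c, x, pi, s, kappa_x, kappa_s):
    """Evaluate the potential function (3.5) at an interior point,
    using dense algebra for clarity."""
    n = len(x)
    rho = 2 * n + np.sqrt(2 * n) + 1
    xi_x = A @ x - b                 # primal error vector
    xi_s = A.T @ pi + s - c          # dual error vector
    # error in the last equality constraint:
    # b^T (A A^T)^{-1} A (c - s) - c^T x
    xi_c = b @ np.linalg.solve(A @ A.T, A @ (c - s)) - c @ x
    resid = np.concatenate([kappa_x * xi_x, kappa_s * xi_s, [xi_c]])
    return rho * np.log(np.linalg.norm(resid)) - np.sum(np.log(x * s))
```

Note that no null-space basis Z appears anywhere, and the primal and dual error vectors enter separately, which is what permits separate updates of the two solutions.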

The potential function (3.5) is, however, dependent on the scaling of the rows (in general, the choice of Q in (3.4)). Because of this, a search direction that may be acceptable while using one scaling matrix may become unacceptable for a different choice. However, in our implementation we use this to our advantage. We think that the construction of the directions p_x1, p_π1, p_s1 and p_x2, p_π2, p_s2 discussed in the next section is inherently biased towards finding (nearly) feasible solutions first. The values of κ_x and κ_s are chosen so that they emphasize primal and dual feasibility over the feasibility of the last equality constraint in (PDO). As a consequence of this, search directions that reduce ξ_x and/or ξ_s significantly, and do not reduce (or possibly increase) the error in the last equality constraint of (3.1), become acceptable.

κ_x = 100 max_i {s_i^0} and κ_s = 100 max_i {x_i^0} were used for all the problems in our implementation. The construction of x^0 and s^0 is described in §7.

A reduction by a constant amount in (3.5) at each iteration ensures convergence to an optimal solution provided that Σ_{i=1}^n ln x_i s_i remains bounded. On the other hand, if (3.5) cannot be reduced by a constant amount at some iteration, then either (P) or (D) or both do not have a feasible solution. We may introduce a constraint providing an upper bound on x and s if we detect (through some tests) that the method is not converging.

4. Derivatives of a primal-dual trajectory. This section provides motivation for using the directions p_x1, p_π1, p_s1, p_x2, p_π2, p_s2 in Procedure AIPM. It was mentioned that these directions use first and second derivative information of a primal-dual trajectory at a given point. This section defines the trajectory of interest and also shows how to compute all of its derivatives at a given point. While we used the potential function (3.5) to measure the progress, derivatives of the primal-dual trajectory being considered are used because they are easily computed and found to be effective in practice.

The results in Monteiro, Adler, and Resende [28] are used frequently to develop these expressions. Monteiro, Adler, and Resende [28] assume that feasible solutions are available. We do not assume this here. The expressions are given in the context of linear programming problems. Extensions to convex quadratic programming are straightforward.


Assume that x^k > 0, π^k, s^k > 0 is the current point. Consider the following system of nonlinear equations:

(4.1)
X(α)s(α) = α X^k s^k e,
Ax(α) = b + α ξ_x^k,
A^T π(α) + s(α) = c + α ξ_s^k,
x(α) ≥ 0, s(α) ≥ 0

for α ∈ [0, 1]. Let w(α) = (x(α), π(α), s(α)) represent the solutions of (4.1) for a given α.

PROPOSIa’ION 4.1. If the system of equations (4.1) has a solution for a =0, then ithas a solution for all re [0, 1 ]. Furthermore, the solution is unique for a (0, 1 ].

Proof. For α ∈ (0, 1], (4.1) gives the optimality conditions for the weighted logarithmic barrier problems

minimize B(x, α) ≡ (c + α ξ_s^k)^T x − α Σ_{i=1}^n x_i^k s_i^k ln x_i
(P_α) s.t. Ax = b + α ξ_x^k,
      x > 0,

and

maximize (b + α ξ_x^k)^T π + α Σ_{i=1}^n x_i^k s_i^k ln s_i
(D_α) s.t. A^T π + s = c + α ξ_s^k,
      s > 0.

Let x(0), π(0), s(0) represent a solution of (4.1) for α = 0. For a fixed α ∈ [0, 1], x̄(α) ≡ (1 − α)x(0) + α x^k is a feasible solution for (P_α), and π̄(α) ≡ (1 − α)π(0) + α π^k, s̄(α) ≡ (1 − α)s(0) + α s^k is a feasible solution for (D_α). If the feasible set of (P_α) is bounded, then obviously (P_α) has a solution.

We now consider the case when the feasible set of (P_α) is unbounded. Since π̄(α), s̄(α) is a feasible solution for (D_α), it can be shown that the set {d | Ad = 0, d ≥ 0, d ≠ 0, (c + α ξ_s^k)^T d ≤ 0} is empty. Hence, the set P_t ≡ {x | Ax = b + α ξ_x^k, x > 0, (c + α ξ_s^k)^T x ≤ t} is bounded for all values of t < ∞. Furthermore, since the feasible region of (P_α) is nonempty and unbounded, P_t is nonempty for all values of t > t*, where t* is the minimum value of (c + α ξ_s^k)^T x subject to Ax = b + α ξ_x^k, x ≥ 0. Clearly, B(x, α) is bounded over P_t for all values of t. Now, to complete the proof of the existence of the solution, note that in B(x, α), (c + α ξ_s^k)^T x increases linearly in t, while max α Σ_{i=1}^n (x_i^k s_i^k) ln x_i subject to x ∈ P_t increases only logarithmically in t.

The proof of the uniqueness of the solution follows from the strict convexity of B(x, α).

Let d^j w(α)/dα^j be the jth derivative of w(α). It is now shown that d^j w(α)/dα^j can be computed for all j at α = 1. Differentiating (4.1) for the first time gives

(4.2)
S(α) dx(α)/dα + X(α) ds(α)/dα = X^k s^k e,
A dx(α)/dα = ξ_x^k,
A^T dπ(α)/dα + ds(α)/dα = ξ_s^k.


The solution of (4.2) at α = 1 is given by

(4.3)
dπ(1)/dα = −(AD^2A^T)^{-1}(b − AD^2 ξ_s^k),
ds(1)/dα = ξ_s^k − A^T dπ(1)/dα,
dx(1)/dα = x^k − D^2 ds(1)/dα.

Further differentiating (4.2) gives

(4.4)
Σ_{l=0}^{j} (j choose l) [d^l x_i(1)/dα^l] [d^{j−l} s_i(1)/dα^{j−l}] = 0, j ≥ 2, i = 1, ..., n,
A d^j x(1)/dα^j = 0,
A^T d^j π(1)/dα^j + d^j s(1)/dα^j = 0.

From (4.4) it is clear that d^j w(α)/dα^j can be computed recursively. The derivatives can be computed explicitly from

(4.5)
d^j π(1)/dα^j = −(A X^k (S^k)^{-1} A^T)^{-1} A (S^k)^{-1} u,
d^j s(1)/dα^j = −A^T d^j π(1)/dα^j,
d^j x(1)/dα^j = (S^k)^{-1} u − (S^k)^{-1} X^k d^j s(1)/dα^j,

where u_i = −Σ_{l=1}^{j−1} (j choose l) [d^l x_i(1)/dα^l] [d^{j−l} s_i(1)/dα^{j−l}], i = 1, ..., n; j ≥ 2.

The recursion (4.5) is the same as the recursion given in Monteiro, Adler, and Resende [28] and Karmarkar et al. [14] (see also Megiddo [21] and Bayer and Lagarias [2]). The derivatives resulting from the computations would be the same if ξ_x^k = 0 and ξ_s^k = 0 is assumed, and in the latter paper if no centering and reparameterization is done.

The point w(1 − ε) for 1 > ε > 0 can be approximated by using the rth-order Taylor polynomial

(4.6) w̄((1 − ε), r) ≡ w(1) + Σ_{j=1}^{r} ((−ε)^j / j!) d^j w(1)/dα^j.
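The recursion (4.5) together with the evaluation of (4.6) fits in a few lines of dense linear algebra. The sketch below is a hypothetical helper (`taylor_point` is an illustrative name); note that the same matrix A D^2 A^T appears at every order, so one factorization serves all derivatives.

```python
import numpy as np
from math import comb, factorial

def taylor_point(A, b, c, x, pi, s, eps, r):
    """Compute the first r derivatives of w(alpha) at alpha = 1 via
    (4.3) and the recursion (4.5), then evaluate the Taylor
    polynomial (4.6) at 1 - eps."""
    xi_s = A.T @ pi + s - c
    d2 = x / s                           # diagonal of D^2
    M = (A * d2) @ A.T                   # A D^2 A^T
    # first derivatives, (4.3)
    dpi = [-np.linalg.solve(M, b - A @ (d2 * xi_s))]
    ds = [xi_s - A.T @ dpi[0]]
    dx = [x - d2 * ds[0]]
    # higher derivatives by the recursion (4.5)
    for j in range(2, r + 1):
        u = -sum(comb(j, l) * dx[l - 1] * ds[j - l - 1]
                 for l in range(1, j))
        dpi.append(-np.linalg.solve(M, A @ (u / s)))
        ds.append(-A.T @ dpi[-1])
        dx.append(u / s - d2 * ds[-1])
    # evaluate the Taylor polynomial (4.6) at alpha = 1 - eps
    wx, wpi, ws = x.copy(), pi.copy(), s.copy()
    for j in range(1, r + 1):
        coef = (-eps) ** j / factorial(j)
        wx = wx + coef * dx[j - 1]
        wpi = wpi + coef * dpi[j - 1]
        ws = ws + coef * ds[j - 1]
    return wx, wpi, ws
```

A useful sanity check on the signs: because the linear constraints of (4.1) are linear in α, the truncated polynomial satisfies A w̄_x = b + (1 − ε)ξ_x^k and A^T w̄_π + w̄_s = c + (1 − ε)ξ_s^k exactly, for any order r ≥ 1.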

The Taylor polynomial is considered for the following two reasons.
(1) In the special case Monteiro, Adler, and Resende [28] established powerful results (near the central path) for this approximation.
(2) Computational results indicate that near the optimal solution the first- and second-order approximations result in nearly unit steps. Our results also indicate that, asymptotically, w(α) is well represented by a Taylor polynomial of lower order. In this context Megiddo [21] has argued that for problems with a unique optimal solution, if we start close to an optimal solution, the primal-dual paths take us approximately in a straight line to the optimal solution. Most of the tested problems do not satisfy this assumption; however, they still show this property.


In addition to other things, practical implementations that compute more than one direction must offset the cost of doing extra work. Adler et al. [1] were the first to show that in the dual affine scaling method the information from a second derivative can be used to significantly reduce the number of iterations. However, on problems in the netlib test set they found that reduction in the number of iterations did not always translate into a reduction in cpu time on sparse problems. Computational results were also given in Karmarkar et al. [14] on a small set of "representative problems" using methods implemented in the AT&T KORBX system [3].

This paper restricts itself to using the second-order Taylor polynomial. In fact, we compute only two directions. The tangent direction dw(α)/dα is computed at Step 1 of Procedure AIPM. Step 3 combines the computation of the second derivative with that of a centering direction. This saves a forward and a back solve. We must compute two directions at each iteration in order to use the adaptive approach for computing the centering parameter (§5).

Our strategy of combining the computations for the second derivative and the centering direction seems to work in practice for the following reasons. (i) The performance of interior point methods in practice depends only weakly on the choice of the centering parameter. (ii) If we view the computations in constructing the Taylor polynomial as that of finding a search direction, then in practice it appears that a wide range of ε can be used without adversely affecting the performance of the implementation. To illustrate this, we would like the reader to compare the iteration counts reported in Mehrotra [24], [25] for a predictor-corrector method with those in Table 8.2. The predictor-corrector method results if we take ε = 1 at each iteration.

We find that taking different steps in primal and dual spaces generally results insuperior performance. This is similar to the experience of Choi, Monma, and Shanno[4] for their method. We construct different polynomials

(4.7) 2x(e2, 2) -= xk- exPxl + expx2,

(4.8) s( e, 2)-= sk- epsl + e2p2

in primal and dual spaces. The computations for ex and es that use (4.7)-(4.8) aredescribed in Procedure SFSOP (step from second-order polynomial) of Exhibit 4.1.A procedure that finds the root of a quadratic equation is used to implement SFSOP.

Procedure SFSOP (Xk, pxl, px2, ex, s k, psl, p2, es)Find maximum 0_-< ex--< 1 such that X(ex, 2) is feasible.Find maximum 0=< e-< 1 such that s(e, 2) is feasible.

EXHIBIT 4.1. Computations for step size using the Taylor polynomial.
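The root-finding step behind SFSOP can be sketched as follows; the function name and interface are illustrative rather than taken from the paper. Each coordinate of x(e, 2) = x^k − e·p_x1 + e^2·p_x2 is a quadratic in e that is positive at e = 0, so the blocking value of e is the smallest positive root over all coordinates:

```python
import numpy as np

def max_feasible_step(x, p1, p2, tol=1e-12):
    """Largest e in [0, 1] keeping x - e*p1 + e**2*p2 componentwise nonnegative."""
    e_max = 1.0
    for xi, a1, a2 in zip(x, p1, p2):
        # Coordinate stays feasible while q(e) = a2*e^2 - a1*e + xi >= 0.
        # q(0) = xi > 0, so the binding value is the smallest positive root.
        if abs(a2) > tol:
            roots = np.roots([a2, -a1, xi])
        elif a1 > tol:
            roots = np.array([xi / a1])   # q is linear and decreasing
        else:
            continue                      # q never decreases; this coordinate never blocks
        pos = [r.real for r in roots if abs(r.imag) < tol and r.real > tol]
        if pos:
            e_max = min(e_max, min(pos))
    return e_max
```

For a purely linear constraint 1 − 2e ≥ 0 this returns the usual ratio-test value .5; the quadratic term can only shorten or lengthen the step depending on its sign.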

Before concluding this section, we point out that if there are reasons to believe that at the current iterate it is better to target a solution satisfying XS = W for some positive diagonal matrix W, then expressions for the derivatives of a trajectory taking us to such a point can be obtained in a similar manner. The only difference would be to replace "αX^k s^k" in (4.1) with "αX^k s^k + (1 − α)We". In particular, we may use Heuristic CENPAR (given in the next section) to compute the centering parameter in order to decide a target point on the central path, then go back and find the desired derivatives of a trajectory going to this point.


PRIMAL-DUAL INTERIOR POINT METHOD 585

5. Centering. In Section 4 expressions were developed to construct a Taylor polynomial at a given point in order to approximate a path going to an optimal solution. Obviously, the performance of the algorithm depends to a great extent on how well a "small-order" Taylor polynomial approximates this path at the current point, and on the domain in which the Taylor polynomial results in good approximations.

The results in Monteiro, Adler, and Resende [28] and the convergence results of large step polynomial time algorithms proved by Freund [6], Gonzaga and Todd [11], and Ye [32] implicitly or explicitly use the properties of the central path. The projected gradient of the potential function used for analysis in these papers encourages centering. On the other hand, it is not clear if the central path (with equal weights) is the best path to follow, particularly since it is affected by the presence of redundant constraints [30]. Furthermore, the points on (or near) the central path are only intermediate to solving the linear programming problem. It is only the limit point on this path that is of interest to us.

In view of this, we make our implementation weakly dependent on centering. The centering direction is obtained by solving the equations

π̄ = μ^k (AD^2 Aᵀ)^{-1} A (S^k)^{-1} e,   s̄ = −Aᵀ π̄,   x̄ = μ^k (S^k)^{-1} e − D^2 s̄.

μ^k is called the centering parameter. It was mentioned in Section 4 that the computation for the centering direction is combined with the computations for the second derivative to save an extra forward and backward solve.

Heuristic CENPAR given in Exhibit 5.1 was used to compute μ^k. In the description of this heuristic we assume that the direction tangent to the primal-dual affine scaling trajectory has been computed. The heuristic is adaptive. It attempts to generate a value of μ^k depending on the progress that could be made by moving in the tangent direction.

Heuristic CENPAR (x^k, p_x1, s^k, p_s1, μ^k)

Step 1. Let e_x1, e_s1 be computed as follows:

  e_x1 = min ( x^k_{l_x} / (p_x1)_{l_x}, 1 ),   l_x = argmin_i { x^k_i / (p_x1)_i : (p_x1)_i > 0 },

  e_s1 = min ( s^k_{l_s} / (p_s1)_{l_s}, 1 ),   l_s = argmin_i { s^k_i / (p_s1)_i : (p_s1)_i > 0 }.

Step 2. Let mdg = (x^k − e_x1 p_x1)ᵀ(s^k − e_s1 p_s1).

Step 3. Let μ^k = (x^kᵀ s^k / n) (mdg / x^kᵀ s^k)^v.

Step 4. Let ef = [ (p_x1)ᵀ D^{-2} (p_x1) + (p_s1)ᵀ D^2 (p_s1) ] / x^kᵀ s^k.

Step 5. If ef > 1.1, let μ^k = μ^k / min (e_x1, e_s1).

EXHIBIT 5.1. A heuristic to compute the centering parameter.
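A minimal sketch of Heuristic CENPAR, assuming the Step 3 formula as reconstructed above (so that v = 1 targets the central-path point with duality gap mdg); the function name, interface, and default v = 3 (the Section 8 choice) are illustrative:

```python
import numpy as np

def cenpar(x, px1, s, ps1, v=3):
    """Adaptive centering parameter mu^k from the tangent direction (px1, ps1)."""
    def blocking_step(z, p):
        # e_1 = min(1, min over {i : p_i > 0} of z_i / p_i)
        mask = p > 0
        return min(1.0, (z[mask] / p[mask]).min()) if mask.any() else 1.0
    ex1, es1 = blocking_step(x, px1), blocking_step(s, ps1)
    gap = x @ s
    mdg = (x - ex1 * px1) @ (s - es1 * ps1)   # minimum duality gap along the tangent
    mu = (gap / x.size) * (mdg / gap) ** v    # for v = 1 this targets gap mdg
    d2 = x / s                                # D^2 = X S^{-1}
    ef = (px1 @ (px1 / d2) + ps1 @ (d2 * ps1)) / gap   # error factor; 1 if feasible
    if ef > 1.1:                              # infeasibility is shortening steps:
        mu /= min(ex1, es1)                   # place more weight on centering
    return mu
```

When the tangent step makes little progress (mdg close to the current gap), mu stays near the current average complementarity; when the step is good, mu shrinks rapidly with the power v.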


The motivation behind the various steps in Heuristic CENPAR is now discussed. For the moment assume that ξ_x = 0 and ξ_s = 0. If this is the case, then x^kᵀs^k is the current duality gap and mdg is the minimum duality gap that one can achieve by moving in directions p_x1 and p_s1 in primal and dual spaces, respectively. e_x1 and e_s1 are never taken larger than one, because at a full step the computations for p_x1 and p_s1 ensure that ξ_x = 0 and ξ_s = 0 if no numerical error is present. Hence, for v = 1 the choice of μ is such that it targets the point on the central path at which the duality gap is mdg.

The ratio mdg/x^kᵀs^k provides "some indication" of how well the primal-dual affine scaling trajectory is being approximated locally. A value of the ratio near 1 means that the local approximations are not good, whereas a value near zero indicates that the approximations of the trajectory are good.

Table 5.1 gives the number of iterations required to solve the problems for the choices v = 1, 2, 3, 4. All other parameters were the same as those for the results in Table 8.2. The last column of this table gives the number of iterations required to solve the problem if no centering was done. The results in Table 5.1 on the test problems show only a moderate variation in the number of iterations for values of v between two and four.

The discussion on the computation of the centering parameter has thus far assumed that ξ_x = 0 and ξ_s = 0. If this is not the case, then ef (error factor) is used as an indicator of their contribution to the search direction. If ξ_x = 0 and ξ_s = 0, then it is easy to see that

D p_s1 = DAᵀ(AD^2 Aᵀ)^{-1}AD (X^k s^k)^{1/2} e,

D^{-1} p_x1 = [I − DAᵀ(AD^2 Aᵀ)^{-1}AD] (X^k s^k)^{1/2} e;

hence ef as defined in Step 4 of Exhibit 5.1 is equal to 1. If ef is smaller than 1, it indicates that the presence of ξ_x and ξ_s is probably reducing the norm of the search direction, and it is therefore expected to allow for larger steps in primal and/or dual spaces. Since ξ_x and ξ_s are reduced linearly in the step size when moving in directions p_x1 and p_s1, respectively, larger steps result in a greater reduction in the error vectors. Hence, a value of ef smaller than one is not likely to hurt the performance of the implementation.
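Since D^{-1}p_x1 and D p_s1 are complementary orthogonal projections of (X^k s^k)^{1/2} e, ef = 1 in the feasible case is just Pythagoras. A quick numerical check of this identity (random data standing in for a feasible iterate; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
x = rng.uniform(0.5, 2.0, n)        # current primal iterate, x > 0
s = rng.uniform(0.5, 2.0, n)        # current dual slacks,   s > 0

D = np.sqrt(x / s)                  # scaling D = X^{1/2} S^{-1/2}
v = np.sqrt(x * s)                  # (X^k s^k)^{1/2} e
AD = A * D                          # A D (scale columns of A)
P = AD.T @ np.linalg.solve(AD @ AD.T, AD)   # projector onto range(D A^T)

Dps1 = P @ v                        # D p_s1: one component of (XS)^{1/2} e
Dinv_px1 = v - Dps1                 # D^{-1} p_x1: the orthogonal complement

# The two squared norms add up to ||(XS)^{1/2}e||^2 = x^T s, so ef = 1.
ef = (Dinv_px1 @ Dinv_px1 + Dps1 @ Dps1) / (x @ s)
```

With infeasible iterates the right-hand side projected is no longer (X^k s^k)^{1/2} e, and ef drifts away from 1 in either direction, which is exactly what Steps 4-5 of CENPAR monitor.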

Now if ef > 1, then empirical results indicate that the presence of ξ_x and/or ξ_s results in a reduction in the step length, which adversely affects the improvement in the duality gap as well as the reduction in the error vectors. Therefore, it might be indicating trouble ahead. If this happens in practice, we seem to get out of the trouble spots quickly by placing more emphasis on centering. In the current implementation this is accomplished in Step 5.

6. Step length. Standard practice [1], [5], [18], [19], [20], [26] has been to move a certain fixed fraction (step factor) of the distance to the boundary to avoid one-dimensional line searches. The step factor in the case of primal-dual methods has typically been .995 or .9995 [18].

Although the performance of step factor = .995 or .9995 appears satisfactory in practice, in our view it has a major drawback as a heuristic: it limits the asymptotic rate of convergence of the algorithm. Furthermore, during the earlier phase of the algorithm it is overly aggressive. A modified approach to computing the step factor is given in Exhibit 6.1. It adaptively allows for larger (and smaller) step factors. In an extreme case it may allow a full step to the boundary and generate a point with zero duality gap.


TABLE 5.1
Performance of the implementation for different choices of v. + Stopped after 100 iterations with one digit of accuracy in the objective function.

Problem     v=1  v=2  v=3  v=4   nc
afiro         9    8    7    7    8
adlittle     14   11   10   10   11
scagr7       17   14   13   13   15
stochfor1    16   16   16   16   21
sc205        14   12   11   11   11
share2b      14   12   12   13   14
share1b      23   22   22   20   25
scorpion     13   12   12   12   12
scagr25      19   17   16   17   18
sctap1       17   15   15   15   17
brandy       21   20   20   19   19
scsd1         9    8    8    8    8
israel       30   25   24   24   26
bandm        20   17   17   19   17
scfxm1       21   19   18   18   19
e226         25   21   20   20   21
agg          27   22   25   27   56
scrs8        24   21   21   21   22
beaconfd     11    8    7    7    7
scsd6        13   10   10   10   10
ship04s      15   12   13   13   15
agg2         23   21   24   23   26
agg3         22   19   21   21   23
scfxm2       22   18   19   19   20
ship04l      14   12   12   12   12
fffff800     36   38   38   38  100+
ship08s      16   13   13   13   13
sctap2       15   13   12   12   13
scfxm3       25   20   20   20   19
ship12s      18   16   16   17   19
scsd8        12   10    9    9    9
czprob       39   33   35   35   38
ship08l      18   14   14   14   14
ship12l      19   16   16   17   17
25fv47       32   27   26   25   27


Procedure GTSF (x^k, p_x, s^k, p_s, f_x, f_s)

Let

  l_x = argmin_i { x^k_i / (p_x)_i : (p_x)_i > 0 }

and

  l_s = argmin_i { s^k_i / (p_s)_i : (p_s)_i > 0 }.

Compute f_x such that

(6.1)  (x^k_{l_x} − f_x (p_x)_{l_x}) (s^k_{l_x} − (p_s)_{l_x}) = (x^k − p_x)ᵀ(s^k − p_s) / (n γ_a),

  f_x := max (f_x, γ_f),

and f_s such that

(6.2)  (s^k_{l_s} − f_s (p_s)_{l_s}) (x^k_{l_s} − (p_x)_{l_s}) = (x^k − p_x)ᵀ(s^k − p_s) / (n γ_a),

  f_s := max (f_s, γ_f).

EXHIBIT 6.1. Computation of step factor.

The step factor in the primal space is chosen so that the product of the primal blocking variable (l_x) and the corresponding dual slack is nearly equal to the value their product would take at the point on the central trajectory at which the duality gap is equal to (x^k − p_x)ᵀ(s^k − p_s)/γ_a, i.e., at which each complementarity product equals (x^k − p_x)ᵀ(s^k − p_s)/(n γ_a). Note that if ξ_x = 0 and ξ_s = 0, then (x^k − p_x)ᵀ(s^k − p_s) is the duality gap at the point obtained after moving a full step. A parameter γ_a > 1 should be used. The parameter 0 < γ_f ≤ 1 is used to safeguard against very small or negative steps. The explanation for the computation of the step factor for the dual variables is similar. In essence, the choice of f_x and f_s is guessing the minimizer of the potential function (3.5) in directions p_x and p_s while implicitly assuming that ξ_x = 0 and ξ_s = 0.
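The primal half of GTSF amounts to solving the linear equation (6.1) for f_x and safeguarding the result. A sketch under the Section 8 parameter values; the cap at a full step and the function name are assumptions of this sketch, not taken from the paper:

```python
import numpy as np

def primal_step_factor(x, px, s, ps, gamma_a=10.0, gamma_f=0.9):
    """Pick f_x so the blocking variable's complementarity product matches
    a central-path target, as in (6.1); then safeguard with gamma_f."""
    ratios = np.where(px > 0, x / np.where(px > 0, px, 1.0), np.inf)
    lx = int(ratios.argmin())                      # primal blocking variable
    n = x.size
    target = (x - px) @ (s - ps) / (n * gamma_a)   # per-product target in (6.1)
    denom = px[lx] * (s[lx] - ps[lx])
    if denom <= 0:
        return 1.0                                 # no meaningful blocking product
    fx = (x[lx] * (s[lx] - ps[lx]) - target) / denom
    return min(max(fx, gamma_f), 1.0)              # safeguard below, full step above
```

The dual factor f_s is obtained symmetrically from (6.2) with the roles of x and s exchanged.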

Provided that the computations for the search directions were performed with sufficient accuracy, the computational experience on the tested problems shows that the number of iterations required to solve the problems is relatively insensitive to the choice of γ_a and γ_f over a large range. We experimented with values of γ_f = .5, .75, .9, .99, .999 and γ_a = 1/(1 − γ_f). In our experience we found that on most problems the number of iterations was smaller for a larger choice of γ_f, but the difference was small for γ_f in the range .75 to .999.

However, we observed a very interesting phenomenon. The implementation showed signs of instability for larger values of γ_f on problems brandy, scfxm1, scfxm2, and scfxm3. For these problems, to obtain eight digits of accuracy in the solutions, at the last two iterations of the algorithm the conjugate gradient method was needed to improve the accuracy of the search direction. These problems were successfully solved to the desired accuracy for γ_f = .5, .75, and .9.

An examination of the problem data of brandy, scfxm1, scfxm2, and scfxm3 shows that for these problems the set of optimal primal solutions is unbounded. An examination of various stages of our implementation (with different choices of parameters) revealed that allowing for a larger step factor may result in premature convergence of dual slacks corresponding to primal variables unbounded in the optimal set. At a later iteration, this further causes the primal unbounded variables to become very large.


Hence, x_i/s_i corresponding to these variables becomes disproportionately large. This results in cancellation of large numbers when computing the Cholesky factor, causing the computations to become less stable.

The reader is referred to Mehrotra [23] for a more elaborate discussion of the precision of computations in the context of interior point methods, and to Mehrotra [22] for a discussion of issues involved in developing implementations based on the preconditioned conjugate gradient method, which we may want to use to improve the numerical accuracy.

7. Initial point. In all of our implementations the initial point is generated as follows. We first compute

(7.1)  π̃ = (AAᵀ)^{-1}Ac;   s̃ = c − Aᵀπ̃;   x̃ = Aᵀ(AAᵀ)^{-1}b,

and δ_x = max (−1.5 min_i {x̃_i}, 0) and δ_s = max (−1.5 min_i {s̃_i}, 0). We then obtain

(7.2)  δ̂_x = δ_x + .5 (x̃ + δ_x e)ᵀ(s̃ + δ_s e) / Σ_{i=1}^{n} (s̃_i + δ_s),

(7.3)  δ̂_s = δ_s + .5 (x̃ + δ_x e)ᵀ(s̃ + δ_s e) / Σ_{i=1}^{n} (x̃_i + δ_x),

and generate π^0 = π̃, s_i^0 = s̃_i + δ̂_s, i = 1, …, n, and x_i^0 = x̃_i + δ̂_x, i = 1, …, n, as the initial point.
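The computations (7.1)-(7.3) can be sketched directly. Dense linear algebra is used here for clarity (a sparse factorization of AAᵀ would be used in practice), and the degenerate cases discussed below are not handled:

```python
import numpy as np

def starting_point(A, b, c):
    """Initial point from least-squares estimates, shifted positive and re-centered."""
    AAt = A @ A.T
    pi = np.linalg.solve(AAt, A @ c)          # (7.1): pi~ = (AA^T)^{-1} A c
    s = c - A.T @ pi                          #        s~  = c - A^T pi~
    x = A.T @ np.linalg.solve(AAt, b)         #        x~  = A^T (AA^T)^{-1} b
    dx = max(-1.5 * x.min(), 0.0)             # shifts making x~, s~ nonnegative
    ds = max(-1.5 * s.min(), 0.0)
    gap = (x + dx) @ (s + ds)
    dx_hat = dx + 0.5 * gap / (s + ds).sum()  # (7.2)
    ds_hat = ds + 0.5 * gap / (x + dx).sum()  # (7.3)
    return x + dx_hat, pi, s + ds_hat
```

For A = [1 1], b = (1), c = (1, 2) this gives x^0 = (.75, .75) and s^0 = (.625, 1.625), both strictly positive even though s̃ has a negative component.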

We first discuss the validity of the above approach in generating x^0 > 0 and s^0 > 0. In the cases in which it fails to produce such a point, either the problem reduces to that of finding a feasible solution of (P) or (D), or an optimal solution is generated.

From the definition of δ_x and δ_s we know that x̃ + δ_x e ≥ 0 and s̃ + δ_s e ≥ 0. A positive point is always generated if δ̂_x > 0 and δ̂_s > 0. Furthermore, to show that x^0 > 0 and s^0 > 0, it is sufficient to show that δ̂_x > 0 and δ̂_s > 0.

First consider the case when δ_x = 0 and δ_s = 0. Clearly, in this case x̃ is a feasible solution for (P) and π̃ is a feasible solution for (D). If x̃ᵀs̃ = 0, then these solutions are optimal for the respective problems. Otherwise, x̃ᵀs̃ > 0 and hence δ̂_x > 0 and δ̂_s > 0. Now consider the case when δ_x = 0 and δ_s > 0. In this case, if x̃_i ≠ 0 for some i, then obviously δ̂_x > 0 and δ̂_s > 0. On the other hand, if x̃_i = 0 for all i, then b = 0 and the problem reduces to that of finding a feasible solution of (D). This problem can then be solved separately or by generating a perturbed problem for which the right-hand side is δAe for any positive δ; δe can be used as a feasible interior solution of the perturbed problem, and (7.2)-(7.3) can be used to generate a feasible point of the perturbed problem. Finally, the case when δ_x > 0 and δ_s = 0 can be argued in a similar manner.

We now discuss some properties of the proposed approach.

7.1. Desirable properties of the proposed approach.

Shift in origin. The approach is independent of a shift in origin. To explain what we mean by this, consider the dual problem

  maximize bᵀ(π + Δπ)

(D(Δπ))  s.t. Aᵀ(π + Δπ) + s = c,  s ≥ 0,

for any fixed choice of Δπ. Clearly, the polytope defined by the constraints in (D(Δπ)) is the same as the polytope defined by the constraints in (D), except for a shift of


origin. It is desirable that an initial point be the same in relation to the respective polytopes. Note that s̃ in (7.1) is independent of the choice of Δπ.

To demonstrate how similar arguments hold for (P), we consider an equivalent formulation of this problem. Let Z be a matrix whose columns form a basis for the null space of A, and let x_0 be any point satisfying Ax_0 = b. It is easy to see that (P) is equivalent to

  minimize (cᵀZ)(y + Δy)

(P(Δy))  s.t. Z(y + Δy) ≥ −x_0

for y ∈ R^{n−m} and any (fixed) choice of Δy. An approach analogous to that of finding s̃ computes

  ỹ = −(ZᵀZ)^{-1}Zᵀx_0.

The slacks in the constraints of (P(Δy)) are then given by

  x_0 − Z(ZᵀZ)^{-1}Zᵀx_0 = [I − Z(ZᵀZ)^{-1}Zᵀ]x_0 = Aᵀ(AAᵀ)^{-1}Ax_0 = Aᵀ(AAᵀ)^{-1}b = x̃.

Note that x̃ is the orthogonal projection of any vector satisfying Ax = b onto the range space of Aᵀ. Since x̃ and s̃ are independent of Δy and Δπ, respectively, it is obvious that x^0 and s^0 are also independent of these choices.
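The shift-invariance claim can be checked numerically: the projection Aᵀ(AAᵀ)^{-1}Ax_0 gives the same x̃ for every x_0 with Ax_0 = b, since any two such points differ by a null-space vector of A. (Random data; the matrix sizes are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))
b = rng.standard_normal(2)

x0 = np.linalg.lstsq(A, b, rcond=None)[0]            # one solution of A x = b
z = rng.standard_normal(5)
x1 = x0 + z - A.T @ np.linalg.solve(A @ A.T, A @ z)  # another: add a null-space vector

proj = lambda v: A.T @ np.linalg.solve(A @ A.T, A @ v)
assert np.allclose(A @ x1, b)                        # x1 also satisfies A x = b
assert np.allclose(proj(x0), proj(x1))               # same x~ from both
```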

Simple scaling. The initial point is not affected if all the constraints in (P) are scaled by a constant or if c is scaled.

7.2. Undesirable properties of the proposed approach.

Column scaling. The initial point is affected by the scaling of the columns of A. To illustrate this, in Table 7.1 we give the number of iterations required to solve the problems after they were scaled by using subroutine MSSCAL from MINOS. All other details of the implementation were kept the same as those for the results in Table 8.2.

The results in Table 7.1 indicate that scaling of columns may affect the performance of the implementation. On most problems, the use of MSSCAL improved the number of iterations required to solve the problem, or the number of iterations did not change significantly. The notable exception was problem fffff800. For this problem, scaling increased the number of iterations by about 40 percent.

We point out that (P) and (D) can be solved in one iteration if we know the correct scaling of the columns of A. Hence, the problem of finding the best scaling of columns appears to be as difficult as solving the linear programming problem itself.

Presence of redundant constraints. The initial point generated by using this approach is affected by the presence of redundant constraints. This can be a problem, for example, if redundant dual constraints with large slacks (primal variables with huge costs) are present. In this regard we point out that the central trajectory itself is affected by the presence of such constraints.

A possible way to alleviate this problem would be to ask the user to give the relative importance of the various primal variables (dual constraints) in solving the problem. A clearly redundant variable could be assigned zero weight, and therefore it could be removed while generating an initial point. In general, the possibility of solving weighted least squares problems to generate initial points in the context of "warm starts" should be explored further.

All the approaches [1], [5], [18], [26] used for practical implementations reported in the literature are dependent on column scaling and the presence of redundant constraints. Furthermore, they lack one of the desirable properties that the proposed approach has.


TABLE 7.1

Effect of scaling on the performance of the algorithm.

Problem     No scaling  Scaling
afiro            7         6
adlittle        10        10
scagr7          13        14
stochfor1       16         8
sc205           11        11
share2b         12        14
share1b         22        24
scorpion        12        12
scagr25         16        17
sctap1          15        15
brandy          20        17
scsd1            8         7
israel          24        17
bandm           17        15
scfxm1          18        17
e226            20        16
agg             25        16
scrs8           21        18
beaconfd         7         8
scsd6           10        10
ship04s         13        12
agg2            24        20
agg3            21        19
scfxm2          19        20
ship04l         12        12
fffff800        38        52
ship08s         13        14
sctap2          12        12
scfxm3          20        20
ship12s         17        14
scsd8            9         9
czprob          35        30
ship08l         14        15
ship12l         16        15
25fv47          26        23

The choice of the constants "1.5" and ".5" in computing δ̂_x and δ̂_s is arbitrary. The arguments about the validity of the approach (and its properties) do not change if we replace 1.5 with any constant larger than one and .5 with any constant larger than zero. We may do a one-dimensional line search (possibly in direction e) on the potential function (3.5) to generate these constants.


8. Computational performance. The basic ideas presented in this paper were implemented in a FORTRAN code. This section discusses our computational experience with this implementation and compares it with the results documented in the literature.

All testing was performed on a SUN 4/110 workstation. The code was compiled with the SUN Fortran version 1.0 compiler option "-O3." All the cpu times were obtained by using the utility etime. The test problems were obtained from netlib [7]. The test set includes small and medium size problems. The problem names and additional information on these problems are given in Table 8.1.

All problems were cleaned by using the procedure outlined in Mehrotra [22]. An in-house implementation of the minimum degree heuristic was used to permute the rows. A complete Cholesky factor was computed at each iteration to solve the linear equations. The procedure and the associated data structure that we used to compute the Cholesky factor are described in Mehrotra [23]. All the linear algebra subroutines were written by the author.

The algorithm was terminated when the relative duality gap satisfied

(8.1)  (cᵀx^k − bᵀπ^k) / (1 + |bᵀπ^k|) ≤ ε_exit.

In the actual implementation, ξ_x and ξ_s were computed afresh at each iteration. However, (ξ_s)_i was set to zero and absorbed in s_i if it satisfied |(ξ_s)_i|/s_i < .001. This occasionally saved some computational effort.
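The termination test (8.1) is a one-liner; the absolute value on the gap in this sketch is an assumption (for feasible iterates near convergence the gap cᵀx − bᵀπ is nonnegative):

```python
def converged(ctx, btpi, eps_exit=1e-8):
    """Relative duality gap test (8.1); ctx = c^T x^k, btpi = b^T pi^k."""
    return abs(ctx - btpi) / (1.0 + abs(btpi)) <= eps_exit
```

The 1 in the denominator keeps the test meaningful when the optimal objective value is near zero.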

The following parameters were set to obtain the results reported in Table 8.2:

  ε_exit = 10^{-8}        (in (8.1)),
  v = 3                   (in Heuristic CENPAR),
  γ_f = .9,  γ_a = 10     (in Procedure GTSF),
  K_x = 100 max_i {s̃_i}   (in (3.5)),
  K_s = 100 max_i {x̃_i}   (in (3.5)).

The number of iterations required to solve the problems is given in the second column of Table 8.2. The primal objective value recorded at termination is given in column 3. The relative duality gap (8.1) is given in column 4. The primal infeasibility,

(8.2)  ‖Ax^k − b‖ / (1 + ‖x^k‖),

and the dual infeasibility,

(8.3)  ‖Aᵀπ^k + s^k − c‖ / (1 + ‖s^k‖),

recorded at termination are given in columns 5 and 6, respectively. This information on the relative duality gap and primal and dual feasibility is the same as that given in Lustig, Marsten, and Shanno [18]. We use the same stopping criterion and provide similar information in order to be consistent while making comparisons.

All the problems were accurately solved to eight digits. A comparison with the results in Lustig, Marsten, and Shanno [18] shows that on many problems the accuracy in the objective value at termination was better in our implementation. This is primarily due to our approach for computing the step factor.

In Table 8.3 we compare the number of iterations required to solve the test problems. Column 2 of this table gives the number of iterations taken by our implementation. Column 3 gives the number of iterations taken by the dual affine


scaling method, as reported by Adler et al. [1]. Column 4 gives the number of iterations required by the second-order dual affine scaling method in Adler et al. [1]. Column 5 gives the number of iterations reported in Lustig, Marsten, and Shanno [18] for a primal-dual method. Column 6 gives the number of iterations reported in Gill, Murray, and Saunders [8] for a logarithmic barrier function method. Column 7 gives the number

TABLE 8.1
Problem statistics.

Problem     Rows   Columns  Nonzeros
afiro         28       32        88
adlittle      57       97       465
scagr7       130      140       553
stochfor1    118      111       474
sc205        206      203       552
share2b       97       79       730
share1b      118      225     1,182
scorpion     389      358     1,708
scagr25      472      500     2,029
sctap1       301      480     2,052
brandy       221      249     2,150
scsd1         78      760     3,148
israel       175      142     2,358
bandm        306      472     2,659
scfxm1       331      457     2,612
e226         224      282     2,767
agg          489      163     2,541
scrs8        491    1,169     4,029
beaconfd     174      262     3,476
scsd6        148    1,350     5,666
ship04s      403    1,458     5,810
agg2         517      302     4,515
agg3         517      302     4,531
scfxm2       661      914     5,229
ship04l      403    2,118     8,450
fffff800     525      854     6,235
ship08s      779    2,387     9,501
sctap2     1,091    1,880     8,124
scfxm3       991    1,371     7,846
ship12s    1,152    2,763    10,941
scsd8        398    2,750    11,334
czprob       930    3,523    14,173
ship08l      779    4,283    17,085
ship12l    1,152    5,427    21,597
25fv47       822    1,571    11,127


TABLE 8.3
Comparison of number of iterations with other implementations.

Problem    AIPM  AKRV2  AKRV1  LMS  GMS  DBDW
afiro         7     15     20   13   20    10
adlittle     10     18     24   17   18    15
scagr7       13     19     24   22   24    18
stochfor1    16     19
sc205        11     20     28   16   32    17
share2b      12     21     29   17   46    14
share1b      22     33     38   40   35    28
scorpion     12     19     24   18   33    17
scagr25      16     21     29   24   28    21
sctap1       15     23     33   22   53    21
brandy       20     24     38   27   41    21
scsd1         8     16     19   12   13     8
israel       24     29     37   47   36    24
bandm        17     24     30   28   31    21
scfxm1       18     30     33   31   37    23
e226         20     30     34   31   38    24
agg          25     32
scrs8        21     29     39   50   59    25
beaconfd      7     17     23   21   34    17
scsd6        10     18     22   15   15    13
ship04s      13     22     30   21   40    15
agg2         24     32
agg3         21     32
scfxm2       19     29     39   37   42    29
ship04l      12     21     28   22   36    17
fffff800     38     59     55
ship08s      13     21     32   23   34    16
sctap2       12     25     34   23   41    16
scfxm3       20     30     40   39   42    31
ship12s      16     23     35   27   46    16
scsd8         9     18     23   15   15    13
czprob       35     35     52   57   56    41
ship08l      14     23     31   24   22    17
ship12l      16     23     32   27   24    17
25fv47       24     52     48   44   28

of iterations reported in Domich et al. [5] for a variant of the method of centers. Our method and the second-order dual affine scaling method in Adler et al. [1] compute two directions at each iteration. The method implemented in Domich et al. [5] computes three directions at each iteration and solves a linear programming problem defined by


using these directions. All other implementations compute only one direction at each iteration.

On average, the number of iterations required by our implementation to solve the mutually tested problems is 40 percent less than that reported in Lustig, Marsten, and Shanno [18]; it is 50 percent less than that reported in Adler et al. [1]; and it is 55 percent less than that reported by Gill, Murray, and Saunders [8]. Compared to the results in Adler et al. [1] for their second-order method, the results show that our method takes about 35 percent fewer iterations. Finally, our implementation required 20 percent fewer iterations than those required in Domich et al. [5].

It is useful to point out that it is possible to further reduce the total number of iterations needed to solve the problems by using higher-order derivatives. This is discussed in a subsequent paper.

The number of iterations was always fewer than the number of iterations required by the second-order dual affine scaling method implemented by Adler et al. [1]. Many of the problems were solved in practically half the number of iterations when compared with [1].

9. Comparison with OB1 and MINOS 5.3. This section compares the cpu times required by our implementation to those required by the implementation of the primal-dual method in OB1 [18] (02/90 version) and the simplex method in MINOS 5.3 [29]. The source codes (also written in FORTRAN) of OB1 (02/90 version) and MINOS 5.3 were compiled using compiler option "-O3." Hence, everything was identical while making these comparisons. All the default options of OB1 (02/90 version) and MINOS 5.3 were used. Printing was turned to the minimum level in both cases. In the case of OB1 (02/90 version), crush 2 was used for all problems.

The times for MINOS 5.3 are those for subroutine M5SOLV only. The TIMER subroutine in OB1 was used to compute its cpu times. The times for OB1 were calculated as follows:

OB1 Time = (end of hprep − after mpsink) + (end of obdriv − after getcmo).

The times required by MINOS 5.3, OB1 (02/90 version), and our implementation do not include the time spent in converting the MPS input file into a problem in the standard form. The times required by our implementation include all the time spent after the input files were converted into a problem in the standard form.

The times required by our implementation are given in the second column of Table 9.1. The times required by OB1 (02/90 version) are given in column 3, and the times required by MINOS 5.3 are given in column 4. Column 5 gives the ratio of the times required by OB1 (02/90 version) to our code. Column 6 gives the ratio of the times required by MINOS 5.3 to our code.

From these results we find that, on average, our implementation in its current state performs two times better than the implementation in OB1 (02/90 version). The ratio of cpu times with OB1 (02/90 version) is more or less uniform.

Comparing the results with MINOS 5.3, we find that the proposed implementation is on average better by a factor of 2.5. In this case, however, the ratio of cpu times varies significantly. MINOS 5.3 was generally superior on problems with few relatively dense columns, whereas our implementation of the primal-dual method was superior on problems with a sparse Cholesky factor.

10. Conclusions. Details of a particular implementation of the primal-dual method are given. This implementation requires a considerably smaller number of iterations and saves considerable computational effort. We have given expressions to compute


TABLE 9.1
Comparison of cpu time with OB1 and MINOS 5.3 on SUN 4/110.

Problem      AIPM      OB1   MINOS 5.3  OB1/AIPM  MINOS 5.3/AIPM
afiro         .12      .60       .09       5.0         .7
adlittle      .64     1.81       .70       2.8        1.1
scagr7       1.11     2.75      1.66       2.5        1.5
stochfor1    1.48     2.91      1.50       2.0        1.0
sc205        1.49     3.33      2.16       2.2        1.4
share2b      1.50     3.00      1.46       2.0        1.0
share1b      3.27     9.21      3.98       2.8        1.2
scorpion     2.87     7.06      5.92       2.4        2.0
scagr25      5.23    10.43     15.32       2.0        2.9
sctap1       4.92     9.18      7.71       1.8        1.6
brandy       7.18    15.30     11.55       2.1        1.6
scsd1        2.46     5.46      6.15       2.2        2.5
israel      58.37   127.01      6.11       2.2         .1
bandm        8.01    17.61     22.09       2.2        2.7
scfxm1      10.55    20.82     12.76       2.0        1.2
e226         9.38    15.83     15.30       1.7        1.6
agg         32.88    47.46      7.32       1.4         .2
scrs8       13.31    43.95     40.86       3.3        3.1
beaconfd     2.56     9.28      1.97       3.6         .8
scsd6        5.66    10.56     31.19       1.9        5.5
ship04s      6.92    17.58      6.63       2.5         .95
agg2        65.86   100.92     10.05       1.5         .15
agg3        66.66    94.74     10.95       1.4         .16
scfxm2      21.69    46.74     51.42       2.1        2.37
ship04l      8.90    24.35     13.20       2.7        1.5
fffff800    80.90   140.01     14.33       1.7         .17
ship08s      9.50    23.05     20.43       2.4        2.1
sctap2      30.04    47.75     56.02       1.6        1.9
scfxm3      33.31    72.84    107.40       2.1        3.2
ship12s     14.06    31.78     47.85       2.2        3.4
scsd8       10.38    21.98    230.23       2.1       22.2
czprob      33.78    81.93    166.03       2.4        4.9
ship08l     18.12    42.81     19.34       2.4        1.0
ship12l     28.56    63.11    120.31       2.2        4.2
25fv47     164.53   334.14    941.41       2.0        5.7
Total      766.20 1,507.29   2,011.4


all the derivatives at a given point of a primal-dual affine scaling trajectory. The implementation described here effectively combines the second derivative with the centering vector. Heuristics for computing the centering parameter and step length were given and their effectiveness was demonstrated. A new approach to generating a starting point was used. In addition, the results demonstrate that it is possible to develop fast (robust) implementations of interior point methods that ensure sufficient reduction in a potential function at each iteration.

Comparisons with OB1 (02/90 version) and the simplex method show that our implementation was faster by factors of 2 and 2.5, respectively.

Acknowledgments. I thank Professor Michael Saunders for making the source code of MINOS 5.3 available. I thank Professor Roy E. Marsten for releasing a copy of OB1 for comparison in this paper. Also, I thank Mr. I. C. Choi for helping out with OB1 and Professor Donald Goldfarb for his constant encouragement, which made this work possible. I also thank the two anonymous referees for their careful reading of this paper and for their suggestions for improvement.

Appendix. Here we prove that the potential function (3.3) can be reduced by a constant amount at each iteration. The development of our proof is based on the analysis in Freund [6]. Let us define t ≡ 2n + 1,

  Â ≡ [ A    0                 0
        0    Zᵀ                0
        cᵀ   bᵀ(AAᵀ)^{-1}A    −1 ],

b̂ᵀ ≡ (bᵀ, cᵀZ, bᵀ(AAᵀ)^{-1}Ac), and y ≡ (xᵀ, sᵀ, λ)ᵀ. Hence, without loss of generality, consider the problem

  minimize y_t

(PD)  s.t. Ây = b̂,

  y ≥ 0,

where y ∈ R^t. Let us consider the potential function

(A.1)  F(y) ≡ ρ ln y_t − Σ_{i=1}^{t} ln y_i,

where ρ ≡ t + √t. The function F(y) in (A.1) is the same as the function (3.3). Let y^k > 0 be any feasible point of (PD) and let Y^k be the diagonal matrix whose diagonal elements are (y_1^k, …, y_t^k). Let

  d ≡ [I − Āᵀ(ĀĀᵀ)^{-1}Ā](ρ e_t − e),

where Ā ≡ ÂY^k. Let y^{k+1} = y^k − (ε/‖d‖) Y^k d, ε < 1. Then

  F(y^{k+1}) − F(y^k) = ρ ln (y_t^{k+1}/y_t^k) − Σ_{i=1}^{t} ln (y_i^{k+1}/y_i^k)

  = ρ ln (1 − ε d_t/‖d‖) − Σ_{i=1}^{t} ln (1 − ε d_i/‖d‖)

  ≤ −ε‖d‖ + ε²/(2(1 − ε)).

Page 26: O(x/-ffL)users.iems.northwestern.edu/~4er/MehrotraNomination/Ref2... · 2009. 6. 23. · 576 SANJAY MEHROTRA primal-dual methods, whichuse solutions ofboth (P) and (D) in the scaling


The inequality above follows by using the facts that $\ln(1+\delta) \le \delta$ for $\delta > -1$, and $\ln(1+\delta) \ge \delta - \delta^2/(2(1-\epsilon))$ if $|\delta| \le \epsilon < 1$.

If $\|d\| \ge 1$, then for $\epsilon = .5$, $F(y)$ is reduced by $.25$. Otherwise, if $\|d\| < 1$, then from the definition of $d$ we have

$$\bar{A}^T\bar{\pi} + \bar{s} = e_t, \qquad \bar{\pi} \equiv \frac{y_t^k}{\rho}(\hat{A}\hat{A}^T)^{-1}\hat{A}(\rho e_t - e), \quad \bar{s} \equiv \frac{y_t^k}{\rho}\hat{Y}^{-1}(d + e) > 0,$$

which gives a feasible solution $(\bar{\pi}, \bar{s})$ to the dual of (PD). Furthermore, the duality gap is given by

$$\frac{y_t^k\, e^T(d+e)}{\rho} \le \frac{y_t^k\,(t + \sqrt{t}\,\|d\|)}{t + \sqrt{t}} < y_t^k.$$

But $y_t^k$ is also the current objective value, and the duality gap above is strictly smaller than $y_t^k$; hence the dual objective value at $(\bar{\pi}, \bar{s})$, and therefore the optimal objective value of (PD), must be positive. If this is the case, then either (P) or (D) has no feasible solution, and we would stop.

Hence, if (P) and (D) have a feasible solution, then $F(y)$ can be reduced by $.25$ at each iteration. A failure to reduce $F(y)$ by this amount would imply that either (P) or (D) has no feasible solution.
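The reduction argument above is easy to check numerically. The sketch below is not part of the paper: the matrix $\bar{A}$, right-hand side, and iterate are arbitrary illustrative choices. It builds a small random feasible instance of (PD), computes the projected direction $d$, takes the step with $\epsilon = .5$, and verifies the projection identity and the guaranteed decrease in $F$.

```python
import numpy as np

# Numerical sketch of the Appendix argument on a toy instance of
#   (PD)  minimize y_t  s.t.  A_bar y = b_bar,  y >= 0.
# A_bar and the iterate y_k are arbitrary illustrative choices.
rng = np.random.default_rng(0)

t = 7                         # t = 2n + 1 in the paper
m = 3                         # number of equality constraints
rho = t + np.sqrt(t)          # potential parameter rho = t + sqrt(t)

A_bar = rng.standard_normal((m, t))
y_k = rng.uniform(0.5, 2.0, t)        # strictly positive current iterate
b_bar = A_bar @ y_k                   # choose b_bar so y_k is feasible

def F(y):
    """Potential function (A.1): rho*ln(y_t) - sum_i ln(y_i)."""
    return rho * np.log(y[-1]) - np.sum(np.log(y))

# d = [I - Ahat^T (Ahat Ahat^T)^{-1} Ahat](rho*e_t - e), with Ahat = A_bar*Yhat.
Yhat = np.diag(y_k)
Ahat = A_bar @ Yhat
g = rho * np.eye(t)[-1] - np.ones(t)          # rho*e_t - e
w = np.linalg.solve(Ahat @ Ahat.T, Ahat @ g)
d = g - Ahat.T @ w                            # orthogonal projection of g

# Projection identity used in the proof: (rho*e_t - e)^T d = ||d||^2.
assert np.isclose(g @ d, d @ d)

# Step y_{k+1} = y_k - (eps/||d||) * Yhat d with eps = 0.5.
eps = 0.5
y_next = y_k - (eps / np.linalg.norm(d)) * (Yhat @ d)

assert np.allclose(A_bar @ y_next, b_bar)     # feasibility is preserved
if np.linalg.norm(d) >= 1:
    # Guaranteed decrease: eps*||d|| - eps^2/(2*(1 - eps)) >= 0.25.
    assert F(y_k) - F(y_next) >= 0.25 - 1e-9
```

When $\|d\| < 1$, the proof instead uses the same quantities to construct the dual feasible pair $(\bar{\pi}, \bar{s})$ and bound the duality gap.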

REFERENCES

[1] I. ADLER, N. KARMARKAR, M. G. C. RESENDE, AND G. VEIGA (1989), An implementation of Karmarkar's algorithm for linear programming, Math. Programming, 44, pp. 297-336.

[2] D. A. BAYER AND J. C. LAGARIAS (1989), The nonlinear geometry of linear programming: I. Affine and projective scaling trajectories, II. Legendre transform coordinates and central trajectories, Trans. Amer. Math. Soc., pp. 499-581.

[3] Y. C. CHENG, D. J. HOUCK, J. M. LIU, M. S. MEKETON, L. SLUTSMAN, R. J. VANDERBEI, AND P. WANG (1989), The AT&T KORBX System, AT&T Tech. J., 68, pp. 7-19.

[4] I. C. CHOI, C. MONMA, AND D. F. SHANNO (1990), Further development of a primal-dual interior point method, ORSA J. Comput., 2, pp. 304-311.

[5] P. D. DOMICH, P. T. BOGGS, J. R. DONALDSON, AND C. WITZGALL (1989), Optimal 3-dimensional methods for linear programming, NISTIR 89-4225, Center for Computing and Applied Mathematics, U.S. Dept. of Commerce, National Institute of Standards and Technology, Gaithersburg, MD.

[6] R. FREUND (1988), Polynomial-time algorithms for linear programming based only on primal scaling and projected gradients of a potential function, Working Paper OR 182-88, Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA.

[7] D. M. GAY (1985), Electronic mail distribution of linear programming test problems, Mathematical Programming Society Committee on Algorithms Newsletter, 13, pp. 10-12.

[8] P. E. GILL, W. MURRAY, AND M. A. SAUNDERS (1988), A single-phase dual barrier method for linear programming, Report SOL 88-10, Systems Optimization Laboratory, Stanford Univ., Stanford, CA.

[9] D. GOLDFARB AND S. MEHROTRA (1988), A relaxed version of Karmarkar's algorithm, Math. Programming, 40, pp. 285-315.

[10] D. GOLDFARB AND S. MEHROTRA (1989), A self-correcting version of Karmarkar's algorithm, SIAM J. Numer. Anal., 26, pp. 1006-1015.

[11] C. GONZAGA AND M. J. TODD (1989), An O(√n L)-iteration large-step primal-dual affine algorithm for linear programming, Tech. Report, School of Operations Research and Industrial Engineering, Cornell Univ., Ithaca, NY.

[12] D. DEN HERTOG AND C. ROOS (1989), A survey of search directions in interior point methods for linear programming, Report 89-65, Faculty of Mathematics and Informatics/Computer Science, Delft Univ. of Technology, Delft, Holland.

[13] N. KARMARKAR (1984), A new polynomial time algorithm for linear programming, Combinatorica, 4, pp. 373-395.

[14] N. KARMARKAR, J. C. LAGARIAS, L. SLUTSMAN, AND P. WANG (1989), Power series variants of Karmarkar-type algorithms, AT&T Tech. J., May/June, pp. 20-36.


[15] M. KOJIMA, S. MIZUNO, AND A. YOSHISE (1989), A primal-dual interior point algorithm for linear programming, in Progress in Mathematical Programming, Interior Point and Related Methods, N. Megiddo, ed., Springer-Verlag, New York, pp. 29-47.

[16] M. KOJIMA, S. MIZUNO, AND A. YOSHISE (1991), An O(√n L)-iteration potential reduction algorithm for linear complementarity problems, Math. Programming, 50, pp. 331-342.

[17] I. J. LUSTIG (1991), Feasibility issues in a primal-dual interior-point method for linear programming, Math. Programming, 49, pp. 145-162.

[18] I. J. LUSTIG, R. E. MARSTEN, AND D. F. SHANNO (1989), Computational experience with a primal-dual interior point method for linear programming, TR J-89-11, Industrial and Systems Engineering Report Series, Georgia Inst. of Technology, Atlanta, GA.

[19] R. E. MARSTEN, M. J. SALTZMAN, D. F. SHANNO, G. S. PIERCE, AND J. F. BALLINTIJN (1989), Implementation of a dual affine interior point algorithm for linear programming, ORSA J. Comput., 1, pp. 287-297.

[20] K. A. MCSHANE, C. L. MONMA, AND D. SHANNO (1989), An implementation of a primal-dual interior point method for linear programming, ORSA J. Comput., 1, pp. 70-83.

[21] N. MEGIDDO (1986), Pathways to the optimal set in linear programming, in Progress in Mathematical Programming, N. Megiddo, ed., Springer-Verlag, New York, pp. 131-158.

[22] S. MEHROTRA (1992), Implementations of affine scaling methods: Approximate solutions of systems of linear equations using preconditioned conjugate gradient methods, ORSA J. Comput., 4, pp. 103-118.

[23] S. MEHROTRA (1989), Implementations of affine scaling methods: Towards faster implementations with complete Cholesky factor in use, TR-89-15R, Dept. of Industrial Engineering/Management Science, Northwestern Univ., Evanston, IL.

[24] S. MEHROTRA (1991), On finding a vertex solution using interior point methods, Linear Algebra Appl., 152, pp. 233-253.

[25] S. MEHROTRA (1990), On an implementation of primal-dual predictor-corrector algorithms, presented at the Second Asilomar Workshop on Progress in Mathematical Programming, Asilomar, CA.

[26] C. L. MONMA AND A. J. MORTON (1987), Computational experience with a dual affine variant of Karmarkar's method for linear programming, Oper. Res. Lett., 6, pp. 261-267.

[27] R. C. MONTEIRO AND I. ADLER (1989), An O(n³L) primal-dual interior point algorithm for linear programming, Math. Programming, 44, pp. 43-66.

[28] R. C. MONTEIRO, I. ADLER, AND M. G. C. RESENDE (1990), A polynomial-time primal-dual affine scaling algorithm for linear and convex quadratic programming and its power series extension, Math. Oper. Res., 15, pp. 191-214.

[29] B. A. MURTAGH AND M. A. SAUNDERS (1983), MINOS 5.0 user's guide, Tech. Rep. SOL 83-20, Dept. of Operations Research, Stanford Univ., Stanford, CA.

[30] M. J. TODD (1989), The affine-scaling direction for linear programming is a limit of projective-scaling directions, Tech. Rep., School of Operations Research and Industrial Engineering, Cornell Univ., Ithaca, NY.

[31] M. J. TODD AND Y. YE (1990), A centered projective algorithm for linear programming, Math. Oper. Res., 15, pp. 508-529.

[32] Y. YE (1991), An O(n³L) potential reduction algorithm for linear programming, Math. Programming, 50, pp. 239-258.