IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-5, NO. 3, MAY 1983
Correspondence
A Gradient Projection Algorithm for Relaxation Methods
JOHN L. MOHAMMED, ROBERT A. HUMMEL, AND STEVEN W. ZUCKER
Abstract-We consider a particular problem which arises when applying the method of gradient projection for solving constrained optimization and finite dimensional variational inequalities on the convex set formed by the convex hull of the standard basis unit vectors. The method is especially important for relaxation labeling techniques applied to problems in artificial intelligence. Zoutendijk's method for finding feasible directions, which is relatively complicated in general situations, yields a very simple finite algorithm for this problem. We present an extremely simple algorithm for performing the gradient projection and an independent verification of its correctness.
Index Terms-Gradient projection, projection operator, relaxation labeling, updating formula.
I. FORMULATION OF THE PROJECTION PROBLEM

We treat the following optimization problem. Let K be the convex set defined by

K = { x ∈ ℝ^n : Σ_{i=1}^n x_i = 1, x_i ≥ 0, ∀i }.

For any vector x ∈ K, the tangent set T_x is given by

T_x = { v ∈ ℝ^n : Σ_{i=1}^n v_i = 0, v_i ≥ 0 whenever x_i = 0 }.
The set of feasible directions at x is defined by

F_x = T_x ∩ { v ∈ ℝ^n : ‖v‖ ≤ 1 },

where ‖·‖ is the standard Euclidean norm.

Given a "current point" x ∈ K, and an arbitrary direction q ∈ ℝ^n, we consider the following.

Problem P: Find u ∈ F_x such that

q · u ≥ q · v for all v ∈ F_x.

Clearly, Problem P is a linear optimization problem with quadratic and linear constraints.

Problem P arises in the context of labeling problems in artificial intelligence, where iterative techniques similar to gradient ascent in K have been studied for their use in reducing ambiguity and achieving consistency [1], [2]. The convex set K is especially appropriate for labeling problems. The set K can be viewed as the convex hull of the standard unit vectors e_i = (0, ..., 0, 1, 0, ..., 0), i = 1, 2, ..., n, where the 1 appears in the ith position. The vector e_i is
Manuscript received February 15, 1982; revised March 14, 1983.
J. L. Mohammed is with the Artificial Intelligence Laboratory, Fairchild Central Research and Development, Palo Alto, CA 94304.
R. A. Hummel is with the Courant Institute of Mathematical Sciences, New York University, New York, NY 10012.
S. W. Zucker is with the Computer Vision and Robotics Laboratory, Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada H3A 2A7.
assigned to an object to denote the labeling of that object with label number i. If the identity of the object is ambiguous, and no label can be assigned with complete certainty, a compromise vector p ∈ K can be assigned to the object, so that

p = (a_1, ..., a_n) = Σ_{i=1}^n a_i e_i

denotes the labeling of that object with label numbers 1 through n, with degrees of certainty a_1 through a_n, respectively.

The optimization problem P arises in a solution method proposed for solving a variational inequality on K [3]. It also arises if one is solving a nonlinear optimization problem on K by the method of gradient ascent (steepest ascent). To motivate Problem P, we present a brief discussion of the latter case.

Consider the problem of maximizing F(x) among all x ∈ K, where F is a real-valued differentiable nonlinear function. If one uses the method of gradient ascent, then the procedure is to update successive values of x by replacing x with the vector x + αu, where α is a small positive scalar, and u is chosen so as to maximize the directional derivative at x. Of course, u is constrained by the requirement that it must lie tangent to the space K at x, and x + αu is a numerical way of moving x infinitesimally in the direction u. Further, since the directional derivative of F at x, D_u F(x), is scaled by the magnitude of u, it suffices to consider directions defined by vectors of unit length or less. Since D_u F(x) = grad F(x) · u, the direction u can be found by maximizing q · u among all vectors u ∈ F_x, where q = grad F(x). Thus u solves Problem P.
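As a concrete check of the identity D_u F(x) = grad F(x) · u used in this argument, the sketch below compares a finite-difference estimate of the directional derivative against the analytic value. The quadratic F, the sample point, and the function names are illustrative choices, not taken from the paper:

```python
import numpy as np

def directional_derivative(F, x, u, h=1e-6):
    """Central-difference estimate of the directional derivative D_u F(x)."""
    return (F(x + h * u) - F(x - h * u)) / (2 * h)

# Illustrative choice: F(x) = sum_i x_i^2, so grad F(x) = 2x.
F = lambda x: float(np.dot(x, x))

x = np.array([0.5, 0.3, 0.2])                 # a point in K
u = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)   # a unit-length tangent direction

est = directional_derivative(F, x, u)
exact = 2.0 * np.dot(x, u)                    # grad F(x) . u
```

Because D_u F(x) scales linearly with the magnitude of u, restricting attention to ‖u‖ ≤ 1 loses nothing, which is exactly why Problem P is posed over F_x.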
II. SOLUTION METHODOLOGY
Problem P is simply the problem of projecting the given vector q onto the convex set formed by the tangent set T_x, and then normalizing the length of the result. If x is an interior point in K (i.e., no component x_i is zero), then T_x is a subspace, and the solution u is simply the length-normalized orthogonal projection of q onto the subspace. This is accomplished by the trivial formulas

v_i = q_i - (1/n) Σ_{j=1}^n q_j,   i = 1, ..., n,

u = v/‖v‖ if v ≠ 0, and u = 0 otherwise.
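For the interior case, the computation is a two-liner: subtract the mean, then normalize. A minimal sketch (the function name is mine):

```python
import numpy as np

def project_interior(q):
    """Solve Problem P at an interior point x of K (no zero components).

    Orthogonal projection of q onto the subspace {v : sum(v) = 0},
    followed by length normalization, as in the formulas above.
    """
    v = q - q.mean()                     # v_i = q_i - (1/n) * sum_j q_j
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v   # u = 0 when q is constant

q = np.array([3.0, 1.0, 2.0])
u = project_interior(q)                  # components sum to 0, unit length
```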
Thus, Problem P is interesting only when x lies on a face or edge of K. Topologically, K is a simplex of degree n - 1, and has boundary surfaces of all lower degrees. However, if x lies on one of these surfaces, the set T_x is a convex set (shaped like a "wedge"), and the solution to Problem P is more complicated than projecting onto the boundary surface. For example, if q lies in T_x, then the projection is simply the identity. On the other hand, if q lies in a direction that points away from all tangent directions in the wedge T_x, that is, if q · v ≤ 0 for all v ∈ T_x, then the solution u is the zero vector. In between, there will be regions in which q projects to boundary surfaces of each order greater than or equal to the order of the boundary on which x lies.

Note that a boundary face (or edge of any given order) is itself a simplex. However, the solution vector u is not necessarily a simple projection of q onto this boundary simplex, as noted above.

A solution method exists, and can be obtained by applying
0162-8828/83/0500-0330$01.00 © 1983 IEEE
algorithms from the theory of feasible directions to the specific geometry defined by K. In particular, Zoutendijk offers finite algorithms [4] for solving problems of the form

maximize q · u, given q,

subject to

Au ≤ b, uᵀBu ≤ 1.

Problem P can be formulated in this way, with B = I, the n × n identity matrix. When Zoutendijk's algorithm is applied to Problem P, certain simplifications can be made because B = I and because K is especially easy to define (i.e., the matrix A has a very simple form). These simplifications are equivalent to an extremely simple finite algorithm for solving Problem P, which we present in Section III.

Despite the fact that Zoutendijk's algorithm has been available for over 20 years, and in spite of the algorithm's simplicity when applied to Problem P, alternative schemes are commonly used for projecting direction vectors q onto the set of feasible directions from a point on the probability space. These schemes typically do not solve Problem P, but rather yield feasible directions u which are "more or less" in the same direction as q. For example, the "nonlinear probabilistic model" used in many applications of relaxation techniques to labeling problems defines

u = p̃ - p,

where

p̃_i = p_i(1 + αq_i) / Σ_j p_j(1 + αq_j),

and where α > 0 is a small fixed constant. It is easily verified that u ∈ T_p as long as α is sufficiently small. Note, however, that if p_i = 0, then u_i = 0. That is, an iterative scheme based on these formulas can never leave a face or edge of the space K.

Other projection schemes have been studied in connection with relaxation labeling. We mention the Product Rule [5], Bayesian analysis [6], and single component desaturation [7]. There is also an obvious "truncation method" of setting negative updating components to zero when the corresponding x_i's are zero. None of these schemes yields a solution to Problem P. The connections between relaxation labeling, projection onto a convex set, and Problem P are not fully addressed in the cited references. However, in an accompanying paper by Hummel and Zucker [3], an algorithm for relaxation labeling is presented which requires, as a subroutine, an algorithm to solve Problem P.

Why should a problem which seems to be geometrically simple lead to so many different partial solution methods? Part of the answer is related to the formulation of the problems: the need for the projection operator is not always recognized. More importantly, the geometry is not quite as trivial as it first seems. Computing the regions of ℝ^n in which q is projected to boundary surfaces of different order requires considerable care. The algorithm presented in the next section performs this computation.
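The face-trapping behavior of the nonlinear probabilistic update noted above is easy to demonstrate. In this sketch (function name and sample values are mine), the component with p_i = 0 stays frozen even though q favors that label strongly:

```python
import numpy as np

def nonlinear_update_direction(p, q, alpha=0.1):
    """Direction u = p~ - p implied by one step of the nonlinear model."""
    w = p * (1.0 + alpha * q)   # unnormalized updated weights
    return w / w.sum() - p      # renormalize onto the simplex, subtract p

p = np.array([0.0, 0.4, 0.6])    # label 0 already has zero support
q = np.array([5.0, 1.0, -1.0])   # ...even though q points toward label 0
u = nonlinear_update_direction(p, q)
# u[0] is exactly 0: the iteration can never leave the face p_0 = 0
```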
III. THE PROJECTION ALGORITHM

The following algorithm solves Problem P.

0) Accept as input x ∈ K and q ∈ ℝ^n.
1) Set k = 1, S_k = ∅, D = { i : x_i = 0 }.
2) LOOP:
     t_k := (1/(n - #S_k)) Σ_{i ∉ S_k} q_i
     S_{k+1} := { i ∈ D : q_i < t_k }
     if S_{k+1} = S_k, then EXIT LOOP
     k := k + 1
   END LOOP
3) Compute y, where
     y_i = 0 if i ∈ S_k,
     y_i = q_i - t_k if i ∉ S_k.
4) Output u, where
     u = 0 if y = 0,
     u = y/‖y‖ otherwise.
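A direct transcription of steps 0)-4) into code might look as follows (a sketch; the function name and the trailing example are mine):

```python
import numpy as np

def solve_problem_p(x, q):
    """Projection algorithm of Section III: u maximizing q . u over F_x."""
    n = len(q)
    D = {i for i in range(n) if x[i] == 0.0}   # indices of zero components
    S = set()                                  # S_1 is empty
    while True:
        rest = [i for i in range(n) if i not in S]
        t = sum(q[i] for i in rest) / len(rest)    # threshold t_k, step 2
        S_next = {i for i in D if q[i] < t}        # S_{k+1}
        if S_next == S:                            # EXIT LOOP
            break
        S = S_next
    y = np.array([0.0 if i in S else q[i] - t for i in range(n)])  # step 3
    norm = np.linalg.norm(y)
    return y / norm if norm > 0 else y             # step 4

x = np.array([0.0, 0.5, 0.5])      # a point on a face of K
q = np.array([-2.0, 3.0, 1.0])
u = solve_problem_p(x, q)
```

Note that `rest` is never empty: S is a subset of D, and D cannot contain every index because the components of x sum to 1.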
One way to verify that the resulting vector is the correct solution is to observe that u satisfies the Kuhn-Tucker conditions (at least when u ≠ 0), which is equivalent to solving the constrained optimization problem [8]. In this proof, the Lagrange multiplier λ belonging to the constraint Σ u_i = 0 has the same value as the final threshold t_k. We now present an alternate proof. This proof is self-contained, and handles the u = 0 and u ≠ 0 cases uniformly.

Before showing that u is the desired solution, however, we show that the algorithm terminates in a finite number of steps.

Proposition: The S_k's are nested, and thus the algorithm terminates with at most #D + 1 passes through the loop.

Proof: Since S_1 = ∅, S_1 ⊆ S_2. We assume by induction that S_{k-1} ⊆ S_k. Since i ∈ S_k implies that q_i < t_{k-1}, and since S_k \ S_{k-1} is nonempty while the loop continues,

(n - #S_k) t_k = Σ_{i ∉ S_k} q_i
             = Σ_{i ∉ S_{k-1}} q_i - Σ_{i ∈ S_k \ S_{k-1}} q_i
             > (n - #S_{k-1}) t_{k-1} - (#S_k - #S_{k-1}) t_{k-1}
             = (n - #S_k) t_{k-1},

so t_k > t_{k-1}. Then clearly S_k ⊆ S_{k+1}, by the definition of the S_k's. The proposition follows, since the loop terminates when S_{k+1} = S_k.
IV. PROOF OF THE ALGORITHM

Let k = N denote the maximum value of k attained during the last iteration through the loop (step 2). Denote by W the space

W = { v ∈ T_x : v_i = 0 for i ∈ S_N }.

Note that if v ∈ W and ‖v‖ ≤ 1, then v ∈ F_x.

Lemma: The maximum of q · v, v ∈ F_x, is attained among v ∈ W ∩ F_x.
Proof: Suppose that v ∈ F_x maximizes q · v, and that v_i > 0 for some i ∈ S_N. Define w by

w_j = 0, if j = i,
w_j = v_j, if j ∈ S_N \ {i},
w_j = v_j + v_i/(n - #S_N), if j ∉ S_N.

Since v ∈ F_x, v_j ≥ 0 for j ∈ D, and thus w_j ≥ 0 for j ∈ D. Further, Σ w_j = Σ v_j = 0. Finally,

‖w‖² = ‖v‖² - v_i² + (2v_i/(n - #S_N)) Σ_{j ∉ S_N} v_j + v_i²/(n - #S_N)
     ≤ ‖v‖² - v_i² + v_i²/(n - #S_N)
     ≤ ‖v‖².

Here we used v_i > 0, and the fact that j ∈ S_N implies v_j ≥ 0, so that Σ_{j ∉ S_N} v_j = -Σ_{j ∈ S_N} v_j ≤ 0. Combining, we have shown that w ∈ F_x. But

q · w = Σ_{j ∈ S_N \ {i}} q_j v_j + Σ_{j ∉ S_N} q_j (v_j + v_i/(n - #S_N))
     = q · v - q_i v_i + (v_i/(n - #S_N)) Σ_{j ∉ S_N} q_j
     = q · v - q_i v_i + t_N v_i
     = q · v + (t_N - q_i) v_i > q · v,

since i ∈ S_N implies q_i < t_N, and since v_i > 0. This is a contradiction, since q · v is maximum among v ∈ F_x. Thus we must have v_i = 0 for all i ∈ S_N. That is, v ∈ W.

Theorem: The output vector u solves Problem P.

Proof: According to the lemma, it suffices to show that u ∈ F_x, and q · u ≥ q · v for all v ∈ W ∩ F_x.
Clearly, y calculated in step 3 of the algorithm satisfies y_i = 0 for i ∈ S_N. Further, if i ∈ D and i ∉ S_N, then q_i ≥ t_N according to the definition of S_N, and so y_i = q_i - t_N ≥ 0. So y_i ≥ 0 for all i ∈ D. Finally, Σ y_i = 0 by direct computation. Thus y ∈ W.

In fact, as can be easily seen, step 3 merely performs an orthogonal projection of q onto the subspace spanned by W. That is, for any v ∈ W,

(q - y) · v = Σ_{i ∈ S_N} q_i v_i + t_N Σ_{i ∉ S_N} v_i = 0.

So q · v = y · v for all v ∈ W.

The output vector u calculated in step 4 is simply a length normalization of y, and so u is in W also. Since ‖u‖ ≤ 1, u ∈ F_x.

Next, observe that since u ∈ W,

q · u = y · u = ‖y‖.

The last equality follows from the definition of u in step 4. Let v be any vector in W ∩ F_x. Then ‖v‖ ≤ 1, and so

q · v = y · v ≤ ‖y‖ ‖v‖ ≤ ‖y‖ = q · u,

using the Cauchy-Schwarz inequality. We have therefore shown that

q · u ≥ q · v for all v ∈ W ∩ F_x.

This proves the theorem.
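The orthogonal-projection identity used in the proof, q · v = y · v for every v ∈ W, can be spot-checked numerically. The choices of n, q, and S_N below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
q = rng.normal(size=n)
S = [0, 1]                       # suppose the loop terminated with S_N = {0, 1}
rest = [i for i in range(n) if i not in S]

t = q[rest].mean()               # final threshold t_N
y = np.zeros(n)
y[rest] = q[rest] - t            # step 3: y_i = q_i - t_N off S_N, 0 on S_N

# Draw random vectors v in W: zero on S_N, components summing to zero.
for _ in range(100):
    v = rng.normal(size=n)
    v[S] = 0.0
    v[rest] -= v[rest].mean()    # enforce sum(v) = 0 without touching S_N
    assert abs(q @ v - y @ v) < 1e-10   # projection identity q.v = y.v
```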
REFERENCES

[1] S. W. Zucker, "Relaxation labeling and the reduction of local ambiguity," in Proc. 3rd Int. Joint Conf. Pattern Recognition, San Diego, CA, 1976; also in Pattern Recognition and Artificial Intelligence, C. H. Chen, Ed. New York: Academic, 1977.
[2] A. Rosenfeld, R. A. Hummel, and S. W. Zucker, "Scene labeling by relaxation operations," IEEE Trans. Syst., Man, Cybern., vol. SMC-6, p. 420, 1976.
[3] R. A. Hummel and S. W. Zucker, "On the foundations of relaxation labeling processes," this issue, pp. 267-287.
[4] G. Zoutendijk, Methods of Feasible Directions. New York: Elsevier, 1960; also Mathematical Programming Methods. Amsterdam, The Netherlands: North-Holland, 1976.
[5] S. W. Zucker and J. Mohammed, "An analysis of probabilistic relaxation labeling processes," in Proc. Pattern Recognition and Image Processing, Chicago, IL, May 1978.
[6] S. Peleg, "A new probabilistic relaxation scheme," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-2, p. 362, 1980.
[7] M. Berthod and O. Faugeras, "Using context in the global recognition of a set of objects, Appendix," in Proc. IFIP, 1980.
[8] D. Luenberger, Introduction to Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1973.
An Image Transform Coding Scheme Based on Spatial Domain Considerations
T. C. CHEN AND RUI J. P. DE FIGUEIREDO
Abstract-Image transform coding is first briefly reviewed using conventional viewpoints. Then a new spatial domain interpretation is given to image transform coding. An improvement based on this viewpoint for the Fourier transform coding, which possesses simple spatial domain relations, is presented.
Index Terms-Circular convolution, Fourier transform coding, image transform coding, overlap-and-save coding scheme, zonal sampling.
I. INTRODUCTION
In this paper, we investigate the conventional approach to pixel-level image data compression-reconstruction by transform techniques. A brief review is given in Section II. We then introduce a new spatial-domain interpretation viewpoint, with illustrations using the Fourier transform, which shows some of the limitations in the use of transform techniques in image data compression. Based on this viewpoint, we present, in Section IV, an improvement in the use of Fourier transforms in image data compression. Finally, we conclude by providing a motivation for investigating the spatial domain processing techniques for image data compression and reconstruction which constitute an alternative to transform coding.
II. A BRIEF REVIEW OF TRANSFORM CODING
Transform coding represents an indirect way of coding compared to bit-level coding (e.g., PCM, predictive coding). A schematic diagram for image transform coding based on zonal sampling is shown in Fig. 1.

For purposes of analysis, we can regard the pixels of an image [f] (or image block in actual implementation) as an N-element vector f, which may be considered as a sample of a random process. This vector undergoes a linear transformation defined by an N × N-element operator matrix A.¹ The transformed vector F is then operated on by a P × N (where P < N) selection matrix H that extracts certain elements from F and records them in a P-element vector F_T. This is called "zonal
Manuscript received July 13, 1981; revised September 20, 1982.
T. C. Chen was with the Department of Electrical Engineering, Rice University, Houston, TX 77251. He is now with Bell Laboratories, Holmdel, NJ 07733.
R. J. P. de Figueiredo is with the Departments of Electrical Engineering and Mathematical Sciences, Rice University, Houston, TX 77251.
¹ A 2-D matrix transform [F] = [A][f][A]ᵀ can be converted to a 1-D vector transform F = [A]·f by using lexicographic ordering of [F], [f], and the Kronecker direct product of [A] with itself.
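The conversion described in the footnote can be verified directly. In this sketch, A and the image block are random stand-ins for a real transform, and row-major `reshape` supplies the lexicographic ordering:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
A = rng.normal(size=(N, N))      # operator matrix (stand-in for a real transform)
f = rng.normal(size=(N, N))      # an N x N image block

F_2d = A @ f @ A.T                     # 2-D matrix transform [F] = [A][f][A]^T
F_1d = np.kron(A, A) @ f.reshape(-1)   # 1-D form via the Kronecker product

# Lexicographic (row-major) ordering makes the two computations agree.
assert np.allclose(F_2d.reshape(-1), F_1d)
```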
0162-8828/83/0500-0332$01.00 © 1983 IEEE