
IEEE TRANSACTIONS ON COMPUTERS, VOL. C-26, NO. 4, APRIL 1977

Maximum Entropy Image Reconstruction

STEPHEN J. WERNECKE, STUDENT MEMBER, IEEE, AND LARRY R. D'ADDARIO, MEMBER, IEEE

Abstract: Two-dimensional digital image reconstruction is an important imaging process in many of the physical sciences. If the data are insufficient to specify a unique reconstruction, an additional criterion must be introduced, either implicitly or explicitly, before the best estimate can be computed. Here we use a principle of maximum entropy, which has proven useful in other contexts, to design a procedure for reconstruction from noisy measurements. Implementation is described in detail for the Fourier synthesis problem of radio astronomy. The method is iterative and hence more costly than direct techniques; however, a number of comparative examples indicate that a significant improvement in image quality and resolution is possible with only a few iterations. A major component of the computational burden of the maximum entropy procedure is shown to be a two-dimensional convolution sum, which can be efficiently calculated by fast Fourier transform techniques.

Index Terms: Digital image processing, Fourier synthesis, image processing, image reconstruction, maximum entropy, radio telescopes, statistical estimation theory.

Manuscript received December 1, 1975; revised April 10, 1976. This work was supported in part by the National Science Foundation under Grant DCR75-15140 and in part by the National Radio Astronomy Observatory. Portions of this work have been presented at the Image Processing for 2-D and 3-D Reconstruction from Projections: Theory and Practice in Medicine and the Physical Sciences Meeting, Stanford University, Stanford, CA, August 1975.

S. J. Wernecke is with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305.

L. R. D'Addario is with the National Radio Astronomy Observatory, Socorro, NM 87801.

INTRODUCTION

THE reconstruction problem considered here is the estimation of a two-dimensional function $f(x,y)$ from a finite number of noise-corrupted, linear measurements $m_i$, $i = 1, 2, \ldots, M$, of the form

$$ m_i = \iint_D h_i(x,y)\, f(x,y)\, dx\, dy + e_i \tag{1} $$

where $e_i$ is an error term and $f$ is zero outside the known region $D$. We borrow optics terminology by calling the unknown function $f$ an object and its reconstruction $\hat{f}$ an image. Specification of the measurement kernel $h_i(x,y)$ depends on the application. Problems of this type have arisen in a number of diverse fields (radiography, radio astronomy, and optics, to mention a few), and it is only within the last few years that the various disciplines have appreciated their common interest in reconstruction. This discovery has revealed a large number of reconstruction algorithms and, inevitably, a considerable duplication of research effort.

We restrict ourselves to two-dimensional reconstruction; however, this is not a serious limitation. All of the results in this paper can be straightforwardly generalized to higher dimensions, but to avoid unnecessary mathematical abstraction we have not done so. The bulk of current research is directed toward digital reconstruction, at least as a simulation tool for the development of special-purpose hardware, and the most noticeable effect of reconstruction in higher dimensions is the rapid increase in the processing burden. Even in applications where the structures of interest are inherently three-dimensional, e.g., radiography, two-dimensional reconstruction is an important activity because it is possible to design measurement apparatus so parallel two-dimensional slices can be reconstructed separately and "stacked" to yield a three-dimensional reconstruction.

The assumption that the measurement process is linear, apart from errors, is not uncommon, and the extent to which it reflects physical reality depends on the application. Now that the theory and practice of reconstruction from linear measurements is maturing, attempts to incorporate nonlinearities into the measurement model have begun [1], [2]. To some extent, the linearity assumption is responsible for the success of interdisciplinary cooperation in reconstruction research. Nonlinearities tend to be particular to each application, and efforts to deal with significant departures from linearity become quite specialized.

The measurement model (1) is general enough to show that a variety of seemingly different applications involve similar mathematics. If, for example, we choose

$$ h_i(x,y) = \operatorname{rect}\!\left[\frac{x\cos\theta_i + y\sin\theta_i - R_i}{w}\right] \tag{2a} $$

we describe the radiographic problem of reconstruction from projection measurements of the linear attenuation coefficient integrated along the path of a collimated X-ray beam. The geometry for this activity is shown in Fig. 1. Use of a complex exponential kernel

$$ h_i(x,y) = \exp[-j2\pi(u_i x + v_i y)] \tag{2b} $$

models the measurement of the two-dimensional Fourier transform of the object at spatial frequency $(u_i, v_i)$. This situation is important in radio astronomy, where reconstruction is sometimes called Fourier synthesis. If we let

$$ h_i(x,y) = h(x - x_i,\, y - y_i) \tag{2c} $$

the measurements become samples of the convolution of the image and a space-invariant point spread function $h(\cdot,\cdot)$.
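As an illustration of how the measurement model (1) specializes to the kernels (2a) and (2b), the following minimal Python sketch approximates the integral by a sum over pixel centers. It is not part of the original paper: the function names (`rect`, `strip_kernel`, `fourier_kernel`, `measure`), the unit pixel grid, and the unnormalized form of the strip kernel are assumptions made only for this example.

```python
import cmath
import math


def rect(t):
    """Unit rectangle: 1 for |t| <= 1/2, otherwise 0."""
    return 1.0 if abs(t) <= 0.5 else 0.0


def strip_kernel(theta, radius, width):
    """Strip kernel in the spirit of (2a); any normalization is omitted (assumption)."""
    def h(x, y):
        return rect((x * math.cos(theta) + y * math.sin(theta) - radius) / width)
    return h


def fourier_kernel(u, v):
    """Complex exponential kernel (2b) at spatial frequency (u, v)."""
    def h(x, y):
        return cmath.exp(-2j * math.pi * (u * x + v * y))
    return h


def measure(kernel, pixels, pixel_area, noise=0.0):
    """Sum approximation of (1): m = sum_k h(x_k, y_k) f_k * area + e."""
    return sum(kernel(x, y) * f for (x, y, f) in pixels) * pixel_area + noise


# Hypothetical 4 x 4 object on a unit-spaced grid with one bright pixel at (1, 2).
pixels = [(float(c), float(r), 1.0 if (c, r) == (1, 2) else 0.0)
          for r in range(4) for c in range(4)]

m_dc = measure(fourier_kernel(0.0, 0.0), pixels, 1.0)        # total intensity
m_uv = measure(fourier_kernel(0.25, 0.0), pixels, 1.0)       # one Fourier sample
m_strip = measure(strip_kernel(0.0, 1.0, 1.0), pixels, 1.0)  # strip 0.5 <= x <= 1.5
```

The same `measure` routine serves both applications; only the kernel changes, which is the point made in the text about the generality of (1).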


Fig. 1. A projection measurement with radius $R$, angle $\theta$, and width $w$ determines the mean image brightness in the shaded strip.

Here the reconstruction problem becomes one of deconvolution or restoration of a blurred object.

In pointing out the mathematical similarity among these three examples, we do not imply that the problems are identical nor do we claim that an algorithm developed for one application is guaranteed suitable for use in another. For example, the restoration problem (2c) involves space-invariant linear filtering, and this permits processing techniques that would not be appropriate in the other cases. We can, however, say that techniques to invert the general measurement equation (1) will be applicable to all of the specialities of (2). In some cases, these general methods will enjoy simplification in the context of a particular reconstruction application.

STATISTICAL RECONSTRUCTION TECHNIQUES

The error term is deliberately included in (1) because we intend to discuss certain statistical reconstruction techniques that are designed to acknowledge and, to some extent, compensate for measurement errors. To explicitly consider measurement errors in the formulation of new reconstruction schemes is, we feel, important. Many of the currently used reconstruction algorithms have been derived without this consideration. The effect of noise on linear reconstruction procedures can be explored by superposition; however, superposition does not hold for the nonlinear algorithms we discuss here, and this makes conventional error analysis more difficult. We note, also, that reconstructions are often categorized according to some resolution figure-of-merit, a number that is frequently determined by measurement apparatus geometry alone despite the fact that the useful resolution of a reconstruction is strongly influenced by noise. It would be helpful if properly designed, error-acknowledging algorithms could "sense" the resolution inherent in a set of noisy measurements and yield reconstructions lacking spurious detail.

Also important is the incorporation of a priori knowledge into the reconstruction process. One constraint operative in most reconstruction applications is that the objects, representing absorptivities, densities, emissivities, and the like, cannot, for physical reasons, assume negative values. It is difficult to introduce this property into the simpler reconstruction procedures; however, constraints of this sort can be naturally included in statistically motivated techniques.

One aspect common to all reconstruction problems is the insufficiency of a finite set of measurements to specify a unique reconstruction, even permitting the luxury of error-free measurements for the moment. This is due to the fact that a continuous function possesses, in general, an infinite number of degrees of freedom. Of course, in practice $f$ can often be adequately described by a finite number of parameters, and such an approximation is necessary if we are to consider digital reconstruction. Choosing the dimension of a suitable finite representation is a nontrivial problem whose solution depends on the structures of interest, the final purpose for which reconstruction is needed, the speed and memory offered by available computing facilities, etc. We will not consider this problem here, but we will assume that an intelligent choice has been made so we can concentrate on other aspects of reconstruction. Thus, the object is represented by $N$ parameters $(f_1, f_2, \ldots, f_N)$; frequently, these parameters are just samples of the object, $f_i = f(x_i, y_i)$.

It often happens that the number of parameters needed for an adequate description is considerably greater than the number of available measurements. In this event, even the finite representation of the object is underdetermined, and there is an infinity of possible solutions. A fundamental decision must be made: should we redefine the reconstruction task so the data are sufficient to uniquely specify the solution to the new problem, or should we stick with the original formulation and decide which of the solutions is "best" in some sense? In this paper, we deal with the latter option.

One approach is to use Bayes' theorem


$$ P(\hat{f} \mid \{m_i\}) = \frac{P(\{m_i\} \mid \hat{f})\, P(\hat{f})}{P(\{m_i\})} \tag{3} $$

which calculates the a posteriori probability that any reconstruction $\hat{f}$ is correct given the available measurements. This conditional probability can be maximized once the terms in the numerator of the right-hand side of (3) are specified. The denominator is a constant and does not affect the optimization. Assuming the measurement equation (1) and the noise statistics are known, determination of $P(\{m_i\} \mid \hat{f})$ is usually straightforward. The difficulty with Bayes' theorem is that it depends on the a priori probability density $P(\hat{f})$. To find this, there must be a statistical model that gives, for every possible object, an a priori probability of occurrence.

Several workers, studying restoration of blurred photographs [3] and reconstruction from projections [4], have used Bayesian methods with the a priori image probability density taken to be multivariate Gaussian. While this assignment is made primarily for analytic convenience, Hunt and Cannon [5] have shown the Gaussian model to be a good one provided the mean object is not taken to be uniformly gray. They adopt a context-dependent mean object, obtained by blurring one member of the image ensemble. Although the use of a Gaussian prior does not explicitly constrain images to be nonnegative, negative values occur with low probability if fluctuations about a positive mean image are not too large.

MAXIMUM ENTROPY RECONSTRUCTION

In this paper, we will consider an approach motivated by non-Bayesian, but still probabilistic, arguments. In abstract terms, the procedure begins with the assignment of an entropy measure to indicate the randomness or uncertainty in a statistical environment. Each measurement or piece of a priori information, assuming these data to be free of contradiction, reduces the randomness and restricts statistical freedom. If the data are insufficient to eliminate all uncertainty, a principle of maximum entropy can be used to assure that the admittedly incomplete description of the situation reflects only the statistical structure imposed by the available information. This maximum entropy description retains all of the uncertainty not removed by the data, and it has been interpreted as the description that is most objective [6] or maximally noncommittal with respect to missing information [7], [8].

The concept of entropy is a familiar one in information theory, probability theory, and thermodynamics [9], although its place in the present problem of image reconstruction is not yet clear. To define an entropy measure that is useful in the context of reconstruction we must formulate a statistical model for the imaging process. Two such models, leading to different entropy measures, have been suggested in the literature, and each has attracted followers. Once a given model is adopted, maximum entropy (ME) reconstruction proceeds by finding the reconstruction that maximizes the entropy measure and is compatible with all the available data, both measurements and a priori information.

To develop the first model, we require a discrete representation of the object in terms of picture elements (pixels). Let the object be partitioned into $N$ pixels, each of area $\Delta A$, and let $f_i$ be the average brightness in the $i$th pixel. We suppose that the brightness arises from the random emission of discrete particles (call them photons), each of energy $e$, and we define the brightness as

$$ f_i = \frac{r_i e}{\Delta A} \tag{4} $$

where $r_i$ is the average rate of emission of photons from the $i$th pixel. Under this model, the probability that a photon was emitted from the $i$th pixel, given that it was emitted from the object, is

$$ p_i = \frac{r_i}{\sum_j r_j} = \frac{f_i}{\sum_j f_j} = \frac{f_i}{F} \tag{5} $$

where $F = \sum_i f_i$ is the total intensity. The entropy of the discrete probability distribution (5) is

$$ H_1 = -\sum_{i=1}^{N} p_i \log p_i = -\sum_{i=1}^{N} \frac{f_i}{F} \log \frac{f_i}{F}. \tag{6} $$

Formula (6) is solidly grounded in information theory [6], [10] and measures the uncertainty as to which pixel emitted a given photon. We emphasize that it is the random emission model which enables us to make the connection between discrete pictures and the entropy measure $H_1$; this model may not be appropriate for all reconstruction problems.

Several authors [11], [12] have used an "entropy" measure similar to $H_1$, namely,

$$ -\sum_{i=1}^{N} f_i \log f_i. \tag{7} $$

Note that maximization of $H_1$ is equivalent to maximization of (7) if the latter is done under the constraint $F$ = constant, i.e., if we regard the total intensity as exactly known. We prefer to avoid this restriction.

The second statistical model is based on an extension of Burg's work on spectral analysis of stationary random processes [13], [14]. There the problem is to estimate the power spectrum $S(\nu)$, which is the Fourier transform of the autocorrelation function $R(\tau)$ of a stationary random process $x(t)$. In many applications it happens that observations include only a finite number of samples of $R(\tau)$ or of a particular realization of $x(t)$. The samples are not sufficient to specify $S(\nu)$ uniquely, just as in the reconstruction problem where $\{m_i\}$ cannot specify $f(x,y)$ uniquely. Also, $S(\nu)$ must be nonnegative, as must $f(x,y)$ in all applications considered here. The maximum entropy


method of power spectrum estimation is based on the fact that the entropy rate $E$ of a stationary, band-limited random process is related in a simple way to its power spectrum, namely,

$$ E = \int_{-\nu_1}^{\nu_1} \log S(\nu)\, d\nu + C \tag{8} $$

where $\nu_1$ is the cutoff frequency and $C$ depends on [illegible in source] [8].

The ME philosophy is to assume that the process under investigation is as random, or statistically structureless, as possible. Applying this principle to the spectral analysis problem, we select as our spectral estimate the function $S(\nu)$ that maximizes the entropy rate (8) and is consistent with the available observations. The basic mathematics and an important algorithm for ME spectral analysis were developed by Levinson [16]; however, the usefulness of this technique was not widely appreciated until it was interpreted and extended by Burg [13], [14], and others [17], [18]. For a tutorial discussion of ME theory, the reader should consult [8].

Generalizing (8) to two dimensions and letting $f(x,y)$ correspond to $S(\nu)$, we obtain a second entropy measure

$$ H_2 = \iint_D \log f(x,y)\, dx\, dy \tag{9} $$

which has been adopted by Ables [7] and Ponsonby [19]. If the imaging process can be regarded as two-dimensional spectral estimation, the appropriateness of $H_2$ as an entropy measure follows immediately. This is the case in at least one important reconstruction application in radio astronomy. Here, an incoherent radio source gives rise to a random electric field whose spatial autocorrelation function is sampled by radio interferometers. The relationship between the unknown radio brightness distribution, which is to be reconstructed, and the available interferometer measurements is embodied in the van Cittert-Zernike theorem [20] and is a Fourier transformation. Hence, reconstruction in radio astronomy can be described as power spectral analysis of the electric field, and use of $H_2$ as an entropy measure is justified.

The existence of two different entropy measures for image formation is interesting. Which entropy measure, if either, is correct for a given application depends on the physics of the measurement process. There has been speculation (Ponsonby, private communication) about the relationship between the two measures introduced here; however, no clear statement of this relationship has yet emerged. Frieden [11] has pointed out that reconstructions found using $H_1$ tend to be smooth in a certain sense. Wernecke [21] has interpreted both $H_1$ and $H_2$ as simple smoothness indicators and has suggested other criteria that could also serve this purpose. Thus, different formulations of ME reconstruction can be regarded as special cases in the larger framework of reconstruction with maximum smoothness. It is felt, however, that entropy criteria have a stronger theoretical foundation than ad hoc smoothness criteria since the former are grounded in specific statistical models.

APPLICATION TO FOURIER SYNTHESIS

For concreteness, we will restrict the remainder of the discussion to ME reconstruction using $H_2$ in the context of Fourier synthesis as it arises in radio astronomy (see, e.g., [22]). We are interested in reconstructing a radio brightness distribution from noisy samples of its two-dimensional Fourier transform,

$$ m_i = \iint_D f(x,y) \exp[-j2\pi(u_i x + v_i y)]\, dx\, dy + e_i. \tag{10} $$

Since measurements are made in the transform domain, it is natural to consider reconstruction by Fourier inversion. This would be a simple matter if the image transform were known everywhere in the $(u,v)$-plane; however, this is usually not the case in practice. For economic and technical reasons, measurement coverage of the $(u,v)$-plane is seldom as complete as would be desired in the absence of these constraints.

The dilemma is how to form an image from a finite number of Fourier transform samples, too few to specify a unique reconstruction. It is instructive to consider what a proposed reconstruction technique implies about the transform at points where measurements are not available. It is not uncommon in this field to see reconstructions defined by equations that assume, at least implicitly, that unmeasured values are zero. An example of this treatment is afforded by what has been called the direct transform reconstruction

$$ \hat{f}_D(x,y) = \sum_{i=1}^{M} a_i \left\{ m_i^{(R)} \cos[2\pi(u_i x + v_i y)] - m_i^{(I)} \sin[2\pi(u_i x + v_i y)] \right\} \tag{11} $$

where $m_i = m_i^{(R)} + j m_i^{(I)}$. The real apodizing constants $a_i$ can be chosen to improve the appearance of the point source response in a fashion analogous to use of "lag windows" in classical power spectrum estimation [23].

Methods of this sort work well if measurement coverage of the $(u,v)$-plane is nearly complete or if the missing data would have had values close to zero had these extra numbers been available. If coverage is restricted or irregular, however, images reconstructed by such algorithms can suffer from serious sidelobe artifacts. There will also be areas in which the reconstructed brightness is negative and hence physically inadmissible.

The existence of regions of negative brightness on a radio map is not, in itself, disastrous; radio astronomers have been successfully analyzing such maps for years. The defect does indicate, however, that the reconstruction algorithm has not used all the available information. Inclusion


of the a priori knowledge that a physically admissible reconstruction cannot go negative is not sufficient, in general, to permit a unique reconstruction, but this constraint does represent additional information no less valid than the actual measurements made. The difficulty, of course, is including this information in the reconstruction process.

The ME technique ensures a nonnegative reconstruction since the entropy measure $H_2$ diverges to $-\infty$ if more than a countable number of points in $\hat{f}$ approach zero. If the measurements were error-free, we would formulate ME reconstruction as the constrained optimization

$$ \text{maximize} \iint_D \log \hat{f}(x,y)\, dx\, dy \tag{12} $$

subject to

$$ \iint_D \hat{f}(x,y) \exp[-j2\pi(u_i x + v_i y)]\, dx\, dy = m_i, \qquad i = 1, 2, \ldots, M \tag{13} $$

and

$$ \hat{f}(x,y) \ge 0 \quad \text{for } (x,y) \in D. \tag{14} $$

The constraints depend only on the available data so no unwarranted assumptions about unavailable measurements are made.

Physical measurements are never free from errors, and modification of (12)-(14) to reflect that uncertainty is desirable. If the data are sufficiently noisy, (12)-(14) may have no solution, i.e., there may be no nonnegative image that, when Fourier transformed, agrees exactly with all the measurements. Ables [7] has noted that information about the error statistics constitutes important a priori knowledge which should be included in the problem formulation.

If we suppose the errors are independent, zero mean random variables with variances $\sigma_i^2$, we could replace (13) with the single constraint

$$ \sum_{i=1}^{M} \frac{1}{\sigma_i^2} |m_i - \hat{m}_i|^2 = M \tag{15} $$

where

$$ \hat{m}_i = \iint_D \hat{f}(x,y) \exp[-j2\pi(u_i x + v_i y)]\, dx\, dy. \tag{16} $$

By the Central Limit Theorem, (15) will be very nearly satisfied when $\hat{f}(x,y)$ represents the true object, provided that $M$ is large.

No explicit solution is known for either of the constrained optimization problems just posed, and numerical solution of such problems is very difficult. Furthermore, there is still a nonzero probability that the constraints (14) and (15) are inconsistent for a particular set of measurements. Therefore, in seeking a practical algorithm, we replace the constrained maximization with the unconstrained (except for nonnegativity) maximization of

$$ \iint_D \log \hat{f}(x,y)\, dx\, dy - \lambda \sum_{i=1}^{M} \frac{1}{\sigma_i^2} |m_i - \hat{m}_i|^2 \tag{17} $$

where $\lambda$ is chosen so that (15) is satisfied to sufficient accuracy. This problem has a nonnegative solution for any $\lambda > 0$. $\lambda$ plays the role of a Lagrange multiplier in the maximization of (12) subject to (15). From another point of view, the maximization of (17) can be regarded as an attempt to simultaneously maximize the entropy measure and minimize the total squared residual, with the relative importance of the two being specified by the parameter $\lambda$.

Even the unconstrained maximization of (17) has no known explicit solution. We therefore seek a numerical solution, and for that purpose write a discrete version of (17) by partitioning the image $\hat{f}(x,y)$ into $N$ pixels, each of area $\Delta A$, and approximating the integrals by sums, giving

$$ J(\hat{f}_1, \ldots, \hat{f}_N) = \Delta A \sum_{k=1}^{N} \log \hat{f}_k - \lambda \sum_{i=1}^{M} \frac{1}{\sigma_i^2} \left| m_i - \Delta A \sum_{k=1}^{N} \hat{f}_k \exp[-j2\pi(u_i x_k + v_i y_k)] \right|^2 \tag{18} $$

where $(x_k, y_k)$ is the location of the $k$th pixel. The objective function $J(\hat{f}_1, \ldots, \hat{f}_N)$ is a concave function (one always underestimated by linear interpolation), and it therefore has at the most one local maximum.

NUMERICAL METHODS

Differentiating (18) with respect to the pixel values, we obtain

$$ \frac{\partial J}{\partial \hat{f}_l} = \frac{\Delta A}{\hat{f}_l} + 2\Delta A \lambda \sum_{i=1}^{M} \frac{1}{\sigma_i^2} \left\{ m_i^{(R)} \cos[2\pi(u_i x_l + v_i y_l)] - m_i^{(I)} \sin[2\pi(u_i x_l + v_i y_l)] \right\} - 2\Delta A^2 \lambda \sum_{i=1}^{M} \sum_{k=1}^{N} \frac{1}{\sigma_i^2} \hat{f}_k \cos 2\pi[u_i(x_l - x_k) + v_i(y_l - y_k)]. \tag{19} $$

By changing the order of the double summation, we arrive at the alternate form

$$ \frac{\partial J}{\partial \hat{f}_l} = \frac{\Delta A}{\hat{f}_l} + \Delta A\, d_l - \Delta A^2 \sum_{k=1}^{N} p_{l,k} \hat{f}_k \tag{20} $$

where

$$ d_l = 2\lambda \sum_{i=1}^{M} \frac{1}{\sigma_i^2} \left\{ m_i^{(R)} \cos[2\pi(u_i x_l + v_i y_l)] - m_i^{(I)} \sin[2\pi(u_i x_l + v_i y_l)] \right\} \tag{21} $$

and

$$ p_{l,k} = 2\lambda \sum_{i=1}^{M} \frac{1}{\sigma_i^2} \cos 2\pi[u_i(x_l - x_k) + v_i(y_l - y_k)]. \tag{22} $$
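The relation between the objective (18) and the gradient form (20)-(22) can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the function names, the tiny 2 x 2 pixel grid, and all numeric values are assumptions, and the full $N \times N$ table of $p_{l,k}$ is built purely for clarity (the paper stores only the distinct values).

```python
import cmath
import math


def objective(f, xy, uv, m, sigma2, lam, dA):
    """Discrete objective (18): entropy term minus lambda times weighted residual."""
    entropy = dA * sum(math.log(fk) for fk in f)
    resid = 0.0
    for (u, v), mi, s2 in zip(uv, m, sigma2):
        m_hat = dA * sum(fk * cmath.exp(-2j * math.pi * (u * x + v * y))
                         for fk, (x, y) in zip(f, xy))
        resid += abs(mi - m_hat) ** 2 / s2
    return entropy - lam * resid


def d_vector(xy, uv, m, sigma2, lam):
    """Constants d_l of (21), one per pixel."""
    out = []
    for (x, y) in xy:
        total = 0.0
        for (u, v), mi, s2 in zip(uv, m, sigma2):
            ph = 2 * math.pi * (u * x + v * y)
            total += (mi.real * math.cos(ph) - mi.imag * math.sin(ph)) / s2
        out.append(2 * lam * total)
    return out


def p_matrix(xy, uv, sigma2, lam):
    """Constants p_{l,k} of (22), stored as a full matrix for clarity only."""
    return [[2 * lam * sum(math.cos(2 * math.pi * (u * (xl - xk) + v * (yl - yk))) / s2
                           for (u, v), s2 in zip(uv, sigma2))
             for (xk, yk) in xy]
            for (xl, yl) in xy]


def gradient(f, d, p, dA):
    """Gradient (20) assembled from the precomputed d and p."""
    n = len(f)
    return [dA / f[l] + dA * d[l] - dA * dA * sum(p[l][k] * f[k] for k in range(n))
            for l in range(n)]


# Hypothetical test problem: 4 pixels, 3 Fourier samples.
xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
uv = [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25)]
m = [5.0 + 0.0j, 1.0 - 0.5j, 0.5 + 1.0j]
sigma2 = [1.0, 0.5, 2.0]
lam, dA = 0.3, 1.0
f = [1.0, 2.0, 0.5, 1.5]

grad = gradient(f, d_vector(xy, uv, m, sigma2, lam), p_matrix(xy, uv, sigma2, lam), dA)

# Compare with central finite differences of (18).
h = 1e-6
max_err = 0.0
for l in range(len(f)):
    up = f[:l] + [f[l] + h] + f[l + 1:]
    dn = f[:l] + [f[l] - h] + f[l + 1:]
    fd = (objective(up, xy, uv, m, sigma2, lam, dA)
          - objective(dn, xy, uv, m, sigma2, lam, dA)) / (2 * h)
    max_err = max(max_err, abs(fd - grad[l]))
```

Because `d_vector` and `p_matrix` do not depend on the pixel values, they would be computed once, and only `gradient` would be re-evaluated at each iteration of a search.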


Fig. 2. The distinct values of $p_{l,k}$ are stored in an array with $2R - 1$ rows and $C$ columns. The correspondence between array elements and pixel location differences is shown: row indices $1, 2, \ldots, R, \ldots, 2R - 1$ correspond to $y_l - y_k = (R-1)\Delta y, (R-2)\Delta y, \ldots, 0, \ldots, -(R-1)\Delta y$, and column indices $1, 2, \ldots, C$ correspond to $x_l - x_k = 0, \Delta x, \ldots, (C-1)\Delta x$.
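The storage layout of Fig. 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' program: the names `p_value`, `build_table`, and `lookup` are hypothetical, Python lists are 0-based whereas the figure is 1-based, and $\operatorname{sgn}(0)$ is taken as $+1$. Lookups for negative column offsets rely on the evenness of the cosine sum (22), $p(\delta x, \delta y) = p(-\delta x, -\delta y)$, which is the rule the paper states as (24).

```python
import math


def p_value(dx, dy, uv, sigma2, lam):
    """Direct evaluation of (22) for a pixel location difference (dx, dy)."""
    return 2 * lam * sum(math.cos(2 * math.pi * (u * dx + v * dy)) / s2
                         for (u, v), s2 in zip(uv, sigma2))


def build_table(R, C, step_x, step_y, uv, sigma2, lam):
    """Table with 2R-1 rows and C columns.

    1-based row i holds y-difference (R - i)*step_y;
    1-based column j holds x-difference (j - 1)*step_x.
    """
    return [[p_value((j - 1) * step_x, (R - i) * step_y, uv, sigma2, lam)
             for j in range(1, C + 1)]
            for i in range(1, 2 * R)]


def lookup(P, R, cl, rl, ck, rk):
    """Fetch p_{l,k} given integer (column, row) indices of pixels l and k."""
    s = 1 if cl - ck >= 0 else -1        # sgn(x_l - x_k), with sgn(0) = +1 (assumption)
    row = R - s * (rl - rk)              # 1-based row index
    col = 1 + abs(cl - ck)               # 1-based column index
    return P[row - 1][col - 1]


# Hypothetical 3 x 4 pixel grid with unit spacing and three Fourier samples.
R, C = 3, 4
uv = [(0.0, 0.0), (0.1, 0.05), (-0.2, 0.15)]
sigma2 = [1.0, 2.0, 0.5]
lam = 0.3
P = build_table(R, C, 1.0, 1.0, uv, sigma2, lam)

# Every pixel pair should agree with direct evaluation of (22).
max_err = 0.0
for rl in range(1, R + 1):
    for cl in range(1, C + 1):
        for rk in range(1, R + 1):
            for ck in range(1, C + 1):
                direct = p_value((cl - ck) * 1.0, (rl - rk) * 1.0, uv, sigma2, lam)
                max_err = max(max_err, abs(direct - lookup(P, R, cl, rl, ck, rk)))

table_size = len(P) * len(P[0])          # (2R - 1) * C entries, under 2N
```

For this grid the table holds 20 entries, compared with $N^2 = 144$ for the full doubly subscripted set.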

We can express the objective function (18) in terms of $\{d_l\}$ and $\{p_{l,k}\}$ as

$$ J = \Delta A \sum_{k=1}^{N} \log \hat{f}_k + \Delta A \sum_{k=1}^{N} d_k \hat{f}_k - \frac{\Delta A^2}{2} \sum_{l=1}^{N} \sum_{k=1}^{N} p_{l,k} \hat{f}_k \hat{f}_l - \lambda \sum_{i=1}^{M} \frac{1}{\sigma_i^2} |m_i|^2. \tag{23} $$

There are a number of advantages, notation notwithstanding, to the definition of the constants in (21) and (22). Since they are independent of the pixel values, they can be computed once at the beginning of the reconstruction procedure and stored with modest memory requirements.¹ There are $N$ numbers defined by (21). Although the doubly subscripted $p_{l,k}$ in (22) implies $N^2$ elements, there is considerable redundancy among these constants because they depend only on the differences in pixel locations $(x_l - x_k)$ and $(y_l - y_k)$. This reduces the number of distinct values calculable by (22) to at most $(2R - 1)(2C - 1)$ for rectangularly arranged pixels with $R$ rows and $C$ columns. The actual storage required can be reduced by another factor of 2 by observing that $p_{l,k} = p_{k,l}$.

In practice, it is convenient to store the distinct values of $p_{l,k}$ in an array dimensioned as $P(2R - 1, C)$ as shown in Fig. 2. With this scheme, there are some duplicated elements in the first column of $P$; however, the amount of redundancy is small ($R - 1$ elements), and the total storage required for different values of $p_{l,k}$ is slightly under $2N$ elements. A particular value of $p_{l,k}$ is extracted from $P$ by

$$ p_{l,k} = P[R - \operatorname{sgn}(x_l - x_k)\,(y_l - y_k)/\Delta y,\; 1 + |x_l - x_k|/\Delta x] \tag{24} $$

where row and column spacings are $\Delta y$ and $\Delta x$, respectively.

¹ What constitutes modest memory requirements depends, of course, on available computing facilities. The minimum configuration anticipated includes a main memory at least several times larger than that required to store the reconstruction and auxiliary random-access storage such as a disk. With these assets, array storage proportional to the number of pixels ($N$) can be considered reasonable. It is usually possible to structure program flow and memory allocation so not all arrays are needed in main memory simultaneously. Algorithms demanding storage proportional to $N^2$ or $MN$ are, in our view, unreasonable for most image processing problems.

Both (21) and (22) have strong physical interpretations. As shown by comparison of (11) and (21), $\{d_l\}$ is a direct transform reconstruction from the measurements with the $i$th scaling constant equal to $2\lambda/\sigma_i^2$. A direct transform reconstruction is equivalent to the output of a two-dimensional, linear, space-invariant filter whose input is the original object [24]. If we evaluate (21) with $m_i^{(R)} = 1$ and $m_i^{(I)} = 0$, we find the point spread function of the filter. Thus $p_{l,k}$ is the response at the $l$th pixel to a point source in the $k$th pixel.

From (20) we see that the gradient of the ME reconstruction objective function (18) is completely characterized by these direct transform reconstruction parameters. That the nonlinear ME reconstruction procedure should have such ties with the linear direct transform techniques is both interesting and fortunate. By exploiting our understanding of linear image processing operations, we [illegible in source] reconstruction.

By viewing the sum

$$ \sum_{k=1}^{N} p_{l,k} \hat{f}_k \tag{25} $$

as two-dimensional convolution of the reconstruction and the point spread function, we can use fast Fourier transform (FFT) techniques for efficient calculation of the objective function in the form (23) and its gradient, whose elements are given by (20). To identify the convolution form of (25), we temporarily drop the one-dimensional ordering of pixels. Let the center position of the $k$th pixel be

$$ (x_k, y_k) = (c_k \Delta x,\; r_k \Delta y) \tag{26} $$

where $1 \le c_k \le C$ and $1 \le r_k \le R$. The parameters $r_k$ and $c_k$ indicate row and column locations, respectively, within the pixel matrix. We avoid the imposition of a particular data structure (e.g., row-sequential or column-sequential) for generality and notational brevity. Since $p_{l,k}$ is a function of $(x_l - x_k)$ and $(y_l - y_k)$, we can write

$$ p_{l,k} = p(x_l - x_k,\; y_l - y_k). \tag{27} $$

From (25)-(27) we obtain


$$ \sum_{k=1}^{N} p_{l,k} \hat{f}_k = \sum_{k=1}^{N} p[(c_l - c_k)\Delta x,\; (r_l - r_k)\Delta y]\; \hat{f}(c_k \Delta x,\; r_k \Delta y). \tag{28} $$

Since the sum ranges over all pixels, we have

$$ \sum_{k=1}^{N} p_{l,k} \hat{f}_k = \sum_{c_k=1}^{C} \sum_{r_k=1}^{R} p[(c_l - c_k)\Delta x,\; (r_l - r_k)\Delta y]\; \hat{f}(c_k \Delta x,\; r_k \Delta y) \tag{29} $$

which is the form of two-dimensional convolution.

We will not belabor the details of high-speed convolution via FFT calculations; the interested reader should consult [25], [26]. Of interest, though, are the storage requirements and the speed advantage of FFT use. We can only give rough guidelines here since both aspects depend on the computer installation and the FFT realization.

The minimum transform size is $(2C - 1)$ by $(2R - 1)$, and these figures will usually be adjusted upward to the nearest power of 2. The burden of such a transform increases roughly as $4RC \log_2 (4RC) \approx 4N \log_2 N$. We can precompute and store the FFT of the point spread function array; this gives a discrete approximation to the direct reconstruction filter transfer function. The number of distinct values in this discrete transfer function and in the point spread function array can be shown to be equal so no extra storage is involved.

Each evaluation of (25) consequently requires only two FFT's, one of the reconstruction, properly augmented with zeros, and the other an inverse FFT which yields the desired convolution values. Hence, the burden of FFT evaluation of (25) is about $8N \log_2 N$ as compared with $N^2$ for routine programming of the convolution sum. For a 128 by 128 picture ($N$ = 16 384), the speedup factor is on the order of 100.

The reduction of the problem to the unconstrained optimization of (18) required the introduction of the Lagrange multiplier $\lambda$, whose value must be determined separately. In general it may be necessary to compute a solution for several trial values of $\lambda$ in order to find the one which allows constraint (15) to be satisfied with sufficient accuracy. Larger values of $\lambda$ cause the total squared residual [left-hand side of (15)] of the solution to be smaller.

We have found the following argument useful for determining an initial estimate for $\lambda$. If there were only one measurement $m_1$ (with variance $\sigma_1^2$) at $(u_1, v_1) = (0,0)$, then $m_1$ is an estimate of the total object intensity. In this case, the ME reconstruction is uniformly gray with brightness $\hat{f}$, where, from (17), $\hat{f}$ is the number that maximizes

$$ J = A \log \hat{f} - \lambda (m_1 - A\hat{f})^2 / \sigma_1^2 \tag{30} $$

where $A = N \Delta A$ is the total area of the object. The necessary condition

$$ \frac{\partial J}{\partial \hat{f}} = \frac{A}{\hat{f}} + \frac{2\lambda A (m_1 - A\hat{f})}{\sigma_1^2} = 0 \tag{31} $$

is satisfied when

$$ \lambda = \frac{-\sigma_1^2}{2\hat{f}(m_1 - A\hat{f})} = \frac{A \sigma_1^2}{2(m_1 + \delta)\delta} \tag{32} $$

where $\delta = A\hat{f} - m_1$ is the discrepancy between the reconstruction and the measurement. Since the variance of the intensity measurement $m_1$ is known to be $\sigma_1^2$, setting $\delta = \sigma_1$ seems reasonable; this gives

$$ \lambda = \frac{A \sigma_1^2}{2(m_1 + \sigma_1)\sigma_1} = \frac{A}{2(1 + m_1/\sigma_1)}. \tag{33} $$

Using $\lambda$ from (33) in the optimization of (18) does not guarantee that the solution will satisfy constraint (15) when there is more than one measurement, but the approximation has been adequate for our work thus far. If necessary, $\lambda$ can be increased to obtain closer agreement with the measurements. It should be realized, however, that there may be no $\lambda$ for which (15) is satisfied; e.g., if one or more measurements are unusually noisy, or if the variances $\sigma_i^2$ assumed by the (optimistic) experimenter are unrealistically small.

Although ME reconstruction is a nonlinear procedure, it does maintain linearity with respect to measurement scale if $\lambda$ is chosen according to (33). The loss of this valuable property would be disturbing: it would imply a nontrivial sensitivity of the reconstruction to the physical units of the measurements. To demonstrate scaling linearity, we consider a set of measurements $\{m_i\}$ and variances $\{\sigma_i^2\}$ leading to the ME reconstruction $\{\hat{f}_i\}$. Of necessity all partial derivatives (20) of the objective function must vanish. We now scale the measurements by $m_i' = t m_i$, where $t > 0$. This scale change induces, from (33), (21), and (22),

$$ (\sigma_i^2)' = t^2 \sigma_i^2, \tag{34} $$

$$ \lambda' = \lambda, \tag{35} $$

$$ d_l' = d_l / t, \tag{36} $$

$$ p_{l,k}' = p_{l,k} / t^2, \tag{37} $$

and it is seen by substitution of (36) and (37) into (20) that $\{t\hat{f}_i\}$ is the ME reconstruction in the new measurement system.

OPTIMIZATION ALGORITHMS

Lacking an explicit solution to the nonlinear maximization of (18), we carry out an iterative search for the maximum. The general problem of such nonlinear optimization has been well studied (see [27]-[29] and others), and many algorithms are available. As mentioned earlier, the objective function has a unique local maximum; as a result, convergence to that point can be guaranteed for most reasonable search algorithms. The process is hampered, however, by the large dimensionality of typical imaging problems: there are as many independent variables as there are pixels. This causes practical difficulties


due to slow convergence, large storage requirements, and limited numerical precision.

We consider algorithms in which a sequence of one-dimensional searches is executed in the $N$-dimensional solution space. These can be classified according to the amount of local information about the objective function that is used in determining the search direction. Zero-, first-, and second-order methods are those which use evaluations of, respectively, the function itself; the function and its gradient; and the function, gradient, and Hessian matrix of second partial derivatives. Second-order methods require storage proportional to $N^2$, which is so large in our application that we will not consider them further. Quasi second-order methods, e.g., [30], which build up an estimate of the Hessian from function and gradient evaluations, have the same storage problem.

On the other hand, the storage required for both zero- and first-order methods is proportional to $N$, so the choice between them depends mainly on computational considerations. Equations (20) and (23) show that the major burden of computing the objective function and its derivatives is due to the convolution sum (25). Once this has been computed, either by routine programming or FFT techniques, the number of extra operations needed for evaluation of the objective function and the complete gradient is proportional to $N$. Consequently, exact calculation of the gradient each time the objective function is evaluated is almost free, and this discourages use of search algorithms that do not employ derivatives or rely on finite difference approximations. We are thus led to first-order procedures such as steepest ascent or the conjugate gradient method [31], [27].

We have used both of these methods with some success. In steepest ascent, the search is in the direction of the objective function gradient, $\nabla J$; in the conjugate gradient method, the search direction is

$$s = \nabla J + s' \, |\nabla J|^2 / |\nabla J'|^2 \tag{38}$$

where primes refer to the values in the previous iteration. The conjugate gradient method requires more storage since it retains the previous search direction as well as the current gradient. Asymptotically, steepest ascent converges linearly and conjugate gradient quadratically, making the latter apparently more powerful; however, with the large dimensionality of the present objective function, most of the computation takes place before the asymptotic rate is achieved, and the superiority of either method has yet to be demonstrated.

In any such methods, the one-dimensional search procedure is quite important. A very accurate search requires many function and gradient evaluations, which is expensive, but the use of approximate searches may require more iterations for convergence. We have used the following compromise: take a step of predetermined length in the search direction, evaluate the function and gradient there, and estimate the location of the one-dimensional search maximum by quadratic or cubic interpolation.²

² One can also consider moves that are a constant fraction of the gradient vector, as in [3]. This simplifies programming and reduces the number of function evaluations in each iteration; however, the number of iterations will generally be increased and convergence cannot be guaranteed unless the fraction is chosen carefully. Also, it is necessary to consider the action to be taken should such a fixed-fraction move drive a pixel value negative.

A practical problem becomes apparent when some pixel values are close to zero. The objective function is highly nonlinear in this region, and its value is undefined for negative pixel values. Convergence is slowed since steps that would further reduce these already small values must be short to maintain nonnegativity. This problem can be alleviated by the exponential transformation $f_i = \exp(g_i)$, $i = 1, 2, \ldots, N$. The objective function is now defined for both positive and negative values of the new independent variables $g_i$.

Another modification, which has proven more useful than the exponential transformation, is to deflect the search direction so pixel values below a certain cutoff, say 1 percent of the maximum pixel value, are not changed unless they would be increased by a move in the original search direction. The justification for this is that relatively small pixel values have little effect on either reconstruction-measurement consistency or the appearance of the image. An algorithm that uses deflected gradients for most of the iterations and occasionally perturbs the reconstruction in the true gradient direction has been found to converge faster than exact steepest ascent.

Another procedure showing potential for ME reconstruction in some applications involves univariate moves. If only one pixel brightness, $f_l$, is changed at a time, the optimal change can be found analytically by setting the corresponding partial derivative to zero. From (20) we find

$$f_l = \frac{\left(d_l - \Delta A \sum_{k \neq l} p_{l,k} f_k\right) + \sqrt{\left(d_l - \Delta A \sum_{k \neq l} p_{l,k} f_k\right)^2 + 4 \Delta A \, p_{l,l}}}{2 \Delta A \, p_{l,l}} \tag{39}$$

to be the value that maximizes the objective function while holding all other variables constant. By adjusting each pixel separately, we can thus avoid the need for a search to find the optimal step size. Another advantage is that (39) automatically satisfies the nonnegativity constraint since $\lambda$ is positive.

There are two disadvantages to the univariate approach. First, the sequential handling of pixel values induces patterns, characteristic of the order in which variables are adjusted, in the reconstruction during the early iterations. This is particularly apparent when working with images


Fig. 3. A phantom consisting of seven point sources. Source locations are shown by circles, and relative source strengths appear in italics.

Fig. 4. Measurement coverage in the spatial frequency domain for the seven point source phantom.
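The single-pixel update of (39) can be sketched in a few lines. This is only an illustration, not the authors' Fortran program: it assumes that the stationarity condition behind (39) has the form $1/f_l + d_l - \Delta A \sum_k p_{l,k} f_k = 0$, and the names `univariate_sweep`, `stationarity_residual`, `d`, `P`, and `dA` are hypothetical stand-ins for the quantities $d_l$, $p_{l,k}$, and $\Delta A$.

```python
import numpy as np

def univariate_sweep(f, d, P, dA):
    """One pass over all pixels, updating each in turn with the closed form (39).

    f  : current pixel brightnesses (1-D array, all positive)
    d  : vector standing in for d_l of (21) (assumed given)
    P  : matrix standing in for p_{l,k} of (22) (assumed symmetric, positive diagonal)
    dA : pixel area, Delta A
    """
    f = f.copy()
    for l in range(f.size):
        # c = d_l - dA * sum over k != l of p_{l,k} f_k
        c = d[l] - dA * (P[l] @ f - P[l, l] * f[l])
        a = dA * P[l, l]
        # Positive root of a*f^2 - c*f - 1 = 0, so the new value is always > 0.
        f[l] = (c + np.sqrt(c * c + 4.0 * a)) / (2.0 * a)
    return f

def stationarity_residual(f, d, P, dA):
    """Assumed form of the per-pixel derivative (up to a constant factor)."""
    return 1.0 / f + d - dA * (P @ f)
```

Because each update uses the positive root of a quadratic, no step-size search and no explicit nonnegativity check are needed. Repeating `univariate_sweep` until `stationarity_residual` is small mimics the iteration count quoted for the examples.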

possessing a high degree of symmetry. Gradient methods, moving all variables simultaneously, tend to maintain any symmetry present and yield better reconstructions for the first few iterations. Randomization of the order in which univariate methods update the variables is helpful, although true randomization increases the program complexity. A reasonable compromise is to define several deterministic, but different, adjustment orders, e.g., first across rows and then down columns, so patterns break up sooner.

The second disadvantage is more serious. Each time a variable is changed, the convolution sum (25) is altered. With gradient methods, one use of the FFT for high-speed convolution permits adjustment of all pixel values. It is inefficient to compute the complete convolution for each single variable step; consequently, an exact univariate search requires an effort proportional to $N^2$ to update all pixel values. While this makes univariate optimization unattractive in large problems, it may not be a handicap in all reconstruction applications. Since the gradient is not stored, less memory is needed, and the difficulty of reconstruction on moderate-sized computers is eased. Also, if the number of pixels is relatively small, FFT techniques may offer no computational advantage. Even in cases where FFT use could be profitable, the increased program complexity and additional storage requirements may restrict its application.

EXAMPLES

The ME procedure has been used with encouraging results for reconstruction from both phantom (artificial) data and actual interferometer measurements. Programming was done in Fortran on a Hewlett-Packard 2116B computer with 16K 16 bit words of main memory and 1.2M words of auxiliary disk storage. It is convenient to divide the reconstruction task into three independent programs. The first program inputs the Fourier transform samples, calculates the direct transform parameters defined in (21) and (22), and stores the parameters on disk. The second program uses these parameters for iterative maximization of the ME objective function. This program can be terminated after any number of iterations; the last reconstruction estimate is saved in a disk file. A third program is responsible for display and for checking the reconstruction consistency with the individual measurements. If the agreement is unsatisfactory, the image is reinput by the second program for continuing iteration. Adjustment of the parameter $\lambda$ can easily be handled in this program if necessary.

Reconstructions have been computed on a 21 by 21 image grid, resulting in 441 independent variables in the optimization. The examples shown here have been computed with the univariate search algorithm discussed above. The exact optimizing step for each variable is calculated from (39). The values $\lambda / \sigma_i^2$ needed in (21) and (22) were calculated according to (33) with $\sigma_i^2 = \sigma^2$ and $m_1 / \sigma = 10$.

A useful quantitative measure of the discrepancy between a reconstruction and the measurements is the factor

$$\Delta = \left[ \frac{\sum_{i=1}^{M} \left| m_i - \Delta A \sum_{k=1}^{N} f_k \exp\left[-j 2\pi (u_i x_k + v_i y_k)\right] \right|^2}{\sum_{i=1}^{M} |m_i|^2} \right]^{1/2}$$

that is, the ratio of the rms error to the rms amplitude of the data. This figure is quoted for the reconstructions shown here. The discrepancy of direct transform reconstructions is not zero because the image brightness is assumed to be zero outside the image domain. In these examples, we have multiplicatively scaled each direct transform reconstruction so the reconstruction intensity $\Delta A \sum_{k=1}^{N} f_k$ agrees exactly with the measured value.

The first example uses a phantom composed of seven point sources whose strengths and positions, shown in Fig. 3, were chosen randomly. The Fourier transform of this phantom was evaluated at the locations shown in Fig. 4, and these data were used for reconstruction. Contour plots of the direct transform reconstruction, with areas of negative brightness shaded, and of the ME reconstruction after 10 iterations (that is, 10 adjustments to each pixel brightness) are compared in Fig. 5. The initial image estimate for the ME algorithm is taken to be the direct transform reconstruction with negative pixel values set to zero. We do not claim that the ME procedure has con-

Fig. 5. Reconstructions of the seven point source phantom (a) by direct Fourier transformation (Δ = 19.0 percent), and (b) after 10 iterations of the maximum entropy technique (Δ = 10.8 percent). In these and all other contour plots, the contour interval is 10 percent of the maximum brightness.
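The trade-off between routine programming of the convolution sum (25) and its FFT evaluation can be illustrated with a small sketch. It assumes that (25) is an ordinary two-dimensional convolution of an R by C image with a point spread function tabulated on all (2R-1) by (2C-1) lags; the function names are hypothetical, and the adjustment of the transform size upward to a power of 2 is omitted for brevity.

```python
import numpy as np

def conv_direct(f, psf):
    """Routine evaluation: out[r, c] = sum over (r', c') of psf(r - r', c - c') * f[r', c'].

    psf is stored with lag (0, 0) at index (R - 1, C - 1).
    """
    R, C = f.shape
    out = np.zeros((R, C))
    for r in range(R):
        for c in range(C):
            # Reverse the window so psf lags line up with f indices.
            window = psf[r:r + R, c:c + C][::-1, ::-1]
            out[r, c] = np.sum(window * f)
    return out

def precompute_transfer(psf, R, C):
    """FFT of the zero-padded point spread function, computed once and stored."""
    return np.fft.rfft2(psf, s=(2 * R, 2 * C))

def conv_fft(f, psf_hat):
    """Two FFTs per evaluation: one of the zero-augmented image, one inverse."""
    R, C = f.shape
    shape = (2 * R, 2 * C)
    full = np.fft.irfft2(np.fft.rfft2(f, s=shape) * psf_hat, s=shape)
    # Keep only the lags that correspond to pixels inside the image grid.
    return full[R - 1:2 * R - 1, C - 1:2 * C - 1]
```

Padding to 2R by 2C is enough to keep circular wraparound out of the retained block, which is why the transform size grows as 4RC. The direct routine costs on the order of N squared operations, whereas `conv_fft` reuses the stored `psf_hat` on every call.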

Fig. 6. Maximum entropy reconstructions of the seven point source phantom after (a) 10 (Δ = 10.8 percent) and (b) 20 (Δ = 10.0 percent) iterations.


Fig. 7. A piecewise-constant "stacked blocks" phantom. The image brightness is zero for |x| > 0.25, |y| > 0.25.

Fig. 8. Spatial frequency domain coverage for reconstruction of the stacked blocks phantom.

Fig. 9. Reconstruction of the "stacked blocks" phantom (a) by direct Fourier transformation (Δ = 14.1 percent) and (b) after 16 iterations of the maximum entropy procedure (Δ = 10.1 percent). A cubic spline was used for interpolation along each row before plotting.

Fig. 10. Interferometer measurements for an observation of the radio source Cygnus A (3C405) lie on nine concentric ellipses. Measurements used by the maximum entropy procedure are circled.
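The discrepancy figure Δ quoted in the captions, the ratio of the rms error to the rms amplitude of the data, can be computed as sketched below. The function name and argument layout are illustrative assumptions; the model visibility is taken to be the discrete Fourier sum $\Delta A \sum_k f_k \exp[-j 2\pi (u_i x_k + v_i y_k)]$ over the pixel centres.

```python
import numpy as np

def discrepancy(m, u, v, f, x, y, dA):
    """Ratio of rms residual to rms data amplitude.

    m    : complex measurements at spatial frequencies (u[i], v[i])
    f    : pixel brightnesses at positions (x[k], y[k])
    dA   : pixel area, Delta A
    """
    # M-by-N matrix of Fourier kernels exp[-j 2 pi (u_i x_k + v_i y_k)].
    kernel = np.exp(-2j * np.pi * (np.outer(u, x) + np.outer(v, y)))
    model = dA * (kernel @ f)
    return np.sqrt(np.sum(np.abs(m - model) ** 2) / np.sum(np.abs(m) ** 2))
```

A value of 0.108 from this function would correspond to the 10.8 percent quoted for Fig. 5(b); multiply by 100 to express the result in percent.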

verged in these few iterations; however, it has settled down enough that iteration-to-iteration changes in the reconstruction are slight. In this example, the rms pixel value change in going from iteration 9 to iteration 10 was only 4 percent of the rms value of the reconstruction. The effect of increasing the number of iterations is demonstrated in Fig. 6 where the ME reconstruction from the same data after 20 iterations is shown.

It is seen from Fig. 5 that the ME reconstruction is of higher resolution than the direct transform image although there are two pairs of sources, at approximately (-0.4, 0.1) and (-0.1, -0.1), that are not resolved by either technique. The ME procedure eliminates much of the background fluctuation that characterizes direct transform reconstructions. It is interesting to note that in the direct transform reconstruction, the peak at (-0.1, 0.4) appears with a higher amplitude than the peak at (0.4, -0.1) even though there is no source at the former location but there is one at the latter. The fictitious peak disappears (at the 10 percent level) from the maximum entropy reconstruction.

[...] construction because they provide the most opportunity for resolution gain over direct procedures; however, improved reconstruction quality is also obtained for the high-contrast, piecewise-constant phantom shown in Fig. 7. Its Fourier transform is calculated at points indicated in Fig. 8, and a comparison of reconstructions is shown in Fig. 9. The improved resolution of the ME reconstruction is manifest in steeper edges and a stronger indication of the extra brightness in the back right quadrant.

The final example uses real data taken with the Stanford University five-element interferometer during an observation of the well-mapped radio double Cygnus A (3C405). The instrument, described in detail in [32], operates at a wavelength of 2.8 cm and provides measurements of the two-dimensional Fourier transform of the source under investigation. These measurements lie on nine concentric ellipses as shown in Fig. 10. Because of the way in which data are accumulated, measurement coverage is essentially continuous along each arc.

Only a fraction of the available data has been input to the ME algorithm, but the direct transform program has been allowed to use all the measurements. This is indicated in Fig. 10, where it is seen that the data given to the ME procedure have been sampled at approximately 30° intervals [...] data from the outer four ellipses. As a consequence of the latter action, the natural resolution of the data provided to the ME algorithm is poorer, by almost a factor of 2, than the resolution of the data given to the direct transform procedure.

Reconstructions using the two methods are displayed in Fig. 11. The east-west peak widths in the ME reconstruction are slightly greater than in the direct transform reconstruction, although the increase is not as great as [...] given to the two routines. The north-south widths are smaller in the ME reconstruction, resulting in brightness contours that are more nearly circular. Once again, the ME technique has suppressed much of the background activity apparent in the direct transform reconstruction.

CONCLUSION

Although we have concentrated most of our attention on the imaging problem of radio astronomy, maximum entropy reconstruction has potential application in other fields concerned with reconstruction from incomplete data,

Fig. 11. Reconstruction of Cygnus A by (a) direct Fourier transformation of all measurements (Δ unknown), and (b) after 23 iterations of the maximum entropy technique using the limited measurements indicated in Fig. 10 (Δ = 15.7 percent). Axes: East (arcsec) and North (arcsec).


e.g., electron microscopy. Even in radiography, where measurements are usually plentiful, there are situations in which data are missing; for example, when the number of projections is intentionally reduced to limit X-ray exposure to the patient or when imaging moving objects. As discussed in the Introduction, the essential difference between reconstruction in these varied disciplines is in the form of the kernel in the measurement equation (1). Incorporation of the appropriate kernel into the maximum entropy objective function (18) adapts the method to new specialties.

REFERENCES

[1] B. R. Hunt, "Digital image processing," Proc. IEEE, vol. 63, pp. 693-708, 1975.

[2] A. Macovski, R. E. Alvarez, J. L.-H. Chan, and J. P. Stonestrom, "Correction for spectral shift-artifacts in X-ray computerized tomography," presented at Image Processing for 2-D and 3-D Reconstruction from Projections: Theory and Practice in Medicine and the Physical Sciences Meeting, Stanford, CA, Aug. 4-7, 1975.

[3] B. R. Hunt, "Bayesian methods of nonlinear digital image restoration," IEEE Trans. Comput., to be published.

[4] G. T. Herman and A. Lent, "A computer implementation of a Bayesian analysis of image reconstruction," State Univ. of New York, Buffalo, NY, Tech. Rep. 91, 1974.

[5] B. R. Hunt and T. M. Cannon, "Nonstationary assumptions for Gaussian models of images," IEEE Trans. Syst., Man, and Cybern., to be published.

[6] E. T. Jaynes, "Prior probabilities," IEEE Trans. Syst. Sci. Cybern., vol. SSC-4, pp. 227-241, 1968.

[7] J. G. Ables, "Notes on maximum entropy spectral analysis," Astron. Astrophys. Suppl., vol. 15, 1974.

[8] D. E. Smylie, G. K. C. Clarke, and T. J. Ulrych, "Analysis of the irregularities in the earth's rotation," in Methods in Computational Physics, vol. 13, B. Alder, S. Fernbach, and B. A. Bolt, Eds. New York: Academic Press, 1973.

[9] L. Brillouin, Science and Information Theory. New York: Academic Press, 1956.

[10] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press, 1949.

[11] B. R. Frieden, "Restoring with maximum likelihood and maximum entropy," J. Opt. Soc. Amer., vol. 62, no. 4, pp. 511-518, 1972.

[12] R. Gordon and G. T. Herman, "Reconstruction of pictures from their projections," Quarterly Bull. Center for Theor. Biol., vol. 4, pp. 71-151, 1971.

[13] J. P. Burg, "Maximum entropy spectral analysis," presented at the 37th Meeting Soc. of Exploration Geophysicists, Oklahoma City, OK, 1967.

[14] J. P. Burg, "A new analysis technique for time series data," presented at NATO Adv. Study Inst. on Signal Processing with Emphasis on Underwater Acoustics, 1968.

[15] M. S. Bartlett, An Introduction to Stochastic Processes. London, England: Cambridge University Press, 1966.

[16] N. Levinson, "The Wiener rms (root mean square) error criterion in filter design and prediction," J. Math. and Phys., vol. 25, pp. 261-278, 1947.

[17] T. J. Ulrych and O. G. Jensen, "Cross-spectral analysis using maximum entropy," Geophysics, vol. 39, pp. 353-354, 1974.

[18] A. Van den Bos, "Alternative interpretation of maximum entropy spectral analysis," IEEE Trans. Inform. Theory, vol. 17, pp. 493-494, 1971.

[19] J. E. B. Ponsonby, "An entropy measure for partially polarized radiation and its application to estimating radio sky polarization distributions from incomplete 'aperture synthesis' data by the maximum entropy method," Mon. Not. R. Astr. Soc., vol. 163, pp. 369-380, 1973.

[20] M. Born and E. Wolf, Principles of Optics. Oxford: Pergamon Press, 1970.

[21] S. J. Wernecke, "Maximum entropy image reconstruction," presented at Image Processing for 2-D and 3-D Reconstruction from Projections: Theory and Practice in Medicine and the Physical Sciences Meeting, Stanford, CA, Aug. 1975.

[22] G. W. Swenson and N. C. Mathur, "The interferometer in radio astronomy," Proc. IEEE, vol. 56, p. 2114, 1968.

[23] R. B. Blackman and J. W. Tukey, The Measurement of Power Spectra. New York: Dover, 1958.

[24] R. N. Bracewell and A. R. Thompson, "The main beam and ringlobes of an east-west rotation-synthesis array," Astrophys. J., vol. 182, pp. 77-94, 1973.

[25] R. M. Mersereau and D. E. Dudgeon, "Two-dimensional digital filtering," Proc. IEEE, vol. 63, pp. 610-623, 1975.

[26] B. R. Hunt, "Minimizing the computation time for using the technique of sectioning for digital filtering of pictures," IEEE Trans. Comput., vol. C-21, pp. 1219-1222, 1972.

[27] R. L. Fox, Optimization Methods for Engineering Design. Reading, MA: Addison-Wesley, 1971.

[28] W. Murray, Ed., Numerical Methods for Unconstrained Optimization. New York: Academic Press, 1972.

[29] E. J. Beltrami, An Algorithmic Approach to Nonlinear Analysis and Optimization. New York: Academic Press, 1970.

[30] R. Fletcher and M. J. D. Powell, "A rapidly convergent descent method for minimization," Comput. J., vol. 6, pp. 163-168, 1963.

[31] R. Fletcher and C. M. Reeves, "Function minimization by conjugate gradients," Comput. J., vol. 7, pp. 149-154, 1964.

[32] R. N. Bracewell, R. S. Colvin, L. R. D'Addario, C. J. Grebenkemper, K. M. Price, and A. R. Thompson, "The Stanford five-element radio telescope," Proc. IEEE, vol. 61, pp. 1249-1257, Sept. 1973.

Stephen J. Wernecke (S'72) was born in Evanston, IL on October 20, 1950. He received the B.S. (with distinction), M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1972, 1974, and 1976, respectively.
From 1972 to 1976, he was a Research Assistant with the Stanford Electronic Laboratories. During that time he also held appointments as Teaching Fellow and Acting Instructor of Electrical Engineering at Stanford. He is currently a Research Associate at Stanford where his interests include image reconstruction, digital signal processing and statistical estimation theory. In 1972 he received the F. E. Terman Award for Scholastic Achievement.
Dr. Wernecke is a member of Phi Beta Kappa, Tau Beta Pi, and the Optical Society of America.

Larry R. D'Addario (S'70-M'74) was born in East Orange, NJ, on December 26, 1946. He received the S.B. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1968, and the M.S. and Ph.D. degrees, both in electrical engineering, from Stanford University, Stanford, CA, in 1969 and 1974, respectively.
From 1971 through August 1974, he was with the Radio Astronomy Institute, Stanford University, where he developed the calibration methods and major portions of the software for the five-element synthesis telescope. Since September 1974, he has been with the National Radio Astronomy Observatory, Charlottesville, VA, engaged in basic research in data processing for radio telescopes. In January 1976, he transferred to the Observatory's Very Large Array Project, Socorro, NM, where he is responsible for evaluating electronic systems tests.
Dr. D'Addario is a member of Tau Beta Pi, Sigma Xi, and Eta Kappa Nu, and of Commission V of URSI.
