Phase retrieval from noisy data based on minimization of penalized I-divergence

Kerkil Choi and Aaron D. Lanterman
School of Electrical and Computer Engineering, Georgia Institute of Technology, Mail Code 0250, Atlanta, Georgia 30332, USA

Received October 21, 2005; accepted February 9, 2006; posted July 19, 2006 (Doc. ID 65484); published December 13, 2006
J. Opt. Soc. Am. A, Vol. 24, No. 1, January 2007. © 2006 Optical Society of America

We study noise artifacts in phase retrieval based on minimization of an information-theoretic discrepancy measure called Csiszár's I-divergence. We specifically focus on adding Poisson noise to either the autocorrelation of the true image (as in astronomical imaging through turbulence) or the squared Fourier magnitudes of the true image (as in x-ray crystallography). Noise effects are quantified via various error metrics as signal-to-noise ratios vary. We propose penalized minimum I-divergence methods to suppress the observed noise artifacts. To avoid computational difficulties arising from the introduction of a penalty, we adapt Green's one-step-late approach for use in our minimum I-divergence framework.

OCIS codes: 100.5070, 100.3190, 100.3010.

1. INTRODUCTION
In various scientific applications, such as astronomical imaging through extreme atmospheric turbulence and x-ray crystallography, it is impossible to directly observe the objects of interest with current technology. In some problems, Fourier magnitudes, but not Fourier phases, are obtained. For example, in crystallography, we want to find the interatomic structure of a molecule, but the structure cannot be directly observed with any practical devices because of physical limitations.1-3 Instead, we can obtain Fourier magnitudes by shooting x rays through a crystal, but Fourier phase information is entirely lost. In astronomical imaging, we can directly obtain the autocorrelation of an object through photon differencing.4,5
However, the autocorrelation is the inverse Fourier transform of the squared Fourier magnitudes of the object, meaning that the Fourier phases are completely lost.

As in these two applications, phase retrieval6-11 can be approached from two viewpoints. The first approach is to retrieve Fourier phases from the corresponding Fourier magnitudes, as in x-ray crystallography. The other approach is to estimate a function from its autocorrelation, as in astronomical imaging. Note that knowledge of a function is equivalent to knowledge of its Fourier magnitudes and phases.

Based on the latter idea, Schulz and Snyder found an expectation-maximization (EM) algorithm that attempts to recover a function from its autocorrelation for their astronomical imaging application.4 In their underlying stochastic model, the data are assumed to follow a Poisson point process; the EM algorithm maximizes the corresponding Poisson likelihood. Furthermore, they noted that their EM algorithm, in its asymptotic form, which we call the Schulz–Snyder algorithm,12 produces a sequence of estimates that can minimize an information-theoretic discrepancy measure called Csiszár's I-divergence (also called cross entropy in the related literature13-16). The asymptotic form is obtained by assuming an infinite number of data samples and by using the weak law of large numbers.4 (Similar arguments were made earlier by Snyder et al.17 in the context of emission tomography, which also assumes a Poisson data model.) Later, they formally derived the Schulz–Snyder algorithm5 by minimizing the I-divergence via the Kuhn–Tucker conditions.18 Although the papers by Schulz and Snyder gave a few numerical examples, none of their experiments involved noise. To our knowledge, our paper is the first to explore the effect of noise on the Schulz–Snyder iteration.

Csiszár's I-divergence19 is an information-theoretic discrepancy measure defined on two nonnegative functions.
It may be thought of as a generalization of the Kullback–Leibler distance.20,21 An important result of Csiszár's work19 is that if the functions involved in an inverse problem are nonnegative, minimizing the I-divergence measure is the only method consistent with a set of intuitive postulates, such as regularity and locality, that are desirable for estimation problems. Methods of minimizing the I-divergence have been popular in various estimation applications.13-15,22,23

In general, maximum-likelihood estimates (MLEs) are highly inclined to become rough when the data are noisy. Estimates for phase retrieval problems, obtained by minimizing the I-divergence between the measured autocorrelation and the autocorrelation of an estimate, are equivalent to MLEs under a Poisson autocorrelation data model. Therefore it is of interest to investigate what impact noise can have on minimum I-divergence estimates in phase retrieval. In both astronomical imaging and x-ray crystallography, noise may be modeled by Poisson random processes.4,24 In particular, x-ray crystallography data are usually measured with charge-coupled-device cameras, whose detectors' readout noise can be modeled by a Poisson random process.25,26

Good's roughness penalty27 has been known to be helpful for reducing noise artifacts in maximum-likelihood estimation for several applications, including emission




tomography28 and optical sectioning microscopy.29 The penalty encourages smooth estimates by penalizing the differences between an estimate and shifted versions of the estimate. Since minimum I-divergence estimates for phase retrieval from noisy data are also rough, we study the effects of Good's roughness on the estimates. A particularly nice aspect of Good's roughness penalty is that it can be interpreted in terms of I-divergences. This supplies insight into how Good's roughness operates on estimates.

Although Good's roughness can reasonably suppress noise artifacts, it tends to smear edges, which is often disturbing. The total variation (TV) penalty has been known to provide estimates that smooth noise while preserving edges.30-35 This motivates us to study the effects of the TV penalty on minimum I-divergence estimates for phase retrieval. The TV penalty has also been used in emission tomography.36

In a sense, phase retrieval can be viewed as a blind deconvolution problem, where the unknown kernel is a reflection of the object being imaged. The TV penalty has found some success in regularizing estimates (as well as unknown kernels) in blind deconvolution.37-39 This serves as another motivation for considering the TV penalty in our study.

When regularizing minimum I-divergence estimates by Good's roughness or TV penalties, the pertinent optimization problem that needs to be solved at each iteration becomes complicated because the components of the estimates are "coupled" by the penalties. Green's one-step-late (OSL) algorithms40,41 are techniques proposed to easily resolve such issues in EM algorithms. Based on the important theoretical fact that minimum I-divergence algorithms are asymptotically equivalent to certain types of EM algorithms under Poisson data models, we adapt Green's OSL algorithms for use in our phase retrieval algorithms.

This paper is organized as follows. Section 2 discusses unconstrained phase retrieval algorithms based on minimizing Csiszár's I-divergence. We discuss penalties in detail and derive our constrained algorithms for phase retrieval using the I-divergence in Section 3. Section 4 illustrates and discusses our experiments. Finally, we conclude our study in Section 5.

2. UNCONSTRAINED PHASE RETRIEVAL ALGORITHMS
A. Algorithm for Unaliased Autocorrelations: The Schulz–Snyder Algorithm
In astronomical imaging, the autocorrelation of an object, rather than its Fourier magnitudes, is directly obtained via manipulation of measured data.4 This autocorrelation is "continuous" in that the associated Fourier magnitudes, which yield an "unaliased" autocorrelation, are not undersampled as in x-ray crystallography.1

Schulz and Snyder found an iterative algorithm for recovering nonnegative functions from nth-order correlations.5 Here, we are interested specifically in the n=2 case of recovery from "unaliased" autocorrelations, which is equivalent to phase retrieval. For implementation on a computer, we discretize all functions of interest. The algorithm estimates functions from their autocorrelations by minimizing Csiszár's I-divergence:

I(S \| R_\phi) = \sum_y \left[ S(y) \log \frac{S(y)}{R_\phi(y)} + R_\phi(y) - S(y) \right],   (1)

where S = R_f is the autocorrelation of some true but unknown f that we desire to estimate from S, and the autocorrelation of an estimate \phi (which is an estimate of f) is defined as

R_\phi(y) = \sum_x \phi(x) \phi(x+y),   (2)

where x \in X, y \in Y, X = \{1, 2, \ldots, N\} \times \{1, 2, \ldots, M\}, and

Y \triangleq \{ y : y = x_1 - x_2, \; (x_1, x_2) \in X^2 \}.   (3)

The algorithm attempts to minimize the objective function J(\phi) = I(S \| R_\phi) subject to the constraints

C(\phi) = \sum_x \phi(x) = C(f), \qquad \phi \ge 0,

where [C(\phi)]^2 = \sum_y R_\phi(y) (see Property 3.4 in Ref. 5, p. 269). Note that C(f) can be obtained even if f is unknown.

The Schulz–Snyder algorithm for recovering a nonnegative function from its autocorrelation is given by the following iteration:

\phi^{(k+1)}(x) = \phi^{(k)}(x) \frac{1}{C(\phi)} \sum_y \phi^{(k)}(x+y) \frac{S(y)}{R_{\phi^{(k)}}(y)}.   (4)

Note that if \phi^{(0)}(x) = 0 for some particular x, then \phi^{(k)}(x) = 0 for that x for all k. This provides a convenient way of incorporating support constraints when they are available. This algorithm possesses some other useful properties, such as monotonically decreasing I-divergence and conservation of the total intensity of the estimates, and its fixed points are (global or local) minimizers of Eq. (1).5 Another noteworthy property is that the Schulz–Snyder algorithm operates completely in the spatial domain, instead of alternating between the Fourier and spatial domains as in Fienup's algorithm.42

B. Algorithm for Aliased Autocorrelations
In x-ray crystallography, we can measure only an "aliased" autocorrelation, unlike in astronomical imaging. This is because the measured Fourier magnitudes in x-ray crystallography are undersampled because of the periodicity of molecular structures.1,2 The aliased autocorrelations are called Patterson functions.43-45

Fortunately, if we replace the "unaliased" autocorrelation with Patterson functions, all the nice properties of Eq. (4) remain, and hence the Schulz–Snyder algorithm can also be applied to x-ray crystallography (and other applications with periodic structures) with some tweaks. Since it is not difficult to prove that the algorithm's properties remain valid, we omit the proofs for conciseness. The arguments for the proofs are similar to those in Schulz and Snyder.5

The functions of interest in x-ray crystallography are three dimensional; this paper presents theory and simulations in two dimensions for conciseness and ease of visual presentation. All concepts are readily extended to three dimensions. We use different notations for the aliased and unaliased cases to avoid confusing the two. Consider a two-dimensional (2-D) periodic function g(r). Again, we consider a discretization of all functions involved for computational purposes. Since g is periodic, r extends from negative infinity to positive infinity. For simplicity, however, we assume that r = (r_1, r_2) ranges over a single period: 1 \le r_1 \le d_1 and 1 \le r_2 \le d_2, where r_1 and r_2 take on real values, d_1 and d_2 are real constants, and d = (d_1, d_2) represents the period of g. Equation (4) can be adapted to the periodic case as

\gamma^{(k+1)}(r) = \gamma^{(k)}(r) \frac{1}{C(g)} \sum_u \gamma^{(k)}\big( (r+u) \bmod d \big) \frac{P(u)}{P_{\gamma^{(k)}}(u)},   (5)

where P denotes the measured Patterson function, obtained directly from the diffraction measurements via an inverse Fourier transform of the squared magnitudes of the diffraction data, P_{\gamma^{(k)}} denotes the Patterson function of the kth estimate \gamma^{(k)}, u = (u, v) denotes coordinates in the Patterson space (which are also assumed to take on values over one period), and C(\gamma^{(k)}) is given by

C(\gamma^{(k)}) = \sum_r \gamma^{(k)}(r) = C(g) \quad \forall k, \qquad \gamma \ge 0,   (6)

where [C(\gamma^{(k)})]^2 = \sum_u P(u). The Patterson function of a periodic function g is defined by

P(u) = \sum_r g\big( (r+u) \bmod d \big) g(r),   (7)

which has the same period as g. Note that Eq. (5) still enjoys monotonically decreasing I-divergence.
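Equation (7) is a circular autocorrelation, which is why it is computable from squared Fourier magnitudes alone; a sketch, assuming a 1-D signal (our own illustration, not the authors' code):

```python
import numpy as np

def patterson(g):
    """P(u) = sum_r g((r+u) mod d) g(r), Eq. (7): a circular autocorrelation.
    Computed with FFTs; 1-D for brevity, the 2-D case is identical per axis."""
    G = np.fft.fft(g)
    # The inverse transform of |G|^2 is the circular autocorrelation, so only
    # squared Fourier magnitudes (the diffraction data) are needed.
    return np.real(np.fft.ifft(G * np.conj(G)))

g = np.array([0.0, 2.0, 1.0, 0.0])
P = patterson(g)
# brute-force check of Eq. (7)
P_direct = np.array([sum(g[(r + u) % len(g)] * g[r] for r in range(len(g)))
                     for u in range(len(g))])
assert np.allclose(P, P_direct)
assert np.isclose(P.sum(), g.sum() ** 2)   # [C(g)]^2 = sum_u P(u)
```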

Even though Eq. (5) preserves all the nice properties of Eq. (4), there still may be some troublesome issues, such as nonunique solutions, where two different electron density maps may produce the same Patterson function, and convergence to local minima that are not global minima, where the iterations may become trapped in "wrong" answers. Equation (4) also suffers from similar problems,12 but these problems may be more serious in Eq. (5).

3. CONSTRAINED PHASE RETRIEVAL ALGORITHMS
When the data are noisy, the algorithms' estimates tend to be rough, as we illustrate in our experiments. To alleviate this roughness, we incorporate additional constraints via penalty methods, particularly Good's roughness27 and total variation (TV)30 penalties.

When a penalty is incorporated into the objective function J, our goal becomes finding \phi_0 or \gamma_0 such that

\phi_0 = \arg\min_{\phi \ge 0} \; I(S \| R_\phi) + \alpha \Phi(\phi),   (8)

\gamma_0 = \arg\min_{\gamma \ge 0} \; I(P \| P_\gamma) + \beta \Phi(\gamma),   (9)

where S and P are the measured unaliased autocorrelation and the measured Patterson function, respectively, \alpha and \beta are regularization parameters, and the \Phi's are functions that depend on the penalty type.

For brevity, we describe and discuss our methods in terms of Eq. (8), which involves unaliased autocorrelations. Nonetheless, the methods can be easily applied to the case of aliased autocorrelations as well.

A. Penalties Toward Smoothness

1. Good's Roughness Penalty
Good's roughness was originally proposed for nonparametric probability density estimation.27 In the continuous spatial domain, it can be defined as46

\Phi_G(\phi) = - \iint \phi(x_1, x_2) \left( \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} \right) \log \phi(x_1, x_2) \, dx_1 \, dx_2,   (10)

where (x_1, x_2) represents a continuous spatial coordinate (making a slight abuse of notation). Discretizing this expression yields

\Phi_G(\phi) = - \sum_{x_1} \sum_{x_2} \phi(x_1, x_2) [ \log \phi(x_1+1, x_2) + \log \phi(x_1-1, x_2) + \log \phi(x_1, x_2+1) + \log \phi(x_1, x_2-1) - 4 \log \phi(x_1, x_2) ].   (11)

O'Sullivan provided an inspiring alternative interpretation of this discretized penalty.47 He noted that this discretization of Good's roughness can be equivalently expressed in terms of the I-divergences between neighboring pixels:

\Phi_O(\phi) = I(\phi \| S_V \phi) + I(\phi \| S_V^{-1} \phi) + I(\phi \| S_H \phi) + I(\phi \| S_H^{-1} \phi),   (12)

where

S_V \phi(x_1, x_2) = \phi\big( (x_1 - 1) \bmod N, \, x_2 \big),
S_V^{-1} \phi(x_1, x_2) = \phi\big( (x_1 + 1) \bmod N, \, x_2 \big),
S_H \phi(x_1, x_2) = \phi\big( x_1, \, (x_2 - 1) \bmod M \big),
S_H^{-1} \phi(x_1, x_2) = \phi\big( x_1, \, (x_2 + 1) \bmod M \big),
(x_1, x_2) \in X.   (13)

From Eq. (12), it can be inferred that if there are large differences between neighboring pixels, then the roughness penalty encourages smoothness by suppressing the differences in the sense of the I-divergence, which makes Good's roughness particularly attractive in our overall framework.
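O'Sullivan's identity between Eqs. (11) and (12) can be checked numerically once both are written with the periodic (mod) indexing of Eq. (13); the linear terms of the four I-divergences cancel under circular shifts, leaving exactly Eq. (11). A sketch with our own test array (not the paper's code):

```python
import numpy as np

def i_div(a, b):
    return float(np.sum(a * np.log(a / b) + b - a))

def good_roughness(phi):
    """Discretized Good's roughness, Eq. (11), with periodic (mod) indexing."""
    nb = (np.log(np.roll(phi, 1, 0)) + np.log(np.roll(phi, -1, 0)) +
          np.log(np.roll(phi, 1, 1)) + np.log(np.roll(phi, -1, 1)) -
          4.0 * np.log(phi))
    return float(-np.sum(phi * nb))

def osullivan(phi):
    """O'Sullivan's I-divergence form, Eq. (12), with the shifts of Eq. (13)."""
    shifts = [np.roll(phi, 1, 0), np.roll(phi, -1, 0),
              np.roll(phi, 1, 1), np.roll(phi, -1, 1)]
    return float(sum(i_div(phi, s) for s in shifts))

rng = np.random.default_rng(0)
phi = rng.uniform(0.5, 2.0, size=(8, 8))
assert np.isclose(good_roughness(phi), osullivan(phi))  # the two forms agree
assert good_roughness(np.ones((4, 4))) == 0.0           # flat images are unpenalized
```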

2. Total Variation Penalty
Although Good's roughness penalty nicely smoothes contiguous regions in the estimates, it often undesirably smears out edges. TV penalties can often provide smoothness while preserving edges,30,48 since they only weakly penalize large discontinuities.


TV is given, in its continuous form, by

\Phi_{TV}(\phi) = \iint \| \nabla \phi(x_1, x_2) \|_2 \, dx_1 \, dx_2,   (14)

where \| \cdot \|_2 denotes the Euclidean norm in \mathbb{R}^2 and \nabla \phi denotes the gradient of \phi.49,50 Following the suggestions of Combettes and Luo,34,35 we employ the following discretization of the TV penalty:

\Phi_{TV}(\phi) = \sum_{x_1=1}^{N-1} \sum_{x_2=1}^{M-1} \sqrt{ [\phi(x_1+1, x_2) - \phi(x_1, x_2)]^2 + [\phi(x_1, x_2+1) - \phi(x_1, x_2)]^2 }
+ \sum_{x_1=1}^{N-1} | \phi(x_1+1, M) - \phi(x_1, M) | + \sum_{x_2=1}^{M-1} | \phi(N, x_2+1) - \phi(N, x_2) |,   (15)

where the second and third terms on the right-hand side take into account boundary effects (Ref. 34, p. 1299).

B. Relation between Expectation-Maximization Algorithms and Minimum I-Divergence Algorithms
Recall that we aim to find the \phi_0 that attains

\phi_0 = \arg\min_{\phi \ge 0} \; I(S \| R_\phi) + \alpha \Phi(\phi).   (16)

Note the following relations:

\arg\min_{\phi \ge 0} \; I(S \| R_\phi) + \alpha \Phi(\phi)
= \arg\min_{\phi \ge 0} \; \sum_y \left[ S(y) \log \frac{S(y)}{R_\phi(y)} - S(y) + R_\phi(y) \right] + \alpha \Phi(\phi)
= \arg\min_{\phi \ge 0} \; \sum_y [ S(y) \log S(y) - S(y) ] - \sum_y [ S(y) \log R_\phi(y) - R_\phi(y) ] + \alpha \Phi(\phi)
= \arg\max_{\phi \ge 0} \; \sum_y [ S(y) \log R_\phi(y) - R_\phi(y) ] - \alpha \Phi(\phi),   (17)

where the last equality holds because \sum_y [S(y) \log S(y) - S(y)] does not depend on \phi. Note that the last line in Eq. (17) corresponds to maximum-penalized-likelihood estimation using a Poisson data model.17,23 These important relations suggest that a sequence of \phi that can achieve the maximum penalized likelihood can also achieve the minimum penalized I-divergence.

Expectation-maximization (EM) algorithms are strategic tools for producing a sequence of estimates \{\phi^{(k)}\} that try to maximize the penalized likelihood. We give only a few highlights here to convey the flavor of the EM framework; see the work by Green40 and by Dempster et al.51 for a complete description of this setting and notation. The EM algorithm maximizes Q(\phi^{(new)}; \phi^{(old)}) - \alpha \Phi(\phi^{(new)}) at each iteration, where

Q(\phi^{(new)}; \phi^{(old)}) = E\big[ L_{cd}(\phi^{(new)}) \mid z, \phi^{(old)} \big].   (18)

Under typical regularity conditions, this can be done by solving

D^{10} Q(\phi^{(new)}; \phi^{(old)}) - \alpha D \Phi(\phi^{(new)}) = 0.   (19)

In the formulas above, D denotes the derivative operator with respect to the parameters involved [e.g., D^{10} Q(\phi^{(new)}; \phi^{(old)}) denotes the first-order partial derivative of Q with respect to \phi^{(new)}], and Q is the expectation of the log likelihood L_{cd}(\phi^{(new)}) of hypothetical "complete data" given the current estimate of the parameter \phi^{(old)} and the measured "incomplete data" z. The specific complete-data formulation appropriate for the Schulz–Snyder algorithm is given in Ref. 4.

Exactly the same sequence \{\phi^{(k)}\} produced by the EM algorithms can be used to minimize the penalized I-divergence, provided that the EM algorithms are designed by assuming the Poisson data model in Ref. 4. Subsection 3.D exploits this theoretical connection to adapt Green's OSL methods to the penalized I-divergence optimization problem given in Eq. (8).

C. Optimization Challenge: Coupling
When \alpha = 0, Eq. (19) has a closed-form solution; when \alpha \ne 0, for most penalties, Eq. (19) cannot be solved in closed form. In the case of our spatial penalties, Eq. (19) represents a coupled set of nonlinear equations. The derivative of O'Sullivan's version of Good's roughness penalty in Eq. (11) is given by

\frac{\partial \Phi_G(\phi)}{\partial \phi(x)} = 4 [ 1 + \log \phi(x_1, x_2) ]
- \left[ \log \phi(x_1-1, x_2) + \frac{\phi(x_1+1, x_2)}{\phi(x_1, x_2)} \right]
- \left[ \log \phi(x_1+1, x_2) + \frac{\phi(x_1-1, x_2)}{\phi(x_1, x_2)} \right]
- \left[ \log \phi(x_1, x_2-1) + \frac{\phi(x_1, x_2+1)}{\phi(x_1, x_2)} \right]
- \left[ \log \phi(x_1, x_2+1) + \frac{\phi(x_1, x_2-1)}{\phi(x_1, x_2)} \right].   (20)

Note that the derivative of Good's roughness penalty involves all the neighboring pixels of \phi(x_1, x_2). Hence a closed-form solution of Eq. (19) is intractable.
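Equation (20) can be sanity-checked against a finite-difference gradient of the discretized penalty; here we assume the periodic (mod) indexing of Eq. (13), so every pixel has four neighbors (a numerical check of our own, not from the paper):

```python
import numpy as np

def good_roughness(phi):
    """Eq. (11) with periodic (mod) indexing, as in O'Sullivan's form."""
    nb = (np.log(np.roll(phi, 1, 0)) + np.log(np.roll(phi, -1, 0)) +
          np.log(np.roll(phi, 1, 1)) + np.log(np.roll(phi, -1, 1)) -
          4.0 * np.log(phi))
    return float(-np.sum(phi * nb))

def good_grad_entry(phi, x1, x2):
    """Eq. (20) at one pixel, neighbors wrapped mod N and M."""
    N, M = phi.shape
    up, dn = phi[(x1 + 1) % N, x2], phi[(x1 - 1) % N, x2]
    rt, lf = phi[x1, (x2 + 1) % M], phi[x1, (x2 - 1) % M]
    c = phi[x1, x2]
    return (4.0 * (1.0 + np.log(c))
            - (np.log(dn) + up / c) - (np.log(up) + dn / c)
            - (np.log(lf) + rt / c) - (np.log(rt) + lf / c))

rng = np.random.default_rng(1)
phi = rng.uniform(0.5, 2.0, size=(6, 6))
h = 1e-6
errs = []
for (x1, x2) in [(0, 0), (2, 3), (5, 5)]:
    bumped = phi.copy()
    bumped[x1, x2] += h                     # perturb a single pixel
    numeric = (good_roughness(bumped) - good_roughness(phi)) / h
    errs.append(abs(numeric - good_grad_entry(phi, x1, x2)))
assert max(errs) < 1e-4                     # analytic and numeric gradients agree
```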

A similar situation occurs when the TV penalty is applied. Consider the derivative of the TV penalty:

\frac{\partial \Phi_{TV}(\phi)}{\partial \phi(x)} =
\begin{cases}
A(\phi), & 1 \le x_1 \le N-1, \; 1 \le x_2 \le M-1 \\
B(\phi), & x_1 = N, \; x_2 < M \\
C(\phi), & x_1 < N, \; x_2 = M \\
2, & [\phi(N,M) - \phi(N-1,M)][\phi(N,M) - \phi(N,M-1)] > 0 \\
0, & [\phi(N,M) - \phi(N-1,M)][\phi(N,M) - \phi(N,M-1)] \le 0
\end{cases}   (21)


where

A(\phi) = \frac{ \phi(x_1, x_2) - \phi(x_1-1, x_2) }{ \sqrt{ [\phi(x_1, x_2) - \phi(x_1-1, x_2)]^2 + [\phi(x_1-1, x_2+1) - \phi(x_1-1, x_2)]^2 } }
+ \frac{ 2\phi(x_1, x_2) - \phi(x_1+1, x_2) - \phi(x_1, x_2+1) }{ \sqrt{ [\phi(x_1+1, x_2) - \phi(x_1, x_2)]^2 + [\phi(x_1, x_2+1) - \phi(x_1, x_2)]^2 } }
+ \frac{ \phi(x_1, x_2) - \phi(x_1, x_2-1) }{ \sqrt{ [\phi(x_1+1, x_2-1) - \phi(x_1, x_2-1)]^2 + [\phi(x_1, x_2) - \phi(x_1, x_2-1)]^2 } },

B(\phi) = \frac{ \phi(N, x_2) - \phi(N-1, x_2) }{ \sqrt{ [\phi(N, x_2) - \phi(N-1, x_2)]^2 + [\phi(N-1, x_2+1) - \phi(N-1, x_2)]^2 } },

C(\phi) = \frac{ \phi(x_1, M) - \phi(x_1, M-1) }{ \sqrt{ [\phi(x_1+1, M-1) - \phi(x_1, M-1)]^2 + [\phi(x_1, M) - \phi(x_1, M-1)]^2 } }.   (22)

The coupling problem becomes even more complicated, and no closed-form solution of Eq. (19) is available.

Various methods, such as gradient-based methods,29,46,52 can be used to maximize the penalized complete-data log likelihood, hence solving Eq. (19). O'Sullivan suggested a generalized EM algorithm, based on coloring ideas, for solving the coupling problems caused by Good's-roughness-type neighborhood structures.47,53,54 Green's OSL algorithm is another method, which is straightforward to apply and implement. The connection between minimum penalized I-divergence algorithms and penalized EM algorithms justifies application of the OSL algorithms to our framework.

D. Green's One-Step-Late Algorithms
Green's OSL algorithms were originally tweaks of EM algorithms designed for maximum-penalized-likelihood estimation. Green40 noted that the relevant objective function in an EM algorithm may be linearized at the current estimate, as in gradient methods, and that the derivatives of the penalty term at two consecutive iterations differ only slightly if the algorithm converges slowly, as is the case with EM algorithms51 and other multiplicative algorithms.55 In an EM formulation, these observations suggest finding a parameter \phi that satisfies

D^{10} Q(\phi^{(new)}; \phi^{(old)}) - \alpha D \Phi(\phi^{(old)}) = 0.   (23)

Notice that the only difference between Eqs. (19) and (23) is that \phi^{(old)} is used in Eq. (23) instead of \phi^{(new)}. An appealing property of the OSL algorithm is that Eqs. (19) and (23) have the same fixed points.

Green's OSL algorithms have empirically shown monotonically increasing penalized likelihood of the incomplete data and a faster convergence rate per iteration compared with the associated unconstrained EM algorithms. However, we emphasize that the faster convergence speed is due to the penalty, rather than to the attributes of the OSL algorithm.

Since the sequence of estimates generated by OSL algorithms can attain the maximum penalized likelihood, it can also achieve the minimum penalized I-divergence, as we discussed in Subsection 3.B. Thus the next subsection derives the algorithms for minimizing the penalized I-divergence objective function given in Eq. (8) by exploiting the OSL idea.

E. Constrained Phase Retrieval Algorithms
The unconstrained minimum I-divergence algorithms can be interpreted as deterministic versions56 of the EM algorithm associated with the Poisson data model. Consequently, the unconstrained minimum I-divergence algorithms inherit the corresponding slow convergence rates of EM algorithms. This encourages us to adapt Green's OSL algorithm to our minimum I-divergence methods to perform the constrained I-divergence minimization given in Eq. (8).

Application of Green's OSL idea yields the algorithm

\phi^{(k+1)}(x) = \frac{ \phi^{(k)}(x) }{ 2C(\phi) + \alpha \left[ \dfrac{\partial \Phi(\phi)}{\partial \phi(x)} \right]_{\phi = \phi^{(k)}} } \; 2 \sum_y \phi^{(k)}(x+y) \frac{S(y)}{R_{\phi^{(k)}}(y)}.   (24)

Following a similar idea, the algorithm for a Patterson function can be obtained:

\gamma^{(k+1)}(r) = \frac{ \gamma^{(k)}(r) }{ 2C(\gamma) + \beta \left[ \dfrac{\partial \Phi(\gamma)}{\partial \gamma(r)} \right]_{\gamma = \gamma^{(k)}} } \; 2 \sum_u \gamma^{(k)}\big( (r+u) \bmod d \big) \frac{P(u)}{P_{\gamma^{(k)}}(u)}.   (25)

We omit the details for brevity; they are a straightforward adaptation of the ideas in Ref. 37 to the discussion in Subsection 3.B.
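One OSL-penalized update of Eq. (24) can be sketched as follows. The penalty gradient is passed in as a callable; for brevity we use a simple quadratic-difference smoothness penalty as a stand-in for Good's roughness (Eq. 20) or TV (Eqs. 21 and 22). Both the stand-in penalty and all numbers here are our own illustrative choices, not the paper's:

```python
import numpy as np

def autocorr(phi):
    return np.correlate(phi, phi, mode="full")

def osl_step(phi, S, alpha, penalty_grad):
    """One-step-late update, Eq. (24): the penalty gradient is evaluated at the
    CURRENT iterate phi^(k), which keeps the update multiplicative and cheap."""
    N = len(phi)
    ratio = S / np.maximum(autocorr(phi), 1e-12)
    corr = np.zeros(N)
    for x in range(N):
        for y in range(-(N - 1), N):
            if 0 <= x + y < N:
                corr[x] += phi[x + y] * ratio[y + N - 1]
    denom = 2.0 * phi.sum() + alpha * penalty_grad(phi)   # per-pixel denominator
    return phi * 2.0 * corr / denom

def quad_smooth_grad(phi):
    """Gradient of the stand-in penalty sum_x (phi(x+1) - phi(x))^2."""
    g = np.zeros_like(phi)
    g[:-1] += 2.0 * (phi[:-1] - phi[1:])
    g[1:] += 2.0 * (phi[1:] - phi[:-1])
    return g

truth = np.array([0.0, 1.0, 3.0, 2.0, 0.0])
S = autocorr(truth)
phi = np.full(5, truth.sum() / 5)
for _ in range(30):
    phi = osl_step(phi, S, alpha=0.0, penalty_grad=quad_smooth_grad)
phi_pen = np.full(5, truth.sum() / 5)
for _ in range(30):
    phi_pen = osl_step(phi_pen, S, alpha=0.1, penalty_grad=quad_smooth_grad)
# alpha = 0 reduces Eq. (24) to the unpenalized iteration, Eq. (4),
# so the total intensity is conserved; both iterates stay nonnegative
assert abs(phi.sum() - truth.sum()) < 1e-6
assert (phi >= 0).all() and (phi_pen >= 0).all()
```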

4. NUMERICAL EXPERIMENTS
A. Experimental Settings

1. Initial Estimates
The Schulz–Snyder phase retrieval algorithm in Eq. (4) and our modification of the algorithm in Eq. (5) for x-ray crystallography are both subject to a serious challenge, namely, convergence to local minima.12 In addition, it is generally difficult to know whether an estimate is a local or a global minimum. Hence, in our experiments, we initialize the algorithms with the known truth plus a small constant. This allows us to focus on the effects of noise and regularization without becoming confused by issues involving local minima (which haunt all practical phase retrieval algorithms). Finding methods for avoiding local minima is very challenging; it is an active research problem.

Note that we do not assume that the exact image support is known. For example, suppose that a circle contained in a 32x32 rectangle is the true image and that the region of the rectangle outside the circle is filled with zeros. Then we may initialize the algorithms with a 32x32 constant rectangle plus the true image, which is assumed to be known for purposes of our study. By doing this, we can isolate the effects of noise propagation through the algorithms from the problem of convergence to local minima, since we start from a place that is probably near a global minimum.

The size of the initial estimate should be carefully chosen. Note that the summation term on the right-hand side of Eq. (4) contains not only the autocorrelation but also the cross correlation of an estimate with an autocorrelation. When we compute the autocorrelation of an image using fast-Fourier-transform-based convolutions, the size of the resulting image should be set to at least twice that of the original image to avoid having part of the autocorrelation "wrap around" undesirably. The same logic applies to the cross correlation. Since the estimate is cross correlated with its autocorrelation (whose size is twice that of the estimate), the resulting image size should be set to at least three times that of the estimate. Therefore we should begin with an initial image whose size is 3Nx3N, where 2Nx2N is the size of the measured autocorrelation. Note that there is a lot of zero padding in the initial estimate. The nonzero part of the initial estimate, constructed as explained in the preceding paragraph, is placed at the center of the whole initial estimate.
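The sizing rule above (a 3N-by-3N initial image, zero padded around a central N-by-N block, for a 2N-by-2N measured autocorrelation) can be sketched as follows; the constant offset and the circular test image are our own choices:

```python
import numpy as np

def initial_estimate(truth, eps=0.01):
    """Build a 3N-by-3N initial image for the unaliased algorithm:
    truth plus a small constant in the central N-by-N block, zeros elsewhere,
    so FFT-based auto- and cross correlations cannot wrap around."""
    N = truth.shape[0]
    big = np.zeros((3 * N, 3 * N))
    big[N:2 * N, N:2 * N] = truth + eps   # known truth plus a small constant
    return big

truth = np.zeros((32, 32))
yy, xx = np.mgrid[:32, :32]
truth[(yy - 16) ** 2 + (xx - 16) ** 2 <= 100] = 1.0   # a circle in a 32x32 square
init = initial_estimate(truth)
assert init.shape == (96, 96)                 # 3N-by-3N working array
assert np.count_nonzero(init[:32, :]) == 0    # zero padding surrounds the block
assert init[48, 48] > 0                       # center pixel carries truth + eps
```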

When Patterson functions are involved, both autocorrelations and cross correlations are computed by using circular convolutions; therefore initial estimates for the algorithm in Eq. (5) have the same size as that of the known truth.

Figure 1 shows an example of initial estimates for both aliased and unaliased autocorrelations. Again note that the exact true support is not known in either case. Also note that zeros are padded to avoid the wraparound problem of autocorrelations or cross correlations in the "unaliased" case.

2. Noisy Data Realization
When measurements are recorded as photon counts, as in astronomical imaging or x-ray crystallography, the relevant noise may be modeled by Poisson statistics. However, we may observe diverse noise artifacts, since noise corrupts the measurements in different ways depending upon the application.

Fig. 1. Example of initial estimates: The initial estimate on the left is used for the algorithm in Eq. (5), and that on the right is used for the algorithm in Eq. (4).

Poisson noise on autocorrelations. Figure 2 shows the procedure through which we generate noisy autocorrelations. A series of 2×2 arrays connected by solid arrows represents the procedure for generating an aliased, noisy autocorrelation; the combination of two 2×2 arrays and three 3×3 arrays connected by dotted arrows represents the procedure for generating an unaliased, noisy autocorrelation.

From probability theory, the signal-to-noise ratio (SNR) for a Poisson random variable is related to the mean $\mu$ of the random variable provided that the mean is large: $\mathrm{SNR} \approx \sqrt{\mu}$. 57 Therefore we may control noise levels by changing the image intensities, which act as means for Poisson random variables. Since we are interested in the change of noise artifacts with respect to the change of noise levels, we first change the image intensity level by scaling the known truth f by a constant c: $f_c = cf$. (Here we use a generic f for the truth, since the exposition applies to both cases.)

Noiseless autocorrelations $P_{f_c}$ and $R_{f_c}$ are produced for both the aliased and unaliased cases by using $f_c$ according to Eqs. (2) and (7). Note that the sizes of the two autocorrelations are different. Then, noisy autocorrelations are generated by

$$P_{f_c}^{\text{noisy}}(u) \sim \text{Poisson}\big(P_{f_c}(u)\big), \qquad R_{f_c}^{\text{noisy}}(y) \sim \text{Poisson}\big(R_{f_c}(y)\big). \qquad (26)$$
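Drawing one such realization, and checking the SNR ≈ √μ rule empirically, can be sketched as follows (our illustration; the autocorrelation values scaled by c act as the Poisson means, and `poisson_realization` is a hypothetical helper name):

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_realization(autocorr, c):
    """Draw one Poisson realization of a nonnegative autocorrelation
    scaled by c, as in Eq. (26); each pixel is an independent draw."""
    return rng.poisson(c * autocorr).astype(float)

# SNR of a Poisson variable with large mean mu is roughly sqrt(mu):
mu = 400.0
samples = rng.poisson(mu, size=200_000)
snr = samples.mean() / samples.std()   # close to sqrt(400) = 20

noisy = poisson_realization(np.full((4, 4), 100.0), 0.5)  # means of 50
```

Scaling the truth by a smaller c lowers every pixel's mean, and hence the SNR, which is exactly how the noise level is controlled in the experiments.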

That is, each pixel of $P_{f_c}$ and $R_{f_c}$ is the mean of the corresponding pixel of $P_{f_c}^{\text{noisy}}$ and $R_{f_c}^{\text{noisy}}$, respectively. Note that the algorithms assume that the autocorrelations are symmetric. However, since each pixel is associated with a different realization, $P_{f_c}^{\text{noisy}}$ and $R_{f_c}^{\text{noisy}}$ are not symmetric. Therefore we enforce symmetry for the autocorrelations as follows:

$$\text{Sym}\big(P_{f_c}^{\text{noisy}}\big)(u) = \text{Sym}\big(P_{f_c}^{\text{noisy}}\big)(-u) = \frac{P_{f_c}^{\text{noisy}}(u) + P_{f_c}^{\text{noisy}}(-u)}{2},$$

Fig. 2. Procedure for realizing noisy autocorrelations: An unaliased noisy autocorrelation is generated by the procedure indicated by the solid arrows; an aliased noisy autocorrelation is generated by the procedure indicated by the dotted arrows.

$$\text{Sym}\big(R_{f_c}^{\text{noisy}}\big)(y) = \text{Sym}\big(R_{f_c}^{\text{noisy}}\big)(-y) = \frac{R_{f_c}^{\text{noisy}}(y) + R_{f_c}^{\text{noisy}}(-y)}{2}. \qquad (27)$$
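On a periodic (aliased) grid, the symmetrization of Eq. (27) simply averages each pixel with its reflection through the origin. A NumPy sketch (ours; `symmetrize` is a hypothetical helper name, and index 0 is taken as the origin u = 0):

```python
import numpy as np

def symmetrize(p):
    """Return Sym(p) with Sym(p)(u) = Sym(p)(-u) = (p(u) + p(-u)) / 2.

    On a periodic grid, p(-u) is the array reversed along both axes and
    rolled by one so that index 0 (the origin) maps to itself.
    """
    p_neg = np.roll(np.flip(p), 1, axis=(0, 1))  # p evaluated at -u
    return 0.5 * (p + p_neg)

p = np.random.default_rng(1).random((8, 8))  # an asymmetric "realization"
s = symmetrize(p)
# s is symmetric, s(u) == s(-u), and symmetrizing again changes nothing.
```

The idempotence (symmetrizing twice equals symmetrizing once) is what makes this a safe preprocessing step before running the algorithms.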

Data in astronomical imaging through extreme turbulence may be generated as $R_{f_c}^{\text{noisy}}$. We do not know of any physical mechanism that generates data as in $P_{f_c}^{\text{noisy}}$; we include it to facilitate an "apples-to-apples" comparison between the unaliased and aliased cases. Data in the aliased case are typically physically generated as described next.

Poisson noise on squared Fourier magnitudes. Figure 3 shows another procedure for generating an aliased, noisy autocorrelation. However, in this case, Poisson noise is added to the squared Fourier magnitudes. Let $I_{f_c}$ denote the Fourier magnitudes squared: $I_{f_c} = |\mathcal{F}(f_c)|^2$, where $\mathcal{F}(\cdot)$ denotes the Fourier transform operator and $f_c = cf$ as before. Then, noisy Fourier magnitudes are produced by

$$I_{f_c}^{\text{noisy}}(\omega) \sim \text{Poisson}\big(I_{f_c}(\omega)\big), \qquad (28)$$

where $\omega$ represents 2-D frequency-domain coordinates. An aliased, noisy autocorrelation is then obtained by

$$P_{f_c}^{\text{noisy}} = \mathcal{F}^{-1}\big(I_{f_c}^{\text{noisy}}\big), \qquad (29)$$

where $\mathcal{F}^{-1}(\cdot)$ denotes the inverse Fourier transform operator. Note that the autocorrelation $P_{f_c}^{\text{noisy}}$ in this case obeys the correct symmetry $P_{f_c}^{\text{noisy}}(u) = P_{f_c}^{\text{noisy}}(-u)$.
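Equations (28) and (29) can be sketched together in NumPy (our illustration, not the authors' code; the function name is ours). Because the noisy spectrum is real, the resulting autocorrelation is automatically symmetric, as noted above:

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_aliased_autocorrelation(f, c):
    """Poisson noise on the squared Fourier magnitudes of c*f (Eq. (28)),
    followed by an inverse FFT (Eq. (29)); the result is aliased because
    the spectrum is sampled only on the original (undersampled) grid."""
    I = np.abs(np.fft.fft2(c * f)) ** 2      # I_{f_c}
    I_noisy = rng.poisson(I).astype(float)   # pixelwise Poisson draws
    return np.real(np.fft.ifft2(I_noisy))    # P_{f_c}^{noisy}

f = np.zeros((16, 16))
f[4:8, 4:8] = 50.0
P_noisy = noisy_aliased_autocorrelation(f, 1.0)
# Since I_noisy is real, P_noisy(u) = P_noisy(-u) holds up to rounding.
```

No explicit symmetrization step like Eq. (27) is needed in this scenario.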

3. Error Metrics
For quantification of the deterioration of estimates by noise, we study changes via various error metrics:

$$L_1(f,\phi) = \sum_x |f(x) - \phi(x)|,$$
$$L_2(f,\phi) = \sum_x |f(x) - \phi(x)|^2,$$
$$L_\infty(f,\phi) = \max_x |f(x) - \phi(x)|,$$
$$I(f \,\|\, \phi) = \sum_x \left[ f(x)\log\frac{f(x)}{\phi(x)} - f(x) + \phi(x) \right], \qquad (30)$$

where f denotes the truth and $\phi$ denotes an estimate. The same error metrics are used for g and $\psi$, where g denotes the truth and $\psi$ denotes an estimate in the Patterson case. Note that here we use the I-divergence as a discrepancy between the truth and an estimate rather than between their autocorrelations.
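The four metrics of Eq. (30) are straightforward to compute; a NumPy sketch (ours; the small `eps` guarding the logarithm at zero-valued pixels is our own addition, not from the paper):

```python
import numpy as np

def error_metrics(f, phi, eps=1e-12):
    """L1, L2, L-infinity, and I-divergence between a nonnegative truth f
    and estimate phi, per Eq. (30)."""
    d = f - phi
    L1 = np.abs(d).sum()
    L2 = (d ** 2).sum()
    Linf = np.abs(d).max()
    # eps keeps the log finite where f or phi vanishes.
    Idiv = np.sum(f * np.log((f + eps) / (phi + eps)) - f + phi)
    return L1, L2, Linf, Idiv

f = 2.0 * np.ones((4, 4))
phi = 1.0 * np.ones((4, 4))
L1, L2, Linf, Idiv = error_metrics(f, phi)
# Per pixel: |d| = 1, and the I-divergence term is 2 ln 2 - 1.
```

For identical inputs all four metrics vanish, which is the sanity check one would run before using them on reconstructions.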

Fig. 3. Alternative procedure for realizing noisy aliased autocorrelations, where Poisson noise is added to the squared Fourier magnitudes, and the noisy, aliased autocorrelation is obtained by taking the inverse Fourier transform of the noisy magnitudes.

When the truth is scaled to control the noise level in the autocorrelations, the error metric is also scaled. Hence we use tweaks of the error metrics for fair comparisons. Observe that

$$L_1(f_c,\phi_c) = \sum_x |cf(x) - c\phi(x)|,$$
$$L_2(f_c,\phi_c) = \sum_x |cf(x) - c\phi(x)|^2,$$
$$L_\infty(f_c,\phi_c) = \max_x |cf(x) - c\phi(x)|,$$
$$I(f_c \,\|\, \phi_c) = \sum_x \left[ cf(x)\log\frac{cf(x)}{c\phi(x)} - cf(x) + c\phi(x) \right], \qquad (31)$$

where $f_c$ denotes a scaled truth and $\phi_c$ denotes an estimate of $f_c$. Therefore we can argue that

$$L_1(f_c,\phi_c) = c\,L_1(f,\phi), \qquad L_2(f_c,\phi_c) = c^2 L_2(f,\phi),$$
$$L_\infty(f_c,\phi_c) = c\,L_\infty(f,\phi), \qquad I(f_c \,\|\, \phi_c) = c\,I(f \,\|\, \phi). \qquad (32)$$

Thus, when we compare $L_1(f,\phi)$ and $L_1(f_c,\phi_c)$, we divide $L_1(f_c,\phi_c)$ by c. For the other metrics, the same reasoning and method are used.
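The scaling identities of Eq. (32) can be verified numerically with a standalone sketch (ours; symbols follow Eq. (31), and the truth and estimate are kept strictly positive so the logarithm is well defined):

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.random((16, 16)) + 0.1    # truth, bounded away from zero
phi = rng.random((16, 16)) + 0.1  # estimate
c = 0.07

def idiv(a, b):
    """Csiszar's I-divergence between positive arrays a and b."""
    return np.sum(a * np.log(a / b) - a + b)

# Eq. (32): dividing the scaled metrics by c (or c^2 for L2) recovers
# the unscaled ones, making curves at different noise levels comparable.
assert np.isclose(np.abs(c * f - c * phi).sum(), c * np.abs(f - phi).sum())
assert np.isclose(((c * f - c * phi) ** 2).sum(),
                  c ** 2 * ((f - phi) ** 2).sum())
assert np.isclose(np.abs(c * f - c * phi).max(), c * np.abs(f - phi).max())
assert np.isclose(idiv(c * f, c * phi), c * idiv(f, phi))
```

The I-divergence identity follows because the factor c pulls out of every term once the logarithm's c's cancel.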

B. Unconstrained Estimates

1. Poisson Noise on Autocorrelations
Unconstrained reconstructions from unaliased autocorrelations. Figure 4 shows the truth (a hand image) and its aliased and unaliased autocorrelations. The colormaps for the autocorrelations are modified to best show details. Figure 5 shows selected estimates produced by Eq. (4) from unaliased autocorrelations for various c values. Recall that the SNR becomes lower as c becomes smaller. Among the estimates in Fig. 5, the estimate for c=0.01 is associated with the lowest SNR. Observe that the estimates become rougher as the SNR becomes lower. For c=0.01, it is difficult to recognize the hand in the estimate.

Fig. 4. (a) Truth image, (b) unaliased autocorrelation of the truth in (a), (c) aliased autocorrelation (or Patterson function) of the truth in (a). The color maps of the autocorrelations are modified to best show details; the color maps are given to the right of the autocorrelation images.

Different noise realizations would result in different estimates. To obtain the "average" behavior of the algorithm in noise, we perform ten Monte Carlo experiments. Even though the estimates in Fig. 5 are selected from the ten experiments, the other estimates for the same c have

Fig. 5. Selected unconstrained estimates at the 50,000th iteration produced by Eq. (4) from unaliased autocorrelations when (a) c=0.26, (b) c=0.21, (c) c=0.16, (d) c=0.11, (e) c=0.06, and (f) c=0.01.

Fig. 6. Mean images of unconstrained estimates at the 50,000th iteration of ten Monte Carlo experiments performed with Eq. (4) when (a) c=0.26, (b) c=0.21, (c) c=0.16, (d) c=0.11, (e) c=0.06, and (f) c=0.01. The measured autocorrelations are not aliased.

similar quality. Figure 6 shows the mean images of the estimates from the ten Monte Carlo experiments for the c values in Fig. 5. Figure 7 shows the pixelwise variance images of the estimates from the ten Monte Carlo experi-

Fig. 7. Variance images of unconstrained estimates at the 50,000th iteration of ten Monte Carlo experiments performed with Eq. (4) when (a) c=0.26, (b) c=0.21, (c) c=0.16, (d) c=0.11, (e) c=0.06, and (f) c=0.01. The measured autocorrelations are not aliased.

Fig. 8. Selected unconstrained estimates at the 50,000th iteration produced by Eq. (5) from aliased autocorrelations when (a) c=0.26, (b) c=0.21, (c) c=0.16, (d) c=0.11, (e) c=0.06, and (f) c=0.01.

ments. Overall, variances of estimates are higher on the background than on the hand, but as the SNR becomes lower, the variances on the background and on the hand become similar.

Unconstrained reconstructions from aliased autocorrelations. Figure 8 shows selected estimates produced by Eq. (5) from aliased autocorrelations for the same c values as in Fig. 5. Noticeably, the estimates from aliased autocorrelations suffer less from Poisson noise than those from unaliased autocorrelations. Compare the estimates in Figs. 5 and 8 and observe that the hand in Fig. 8(f) is more distinguishable than that in Fig. 5(f). Also, the background in Fig. 8(e) is less rough than the background in Fig. 5(e). Figure 9 shows the mean images of the ten Monte Carlo runs associated with Fig. 8 for the c values in Fig. 8. The variance images show behavior similar to that of the variances for the case of unaliased autocorrelations, except that the difference between the variances on the background and the hand is more dramatic. We omit the image for brevity.

Error metric comparison. Because of the randomness of noise, it is not so obvious that estimates from aliased autocorrelations suffer less from noise than those from unaliased autocorrelations. This may be seen via comparison of error metrics. Figure 10 shows the various error metrics that we discussed in Subsection 4.C. Each subplot shows the values of an error metric as c changes. Since Monte Carlo runs are involved, the error metric's behav-

Fig. 9. Mean images of unconstrained estimates at the 50,000th iteration of ten Monte Carlo experiments performed with Eq. (5) when (a) c=0.26, (b) c=0.21, (c) c=0.16, (d) c=0.11, (e) c=0.06, and (f) c=0.01. The measured autocorrelations are aliased.

Fig. 10. Various error metrics when the autocorrelations are subject to Poisson noise: (a) L1, (b) L2, (c) L∞, and (d) I-divergence.
Fig. 11. Selected unconstrained estimates at the 50,000th iteration produced by Eq. (5) from aliased autocorrelations when (a) c=0.001535, (b) c=0.0012875, (c) c=0.00104, (d) c=0.0007925, (e) c=0.000545, and (f) c=0.0002975.

Fig. 12. Mean images of unconstrained estimates at the 50,000th iteration of ten Monte Carlo experiments performed with Eq. (5) when (a) c=0.001535, (b) c=0.0012875, (c) c=0.00104, (d) c=0.0007925, (e) c=0.000545, and (f) c=0.0002975. Poisson noise is placed on Fourier magnitudes that are undersampled, resulting in noisy, aliased autocorrelations.

Fig. 13. Various error metrics when Poisson noise is placed on squared Fourier magnitudes: (a) L1, (b) L2, (c) L∞, and (d) I-divergence. The occasional jumpiness of the curves [as near the right side of (b)] is due to the limited number of Monte Carlo runs. We did not perform more runs, since the overall trends are already quite clear.


ior is represented by two lines: The upper line represents the mean of the ten Monte Carlo runs plus the standard deviation of the ten runs, and the lower line represents the mean minus the standard deviation. Now, it is clear that the estimates from aliased autocorrelations are less degraded by noise than the estimates from unaliased autocorrelations in the sense of the four error metrics.

It is important to note that noise "suddenly" destroys estimate quality, as opposed to having a "gradual" impact. Note that the error-metric values suddenly shoot up for c between 0.21 and 0.01.

2. Poisson Noise on Squared Fourier Magnitudes
Figure 11 shows selected estimates produced by the unconstrained algorithm in Eq. (5) from aliased autocorrelations. Here, Poisson noise is generated with means given by the squared Fourier magnitudes of the truth. The aliased autocorrelations are obtained by taking the inverse Fourier transform of the noisy squared Fourier magnitudes. The Fourier magnitudes are assumed to be "undersampled," 1 which results in aliased autocorrelations. Because the values of the Fourier magnitudes are large, the c values for the estimates in Fig. 11 are much

Fig. 14. Interesting unconstrained estimates at the 50,000th iteration produced by Eq. (5) from aliased autocorrelations with low SNRs when (a) c=0.000035, (b) c=0.00004, (c) c=0.000045, and (d) c=0.00005. Poisson noise is placed on squared Fourier magnitudes. The autocorrelations of the estimates in (a), (b), (c), and (d) are shown in (e), (f), (g), and (h), respectively.

smaller than those used in Figs. 5 and 8. We can clearly observe roughness in the estimates, especially when c is low, which corresponds to a low SNR.

Comparing c values in this subsection and Subsection 4.B.1 reveals important information: For obtaining a similar SNR level, much lower photon counts would be necessary when noise is placed on the Fourier magnitudes squared compared with when noise is placed on the autocorrelations directly [compare the estimates in Figs. 8(f) and 11(f)].

Figure 12 shows the mean images of the ten Monte Carlo experiments associated with Fig. 11. The variance images also show behavior similar to that described previously, so we omit the images.

The error metrics are illustrated in Fig. 13. As in the cases when noise is added to the autocorrelations directly, noise suddenly and rapidly degrades estimate quality in the sense of the four error metrics once the noise reaches a critical level.

Unconstrained reconstructions from highly noisy Fourier data. When SNRs become severely low, interesting noise artifacts that look like sinusoidal patterns occur. Figure 14 shows some selected estimates from low-SNR autocorrelations. Since the noise corrupts the Fourier information, noise dominating certain measurements may destroy some frequency components. Depending upon the particular noise realization, there may be several frequency components destroyed by noise; this results in various artifacts in the autocorrelations, as in the third row of Fig. 14. Note that the noise artifacts in Fig. 14 look like several types of sinusoidal patterns.

C. Constrained Estimates

1. Poisson Noise on Autocorrelations
Constrained estimates from unaliased autocorrelations. Figure 15 shows estimates produced by Eq. (24) when Good's roughness is incorporated for a relatively high SNR (c=0.06). When γ=0.5, the roughness on the hand seen when the estimate is unconstrained is much alleviated, while the background in the estimate still remains a little rough. Higher γ values provide more smoothness to

Fig. 15. Estimates produced by Eq. (24) incorporating Good's roughness penalty given unaliased autocorrelations when c=0.06 (high SNR) and (a) unconstrained, (b) γ=0.1, (c) γ=0.5, and (d) γ=1.0.


the background, but the hand becomes blurred as γ becomes higher. When γ=5.0, the border of the hand is too smeared out.

When the SNR is low (c=0.01), noise leads to estimates that are too messy to be distinguishable; Good's roughness cannot help much. Figure 16 shows estimates produced by Eq. (24), incorporating Good's roughness. The estimate produced for γ=0.5 starts to show a smooth enough hand, which is more recognizable than the unconstrained estimate in Fig. 5(f). The estimate with γ=1.0 achieves nice smoothness on both the hand region and the background. Higher γ values overly smooth the estimate;

Fig. 16. Estimates produced by Eq. (24) incorporating Good's roughness penalty given unaliased autocorrelations when c=0.01 (low SNR) and (a) unconstrained, (b) γ=0.1, (c) γ=0.5, and (d) γ=1.0.

Fig. 17. Estimates produced by Eq. (24) incorporating the TV penalty given unaliased autocorrelations when c=0.06 (high SNR) and (a) unconstrained, (b) γ=0.1, (c) γ=0.5, (d) γ=1.0, (e) γ=2.0, and (f) γ=5.0.

the estimate produced with γ=5.0 is not even recognizable as a hand.

Noticeably, the TV penalty induces considerably different smooth textures compared with Good's roughness. Figures 17 and 18 show estimates produced by Eq. (24) when the TV penalty is applied. Observe how flat the estimates are in both the hand and the background regions for γ higher than 0.5. Another important property of the TV penalty is that it preserves the edges of estimates. Note that the edges are quite clear. However, if the noise level is high (c=0.01), then the TV penalty cannot seem to locate the correct edges for γ higher than 2.0. As with

Fig. 18. Estimates produced by Eq. (24) incorporating the TV penalty given unaliased autocorrelations when c=0.01 (low SNR) and (a) unconstrained, (b) γ=0.1, (c) γ=0.5, (d) γ=1.0, (e) γ=2.0, and (f) γ=5.0.

Fig. 19. Estimates produced by Eq. (25) incorporating Good's roughness penalty given aliased autocorrelations when c=0.06 (high SNR) and (a) unconstrained, (b) γ=0.1, (c) γ=0.5, and (d) γ=1.0.


ood’s roughness, the estimates with the TV penalty for=0.5 and 1.0 are much improved over the unconstrainedstimates and become recognizable.

Constrained estimates from aliased autocorrelations.igures 19 and 20 show estimates produced by Eq. (25)ith Good’s roughness penalty for c=0.06 and 0.01, re-

pectively, when Poisson noise is placed on aliased auto-orrelations. Similarly to the estimates from the un-liased autocorrelations, the penalty leads to smoothstimates for some �. On the other hand, the algorithm inq. (25) is more sensitive to the operation of the penalty.ote that the estimate in Fig. 19(d) is a lot more blurred

ig. 20. Estimates produced by Eq. (25) incorporating Good’soughness penalty given aliased autocorrelations when c=0.01low SNR) and (a) unconstrained, (b) �=0.1, (c) �=0.5, and (d)=1.0.

ig. 21. Estimates produced by Eq. (25) incorporating the TVenalty given aliased autocorrelations when c=0.06 (high SNR)nd (a) unconstrained, (b) �=0.1, (c) �=0.5, (d) �=1.0, (e) �2.0, and (f) �=5.0.

han the estimate in Fig. 15(d), although the noise levelsre similar, and the same regularization parameters arepplied. Since the estimates produced by Eq. (25) are soensitively driven by the penalty effects, it could be diffi-ult to find an appropriate � that can provide enoughmoothing without smearing out the features of the esti-ate. Observe the estimates in Fig. 20. Even though � is

ow (see the �=0.5 case), the hand becomes quite blurrednd starts to lose much of the information about theand’s shape. Obviously, a value of � higher than 0.5mashes most of the features in the estimates.

Figures 21 and 22 show estimates produced with theV penalty from aliased autocorrelations. When the SNR

s high, the TV penalty suppresses the background rough-ess due to noise better than Good’s roughness; the back-round becomes almost entirely smooth, and most of theeatures are well reconstructed [see Fig. 21(c)]. A similarensitivity of the algorithm to Good’s roughness is ob-erved with the TV penalty. When the SNR is low, the al-orithm has difficulty in suppressing noise while preserv-ng the features of estimates such as edges, since a slightncrease in the regularization parameter could turn thestimates into unrecognizable blobs, as seen in Fig. 22.

. Poisson Noise on Squared Fourier Magnitudesigures 23–26 show estimates produced by Eq. (25) incor-orating Good’s roughness and the TV penalties in thease where Poisson noise is placed on the squared Fourieragnitudes. The estimates show behavior that is quite

imilar to the case of aliased autocorrelations in whichoise is manifest directly in the spatial domain.

ig. 22. Estimates produced by Eq. (25) incorporating the TVenalty given aliased autocorrelations when c=0.01 (low SNR)nd (a) unconstrained, (b) �=0.1, (c) �=0.5, (d) �=1.0, (e) �2.0, and (f) �=5.0.


When the SNR is "destructively" low as in Fig. 14, the penalty is helpless no matter what type is used. The un-

Fig. 23. Estimates produced by Eq. (25) incorporating Good's roughness penalty given aliased autocorrelations formed from noisy squared Fourier magnitudes when c=0.000545 (high SNR) and (a) unconstrained, (b) γ=0.001, (c) γ=0.005, (d) γ=0.01, (e) γ=0.02, and (f) γ=0.05.

Fig. 24. Estimates produced by Eq. (25) incorporating Good's roughness penalty given aliased autocorrelations formed from noisy squared Fourier magnitudes when c=0.0002975 (low SNR) and (a) unconstrained, (b) γ=0.001, (c) γ=0.005, (d) γ=0.01, (e) γ=0.02, and (f) γ=0.05.

constrained estimates in Fig. 14 preserve only the edges with large values. A slight amount of smoothing from a penalty entirely blurs out all information in the esti-

Fig. 25. Estimates produced by Eq. (25) incorporating the TV penalty given aliased autocorrelations formed from noisy squared Fourier magnitudes when c=0.000545 (high SNR) and (a) unconstrained, (b) γ=0.001, (c) γ=0.005, (d) γ=0.01, (e) γ=0.02, and (f) γ=0.05.

Fig. 26. Estimates produced by Eq. (25) incorporating the TV penalty given aliased autocorrelations formed from noisy squared Fourier magnitudes when c=0.0002975 (low SNR) and (a) unconstrained, (b) γ=0.001, (c) γ=0.005, (d) γ=0.01, (e) γ=0.02, and (f) γ=0.05.


ates, and no useful features are observable in the con-trained estimates. For brevity, we omit the results ofhese experiments.

. CONCLUSIONSe studied the effect of noise on phase retrieval via mini-ization of Csiszár’s I-divergence in three scenarios ofeasurements corrupted by Poisson noise. One common

rtifact in all scenarios is the roughness of the estimates.dditionally, when noise is placed on the squared Fourieragnitudes and the SNR is “destructively” low, noise isanifest as sinusoidal patterns.Estimate degradation from Poisson noise was quanti-

ed via various error metrics. We observed thresholds, be-ow which noise had only trivial effects, and over whichoise “suddenly” made the estimates extremely poor.We employed the Schulz–Snyder algorithm5 to mini-ize the I-divergence, which was originally inspired by a

ertain EM algorithm.4 To suppress noise artifacts, we in-orporated certain types of constraints via penalties. Ourmplementation adapted Green’s OSL algorithms, basedn the theoretical fact that the Schulz–Snyder algorithms equivalent to an EM algorithm assuming a particularoisson data model.In this paper, we tweaked the Schulz–Snyder algorithm

o make the algorithm usable for aliased autocorrelationss in x-ray crystallography. We also incorporated penal-ies in our tweaked version of the Schulz–Snyder algo-ithm.

Good’s roughness and TV penalties were chosen for re-training the observed noise artifacts, such as roughness.oth penalties provided nice smoothing properties. How-ver, the textures resulting from the two penalities wereuite different, especially in the background. Another im-ortant difference between the two penalties is that theV may preserve edges, while Good’s roughness encour-ges smoothing of edges as well as other regions.Interestingly, it turned out that the penalties haveore sensitive effects on the estimates from aliased auto-

orrelations than from unaliased autocorrelations, noatter whether noise is placed on the squared Fourieragnitudes or the autocorrelations directly.When the SNR is “destructively” low, none of the pen-

lties could improve the estimates; the penalties alwaysed to entirely blurred estimates that do not show anyseful information.Throughout this paper, regularization parameters were

hosen experimentally. Methods such as bootstrappingnd cross validation58–60 may be helpful for automaticallyhoosing the regularization parameter and are a worthyvenue for future exploration. However, our experience inther imaging applications suggests that, in practice, itay often be useful to present the image analyst with

everal estimates created with differing degrees of regu-arization.

Even though incorporating penalties helpfully im-roved estimate quality, there still remains a serioushallenge, namely, convergence of the algorithms to localinima. Avoiding such minima is a vital avenue for fu-

ure work. Also, we started the algorithm from the “truth”o that we could study the effect of noise on the global

inima independent of local minima issues. It would benteresting to study whether noise increases the probabil-ty of converging to an unpleasant local minimum whentarting from a more generic initial estimate.

CKNOWLEDGMENTShis work was supported by startup funds from thechool of Electrical and Computer Engineering at theeorgia Institute of Technology and by the Demetrius T.aris Junior Professorship.

The authors may be reached by e-mail [email protected] and [email protected].

REFERENCES
1. R. P. Millane, "Phase problems for periodic images: effects of support and symmetry," J. Opt. Soc. Am. A 10, 1037–1045 (1993).

2. M. M. Woolfson, An Introduction to X-Ray Crystallography (Cambridge U. Press, 1997).

3. H. Schenk, Direct Methods of Solving Crystal Structures (Plenum, 1991).

4. T. J. Schulz and D. L. Snyder, "Imaging a randomly moving object from quantum-limited data: applications to image recovery from second- and third-order autocorrelations," J. Opt. Soc. Am. A 8, 801–807 (1991).

5. T. J. Schulz and D. L. Snyder, "Image recovery from correlations," J. Opt. Soc. Am. A 9, 1266–1272 (1992).

6. R. P. Millane, "Phase retrieval in crystallography and optics," J. Opt. Soc. Am. A 7, 394–411 (1990).

7. M. H. Hayes, "The reconstruction of a multidimensional sequence from the phase or magnitude of its Fourier transform," IEEE Trans. Acoust., Speech, Signal Process. ASSP-30, 140–154 (1982).

8. J. Miao, "Phase retrieval from the magnitude of the Fourier transforms of nonperiodic objects," J. Opt. Soc. Am. A 15, 1662–1669 (1998).

9. M. H. Pérez-Ilzarbe, "Phase retrieval from the power spectrum of a periodic object," J. Opt. Soc. Am. A 9, 2138–2148 (1992).

10. G. Leone, R. Pierri, and F. Soldovieri, "Reconstruction of complex signals from intensities of Fourier-transform pairs," J. Opt. Soc. Am. A 13, 1546–1556 (1996).

11. R. Pierri, F. Soldovieri, and R. Pierri, "Convergence properties of a quadratic approach to the inverse-scattering problem," J. Opt. Soc. Am. A 19, 2424–2428 (2002).

12. K. Choi, A. D. Lanterman, and R. Raich, "Convergence of the Schulz–Snyder phase retrieval algorithm to local minima," J. Opt. Soc. Am. A 23, 1835–1845 (2006).

13. C. L. Byrne, "Iterative image reconstruction algorithms based on cross-entropy minimization," IEEE Trans. Image Process. 2, 96–103 (1993).

14. C. L. Byrne, "Iterative algorithms for deblurring and deconvolution with constraints," Inverse Probl. 14, 1455–1467 (1998).

15. C. L. Byrne, "A unified treatment of some iterative algorithms in signal processing and image reconstruction," Inverse Probl. 20, 103–120 (2004).

16. R. F. MacKinnon, "Minimum cross-entropy noise reduction in images," J. Opt. Soc. Am. A 6, 739–747 (1989).

17. D. L. Snyder, M. I. Miller, L. J. Thomas, and D. G. Politte, "Noise and edge artifacts in maximum-likelihood reconstructions for emission tomography," IEEE Trans. Med. Imaging MI-6, 228–238 (1987).

18. D. G. Luenberger, Optimization by Vector Space Methods (Wiley, 1968).

19. I. Csiszár, "Why least squares and maximum entropy?—an axiomatic approach to inverse problems," Ann. Stat. 19, 2033–2066 (1991).


0. S. Kullback, Information Theory and Statistics (Wiley,1959).

1. S. Kullback and R. A. Leibler, “On information andsufficiency,” Ann. Math. Stat. 22, 79–86 (1971).

2. Y. Vardi and D. Lee, “From image deblurring to optimalinvestments: maximum likelihood solutions for positivelinear inverse problems,” J. R. Stat. Soc. Ser. B (Stat.Methodol.) 55, 569–612 (1993).

3. D. L. Snyder, T. J. Schulz, and J. A. O’Sullivan,“Deblurring subject to nonnegativity constraints,” IEEETrans. Signal Process. 40, 1143–1150 (1992).

4. M. B. Sherman, J. Brink, and W. Chiu, “Performance of aslow-scan CCD camera for macromolecular imaging in a400 kV electron cryomicroscope,” Micron 27, 129–139(1996).

5. D. L. Snyder, A. M. Hammoud, and R. L. White, “Imagerecovery from data acquired with a charge-coupled-devicecamera,” J. Opt. Soc. Am. A 10, 1014–1023 (1993).

6. D. L. Snyder, C. W. Helstrom, A. D. Lanterman, and R. L.White, “Compensation for read-out noise in charge-coupled-device images,” J. Opt. Soc. Am. A 12, 272–283(1995).

7. I. J. Good and R. A. Gaskins, “Nonparametric roughnesspenalties for probability densities,” Biometrika 58, 255–277(1971).

8. M. I. Miller and B. Roysam, “Bayesian imagereconstruction for emission tomography incorporatingGood’s roughness prior on massively parallel processors,”Proc. Natl. Acad. Sci. USA 88, 3223–3227 (1991).

9. S. Joshi and M. I. Miller, “Maximum a posteriori estimationwith Good’s roughness for optical sectioning microscopy,” J.Opt. Soc. Am. A 10, 1078–1085 (1993).

0. L. Rudin, S. Osher, and E. Fatemi, “Non-linear totalvariation based noise removal algorithms,” Physica D 60,259–268 (1992).

1. R. H. Chan, T. F. Chan, and C. K. Wong, “Cosine transformbased preconditioners for total variation deblurring,” IEEETrans. Image Process. 8, 1472–1478 (1999).

2. T. F. Chan, G. H. Golub, and P. Mulet, “A nonlinear primal-dual method for total variation-based image restoration,”SIAM J. Sci. Comput. (USA) 20, 1964–1977(1999).

3. Y. Li and F. Santosa, “A computational algorithm forminimizing total variation in image restoration,” IEEETrans. Image Process. 5, 987–995 (1996).

4. P. L. Combettes and J. Luo, “An adaptive level set methodfor nondifferentiable constrained image recovery,” IEEETrans. Image Process. 11, 1295–1304 (2002).

5. P. L. Combettes and J. C. Pesquet, “Image restorationsubject to a total variation constraint,” IEEE Trans. ImageProcess. 13, 1213–1222 (2004).

6. E. Jonsson, S. Huang, and T. Chan, “Total variationregularization in positron emission tomography,” UCLAComputational and Applied Mathematics Rep., 98–48(UCLA, 1998).

7. T. F. Chan and C. K. Wong, “Total variation blinddeconvolution,” IEEE Trans. Image Process. 7, 370–375(1998).

8. T. F. Chan and C. K. Wong, “Multichannel imagedeconvolution by total variation regularization,” inAdvanced Signal Processing: Algorithms, Architectures,and Implementations, VII, F. T. Luk, eds., Proc. SPIE 3162,358–366 (1997).

9. T. F. Chan and C. K. Wong, “Convergence of the alternating

minimization algorithm for blind deconvolution,” LinearAlgebr. Appl. 316, 259–286 (2000).

0. P. J. Green, “On use of the EM for penalized likelihoodestimation,” J. R. Stat. Soc. Ser. B (Stat. Methodol.) 52,443–452 (1990).

1. P. J. Green, “Bayesian reconstruction from emissiontomography data using a modified EM algorithm,” IEEETrans. Med. Imaging 9, 84–93 (1990).

42. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21, 2758–2769 (1982).

43. A. L. Patterson, “A Fourier series method for the determination of the components of interatomic distances in crystals,” Phys. Rev. 46, 372–376 (1934).

44. A. L. Patterson, “A direct method for the determination of the components of interatomic distances in crystals,” Z. Kristallogr. A 90, 517–542 (1935).

45. D. Harker, “The application of the three dimensional Patterson method and the crystal structures of proustite, Ag3AsS3, and pyrargyrite, Ag3SbS3,” J. Chem. Phys. 4, 381–390 (1936).

46. D. L. Snyder and M. I. Miller, Random Point Processes in Time and Space (Springer-Verlag, 1991).

47. J. A. O’Sullivan, “Roughness penalties on finite domains,” IEEE Trans. Image Process. 4, 1258–1268 (1995).

48. S. Teboul, L. Blanc-Féraud, G. Aubert, and M. Barlaud, “Variational approach for edge-preserving regularization using coupled PDE’s,” IEEE Trans. Image Process. 7, 387–397 (1998).

49. U. Hermann and D. Noll, “Adaptive image reconstruction using information measures,” SIAM J. Control Optim. 38, 1223–1240 (2000).

50. S. L. Keeling, “Total variation based convex filters for medical imaging,” Appl. Math. Comput. 139, 101–119 (2003).

51. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Stat. Soc. Ser. B (Stat. Methodol.) 39, 1–37 (1977).

52. A. W. McCarthy and M. I. Miller, “Maximum likelihood SPECT in clinical computation times using mesh-connected parallel computers,” IEEE Trans. Med. Imaging 10, 426–436 (1991).

53. J. Besag, “Spatial interaction and the statistical analysis of lattice systems (with discussion),” J. R. Stat. Soc. Ser. B (Stat. Methodol.) 36, 192–236 (1974).

54. J. Besag, “On the statistical analysis of dirty pictures (with discussion),” J. R. Stat. Soc. Ser. B (Stat. Methodol.) 48, 259–302 (1986).

55. H. Lantéri, M. Roche, and C. Aime, “Penalized maximum likelihood image restoration with positivity constraints: multiplicative algorithms,” Inverse Probl. 18, 1397–1419 (2002).

56. J. A. O’Sullivan and D. L. Snyder, “Deterministic EM algorithms with penalties,” in Proceedings of the IEEE International Symposium on Information Theory (IEEE, 1995), p. 177.

57. E. L. Dove, “Signal-to-noise ratio,” http://www.engineering.uiowa.edu/~bme_285/Lecture/SignalNoise.pdf.

58. J. Shao, “Linear model selection by cross-validation,” J. Am. Stat. Assoc. 88, 486–494 (1993).

59. J. Shao and D. Tu, The Jackknife and Bootstrap (Springer, 1995).

60. C. Goutte, “Note on free lunches and cross-validation,” Neural Comput. 9, 1211–1215 (1997).