
Image Processing: Mathematics

G Aubert, Université de Nice Sophia Antipolis, Nice, France
P Kornprobst, INRIA, Sophia Antipolis, France

© 2006 Elsevier Ltd. All rights reserved.

Our society is often designated as being an "information society." It could also be defined as an "image society." This is not only because the image is a powerful and widely used medium of communication, but also because it is an easy, compact, and widespread way to represent the physical world. If we think about it, it is indeed striking to realize just how omnipresent images are in our lives, through numerous applications such as medical and satellite imaging, videosurveillance, cinema, robotics, etc.

Many approaches have been developed to process these digital images, and it is difficult to say which one is more natural than the others. Image processing has a long history. Maybe the oldest methods come from 1D signal processing techniques. They rely on filter theory (linear or not), on spectral analysis, or on some basic concepts of probability and statistics. For an overview, we refer the interested reader to the book by Gonzalez and Woods (1992).

In this article, some recent mathematical concepts will be revisited and illustrated by the image restoration problem, which is presented below. We first discuss stochastic modeling, which is widely based on Markov random field theory and deals directly with digital images. This is followed by a discussion of variational approaches, where the general idea is to define some cost functions in a continuous setting. Next we show how scale space theory is connected with partial differential equations (PDEs). Finally, we present wavelet theory, which is inherited from signal processing and relies on decomposition techniques.

Introduction

As in the real world, a digital image is composed of a wide variety of structures. Figure 1 shows different kinds of "textures," progressive or sharp contours, and fine objects.
This gives an idea of the complexity of finding an approach that can cope with the different structures at the same time. It also highlights the discrete nature of images, which will be handled differently depending on the chosen mathematical tools. For instance, PDE-based approaches are written in a continuous setting, referring to analog images, and once the existence and the uniqueness of the solution have been proved, we need to discretize them in order to find a numerical solution. On the contrary, stochastic approaches directly consider discrete images in the modeling of the cost functions.

The Image Restoration Problem

It is well known that images deteriorate during formation, transmission, and recording processes. Classically, this degradation is the result of two phenomena. The first one is deterministic and is related to the image acquisition modality and to possible defects of the imaging system (e.g., blur created by an incorrect lens adjustment or by motion). The second phenomenon is random and corresponds to the noise coming from any signal transmission. It can also come from image quantization. It is important to choose a degradation model as close as possible to reality. The random noise is usually modeled by a probability distribution. In many cases, a Gaussian distribution is assumed. However, some applications require more specific ones, like the gamma distribution for radar images (speckle noise) or the Poisson distribution for tomography. Unfortunately, it is usually impossible to identify the kind of noise involved for a given real image.

A commonly used model is the following. Let u : Ω ⊂ R² → R be an original image describing a real scene, and let f be the observed image of the same scene (i.e., a degradation of u). We assume that

f = Au + η  [1]

where η stands for white additive Gaussian noise and A is a linear operator representing the blur (usually a convolution). Given f, the problem is then to reconstruct u knowing [1].

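The degradation model [1] can be simulated numerically. The following sketch is our own illustration, not taken from the article: the kernel size, the noise level, and the use of FFT-based circular convolution for the blur operator A are all illustrative assumptions.

```python
import numpy as np

# Synthetic "original" image u: a bright square on a dark background.
u = np.zeros((64, 64))
u[24:40, 24:40] = 1.0

# A: blur modeled as convolution with a small uniform kernel,
# applied via FFT (circular boundary conditions, for simplicity).
k = np.zeros((64, 64))
k[:3, :3] = 1.0 / 9.0
Au = np.real(np.fft.ifft2(np.fft.fft2(u) * np.fft.fft2(k)))

# eta: white additive Gaussian noise with standard deviation sigma,
# so that f = Au + eta as in [1].
rng = np.random.default_rng(0)
sigma = 0.05
f = Au + sigma * rng.standard_normal(u.shape)
```

In practice, of course, A and sigma are unknown or only partially known, which is precisely what makes the inverse problem difficult.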
This problem is ill-posed, and we are able to compute only an approximation of u. In this article, we will focus on the simplified model of pure denoising:

f = u + η  [2]

The Probabilistic Approach

The Bayesian Framework

In this section, we show how the problem of pure denoising, that is, recovering u from the equation f = u + η knowing only some statistical information on η, can be solved by using a probabilistic approach. In this context, f, u, and η are considered as random variables. The general idea for recovering u is to maximize some prior probability. Most models involve two parts: a prior model of possible restored images u and a data model expressing consistency with the observed data.

The prior model is given by a probability space (Ω_u, p), where Ω_u is the set of all values of u. The model is specified by giving the probability p(u) of all these values.

The data model is a larger probability space (Ω_{u,f}, p), where Ω_{u,f} is the set of all possible values of u and all possible values of the observed image f. This model is completed by giving the conditional probability p(f|u) of any image f given u, resulting in the joint probabilities p(f, u) = p(f|u)p(u). Implicitly, we assume that the spaces Ω_u and Ω_{u,f} are finite, although huge.

The next step is to use a Bayesian approach, introduced in image processing by Besag (1974) and Geman and Geman (1984). The probabilities p(u) and p(f|u) are supposed to be known and, given an observed image f, we seek the image u which maximizes the conditional a posteriori probability p(u|f) (MAP: maximum a posteriori). Thanks to the Bayes rule, we have

p(u|f) = p(f|u)p(u) / p(f)  [3]

Let us explain the meaning of the different terms in [3]:

The term p(f|u) expresses the probability, the likelihood, that an image u is realized in f. It also quantifies the lack of total precision of the model and the presence of noise.
The term p(u) expresses our incomplete a priori information about the ideal image u (it is the probability of the model, i.e., the propensity that u be realized independently of the observation f).

The term p(f), which is the probability to observe f, is a constant and does not play any role when maximizing the conditional probability p(u|f) with respect to u.

Let us remark that the problem max_u p(u|f) is equivalent to min_u E(u) = −log p(f|u) − log p(u). So Bayesian models lead to a minimization process.

Then the main question is how to assign these probabilities. The easiest probability to determine is p(f|u). If the images u and f consist of sets of values u = (u_{i,j}), i, j = 1, ..., N and f = (f_{i,j}), i, j = 1, ..., N, we suppose the conditional independence of (f_{i,j}|u_{i,j}) at any pixel:

p(f|u) = ∏_{i,j=1}^{N} p(f_{i,j}|u_{i,j})

and if the restoration model is of the form f = u + η, where η is a white Gaussian noise with variance σ², then

p(f_{i,j}|u_{i,j}) = (1/(√(2π)σ)) exp(−(f_{i,j} − u_{i,j})²/(2σ²))

and

p(f|u) = (1/(√(2π)σ))^{N²} exp(−Σ_{i,j} (f_{i,j} − u_{i,j})²/(2σ²))

Therefore, at this stage, the MAP reduces to minimizing

E(u) = K_σ ||f − u||² − log p(u)  [4]

where ||·|| stands for the Euclidean norm on R^{N²} and K_σ is a constant. So, it remains now to assign a probability law p(u). To do that, the most common way is to use the theory of Markov random fields (MRFs).

Figure 1 Digital image example. The close-ups show examples of (a) low resolution, (b) low contrasts, (c) graduated shadings, (d) sharp transitions, and (e) fine elements.

The Theory of Markov Random Fields

In this approach, an image is described as a finite set S of sites corresponding to the pixels. For each site, we associate a descriptor representing the state of the site, for example, its gray level.
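Before turning to neighborhoods, the reduction of the MAP criterion to the energy [4] can be checked numerically. The following sketch (our own, with illustrative sizes and values) evaluates the Gaussian negative log-likelihood pixelwise and verifies that it equals the data term K_σ ||f − u||², with K_σ = 1/(2σ²), up to an additive constant independent of u:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.1
u = rng.random((8, 8))                           # candidate restored image
f = u + sigma * rng.standard_normal(u.shape)     # noisy observation, f = u + eta

# Negative log-likelihood of the Gaussian data model, summed over pixels:
# -log p(f|u) = N^2 * log(sqrt(2*pi)*sigma) + ||f - u||^2 / (2*sigma^2)
n_pix = u.size
neg_log_lik = n_pix * np.log(np.sqrt(2 * np.pi) * sigma) \
              + np.sum((f - u) ** 2) / (2 * sigma ** 2)

# Data term of the energy [4], with K_sigma = 1 / (2*sigma^2).
data_term = np.sum((f - u) ** 2) / (2 * sigma ** 2)
```

Since the first term does not depend on u, minimizing −log p(f|u) over u is the same as minimizing the data term alone; the prior −log p(u) is what remains to be specified.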
In order to take into account local interactions between sites, one needs to endow S with a system of neighborhoods V.

Definition 1 For each site s, we define its neighborhood V(s) as

V(s) = {t ∈ S; s ∉ V(s) and t ∈ V(s) ⇒ s ∈ V(t)}

Then we associate to this neighborhood system the notion of clique: a clique is either a singleton or a set of sites which are all neighbors of each other. Depending on the neighborhood system, the family of cliques will be different and will involve more or fewer sites. We will denote by C the set of all the cliques relative to a neighborhood system V (see Figure 2).

Before introducing the general framework of MRFs, let us define some notations. For a site s, X_s will stand for a random variable taking its values in some set E (e.g., E = {0, 1, ..., 255}), x_s will be a realization of X_s, and x^s = (x_t)_{t≠s} will denote an image configuration where site s has been removed. Finally, we will denote by X the random variable X = (X_s, X_t, ...) with values in Ω = E^{|S|}.

Definition 2 We say that X is an MRF if the local conditional probability at a site s is only a function of V(s), that is,

p(X_s = x_s | X^s = x^s) = p(X_s = x_s | x_t, t ∈ V(s))

Therefore, the gray level at a site depends only on the gray levels of neighboring pixels.
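The definitions above can be made concrete for the standard 4-neighbor system on a rectangular grid. The following sketch is our own illustration (the function names are ours) of a neighborhood system and of its cliques of order 2:

```python
def four_neighbors(s, height, width):
    """Neighborhood V(s) of pixel s = (i, j) under the 4-neighbor system.
    Note that s is not in V(s), and t in V(s) implies s in V(t) (symmetry)."""
    i, j = s
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for (a, b) in candidates if 0 <= a < height and 0 <= b < width]

def order2_cliques(height, width):
    """Cliques {s, t} of size 2: unordered pairs of mutually neighboring
    sites, each listed exactly once."""
    cliques = []
    for i in range(height):
        for j in range(width):
            for t in four_neighbors((i, j), height, width):
                if (i, j) < t:          # count each unordered pair once
                    cliques.append(((i, j), t))
    return cliques
```

On an N × M grid, the 4-neighbor system yields N(M − 1) + M(N − 1) cliques of order 2 (for example, 12 on a 3 × 3 grid); richer neighborhood systems produce larger cliques.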
Now we give the following fundamental theorem, due to Hammersley and Clifford (Besag 1974), which states the equivalence between MRFs and Gibbs fields.

Theorem 1 Let us suppose that S is finite, E is a discrete set, and for all x ∈ Ω = E^{|S|}, p(X = x) > 0. Then X is an MRF relative to a system of neighborhoods V if and only if there exists a family of potential functions (V_c)_{c∈C} such that

p(x) = (1/Z) exp(−Σ_{c∈C} V_c(x))

The function V(x) = Σ_{c∈C} V_c(x) is called the energy potential or the Gibbs measure, and Z is a normalizing constant: Z = Σ_{x∈Ω} exp(−V(x)).

If, for example, the collection of neighborhoods is the set of 4-neighbors, then the theorem says that

V(x) = Σ_{c={s}∈C1} V_c(x_s) + Σ_{c={s,t}∈C2} V_c(x_s, x_t)

Figure 2 Examples of neighborhood system and cliques.

Application to the Denoising Problem

Now, given this theorem, we can reformulate, thanks to [4], the restoration problem (with the change of notation u = x and u_s = x_s): find u minimizing the global energy

E(u) = K_σ ||f − u||² + V(u)  [5]

The next step is now to make precise the Gibbs measure. In restoration, the potential V(u) is often designed to impose local regularity constraints, for example, by penalizing differences between neighbors. This can be modeled using cliques of order 2 in the following manner:

V(u) = Σ_{(s,t)∈C2} φ(u_s − u_t)

where φ is a given real function. This term penalizes the differences of intensities between neighbors, which may come from an edge or from noise. This discrete cost function is very similar to the gradient penalty terms in the continuous framework (see the next section). The resulting final energy is (sometimes E(u) is written E(u|f))

E(u) = K_σ Σ_{s∈S} (f_s − u_s)² + μ Σ_{(s,t)∈C2} φ(u_s − u_t)

where the constant μ is a weighting parameter which can be estimated.

The difficulty in choosing the strength of the penalty term defined by φ is to be able to penalize the noise while keeping the most salient features, that is, edges. Historically, the function φ was first chosen as φ(z) = z², but this choice is not good since the resulting regularization is too strong, introducing blur in the image and a loss of the edges. A better choice is φ(z) = |z| (Rudin et al. 1992) or a regularized version of this function. Of course, other choices are possible depending on the considered application and the desired degree of smoothness.

In this section, it has been shown how to model the restoration problem through MRFs and the Bayesian framework. Numerically, two main types of algorithms can be used to minimize the energy: deterministic algorithms and stochastic algorithms. The former are generally used when the global energy is strictly convex (e.g., algorithms based on gradient descent). The latter are rather used when E(u) is not convex; these stochastic minimization algorithms are mainly based on simulated annealing. Their main interest is that they always converge (almost surely) to a minimizer (this is not the case for deterministic algorithms, which give only local minimizers), but they are often strongly time consuming.

We refer the reader to Li (1995) for more details about MRFs and the Bayesian framework, and to Kirkpatrick et al. (1983) for more information on stochastic algorithms.

The Variational Approach

Minimizing a Cost Function over a Functional Space

One important issue in the previous section was the definition of p(u), which gives some a priori information on the solution. In the variational approach, this idea is also present, but the way to infer it is in fact to define the most suitable functional space that describes images and their geometrical properties. The choice of a functional space sets a norm which in turn will constrain the solution to a certain smoothness.

We illustrate this idea in this section on the denoising problem [2], which can be seen as a decomposition problem. This means that given the observation f, we look for u and η such that f = u + η, where η incorporates all oscillations, that is, noise, and also texture.
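As a concrete illustration of the deterministic (gradient descent) option for the discrete energies above, here is a minimal sketch of our own, not the authors' implementation: it minimizes an energy of the form E(u) = ||f − u||² + μ Σ φ(u_s − u_t) over 4-neighbor cliques, with the regularized absolute value φ(z) = √(z² + ε²); all parameter values are illustrative.

```python
import numpy as np

def denoise_gradient_descent(f, mu=0.1, eps=0.1, step=0.1, n_iter=200):
    """Minimize E(u) = ||f - u||^2 + mu * sum phi(u_s - u_t) over
    horizontal/vertical neighbor pairs, with the regularized absolute
    value phi(z) = sqrt(z^2 + eps^2), by plain gradient descent."""
    u = f.copy()
    for _ in range(n_iter):
        # Differences with neighbors (forward differences).
        dx = np.diff(u, axis=1)                    # u[i, j+1] - u[i, j]
        dy = np.diff(u, axis=0)                    # u[i+1, j] - u[i, j]
        phix = dx / np.sqrt(dx ** 2 + eps ** 2)    # phi'(dx)
        phiy = dy / np.sqrt(dy ** 2 + eps ** 2)    # phi'(dy)
        # Accumulate the gradient of the penalty term back onto the grid:
        # each pair (s, t) contributes -phi' to s and +phi' to t.
        grad_reg = np.zeros_like(u)
        grad_reg[:, :-1] -= phix
        grad_reg[:, 1:] += phix
        grad_reg[:-1, :] -= phiy
        grad_reg[1:, :] += phiy
        grad = 2.0 * (u - f) + mu * grad_reg
        u -= step * grad
    return u
```

The energy is strictly convex here (a quadratic data term plus a smooth convex penalty), so gradient descent with a small enough step converges to the global minimizer; for the nonconvex potentials mentioned above, one would switch to a stochastic scheme such as simulated annealing.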
Let us define a functional to be minimized which takes into account the data f and possibly some statistical information about η:

min_{u,η} φ(|u|_E) such that ψ(|η|_G) = σ, with f = u + η  [6]

This formulation means that we look, among all decompositions f = u + η, for the one which minimizes φ(|u|_E) under the constraint ψ(|η|_G) = σ. The Banach spaces E and G, and the functions φ and ψ, will be discussed in the next subsection. Since a minimization problem under constraints can be expressed with an additional term weighted by a Lagrange multiplier, the formulation [6] can be rewritten as

min_{u,η} φ(|u|_E) + λ ψ(|η|_G);  f = u + η  [7]

A similar formulation consists in replacing η by f − u, so that [7] becomes

min_u φ(|u|_E) + λ ψ(|f − u|_G)  [8]

which is the classical formulation in image restoration. From a numerical point of view, the minimization is usually carried out by solving the associated Euler equations, but this may be a difficult task. The main concern is the search for E and G and their norms (or seminorms). It is guided by the idea that an image u is composed of various geometric structures (homogeneous regions, edges), while η = f − u represents oscillations (noise and textures).

Examples of Functional Spaces

In this section, we revisit some possible choices of functional spaces, summarized in Table 1.

Table 1 Examples of functional spaces and their norms (see model [8])

Model  E and |u|_E                           φ(t)  G and |η|_G                                            ψ(t)
(a)    H¹(Ω), |u|_E = (∫_Ω |∇u|² dx)^(1/2)   t²    L²(Ω) with its usual norm                              t²
(b)    BV(Ω), |u|_E = ∫_Ω |Du|               t     L²(Ω) with its usual norm                              t²
(c)    BV(Ω), |u|_E = ∫_Ω |Du|               t     {v ∈ L²(Ω); v = div ξ, |ξ|_{L∞(Ω)²} ≤ 1, ξ·N|_∂Ω = 0}  t

The first case (a) was inspired by the classical Tikhonov regularization. The functional space H¹(Ω) (Ω ⊂ R²) is the space of functions in L²(Ω) such that the distributional gradient Du is in L²(Ω). Unfortunately, functions in H¹(Ω) do not admit discontinuities across curves, and this is a major problem with respect to image analysis, since images are made of smooth patches separated by sharp variations.

Considering the problem reported in (a), Rudin et al. (1992) proposed to work on BV(Ω), the space of functions of bounded variation (BV) (Ambrosio et al. 2000), defined by

BV(Ω) = {u ∈ L¹(Ω); ∫_Ω |Du| < ∞}

with

∫_Ω |Du| = sup { ∫_Ω u div(ξ) dx; ξ = (ξ₁, ξ₂, ..., ξ_N) ∈ C¹₀(Ω)^N, |ξ|_{L∞(Ω)} ≤ 1 }  [9]

It is equivalent to define BV(Ω) as the space of L¹(Ω) functions whose distributional gradient Du is a bounded measure, and [9] is its total variation. The space BV(Ω) has some interesting properties:

1. lower semicontinuity of the total variation ∫_Ω |Du| with respect to the L¹(Ω) topology;

2. if u ∈ BV(Ω), we can define, for H¹-almost every x ∈ S_u (the complement of the Lebesgue points, i.e., the jump set of u), a normal n_u(x) and two approximate right and left limits u⁺(x) and u⁻(x); and

3. Du can be decomposed as a sum of a regular measure, a jump measure, and a Cantor measure:

Du = ∇u dx + (u⁺ − u⁻) n_u H¹|_{S_u} + C_u

where ∇u is the approximate gradient and H¹ is the one-dimensional Hausdorff measure.

This ability to describe functions with discontinuities across a hypersurface S_u makes BV(Ω) very convenient to describe images with edges. In this context, the image restoration problem is well posed, and suitable numerical tools can be proposed (Chambolle and Lions 1997).

One criticism of the model (b) in Table 1, pointed out by Meyer (2001), is that if f is a characteristic function and if f is sufficiently small with respect to a suitable norm, then the model (Rudin et al. 1992) gives u = 0 and η = f, contrary to what one should expect (u = f and η = 0). In fact, the main reason for this phenomenon is that the L²-norm for the η component is not the right one, since very oscillating functions can have large L²-norms (e.g., f_n(x) = cos(nx)). To better describe such oscillating functions, Meyer (2001) introduced the space of functions which can be expressed as the divergence of L∞-fields. This work was developed in R^N, and this framework was adapted to bounded 2D domains by Aubert and Aujol (2005) (see (c) in Table 1). An example of image decomposition is shown in Figure 3.

Figure 3 Example of image decomposition (see Aubert and Aujol (2005)).

In this section, we have shown how the choice of the functional spaces is closely related to the definition of a variational formulation. The functionals are written in a continuous setting, and they can usually be minimized by solving the discretized Euler equations iteratively until convergence. These PDEs and the differential operators are constrained by the energy definition, but it is also possible to work directly on the equations, forgetting the formal link with the energy. Such an approach has also been much developed in the computer vision community, and it is illustrated in the next section.

We refer the reader to Aubert and Kornprobst (2002) for a general review of variational approaches and PDEs as applied to image analysis.

Scale Spaces and PDEs

Another approach to perform nonlinear filtering is to define a family of image smoothing operators T_t depending on a scale parameter t. Given an image f(x), we can define the image u(t, x) = (T_t f)(x), which corresponds to the image f analyzed at scale t. In this section, following Alvarez, Guichard, Lions, and Morel (Alvarez et al. 1993), we show that u(t, x) is the solution of a PDE, provided some suitable assumptions on T_t hold.

Basic Principles of a Scale Space

This section describes some natural assumptions to be fulfilled by scale spaces. We first assume that the output at scale t + h can be computed from the output at scale t for very small h. This is natural, since a coarser-scale view of the original picture is likely to be deduced from a finer one. T_t is obtained by composition of transition filters, denoted by T_{t+h,t}. So the first axiom is

(A1) T_{t+h} = T_{t+h,t} T_t,  T_0 = Id

Another assumption is that operators act locally, that is, (T_{t+h,t} f)(x) depends essentially upon the values of f(y) with y in a small neighborhood of x. Taking into account the fact that, as the scale increases, no new feature should be created by the scale space, we have the local comparison principle: if an image u is locally brighter than another image v, then this order must be conserved by the analysis. This is expressed by:

(A2) For all u and v such that u(y) ≤ v(y) in a neighborhood of x and y ≠ x, then for h small enough, we have

(T_{t+h,t} u)(x) ≤ (T_{t+h,t} v)(x)

The third assumption states that a very smooth image must evolve in a smooth way with the scale space. Denoting the scalar product of two vectors of R^N by
Such anapproach has also been much developed in thecomputer vision community and it is illustrated inthe next section.We refer the reader to Aubert and Kornprobst(2002) for a general review of variationalapproaches and PDEs as applied to image analysis.Scale Spaces and PDEsAnother approach to perform nonlinear filteringis to define a family of image smoothing operatorsTt, depending on a scale parameter t. Given animage f (x), we can define the image u(t, x) =(Ttf )(x)which corresponds to the image f analyzed at scale t.In this section, following AlvarezGuichardLionsMorel (Alvarez et al. 1993), we show that u(t, x)is the solution of a PDE provided some suitableassumptions on Tt.Basic Principles of a Scale SpaceThis section describes some natural assumptions tobe fulfilled by scale spaces. We first assume that theoutput at scale t can be computed from the output ata scale t h for very small h. This is natural, since acoarser scale view of the original picture is likely tobe deduced from a finer one. Tt is obtained bycomposition of transition filters, denoted by Tth, t.So the first axiom is(A1) Tth=Tth, tTt T0=IdAnother assumption is that operators act locally,that is, (Tth, tf )(x) depends essentially upon thevalues of f (y) with y in a small neighborhood of x.Taking into account the fact that as the scaleincreases, no new feature should be created by thescale space, we have the local comparison principle:if an image u is locally brighter than another imagev, then this order must be conserved by the analysis.This is expressed by:(A2) For all u and v such that u(y) v(y) in aneighborhood of x and y 6 x, then for h smallenough, we haveTth.tux ! Tth.tvxThe third assumption states that a very smoothimage must evolve in a smooth way with the scaleOriginal u Figure 3 Example of image decomposition (see Aubert andAujol (2005)).Image Processing: Mathematics 5space. Denoting the scalar product of two vectors ofRNby