
Pattern Recognition Letters 41 (2014) 65–72


Infinity Laplacian on graphs with gradient terms for image and data clustering

0167-8655/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.patrec.2013.11.024

This paper has been recommended for acceptance by J.Fco. Martínez-Trinidad. * Corresponding author. Tel.: +33 (0)2 31 45 27 06.

E-mail address: [email protected] (X. Desquesnes).

Sadia Alkama (a), Xavier Desquesnes (b,*), Abderrahim Elmoataz (c)

(a) Department of Automatics, M. Mammeri University, 15100 Tizi-Ouzou, Algeria
(b) Univ. Orléans, PRISME, EA 4229, IUT de l'Indre, 36100 Issoudun, France
(c) Université de Caen Basse-Normandie, GREYC – UMR CNRS 6972, Image Team, ENSICAEN, 6 Bvd Maréchal Juin, 14050 Caen, France

Article info

Article history: Available online 27 December 2013

Keywords: Weighted graphs; PdE; Laplacian; Unsupervised; Classification; Segmentation

Abstract

In this paper, we introduce a new family of graph-based operators for semi-supervised and unsupervised classification. These operators interpolate between two morphological gradient operators introduced on graphs, and are linked with the discrete infinity Laplacian. Then, we consider semi-supervised classification as the Dirichlet problem associated with this new family of operators. We show the proof of existence and uniqueness of the solution of this problem and propose an implementation. Similarly, we consider unsupervised classification as a diffusion problem associated with this new family of operators. We finally illustrate these two approaches on image segmentation and data clustering.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Data clustering, as the unsupervised or semi-supervised grouping of similar patterns into clusters, is a central problem in engineering and the applied sciences, and is currently under active theoretical and practical development and validation. In this paper, we focus on graph-based data clustering methods, which are extensively studied and developed thanks to their simple implementation and good efficiency in a number of different fields, such as signal and image processing, computer vision, computational biology, and machine learning, to name a few.

The main contribution of this paper is to propose new Partial difference Equations (PdEs) based operators on graphs for solving semi-supervised and unsupervised classification problems. This new family of operators is introduced as a convex combination between two upwind discrete gradients on weighted graphs. They can also be interpreted as a combination of the infinity Laplacian and discrete gradient terms on graphs.

Our first motivation is to transcribe the infinity Laplacian equation with gradient terms onto graphs, so as to extend its applications from image processing to machine learning. The infinity Laplacian has found applications in image processing, computer vision, and stochastic games [1,2], but has found only a few applications in machine learning problems. Even if the graph Laplacian and the p-Laplacian for $p < \infty$ have recently been used in machine learning, it is, to our knowledge, the first time that the infinity Laplacian is used in combination with discrete gradients for semi-supervised and unsupervised classification problems.

1.1. Short background

More and more contemporary applications involve data in the form of functions defined on irregular and topologically complicated domains. Typical examples are data defined on manifolds or irregularly-shaped domains, data defined on network-like structures, or data defined as high-dimensional point clouds such as collections of feature vectors. Such data are not organized as familiar digital signals and images sampled on regular lattices. However, they can be conveniently represented as graphs where each vertex represents measured data and each edge represents a relationship (connectivity, affinity, or interaction) between two vertices. Moreover, constructing a graph of patches from a usual signal or image is a richer representation that takes into account both local and nonlocal interactions, and leads to very powerful tools for nonlocal image processing.

Processing and analyzing these types of data is a major challenge for both the image and machine learning communities. Hence, it is very important to transfer to graphs and networks many of the mathematical tools that were initially developed on usual Euclidean spaces. These tools have proven to be efficient for many problems and applications dealing with usual image and signal domains.

Historically, the main tools for data clustering on graphs or networks come from combinatorics and graph theory. For instance,


in the case of semi-supervised segmentation, graph-based approaches have become very popular in recent years. Many graph-based algorithms for image segmentation have been proposed, such as graph cuts [3], random walker [4], shortest paths [5,6], and watershed [7–9]. Recently, these algorithms were all placed into a common framework [10] that allows them to be seen as special cases of a single general semi-supervised algorithm. One can also quote the recent Combinatorial Continuous Maximal Flows [11]. On the other hand, spectral clustering has become one of the most popular modern graph-based clustering algorithms. It was originally derived as a relaxation of NP-hard problems involving graph cuts, such as the normalized cut, the Cheeger cut, expanders, etc. The spectral relaxation leads to an eigenproblem for the graph Laplacian. See [12] and references therein.

However, in recent years there has been an increasing interest in the investigation, on graphs, of two major mathematical tools for signal and image analysis: PDEs and wavelets. For wavelets on graphs, one can quote the works of [13] on diffusion wavelets, Jansen [14] on multiscale methods, or recently Hammond et al. [15] on spectral wavelet transforms. The PDE approach on graphs has started to raise interest in image and manifold processing. It consists in the exploitation of Partial difference Equations (PdEs), or the related Discrete Vector Calculus, to transpose differential operators and PDEs onto graphs [16–21].

The graph Laplacian is a popular tool that is extensively used for semi-supervised or unsupervised learning problems on graphs. In particular, the graph p-Laplacian, a generalization of the standard Laplacian, has started to attract attention from the mathematics, machine learning, and image and manifold processing communities. One can quote the proof of the relationship between the graph p-Laplacian and the Cheeger cut [22], the p-Laplacian regularization for semi-supervised classification [23], or the recent framework for nonlocal regularization on graphs based on the p-Laplacian that unifies image, mesh and manifold processing [19].

1.2. Main contributions

In this paper, we introduce a new family of operators on graphs for semi-supervised and unsupervised classification. These operators interpolate between two morphological gradient operators introduced on graphs, and are linked with the discrete infinity Laplacian. Then, we consider the semi-supervised classification problem and propose a study of the Dirichlet problem associated with this new family of operators to handle it. We show the proof of existence and uniqueness of the solution of this problem and propose an implementation. Similarly, we consider the unsupervised classification problem and propose a study of the diffusion problem associated with this new family of operators to handle it.

1.3. Paper organization

The rest of the paper is organized as follows. In Section 2, we provide the definitions and notations used in this work. In Section 3, we introduce our new family of PdE-based operators on graphs, and present a study of the Dirichlet and diffusion problems associated with this family, including the proof of existence and uniqueness of their solutions. In Section 4, we present different illustrations of our approach for semi-supervised and unsupervised classification problems, involving images and data. Finally, Section 5 concludes the paper.

2. Partial difference operators on weighted graphs and previous works

In this Section, we present the definitions and previous works involved in this paper. For more details, see [21,2].

2.1. Partial difference equations on graphs

First, we recall some basics on weighted graphs.

2.1.1. Notations

Let us consider the general situation where any discrete domain can be viewed as a weighted graph. A weighted graph $G = (V, E, w)$ consists of a finite set $V$ of $N$ vertices and a finite set $E \subseteq V \times V$ of edges. Let $(u, v)$ be the edge that connects vertices $u$ and $v$. An undirected graph is weighted if it is associated with a weight function $w : V \times V \to [0, 1]$. The weight function represents a similarity measure between two vertices of the graph. According to the weight function, the set of edges is defined as $E = \{(u, v) \mid w(u, v) \neq 0\}$. We use the notation $u \sim v$ to denote two adjacent vertices. The degree of a vertex $u$ is defined as $d_w(u) = \sum_{v \sim u} w(u, v)$.

The neighborhood of a vertex $u$ (i.e., the set of vertices adjacent to $u$) is denoted $N(u)$. In this paper, the considered graphs are connected and undirected, with no self-loops and no multiple edges. Let $\mathcal{H}(V)$ be the Hilbert space of real-valued functions on the vertices of the graph. Each function $f : V \to \mathbb{R}$ of $\mathcal{H}(V)$ assigns a real value $f(u)$ to each vertex $u \in V$. Similarly, let $\mathcal{H}(E)$ be the Hilbert space of real-valued functions defined on the edges of the graph. These two spaces are endowed with the following inner products: $\langle f, h \rangle_{\mathcal{H}(V)} = \sum_{u \in V} f(u) h(u)$ with $f, h \in \mathcal{H}(V)$, and $\langle F, H \rangle_{\mathcal{H}(E)} = \sum_{u \in V} \sum_{v \in V} F(u, v) H(u, v)$ where $F, H \in \mathcal{H}(E)$.

Given a function $f : V \to \mathbb{R}$, the $L_p$ norm is given by

$\|f\|_p = \left( \sum_{u \in V} |f(u)|^p \right)^{1/p}, \quad 1 \leq p < \infty,$

$\|f\|_\infty = \max_{u \in V} |f(u)|, \quad p = \infty.$

Let $A$ be a set of connected vertices with $A \subset V$, such that for all $u \in A$ there exists a vertex $v \in A$ with $(u, v) \in E$. We denote by $\partial A$ the boundary set of $A$,

$\partial A = \{u \in A^c : \exists v \in A \text{ with } (u, v) \in E\}, \quad (1)$

where $A^c = V \setminus A$ is the complement of $A$.

2.1.2. Differences and gradient operators

We recall several definitions of difference operators on weighted graphs, used to define derivatives and morphological operators on graphs. More details on these operators can be found in [19,24].

The gradient or difference operator of a function $f \in \mathcal{H}(V)$, noted $G_w : \mathcal{H}(V) \to \mathcal{H}(E)$, is defined on an edge $(u, v) \in E$ by:

$(G_w f)(u, v) \stackrel{def}{=} \gamma(w(u, v)) (f(v) - f(u)), \quad (2)$

where $\gamma : \mathbb{R}^+ \to \mathbb{R}^+$ depends on the weight function (in the sequel we denote $\gamma(w(u, v))$ by $\gamma_{uv}$). This gradient operator is linear and antisymmetric.

The adjoint of the difference operator, noted $G_w^* : \mathcal{H}(E) \to \mathcal{H}(V)$, is a linear operator defined by $\langle G_w f, H \rangle_{\mathcal{H}(E)} = \langle f, G_w^* H \rangle_{\mathcal{H}(V)}$ for all $f \in \mathcal{H}(V)$ and all $H \in \mathcal{H}(E)$. Using the definitions of the difference operator and of the inner products in $\mathcal{H}(V)$ and $\mathcal{H}(E)$, the adjoint operator $G_w^*$ of a function $H \in \mathcal{H}(E)$ can be expressed at a vertex $u \in V$ as:

$(G_w^* H)(u) \stackrel{def}{=} \sum_{v \sim u} \gamma_{uv} (H(v, u) - H(u, v)). \quad (3)$

The divergence operator, defined by $D_w = -G_w^*$, measures the net outflow of a function of $\mathcal{H}(E)$ at each vertex of the graph. Each function $H \in \mathcal{H}(E)$ has a null divergence over the entire set of vertices. From the previous definitions, it can easily be shown that $\sum_{u \in V} \sum_{v \in V} (G_w f)(u, v) = 0$ for $f \in \mathcal{H}(V)$, and $\sum_{u \in V} (D_w F)(u) = 0$ for $F \in \mathcal{H}(E)$.

Based on the previous definitions, we can define two upwind gradients $G_w^\pm : \mathcal{H}(V) \to \mathcal{H}(E)$, expressed as

$(G_w^\pm f)(u, v) \stackrel{def}{=} \gamma_{uv} (f(v) - f(u))^\pm, \quad (4)$

with the notation $(x)^+ = \max(0, x)$ and $(x)^- = -\min(0, x)$.

We define the directional derivative of a function $f \in \mathcal{H}(V)$, noted $(\partial_v f)$, as

$(\partial_v f)(u) \stackrel{def}{=} \gamma_{uv} (f(v) - f(u)). \quad (5)$

We also introduce two morphological directional partial derivative operators (external and internal), respectively defined as

$(\partial_v^\pm f)(u) \stackrel{def}{=} ((\partial_v f)(u))^\pm. \quad (6)$

The discrete weighted gradient of a function $f \in \mathcal{H}(V)$, noted $\nabla_w f : \mathcal{H}(V) \to \mathbb{R}^{|V|}$, is defined on a vertex $u \in V$ as the vector of all partial derivatives with respect to the set of edges $(u, v) \in E$:

$(\nabla_w f)(u) \stackrel{def}{=} \left( (\partial_v f)(u) \right)^T_{v \in V}. \quad (7)$

Similarly, discrete upwind weighted gradients are defined as

$(\nabla_w^\pm f)(u) \stackrel{def}{=} \left( (\partial_v^\pm f)(u) \right)^T_{v \in V}. \quad (8)$

The $L_p$ norms, $1 \leq p < \infty$, of these gradients, $\|\nabla_w f\|_p$ and $\|\nabla_w^\pm f\|_p$, allow us to define the notion of the regularity of a function around a vertex. They are expressed as:

$\|(\nabla_w^\pm f)(u)\|_p = \left[ \sum_{v \in V} \gamma_{uv}^p \left( (f(v) - f(u))^\pm \right)^p \right]^{1/p}. \quad (9)$

Similarly, the $L_\infty$ norm of these gradients is expressed as:

$\|(\nabla_w^\pm f)(u)\|_\infty = \max_{v \in V} \left( \gamma_{uv} \, |(f(v) - f(u))^\pm| \right). \quad (10)$

They can be used to construct several regularization functionals on graphs. One can remark that all these definitions can be applied to graphs of any topology.

2.2. Our previous works on PdEs

In this Section, we first recall our previous works on the expression of the p-Laplacian on graphs for $1 \leq p < \infty$ [19,23], as well as on PdE-based mathematical morphology.

2.2.1. p-Laplacian and infinity Laplacian

In our previous works, we focused on the expression of the p-Laplacian on weighted graphs with $1 \leq p < \infty$. We consider the anisotropic p-Laplace operator of a function $f \in \mathcal{H}(V)$, noted $\Delta_{w,p} : \mathcal{H}(V) \to \mathcal{H}(V)$, defined by:

$(\Delta_{w,p} f)(u) = \frac{1}{2} \left( D_w \left( |G_w f|^{p-2} (G_w f) \right) \right)(u). \quad (11)$

The anisotropic p-Laplace operator of $f \in \mathcal{H}(V)$ at a vertex $u \in V$ can be computed by [25]:

$(\Delta_{w,p} f)(u) = \sum_{v \sim u} (\phi_{w,p} f)(u, v) \, (f(v) - f(u)), \quad (12)$

with

$(\phi_{w,p} f)(u, v) = \gamma_{uv}^p \, |f(v) - f(u)|^{p-2}. \quad (13)$

This operator is nonlinear if $p \neq 2$; when $p = 2$, it corresponds to the combinatorial graph Laplacian. To avoid a zero denominator in (12) when $p \leq 1$, $|f(v) - f(u)|$ is replaced by $|f(v) - f(u)|_\epsilon = |f(v) - f(u)| + \epsilon$, where $\epsilon \to 0$ is a small fixed constant.

In order to simplify the notations, we will now refer to the anisotropic p-Laplacian as the p-Laplacian. Depending on the definition of $\gamma_{uv}$, we will consider the two following p-Laplacians:

$\gamma_{uv} = \sqrt{w(u, v)} \;\to\; \Delta^U_{w,p} : \text{unnormalized } p\text{-Laplacian}, \quad (14)$

$\gamma_{uv} = \sqrt{\frac{w(u, v)}{d_w(u)}} \;\to\; \Delta_{w,p} : \text{normalized } p\text{-Laplacian}. \quad (15)$

As an example, in the case where $p = 2$, we have:

$(\Delta^U_{w,2} f)(u) = \sum_{v \sim u} w(u, v) (f(v) - f(u)), \quad (16)$

$(\Delta_{w,2} f)(u) = \frac{1}{d_w(u)} \sum_{v \sim u} w(u, v) f(v) - f(u). \quad (17)$

The infinity Laplacian of a function $f \in \mathcal{H}(V)$ is defined as

$(\Delta_{w,\infty} f)(u) \stackrel{def}{=} \frac{1}{2} \left( \|(\nabla_w^+ f)(u)\|_\infty - \|(\nabla_w^- f)(u)\|_\infty \right). \quad (18)$
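To make Eq. (18) concrete, here is a small sketch of ours (not the paper's implementation), assuming unit weights, i.e. $\gamma_{uv} = 1$. In that setting, whenever $u$ has neighbors both above and below it, the operator reduces to the familiar form $\frac{1}{2}(\max_{v \sim u} f(v) + \min_{v \sim u} f(v)) - f(u)$.

```python
def inf_laplacian(adj, f, u):
    """Discrete infinity Laplacian of Eq. (18), assuming gamma_uv = 1."""
    # ||grad+ f(u)||_inf : largest positive difference toward a neighbor
    gp = max((max(0.0, f[v] - f[u]) for v in adj[u]), default=0.0)
    # ||grad- f(u)||_inf : largest negative difference, made positive
    gm = max((-min(0.0, f[v] - f[u]) for v in adj[u]), default=0.0)
    return 0.5 * (gp - gm)
```

A zero value indicates that $f(u)$ sits halfway between its extreme neighbors, which is the discrete analogue of infinity-harmonicity.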

2.2.2. PdE-based morphology on graphs

In [24,21], using the expression of the weighted morphological gradients (8) based on PdEs, we have defined the discrete analogue of the continuous PDE-based dilation and erosion formulations. Continuous-scale morphology [26] defines the flat dilation $\delta$ and erosion $\epsilon$ of a function $f^0 : \mathbb{R}^m \to \mathbb{R}$ by using structuring sets $B = \{x : \|x\|_p \leq 1\}$ with the following general PDEs [27]:

$\partial_t f = +\|\nabla f\|_p \quad \text{and} \quad \partial_t f = -\|\nabla f\|_p, \quad (19)$

where $f$ is a modified version of $f^0$, $\nabla$ is the gradient operator, $\|\cdot\|_p$ corresponds to the $L_p$ norm, and one has the initial condition $f = f^0$ at $t = 0$. We have proposed in [24] the discrete PdE analogue of these PDE-based dilation and erosion formulations and obtained the following expressions over graphs. For a given initial function $f^0 \in \mathcal{H}(V)$:

$\partial_t^\delta f(u) = +\|(\nabla_w^+ f)(u)\|_p \quad \text{and} \quad \partial_t^\epsilon f(u) = -\|(\nabla_w^- f)(u)\|_p, \quad \forall u \in V, \quad (20)$

where $\nabla_w^+$ and $\nabla_w^-$ are the weighted pseudo-morphological gradients (8). (We use the term morphological by analogy with the continuous case.) The relation between this morphological framework and adaptive morphological methods proposed in the literature, such as amoebas [28] or PDE-based viscous morphology [29], has been discussed in [21].
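The dilation/erosion PdEs (20) can be integrated with a simple explicit Euler step. The sketch below is our own reading, not the authors' implementation: unit weights, the $L_\infty$ norm, and the step size `dt` are illustrative assumptions.

```python
def morpho_step(adj, f, dt=0.5, dilate=True):
    """One Euler step of df/dt = +||grad+ f||_inf (dilation)
    or df/dt = -||grad- f||_inf (erosion), Eq. (20), unit weights."""
    out = {}
    for u, nbrs in adj.items():
        if dilate:
            g = max((max(0.0, f[v] - f[u]) for v in nbrs), default=0.0)
            out[u] = f[u] + dt * g      # grow toward the highest neighbor
        else:
            g = max((-min(0.0, f[v] - f[u]) for v in nbrs), default=0.0)
            out[u] = f[u] - dt * g      # shrink toward the lowest neighbor
    return out
```

Iterating the dilation step propagates high values outward along edges, which is the graph analogue of a flat structuring-element dilation.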

3. A family of PdE-based operators for graph-based classification

Considering our framework of PdEs, with the discretization of the p-Laplacian and of the infinity Laplacian on a general graph domain, we now propose a new family of PdE-based operators for graph-based classification.

3.1. Definition

Let $G = (V, E, w)$ be a weighted graph and $f : V \to \mathbb{R}$ a function. Our new family of operators is given by the following combination of two morphological gradients, and can be expressed using our framework of discrete gradients as

$(\Delta_{w,\alpha,\beta} f)(u) = \alpha(u) \|(\nabla_w^+ f)(u)\|_\infty - \beta(u) \|(\nabla_w^- f)(u)\|_\infty, \quad (21)$

where $\alpha : V \to [0, 1]$, $\beta : V \to [0, 1]$, and $\alpha(u) + \beta(u) = 1$.

This family can also be expressed in terms of the infinity Laplacian:

$(\Delta_{w,\alpha,\beta} f)(u) = 2 \min(\alpha(u), \beta(u)) \, (\Delta_{w,\infty} f)(u) + (\alpha(u) - \beta(u))^+ \|(\nabla_w^+ f)(u)\|_\infty - (\alpha(u) - \beta(u))^- \|(\nabla_w^- f)(u)\|_\infty. \quad (22)$

Moreover, this family is a convex linear combination between the infinity Laplacian and either the external gradient ($\nabla_w^+$) or the internal gradient ($\nabla_w^-$), according to the values of $\alpha(u)$ and $\beta(u)$. This family is thus a family of locally adaptive operators that depends on the function $f$ and on the position (the vertex $u$).

For example:

In the case where $\alpha(u) = \beta(u) = 0.5$, Eq. (21) becomes

$\Delta_{w,\alpha,\beta} f = 0, \quad (23)$

which recovers the discrete version of the continuous infinity Laplacian operator $\Delta_\infty = 0$; see [30].

In the case where

$\alpha(u) = \sum_{v \sim u,\, f(v) \geq f(u)} \frac{w_{uv}}{d(u)} \quad \text{and} \quad \beta(u) = \sum_{v \sim u,\, f(v) < f(u)} \frac{w_{uv}}{d(u)}, \quad (24)$

the sign of $\alpha(u) - \beta(u)$ (and hence the expression of our operator) depends on the mean similarity between $u$ and the vertices with $f(v) \geq f(u)$, respectively the vertices with $f(v) < f(u)$. If $\alpha(u) > \beta(u)$, the similarity is strongest with the neighbors for which $f(v) \geq f(u)$, and the operator locally describes a diffusion process combined with a dilation process. On the contrary, if $\alpha(u) < \beta(u)$, the similarity is strongest with the neighbors for which $f(v) < f(u)$, and the operator locally describes a diffusion process combined with an erosion process. This operator corresponds to a combination of a diffusion and a shock filter.
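To make Eq. (21) with the adaptive weights of Eq. (24) concrete, here is a small sketch of our own (unit-weight $\gamma_{uv}$ is an assumption; ties $f(v) = f(u)$ are counted in $\alpha$, as written in Eq. (24)):

```python
def family_op(adj, f, u):
    """Evaluate Eq. (21) at u with the data-dependent alpha/beta of Eq. (24)."""
    d = sum(adj[u].values())
    alpha = sum(w for v, w in adj[u].items() if f[v] >= f[u]) / d  # Eq. (24)
    beta = 1.0 - alpha                 # weight mass of neighbors below f(u)
    gp = max((max(0.0, f[v] - f[u]) for v in adj[u]), default=0.0)   # ||grad+||_inf
    gm = max((-min(0.0, f[v] - f[u]) for v in adj[u]), default=0.0)  # ||grad-||_inf
    return alpha * gp - beta * gm
```

When a vertex is more similar to its higher-valued neighbors, the positive (dilation-like) term dominates, and conversely for lower-valued neighbors.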

3.2. Semi-supervised classification problem

Many tasks in image processing, computer vision and machine learning can be formulated as interpolation problems. Image and video colorization, inpainting, and semi-supervised segmentation or clustering are examples of such interpolation problems. Interpolating data consists in constructing new values (a function $f$) for missing data, in coherence with a set of known data (or initial value function $g$).

Fig. 1. Comparison with the state of the art of graph-based semi-supervised segmentation methods. (a) Initial seeds. (b) Ground truth. (c) Power watershed. (d) Random Walker. (e) Our approach. See text for more details.

In this Section, we consider the semi-supervised classification problem as an interpolation problem, where the function to interpolate is a label function $f$ that assigns a class to every vertex. In this case, considering two classes $A$ and $B$, the initial value function $g$ is defined as follows:

$g(u) = -1$ if $u \in A$, $\quad g(u) = 1$ if $u \in B$, $\quad g(u) = 0$ otherwise. $\quad (25)$

At convergence, the class membership can easily be computed by a simple threshold on the sign of $f$.

Remark 1. In the case of more than two classes, multi-class segmentation can be performed by several segmentations of one class versus the others.

For this semi-supervised classification problem, we propose to solve the Dirichlet problem associated with our newly introduced family. We present here a study of this problem, as well as a proof of the existence and uniqueness of its solution.

Broadly speaking, the Dirichlet problem is a boundary value problem of the following type: find a function $f \in \mathcal{H}(V)$ such that $(\Delta_{w,\alpha,\beta} f)(u) = 0$ for all $u \in V$. Let $G = (V, E, w)$ be a weighted and connected graph, $A \subset V$ a set of vertices, and $g : \partial A \to \mathbb{R}$ a function defined on the boundary of $A$. We consider the following equation, which describes the Dirichlet problem associated with our newly introduced Laplacian operator.


$(\Delta_{w,\alpha,\beta} f)(u) = 0, \quad u \in A,$
$f(u) = g(u), \quad u \in \partial A. \quad (26)$

Lemma 1. $\Delta_{w,\alpha,\beta} f(u) = 0$ implies that $f$ has no regional maxima.

Proof. By contradiction, suppose $v_0 \in A$ is a regional maximum of $f$. Then, if $F \subseteq A$ is a connected subset of vertices with $v_0 \in F$, we have $f(v_0) = \max_F f > f(u)$, $\forall u \in \partial F$. Let $H = \{u \in F, f(u) = f(v_0)\}$; then $\forall u \in \partial H$, $\max_{v \sim u} (f(u) - f(v))^+ = 0 < \max_{v \sim u} (f(u) - f(v))^-$, i.e., $\alpha(u) \|(\nabla_w^+ f)(u)\|_\infty < \beta(u) \|(\nabla_w^- f)(u)\|_\infty$, which is equivalent to $(\Delta_{w,\alpha,\beta} f)(u) < 0$. This provides a contradiction. Therefore, $\Delta_{w,\alpha,\beta} f(u) = 0$ implies that $f$ has no regional maxima. □

Theorem 1. Given a graph $G = (V, E, w)$, a set $A \subset V$, and a function $g : \partial A \to \mathbb{R}$ where $\partial A$ is the boundary of $A$, there exists a unique function $f \in \mathcal{H}(V)$ that verifies the following equation:

$\alpha(u) \|(\nabla_w^+ f)(u)\|_\infty - \beta(u) \|(\nabla_w^- f)(u)\|_\infty = 0, \quad u \in A,$
$f(u) = g(u), \quad u \in \partial A. \quad (27)$

First, let us prove the uniqueness of the solution.

Proof. This proof is based on the contradiction principle, using the property shown in Lemma 1, namely that $\Delta_{w,\alpha,\beta} f = 0$ implies that $f$ has no regional maxima.

Suppose we have two functions $f$ and $h$ with $f \leq h$ on $\partial A$. We want to show that this implies $f \leq h$. By contradiction, suppose that $M = \max(f - h) > 0$ on $A$. Let $H = \{u \in V, f(u) - h(u) = M\}$ and $F = \{u \in H, f(u) = \max_H f\}$. If $u \in H$, then $(f - h)$ reaches its maximum at $u$. This implies that $\alpha(u) \|(\nabla_w^- f)(u)\|_\infty \geq \alpha(u) \|(\nabla_w^- h)(u)\|_\infty$ and $\beta(u) \|(\nabla_w^+ f)(u)\|_\infty \leq \beta(u) \|(\nabla_w^+ h)(u)\|_\infty$. By hypothesis, we have $(\Delta_{w,\infty} f)(u) = (\Delta_{w,\infty} h)(u)$; therefore, $\forall v \in H$:

$\|(\nabla_w^- f)(u)\|_\infty = \|(\nabla_w^- h)(u)\|_\infty \quad \text{and} \quad \|(\nabla_w^+ f)(u)\|_\infty = \|(\nabla_w^+ h)(u)\|_\infty. \quad (28)$

Now let us show that the set $F$ is a regional maximum. By contradiction, we can choose $v_0 \in \partial^+ F$ such that $f(v_0) \geq \max_F f$ for $u \in F$, $v_0 \sim u$. Then, it follows that $(\nabla_{w,\infty}^+ f)(u) = w_{uv_0} (f(v_0) - f(u))$. Since $v_0 \notin H$, we must have $f(v_0) - h(v_0) < f(u) - h(u)$. Thus $f(v_0) - f(u) < h(v_0) - h(u)$, and so $(\nabla_{w,\infty}^+ h)(u) > (\nabla_{w,\infty}^+ f)(u)$, contradicting (28). □

Now, let us prove existence.

Proof. For this demonstration, we consider the following nonlocal averaging operator:

$(NLA\, f)(u) = f(u) + \alpha(u) \|(\nabla_w^+ f)(u)\|_\infty - \beta(u) \|(\nabla_w^- f)(u)\|_\infty. \quad (29)$

First, we recall the Brouwer fixed point theorem: a continuous function from a convex, compact subset of a Euclidean space to itself has a fixed point.

Then, we identify $\mathcal{H}(V)$ with $\mathbb{R}^n$ and consider the set $K = \{f \in \mathcal{H}(V) \mid f(u) = g(u)\ \forall u \in \partial A, \text{ and } m \leq f(u) \leq M\ \forall u \in A\}$, where $m = \min_{\partial A} g(u)$ and $M = \max_{\partial A} g(u)$. By definition, $K$ is a convex and compact subset of $\mathbb{R}^n$.

It is easy to show that the map $f \to NLA(f)$ is continuous and maps $K$ to $K$. So, by the Brouwer fixed point theorem, the map $NLA$ has a fixed point, which is a solution of $NLA(f) = f$. This completes the proof. □

Fig. 2. Comparative segmentation evaluation on Microsoft's Grabcut database. See text for details.

This proof leads to the following simple digital algorithm:

$f^0(u) = g(u) \quad \forall u \in V,$
$f^n(u) = f^{n-1}(u) + (\Delta_{w,\alpha,\beta} f^{n-1})(u) \quad \forall u \in V. \quad (30)$

3.3. Unsupervised classification problem

A convenient way to perform unsupervised classification consists in placing random seeds on the data to cluster, and then performing a diffusion in order to propagate the seeds along the graph according to the similarity criterion.

In this Section, we consider the unsupervised classification problem as a diffusion problem involving our newly introduced family of discrete operators:

$\frac{\partial f}{\partial t} = (\Delta_{w,\alpha,\beta}) f,$
$f(u, t = 0) = \chi_A(u) - \chi_{A^c}(u), \quad (31)$

where $A$ is a randomly chosen subset of $V$ and $\chi_A$ is the characteristic function of the set $A$. Similarly to the semi-supervised problem, this equation leads to the following simple digital algorithm:

$f^0(u) = 1 \quad \forall u \in A,$
$f^0(u) = -1 \quad \forall u \in B = V \setminus A,$
$f^n(u) = f^{n-1}(u) + (\Delta_{w,\alpha,\beta} f^{n-1})(u) \quad \forall u \in V. \quad (32)$

It is easy to show that this iterative algorithm converges to a solution of the PdE $(\Delta_{w,\alpha,\beta} f) = 0$.

4. Illustrations

In this section, we consider two concrete illustrations of the applicability of our newly introduced operator to image segmentation and data clustering, using both semi-supervised and unsupervised approaches. In each example, we consider both images and data as particular graphs, with specific topologies.

In both cases, the task consists in clustering the set of vertices into two classes $A$ and $B$, from a set of seeds (chosen by the user or randomly placed) and our newly introduced operator. The $\alpha$ and $\beta$ functions are data dependent and hold a similarity measure between the value function at a vertex $u$ and the mean of the value function over the initial sets $A$ and $B$.

Our goal here is not to provide a detailed comparison with all state-of-the-art approaches, but to illustrate the behavior of our approach and show its potential.

4.1. Graph construction

There exist several popular methods to transform discrete data $\{x_1, \ldots, x_n\}$ into a weighted graph structure. Considering a set of vertices $V$ such that the data are embedded by functions of $\mathcal{H}(V)$, the construction of such a graph consists in modeling the neighborhood relationships between the data through the definition of a set of edges $E$, using a pairwise distance measure $\mu : V \times V \to \mathbb{R}^+$. In the particular case of images, constructions based on geometric neighborhoods are particularly well adapted to represent the geometry


Fig. 3. Semi-supervised data clustering. (a) User seeds. (b) Classification result. See text for more details.

Fig. 4. Unsupervised data clustering. (a) Initial configuration (random). (b) Classification result. See text for more details.


of the space, as well as the geometry of the function defined on that space. One can quote:

- Grid graphs, which are the most natural structures to describe an image with a graph. Each pixel is connected by an edge to its adjacent pixels. Classical grid graphs are 4-adjacency and 8-adjacency grid graphs. Larger adjacencies can be used to obtain nonlocal graphs.
- Region adjacency graphs (RAG), which provide a very useful and common way of describing the structure of a picture: vertices represent regions and edges represent the region adjacency relationship.
- k-nearest-neighbor graphs (k-NNG), where each vertex $v_i$ is connected with its $k$ nearest neighbors according to $\mu$. Such a construction implies building a directed graph, as the neighborhood relationship is not symmetric. Nevertheless, an undirected graph can be obtained by adding an edge between two vertices $v_i$ and $v_j$ if $v_i$ is among the $k$ nearest neighbors of $v_j$, or if $v_j$ is among the $k$ nearest neighbors of $v_i$.
- k-extended RAG (k-ERAG), which are RAGs extended by a k-NNG. Each vertex is connected to its adjacent region vertices and to its $k$ most similar vertices of $V$.

The similarity between two vertices is computed according to a similarity measure g : E → R+, which satisfies:

\[
w(u,v) = \begin{cases} g(u,v) & \text{if } (u,v) \in E,\\ 0 & \text{otherwise.} \end{cases}
\]

Usual similarity functions are as follows:

\[
g_0(u,v) = 1, \qquad g_1(u,v) = \exp\!\big(-\mu(f^0(u), f^0(v))/\sigma^2\big) \quad \text{with } \sigma > 0,
\]

where σ depends on the variation of the function μ and controls the similarity scale.
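A minimal sketch of the weight function w built from the similarity g_1, assuming μ is the squared Euclidean distance between feature vectors (the actual choice of μ depends on the data):

```python
import math

def g1(fu, fv, sigma):
    """Gaussian similarity g1(u, v) = exp(-mu(f(u), f(v)) / sigma^2),
    with mu taken here as the squared Euclidean distance (an assumption)."""
    mu = sum((a - b) ** 2 for a, b in zip(fu, fv))
    return math.exp(-mu / sigma ** 2)

def weight(u, v, edges, features, sigma):
    """w(u, v) = g1(u, v) if (u, v) is an edge of E, 0 otherwise."""
    if (u, v) in edges or (v, u) in edges:
        return g1(features[u], features[v], sigma)
    return 0.0
```

Identical feature vectors give the maximal weight 1, and the weight decays with the feature distance at a rate controlled by σ.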

Several choices can be considered for the expression of the feature vectors, depending on the nature of the features to be used for the graph processing. In the context of image processing, one can quote the simplest gray-scale or color feature vector F_u, or the patch feature vector F^s_u = \bigcup_{v \in W_s(u)} F_v (i.e., the set of values F_v where v is in a square window W_s(u) of size (2s+1) × (2s+1) centered at a vertex pixel u), in order to incorporate nonlocal features.
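As a sketch, such a patch feature vector can be extracted as follows. The edge-replication border handling is an assumption for illustration, since the treatment of window pixels outside the image is not specified here.

```python
import numpy as np

def patch_feature(img, u, s):
    """Patch feature F^s_u: the (2s+1) x (2s+1) window of values
    centered at pixel u = (i, j), flattened into one vector.
    Pixels falling outside the image are filled by edge replication
    (an illustrative assumption)."""
    i, j = u
    padded = np.pad(img, s, mode='edge')
    # after padding by s, pixel (i, j) sits at (i + s, j + s)
    return padded[i:i + 2 * s + 1, j:j + 2 * s + 1].ravel()
```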

4.2. Semi-supervised classification problem

In this subsection, we present the application of our method to semi-supervised classification. It consists in providing a classification of data into two clusters A and B, given a few initial user seeds. The clustering is performed by interpolating the label function f over the graph according to algorithm (30).

Image segmentation. Fig. 1 presents a qualitative comparison between our approach and state-of-the-art graph-based semi-supervised image segmentation approaches. The comparison is performed on the Grabcut segmentation database with the original seeds. Results are presented for Power Watershed (using Couprie's implementation), Random Walker, and our approach with the following adaptive α and β functions:


\[
\alpha(u) = \frac{\displaystyle\sum_{v \sim u,\; f(v) > f(u)} w_{uv} \;+\; \sum_{v \sim u,\; f(v) = f(u)} \tfrac{1}{2}\, w_{uv}}{d(u)}
\quad \text{and} \quad \beta(u) = 1 - \alpha(u). \tag{33}
\]

With this definition, the diffusion is associated with a dilation process if the vertex u has a strong similarity with the positive label, and with an erosion process if vertex u has a strong similarity with the negative label. Finally, the graph built for the Random Walker and our approach is a nonlocal graph (each pixel is linked with every pixel in a 3 × 3 window centered on it) but without patches (each pixel is characterized by its own color vector).
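The adaptive coefficients of Eq. (33) can be sketched as follows, with the graph represented by plain dictionaries (an illustrative choice; d(u) is taken here to be the weighted degree of u):

```python
def alpha_beta(u, f, w, neighbors, d):
    """Adaptive coefficients of Eq. (33): alpha(u) sums the weights of
    neighbors whose label value exceeds f(u), plus half the weight of
    ties, normalized by the degree d(u); beta(u) = 1 - alpha(u)."""
    s = 0.0
    for v in neighbors[u]:
        if f[v] > f[u]:
            s += w[(u, v)]
        elif f[v] == f[u]:
            s += 0.5 * w[(u, v)]
    a = s / d[u]
    return a, 1.0 - a
```

When most of the weighted neighborhood carries a larger label, α(u) approaches 1 and the diffusion behaves like a dilation; symmetrically, a small α(u) favors erosion.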

A comparative evaluation of our approach on the whole Grabcut database is presented in Fig. 2. The segmentation is based on the Grabcut set of seeds, which are rather close to object boundaries. That is why every presented approach provides good results, even on sharp and textured images. The evaluation uses two standard measures: the Dice coefficient and the Probabilistic Rand Index [31]. Our approach's efficiency is very close to that of the other approaches and slightly better (for every image, our approach has a slightly better score than the two other approaches).
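The Dice coefficient used in this evaluation is a standard overlap measure; a minimal sketch, with masks given as sets of foreground pixel indices:

```python
def dice(seg, ref):
    """Dice coefficient D = 2 |A ∩ B| / (|A| + |B|) between a
    segmentation and a reference mask, each a set of foreground
    pixel indices; equals 1.0 for a perfect match."""
    return 2.0 * len(seg & ref) / (len(seg) + len(ref))
```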

Data clustering. Fig. 3 presents an illustration of semi-supervised data clustering using our approach and the same adaptive α and β functions as for image segmentation. Labels are given by the red and blue boxes; the data are a set of 200 handwritten digits represented by feature vectors of R^{28×28} and organized as a k-nearest-neighbors graph. With our approach, 100% of the '1' digits and 92% of the '0' digits are correctly recognized.

4.3. Unsupervised classification problem

In this section, we consider the unsupervised classification problem, without any a priori knowledge of the data. Using our approach, it consists in providing a classification of the data into two clusters A and B, but without user seeds. The sets A and B are simply and randomly initialized from the set V such that A ∪ B = V and A ∩ B = ∅ (see Fig. 4(a)). The clustering is performed using the diffusion process (32), with the following α and β functions, which mimic the speed term of the Chan and Vese active contour model [32]:

\[
\alpha(u,t) = \frac{D\big(\mu_A(t), F(u)\big)}{D\big(\mu_A(t), F(u)\big) + D\big(\mu_B(t), F(u)\big)}
\quad \text{and} \quad \beta(u,t) = 1 - \alpha(u,t), \tag{34}
\]

where D is the Euclidean distance and μ_A(t) (respectively μ_B(t)) is the mean feature vector of the vertices inside A (respectively B) at time t.
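The coefficients of Eq. (34) can be sketched directly from this definition; feature vectors and class means are plain coordinate tuples here for illustration:

```python
import math

def alpha_beta_cv(Fu, mu_A, mu_B):
    """Coefficients of Eq. (34): alpha(u, t) compares the Euclidean
    distances D from the feature vector F(u) to the current class
    means mu_A(t) and mu_B(t); beta(u, t) = 1 - alpha(u, t)."""
    dA = math.dist(Fu, mu_A)  # D(mu_A(t), F(u))
    dB = math.dist(Fu, mu_B)  # D(mu_B(t), F(u))
    a = dA / (dA + dB)
    return a, 1.0 - a
```

A vertex whose features lie close to the mean of A gets a small α (and a large β), pushing the diffusion to keep it in A; the symmetric behavior holds for B.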

At steady state, thanks to the α and β functions, which encode a membership criterion for each of the classes A and B, the set of vertices is split into two homogeneous classes, where the intra-class similarity tends to a maximum and the inter-class similarity tends to a minimum.

Data clustering. Fig. 4 presents an illustration of unsupervised data clustering using our approach. The two resulting classes are shown by the red and blue boxes; the data are a set of handwritten digits represented by feature vectors of R^{28×28} and organized as a k-nearest-neighbors graph. With our approach, 98% of the digits are correctly recognized.

5. Conclusion

In this paper, we have introduced a new family of operators on graphs for semi-supervised and unsupervised classification. These operators interpolate between two morphological gradient operators previously introduced on graphs, and are linked with the discrete infinity Laplacian. We have considered the semi-supervised classification problem as the Dirichlet problem associated with this new family of operators, proved the existence and uniqueness of its solution, and proposed an implementation. Similarly, we have considered the unsupervised classification problem as a diffusion problem associated with this family of operators. Finally, we have shown the behavior and potential of this new family of operators for semi-supervised and unsupervised classification of images and data.

References

[1] Y. Peres, O. Schramm, S. Sheffield, D.B. Wilson, Tug-of-war and the infinity Laplacian, J. Am. Math. Soc. 22 (2009) 167–210.

[2] A. Elmoataz, X. Desquesnes, O. Lézoray, Non-local morphological PDEs and p-Laplacian equation on graphs with applications in image processing and machine learning, IEEE J. Sel. Top. Sign. Process. (2012).

[3] Y. Boykov, M.-P. Jolly, Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images, in: Proc. ICCV, vol. 1, 2001, pp. 105–112.

[4] L. Grady, Random walks for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 1768–1783.

[5] X. Bai, G. Sapiro, Geodesic matting: a framework for fast interactive image and video segmentation and matting, Int. J. Comput. Vision 82 (2009) 113–132.

[6] A.X. Falcão, J. Stolfi, R. de Alencar Lotufo, The image foresting transform: theory, algorithms, and applications, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 19–29.

[7] L. Vincent, P. Soille, Watersheds in digital spaces: an efficient algorithm based on immersion simulations, IEEE Trans. Pattern Anal. Mach. Intell. 13 (1991) 583–598.

[8] G. Bertrand, On topological watersheds, J. Math. Imaging Vision 22 (2–3) (2005) 217–230.

[9] J. Cousty, G. Bertrand, L. Najman, M. Couprie, Watershed cuts: minimum spanning forests and the drop of water principle, IEEE Trans. Pattern Anal. Mach. Intell. 31 (8) (2009) 1362–1374.

[10] C. Couprie, L. Grady, L. Najman, H. Talbot, Power watershed: a unifying graph-based optimization framework, IEEE Trans. Pattern Anal. Mach. Intell. 33 (7) (2011) 1384–1399.

[11] C. Couprie, L. Grady, H. Talbot, L. Najman, Combinatorial continuous maximum flow, SIAM J. Imaging Sci. 4 (3) (2011) 905–930.

[12] U. von Luxburg, A tutorial on spectral clustering, Stat. Comput. 17 (2007) 395–416.

[13] R.R. Coifman, S. Lafon, Diffusion maps, Appl. Comput. Harmon. Anal. 21 (1) (2006) 5–30.

[14] M.H. Jansen, P.J. Oonincx, Second Generation Wavelets and Applications, Springer, 2005.

[15] D.K. Hammond, P. Vandergheynst, R. Gribonval, Wavelets on graphs via spectral graph theory, Appl. Comput. Harmon. Anal. 30 (2) (2011) 129–150.

[16] W. Schwalm, B. Moritz, M. Giota, M. Schwalm, Vector difference calculus for physical lattice models, Phys. Rev. E 59 (1999) 1217–1233.

[17] P. McDonald, R. Meyers, Nonlinear elliptic partial difference equations on graphs, Trans. Am. Math. Soc. 354 (12) (2002) 5111–5136.

[18] E. Bendito, A. Carmona, A. Encinas, Difference schemes on uniform grids performed by general discrete operators, Appl. Numer. Math. 50 (2004) 343–370.

[19] A. Elmoataz, O. Lézoray, S. Bougleux, Nonlocal discrete regularization on weighted graphs: a framework for image and manifold processing, IEEE Trans. Image Process. 17 (7) (2008) 1047–1060.

[20] L. Grady, J.R. Polimeni, Discrete Calculus: Applied Analysis on Graphs for Computational Science, Springer, 2010.

[21] V.-T. Ta, A. Elmoataz, O. Lézoray, Nonlocal PDEs-based morphology on weighted graphs for image and data processing, IEEE Trans. Image Process. 20 (6) (2011) 1504–1516.

[22] M. Hein, T. Bühler, An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA, in: NIPS, 2010, pp. 847–855.

[23] S. Bougleux, A. Elmoataz, M. Melkemi, Local and nonlocal discrete regularization on weighted graphs for image and mesh processing, Int. J. Comput. Vision 84 (2009) 220–236.

[24] V.-T. Ta, A. Elmoataz, O. Lézoray, Partial difference equations over graphs: morphological processing of arbitrary discrete data, in: Proc. ECCV, LNCS 5304, 2008, pp. 668–680.

[25] A. Elmoataz, O. Lézoray, S. Bougleux, V.-T. Ta, Unifying local and nonlocal processing with partial difference operators on weighted graphs, in: International Workshop on Local and Non-Local Approximation in Image Processing (LNLA), 2008, pp. 11–26.

[26] G. Sapiro, R. Kimmel, D. Shaked, B.B. Kimia, A.M. Bruckstein, Implementing continuous-scale morphology via curve evolution, Pattern Recognit. 26 (9) (1993) 1363–1372.

[27] R.W. Brockett, P. Maragos, Evolution equations for continuous-scale morphology, in: Proc. ICASSP, vol. 3, 1992, pp. 125–128.

[28] R. Lerallut, E. Decencière, F. Meyer, Image filtering using morphological amoebas, Image Vision Comput. 25 (2007) 395–404.

[29] P. Maragos, C. Vachier, A PDE formulation for viscous morphological operators with extensions to intensity-adaptive operators, in: ICIP, 2008, pp. 2200–2202.


[30] A.M. Oberman, A convergent difference scheme for the infinity Laplacian: construction of absolutely minimizing Lipschitz extensions, Math. Comput. 74 (2005) 1217–1230.

[31] R. Unnikrishnan, C. Pantofaru, M. Hebert, Toward objective evaluation of image segmentation algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 29 (6) (2007) 929–944, http://dx.doi.org/10.1109/TPAMI.2007.1046.

[32] T.F. Chan, S. Osher, J. Shen, The digital TV filter and nonlinear denoising, IEEE Trans. Image Process. 10 (2001) 231–241.