a combinatorial solution to non-rigid 3d shape-to-image matching

10
A Combinatorial Solution to Non-Rigid 3D Shape-to-Image Matching Florian Bernard 1,2,3 Frank R. Schmidt 3 Johan Thunberg 2 Daniel Cremers 3 1 Centre Hospitalier de Luxembourg, Luxembourg 2 Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg 3 Technical University of Munich (TUM), Germany (a) template (3D surface mesh) (d) matched template (b) data term (per-triangle) (c) smoothness term (neighbouring triangles) per-triangle rigid-body transformations 3D image (low cost) 0 i 2 SE(3) E ij (i ,⌧ j ) E i (i ) triangle neighbourhood low cost high cost 0 i ,⌧ 0 j i,⌧j Fi triangle i 2 SE(3) (high cost) i(Fi) 0 i (Fi) E()= f X i=1 E i (i )+ X (i,j)2E E ij (i ,⌧ j ) Figure 1. 3D Shape-to-Image Matching. (a) Given a shape as a triangular mesh, we associate to each triangle Fi its rigid transformation τi SE(3) such that the sum of data and smoothness terms is minimised. (b) The data term Ei (τi ) measures how well the transformed triangle τi (Fi ) fits into the volumetric image. (c) The smoothness term Eij (τi j ) penalises the discrepancy between transformed triangles τi (Fi ) and τj (Fj ). (d) Optimising E provides us with a shape-to-image matching. Abstract We propose a combinatorial solution for the problem of non-rigidly matching a 3D shape to 3D image data. To this end, we model the shape as a triangular mesh and al- low each triangle of this mesh to be rigidly transformed to achieve a suitable matching to the image. By penalising the distance and the relative rotation between neighbouring triangles our matching compromises between the image and the shape information. In this paper, we resolve two major challenges: Firstly, we address the resulting large and NP- hard combinatorial problem with a suitable graph-theoretic approach. Secondly, we propose an efficient discretisation of the unbounded 6-dimensional Lie group SE(3). To our knowledge this is the first combinatorial formulation for non-rigid 3D shape-to-image matching. In contrast to ex- isting local (gradient descent) optimisation methods, we ob- tain solutions that do not require a good initialisation and that are within a bound of the optimal solution. We evalu- ate the proposed combinatorial method on the two problems of non-rigid 3D shape-to-shape and non-rigid 3D shape-to- image registration and demonstrate that it provides promis- ing results. 1. Introduction Matching a shape template to an image is a well stud- ied problem in computer vision and image analysis. It gives rise to a wide range of applications, including image seg- mentation and object detection. An early approach for the detection of lines and parametrised curves in images is the voting-based Hough transform [13], which was later gener- alised to the detection of arbitrary shapes [1]. Whilst the Hough transform considers rigid shapes, the utilisation of shape information in image segmentation tasks has also been addressed in the non-rigid case, including methods based on active shape models [9], level sets [11], convex shape spaces [12], multiphase graph cuts [54], or statistical shape models [18, 59]. For the reconstruction of the shape of an object from a single 2D image, shape- arXiv:1611.05241v1 [cs.CV] 16 Nov 2016

Upload: vuonganh

Post on 31-Dec-2016

217 views

Category:

Documents


2 download

TRANSCRIPT

A Combinatorial Solution to Non-Rigid 3D Shape-to-Image Matching

Florian Bernard1,2,3 Frank R. Schmidt3 Johan Thunberg2 Daniel Cremers3

1Centre Hospitalier de Luxembourg, Luxembourg2Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg

3Technical University of Munich (TUM), Germany

(a) template (3D surface mesh)

(d) matched template (b) data term (per-triangle) (c) smoothness term (neighbouring triangles)

per-trianglerigid-body

transformations

3D image

(low cost) 0i 2 SE(3)

Eij(i, j)Ei(i)

triangle neighbourhood

low cost

high cost

0i , 0j

i, j

Fitriangle i 2 SE(3)

(high cost)i(Fi) 0i(Fi)

E( ) =

fX

i=1

Ei(i) +X

(i,j)2EEij(i, j)

Figure 1. 3D Shape-to-Image Matching. (a) Given a shape as a triangular mesh, we associate to each triangle Fi its rigid transformationτi ∈ SE(3) such that the sum of data and smoothness terms is minimised. (b) The data term Ei(τi) measures how well the transformedtriangle τi(Fi) fits into the volumetric image. (c) The smoothness termEij(τi, τj) penalises the discrepancy between transformed trianglesτi(Fi) and τj(Fj). (d) Optimising E provides us with a shape-to-image matching.

Abstract

We propose a combinatorial solution for the problem ofnon-rigidly matching a 3D shape to 3D image data. Tothis end, we model the shape as a triangular mesh and al-low each triangle of this mesh to be rigidly transformed toachieve a suitable matching to the image. By penalisingthe distance and the relative rotation between neighbouringtriangles our matching compromises between the image andthe shape information. In this paper, we resolve two majorchallenges: Firstly, we address the resulting large and NP-hard combinatorial problem with a suitable graph-theoreticapproach. Secondly, we propose an efficient discretisationof the unbounded 6-dimensional Lie group SE(3). To ourknowledge this is the first combinatorial formulation fornon-rigid 3D shape-to-image matching. In contrast to ex-isting local (gradient descent) optimisation methods, we ob-tain solutions that do not require a good initialisation andthat are within a bound of the optimal solution. We evalu-ate the proposed combinatorial method on the two problems

of non-rigid 3D shape-to-shape and non-rigid 3D shape-to-image registration and demonstrate that it provides promis-ing results.

1. IntroductionMatching a shape template to an image is a well stud-

ied problem in computer vision and image analysis. It givesrise to a wide range of applications, including image seg-mentation and object detection. An early approach for thedetection of lines and parametrised curves in images is thevoting-based Hough transform [13], which was later gener-alised to the detection of arbitrary shapes [1].

Whilst the Hough transform considers rigid shapes, theutilisation of shape information in image segmentation taskshas also been addressed in the non-rigid case, includingmethods based on active shape models [9], level sets [11],convex shape spaces [12], multiphase graph cuts [54], orstatistical shape models [18, 59]. For the reconstructionof the shape of an object from a single 2D image, shape-

arX

iv:1

611.

0524

1v1

[cs

.CV

] 1

6 N

ov 2

016

from-template approaches aim to match a given 3D tem-plate to the image via a 3D-to-2D projection [46, 35, 42].Several authors have considered combinatorial formulationsof the non-rigid shape-to-image matching problem for cer-tain classes of shapes. For the case of matching contours[10, 49], or 2D chordal graph polygons [15], the resultingoptimisation problems can be solved globally. However, ageneralisation of these methods to 3D shapes is non-trivialand we are not aware of previous work addressing this issue.The purpose of this work is to fill this gap by presenting acombinatorial formulation for the non-rigid matching of a3D shape template to a 3D image. For that, we model theshape as a triangular mesh and allow each triangle Fi of thismesh to be independently transformed via a rigid transfor-mation τi ∈ SE(3). Using a discretisation of the unbounded6-dimensional Lie group SE(3), we formulate the matchingtask as a manifold-valued multi-labelling problem that canbe cast as minimising the energy

E(τ ) =

n∑

i=1

Ei(τi) +∑

(i,j)∈EEij(τi, τj). (1)

Here, the data term Ei(τi) takes the image information intoaccount, while the smoothness term Eij(τi, τj) measuresthe dissimilarity between the observed shape and the mod-elled shape prior. By penalising the distance and the rela-tive rotation of neighbouring triangles our matching com-promises between the image and the shape information. Ingeneral, minimising functions of the form in (1) is NP-hard[25].

1.1. Related Work

To the best of our knowledge the present paper is thefirst one that considers a combinatorial formulation of thenon-rigid 3D shape to 3D image matching problem. In thefollowing we will summarise methodologies that are mostrelevant to our work.

Continuous Optimisation: In many scenarios it is nat-ural to assume that image or shape deformations are spa-tially continuous and smooth. Frequently, such problemsare formulated in terms of optimisation problems over thespace of diffeomorphisms [14, 37, 2, 38]. Commonly, gra-dient descent-like methods are used to obtain (local) optimaof the (typically non-convex) problems. However, a majorshortcoming of these methods is that a good initial estimateis crucial and in general there are no bounds on the opti-mality of the solution. To deal with the non-convexity ofa 2D shape-to-image matching problem that is formulatedin terms of optimal transport, the authors in [48] propose touse a branch and bound scheme.

Shortest Paths and Dynamic Programming: In con-trast to the continuous local optimisation methods, many vi-sion problems can be formulated in a discrete manner such

that they are amenable to solutions based on graph algo-rithms and dynamic programming (DP) [16]. Since curvesare intrinsically one-dimensional, various curve matchingformulations can also be reduced to finding a shortest-pathin a particular graph. Moreover, based on a recursive formu-lation using easier-to-solve subproblems, matching prob-lems with templates that have a tree structure can frequentlybe tackled by DP. For a deformable matching of an opencontour to a 2D image, a global solution based on DP hasbeen proposed in [10]. Also based on DP, in [15] the authorspresent a method for solving the problem of deformablymatching a 2D polygon to a 2D image for chordal graphpolygons. In [49], the authors propose a globally optimalapproach for matching a closed contour to a 2D image basedon cycles in a product graph of the contour and the image.A related formulation that is also based on a product graphhas recently been introduced in [28] for deformable contourto 3D shape matching.

Graph-cuts: It is well known (see for example [4]) thatany cut of a graph can be interpreted as finding a closedmanifold of co-dimension 1 in the ambient space (e.g.,closed curve in 2D, closed surface in 3D, etc.). One suchexample is the reconstruction of a 3D shape from a set ofsparse 3D points, where the latter is represented on a dis-crete 3D grid [32].

Labelling Problems: Labelling problems are ubiqui-tous in computer vision and appear both in continuousand discrete settings [58]. The popular Markov RandomField (MRF) framework offers a Bayesian treatment thereof[34]. Also, linear programming relaxations of MRFs havebeen studied [55]. The continuous approaches to multi-labelling include various convex relaxations [43, 30, 51,17], multi-labelling problems with total variation regular-isation of functions with values on manifolds [31], as wellas sublabel-accurate convex relaxations [39, 29]. Amongthe discrete multi-labelling methods are the previously-mentioned graph-cuts, which can be used to find globalsolutions for certain binary labelling problems, includingproblems with submodular pairwise costs [24]. For a sub-class of multi-labelling problems a global solution can alsobe found [24]. This sub-class includes pairwise costs thatare convex in terms of totally ordered labels [21]. In addi-tion, efficient algorithms for finding local optima of generalmulti-labelling problems have been proposed [5], whicheven have theoretical optimality guarantees if the pairwiseterm is a metric in the label space. A more detailed de-scription of the energy functions that can be optimised usinggraph-cuts is given in [25, 23, 24].

1.2. Main Contributions

The main contribution of this paper is to present for thefirst time a combinatorial formulation of the non-rigid 3Dshape to 3D image matching problem. Whilst our problem

is a natural extension to the afore-mentioned “dimensionone” matching approaches [10, 15, 49, 28], a generalisationto (intrinsic) dimension two problems is more intricate. Wesummarise the most important aspects of our work as fol-lows:

• By using a surface mesh transformation model thatmakes use of per-triangle rigid transformations, weformulate the 3D shape to 3D image matching problemin terms of a manifold-valued multi-labelling problem.

• We introduce a pairwise term that defines a metricon the label space SE(3), which itself is a high-dimensional Lie group. With that, our energy functionis amenable to be minimised by the alpha-expansionalgorithm [5], which has been shown to work well inpractice, is efficient even for very large label spaces,and has theoretical optimality guarantees.

• In contrast to continuous optimisation methods thatuse gradient descent-like algorithms, our combinato-rial method does not require a good initialisation.

• In order to deal with the computationally challengingdiscretisation of SE(3), we propose to use a coarse-to-fine discretisation of the Lie group.

2. Non-Rigid 3D Shape-to-Image MatchingIn this section we first specify our objective, followed by

a description of the data term and smoothness term. Af-ter we have introduced the combinatorial formulation of theproblem, we describe the discretisation of the label space.In the section that follows we discuss the algorithmic so-lution of the problem, where also practical aspects like theimplementation of the coarse-to-fine discretisation of the la-bel space will be discussed.

2.1. Objective Function

In the following, we assume that a 3D shape S ⊂ R3

is given as a triangular mesh. This means we have n ∈ Ntriangles F1, . . . , Fn ⊂ R3 such that

S =

n⋃

i=1

Fi. (2)

We use the set E ⊂ 1, . . . , n2 to define the neighbour-hood between pairs of (different) triangles. We assume thatfor all (i, j) ∈ E the neighbouring triangles Fi and Fj arenon-disjoint and that the intersection Fi ∩ Fj results eitherin a common edge or a common vertex. Also, w.l.o.g. weassume that for each (i, j) ∈ E it holds that i < j, i.e.(i, j) ∈ E ⇒ (j, i) /∈ E .

Our objective is it now to match the 3D shape S ontoa volumetric image I : Ω → Rc, where Ω ⊂ R3 denotes

the compact image domain and c ∈ N describes the amountof image channels. While we are interested in a non-rigidshape-to-image matching, we like to favour matchings thatare as-rigid-as-possible, similar to the approach in [50] thatapplies (locally regularised) rigid transformations to eachvertex. However, in our case this is done by applying toeach triangle Fi a rigid transformation

τi = (τi, ~τi) ∈ SE(3) = SO(3) nR3, (3)

where τi ∈ SO(3) ⊂ R3×3 represents the rotational partand ~τi ∈ R3 represents the translational part of τi. The taskof finding the best matching can be formulated as minimis-ing the energy

E(τ ) =

n∑

i=1

Ei(τi) +∑

(i,j)∈EEij(τi, τj), (4)

where τ = (τ1, . . . , τn). In Section 2.2 we define the dataterm Ei(τi) that evaluates how well the transformed trian-gle τi(Fi) fits to the image data. In Section 2.3 we definethe smoothness term Eij(τi, τj) that measures the geomet-ric dissimilarity between the shape model S and the trans-formed shape

τ (S) :=

n⋃

i=1

τi(Fi). (5)

Using the proposed piecewise rigid transformation modelwe may end up with a model τ (S) having (small) gaps orintersections between neighbouring triangles. We will lateraddress this issue and present a simple yet effective way ofdealing with this irregularity.

2.2. Data Term

The data term Ei(τi) ∈ R measures how well the trans-formed triangle τi(Fi) fits to the image data I . For that,we introduce the score image J : Ω → [0, 1] that is de-rived from the image I (e.g. a gradient magnitude image, ormore advanced predictors based on neural networks). For atriangle F ⊂ Ω, we define

J [F ] :=

F

J(x)dx. (6)

With that, the value J [F ] indicates how well the triangle Ffits to the image data, where a high value of the score imageindicates a good fit. The data term is then given by

Ei(τi) = −J [τi(Fi)]. (7)

In the discrete setting, the data term Ei is computed bya weighted sum of function values −J(x) over the triangle.The weights take the rasterisation of the deformed triangleτi(Fi) in the image into account.

2.3. Smoothness Term

The pairwise term Eij(τi, τj) ∈ R+0 penalises the dis-

agreement between neighbouring triangles Fi and Fj afterthey have been transformed by τi and τj , respectively.

For defining the pairwise term we first introduce suitabledistances. The expression

dSO(3)(τi, τj) =

∥∥∥∥1

2log(τTi τj)

∥∥∥∥F

(8)

is the geodesic distance between the rotations τi and τjon SO(3) [20], with matrix logarithm log(·). For qiand qj being the quaternion representations of τi and τj ,one can efficiently compute the distance as dSO(3) =√

2 cos−1(|〈qi, qj〉|), where 〈·, ·〉 is the quaternion inner-product [20].

For defining a distance between neighbouring triangles,we make use of the concept of group actions. To be morespecific, we define

dSE(3),X(τi, τj) = maxx∈X‖τi(x)−τj(x)‖2 , (9)

where the group SE(3) acts on X ⊆ R3. In our case we useX = Fi ∩ Fj such that dSE(3),Fi∩Fj

(τi, τj) can be seen asdistance between the deformed triangles τi(Fi) and τj(Fj).In this case the maximum in dSE(3),Fi∩Fj

(τi, τj) is achievedat the common vertices of Fi and Fj , which is attractivefrom a computational point of view.

Using the introduced distances, we define our pairwiseterm as a weighted sum thereof, i.e.

Eij(τi, τj) = (10)λBdSO(3)(τi, τj) + λSdSE(3),Fi∩Fj

(τi, τj).

The purpose of the bending term, weighted by λB > 0,is to ensure that the rotational components that are appliedto neighbouring triangles are similar. The stretching term,weighted by λS > 0, ensures that neighbouring trianglesstay close together.

2.4. Combinatorial Formulation

A matching τ of shape S to the image I is given by asolution of the optimisation problem

minτ∈SE(3)n

E(τ ). (11)

Due to the non-convexity of the feasible set SE(3)n, it fol-lows that Problem (11) is non-convex. This non-convexitymakes it difficult to solve the problem directly over the un-bounded continuous space SE(3)n. Our approach is to opti-mise instead over a discretisation of the search space. Withthat, we obtain a multi-labelling problem, for which effi-cient and effective algorithms are available.

For the discretisation of SE(3) we make use of the factthat it is a product space of SO(3) and R3. Thus, we defineL ⊂ SO(3)nR3 = SE(3) to be the (finite) manifold-valuedlabel space that contains ` = |L| elements of SE(3).

Translations: The Lie group SE(3) is non-compact dueto the translational part being encoded by R3. However,since the image domain Ω is compact, the image size pro-vides natural bounds for a discretisation of the translations.Let nx, ny, nz be the number of voxels of the image I and`x, `y, `z be the number of labels for the x, y, z transla-tions. For convenience, and w.l.o.g., we assume that ourtemplate is defined relative to the centre of the image do-main, i.e. the template’s centre-of-gravity coincides withthe centre of the image. Moreover, w.l.o.g., we assumethat we are looking for a matching such that a substan-tial part of the (transformed) template lies inside the im-age1. Let us define Zm(n) = −n2 , . . . , 0, . . . , n2 to bethe set containing m evenly-spaced elements with centre 0,where m is an odd positive integer. The diameter n de-fines the difference between the largest and the smallest el-ements. A discretisation of the translations is given by theset ~L = Z`x(nx)×Z`y (ny)×Z`z (nz) with | ~L| = `x·`y·`z .

Rotations: Various works that are related to the discreti-sation of SO(3) have previously been addressed in the liter-ature. These include sampling strategies for rigid-body pathplanning [27], an approximation of the neighbourhood inSO(3) based on vector distances [33], or a comparison andanalysis of various metrics for 3D rotations [20]. Our dis-cretisation of SO(3) is based on the Hopf fibration, whichdescribes SO(3) in terms of the circle S1 and the 2-sphereS2. The intuition of this approach is to transfer a discreti-sation of S1 and S2 to the space of rotations. We refer theinterested reader to [57] for a detailed description. Let Ldenote the so-obtained set of a uniform sampling of SO(3)containing ˜= |L| elements.

By optimising E over the label space Ln, we now obtainthe combinatorial optimisation problem as

minτ∈Ln

E(τ ). (12)

3. Algorithm

In order to solve Problem (12), we use the alpha-expansion algorithm [5], which has the main idea to updateone label at a time in a greedy manner. Whilst it has the re-quirement that the pairwise term is a metric, it is appealingboth from a practical and a theoretical point of view. To bemore specific, the alpha-expansion algorithm is efficient, itis well-known to be robust with respect to initialisation, andit is guaranteed that the obtained local optimum lies withina factor of the global optimum. We now show that in our

1If this is not the case, one can increase the size of the image accord-ingly.

case the pairwise term is a metric and thus we can use thealpha-expansion algorithm for solving (12).

Lemma 1 Let X ⊆ R3 be non-empty and τi, τj , τk ∈SE(3). The stretching term

dSE(3),X(τi, τj) = maxx∈X‖τi(x)−τj(x)‖2 ,

is a pseudometric, i.e. it satisfies

(i) dSE(3),X(τi, τj) ≥ 0, dSE(3),X(τi, τi) = 0,

(ii) symmetry: dSE(3),X(τi, τj) = dSE(3),X(τj , τi), and

(iii) the triangle inequality:

dSE(3),X(τi, τk) ≤ dSE(3),X(τi, τj)+dSE(3),X(τj , τk).

Proof: (i) and (ii) follow directly from the definition. Thetriangle inequality holds, since

dSE(3),X(τi, τk) = maxx∈X‖τi(x)−τk(x)‖2

= maxx∈X‖τi(x)− τk(x) + τj(x)− τj(x)‖2

≤ maxx∈X

(‖τi(x)− τj(x)‖2 + ‖τj(x)−τk(x)‖2

)

≤ maxx∈X‖τi(x)− τj(x)‖2 + max

x∈X‖τj(x)−τk(x)‖2 ,

= dSE(3),X(τi, τj) + dSE(3),X(τj , τk).

Proposition 1 For λS , λB > 0 and (i, j) ∈ E , the pairwiseterm Eij(·, ·) defined in (10) is a metric.

Proof: Due to the assumption in Section 2.1, for (i, j) ∈ Eit follows that X = Fi ∩ Fj 6= ∅. Thus, dSE(3),Fi∩Fj

(·, ·)is a pseudometric (Lemma 1). Since dSO(3)(·, ·) is knownto be a metric, in addition to the pseudometric conditions, italso holds that dSO(3)(τi, τj)=0⇔ τi=τj . SinceEij(·, ·) isa positive linear combination of a metric and a pseudomet-ric, in addition to the pseudometric conditions it also fulfilsEij(τi, τj)=0⇔ τi=τj . Hence Eij(·, ·) is a metric.

3.1. Coarse-to-Fine Processing

In practice, for a reasonably large number of labels ` =˜·`x·`y·`z , a direct solution of Problem (12) is intractable.In order to cope with this issue we propose to use a coarse-to-fine strategy that (approximately) solves Problem (12) atdifferent levels s of the label space. Let s=0 denote thecoarsest (initial) level and s=smax ≥ 0 the finest (final)level. Once a solution τ (s) has been obtained at level s,for running the algorithm at level s+1 the labelling is ini-tialised with τ (s) and the label space is updated accordingly.For computational efficiency, in the coarse-to-fine approacheach triangle has its own feasible label space. Let L(s)

i de-note this feasible label space for the i-th triangle at levels. Initially, on the base level s=0, the label spaces are the

same for each triangle, i.e. L(0)i = L(0). The general idea

for obtaining L(s+1)i is to consider a (uniform) discretisa-

tion of the neighbourhood of the transformation τ (s)i at levels, where the radius of the neighbourhood decreases acrossthe levels. Let us introduce a neighbourhood on SO(3):

Definition 1 (ε-ball on SO(3))The ball on SO(3) with radius ε and centre τ ∈ SO(3) isdefined as BSO(3)

ε (τ) = τ ∈ SO(3) : dSO(3)(τ, τ) < ε.

Next, we describe the coarse-to-fine structure of the labelspace based on its product space nature. Since each labelcan be written as

τ(s)i = (τ

(s)i , ~τ

(s)i ) ∈ L (s)

i × ~L (s)i ⊂ SO(3) nR3, (13)

we can consider the translations and rotations indepen-dently. Note that L (s)

i × ~L (s)i ⊂ SE(3) is not necessarily a

group anymore. By enforcing that τ (s)i ∈ L(s+1)i , the solu-

tion τ (s) at level s is also contained in the new label space.Thus, the energy cannot increase from level s to s+1.

Translations: We define ~L(0) := ~L (cf. Section 2.4).For obtaining the set of translations at level s+1 for trianglei, the new translation grid at level s+1 is centred at ~τ (s)

i .Moreover, the diameter from level s is reduced by a factorof two, leading to

~L (s+1)i := (14)

~τ(s)i +

(Z`x(

n(s)x

2)×Z`y (

n(s)y

2)×Z`z (

n(s)z

2)

),

where the vector-set addition is element-wise. Initially,n(0)x = nx, n

(0)y = ny and n(0)z = nz .

Rotations: Let L[r] denote a discretisation of (the en-tire) SO(3) at resolution r containing ˜[r] elements, where˜[r] increases with increasing r. L[r] should not be con-fused with L(s)

i , which is the (rotation) label space of thei-th triangle at level s that is to be defined below. Follow-ing the construction in [57], we obtain 5 resolutions of theSO(3) discretisation L[r] for r = 0, . . . , 4. After includ-ing the identity in L[r], the number of elements ranges from˜[0] = 577 to ˜[4] ≈ 2·106. Let us define a set that containsa fixed number of p elements from L[r] that are “closest” tothe identity, i.e.

L[r]p := L[r] ∩ BSO(3)

ε (Id), (15)

where for each resolution r the smallest ε that fulfils|L[r]p | ≥ p is used as radius.Now, we define L(0) := L[0], and

L (s+1)i := τ τ (s)

i : τ ∈ L[s+1]p ⊆ SO(3), (16)

which is the set of compositions of τ (s)i with all rotations

in L[s+1]p . We use p = ˜[0] = 577 for all levels s. For

the predefined SO(3) griddings at resolutions r = 0, . . . , 4there always existed an ε such that the above inequality istight, i.e. |L[r]

p | = ˜[0] = 577.By including the identity in L[s+1]

p , we make sure thatτ

(s)i ∈ L (s+1)

i . Since 0 ∈ Z·(·), it follows that ~τ (s)i ∈

~L (s+1)i . Thus, we have τ (s)

i ∈ L (s+1)i , ensuring that the

energy cannot increase when moving from level s to s+1.

3.2. Practical Considerations

In this section we describe some aspects for the applica-tion of the proposed method in practice.

3.2.1 Mesh Connectivity

After applying an individual rigid-body transformation toeach triangle of the template mesh, in general the resultingmesh may have gaps or intersections between neighbouringtriangles. However, due to the introduced regulariser, thesegaps or intersections can be expected to be rather small. Forrecovering the original mesh topology, we replace each sub-set of vertices having the same position in the mesh templateby their centre-of-gravity after the transformation.

3.2.2 Memory Requirements

For running our algorithm we pre-compute the data term,which has memory requirements of O(n·`) (we also evalu-ated its computation on the fly, which has constant memoryrequirements but increased the runtime significantly). Sincepre-computing the full pairwise term would require memoryof O(n2·`2), we only precompute the bending term dSO(3),which has memory requirements of O(˜2). The stretchingterm dSE(3),X is computed on the fly.

4. ResultsFor the evaluation of our method we focus on demon-

strating the general applicability of our approach. Thepoint-set registration problem [3, 44, 7, 33, 41, 40, 22,19, 45, 36] and the related correspondence problem for 3Dshapes [52, 56, 26, 6] are problems that can be tackled us-ing our method by recasting them as shape-to-image match-ing problem. Hence, in our evaluation below, in addition to3D image segmentation, we also consider the case of de-formable mesh registration.

4.1. Deformable Mesh Registration

In the first set of experiments we demonstrate that ourmethod can be used to perform deformable mesh registra-tion. To emphasize that our method is insensitive to initial-isation, we compare it exemplarily with the Coherent Point

Drift (CPD) algorithm [41, 40], a widely-used point-cloudregistration method based on Expectation Maximisation.

Template and Target: For the evaluation we use a low-resolution mesh of a bunny shape2 as template (n = 498), asshown in Fig. 2 (top left). A total of 20 deformed versions ofthe bunny mesh, each with a random pose, are used as reg-istration targets. For that, we synthetically create deformedversions of a high-resolution bunny mesh (Fig. 2, top right)based on random 3D displacement vectors defined at 8 con-trol points on a cubic grid. These displacement vectors arethen transferred to the mesh using a spline-based interpo-lation in order to achieve a smooth and nonlinear deforma-tion. One such deformed mesh is shown in Fig. 2 (bottomleft). Eventually, a random pose transform is applied to thedeformed shape, as shown in Fig. 2 (bottom right).

Figure 2. Bunny meshes. Top left: template. Top right: high-resolution target. Bottom left: deformed target (with original tar-get overlay). Bottom right: deformed target with random pose.

Score Image: In order to use our method for meshregistration we create a score image for each target meshand then fit the template mesh to the score image. Ford : Ω → R+

0 we denote by d(x) the distance of positionx to the boundary of the target mesh. Now, we define thescore image as J(x) = exp(−d(x)β ), where we used β=2.

Parameters: For running our method, we set λS =54· g

max(nx,ny,nz) and λB = 19· gπ , where g = ncmax

2|E| isa normalisation factor that takes the problem size into ac-count. The positive number cmax is an upper bound forthe largest possible absolute value of the data term for asingle triangle (cf. eq. (6)), which we compute by multi-plying the largest value of the score image by the area ofthe largest triangle. The size of the label space is |L(0)| =

2from the Stanford University Computer Graphics Laboratory

Figure 3. Qualitative results for registrations of the bunny tem-plate to the deformed target with random pose (cf. Fig. 2, bottomright). Top left: CPD result, where the shape is completely de-stroyed. Top right: CPD error. Bottom left: our result. Bottomright: our error.

0 50 100 150 2000

20

40

60

80

100

error

%ve

rtic

es

ourCPD

Figure 4. Percentage of vertices (vertical axis) which have an errorthat is smaller than or equal to the value on the horizontal axis.

93·577 = 420,633, and the dimensions of the score imagerange from 2083 to 2623, which resulted in an average pro-cessing time of≈ 92 minutes per registration on a MacBookPro (2.5GHz, 16GB).

Results: In the case of CPD, we first solve for arigid registration and then for a non-rigid registration (per-forming a non-rigid registration directly performed worse).Since CPD is highly initialisation-sensitive it fails in 17 outof the 20 evaluated cases, where one representative failurecase is depicted in Fig. 3 (top row). This extreme amount ofcorrupt registrations emphasizes the necessity of a methodthat is robust with respect to initialisation. In contrast, in all20 cases our method is able to achieve a good registration,see Fig. 3 (bottom row) for a representative result. In Fig. 4we present a quantitative evaluation.

Discussion: Whilst we do not claim to present an ex-haustive evaluation of mesh registration methods, the pur-pose of this experiment is to demonstrate the insensitivityto initialisation of our method in a proof of concept manner.One advantage of our proposed approach is that we neitherhave the necessity of compatible mesh topologies, nor of

compatible mesh discretisations, since in our case the targetis represented in terms of the score image. Since the scoreimage is a discrete representation of the target shape sur-face, our approach amounts to a surface-based registration,rather than a point-based registration as CPD, which is bi-ased towards deformations that align points to be as closeas possible. Moreover, the score image offers further flex-ibility since additional information can be integrated (e.g.uncertainties, mesh texture, shape features, etc.).

4.2. Segmentation

In the second set of experiments we apply our methodto the segmentation of four brain structures (substantia ni-gra & subthalamic nucleus as single object and the nucleusruber, both bilaterally) in 16 multi-modal 3T magnetic res-onance images. The delineation of the subthalamic nucleusis known to be a challenging task, even for humans [47].The main difficulties include weak image contrasts and thesmall size of the brain structures (the structures shown inFig. 5 are contained in a bounding box of ≈6×3×2.5cm3,with the MRI image covering a volume of ≈203cm3).

Template: For capturing the inter-relation between thebrain structures, we use a multi-object template (n=379), asshown in Fig. 5. The template connects neighbouring brainstructures by (degenerate) triangles, referred to as “phantomtriangles”, which are used only for the smoothness term andare “free” with respect to the data term.

Figure 5. Brain structure template.

Parameters: For running our method, we set λS =135· g

max(nx,ny,nz) and λB = 64· gπ , where g is definedanalogously to Section 4.1. The size of the label space is|L(0)| = 113·577 = 767,987, and the dimension of all scoreimages is 364×436×364, which resulted in an average pro-cessing time of ≈ 58 minutes per fitting.

Score Image: In order to tackle the image segmentationtask with our method, we use a data term that is based onthe recently proposed 3D U-Net convolutional neural net-work [8]. To be more specific, for all 16 images we train thenetwork in a leave-one-out manner for the prediction of vol-umetric segmentations. In the centre column of Fig. 7 threeexamples of the volumetric segmentations produced by the

so-trained networks are shown. It can be seen that for thischallenging segmentation task the U-Net is able to identifythe (rough) location of the brain structures, but does in manycases not produce an output that resembles the shape of thebrain structures (the first two rows in Fig. 7). Thus, we com-plement the U-Net segmentations with additional geometricinformation using our method.

For each of the four brain structures o ∈ 1, 2, 3, 4 weuse an individual score image Jo. Given the binary U-Netsegmentation Iunet

o : Ω → 0, 1 for brain structure o, wefirst extract for each brain structure the (predicted) bound-ary using morphological operations. For do : Ω → R+

0

we denote by do(x) the distance of position x to the so-extracted boundary. Then, we use a Gaussian kernel to de-fine the score image as

Jo(x) = wo exp(−−d2o(x)

2σ2o

). (17)

The weight wo and the standard deviation σo are used forincorporating the confidence about the U-Net segmentationIuneto of brain structure o. To this end, for

Yo = x ∈ R3 : Iuneto (x) = 1 (18)

being the one-level set of Iuneto that represents the point-

cloud of segmented voxels, we use the average of the (per-coordinate) median absolute deviation (MAD) as robustdispersion measure, computed as

MAD(Yo) =1

3‖MAD(Yo)‖1, (19)

for MAD(Yo) = median(|Yo −median(Yo)|) ∈ R3.

The median is understood in a per-coordinate sense, andboth the set-vector difference as well as the absolute valueare understood in an element-wise manner. Now, for Y ′odenoting the point-cloud of segmented voxels of structure oas given by the template, the absolute value of the averageMAD difference between the U-Net segmentation and thetemplate is given by ho = |MAD(Yo)−MAD(Y ′o)|. Withthat, we define the standard deviation as σo = ρ(ho+1),which we scale by ρ=3. Thus, if the average MAD for theU-Net segmentation and the template are equal, the standarddeviation corresponds to ρ, whereas a larger difference indispersion leads to a larger standard deviation, accountingfor more uncertainty in the U-Net segmentation. Moreover,we define wo = 1

ho+1 such that an increased uncertaintyin the U-Net segmentation of brain structure o leads to adecreased weight for its data term.

Results: For the evaluation of the segmentation we usethe Dice Similarity Coefficient (DSC) as volumetric overlapmeasure, which is defined as 2|Y∩Y′|

|Y|+|Y′| , for Y and Y ′ eachbeing point-clouds of segmented voxels. We compute theDSC for each individual brain structure, and then report the

average of the four DSC values. Quantitative results com-paring the plain U-Net segmentation and our obtained seg-mentations are presented in Fig. 6. The boxplot on the leftreveals that in overall our method achieves higher volumet-ric overlaps across the 16 cases. Moreover, the plot of sortedDSC differences on the right emphasizes that applying ourmethod improves the DSC in almost all cases (the valuesabove zero), and in only a few cases it is reduced slightly.

our unet

0.2

0.4

0.6

DSC

best median worst

0

0.2

0.4

DSC

our−

DSC

unet

Figure 6. Left: Boxplot of the DSC of our method versus the U-Net segmentation. Right: Sorted DSC differences for the 16 cases(values above zero indicate an improvement upon U-Net).

ground truth U-Net our

best

med

ian

wor

st

Figure 7. Qualitative results for the brain structure segmentationexperiments. Each row shows a different instance of results, wherefrom top to bottom we present the best, median and worst DSCdifferences (cf. Fig. 6, right).

In Fig. 7 qualitative results for three segmentation casesthat correspond to the best, median and worst cases in Fig. 6(right) are shown. In the first row of Fig. 7 it can be seenthat in some cases our method is even able to achieve a rea-sonable segmentation based on a poor U-Net segmentation.

5. ConclusionWe introduced the first combinatorial method for non-

rigidly matching a 3D shape to a 3D image. The key ideais to represent the 3D shape as a triangular mesh and to

solve a manifold-valued multi-labelling problem on the setof triangles. We determine an assignment of a rigid-bodytransformation associated with each triangle by minimis-ing a cost function where the unary terms encode the localmatching cost in the image and the pairwise terms penalisethe amount of non-rigidity in the deformation. In particu-lar, we propose an efficient discretisation of the unbounded6-dimensional Lie group of rigid motions. Moreover, wesolve the large and NP-hard optimisation problem with agraph theoretic algorithm that is insensitive to initialisationand has the guarantee that the obtained solution is within afactor of the global optimum [5]. Experimental validationconfirms these benefits.

Acknowledgements

The authors would like to thank O. Cicek for helpfulfeedback regarding the U-Net implementation, as well asBenedikt Staffler and Ankush Gupta for general feedbackon CNNs. CNN-related computations presented in this pa-per were carried out using the HPC facilities of the Univer-sity of Luxembourg [53] – see http://hpc.uni.lu. Theauthors gratefully acknowledge the financial support by theFonds National de la Recherche, Luxembourg (6538106,8864515). This work was supported by the ERC Consol-idator Grant 3D Reloaded.

References[1] D. H. Ballard. Generalizing the Hough transform to detect

arbitrary shapes. Pattern Recognition, 13(2):111–122, 1981.1

[2] M. F. Beg, M. I. Miller, A. Trouve, and L. Younes. Comput-ing large deformation metric mappings via geodesic flows ofdiffeomorphisms. International Journal of Computer Vision,61(2):139–157, 2005. 2

[3] P. J. Besl and N. D. McKay. A method for registration of 3-Dshapes. TPAMI, 14(2):239–256, 1992. 6

[4] Y. Boykov and V. Kolmogorov. Computing Geodesics andMinimal Surfaces via Graph Cuts. ICCV, 2003. 2

[5] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate en-ergy minimization via graph cuts. TPAMI, 23(11):1222–1239, 2001. 2, 3, 4, 9

[6] Q. Chen and V. Koltun. Robust Nonrigid Registration byConvex Optimization. ICCV, 2015. 6

[7] H. Chui and A. Rangarajan. A new point matching algo-rithm for non-rigid registration. Computer Vision and ImageUnderstanding, 89(2):114–141, 2003. 6

[8] O. Cicek, A. Abdulkadir, S. S. Lienkamp, T. Brox, andO. Ronneberger. 3D U-Net: Learning Dense Volumetric Seg-mentation from Sparse Annotation. In MICCAI, 2016. 7

[9] T. F. Cootes and C. J. Taylor. Active Shape Models - SmartSnakes. In BMVC, pages 266–275, 1992. 1

[10] J. Coughlan, A. Yuille, C. English, and D. Snow. Efficientdeformable template detection and localization without userinitialization. Computer Vision and Image Understanding,78(3):303–319, 2000. 2, 3

[11] D. Cremers. Dynamical statistical shape priors for level set-based tracking. TPAMI, 28(8):1262–1273, 2006. 1

[12] D. Cremers, F. R. Schmidt, and F. Barthel. Shape priors invariational image segmentation: Convexity, lipschitz conti-nuity and globally optimal solutions. In CVPR, 2008. 1

[13] R. O. Duda and P. E. Hart. Use of the Hough transformationto detect lines and curves in pictures. Communications of theACM, 15(1):11–15, 1972. 1

[14] P. Dupuis, U. Grenander, and M. I. Miller. Variationalproblems on flows of diffeomorphisms for image matching.Quarterly of applied mathematics, pages 587–600, 1998. 2

[15] P. F. Felzenszwalb. Representation and detection of de-formable shapes. TPAMI, 27(2):208–220, 2005. 2, 3

[16] P. F. Felzenszwalb and R. Zabih. Dynamic Programming andGraph Algorithms in Computer Vision. TPAMI, 33(4):721–740, 2011. 2

[17] B. Goldluecke, E. Strekalovskiy, and D. Cremers. Tight con-vex relaxations for vector-valued labeling. SIAM Journal onImaging Sciences, 6(3):1626–1664, 2013. 2

[18] T. Heimann and H.-P. Meinzer. Statistical shape models for3D medical image segmentation: A review. Medical ImageAnalysis, 13(4):543–563, 2009. 1

[19] R. Horaud, F. Forbes, M. Yguel, G. Dewaele, and J. Zhang.Rigid and articulated point registration with expectation con-ditional maximization. TPAMI, 33(3):587–602, Mar. 2011.6

[20] D. Q. Huynh. Metrics for 3D Rotations: Comparison andAnalysis. Journal of Mathematical Imaging and Vision,35(2):155–164, June 2009. 4

[21] H. Ishikawa. Exact optimization for Markov random fieldswith convex priors. TPAMI, 25(10):1333–1336, 2003. 2

[22] B. Jian and B. C. Vemuri. Robust Point Set Registration Us-ing Gaussian Mixture Models. TPAMI, 33(8):1633–1645,2011. 6

[23] V. Kolmogorov and Y. Boykov. What Metrics Can BeApproximated by Geo-Cuts, Or Global Optimization ofLength/Area and Flux. ICCV, 2005. 2

[24] V. Kolmogorov and C. Rother. Minimizing non-submodularfunctions with graph cuts – a review. TPAMI, 29(7):1274–1279, 2007. 2

[25] V. Kolmogorov and R. Zabih. What energy functions canbe minimized via graph cuts? TPAMI, 26(2):147–159, Feb.2004. 2

[26] A. Kovnatsky, M. M. Bronstein, X. Bresson, and P. Van-dergheynst. Functional correspondence by matrix comple-tion. In CVPR, 2015. 6

[27] J. J. Kuffner. Effective sampling and distance metrics for 3Drigid body path planning . In ICRA, Jan. 2004. 4

[28] Z. Lahner, E. Rodola, F. R. Schmidt, M. M. Bronstein,and D. Cremers. Efficient Globally Optimal 2D-to-3D De-formable Shape Matching. CVPR, 2016. 2, 3

[29] E. Laude, T. Mollenhoff, M. Moller, J. Lellmann, and D. Cre-mers. Sublabel-Accurate Convex Relaxation of VectorialMultilabel Energies. ECCV, 2016. 2

[30] J. Lellmann and C. Schnorr. Continuous multiclass label-ing approaches and algorithms. SIAM Journal on ImagingSciences, 4(4):1049–1096, 2011. 2

[31] J. Lellmann, E. Strekalovskiy, S. Koetter, and D. Cremers.Total variation regularization for functions with values in amanifold. In ICCV, 2013. 2

[32] V. S. Lempitsky and Y. Boykov. Global Optimization forShape Fitting. CVPR, 2007. 2

[33] H. Li and R. Hartley. The 3D-3D Registration Problem Re-visited. In ICCV, 2007. 4, 6

[34] S. Z. Li. Markov random field modeling in computer vision.Springer Science & Business Media, 2012. 2

[35] A. Malti, A. Bartoli, and R. Hartley. A Linear Least-SquaresSolution to Elastic Shape-From-Template. CVPR, 2015. 2

[36] H. Maron, N. Dym, I. Kezurer, S. Kovalsky, and Y. Lip-man. Point registration via efficient convex relaxation. ACMTransactions on Graphics (TOG), 35(4):73, 2016. 6

[37] P. W. Michor and D. Mumford. Riemannian Geometries onSpaces of Plane Curves. In Journal of the European Mathe-matical Society, 2004. 2

[38] M. I. Miller, A. Trouve, and L. Younes. Geodesic shootingfor computational anatomy. Journal of Mathematical Imag-ing and Vision, 24(2):209–228, 2006. 2

[39] T. Mollenhoff, E. Laude, M. Moeller, J. Lellmann, andD. Cremers. Sublabel-Accurate Relaxation of NonconvexEnergies. In CVPR, 2016. 2

[40] A. Myronenko and X. Song. Point Set Registration: Coher-ent Point Drift. TPAMI, 32(12):2262–2275, Dec. 2010. 6

[41] A. Myronenko, X. Song, and M. A. Carreira-Perpinan. Non-rigid point set registration: Coherent Point Drift. In NIPS,2007. 6

[42] S. Parashar, D. Pizarro, A. Bartoli, and T. Collins. As-Rigid-as-Possible Volumetric Shape-from-Template. ICCV, 2015.2

[43] T. Pock, T. Schoenemann, G. Graber, H. Bischof, and D. Cre-mers. A convex formulation of continuous multi-label prob-lems. In ECCV, 2008. 2

[44] A. Rangarajan, H. Chui, and F. L. Bookstein. The softassignprocrustes matching algorithm. In IPMI, 1997. 6

[45] A. Rasoulian, R. Rohling, and P. Abolmaesumi. Group-wiseregistration of point sets for statistical shape models. TMI,31(11):2025–2034, Nov. 2012. 6

[46] M. Salzmann and P. Fua. Reconstructing sharply folding sur-faces: A convex formulation. In CVPR, 2009. 2

[47] J. R. Schlaier, C. Habermeyer, J. Warnat, M. Lange,A. Janzen, A. Hochreiter, M. Proescholdt, A. Brawanski, andC. Fellner. Discrepancies between the MRI- and the elec-trophysiologically defined subthalamic nucleus. Acta Neu-rochirurgica, 153(12):2307–2318, July 2011. 7

[48] B. Schmitzer and C. Schnorr. Globally optimal joint im-age segmentation and shape matching based on Wasser-stein modes. Journal of Mathematical Imaging and Vision,52(3):436–458, 2015. 2

[49] T. Schoenemann and D. Cremers. A combinatorial solutionfor model-based image segmentation and real-time tracking.TPAMI, 32(7):1153–1164, 2010. 2, 3

[50] O. Sorkine and M. Alexa. As-rigid-as-possible surface mod-eling. Symposium on Geometry Processing, 2007. 3

[51] E. Strekalovskiy, B. Goldlucke, and D. Cremers. Tight con-vex relaxations for vector-valued labeling problems. ICCV,2011. 2

[52] O. Van Kaick, H. Zhang, G. Hamarneh, and D. Cohen Or.A survey on shape correspondence. In Computer GraphicsForum, pages 1681–1707, 2011. 6

[53] S. Varrette, P. Bouvry, H. Cartiaux, and F. Georgatos. Man-agement of an Academic HPC Cluster: The UL Experience.In Intl. Conf. on High Performance Computing & Simulation,2014. 9

[54] N. Vu and B. S. Manjunath. Shape prior segmentation ofmultiple objects with graph cuts. In CVPR, 2008. 1

[55] T. Werner. A linear programming approach to max-sumproblem: A review. TPAMI, 29(7):1165–1179, 2007. 2

[56] T. Windheuser, U. Schlickewei, F. R. Schmidt, and D. Cre-mers. Geometrically consistent elastic matching of 3Dshapes: A linear programming solution. ICCV, 2011. 6

[57] A. Yershova, S. Jain, S. M. LaValle, and J. C. Mitchell. Gen-erating Uniform Incremental Grids on SO(3) Using the HopfFibration. The International journal of robotics research,29(7):801–812, June 2010. 4, 5

[58] C. Zach, C. Hane, and M. Pollefeys. What is optimizedin convex relaxations for multilabel problems: Connectingdiscrete and continuously inspired map inference. TPAMI,36(1):157–170, 2014. 2

[59] S. Zhang, Y. Zhan, M. Dewan, J. Huang, D. N. Metaxas, andX. S. Zhou. Sparse shape composition: A new frameworkfor shape prior modeling. In CVPR, 2011. 1