computing the weighted ow complex - computer graphicscomputing the weighted ow complex joachim...

Computing the weighted flowcomplex

�

Joachim Giesen Matthias John

Institut fur Theoretische InformatikETH Zurich, CH-8092 Zurich, Switzerland�

giesen,john � @inf.ethz.ch

Abstract. The flow complex was introduced recentlyas a data structure on a finite number of points in �� .The flow complex is a two dimensional simplicial com-plex that turned out to be useful for modeling appli-cations like surface reconstruction. In order to applythe flow complex to tasks in bio-geometric modelingan extension to weighted points is needed. Here wereport on an algorithm to compute the weighted flowcomplex, its implementation and two applications inbio-geometric modeling.Keywords. flow complex, bio-geometric modeling

1 Introduction

In [9, 10] we introduced the flow complex as adata structure on finite point sets in �� and ap-plied it successfully to the problem of surface re-construction. The flow complex is a cell complexwhere each cell can be triangulated. It is closelyrelated to the Delaunay triangulation of the samepoint set. In fact, our efficient algorithm to com-pute the flow complex is based on the Delaunaytriangulation. But neither complex is a subcom-plex of the other.

Inspired by the work of Edelsbrunner et al. [6]we want to apply the flow complex also fortasks in bio-geometric modeling. In bio-geometrymolecules are often modeled as a union of ballsor positively weighted points. That is, the in-put that we have to structure is a finite set ofweighted points in contrast to unweighted pointsthat we structure in the flow complex. Hence toapply the flow complex to bio-geometric mod-eling an extension of its definition to weightedpoints is necessary. It turns out that the defi-nition can be extended quite easily to capturealso the weighted case. But this is not true forthe algorithm to compute the flow complex. Thereason is that the structure of the 1- and 2-cellsof the weighted flow complex can be much morecomplicated than their counterparts in the un-weighted flow complex. This is due to the fact�

Partly supported by the IST Programme of the EUand the Swiss Federal Office for Education and Sci-ence as a Shared-cost RTD (FET Open) Project un-der Contract No IST-2000-26473 (ECG - EffectiveComputational Geometry for Curves and Surfaces).

that not every weighted point corresponds to avertex in the weighted flow complex, but still canhave an influence on the structure. That makesit more complicated to derive the weighted flowcomplex from the weighted Delaunay diagramthan it is to derive the unweighted flow diagramfrom the Delaunay triangulation.

The purpose of this paper is to show how thecomplications caused by introducing weights canbe resolved and to demonstrate that the weightedflow complex can be a useful data structure forbio-geometric modeling.

The paper is organized as follows: In the secondsection we introduce the weighted flow complex.Starting point of our study is a distance functionassociated with a finite set of weighted pointsin �� . This function assigns to every point in ��its least power distance to any of the points in .It is intimately related to the power diagram of .The distance function has a unique direction ofsteepest ascent at almost every point in �� . Thepoints where such a direction does not exist arethe critical points of the distance function, i.e. itslocal extrema and saddle points. We study wherea point in �� flows if it always follows the direc-tion of steepest ascent of the distance function.It turns out that all points either flow into a crit-ical point or to infinity. The set of all points thatflow into a certain critical point is called the sta-ble manifold of this critical point. We call the col-lection of all stable manifolds the weighted flowcomplex induced by .

In the third section we give an algorithmiccharacterization of the 0-, 1- and 2-cells of theweighted flow complex and implicitly character-ize its 3-cells. The algorithm to compute the 2-cells is especially important since it reveals thenature of the flow complex in general, i.e. the re-cursive structure of the algorithm can in princi-ple be generalized to compute higher order cellsin weighted or unweighted flow complexes inhigher dimensions.

In the fourth section we present two appli-cations of the weighted flow complex in bio-geometric modeling. First, we use it to decomposemacromolecules into their constituents. Second,we give an alternative definition of pockets [6],i.e. essential cavities, in macromolecules basedon the weighted flow complex. Our definition isnot equivalent to the definition in [6]. We think itis easier to grasp since it avoids some technicali-ties and directly builds on the intuitive notion ofa pocket.

We conclude the paper with the fifth sectionwhere we discuss our implementations of thepresented algorithms and the results we got inthe aforementioned applications.

2 Joachim Giesen Matthias John

2 Weighted flow complex

A weighted point � in three dimensional Eu-clidean space is a tuple �� where �� de-notes the point itself and �� its weight. Everyweighted point gives rise to a distance function �� , namely the power distance func-tion. The power distance of a point �� froma weighted point � is defined as

� �� !��" $#Let be a finite set of weighted points. From allthe power distance functions %� �&�' , together,we derive a distance function ( � � � � � whichassigns to every point in �� its least power dis-tance to any weighted point in , i.e.

()��+*-,/.�"0$1 2� �3��4#We are interested in the gradient vector field of( and its critical points, i.e. its local minima, lo-cal maxima and saddle points. Extra care has tobe taken since ( is not smooth everywhere. Thatis, the ordinary theory of gradients and criticalpoints does not apply here. Instead we are goingto apply the critical point theory of distance func-tions that was developed by Grove [11]. To do sowe associate with every point �5 �� the subset6 �3��87 which contains the nearest neighborsof � in , namely

6 ��:9;�< � ()��=� 2� �3��?>@#Note that A 6 �3��BA)C�D . Let EF�3�� be the convex hullof the points in

6 �3�� . We call the point � critical ifit is contained in E'�3�� otherwise we call it regular.The index of a critical point � is the dimension ofEF�� . It can be shown that a critical point � is

a local minimum , if G2,/*HEF�3��=�5I .a saddle point , if G2,/*HEF�3��=�JD or K .a local maximum , if G2,/*HEF�3��=�5L .

The distance function ( and its critical points areclosely related to the power- and the weightedDelaunay diagram of . The power diagram of is a decomposition of �� into the power cells ofthe points in . The power cell of �M is givenas N � �:9O�< � � �QP�R !� 2� ��!S �T ��4>@�i.e. it contains all points of �� that do not have alarger power distance to � than to any other pointin . That is,

N � contains exactly the points wherevalue of ( is determined by � . The points thathave the same power distance from two weightedpoints in form a hyperplane. Thus

N � is ei-ther a convex polyhedron or empty. Closed facetsshared by two power cells are called power facets,

closed edges shared by three or more power cellsare called power edges and the points shared byfour or more power cells are called power vertices.The term power object can denote either a powercell, facet, edge or vertex. The power diagram of is the collection of all power objects. Note thatthe distance function ( is not smooth at �� if and only if � is contained in a power object ofdimension less than three.

The dual of the power diagram is calledweighted Delaunay diagram. For convenience wewant to refer to weighted Delaunay diagrams justas Delaunay diagrams in the following. The De-launay diagram of is a cell complex that de-composes the convex hull of the points in . Theconvex hull of four or more points in definesa Delaunay cell if the intersection of the corre-sponding power cells is not empty and there ex-ists no superset of points with the same prop-erty. Analogously, the convex hull of three or twopoints in defines a Delaunay face or Delaunayedge, respectively, if the intersection of their cor-responding power cells is not empty. Up to de-generate situations every point in is a Delau-nay vertex. The term Delaunay object can denoteeither a Delaunay cell, face, edge or vertex.

The definition of Delaunay diagrams providesus with a duality between power- and Delau-nay objects. That is, for every U -dimensionalpower object, IVSWUXSWL , there is a dual �&LY�UZ� -dimensional Delaunay object and vice versa.For more information see Aurenhammer [1]. Us-ing this duality we can characterize the criticalpoints of the distance function ( . It is easy to seethat a local minimum of ( is a point in that iscontained in its own power cell. We should notehere that not necessarily ever point in is con-tained in its own power cell. If � is either a sad-dle point or a local maximum then A 6 ��OA\[WD ,i.e. � is contained in a power object of dimensionless than three, and EF�� is a Delaunay object.In fact, we have the following theorem.

Theorem 1. The critical points of ( are the in-tersection points of power objects and their dualDelaunay objects. ]^

So far the critical point theory of distance func-tions allowed us to characterize the critical pointsof ( . Now we want to find an analog of the gradi-ent vector field for the distance function ( . Start-ing point is the observation that at any point thegradient vector of a smooth function points in thedirection of steepest ascent of the function. As areplacement for the gradient vector field we wantto assign to every regular point of ( the unit vec-tor that points in the direction of steepest ascentof ( . To the critical points we assign the zero vec-tor.

Computing the weighted flow complex 3

The direction of steepest ascent of ( at everypoint � �� can be characterized in terms of thedriver U�� of � .

Driver. For any point �+ �� let U��3�� be thepoint in EF�3�� closest to � . We call U�� the driverof � .

The driver of a point � will be also very impor-tant for our algorithms where we are going tomake use of the following more explicit charac-terization that allows to determine drivers effi-ciently.

Lemma 1. For a point � �� letN

be the lowestdimensional power object that contains � and let�

be the dual Delaunay object ofN

. The driver of� is the point in

�closest to � . Furthermore, all

points in the interior of a power object have thesame driver. ]^

But most importantly knowing the driver of apoint � allows to compute the direction of steep-est ascent of ( at � .Lemma 2. For any regular point �< �� let U��3�� bethe driver of � . The steepest ascent of the distancefunction ( at � is in the direction of � ��U��3�� . ]^

The unit vector field � � �� assigns toevery point in �< �� the direction of the steepestascent of ( at � , i.e.

��3�� U��3��4�� U��3��O� if �� U��3�� and I otherwise.

This vector field leads us to the main topic of ourstudy, the flow associated with the vector field � .The flow associated with � is a function

� �� I%��

such that its right derivative at every point �< ��satisfies the following equation

� , * �� 4� �� Z��

�Q�� 4#We often refer to the first argument of the flow astime.

An important notion associated with a flow isthat of a fixpoint. A point � �� is called a fix-point of

�if�� H� for all � CMI . It is not difficult

to see that the fixpoints of�

are exactly the crit-ical points of the distance function ( . Because ofthis observation we want to refer to a fixpoint of�

as a minimum, saddle or maximum if the cor-responding critical point of the height function isa minimum, saddle or maximum, respectively.

With the notion of a fixpoint at hand we canalso give an explicit description of the flow

�. For

all fixpoints � of�

we have of course

� ��4� ��=�5�� I��'��#Otherwise let U��3�� be the driver of � and � be theray originating at � and shooting in the direction�� . Let � be the first point on � whose driveris different from U��3�� . Note that such a � neednot exist in �� if � is contained in an unboundedpower object. In this case let � be the point atinfinity in the direction of � . We set:

� ��4�� H��! "��3�� I�� Q�B�For � C �� the flow is given as follows:

� ��4�� H�B� � � �#��B� � � �"�� H�B� � � �"� � � �B� � � �"�2��

If we fix the second argument of the flow�

tobe some point �< � � then we get the orbit or flowline of � , i.e. a function

�� I%�� $�&%� � ��4��4#The orbit describes the motion of � under the flow�. One can show that the orbits of regular points

are piecewise linear curves.The flow line of a regular point either connects

it with some fixpoint of�

or it leaves any compactsubset of �� in finite time. For every fixpoint � of�

we collect all points in �� that flow into � andcall the resulting set ' �� the stable manifold of� , i.e.

' �3��:9)( � � � � ,/* �*,+ ��- �� )>@#

Up to degeneracies the stable manifolds of thefixpoints of

�build a three dimensional cell com-

plex. We call this cell complex the weighted flowcomplex.

3 The cells of the weighted flowcomplex

In this section we have a closer look on the cellsof the weighted flow complex. These cells are thestable manifolds of fixpoints of the flow derivedfrom the unit vector field induced by a set ofweighted points. It turns out that the index of afixpoint, i.e. the index of the corresponding crit-ical point of the distance function, gives the di-mension of its stable manifold.


0-cells

From Theorem 1 we know that the local min-ima of the flow are weighted points that are con-tained in their dual power cell. In contrast to un-weighted case not every weighted point is con-tained in its dual power cell. It can even happenthat the dual power cell of a weighted point isempty. But it is algorithmically easy to check if aweighted point is contained in its dual power cellprovided one has already computed the power-and the Delaunay diagram.

As one expects from a local minimum � nopoint besides � itself flows into it. That is, the sta-ble manifold of � contains just � itself. The stablemanifolds of the local minima are the 0-cells ofthe weighted flow complex. Thus the 0-cells ofthe weighted flow complex are a subset of the setof weighted points.

1-cells

Up to degeneracies the stable manifolds of theindex 1 saddle points are the 1-cells of the flowcomplex. In fact, we can prove the followinglemma.

Lemma 3. Let � be an index 1 saddle of�. If the

stable manifold ' �� of � does not contain a pointon a power edge then the closure of ' �3�� is a sim-ple piecewise linear curve whose endpoints are lo-cal minima of

�. ]^

Note that the situations where ' �3�� does con-tain a point on a power edge are really degeneratein the sense that such a situation is not stableunder small perturbations of the weighted pointsthat determine

�.

Probably the easiest way to describe the stablemanifold of an index 1 saddle point is to give analgorithm to compute it. For the description ofthe algorithm we assume that the power- andthe Delaunay diagram of have already beencomputed such that we can query them.

ONECELL( Index-1-saddle � )1

N��

2 �� Delaunay edge that contains � .3 � � � 9 � �� >4 while � �� do5 choose ( U�� '9 ( U%> .6 (� � � first intersection point of the segment

from ( to U with a power facet � thatintersects (2U in only one point;or U if such a point does not exist.

7N � �

N��9 (� �>��

�9)( (� >

8 if (� !�� U do9 UZU� � � Delaunay edge dual to � .

10 � � ��9)(� /U� �>

11 end if12 end while

The algorithm ONECELL stores the vertices ofthe piecewise linear curve which is the stablemanifold of the index 1 saddle point � in a setN

. The edges of this curve are stored in a set � .In line 1 both sets sets are initialized with theempty set. From Theorem 1 we know that the in-dex 1 saddle point � is the intersection point of aDelaunay edge �� with its dual power facet. Wecompute this Delaunay edge in line 2 and initial-ize in line 3 a set � with the two line segments� � and �� that connect � with the endpoints ofthe Delaunay edge. We will always maintain theinvariant that the second endpoint of a line seg-ment stored in � is a Delaunay vertex. Lines 4and 12 enclose the main loop of the algorithm.While the set � is not empty we choose in line 5a line segment (2U �� and remove it from � . Weknow from Lemma 1 that the driver along a linesegment can only change at the boundary of apower object. In line 6 we determine the point (� where the driver along the line segment ( U mightchange. In line 7 we include this point in our ver-tex set

Nand the edge ( (� in the edge set � . If (�

equals U then we have reached a local minimumof the flow. That is, the stable manifold of � endshere. Otherwise we have to determine the driverof the point (� . This driver is the endpoint U� ofthe Delaunay edge UZU� dual to the power facet � .In line 10 we include the line segment (� /U� into �for further processing.

Note that the algorithm ONECELL does nottreat the degenerate case. We have shown in [8]for flow diagrams in two dimensions how to dealexplicitly with degenerate situations. In our im-plementation we use explicit perturbations.

The description of the algorithm ONECELL

shows that the stable manifold of an index 1 sad-dle point in general is a polygonal arc with manyvertices. That is in contrast to the unweightedcase where the stable manifolds of index 1 sad-dles are exactly the Gabriel edges, i.e. straightline segments. In fact, one can show that the sta-ble manifold of an index 1 saddle point can haveup to �-�;A �A � vertices.

2-cells

For the 2-cells of the flow complex we have alemma very similar to Lemma 3. This lemmastates that up to degeneracies the stable mani-folds of the index 2 saddle points are the 2-cellsof the flow complex.

Lemma 4. Let � be an index 2 saddle of�. If the

stable manifold ' �3�� of � does not contain a power


Fig. 1. On the left a set of positively weighted points in��shown as balls. On the right: The two local minima

and the saddle point of the point set from the left.

vertex then the closure of �� is a piecewise lin-ear surface with boundary. The boundary of thissurface is made up from 1-cells. �

As for the 1-cells the degenerate situation that�� does contain a power vertex is not stable un-der small perturbations of the point set � .

Again we want to turn to an algorithm todescribe the 2-cells explicitly. We assume thatthe power- and the Delaunay diagram of � havealready been computed.

TWOCELL( Index-2-saddle � )1 ��2 INFLOWEDGE( x )3 return

The algorithm TWOCELL initializes in line 1a set that is going to store all the trianglesthat will make up the stable manifold �� of theindex 2 saddle � . The set is a global variablethat can also be accessed by the subroutinesINFLOWEDGE and INFLOWFACET. In line 2 thealgorithm just calls a subroutine INFLOWEDGE

which we are going to describe next.

INFLOWEDGE( Point-on-a-power-edge � )1 �� Delaunay facet dual to the power edge

that contains � .2 for each Delaunay edge �� incident to �

whose endpoints lie on differentsides of its dual power facet � do

3 �� intersection of the triangle �!�!�� withthe power facet � .

4 "#�� endpoint of � different from � .

5 INFLOWFACET( �$%"&$'� )6 INFLOWFACET( �$%"&$'�!� )7 if " is not a saddle of index ( do8 INFLOWEDGE( " )9 end if

10 end for

The subroutine INFLOWEDGE recursively calls(and is called by) another subroutine INFLOW-FACET which reads in pseudocode as follows.

INFLOWFACET( Point � , Point � � , Driver � )1 )*� � power cell dual to � .2 +�� intersection of ) with triangle ��,�-�3 .� �/ 10�23+544 if � is not a local minimum do5 for each power facet � incident to ) that

is intersected by the triangle�!�!�6� in a line segment"�"7�98�1�!�� do

6 �!�!�:� � Delaunay edge dual to � .7 INFLOWFACET( ��"&$%";�<$'�!�6� )8 end for9 for each power edge = incident to ) that

is intersected by the triangle�!�!�6� in a point ">8�?�$%�� do

10 if " is not a saddle of index ( do11 INFLOWEDGE( " )12 end if13 end for14 end if

We are now going to describe the subroutinesINFLOWEDGE and INFLOWFACET in detail.

The subroutine INFLOWEDGE computes the setof points that flow into a point � which has to becontained in a power edge. See Figure 2 for anillustration.

v

s

f

d’

d

w

g

Fig. 2. Geometric objects that occur in the definition ofthe subroutine INFLOWEDGE


All the drivers of the points that flow into � arecontained in the Delaunay facet � dual to thepower edge that contains � . In line 1 we computethis Delaunay facet. Only a Delaunay edge UZU incident to � whose endpoints U and U� lie ondifferent sides of its dual power facet � can yielda two dimensional flow into � . That is, the edgeUZU� contains a driver that drives the points on acertain line segment � into � , but all the pointsthat flow into this line segment � eventually alsoflow into � . The line segment � is the intersectionof the triangle �ZUZU with the power facet � . Wecompute the line segment � in line 3 inside theloop enclosed by the lines 2 and 10 which iter-ates over all such Delaunay edges U@U� . Let � bethe second endpoint of � besides � . That is, � isthe point on � farthest away from � . We compute� in line 4. The flow into the line segment � iseither driven by U or U� or it flows into � via � .The first case is handled by the recursive callsof the subroutine INFLOWFACET in lines 5 and6. The second case is handled by the recursivecall of INFLOWEDGE in line 8 provided � isnot an index 1 saddle point. In the latter casewe have already reached the boundary of thestable manifold ' �3�� of the index 2 saddle point � .

The subroutine INFLOWEDGE computes the setof points that are driven by a Delaunay vertex �into a line segment � �� . The line segment � �� hasto be contained in a power facet. See Figure 3 foran illustration.

c

v

v’

w’

d

p

d’

w

f

Fig. 3. Geometric objects that occur in the definition ofthe subroutine INFLOWFACET

Let � be the power cell dual to U . All the pointsin the intersection of � with the triangle � �� U flowinto the segment � �� . In line 1 we determine thepower cell � and in line 2 its intersection � withthe triangle � �� U . We add � or a triangulation ofit to � in line 3. If U is a local minimum of theflow then � is exactly the triangle � �� /U and noother points flow into � �� . Otherwise more points

flow into � �� . These points can only flow into � �� through the intersection of the boundary of � withthe triangle � �� U . This intersection can be decom-posed into a collection of line segments on powerfacets and points on power edges. Since we as-sume non-degeneracy the triangle � �� U cannot in-tersect the boundary of � in a power vertex. Asfor the computation of the 1-cells in our imple-mentation we deal with degeneracies by explicitlyperturbing the point set . The line segments arethe intersections of the triangle � �� U with powerfacets � in the boundary of the power cell � . Wetake care of these line segments � � in the loopenclosed by the lines 5 and 8 by calling INFLOW-FACET recursively with input � � � � �;U� � in line 7.To do so we have determined U� in line 6 as thesecond endpoint of the Delaunay edge U@U� of apower facet � incident to the power cell � . In theloop enclosed by the lines 9 and 11 we take careof the points � in the intersection of the trian-gle � �� U with power edges in the boundary of thepower cell � . If � is not an index 1 saddle pointwe call the INFLOWEDGE recursively with � asits argument. Otherwise, i.e. � is an index 1 sad-dle point, we have already reached the boundaryof the stable manifold ' �� of the index 2 saddlepoint � .

Fig. 4. On the left: A 2-cell of the weighted flow com-plex. On the right: The 2-skeleton of a weighted flowcomplex

3-cells

We are not going to compute the 3-cells ofthe flow complex explicitly but only indirectlythrough the simplicial complex made up from all0-, 1- and 2-cells. Lets call this complex the 2-skeleton of the flow complex. With similar tech-niques as in the unweighted case one can provethe following theorem.

Theorem 2. In case that there are no degenera-cies the stable manifolds of the local maxima of a


flow�

are exactly the bounded regions of the 2-skeleton of the flow complex. ]^

That is, the 3-cells of the flow complex are ex-actly the stable manifolds of the local maxima ofthe flow

�.

4 Applications in bio-geometricmodeling

In this section we are going to demonstrate howthe weighted flow complex can be applied in mod-eling certain properties of macromolecules. Todo so we need at first a geometric model of themolecule at hand. A natural model is a union ofballs in �� where each ball in the union repre-sents an atom of the molecule. Such models arecalled space filling models [5, 12]. A ball is char-acterized by a pair �&�%��! ��& � I%�� , where �- ��denotes the center of the ball and � denotes itsradius. That is, a ball can be seen as a weightedpoint with positive weight. In space filling mod-els the balls are centered at the locations of thecorresponding atoms. Usually one gets these lo-cations from X-ray diffraction of the crystallizedprotein. The two most popular space filling mod-els differ in the radii they assign to the balls.

(1) In the van der Waals model the radius of aball is the van der Waals radius of the corre-sponding atom.

(2) In the solvent accessible model the radius ofa ball is the van der Waals radius of the cor-responding atom plus the radius of some sol-vent molecule also modeled as a ball. That is,the balls in the solvent accessible model arealways larger than the corresponding ballsin the van der Waals model. The solventis frequently taken to be a water molecule,modeled as a ball with radius radius 1.4 A(Angstrom).

We want to discuss two applications. The firstapplication aims for decomposing a molecule.Macromolecules are often a collection of pro-teins that are only loosely coupled. We want touse the 1-skeleton of the weighted flow complexto decompose a macromolecule in these con-stituents. In the second application we are go-ing to rephrase the concept of pockets in macro-molecules as developed by Edelsbrunner et al. [6]in terms of the 2-skeleton of the weighted flowcomplex.

In the following we assume that we are givena space filling model of macromolecule as in-put as a set of positively weighted points. Ourapproach can handle both types of space fill-ing models as input, but the solvent accessible

Fig. 5. Space filling models of some Neurotoxin pro-tein. On the the left: Van der Waals model. On theright: Solvent accessible model.

seems to be more reasonable from a chemicalperspective. The points in � model the atoms ofthe macromolecule and their weights the radii ofthese atoms. We say that a point � is containedin a space filling model if � is contained in theunion of balls that correspond to the points in � .We derive both the distance function � and theflow � from the point set � .

Decomposing macromolecules

When we try to decompose a macromolecule inits constituents we essentially try to predict thebonds between the atoms of the molecule. Thereshould not exist bonds between the different con-stituents of the macromolecule. Bader [2] has de-veloped a theory based on the gradient vectorfield of the charge density to predict bonds. Hewrites in his book,

The topology of � , the charge density, asdisplayed in the global properties of itsgradient vector field, yields a faithful map-ping of the chemical concepts of atoms,bonds, and structure.

The global properties mentioned in this citationare for example the critical points of the chargedensity. Bonds in this theory are represented asindex 2 saddle points. If we assume that the dis-tance function � approximates ��,� , i.e. the nega-tive of the charge density, to some extent then weshould be able to derive an approximation of thebond structure from the index 1 saddle points of� .

The collection of all 1-cells of the weighted flowcomplex, i.e. the collection of the stable mani-folds of the index 1 saddle points, forms the 1-skeleton of the flow complex. One can show thatthe 1-skeleton is connected. That is, we cannotdecompose a macromolecule by directly usingthe 1-skeleton of the flow complex induced by � .


Instead we are going to use a filtration of the 1-skeleton.

The distance function ( of course also takesvalues at the index 1 saddle points. We assignthese values to the corresponding 1-cells of theweighted flow complex. Let '�� be the subset ofthe 1-skeleton that contains all 1-cells whose as-signed value is smaller than or equal to � . Sincewe have

' �� '�� if � S� �these sets gives us a filtration of the 1-skeleton.

We found that setting � to -8 A (Angstrom)decomposes many macromolecules in its con-stituents if the input was a solvent accessiblemodel with solvent radius 1.4 A. See Figure 6for an example. If the input was a van der Waalsmodel a value of -1.2 A works well.

Fig. 6. Decomposition of a Rhinovirus into con-stituents. One constituent is highlighted.

Pockets in macromolecules

As we mentioned earlier pockets were intro-duced in [6] to model essential cavities in macro-molecules. Their definition is based on a spacefilling model of the molecule. Again let be theset of positively weighted points that model theatoms in the space filling model. We define a voidas a compact connected region in the comple-ment of the space filling model inside the con-vex hull of , i.e. compact regions in the convexhull of not occupied by the balls that modelthe atoms. If we let the weights of the points in grow new voids are created and existing ones getdestroyed. Finally all voids get destroyed. In [6]

the following growth model for the weight � of apoint �� was chosen:

�2�� for all � CHIThe weights are now a function of time. See Fig-ure 7 for two snapshots of such a growth processfor a set of weighted points in the plane. Note thatthe growth process does not change the power-and the Delaunay diagram.

Fig. 7. On the left: A space filling model of a moleculemade up from six atoms in two dimensions. The atomsof this molecule do not define a void. On the right: Ifwe grow the disks a void emerges which gets destroyedif we grow the disks further. The point at � is a maxi-mum, i.e. at � another void which was created earlierhas already been destroyed.

While the weights grow a void shrinks untilit finally consists of only a single point beforeit vanishes. For reasons that will become obvi-ous later we call these points positive maxima.We group positive maxima together if they canbe connected by a path in the complement of thespace filling model inside the convex hull of ,i.e. in the intersection of the convex hull of with the complement of the space occupied bythe balls that model the atoms. Essentially, thesegroups of positive maxima define a pocket. Edels-brunner et al. assign a shape to a pocket. Ourapproach to assign a shape to a pocket is basedon the weighted flow complex induced by . It isclosely related to the definition of voids and posi-tive maxima as we have presented them here, butputs these definitions in the context of a moregeneral theory.

We partition the set of critical points of the flowinduced by the balls in into two sets.

We partition the set of critical points of the flowinduced by the balls in into two sets. The firstset contains all critical points that are containedin the space filling model of the molecule. Thecritical points in the second set are all criticalpoints not contained in the first set, i.e. they re-


Fig. 8. On the left: The critical points of the flow in-duced by the points from Figure 7. The power diagramof the points is shown with dashed lines and the De-launay diagram with solid lines. The local maxima aredenoted as � , the saddles as � and the local minimaas � . Note that two of the maxima are at positionswhere voids get destroyed if we grow the disks. On theright: The 1-skeleton of the corresponding weightedflow complex.

side outside the space filling model. From the def-inition of the distance function ( it follows that (takes non-positive values on the first set of criti-cal points and positive values on the second set.That is why we call the critical points from thefirst set negative and the points from the secondset positive. The positive maxima that we usedwhen we introduced pockets are exactly positivemaxima as defined here. For a formal definitionof pockets we introduce the restricted flow com-plex as the subcomplex of the flow complex thatcontains all cells that correspond to positive crit-ical points. See Figure 9 for an example.

Fig. 9. On the left: An example of a restricted flow com-plex. This complex contains the stable manifolds of twosaddle points � and two local maxima � . It defines onepocket. At both maxima voids get destroyed if we growthe disks. On the right: The boundary (light blue) ofthe pocket on the left contains the stable manifolds ofthree saddles � . The mouth (gray) of the same pocketis the stable manifold of one saddle.

From the restricted flow complex we directlyderive our definition of a pocket.

Pocket. A pocket � is a maximal connectedcomponent of the restricted flow complex, i.e.the component is connected and there is nolarger connected component in the restrictedflow complex that contains it. Thus a pocket is acollection of cells corresponding to some subsetof positive critical points.

To visualize pockets we make use of the cellstructure of the weighted flow complex. We calla negative critical point a boundary critical pointof a pocket � if its corresponding cell in theweighted flow complex is contained in the bound-ary of a cell that corresponds to a positive criticalpoint in � . Note that the cells that correspond toboundary critical points all have to be containedin the 2-skeleton of the weighted flow complex.Instead of visualizing a pocket directly we will vi-sualize its boundary, i.e. the complex made upfrom the cells corresponding to its boundary crit-ical points. See Figure 9 for an example.

One can distinguish three types of pockets ac-cording to their number of openings to the out-side of the protein. The outside is defined as thestable manifold of a virtual maximum at infinity,i.e. the closure of the set of all points that flowto infinity under the flow

�induced by . We call

a positive critical point an opening critical point ifits stable manifold is contained in the boundaryof the stable manifold of the maximum at infinity.We call a maximal connected component in thesubcomplex of the restricted flow complex thatis build by the cells corresponding to the open-ing critical points a mouth. We use the number ofmouths of a pocket to classify pockets. A pocketis called,

(1) a void, if it is not incident to any mouth.A void is an unaccessible cavity, i.e. solventmolecules from outside the protein cannotenter a void. Voids sometimes have a stabi-lizing functionality for a protein.

(2) a normal pocket, if it is incident to exactly onemouth.

(3) a tunnel, if it is incident to more than onemouth. Proteins who function as an ion pumptypically contain a large tunnel.

5 Implementation and results

We implemented the algorithms ONECELL andTWOCELL using C++ and the Computational Ge-ometry Algorithms Library CGAL [4] which pro-vides fast and robust weighted Delaunay trian-gulations in three dimensions through its regulartriangulation package.


We tested our implementations on proteindatasets that we retrieved from the Protein Databank [3]. All tests were performed on a 480Mhz Sun Ultra Sparc II. Screenshots of some ofthe computed results can be found in Figures 6and 10. We used Geomview [7] for the renderingof the models. All data were computed from sol-vent accessible models of the molecules.

In Table 1 we summarize additional data of theshown molecules. It is interesting to note that inour experiments the runtime per atom is almostconstant, i.e. here it seems to be independent ofthe total number of atoms in the model.

Protein pdb key Atoms Time ms/AtomsNEUROTOXIN 1nxb 543 0.8 1.47GRAMICIDIN 1alz 850 1.7 2.00

OXYGEN 1mbd 1666 2.8 1.681XTHROMBIN 1ppb 2819 4.9 1.74RHINOVIRUS 4rhv 6542 11.6 1.772XTHROMBIN 1aho 6901 12.0 1.74

RECEPTOR a7gg 8580 15.2 1.77ALPHA-TOXIN 7ahl 22778 41.5 1.82NITROGENASE 1n2c 24432 43.5 1.78

Table 1. Basic data for several molecules. With the pdbkey molecule data can be retrieved from the ProteinData bank [3]. The timings are given in seconds andthe processing rate is given in milliseconds per atom.

References

1. F. Aurenhammer and R. Klein. Voronoi Dia-grams. In Handbook of Computational Geometry,J.-R. Sack and J. Urrutia (eds.), pp. 201–290, El-sevier (2000)

2. R.F.W. Bader. Atoms in molecules: a quantum the-ory. Clarendon Press (1990)

3. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland,T.N. Bhat, H. Weissig, I.N. Shindyalov and P.E.Bourne. The Protein Data Bank. Nucleic AcidsRes., 28, pp. 235–242 (2000)http://www.rcsb.org/pdb/

4. Computational Geometry Algorithms Libraryhttp://www.cgal.org

5. T.E. Creighton. Proteins: Structures and MolecularProperties. Second Edition, Freeman, New York,(1993)

6. H. Edelsbrunner, M. A. Facello and J. Liang. Onthe definition and the construction of pockets inmacromolecules. Discrete Apl. Math. 88, pp. 83–102, (1998)

7. http://www.geomview.org8. J. Giesen and M. John. A new diagram from disks

in the plane. In Proc. 19th Symp. Theoretical As-pects of Computer Science, pp. 238–249, (2002)

9. J. Giesen and M. John. Surface reconstructionbased on a dynamical system. Computer GraphicsForum 21(3), pp. 363–371, (2002)

10. J. Giesen and M. John. The flow complex: A dataStructure for Geometric Modeling. In Proc. 14thACM-SIAM Symp. Discrete Algorithms, pp. 285–294, (2003)

11. K. Grove. Critical Point Theory for Distance Func-tions. In Proceedings of Symposia in Pure Mathe-matics 54(3), pp. 357–385, (1993)

12. F.M. Richards. Areas, volumes, packing and pro-tein structures. Ann. Rev. Biophys. Bioeng., 6, pp.151–176, (1977)


(a) A decompositionof Alpha-toxin. Differ-ent protein chains arecolored differently.

(b) A sideview of (a). Onlyone protein chain is em-phasized.

(c) A zoom into thedecomposition of Nitro-genase. On the top,two nicely separated ATPmolecules.

(d) A tunnel in Alpha-toxin. No mouths areshown.

(e) A sideview of the tun-nel shown in (d).

(f) The tunnel shown in(d) and supporting nor-mal pockets. No mouthsare shown.

(g) Two Thrombin pro-teins bind at their activesites which gives riseto a tunnel. The threemouths of the tunnel areemphasized.

(h) A normal pocket inNeurotoxin. The mouthis emphasized.

(i) A space filling modelof Neurotoxin. Atomsat the normal pocketshown in (h) are high-lighted.

Fig. 10. Decompositions and pockets of some macromolecules

computing the weighted ow complex - computer graphicscomputing the weighted ow complex joachim...

Documents