ON SAMPLING AND RECONSTRUCTION OF DISTANCE FIELDS
By
NITHIN PRADEEP THAZHEVEETTIL
A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
UNIVERSITY OF FLORIDA
2011
© 2011 Nithin Pradeep Thazheveettil
I dedicate this to my parents.
ACKNOWLEDGMENTS
I would like to thank my advisor, Dr. Alireza Entezari, for all his support and guidance,
without which this thesis would not have been possible. I would also like to thank my
thesis committee members, Dr. Jörg Peters and Dr. Anand Rangarajan, for their help
and suggestions. Finally, I thank my family and friends, whose constant support kept me
going when all else failed.
TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION
1.1 Motivation
1.2 Body Centered Cubic Sampling Lattice
1.3 Compute Unified Device Architecture
1.4 Contributions

2 RELATED WORK
2.1 Sampling Distance Fields
2.2 Reconstruction of Volumetric Data
2.3 Sampling and Reconstruction on Body Centered Cubic Lattice

3 SAMPLING DISTANCE FIELDS
3.1 Brute Force Method
3.2 Point-Triangle Distance
3.3 Graphics Processing Unit Implementation
3.4 Generating True Gradients
3.5 Performance

4 CUBIC INTERPOLATION IN BODY CENTERED CUBIC LATTICE USING HERMITE DATA
4.1 Cubic Interpolation in Body Centered Cubic Lattice
4.1.1 Interpolating Splines
4.1.2 Smoothness And Approximation Order
4.1.3 Isotropic Finite-Differences on the Lattice
4.2 Tricubic Interpolation in Cartesian Cubic Lattice

5 INTERPOLATION OF DISTANCE FIELDS AND EXPERIMENTS

6 CONCLUSION AND FUTURE WORK

REFERENCES

BIOGRAPHICAL SKETCH
LIST OF TABLES

3-1 Triangular meshes used for performance testing
3-2 Comparison of execution times. The time taken by the GPU implementation is shown in the column 'GPU' and that by the CPU implementation is shown in the column 'CPU'.
4-1 The L2 norm error in reconstruction of datasets sampled at a resolution of 111×111×222 on the BCC lattice using the proposed cubic interpolation. The weighted least-squares approach is designed to use a zero-mean Gaussian with σ² = 0.5 to estimate partial derivative values, which show marginal improvements in terms of reconstruction error.
LIST OF FIGURES

1-1 The BCC Lattice.
1-2 Hierarchy of threads, warps, blocks and grids.
1-3 CUDA memory model.
3-1 The st-plane partitioned into 7 regions.
3-2 Block diagram of the GPU implementation.
3-3 Two ways of adapting the GPU implementation for BCC lattices.
4-1 The Carp fish dataset.
4-2 Weights of the finite-differencing kernel on the 27-point neighborhood of the BCC lattice for $f_x$ and $f_{yz}$.
5-1 The triangular meshes used in our experiments.
5-2 The soccer ball dataset with approximately 512,000 samples.
5-3 The Stanford dragon dataset (side view) with approximately 512,000 samples.
5-4 The Stanford dragon dataset (front view) with approximately 512,000 samples.
5-5 The buddha dataset with approximately 512,000 samples.
5-6 The bunny dataset with approximately 615,000 samples.
5-7 The Pawn dataset with approximately 33,000 samples.
5-8 The Pawn dataset with approximately 200,000 samples.
5-9 The Pawn dataset with approximately 260,000 samples.
5-10 The Pawn dataset with approximately 512,000 samples.
5-11 The Pawn dataset with approximately 2,095,000 samples.
Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science
ON SAMPLING AND RECONSTRUCTION OF DISTANCE FIELDS
By
Nithin Pradeep Thazheveettil
May 2011
Chair: Alireza Entezari
Major: Computer Engineering
In this thesis, we examine sampling and reconstruction of distance fields from
surfaces represented by triangular meshes. Motivated by sampling theory, we
explore the application of optimal sampling lattices in this context. To sample exact
distance values from a triangular mesh, we propose a Graphics Processing Unit (GPU)
implementation of the brute force approach and show how to adapt it to the Body
Centered Cubic (BCC) lattice. The exact gradients of distance fields can be computed
with relative ease and we believe that incorporating these values could improve
the quality of reconstruction of discrete distance fields. Hence, we discuss ways of
modifying our implementation to sample exact gradients with relatively few additional
computations. The suitability of BCC as a sampling lattice for distance fields and the
merits of using exact gradient data are evaluated by reconstructing and visualizing
distance fields sampled from various triangular meshes. To reconstruct the data on
BCC lattices, we introduce a cubic spline construction that is exactly interpolating at the
lattice points and can utilize true gradient values where available. We also compare and
contrast the images rendered from these datasets to those rendered using Catmull-Rom
interpolation on distance fields sampled on Cartesian Cubic (CC) lattices.
CHAPTER 1
INTRODUCTION
1.1 Motivation
A distance field of an object is a scalar field around the object in which every point
holds the shortest distance from that point to the surface of the object.1 It can be
considered as an implicit representation of the object. On the surface of the object, the
field will have a value of 0, providing an implicit representation of the object as the zero
level set of the field. In addition to the distances themselves, distance fields can also
indicate other properties of a given point, such as the direction from that point to the
surface and whether the point is inside or outside the object. The gradient of the field at
any point gives the direction from that point to the closest point on the surface. Distance
fields can be signed or unsigned. In a signed distance field, every point is assigned a
sign in addition to the scalar value. This sign indicates whether the point is inside or
outside the object. Note that for this to be meaningful, the surface must be closed and
orientable, while distance fields in general do not require this property.
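As a concrete illustration (an example of ours, not from the original text): for a sphere of radius $r$ centered at $\mathbf{c}$, the signed distance field has the closed form

$$d(\mathbf{x}) = \|\mathbf{x} - \mathbf{c}\| - r, \qquad \nabla d(\mathbf{x}) = \frac{\mathbf{x} - \mathbf{c}}{\|\mathbf{x} - \mathbf{c}\|},$$

so that $d < 0$ inside, $d = 0$ on the surface (the zero level set), and the unit-length gradient points along the line to the closest surface point.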
Distance fields have applications in a variety of fields like computer vision, physics,
medical imaging and computer graphics. They are used for proximity computations [26, 37], collision detection [48, 49], morphing [5, 8], skeletal animation [21], path planning [29, 32] and constructive solid geometry (CSG) operations [6, 39]. In the field of volume
graphics, distance fields are used for modeling, manipulation and visualization of
geometric objects. Techniques of accelerating visualization, like skipping empty voxels
during ray tracing based on distance values, also make distance field representation
attractive. In this thesis, we are primarily concerned with using distance fields to
represent and visualize surfaces represented by triangular meshes.
1 Parts of Chapters 2, 4, 5 and 6 are adapted with permission from M. Mirzargar, N. Thazheveettil, W. Ye and A. Entezari. Cubic Interpolation On The Body Centered Cubic Lattice. Submitted to Transactions on Visualization and Computer Graphics.
Since distance fields of geometric models are almost always too complex to be
represented in analytical form, they are sampled and stored in discrete form. The
sample locations are usually chosen to be points of a regular grid though adaptive
schemes that are better suited for specific applications have been suggested as
well [2, 20]. Non-regular sampling schemes are attractive for the adaptivity features;
however, efficient reconstruction is difficult and expensive scattered-data interpolation
techniques have to be employed. Moreover, a uniform grid representation lends itself to
an easy way of analyzing signed distance functions in Fourier space through the Fast
Fourier Transform (FFT) of the sampled data. In 3D, the most commonly used uniform sampling
lattice is the Cartesian Cubic (CC) lattice, though other lattices like the Body Centered
Cubic (BCC) lattice and the Face Centered Cubic (FCC) lattice have been shown to be
more efficient from the sampling theoretic point of view. In Section 1.2, we discuss the
CC and BCC lattices and their sampling efficiencies.
Each sample point in a discrete distance field of a surface holds the minimum
distance from that point to the surface. Computing this distance involves identifying
the point on the surface that is closest to the sample point. Since this thesis focuses
on surfaces represented by triangular meshes ('surface' or 'object' hereafter refers
to a surface represented by a triangular mesh), this closest point can be identified by
identifying the triangle it is part of. Thus, a discrete distance field can be constructed
directly from the triangular mesh data using a brute force approach, i.e. computing the
distance from every sample point to every geometric primitive in the object. While this
method is accurate, it is also extremely slow, making it unsuitable for most applications.
Numerous other approaches have been suggested, mostly focusing more on speed than
accuracy. We discuss many of these approaches in Section 2.1. However, as there are
applications that require accurate values of distance fields, we try to address the issue
of generating accurate distance fields as fast as possible by pursuing techniques that
can accelerate the brute force method.
Distance fields so generated and the objects they represent are often consumed
visually. Visualization also helps in evaluating the quality of various discrete representations
of distance fields. Hence the topic of visualizing distance fields receives our attention
as well. Distance fields can either be visualized directly using a volume visualization
method like ray casting or be converted to a polygonal mesh using techniques like the
Marching Cubes algorithm (or the Marching Tetrahedra algorithm) and rendered using
a traditional surface rendering technique. In either case, the continuous distance field
must be reconstructed from the discrete values first, using some interpolation scheme.
While a wide variety of interpolation schemes could be used with distance fields, we
are particularly interested in those that make use of the availability of its true gradient
values, which can be computed along with the distance values themselves. Coming up
with such a scheme for the BCC lattice is another topic we address in this thesis.
The process of converting a polygonal mesh to a discrete distance field and back
almost always introduces errors in the mesh geometry. As the accuracy of this process
is often a key consideration in determining the applications it is suitable for, we devote
some attention to the accuracy of the sampling and visualization schemes we introduce
in this thesis.
The rest of this chapter is organized as follows. Section 1.2 talks about sampling
on the BCC lattice. Section 1.3 gives a brief introduction to the Compute Unified Device
Architecture (CUDA), and in Section 1.4 we list the major contributions of this thesis.
1.2 Body Centered Cubic Sampling Lattice
The goal of optimal sampling is to capture the entire spectrum of the underlying
signal using the least number of samples. For a specific given signal, there is a unique
choice of optimal sampling lattice, which defines the points in space where the signal
is sampled. This lattice can be computed based on the geometric knowledge of the
spectrum of the signal. But for generic data, where such knowledge is not available,
regular lattices are used for sampling. In 3D, the most commonly used regular sampling
lattice is the Cartesian Cubic (CC) lattice.
The CC lattice is formed by the tensor-product of uniform sampling in the lower
dimensions. However, while simple and popular, the CC lattice has been shown to
be inefficient in sampling generic multivariate signals. Both the Body Centered Cubic
(BCC) lattice and the Face Centered Cubic (FCC) lattice, which are 3D counterparts of the
hexagonal lattice, exhibit higher sampling efficiency (i.e. capture more information per
sample taken) than the CC lattice with the BCC lattice performing better on smooth
signals.
Figure 1-1. The BCC Lattice. (A) The BCC lattice is formed by adding a lattice point (in blue) to the center of each cubic element of the CC lattice. (B) The unique tetrahedralization of the BCC lattice is composed of the semi-regular congruent tetrahedra.
The BCC lattice can be constructed from the CC lattice by adding a lattice point
to the center of each cubic element formed by 8 neighboring lattice points (part (A) of
Figure 1-1). Its relative superiority in sampling generic signals can be explained by a
frequency domain analysis of generic signals which we assume to be isotropic (i.e. not
biased in any direction) and band limited. Sampling a signal in a regular manner in the
spatial domain corresponds to periodically replicating its spectrum in the frequency
domain. For the sampling to be optimal, it should be done such that the spectrum is
replicated as densely as possible without overlapping in the frequency domain. The
spectrum of an isotropic band limited signal has a spherical support. So, the optimal
sampling lattice is that whose dual in frequency domain allows spheres to be packed as
densely as possible without overlap. It can be seen that among the lattices discussed,
the FCC lattice enables the most optimal sphere packing. Hence, its dual in the spatial
domain, the BCC lattice, performs best as a sampling lattice.
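To put numbers on this (standard sphere-packing densities, quoted here for illustration, not from the original text), the fraction of space covered by equal spheres centered at the lattice points is

$$\rho_{\mathrm{CC}} = \frac{\pi}{6} \approx 0.524, \qquad \rho_{\mathrm{BCC}} = \frac{\sqrt{3}\,\pi}{8} \approx 0.680, \qquad \rho_{\mathrm{FCC}} = \frac{\pi}{3\sqrt{2}} \approx 0.740,$$

so an FCC arrangement of spectrum replicas in the frequency domain wastes the least room, which is why its spatial dual, the BCC lattice, is the preferred sampling lattice.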
Every lattice point in the lattice has a neighboring region where it is the only lattice
point. This region can be defined using its Delaunay diagram or, alternatively, its dual, the
Voronoi cells. A Voronoi cell of a lattice point is the set of points in its neighborhood
that are closest to it. For a given lattice point, the other lattice points whose Voronoi
cells share a face with its own Voronoi cell form its "first ring" of neighborhood. In
the case of the BCC lattice, this neighborhood forms a rhombic dodecahedron, which
can be decomposed into congruent Delaunay tetrahedra. As the space of BCC lattice
is composed of such rhombic dodecahedra, the entire space can be partitioned into
congruent Delaunay tetrahedra. The rhombic dodecahedron and its tetrahedral partition
are shown in part (B) of Figure 1-1.
It is worth noting here that the Delaunay tetrahedralization of the BCC lattice is
unique. If the tetrahedralization isn’t unique, as is the case with the CC lattice, the
reconstruction of the sampled signal could depend on the choice of the tetrahedra and
hence could result in arbitrary and inconsistent reconstruction.
1.3 Compute Unified Device Architecture
Compute Unified Device Architecture (CUDA) is a general purpose parallel
programming architecture developed by NVIDIA. It makes the massive parallel
processing capabilities of the graphics processing unit (GPU) accessible to the user
for general purpose computations. The CUDA architecture allows CUDA enabled
GPUs to be programmed using a high level programming language called C for CUDA
which is based on ANSI C with a few CUDA specific extensions. CUDA programs can
contain code segments targeted to run on the CPU (referred to as host) as well as code
segments targeted to run on the GPU (referred to as device). The code segments that
run on the device are written as C style functions called kernels. CUDA kernels, along
with accompanying host code, are written in .cu files which are compiled using the nvcc
tool. Additional host code can also be written in separate C/C++ files which must be
compiled separately and linked using a standard C/C++ compiler and linker.
When a kernel is launched, a user-specified number of CUDA threads are created
to execute the kernel. The number of threads to be created is specified in terms
of 3-dimensional arrays of threads called thread blocks. All threads within a block
can synchronize execution using a barrier primitive and share data through shared
memory. There is a limit on the number of threads that can be assigned to a block.
Multiple equally-shaped blocks are launched to generate the necessary number of
threads for the kernel. These blocks are arranged into a single 2-dimensional grid. The
number of threads assigned to a block (along with the memory resources each thread
requires) influences how well the processing power of the GPU is utilized, indicated by
a percentage value called occupancy. Thus, the number of threads should be carefully
chosen so as to maximize occupancy.
Threads are executed in groups of 32 called warps, in a Single Instruction Multiple
Thread (SIMT) manner. That is, at any given time, every thread in a warp executes
the same instruction. Parallelism is achieved by assigning different data units to each
thread. Thus, this is similar to the Single Instruction Multiple Data (SIMD) execution
model. The relationship between threads, warps, blocks and grids is shown in
Figure 1-2.
Figure 1-2. Hierarchy of threads, warps, blocks and grids.
Figure 1-3. CUDA memory model.
CUDA separates the host memory space from the device memory space. The
data that a kernel works with during execution must be in the device memory. A CUDA
enabled device has multiple types of memory as illustrated in Figure 1-3. Global,
constant and texture memory are accessible from the host and from every thread on the
device. Hence, data can be transferred directly from the host memory to these memory
spaces and accessed by any thread in the grid. All three are persistent (i.e. retain data)
across kernel launches by the same application. However, global memory can be written
to by the threads while constant and texture memory are read-only from the device.
Shared memory and registers are fast on-chip memory. Shared memory is allocated per
threadblock. Every thread in a threadblock has access to its shared memory but cannot
access the shared memory of other blocks. Registers are allocated per thread and are
used for local variables declared within the kernel and temporary variables used for
computations. The number of registers allocated to a thread depends on the number of
threads assigned to a block. Once the registers are used up, any additional variables are
stored in local memory, which is off-chip and considerably slower. Very large arrays and
data structures go into the local memory as well. Typically, data is loaded into registers
and shared memory before being processed to avoid large memory latencies.
For more detailed documentation of CUDA, please refer to the CUDA programming
guide [1].
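As a minimal illustration of the concepts above (a sketch of ours; the kernel and variable names are hypothetical, not from this thesis), the following program launches a grid of thread blocks, computes a global index per thread and moves data between host and device memory:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scaleKernel(float *data, float factor, int n)
{
    // Each thread derives its global index from its block and thread IDs.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)                 // guard threads past the end of the array
        data[idx] *= factor;
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *hostData = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) hostData[i] = 1.0f;

    float *devData;
    cudaMalloc(&devData, bytes);                           // global memory
    cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scaleKernel<<<blocks, threadsPerBlock>>>(devData, 2.0f, n);

    cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);
    printf("first element: %f\n", hostData[0]);            // prints 2.0

    cudaFree(devData);
    free(hostData);
    return 0;
}
```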
1.4 Contributions
The primary contributions of this thesis are as follows.
We introduce an accelerated, GPU based implementation of the brute force method
to construct a discrete signed distance field on a CC lattice from triangular meshes. We
present ways of adapting this to work with BCC lattices and also discuss computing the
true gradient values of the distance field at the lattice points.
We propose a local cubic interpolation scheme in the trivariate setting on the BCC
lattice that uses Hermite data (i.e. function values and partial derivatives). We also
briefly summarize a local tricubic interpolation scheme proposed by Lekien and Marsden
[31] that uses Hermite data on the CC lattice.
Finally, we generate distance fields for various triangular meshes on both CC and
BCC lattices using the sampling technique proposed in this thesis. Then, we compare
and contrast the results of their visualization using the two interpolation schemes
mentioned previously along with the well known Catmull-Rom scheme.
CHAPTER 2
RELATED WORK
2.1 Sampling Distance Fields
Jones et al. [27] present an excellent survey of the various methods used to
generate distance fields along with a comparison of their speed and accuracy. We
discuss a selection of those approaches that are relevant to the context of this thesis.
The naive approach to constructing a discrete 3D distance field from a triangular
mesh is to iterate through every grid point/voxel, compute the shortest distance from
it to every triangle in the mesh and store the minimum value. Payne and Toga in [41]
discuss this approach and suggest a few optimizations to accelerate it, like using
hierarchical bounding boxes in a tree structure to reduce the number of computations.
They also discuss an algorithm to find the shortest distance from a point to a triangle.
The Meshsweeper algorithm proposed by Gueziec [22] presents a dynamic algorithm
to find the shortest distance from a point to a polygonal mesh. It uses a hierarchy of
multilevel bounding boxes with the bounding boxes at each level completely enclosing
the mesh. These bounding regions are indexed into a priority queue based on the
minimum distance from the point to the region.
Mauch [34, 35] presented the Characteristic/Scan Conversion method that
computes a distance field around a polygonal mesh up to a certain distance using scan
conversion. The point on a triangular mesh closest to a voxel must lie on either the face
of a triangle, an edge or a vertex. For each of these features, the approach constructs
a polyhedron that holds all the points in 3D space that are closest to that feature, up to
a certain distance from the feature. These polyhedrons are similar to truncated Voronoi
regions and are called characteristics of the feature. Then, for each feature, only the
distance to the points within the corresponding polyhedron need be computed. The
polyhedrons are scan converted to determine the points inside them, and the distances
for the corresponding points are computed. Sigg et al. [46] improved this algorithm by
using the graphics hardware to scan convert slices of the grid. The distance values are
computed using a fragment program. As the slicing is done on the CPU, which can become
a bottleneck, the characteristics for the triangle face, edges and vertices are combined
to form a single polyhedron. Sud et al. [47] present another hardware based method
which exploits properties like connectivity and spatial coherence in Voronoi regions to
cull the number of primitives considered for distance field computations for each slice
and restrict the region of computation around each primitive. This reduces the number of
distance functions computed per slice.
A different approach to computing distance fields, which yields approximate values,
is Distance Transforms. Here, the distance values for a narrow band around the mesh
surface are first computed and then propagated through the rest of the volume. Mullikin
[38] discusses applying one particular distance transform called the Vector Distance
Transform to 3D images. In Vector Distance Transforms, the vector connecting a point
to the closest point on the object surface is computed along with the distance values
and these vectors are propagated to the neighboring voxels and used to compute the
distance values for them. Satherley and Jones [42] introduce a faster and more accurate
Vector Distance Transform and discuss how to generate distance fields using it. Breen
et al. [6] present a wavefront propagation technique to generate distance fields for
CSG models with sub-voxel accuracy. They compute the shortest distance and closest
surface point for a set of points in the narrow band and propagate them to the rest of
the volume using a Fast Marching Method [25, 44, 45, 52]. A critical review of various
Distance Transform methods can be found in Cuisenaire [12].
Finally, a CUDA based approach to computing adaptive distance fields has been
suggested recently [40]. Like our implementation, this approach also uses a GPU
implementation of the brute force method (the naive method described at the beginning
of this section) to compute distance fields. Each CUDA thread is assigned a mesh
element (triangle) and the sample points are fed to the GPU one after the other.
While each thread could be assigned a sample point instead of a mesh element, the
non-uniform nature of sampling grids prompts the authors to pick mesh elements as the
foci for parallelization. During each iteration, each thread computes the distance from
the input sample point to the triangle assigned to it. The shortest value among these
is then computed using a parallel reduction technique [24]. The sign of each distance
value is computed using the angle-weighted pseudonormal method [3]. Compared to a
single core CPU implementation of a kd-tree based nearest neighbor search algorithm,
the authors report speedups ranging from 10 to 65 for various meshes and sampling
resolutions.
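For illustration, a block-level parallel min-reduction in shared memory, in the spirit of the technique referenced above [24], might look like the following sketch (our code, assuming a power-of-two block size of 256; it is not the implementation of [40]):

```cuda
// Each block loads up to 256 distances into shared memory and halves the
// number of active threads each step until the block's minimum remains.
__global__ void minReduce(const float *dist, float *blockMin, int n)
{
    __shared__ float s[256];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // One value per thread; pad past the end with "infinity" (FLT_MAX).
    s[tid] = (idx < n) ? dist[idx] : 3.402823466e+38f;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            s[tid] = fminf(s[tid], s[tid + stride]);
        __syncthreads();
    }
    if (tid == 0)
        blockMin[blockIdx.x] = s[0];   // one partial minimum per block
}
```

A second pass (or a host-side loop) then reduces the per-block partial minima to a single value.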
2.2 Reconstruction of Volumetric Data
While a vast amount of literature is available on reconstruction of volumetric data,
we are primarily interested in approaches that make use of Hermite data, i.e. data
values and its exact derivative values, to achieve improved reconstruction. A few such
approaches are reviewed here.
Marching Cubes [33] is a well known algorithm for constructing a triangular mesh
representation of a surface from a grid based volumetric representation. Vertices of the
triangles are formed by the points of intersection of the surface of interest and the edges
of the grid. These intersection points are found using linear interpolation on the values at
the grid points. Then, for each cube in the grid, these vertices are connected according
to a predefined case table to form the triangles. One major drawback of this method is
the poor reconstruction of sharp features inside the grid cells.
Kobbelt et al. [30] propose an Extended Marching Cubes algorithm, along with
an enhanced distance field representation to improve the reconstruction of sharp
features. The enhanced representation involves sampling directed distance values in
x, y and z directions instead of just a scalar distance value. That is, at each sample
point, the distances to the closest surface point in each of positive x, y and z directions
are stored. For triangular meshes, these points can be found from the intersection of
the surface with the corresponding edge of the grid. During reconstruction, the surface
points obtained from these values will be more accurate than those obtained from linear
interpolation of the original scalar values. The Extended Marching Cubes algorithm tries
to identify cubes that hold sharp features (corners or edges). For this, at the points of
intersection of the surface and edge, the gradient of the distance field is sampled and
stored. As this point is on the surface, for triangular meshes, this gradient will be the
normal to the triangle at that point. During reconstruction, the opening angle of the cone
formed by these gradients is used to detect if the cube has a sharp feature. If it doesn’t,
the standard Marching Cubes table is used for the cube. If it does, the gradients are
used to construct tangents to the surface at the corresponding points and an additional
vertex is inserted at the intersection of these tangents. A triangle fan is then formed
connecting this vertex to all other vertices on the cube to try and approximate the
sharp feature. A post processing step of flipping edges is then applied to correct mesh
connectivity in the case of sharp edges.
Ju et al. [28] describe a method for contouring a signed grid that improves upon
the Extended Marching Cubes (EMC) algorithm. This approach uses an octree instead
of a uniform 3D grid and the edges of the octree’s leaves that have sign changes are
tagged using exact intersection and normal data. A Quadratic Error Function (QEF)
is formed for each leaf cube of the octree from the normal data. Then, for each cube
that exhibits a sign change, a vertex is placed at the minimizer of the QEF. This avoids
having to explicitly identify cubes that have sharp features. The QEF is chosen such
that the vertex that minimizes it best approximates the original geometry. Then, for each
edge exhibiting sign changes, the minimizer vertex of the cubes sharing the edge are
connected to generate a quad. Simplifications to the octree are also presented, which
avoid wasting space by collapsing leaf cubes that are homogeneous (i.e. have the
same sign for all vertices) and by forming QEFs for internal nodes.
The previous two approaches use Hermite data to construct polygonal mesh
representations of surfaces from their discrete implicit representation. Hermite data
can also be used in interpolating these discrete values more accurately to aid in the
visualization of the surfaces. Lekien and Marsden [31] present such an interpolation
scheme on the CC lattice. Their scheme locally approximates the sampled data using
tricubic splines that are interpolating. The Hermite data associated with the vertices of a
CC lattice is used to construct a linear system of equations from which the coefficients
of the interpolating polynomial are obtained. In reconstructing distance fields, the
availability of exact values of partial derivatives makes approaches using Hermite
data particularly attractive. The interpolation scheme we propose in this thesis is also
capable of making use of sampled derivative values where available.
2.3 Sampling and Reconstruction on Body Centered Cubic Lattice
Motivated by the sampling theoretic advantages of the BCC lattice (Section 1.2),
this thesis explores its application in sampling, and consequently reconstructing,
distance fields. In this section, we discuss some of the previous work carried out in
sampling and reconstructing volumetric data on the BCC lattice. Theußl et al. [50]
make a case for using BCC lattice in volume graphics by showing that BCC lattices
can achieve the same accuracy as CC lattices with 29.3% fewer samples (or, in turn,
samples on a BCC lattice retain about 30% more information than the same number
of samples on a CC lattice). They also demonstrate improved rendering rates on BCC
using a splatting technique adapted to BCC lattices. Entezari et al. [15, 16, 18] introduce
a set of box spline reconstruction schemes which are more suited to the geometry of
the BCC lattice and show how to obtain its optimal approximation order [17] using the
principle of quasi-interpolation [13]. Meng et al. [36] confirm that these advantages
significantly improve the visual quality of the visualization pipeline based on the BCC
sampling lattice. Finkbeiner et al. [19] have recently developed a GPU implementation
of a fast algorithm for convolution of BCC sampled data with the above-mentioned box
splines.
Based on the idea that the BCC lattice can be considered to be composed of
two overlapping CC lattices, Csebfalvi proposes a Gaussian reconstruction on BCC
using global prefiltering [9] and a prefiltered B-spline reconstruction scheme for
quasi-interpolation [10]. Decomposing the BCC lattice into two CC lattices allows for
efficient hardware implementations [11]. But this disregards the topological structure of
the lattice [23] and hence the neighborhood of each lattice point is distorted. Moreover,
neither the quasi-interpolation methods nor the box spline schemes (beyond the linear
$C^0$ case) are exactly interpolating. Our interpolation scheme addresses both these
shortcomings.
CHAPTER 3
SAMPLING DISTANCE FIELDS
This chapter discusses how the brute force method is used to sample accurate
distance fields from triangular meshes on CC and BCC lattices. It then describes
how the parallel processing capabilities of the GPU can be utilized to accelerate the
implementation of the algorithm. A method to obtain the true gradients of the distance
field at each lattice point is also discussed.
3.1 Brute Force Method
The simplest and most straightforward way to sample accurate distance fields from
triangular meshes is to employ the brute force method. Before we describe the brute
force method though, it is important to explain the notion of a lattice point. To sample
the distance field of a mesh, we overlay a lattice over the ambient space in which the
mesh lives. The samples are then taken at the points on this lattice, i.e. the lattice
points. Constructing a discrete distance field from a surface involves computing the
minimum distance from each lattice point to the surface. For a given lattice point, this
would be the distance from the lattice point to the point on the surface that is closest
to it. As triangular meshes are composed of a finite number of triangles, every point
on the surface falls on either the face of a triangle, an edge or a vertex. Hence, every
point on the surface can be considered to be part of one or more triangles. Then, the
task of computing the distance to the closest point on the surface can be simplified by
first finding the triangle that holds the closest point and then computing the minimum
distance to it. As the name implies, the brute force technique does this by computing the
minimum distance from each lattice point to every triangle in the mesh and storing the
minimum values corresponding to each lattice point.
To find the minimum distance from a lattice point to a triangle, we use a point-triangle
distance algorithm proposed by Eberly in [14]. This algorithm is described in Section 3.2.
Though simple and exact, the brute force technique is computationally very
expensive. It evaluates the distance between every possible lattice point-triangle
pairing, leading to a time complexity of O(mn), where m is the number of lattice points
and n the number of triangles. As a straightforward implementation of this method can
take a prohibitively long time, we propose an implementation that utilizes the parallel
processing capabilities of NVIDIA’s multi-core GPUs using the CUDA development
platform.
3.2 Point-Triangle Distance
This section gives a brief description of the Point-Triangle distance method
proposed by Eberly. To find the minimum distance between a point $P$ and a triangle $T$, $T$ is represented in the form

$$T(s, t) = B + sE_0 + tE_1 \quad \text{for } (s, t) \in D = \{(s, t) : s \in [0, 1],\, t \in [0, 1],\, s + t \le 1\},$$

where $B$ is one of the vertices of $T$, $E_0$ and $E_1$ are the vectors from $B$ to the other two vertices of $T$, and $s$ and $t$ are scalars. Each pair of values for $s$ and $t$ such that $(s, t) \in D$ describes a point that is on the triangle, i.e. on the face, edges or vertices of the triangle. When $(s, t) \notin D$, the point described is on the same plane as the triangle, but outside it. Our task, then, is to find the point on $T$ that is closest to $P$.

The squared distance between $P$ and any point on $T$ is given by

$$Q(s, t) = |T(s, t) - P|^2 = as^2 + 2bst + ct^2 + 2ds + 2et + f \quad \text{for } (s, t) \in D,$$

where $a = E_0 \cdot E_0$, $b = E_0 \cdot E_1$, $c = E_1 \cdot E_1$, $d = E_0 \cdot (B - P)$, $e = E_1 \cdot (B - P)$, and $f = (B - P) \cdot (B - P)$.
Then, the point on $T$ that is closest to $P$ is obtained by minimizing $Q$ over $D$. The
minimum can occur in any of the 7 regions shown in Figure 3-1. If the minimum
occurs in any of regions 1–6, the corresponding point on the boundary of the triangle is
computed.
Refer to [14] for a more detailed description of the algorithm and its implementation.
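The following sketch (ours) computes the same minimum squared distance; instead of Eberly's explicit seven-region case analysis, it tests the unconstrained minimizer of $Q$ and otherwise falls back to the three edges, which is equivalent for non-degenerate triangles:

```cuda
struct Vec3 { float x, y, z; };

__host__ __device__ inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
__host__ __device__ inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x+b.x, a.y+b.y, a.z+b.z}; }
__host__ __device__ inline Vec3 operator*(float s, Vec3 a) { return {s*a.x, s*a.y, s*a.z}; }
__host__ __device__ inline float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Squared distance from P to the segment A + u*(B - A), u in [0, 1].
__host__ __device__ float segDist2(Vec3 P, Vec3 A, Vec3 B)
{
    Vec3 AB = B - A;
    float u = dot(P - A, AB) / dot(AB, AB);  // assumes a non-degenerate edge
    u = fminf(fmaxf(u, 0.0f), 1.0f);         // clamp onto the segment
    Vec3 C = A + u * AB;
    return dot(P - C, P - C);
}

// Minimum squared distance from P to the triangle T(s,t) = B + s*E0 + t*E1.
__host__ __device__ float pointTriDist2(Vec3 P, Vec3 B, Vec3 E0, Vec3 E1)
{
    float a = dot(E0, E0), b = dot(E0, E1), c = dot(E1, E1);
    float d = dot(E0, B - P), e = dot(E1, B - P);

    // Unconstrained minimizer of Q(s,t): region 0 of Figure 3-1.
    float det = a * c - b * b;
    float s = (b * e - c * d) / det;
    float t = (b * d - a * e) / det;
    if (s >= 0.0f && t >= 0.0f && s + t <= 1.0f) {
        Vec3 C = B + s * E0 + t * E1;
        return dot(P - C, P - C);
    }

    // Otherwise the closest point lies on the boundary: test the three edges
    // (these cover regions 1-6 collectively).
    Vec3 V1 = B + E0, V2 = B + E1;
    float d2 = segDist2(P, B, V1);
    d2 = fminf(d2, segDist2(P, B, V2));
    d2 = fminf(d2, segDist2(P, V1, V2));
    return d2;
}
```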
Figure 3-1. The st-plane partitioned into 7 regions.
3.3 Graphics Processing Unit Implementation
The brute force method described above performs the same set of computations
on every point-triangle pair. We exploit this inherent potential for parallelization by
implementing these computations to run in parallel on a CUDA capable GPU. For a
small number of point-triangle pairs, it is possible to utilize the available parallelism
completely by using a separate thread for each pair. However, with larger datasets,
limitations in memory and parallel processing power of the GPU make it impossible
to avoid some amount of serialization. Moreover, with multiple threads computing the
distances from the same lattice points, identifying and retaining the shortest distance
would require multiple threads to compare and write to a common location. Since
atomic operations on floats are not available in CUDA (at the time of designing this
implementation), implementing this approach becomes quite complicated and inefficient.
Assigning a triangle to each thread and computing its distance to every lattice point
serially within the thread again involves multiple threads handling the same lattice point,
and suffers from the same drawback mentioned above. Hence, we follow an approach
where every thread is responsible for a specific lattice point and computes the distance
between every pair involving that point. These computations are performed in serial
within the thread, making it easy to locally keep track of the shortest distance for that
point. The only data shared between the threads is the triangle coordinates, which are
read-only as far as the threads are concerned.
Within each thread, a loop is used to iterate over the triangles and compute the
distances to them. The algorithm described in Section 3.2 is used to compute the
distances. This loop, along with a few memory operations, forms our kernel (i.e. the
function that is executed on the GPU by each thread). It takes the triangle data and
point coordinates as input and returns the shortest distance from that point to the mesh
surface as the output. As the kernel cannot access the CPU memory space, these
values must be transferred and stored on GPU memory. Figure 3-2 shows a block
diagram of the GPU implementation discussed here.
Figure 3-2. Block diagram of the GPU implementation. Every lattice point is assigned toa CUDA thread. The triangles are processed by these threads in a loop, oneafter the other.
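A sketch of the kernel just described might look as follows (our reconstruction, reusing pointTriDist2 from Section 3.2; the names, the batch size, and the 32³-region indexing are assumptions, not the exact code of this thesis):

```cuda
#define BATCH 1024                      // triangles per batch (an assumption)

struct Tri { Vec3 B, E0, E1; };         // triangle: base vertex plus two edges

__constant__ Tri cTris[BATCH];          // read-only triangle batch
__constant__ int cNumTris;              // number of triangles in this batch

__global__ void distanceFieldKernel(float *minDist2, int *closestTri,
                                    Vec3 origin, float spacing)
{
    // One thread per lattice point; 32^3 = 32,768 points per region.
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int xi = gid & 31, yi = (gid >> 5) & 31, zi = gid >> 10;
    Vec3 P = { origin.x + spacing * xi,
               origin.y + spacing * yi,
               origin.z + spacing * zi };

    // Global memory persists across launches, so start from the running
    // minimum left behind by the previous triangle batches.
    float best    = minDist2[gid];
    int   bestTri = closestTri[gid];

    for (int i = 0; i < cNumTris; ++i) {   // serial loop over the batch
        float d2 = pointTriDist2(P, cTris[i].B, cTris[i].E0, cTris[i].E1);
        if (d2 < best) { best = d2; bestTri = i; }
    }
    minDist2[gid]   = best;                // written back for the next launch
    closestTri[gid] = bestTri;             // batch-local index (see below)
}
```

A complete version would offset bestTri by the batch's starting index and would also record the minimizing $(s, t)$ pair, which the sign and gradient computations described later rely on.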
The triangle data is only read from within the kernel and every concurrent thread
reads the same triangle data at any given point. This makes the constant memory
on the GPU a good option for storing the triangle data. While computing the distance
to a particular triangle, the coordinates of that triangle are fetched from constant
memory and stored in shared memory so that it is faster to access those values during
computation. The shortest distance value for each lattice point is stored in a register
within its thread. To compute the sign of the distance, it is necessary to keep track of
the triangle and the point on the triangle corresponding to the shortest distance. These
values are stored in registers as well. Once all the triangles have been processed and
the shortest distance found, this value along with the corresponding triangle data is
transferred to the global memory so that it can then be transferred to the CPU memory
space.
The coordinates of the lattice point corresponding to each thread can be computed
within the thread using its thread ID and block ID. However, computations within the
kernel use registers, and sometimes it might be necessary to reduce register usage
to get better efficiency. In such cases, a part of the computations for the lattice point
coordinates can be done on the CPU and transferred to the GPU via constant memory.
We do this by computing the base x, y and z coordinates of each thread block on
the CPU and passing these values as arrays to the GPU. These base values are the
coordinates of the first thread in the corresponding thread block. For the other threads
in the block, the lattice point coordinates are calculated by adding the coordinates of the
thread ID to these base coordinates.
The limited memory on the GPU limits the number of lattice points and triangles that
can be processed in a single kernel launch. In the case of triangles, the capacity of the
constant memory limits the number of triangles that can be sent to the GPU per launch.
If the mesh has more triangles, we launch the kernel multiple times, until all triangles
have been processed. Since each kernel launch will find its own shortest distance (from
among the set of triangles it processed), we need a way to find the shortest distance
among them. This task is simplified by the fact that global memory on the GPU retains
data across kernel launches. Hence, at the time of a particular launch, the global
memory will be holding the shortest distance for each lattice point from the previous
launch. All that needs to be done is to fetch these values into registers at the start of
the kernel execution, as the current shortest distance, so that the distances computed
during that launch are compared against these values.
The number of lattice points that can be processed per launch is limited by the
global memory available on the GPU. This is because each lattice point being processed
during a launch stores its shortest distance and associated triangle data in the global
memory before transferring to the CPU. Therefore, large lattices (or grids) must be split
into smaller sections and processed over multiple kernel launches. For simplicity, we
assume that our lattices are Cartesian cubic and have resolutions that are a multiple of
32 along each axis. Then, we split the lattice into cubical regions of resolution 32×32×32
with each region being processed by a separate kernel launch. With such a split,
the computation of lattice point coordinates can be greatly simplified by choosing the
dimensions of the thread and block IDs appropriately. If the resolution is not an exact
multiple of 32, it can be padded to the next highest multiple. The only drawback is that
one or more kernel launches will have a few threads that perform no useful
work. However, this will not cause a significant performance hit. If neither the triangles
nor the lattice points fit within a single kernel launch, then the multiple launches are
structured so that all the triangles for a particular region of the grid are processed before
moving on to the next region. This is done so that the distance values retained by the
global memory on the GPU can be used by the following kernel launches.
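Continuing the sketch above, a hypothetical host-side driver would iterate over the regions and, within each region, stream the triangle list through constant memory in batches; the persistence of global memory across launches does the bookkeeping:

```cuda
// Reset the running minima before processing a new region.
__global__ void initRegion(float *minDist2, int *closestTri)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    minDist2[gid]   = 3.402823466e+38f;   // FLT_MAX: no triangle seen yet
    closestTri[gid] = -1;
}

void buildDistanceField(const Tri *tris, int numTris, float spacing,
                        const Vec3 *regionOrigins, int numRegions,
                        float *devMinDist2, int *devClosestTri,
                        float *hostMinDist2)
{
    const int points = 32 * 32 * 32;      // 32,768 lattice points per region
    for (int r = 0; r < numRegions; ++r) {
        // 128 blocks x 256 threads = 32,768 threads, one per lattice point
        // (the exact grouping into blocks is an assumption of this sketch).
        initRegion<<<128, 256>>>(devMinDist2, devClosestTri);
        for (int first = 0; first < numTris; first += BATCH) {
            int count = (numTris - first < BATCH) ? numTris - first : BATCH;
            cudaMemcpyToSymbol(cTris, tris + first, count * sizeof(Tri));
            cudaMemcpyToSymbol(cNumTris, &count, sizeof(int));
            distanceFieldKernel<<<128, 256>>>(devMinDist2, devClosestTri,
                                              regionOrigins[r], spacing);
        }
        // Global memory persists across launches, so devMinDist2 now holds
        // the minimum over all batches; copy the region out before reuse.
        cudaMemcpy(hostMinDist2 + (size_t)r * points, devMinDist2,
                   points * sizeof(float), cudaMemcpyDeviceToHost);
    }
}
```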
Since the lattice is split such that each region has 32,768 lattice points, there will
be as many threads per kernel launch. These threads must be grouped into blocks. The
number of threads a block can have is restricted by the number of registers used by
the thread. At the same time, a block should ideally have at least 192 threads to hide
memory latency. Considering these factors, we group our threads into 32 blocks, each
holding 256 threads. On GPUs that can run more than 32,768 threads concurrently,
we could process more than one region of the lattice per kernel launch by increasing
the number of thread blocks per launch. This makes sure that the available parallel
processing power is completely utilized.
While the approach described so far assumes a CC lattice, it can be adapted to
sampling on a BCC lattice with a few minor modifications. The easiest way to do this is
by sampling on a CC lattice that is twice the resolution of the required BCC lattice in x
and y directions. In other words, to sample on a BCC lattice of resolution $R_x \times R_y \times 2R_z$
(the resolution along the z axis is shown as twice $R_z$ to account for the additional points
at the center of each cube on a BCC lattice), we sample on a CC lattice of resolution
$2R_x \times 2R_y \times 2R_z$. This can be considered to be a BCC lattice of resolution $R_x \times R_y \times 2R_z$ with
additional data points at the center of every face and edge. These additional values can
then be thrown away to obtain the necessary distance field on a BCC lattice. This is
the approach we have followed and is illustrated in Figure 3-3(A) with the points to be
discarded shown in white. However, for large resolutions, this approach takes a large
number of unnecessary samples only to be discarded later. This can slow down the
application considerably. To improve performance, we can sample the distance field
on two CC lattices, of resolutions $R_x \times R_y \times R_z$ and $(R_x - 1) \times (R_y - 1) \times (R_z - 1)$, with the
second grid shifted along each axis by half a unit. In other words, for each point
$(x, y, z)$ on the first grid, the corresponding point on the second grid will be located at
$(x + h/2, y + h/2, z + h/2)$, where $h$ is the distance between two lattice points along
any axis. It can be seen easily that each point on the second grid falls at the center of a
cubical region formed by eight adjacent lattice points of the first grid. This is essentially a
BCC grid of resolution $R_x \times R_y \times 2R_z$. Figure 3-3(B) illustrates this method, with the shifted
CC lattice shown in blue. Combining the two sets of samples appropriately is more
complicated than simply discarding a set of samples. However, since we only take as
many samples as necessary for the BCC lattice here, the performance will be better for
grids of higher resolution.
Figure 3-3. Two ways of adapting the GPU implementation for BCC lattices. (A) Sampleon CC lattice of twice the resolution and discard the additional (white) points.(B) Sample on two CC lattices, the second (blue) shifted along each axis byhalf a cell length.
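For concreteness, the second scheme in Figure 3-3(B) amounts to enumerating the BCC sample locations as two interleaved CC grids (a sketch of ours, reusing the Vec3 type from Section 3.2; h is the cell length of the primary grid):

```cuda
// Fill 'out' with the Rx*Ry*Rz primary CC points followed by the
// (Rx-1)*(Ry-1)*(Rz-1) cube centers of the half-cell-shifted grid.
void bccSamplePoints(int Rx, int Ry, int Rz, float h, Vec3 *out)
{
    int n = 0;
    for (int z = 0; z < Rz; ++z)                 // primary CC grid
        for (int y = 0; y < Ry; ++y)
            for (int x = 0; x < Rx; ++x)
                out[n++] = { x * h, y * h, z * h };

    for (int z = 0; z < Rz - 1; ++z)             // shifted CC grid: centers
        for (int y = 0; y < Ry - 1; ++y)         // of the primary grid's cells
            for (int x = 0; x < Rx - 1; ++x)
                out[n++] = { x * h + h / 2, y * h + h / 2, z * h + h / 2 };
}
```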
The distance values obtained from the kernel are in fact squared distance values.
These are converted to the actual distance values by taking the square root. Then, the
values corresponding to points inside the triangle mesh are given a negative sign. To
identify points that are inside the mesh, we use a method proposed by Bærentzen and
Aanæs in [3] that uses angle-weighted pseudo-normals (originally proposed by Thürmer
and Wüthrich [51] and independently by Séquin [43]). Finally, these values are scaled to
the range [0, 255] such that a value of 127 represents points that are on the surface of
the mesh. Values in the range [0, 127) represent points outside the mesh, with lower
values indicating longer distances, and values in the range (127, 255] represent points
inside the mesh, with the value increasing with distance.
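A hypothetical post-processing routine implementing this mapping could look as follows (our sketch; 'inside' is the result of the pseudo-normal test of [3], and maxDist normalizes by the largest distance occurring in the volume):

```cuda
#include <math.h>

// Convert a squared distance to a signed, scaled 8-bit value with
// 127 on the surface, [0, 127) outside and (127, 255] inside.
unsigned char quantizeDistance(float dist2, bool inside, float maxDist)
{
    float d = sqrtf(dist2);                   // kernel returns squared values
    if (inside) d = -d;                       // negative sign for the interior
    float v = 127.0f - 127.0f * d / maxDist;  // surface (d = 0) maps to 127
    if (v < 0.0f)   v = 0.0f;                 // clamp to [0, 255]
    if (v > 255.0f) v = 255.0f;
    return (unsigned char)(v + 0.5f);
}
```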
3.4 Generating True Gradients
One characteristic of distance fields is that their true gradients at any point can be
computed precisely with relatively few extra computations. As this gradient data proves
useful in reconstructing distance fields, we describe how these values can be computed
as part of our implementation of the brute force method.
As previously mentioned, the value of a distance field at any point is the shortest
distance from that point to the surface of the triangular mesh under consideration. Then,
the true gradient of the distance field at that point is the vector from the point to the
point on the mesh closest to it. The aforementioned shortest distance is essentially
the absolute length of this vector. The first order partial derivative of the distance field
along each axis is the component of the gradient vector along that axis.
To compute the gradient value for a particular lattice point, we need the point on the
mesh closest to it. Our kernel, which computes the shortest distance corresponding to a
lattice point, identifies this closest mesh point during the course of its computations.
Recall that along with the shortest distance, our kernel also tracks and returns
associated triangle data for each lattice point. This triangle data includes the triangle
that holds the closest mesh point and the s and t values (as described in Section 3.2)
for that point. Using this data, the closest mesh point can be identified and the gradient
vector computed. This is then split into its x, y and z components and appropriately
scaled to give the true values of the first order partial derivatives.
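Putting this together, a sketch (ours, reusing the helpers from the earlier sketches) of the per-point gradient computation from the stored triangle and its $(s, t)$ parameters:

```cuda
// Recover the closest mesh point from the triangle and (s, t) returned by
// the kernel, then form the unit vector between the lattice point and that
// point. Up to the chosen sign convention, its x, y and z components are
// the first order partial derivatives of the distance field. Assumes P is
// not exactly on the surface (len > 0).
Vec3 distanceGradient(Vec3 P, Tri tri, float s, float t)
{
    Vec3 C = tri.B + s * tri.E0 + t * tri.E1;   // closest point, T(s, t)
    Vec3 g = P - C;                             // along the shortest path
    float len = sqrtf(dot(g, g));               // equals the unsigned distance
    return (1.0f / len) * g;                    // scale to unit length
}
```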
3.5 Performance
To analyze the speed-up achieved by our implementation, we coded up a pure CPU
implementation of the brute force algorithm, and generated distance fields at different
resolutions from the soccer ball, Stanford bunny and Stanford dragon meshes using
both implementations. The GPU implementation was executed on an NVIDIA GeForce
GTX 465 which has 352 CUDA cores. Table 3-1 shows the details of these meshes.
The resulting execution times are given in Table 3-2. For the datasets tested, the GPU
implementation achieves speedups ranging from 145 to 375. It can also be seen that the
performance gain of the GPU implementation improves as the mesh size and resolution
increases.
Table 3-1. Triangular meshes used for performance testing
Mesh              No. of triangles   No. of vertices
Soccer ball       3,516              1,760
Stanford bunny    69,666             34,835
Stanford dragon   100,000            50,000
Table 3-2. Comparison of execution times. The time taken by the GPU implementation is shown in the column 'GPU' and that by the CPU implementation is shown in the column 'CPU'.
Dataset           Sample points   GPU (sec)   CPU (sec)    Speed-up
Soccer ball       262k            0.44125     63.9975      145
Soccer ball       2,097k          2.6365      511.02       194
Stanford bunny    32k             0.90175     181.3325     201
Stanford bunny    262k            5.76275     1451.54      252
Stanford dragon   32k             1.2325      383.6        311
Stanford dragon   262k            8.16425     3063.5725    375
CHAPTER 4
CUBIC INTERPOLATION IN BODY CENTERED CUBIC LATTICE USING HERMITE DATA
In this chapter, we introduce a local cubic interpolation scheme on the BCC lattice
that uses Hermite data, i.e. function and derivative values. This scheme constructs
cubic polynomial interpolants locally in tetrahedral regions of the lattice using data
and derivative values in the neighborhood of the region. We then discuss a tricubic
interpolation scheme using Hermite data on the CC lattice proposed by Lekien and
Marsden in [31].
4.1 Cubic Interpolation in Body Centered Cubic Lattice
4.1.1 Interpolating Splines
Consider a function f sampled on the lattice points of a BCC lattice. We describe a
method to interpolate this function using piecewise polynomials over the BCC lattice. At
this point, we assume that the first and second order derivatives of f at the lattice points
are also available to us. In the next section, we discuss a finite-differencing scheme to
estimate the derivatives when they are not available.
As described previously in Section 1.2, the BCC lattice can be uniformly partitioned
into congruent tetrahedra with the lattice points acting as their corners. Within each
tetrahedron, the interpolating spline is defined by a polynomial of degree $n$, $p \in \Pi_n$.
In the trivariate setting, which is what we are interested in, this can be represented as

$$\Pi_n(\mathbb{R}^3) := \Big\{ p(\mathbf{x}) = \sum_{\substack{i+j+k \le n \\ i,j,k \ge 0}} a_{ijk}\, x^i y^j z^k \Big\}.$$

Such a polynomial has $\binom{n+3}{n}$ coefficients and hence can be uniquely determined by $(n + 3)(n + 2)(n + 1)/6$ constraints.
A polynomial of degree 1 can be determined uniquely by 4 constraints and hence
can be constructed by restricting the value it takes at the 4 corners of the tetrahedron to
the corresponding values of $f$:

$$f(v_i) = p(v_i), \quad v_i \in \delta, \quad i = 1, \dots, 4, \quad p \in \Pi_1, \qquad (4–1)$$

where the $v_i$ denote the vertices of a tetrahedron $\delta$.
This gives us a linear system of equations in 4 variables, solving for which yields
the 4 coefficients of the polynomial. This is essentially linear interpolation within each
tetrahedron and the spline formed by these polynomials is a piecewise linear interpolant
to the data given at the lattice points.
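Explicitly (a worked form of ours, writing $p(x, y, z) = a_0 + a_1 x + a_2 y + a_3 z$ and $v_i = (x_i, y_i, z_i)$), the four constraints in (4–1) read

$$\begin{pmatrix} 1 & x_1 & y_1 & z_1 \\ 1 & x_2 & y_2 & z_2 \\ 1 & x_3 & y_3 & z_3 \\ 1 & x_4 & y_4 & z_4 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} f(v_1) \\ f(v_2) \\ f(v_3) \\ f(v_4) \end{pmatrix},$$

which is nonsingular whenever the tetrahedron is non-degenerate.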
A polynomial of degree 2 requires 10 constraints. As the data values at the corners
can provide only 4 constraints, we use the first order partial derivatives along each axis
at the corners to form the remaining constraints. However, it is not possible to choose 6
constraints from 4 vertices in an unbiased manner (bias, here, refers to an asymmetric
choice of constraints per vertex). Since we require an isotropic choice of constraints,
we use degree 3 polynomials, which can be uniquely defined using 20 constraints. This
gives us 5 constraints per vertex. The data value (4–1) and the three first derivatives
(4–2) at each corner form 16 constraints.
$$f_x(v_i) = p_x(v_i), \quad f_y(v_i) = p_y(v_i), \quad f_z(v_i) = p_z(v_i), \qquad (4–2)$$

where $f_x$, $f_y$ and $f_z$ denote the partial derivatives with respect to $x$, $y$ and $z$ respectively.
For the remaining 4 constraints, we use second order partial derivatives.
While a single second order partial derivative, like $\frac{\partial^2 f}{\partial x^2}$, can be chosen from each
corner of the tetrahedron, such a choice would be biased along specific axes. This
can be avoided by choosing a constraint based on a symmetric sum of individual
second derivatives at each corner. One such sum is $\big(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}\big)$. However, an
interpolating constraint based on this sum is found to be linearly dependent on the other
16 constraints. The other choice is a constraint based on $\big(\frac{\partial^2 f}{\partial x \partial y} + \frac{\partial^2 f}{\partial y \partial z} + \frac{\partial^2 f}{\partial x \partial z}\big)$, which is
linearly independent and can be used to determine the polynomial.
However, this combination does not restrict the individual second order partial
derivative values taken by the polynomial. As the interpolation constraint is enforced
only on the sum, the values the individual second derivatives take at each corner could
disagree with the corresponding values of f . The polynomial so generated could turn
out to be a poor approximation of f . In our experiments, the individual second derivative
values had large deviations from corresponding values of f , leading to severe artifacts in
rendered images.
To get a better approximation of f and avoid such artifacts, each individual second
order partial derivative must be constrained separately. Enforcing interpolating
constraints on every second derivative value at the corners gives us 3 constraints
per corner, making it an over-determined system. Hence, we relax the interpolation
constraint on these values and opt to minimize the L2 norm of their error over the set of
all vertices of the tetrahedron instead. To this end, we define an error function over the
space of all cubic polynomials $p \in \Pi_3$ as follows:

$$E(p) := \sum_{i=1}^{4} \left( \left\| \frac{\partial^2 f}{\partial x \partial y}(v_i) - \frac{\partial^2 p}{\partial x \partial y}(v_i) \right\|^2 + \left\| \frac{\partial^2 f}{\partial x \partial z}(v_i) - \frac{\partial^2 p}{\partial x \partial z}(v_i) \right\|^2 + \left\| \frac{\partial^2 f}{\partial y \partial z}(v_i) - \frac{\partial^2 p}{\partial y \partial z}(v_i) \right\|^2 \right). \qquad (4–3)$$
Here, each term is the squared error in a specific second order partial derivative
at a specific corner. Also, this set of partial derivatives is invariant along the x, y and z
directions. Thus, there is no bias along a particular direction. Minimizing this error
function with respect to the coefficients of p accounts for the remaining 4 degrees of
freedom in determining the polynomial. This is a constrained minimization problem,
where the error function acts as the objective and the 16 constraints are formed by the
interpolating constraints defined in (4–1) and (4–2). This problem is solved for the 20
coefficients from which the polynomial is constructed.
The error function defined above is quadratic in the coefficients of $p$, which can be represented by a $20 \times 1$ vector $\mathbf{a} = [a_1, \ldots, a_{20}]^T$. The constrained minimization of this error function is carried out with respect to the coefficients of the cubic interpolant $p$, with constraints defined in (4–1) and (4–2). Since these interpolation constraints are linear in the polynomial coefficients, we can model our optimization problem as a specific case of a quadratic programming problem, known as an Equality QP [4]:
$$\min_{\mathbf{a} \in \mathbb{R}^{20}} \; E(\mathbf{a}) = \mathbf{a}^T G \mathbf{a} + \mathbf{h}^T \mathbf{a} + b \qquad (4–4)$$
$$\text{subject to} \quad M\mathbf{a} = \mathbf{f}. \qquad (4–5)$$
We introduce a vector notation for representing our polynomials that allows us to transform the function (4–3) into the quadratic form in (4–4) and the interpolation constraints (4–1) and (4–2) into the constraints in (4–5). The polynomial $p$ can be represented as an inner product, $p(\mathbf{x}) = \langle \mathbf{m}, \mathbf{a} \rangle(\mathbf{x})$, in which the column vector $\mathbf{a}$ encodes the coefficients of our polynomial. $\mathbf{m}$ is a column vector in which each element is one of the monomials (in variable $\mathbf{x}$) of the form $x^\alpha y^\beta z^\gamma$ with $\alpha, \beta, \gamma \geq 0$ and $\alpha + \beta + \gamma < 4$ that span $\Pi_3$. The inner product results in a typical power-form representation of a cubic polynomial which can be evaluated at a point $\mathbf{x} = v_i$ (i.e., one of the corners of tetrahedron $\delta$). As a simple example, one can write a generic univariate quadratic polynomial evaluated at the sample point 2 as $(a_1 + 2a_2 + 4a_3) = \langle [1, x, x^2]^T, [a_1, a_2, a_3]^T \rangle(2)$.
Interpolation constraints introduced in (4–1) and (4–2) can now be written in terms of the coefficient vector $\mathbf{a}$:
$$f(v_i) = \langle \mathbf{m}, \mathbf{a} \rangle(v_i), \quad f_x(v_i) = \langle \mathbf{m}_x, \mathbf{a} \rangle(v_i), \quad f_y(v_i) = \langle \mathbf{m}_y, \mathbf{a} \rangle(v_i), \quad f_z(v_i) = \langle \mathbf{m}_z, \mathbf{a} \rangle(v_i), \qquad (4–6)$$
that are defined for each vertex of the BCC tetrahedron, $v_i \in \delta$, $i = 1, \ldots, 4$. These 16 equations form the linear system of constraints in (4–5), where the $16 \times 20$ matrix $M$ is formed by the monomials in $\mathbf{m}$ and their partial derivatives evaluated at the vertices of the tetrahedron, $v_i \in \delta$. In other words, each row of $M$ corresponds to $\mathbf{m}$ or one of its three partial derivatives evaluated at a vertex $v_i$. As mentioned before, the column vector $\mathbf{a}$ represents the unknown coefficients of $p$, and finally, $\mathbf{f}$ holds the sample values of the underlying function and its partial derivatives.
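To make this concrete, a minimal sketch of assembling $\mathbf{m}$ and $M$ is given below (Python/NumPy). The monomial ordering, the helper names, and the example vertex coordinates are illustrative assumptions rather than the thesis implementation.

import numpy as np

# Exponents (alpha, beta, gamma) of the 20 monomials x^a y^b z^c that
# span cubic polynomials in R^3 (alpha + beta + gamma <= 3).
EXPONENTS = [(a, b, c) for a in range(4) for b in range(4) for c in range(4)
             if a + b + c <= 3]

def m_vec(v):
    """The monomial vector m evaluated at a point v = (x, y, z)."""
    x, y, z = v
    return np.array([x**a * y**b * z**c for a, b, c in EXPONENTS])

def m_partial(v, axis):
    """Partial derivative of m along one axis (0=x, 1=y, 2=z) at v."""
    vals = []
    for exps in EXPONENTS:
        e = list(exps)
        k = e[axis]
        if k == 0:
            vals.append(0.0)
            continue
        e[axis] -= 1
        vals.append(k * v[0]**e[0] * v[1]**e[1] * v[2]**e[2])
    return np.array(vals)

def constraint_matrix(vertices):
    """The 16 x 20 matrix M of (4-5)/(4-6): one row each for m, m_x,
    m_y and m_z evaluated at each of the 4 tetrahedron vertices."""
    rows = []
    for v in vertices:
        rows.append(m_vec(v))
        rows.extend(m_partial(v, axis) for axis in range(3))
    return np.array(rows)

# Example: a tetrahedron with BCC lattice vertices (an illustrative choice).
delta = [(0, 0, 0), (1, 1, 1), (2, 0, 0), (1, 1, -1)]
M = constraint_matrix(delta)  # shape (16, 20); full rank 16 per the text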
Moreover, by simple linear algebra one can reformulate the error function defined in (4–3) in the form of (4–4), using $\mathbf{u}\mathbf{v}^T$ to denote the outer product of two column vectors $\mathbf{u}$ and $\mathbf{v}$:
$$G = \sum_{i=1}^{4} \left( \mathbf{m}_{xy}\mathbf{m}_{xy}^T + \mathbf{m}_{xz}\mathbf{m}_{xz}^T + \mathbf{m}_{yz}\mathbf{m}_{yz}^T \right)(v_i)$$
$$\mathbf{h} = -2 \sum_{i=1}^{4} \left( f_{xy}\mathbf{m}_{xy} + f_{xz}\mathbf{m}_{xz} + f_{yz}\mathbf{m}_{yz} \right)(v_i) \qquad (4–7)$$
$$b = \sum_{i=1}^{4} \left( f_{xy}^2 + f_{xz}^2 + f_{yz}^2 \right)(v_i).$$
Solving the under-determined linear system of equations (4–5), one can find a particular solution (e.g., via the normal equations on $M$), which we call $\mathbf{a}_0$. Any solution to the system (4–5) can be written as the sum of the particular solution, $\mathbf{a}_0$, and an arbitrary element of the null space of $M$:
$$\mathbf{a} = \mathbf{a}_0 + Z\mathbf{t},$$
where the columns of $Z$ form a basis for the null space of $M$, and $\mathbf{t} \in \mathbb{R}^4$ since $M$ has full rank (i.e., 16) for the BCC tetrahedron. The basis for the null space can be computed via the reduced row-echelon form or via singular value decomposition.
Substituting this relation in $E(\mathbf{a})$ allows us to re-write the minimization as a function of $\mathbf{t}$, $E(\mathbf{t})$. The minimizer can then be explicitly derived by solving the linear system of equations obtained from differentiating $E(\mathbf{t})$:
$$E_{\mathbf{t}}(\mathbf{t}) = 0. \qquad (4–8)$$
The unique minimizer of the error functional is then obtained from:
$$\left( Z^T G Z \right) \mathbf{t} = -Z^T \left( \tfrac{1}{2}\mathbf{h} + G\mathbf{a}_0 \right). \qquad (4–9)$$
This linear system has a unique solution since $Z^T G Z$ is symmetric positive definite ($G$ is symmetric positive semidefinite, and it is positive definite when restricted to the null space of $M$).
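A minimal sketch of this null-space solve is given below (Python/NumPy), assuming $M$ has full row rank and $Z^T G Z$ is positive definite as argued above; it illustrates (4–9) rather than reproducing the thesis code.

import numpy as np
from scipy.linalg import null_space

def solve_equality_qp(G, h, M, f):
    """Minimize a^T G a + h^T a + const subject to M a = f,
    using the null-space method of (4-8)/(4-9)."""
    # A particular solution a0 of the under-determined constraints M a = f
    # (np.linalg.lstsq returns the least-norm solution when consistent).
    a0 = np.linalg.lstsq(M, f, rcond=None)[0]
    # Columns of Z form a basis for the null space of M (20 x 4 here).
    Z = null_space(M)
    # Reduced system (Z^T G Z) t = -Z^T (h/2 + G a0), cf. (4-9).
    t = np.linalg.solve(Z.T @ G @ Z, -Z.T @ (0.5 * h + G @ a0))
    return a0 + Z @ t

Here scipy.linalg.null_space computes $Z$ via the singular value decomposition, one of the two options mentioned above.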
Since the BCC lattice can be tetrahedralized into congruent tetrahedra, the solution operator that yields $\mathbf{t}$ for the optimal cubic interpolant can be pre-computed for the geometry of $\delta$ in the BCC lattice. Hence, we can pre-compute the coefficients of the cubic interpolant in terms of the samples of the underlying function and its first-order partial derivatives, as specified in (4–5). In other words, the computation of the optimal cubic interpolant can be implemented as a fast filter by considering the tetrahedron in the BCC lattice that contains the interpolation point.

In summary, the 20 degrees of freedom of a cubic interpolant on each tetrahedron are fixed by 16 (exact) interpolation constraints on the function values and the first-order partial derivative values at the lattice sites (i.e., the vertices of the containing tetrahedron). The remaining 4 degrees of freedom are chosen optimally to minimize the interpolation errors in the second-order partial derivatives at the lattice points. The quadratic programming problem has a unique minimizer that we use to construct the optimal cubic interpolant for the geometry of the BCC lattice.
We tested this approach on the carp dataset. The original dataset has a resolution of 256 × 256 × 256 (Figure 4-1(A)) and represents the ground truth; the low-resolution, sub-sampled datasets carry about 16% of the high-resolution data on the BCC and the CC lattices. The subsampled CC volume has a resolution of 140 × 140 × 140 (Figure 4-1(B)) and the subsampled BCC volume has a resolution of 111 × 111 × 222 (Figure 4-1(C)). The BCC dataset was rendered with our cubic splines and the CC dataset with tricubic Catmull-Rom splines. The rib area is mostly distorted in the CC image but is better preserved in the BCC image.
4.1.2 Smoothness And Approximation Order
The spline $s$, constructed as described in the previous section, is $C^1$ smooth across the faces of the tetrahedra.

Figure 4-1. The Carp fish dataset. The ground-truth Carp fish dataset (A), with 16,777k points, is sub-sampled to 16% on the Cartesian (B) and BCC (C) lattices for comparison. The Cartesian data is interpolated with Catmull-Rom splines and the BCC lattice is interpolated with our cubic spline. The tail fins and rib areas preserve their connectivity in the BCC dataset while they are distorted in the Catmull-Rom case.

First, we will note that the values of $s$ and its first derivatives on a face of a tetrahedron depend only on the data at the vertices of that face. This becomes
evident when the polynomial within the tetrahedron is represented in Bernstein-Bézier form, as the barycentric coordinate of the corner opposite the face is 0 everywhere on the face. Now, consider a face shared by two tetrahedra, $T_1$ and $T_2$, and denote the polynomial pieces of $s$ within them as $P_1$ and $P_2$ respectively. At the three vertices of the shared face, $P_1$ and $P_2$ have the same values and first derivatives, equal to the corresponding values of $f$, the underlying function. Thus, the values and first derivatives taken by $P_1$ and $P_2$ on the shared face agree, making $s$ $C^1$ smooth across the face.
The order of approximation can be thought of as the accuracy of an approximation or, alternatively, the order of magnitude of the error in the approximation. When an interpolating polynomial $p$ is used to represent a function $f$, an order of approximation of $n$ means that the approximation error can be expressed in terms of the sampling distance $h$ as $O(h^n)$.
The classical Strang-Fix condition relates the approximation order $\alpha$ of a spline space $S_n$ to its ability to exactly reproduce polynomials of degree up to, and including, $\alpha - 1$. In the multivariate setting, the notion of local reproduction of polynomials is needed for proving the approximation order.
We now show that our cubic interpolant can exactly reproduce all polynomials up to degree 3 within the corresponding tetrahedron. Recall that the coefficients of the polynomial are recovered by solving a constrained minimization problem, where the 16 constraints are formed by enforcing interpolation of the values the polynomial and its first derivatives take at the 4 vertices of the tetrahedron. The original polynomial $f \in \Pi_{\leq 3}(\mathbb{R}^3)$ can be recovered by recovering its 20 coefficients from this problem. It is easy to see that the coefficients of $f$ satisfy all 16 aforementioned constraints. Now consider the error function in (4–3). It is a sum of squared terms and hence cannot take a negative value. Using the coefficients of $f$ makes each individual term 0, attaining the minimum value of the function. Thus, $p = f$ is a valid solution of the constrained minimization problem, and since this problem has a unique minimizer (as shown above), the solution obtained must be $p = f$.
Since our interpolating spline can locally reproduce all polynomials up to degree 3,
it has an order of approximation of α = 4. This order of approximation is only possible
when exact partial derivatives of f are available. One area of application where the
exact first derivatives are available is the sampling and reconstruction of signed distance
fields, which is discussed in the next chapter. For scalar field data, where only the
function values are available, we need a finite-differencing scheme that meets the
approximation order of our construction. This is discussed in the next section.
4.1.3 Isotropic Finite-Differences on the Lattice
When the partial derivatives of the function $f$ are known (e.g., Hermite data), the spline construction interpolates $f$ and its first-order partial derivatives exactly. However, for scalar-field data where only function values are known, we need to employ finite differences to approximate the partial derivatives of $f$ (used in (4–2)). In the univariate setting, this approach leads directly to the Catmull-Rom splines, which are essentially Hermite interpolation with derivatives approximated by finite differences. Furthermore, in order to maintain the approximation order, the finite-differencing scheme must be exact on the polynomial space of interest. In other words, we need to design fourth-order finite differences on the BCC lattice that provide exact partial derivatives whenever $f \in \Pi_{\leq 3}(\mathbb{R}^3)$.
In the univariate setting, derivative estimation is easily derived using the Taylor series expansion. The idea behind central differencing is to use the expansion to evaluate $f$ at small distances $h$ from $x$ and obtain an estimate of $f'$ at $x$:
$$f(x + h) - f(x - h) = 2h f'(x) + \frac{h^3}{3} f'''(x) + O(h^5). \qquad (4–10)$$
The Taylor series analysis shows that the central-differencing estimate of the univariate derivative is a second-order approximant. Higher orders of approximation to the derivative $f'$ are obtained by employing a technique called Richardson's extrapolation [7], which scales $h$:
$$f(x + 2h) - f(x - 2h) = 4h f'(x) + \frac{8h^3}{3} f'''(x) + O(h^5). \qquad (4–11)$$
One can eliminate the $f'''(x)$ term between (4–10) and (4–11) and obtain a higher-order approximant. Therefore, the well-known five-point stencil for approximating the derivative is of order four:
$$f'(x) = \frac{8f(x + h) - 8f(x - h) - f(x + 2h) + f(x - 2h)}{12h} + O(h^4).$$
This approach can be repeated to obtain seven-point and nine-point stencils that constitute filters with increasing approximation orders for derivative estimation. The main observation here is that designing high-order finite differences involves a local Taylor series expansion of the function. A polynomial is formed by the first few terms of the Taylor expansion; the polynomial's derivative at the expansion point then approximates the derivative of the original function with an order of accuracy one greater than the degree of the polynomial. This polynomial can be constructed by polynomial interpolation using the neighboring sample points (i.e., $f(x \pm h), f(x \pm 2h), \ldots$). The larger the neighborhood, the higher the degree of the polynomial, which in turn determines the order of accuracy of the derivative estimate. The derivative of this polynomial interpolant then constitutes the finite-difference approximation to the true derivative.
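As a quick numerical check of this order of accuracy, the sketch below (the test function $\sin x$ and the step sizes are our own illustrative choices) applies the five-point stencil; halving $h$ should reduce the error by roughly a factor of $2^4 = 16$.

import numpy as np

def five_point(f, x, h):
    """Fourth-order approximation of f'(x) via the five-point stencil."""
    return (8*f(x + h) - 8*f(x - h) - f(x + 2*h) + f(x - 2*h)) / (12*h)

x = 1.0
for h in (1e-2, 5e-3):
    error = abs(five_point(np.sin, x, h) - np.cos(x))
    print(f"h = {h:g}  error = {error:.3e}")  # error scales like h**4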
Extending Richardson's extrapolation to the multivariate setting would involve the multivariate Taylor series expansion. Instead, we can employ Richardson's extrapolation on the BCC, or any other lattice, by leveraging the equivalence of Richardson's extrapolation with polynomial interpolation on a local neighborhood. The idea behind our approach is to employ a polynomial interpolation scheme that builds an interpolant on an isotropic neighborhood of a lattice point. This interpolant agrees with the terms of the Taylor series expansion up to its degree. If the underlying function $f$ is a polynomial itself (of the same degree as the interpolant), then the unique polynomial interpolant agrees with the underlying function and the derivative estimation will be exact. Hence, if $f \in \Pi_{\leq 3}(\mathbb{R}^3)$, then a local cubic polynomial interpolation at a lattice point will provide the exact derivative. In this approximation scheme, the partial derivatives are estimated in a non-separable fashion and one can choose an isotropic combination of the neighbors of a lattice point.
For a BCC lattice point, there are 8 neighbor points at offsets of (±1,±1,±1), and 6 neighbor points at offsets of (±2, 0, 0), (0,±2, 0) and (0, 0,±2). The next ring of neighbors is located at offsets of (±2,±2, 0), (±2, 0,±2) and (0,±2,±2); together with the original lattice point these form a 27-point neighborhood (see Figure 4-2). Considering that an interpolating polynomial in $\Pi_3(\mathbb{R}^3)$ needs 20 data points, the 27-point neighborhood over-determines the polynomial interpolation problem. The over-determined system of equations can be solved using a least-squares method (i.e., normal equations), which is detailed below. When the original function $f \in \Pi_{\leq 3}(\mathbb{R}^3)$, the least-squares solution coincides with $f$ and hence the derivative estimation becomes exact.
Let $x_1, x_2, \ldots, x_{27}$ denote the points in the 27-point neighborhood of a BCC lattice point. We fit a cubic polynomial interpolant $p(\mathbf{x}) = \langle \mathbf{m}, \mathbf{a} \rangle(\mathbf{x})$ on this 27-point neighborhood. Here $\mathbf{m}$ is, again, a vector that contains the monomials up to cubics and $\mathbf{a}$ denotes the coefficients of $p$. Then we can set up an interpolation problem to determine the local polynomial fit to the function $f$ by solving the minimization problem:
$$\min_{\mathbf{a}} \sum_{i=1}^{27} w_i \left( f_i - p(x_i) \right)^2. \qquad (4–12)$$
The scalar value $f_i$ here is the sample value of the function $f$ at the lattice point $x_i$, and $w_i$ is a weight that allows us to control the interpolation error at lattice point $x_i$. Let $\Phi$ denote the interpolation matrix, which is of dimension $27 \times 20$. Then the weighted least-squares solution to the linear system is given by:
$$(\Phi^T W \Phi)\,\mathbf{a} = \Phi^T W \mathbf{y}, \qquad (4–13)$$
where $W$ is a diagonal matrix with $W_{i,i} = w_i$, $y_i = f_i$ and $\Phi_{i,j} = m_j(x_i)$. We can set $W_{i,i} = 1$ for the ordinary least-squares solution, or make other choices for a weighted least-squares solution.
Since the monomial terms $m_j(\mathbf{x})$ are known, and the local coordinates of the 27-point neighborhood of a BCC lattice point are fixed, we can solve for the coefficients $\mathbf{a}$ from (4–13). When we want to estimate the derivatives of the function at any given lattice point, we perform a local fit and use the derivatives of that local polynomial at the lattice point (which is $\mathbf{x} = 0$ in the local coordinate system) as the approximated derivatives. For example, the estimated first-order derivative along $x$ at a given lattice point is $p_x(0)$.
Figure 4-2. Weights of the finite-differencing kernel on the 27-point neighborhood of the BCC lattice for (A) $f_x$ and (B) $f_{yz}$. The illustrated coefficients are divided by 24 in (A) and by 72 in (B). The 27-point neighborhood includes the red, blue, green and gray lattice points and excludes the yellow points at the corners.
When we consider the derivative with respect to $x$, $p_x(\mathbf{x}) = \langle \mathbf{m}_x, \mathbf{a} \rangle(\mathbf{x})$, we can construct the finite-differencing weights by evaluating $p_x(0)$:
$$p_x(0) = \mathbf{b}^T (\Phi^T W \Phi)^{-1} \Phi^T W \mathbf{y} = K_x \mathbf{y}. \qquad (4–14)$$
In this notation, $\mathbf{y}$ is a vector that contains the function values of the 27 neighborhood points and $\mathbf{b} = \mathbf{m}_x(0)$. The finite-differencing kernel $K_x$ is a stencil whose convolution with $\mathbf{y}$ gives the partial derivative of the underlying $f$ with respect to $x$. The finite-differencing weights for the other partial derivatives can be obtained similarly.
The spatial distribution of the finite-differencing weights for $f_x$ is shown in Figure 4-2(A) and that of the kernel for $f_{yz}$ in Figure 4-2(B). Similarly, by permuting the axes, we can obtain the kernels for the other first- and second-order derivatives as needed.
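A minimal sketch of this kernel construction is given below (Python/NumPy). The neighborhood ordering and the helper names are illustrative choices; the Gaussian weighting of (4–15) discussed next is included with $\sigma^2 = 0.5$, and an identity $W$ recovers the ordinary least-squares kernel.

import itertools
import numpy as np

# The 20 cubic-monomial exponents and the monomial vector at a point.
EXPONENTS = [(a, b, c) for a in range(4) for b in range(4) for c in range(4)
             if a + b + c <= 3]

def m_vec(p):
    x, y, z = p
    return np.array([x**a * y**b * z**c for a, b, c in EXPONENTS])

# The 27-point neighborhood of a BCC lattice point in local coordinates.
offsets = [(0, 0, 0)]
offsets += list(itertools.product((-1, 1), repeat=3))          # 8 nearest
for s in (-2, 2):
    offsets += [(s, 0, 0), (0, s, 0), (0, 0, s)]               # 6 axial
for sx in (-2, 2):
    for sy in (-2, 2):
        offsets += [(sx, sy, 0), (sx, 0, sy), (0, sx, sy)]     # 12 ring

Phi = np.array([m_vec(p) for p in offsets])                    # 27 x 20
w = np.exp(-np.array([np.dot(p, p) for p in offsets]) / 0.5)   # (4-15)
W = np.diag(w)

# b = m_x(0): at the origin only the monomial x itself has a nonzero
# x-derivative, so b is a unit vector at the exponent (1, 0, 0).
b = np.array([1.0 if e == (1, 0, 0) else 0.0 for e in EXPONENTS])

# Finite-differencing weights K_x of (4-14): fx at the center lattice
# point is Kx @ y for the 27 sample values y ordered like `offsets`.
Kx = b @ np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W)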
Table 4-1. The $L_2$ norm error in the reconstruction of datasets sampled at a resolution of 111 × 111 × 222 on the BCC lattice using the proposed cubic interpolation. The weighted least-squares approach uses a zero-mean Gaussian with $\sigma^2 = 0.5$ to estimate partial derivative values, which shows marginal improvements in terms of reconstruction error.

Dataset   LSQE    Weighted LSQE
ML        8.939   8.725
Carp      0.740   0.731
Bonsai    5.741   5.721
Lobster   8.538   8.467
Intuitively, the lattice points that are closer to the center are more important (with
respect to the residuals in the interpolation conditions in (4–12)) than the lattice points
further from the center. Therefore, one can assign higher weights to the error terms
corresponding to the lattice points closer to the center.
The choice of the weighting function is very flexible. An isotropic choice is the Gaussian function:
$$w(x_i) = \exp\left( -\frac{\|x_i\|^2}{\sigma^2} \right), \qquad (4–15)$$
where $\|x_i\|$ is the distance of the neighbor point $x_i$ from the center point, $0$, and $\sigma^2$ is the variance, which defines how fast the weighting function decays. When $\sigma^2$ is very large, the weighting function degenerates into a constant function, which gives the unweighted solution. A smaller $\sigma^2$ assigns higher weights to the residuals closer to the center. In our experiments, $\sigma^2 = 0.5$ showed the minimum error in estimating first- and second-order partial derivatives, but the improvement in interpolation performance from Gaussian weighted least squares was marginal. Table 4-1 summarizes the improvements obtained by the weighted least-squares approach.
4.2 Tricubic Interpolation in Cartesian Cubic Lattice
This section provides a summary of the local tricubic interpolation scheme proposed by Lekien and Marsden [31]. This scheme uses Hermite data to achieve full $C^1$ interpolation of a given function sampled on a CC lattice. Chapter 5 discusses the application of this method to interpolating distance fields sampled on CC lattices. Please note that this section only serves to briefly describe what has already been proposed in [31]; the contribution of this thesis is in applying it in the area of distance fields.
Consider a trivariate function $f$ sampled at the vertices of a regular grid. The interpolant is a piecewise polynomial which, within each cubic cell of the grid, can be represented in the general form
$$p(x, y, z) = \sum_{i,j,k=0}^{N} a_{ijk} \, x^i y^j z^k.$$
As the interpolant is tricubic, $N$ takes the value 3 and the polynomial has 64 coefficients $a_{ijk}$. These
coefficients must be determined in a way that achieves $C^1$ continuity across all faces of the cube. To that end, interpolation constraints are enforced on the values taken by $p$ and its three first derivatives at the 8 corners of the cube, giving 32 constraints. To recover 64 coefficients, an additional 32 constraints are required. These constraints are chosen such that they are isotropic, i.e., invariant under rotation of the axes, and in a manner that favors smoothness over accuracy. Smoothness is improved by using interpolating constraints on higher-order derivatives of $p$. Thus, we need 4 higher-order derivatives from each corner for the additional constraints. There are
only two such sets that are isotropic. Of these, the set $\left( \frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial y^2}, \frac{\partial^2 f}{\partial z^2}, \frac{\partial^3 f}{\partial x \partial y \partial z} \right)$ is linearly dependent on the first 32 constraints and hence cannot be used, leaving us with the set $\left( \frac{\partial^2 f}{\partial x \partial y}, \frac{\partial^2 f}{\partial y \partial z}, \frac{\partial^2 f}{\partial x \partial z}, \frac{\partial^3 f}{\partial x \partial y \partial z} \right)$. Thus, the 64 constraints are formulated by restricting the values taken by the functions in the following set at each corner of the cube to the corresponding values of $f$:
$$\left( p, \; \frac{\partial p}{\partial x}, \; \frac{\partial p}{\partial y}, \; \frac{\partial p}{\partial z}, \; \frac{\partial^2 p}{\partial x \partial y}, \; \frac{\partial^2 p}{\partial y \partial z}, \; \frac{\partial^2 p}{\partial x \partial z}, \; \frac{\partial^3 p}{\partial x \partial y \partial z} \right)$$
This gives a linear system of 64 equations in the 64 unknown coefficients, which can be represented in matrix form as $B\mathbf{x} = \mathbf{b}$, where $\mathbf{x}$ is the vector of the 64 coefficients and $\mathbf{b}$ is the vector of the values taken by $f$ and its derivatives at each of the 8 corners of the cube. As the $64 \times 64$ matrix $B$ has a determinant of 1, its inverse can be computed and the linear system solved as $\mathbf{x} = B^{-1}\mathbf{b}$. This gives the values of the coefficients from which the interpolant is constructed.
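A minimal sketch of assembling one such $64 \times 64$ system is given below (Python/NumPy). The ordering of corners, constraint operators and monomials is our own convention, so the resulting $B$ matches the matrix of [31] only up to row and column permutations; the structure of the constraints is the same.

import numpy as np
from itertools import product

def term(i, t, differentiate):
    """t^i, or its derivative i*t^(i-1), evaluated at t in {0, 1}."""
    if differentiate:
        return 0.0 if i == 0 else i * t**(i - 1)
    return float(t**i)

# The 8 constraint operators per corner: which of (x, y, z) get one
# derivative (identity, d/dx, d/dy, d/dz, dxy, dyz, dxz, dxyz).
OPS = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
       (1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]

B = np.zeros((64, 64))
for row, (corner, (dx, dy, dz)) in enumerate(
        (c, op) for c in product((0, 1), repeat=3) for op in OPS):
    for col, (i, j, k) in enumerate(product(range(4), repeat=3)):
        # Each entry is the operator applied to x^i y^j z^k at the corner.
        B[row, col] = (term(i, corner[0], dx) *
                       term(j, corner[1], dy) *
                       term(k, corner[2], dz))

# b stacks (f, f_x, f_y, f_z, f_xy, f_yz, f_xz, f_xyz) at the 8 corners in
# the same order; the 64 coefficients then follow from one call to
# a = np.linalg.solve(B, b), or B can be inverted once and reused per cell.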
For a detailed description of the method, the motivation behind it, and various proofs, we refer the reader to [31].
CHAPTER 5
INTERPOLATION OF DISTANCE FIELDS AND EXPERIMENTS
In Chapter 4, we proposed a piecewise cubic interpolation method on the BCC
lattice and described its construction, smoothness and approximation order. We also
summarized a local tricubic interpolation proposed by Lekien and Marsden in [31]. Both
these methods use Hermite data associated with the function being interpolated, i.e.
the values of the function and its partial derivatives at the respective lattice points. In
Section 4.1.3, we described a finite-differencing technique to estimate the derivatives
of the function from the function values sampled at the lattice points. However, when
the true values of the function derivatives are available, these can be incorporated
directly into both interpolation schemes to provide a more accurate reconstruction of the
underlying surface.
It has already been mentioned that the true gradients of distance fields can be
computed with relative ease. While sampling a distance field, it is possible to also
compute the true first derivatives of the field at the lattice points. This can be done on
both CC and BCC lattices using the technique explained in Section 3.4 and requires
almost no additional computation. The aforementioned interpolation schemes can
then be employed, along with the true derivative values, on these distance fields to
reconstruct or visualize the original function.
In this chapter, we study the effects of applying the proposed interpolation scheme
on distance fields sampled on BCC lattices. These distance fields are constructed at
different resolutions from triangular meshes of various 3D models (Figure 5-1), using
the GPU accelerated sampling method described in Chapter 3. We use a ray caster to
visualize the results of the interpolation. We are also interested in assessing how well
the BCC sampling lattice does compared to the CC lattice in the context of distance
fields. Hence, we sample the same triangular meshes on CC lattices and use the
tricubic scheme to render them. The resolutions of the CC and BCC lattices are chosen such that the total number of sample points is approximately the same in both cases.

Figure 5-1. The triangular meshes used in our experiments. The soccer ball (A) has 3,516 triangles, the bunny (D) has 69,666 triangles, the buddha (B) and dragon (C) have 100,000 triangles each, and the pawn (E) has 304 triangles.
Both the interpolation schemes are employed twice on each dataset, once using true
first order derivatives and once using estimated values. Since true values of the higher
order derivatives are not available, they are always estimated. On BCC, this is done
using the method described in Section 4.1.3 and on CC, a simple finite-differencing
scheme is used. As a base case for comparison, we also render the CC version of each
dataset using Catmull-Rom interpolation.
The results of these experiments are given below, along with relevant observations.
Each set of images is arranged as follows. The first row has the Catmull-Rom image,
the image using cubic interpolation on BCC with true derivatives and the image using
the same scheme with estimated derivatives, in that order. The second row has images
rendered using tricubic interpolation on CC with the first image using true derivatives
and the second using estimated derivatives.
The images in Figure 5-2 were rendered from distance fields sampled from the
soccer ball dataset. The resolutions used were 80 × 80 × 80 for CC and 64 × 64 × 128 for BCC. Compared to the Catmull-Rom image, the stitches on the ball appear much
sharper in the two images that use true first derivatives with the interpolation schemes
we are interested in. In the images that use the same interpolation schemes with
estimated derivatives, the stitches are about as blurred as in the Catmull-Rom case, and
these images are comparable in quality to each other.
Images in Figure 5-3 and Figure 5-4 were rendered from the Stanford dragon
dataset at resolutions of 80 × 80 × 80 for CC and 64 × 64 × 128 for BCC. Notice the scales on
the surface of the body and the fine details on the head in Figure 5-3 and the ridges on
the body just below the head in Figure 5-4. The Catmull-Rom image in Figure 5-4 has a
disconnected surface near the ear as well. These images show that the sharper features
are reproduced much better by the two interpolation methods using true derivative
values. Among the other three images, i.e. the ones that do not use true derivatives, the
Figure 5-2. The soccer ball dataset with approximately 512,000 samples. The stitches on the ball are significantly sharper in the images using true derivatives (B, D), while the images using estimated derivatives (C, E) are comparable in quality to the Catmull-Rom image (A).
narrow areas behind the head (Figure 5-3) and the ridges on the body (Figure 5-4) are
reproduced better in the BCC image. These phenomena can be observed in Figure 5-5
as well, which shows the Stanford buddha dataset sampled at the same resolutions as
the previous sets. This is a model that has a large amount of fine details on its surface
and the varying accuracy to which these details are reproduced by the different methods
is obvious from the images. The bunny dataset, rendered at resolutions of 85 × 85 × 85 for CC and 68 × 68 × 136 for BCC, is shown in Figure 5-6. Here, all CC images show staircase artifacts on the ear of the bunny. The same areas in the BCC images are mostly artifact free.
Figure 5-3. The Stanford dragon dataset (side view) with approximately 512,000 samples. The details on the head and the scales on the body are more visible in the images using the true derivatives (B, D).
Figure 5-7 shows images rendered from the pawn dataset. These were sampled at the extremely low resolutions of 32 × 32 × 32 for CC and 26 × 26 × 52 for BCC. At these
resolutions, it can be seen that the cubic and tricubic schemes using true derivatives do
a much better job of retaining the basic shape of the pawn. Figures 5-8, 5-9, 5-10 and
5-11 show the pawn dataset sampled at increasing resolutions. As the resolution
increases, the difference in quality between the different interpolation schemes
diminishes until all the images are comparable in quality as seen in the final set. This
shows that the advantage of using the true derivatives is more prominent at lower
resolutions where the sampled distance field might not have enough data to produce a
reasonably accurate reconstruction.
Figure 5-4. The Stanford dragon dataset (front view) with approximately 512,000 samples. The ridges on the underside of the belly are better reproduced by the images using the true derivatives (B, D). Moreover, the surface near the ear is disconnected in the Catmull-Rom image (A).
Figure 5-5. The buddha dataset with approximately 512,000 samples. Images with true derivatives (B, D) show much more detail compared to the rest.
Figure 5-6. The bunny dataset with approximately 615,000 samples. The staircase artifacts seen on the ear in the CC images (A, B, C) are mostly absent in the BCC images (D, E).
Figure 5-7. The Pawn dataset with approximately 33,000 samples.
Figure 5-8. The Pawn dataset with approximately 200,000 samples.
Figure 5-9. The Pawn dataset with approximately 260,000 samples.
Figure 5-10. The Pawn dataset with approximately 512,000 samples.
Figure 5-11. The Pawn dataset with approximately 2,095,000 samples.
CHAPTER 6
CONCLUSION AND FUTURE WORK
In this thesis, we studied sampling and reconstruction of distance fields for surfaces
represented by triangular meshes. We examined the idea of optimal sampling lattices in
this context and discussed the sampling theoretic motivation for using BCC lattices.
For sampling distance fields from triangular meshes, we proposed a GPU
implementation of the brute force approach. Since the brute force approach iterates
over all possible point-triangle pairs, the distance values obtained are exact. While a
CPU implementation of the brute force approach is prohibitively time consuming, our
GPU implementation, based on the CUDA architecture, uses the parallel processing
capabilities of the GPU to achieve significant acceleration. We then discussed ways
to adapt our implementation to BCC lattices. In addition to the distance values, our
implementation also calculates the exact gradients of the distance field at the lattice
points with relatively few additional computations. While our GPU implementation is
fairly basic, we believe that it paves the way for future implementations that can achieve
even higher speedups by utilizing the latest GPU hardware and architectures, and by
using acceleration techniques like hierarchical space partitioning.
We introduced a local cubic interpolation scheme on the BCC lattice that can be
used to reconstruct and visualize discrete distance fields. The constructed splines
are exactly interpolating at the lattice points and the cubic spline space leads to a
fourth-order method with $C^1$ continuity. Our interpolation scheme utilizes the exact
derivative values available for distance fields to give a more accurate reconstruction.
Where exact values of the derivatives are not available, it uses a finite-differencing
scheme to estimate the derivatives, which provides a generalization to Catmull-Rom
splines in the non-separable setting. We also devised a finite-differencing scheme on
the BCC lattice that guarantees the order of accuracy. Unlike super-splines, our spline
construction is possible without introducing intermediate (i.e. non-lattice) points. Its
low degree has advantages from the polynomial fitting point of view. It is also simple to
implement and efficient to compute.
Finally, we evaluated the merits of using the BCC lattice to sample distance fields by
conducting a series of experiments on distance fields sampled on CC and BCC lattices
of comparable density (i.e. number of samples). The measure of quality used was the
visual quality of images rendered from these distance fields. Our cubic interpolation
was used to reconstruct the samples on the BCC lattice while both Catmull-Rom and
the tricubic interpolation method (Lekien and Marsden [31]) were used on the CC lattice. We also examined the effects of using exact derivative values on the quality of reconstruction. Our experiments showed that the BCC datasets using true derivatives were able to reproduce sharp details of the original surfaces much more faithfully than the Catmull-Rom method on the CC lattice. Without exact derivatives, the BCC datasets produced images of quality comparable to, or better than, both the Catmull-Rom and the tricubic (without exact derivatives) images.
REFERENCES
[1] CUDA Programming Guide 3.1, 2010. URL http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/NVIDIA_CUDA_C_ProgrammingGuide_3.1.pdf.

[2] J. Bærentzen. Volumetric Manipulations with Applications to Sculpting. PhD thesis, IMM, Technical University of Denmark, 2001.

[3] J. A. Bærentzen and H. Aanæs. Signed distance computation using the angle weighted pseudonormal. IEEE Transactions on Visualization and Computer Graphics, 11:243–253, May 2005. ISSN 1077-2626. doi: 10.1109/TVCG.2005.49. URL http://dx.doi.org/10.1109/TVCG.2005.49.

[4] I. Bomze, V. Demyanov, R. Fletcher, T. Terlaky, and I. Polik. Nonlinear Optimization: Lectures Given at the CIME Summer School Held in Cetraro, Italy, July 1-7, 2007. Springer Verlag, 2010. ISBN 3642113389.

[5] D. Breen and R. Whitaker. A level-set approach for the metamorphosis of solid models. IEEE Transactions on Visualization and Computer Graphics, 7(2):173–192, 2002. ISSN 1077-2626.

[6] D. Breen, S. Mauch, and R. Whitaker. 3D scan conversion of CSG models into distance volumes. In Proceedings of the 1998 IEEE Symposium on Volume Visualization, pages 7–14. ACM, 1998. ISBN 1581131054.

[7] R. L. Burden and J. D. Faires. Numerical Analysis. Prindle, Weber and Schmidt series in mathematics. PWS-Kent Pub. Co., fifth edition, 1993. ISBN 0-534-93219-3.

[8] D. Cohen-Or, A. Solomovic, and D. Levin. Three-dimensional distance field metamorphosis. ACM Transactions on Graphics (TOG), 17(2):116–141, 1998. ISSN 0730-0301.

[9] B. Csebfalvi. Prefiltered Gaussian reconstruction for high-quality rendering of volumetric data sampled on a body-centered cubic grid. In Visualization, 2005. VIS 05. IEEE, pages 311–318. IEEE, 2005. ISBN 0780394623.

[10] B. Csebfalvi. An evaluation of prefiltered B-spline reconstruction for quasi-interpolation on the body-centered cubic lattice. Visualization and Computer Graphics, IEEE Transactions on, 16(3):499–512, 2010. ISSN 1077-2626.

[11] B. Csebfalvi and M. Hadwiger. Prefiltered B-spline reconstruction for hardware-accelerated rendering of optimally sampled volumetric data. In Vision, Modeling, and Visualization 2006: Proceedings, November 22-24, 2006, Aachen, Germany, page 325. IOS Press, 2006. ISBN 3898380815.
[12] O. Cuisenaire. Distance Transformations: Fast Algorithms and Applications to Medical Image Processing. PhD thesis, Catholic University of Leuven, Belgium, October 1999.

[13] C. de Boor. Quasi-interpolants and approximation power of multivariate splines. Computation of Curves and Surfaces, pages 313–345, 1990.

[14] D. H. Eberly. 3D Game Engine Design, Second Edition: A Practical Approach to Real-Time Computer Graphics (The Morgan Kaufmann Series in Interactive 3D Technology). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006. ISBN 0122290631.

[15] A. Entezari. Optimal sampling lattices and trivariate box splines. PhD thesis, Simon Fraser University, Vancouver, Canada, July 2007. URL http://www.cise.ufl.edu/~entezari/research/docs/dissertation.pdf.

[16] A. Entezari, R. Dyer, and T. Moller. Linear and cubic box splines for the body centered cubic lattice. In Proceedings of the Conference on Visualization '04, VIS '04, pages 11–18, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7803-8788-0. URL http://dx.doi.org/10.1109/VISUAL.2004.65.

[17] A. Entezari, M. Mirzargar, and L. Kalantari. Quasi-interpolation on the body centered cubic lattice. In Computer Graphics Forum, volume 28, pages 1015–1022. John Wiley & Sons, 2009.

[18] A. Entezari, D. Van De Ville, and T. Moller. Practical box splines for reconstruction on the body centered cubic lattice. Visualization and Computer Graphics, IEEE Transactions on, 14(2):313–328, March-April 2008. ISSN 1077-2626.

[19] B. Finkbeiner, A. Entezari, D. Van De Ville, and T. Moller. Efficient volume rendering on the body centered cubic lattice using box splines. Computers & Graphics, 34(4):409–423, 2010. ISSN 0097-8493. doi: 10.1016/j.cag.2010.02.002. URL http://www.sciencedirect.com/science/article/B6TYG-4YDYSHM-1/2/3641c9180893327a580f70549c1dd9c9. Procedural Methods in Computer Graphics; Illustrative Visualization.

[20] S. Frisken, R. Perry, A. Rockwood, and T. Jones. Adaptively sampled distance fields: A general representation of shape for computer graphics. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 249–254. ACM Press/Addison-Wesley Publishing Co., 2000. ISBN 1581132085.

[21] N. Gagvani and D. Silver. Parameter-controlled volume thinning. Graphical Models and Image Processing, 61(3):149–164, 1999. ISSN 1077-3169.

[22] A. Gueziec. Meshsweeper: Dynamic point-to-polygonal mesh distance and applications. IEEE Transactions on Visualization and Computer Graphics, 7(1):47, 2001.
[23] C. Hamitouche, L. Ibanez, and C. Roux. Discrete topology of (A_n*) optimal sampling grids: Interest in image processing and visualization. Journal of Mathematical Imaging and Vision, 23(3):401–417, 2005. ISSN 0924-9907.

[24] M. Harris. Optimizing parallel reduction in CUDA. NVIDIA Developer Technology, 2008. URL http://www.mendeley.com/research/optimizing-parallel-reduction-cuda/.

[25] J. Helmsen, E. Puckett, P. Colella, and M. Dorr. Two new methods for simulating photolithography development in 3D. In Proceedings of SPIE, the International Society for Optical Engineering, volume 2726, pages 253–261. Society of Photo-Optical Instrumentation Engineers, 1996.

[26] K. Hoff, A. Zaferakis, M. Lin, and D. Manocha. Fast 3D geometric proximity queries between rigid and deformable models using graphics hardware acceleration. UNC-CH Technical Report TR02-004, 2002.

[27] M. Jones, J. Bærentzen, and M. Sramek. 3D distance fields: A survey of techniques and applications. IEEE Transactions on Visualization and Computer Graphics, pages 581–599, 2006. ISSN 1077-2626.

[28] T. Ju, F. Losasso, S. Schaefer, and J. Warren. Dual contouring of hermite data. ACM Transactions on Graphics (TOG), 21(3):339–346, 2002. ISSN 0730-0301.

[29] R. Kimmel, N. Kiryati, and A. Bruckstein. Multivalued distance maps for motion planning on surfaces with moving obstacles. Robotics and Automation, IEEE Transactions on, 14(3):427–436, 1998. ISSN 1042-296X.

[30] L. Kobbelt, M. Botsch, U. Schwanecke, and H. Seidel. Feature sensitive surface extraction from volume data. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 57–66. ACM, 2001. ISBN 158113374X.

[31] F. Lekien and J. Marsden. Tricubic interpolation in three dimensions. International Journal for Numerical Methods in Engineering, 63(3):455–471, 2005.

[32] J. Lengyel, M. Reichert, B. Donald, and D. Greenberg. Real-time robot motion planning using rasterizing computer graphics hardware. ACM SIGGRAPH Computer Graphics, 24(4):327–335, 1990. ISSN 0097-8930.

[33] W. Lorensen and H. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pages 163–169. ACM, 1987. ISBN 0897912276.

[34] S. Mauch. A fast algorithm for computing the closest point and distance transform. URL http://www.acm.caltech.edu/seanm/software/cpt/cpt.pdf, 2000.
[35] S. Mauch. Efficient Algorithms for Solving Static Hamilton-Jacobi Equations. PhD thesis, California Institute of Technology, 2003.

[36] T. Meng, B. Smith, A. Entezari, A. E. Kirkpatrick, D. Weiskopf, L. Kalantari, and T. Moller. On visual quality of optimal 3D sampling and reconstruction. In Graphics Interface 2007, May 2007.

[37] T. Morvan, M. Reimers, and E. Samset. High performance GPU-based proximity queries using distance fields. In Computer Graphics Forum, volume 27, pages 2040–2052. John Wiley & Sons, 2008.

[38] J. Mullikin. The vector distance transform in two and three dimensions. CVGIP: Graphical Models and Image Processing, 54(6):526–535, 1992. ISSN 1049-9652.

[39] P. Novotny, L. Dimitrov, and M. Sramek. CSG operations with voxelized solids. In Computer Graphics International, 2004. Proceedings, pages 370–377. IEEE, 2005. ISBN 0769521711.

[40] T. Park, S. Lee, J. Kim, and C. Kim. CUDA-based signed distance field calculation for adaptive grids. In Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on, pages 1202–1206. IEEE, 2010.

[41] B. Payne and A. Toga. Distance field manipulation of surface models. Computer Graphics and Applications, IEEE, 12(1):65–71, 1992. ISSN 0272-1716.

[42] R. Satherley and M. Jones. Vector-city vector distance transform. Computer Vision and Image Understanding, 82(3):238–254, 2001. ISSN 1077-3142.

[43] C. Sequin. Procedural spline interpolation in Unicubix. In Proc. of the 3rd USENIX Computer Graphics Workshop, Monterey, CA, pages 63–83, 1987.

[44] J. Sethian. A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences of the United States of America, 93(4):1591, 1996.

[45] J. Sethian. Level Set Methods and Fast Marching Methods. Cambridge Monographs on Applied and Computational Mathematics, 1999.

[46] C. Sigg, R. Peikert, and M. Gross. Signed distance transform using graphics hardware. In Proceedings of the 14th IEEE Visualization 2003 (VIS'03), pages 12–. IEEE Computer Society, 2003. ISBN 0769520308.

[47] A. Sud, M. Otaduy, and D. Manocha. DiFi: Fast 3D distance field computation using graphics hardware. In Computer Graphics Forum, volume 23, pages 557–566. John Wiley & Sons, 2004.

[48] E. Sundholm. Distance fields accelerated with OpenCL, 2010.
[49] M. Teschner, S. Kimmerle, B. Heidelberger, G. Zachmann, L. Raghupathi, A. Fuhrmann, M. Cani, F. Faure, N. Magnenat-Thalmann, W. Strasser, et al. Collision detection for deformable objects. In Computer Graphics Forum, volume 24, pages 61–81. Wiley Online Library, 2005.

[50] T. Theußl, T. Moller, and M. Groller. Optimal regular volume sampling. In Visualization, 2001. VIS'01. Proceedings, pages 91–546. IEEE, 2001. ISBN 0780372018.

[51] G. Thurmer and C. Wuthrich. Computing vertex normals from polygonal facets. Journal of Graphics Tools, 3(1):43–46, 1998. ISSN 1086-7651.

[52] J. Tsitsiklis. Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control, 40(9):1528–1538, 1995. ISSN 0018-9286.
BIOGRAPHICAL SKETCH
Nithin Pradeep Thazheveettil received his bachelor’s in Electrical and Electronics
Engineering from TKM College of Engineering, Kollam, India in 2005. He then worked
as a software developer at Tata Consultancy Services Ltd, Bangalore, India for 3 years
before joining the University of Florida, Gainesville, to pursue his MS in computer engineering.
He has been doing research in the field of Computer Graphics as part of his master’s
program and is expected to graduate in May 2011.