ON SAMPLING AND RECONSTRUCTION OF DISTANCE FIELDS
By
NITHIN PRADEEP THAZHEVEETTIL
A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
UNIVERSITY OF FLORIDA
2011
© 2011 Nithin Pradeep Thazheveettil
I dedicate this to my parents.
ACKNOWLEDGMENTS
I would like to thank my advisor, Dr. Alireza Entezari, for all his support and guidance,
without which this thesis would not have been possible. I would also like to thank my
thesis committee members, Dr. Jörg Peters and Dr. Anand Rangarajan, for their help
and suggestions. Finally, I thank my family and friends, whose constant support kept me
going when all else failed.
TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION
1.1 Motivation
1.2 Body Centered Cubic Sampling Lattice
1.3 Compute Unified Device Architecture
1.4 Contributions

2 RELATED WORK
2.1 Sampling Distance Fields
2.2 Reconstruction of Volumetric Data
2.3 Sampling and Reconstruction on Body Centered Cubic Lattice

3 SAMPLING DISTANCE FIELDS
3.1 Brute Force Method
3.2 Point-Triangle Distance
3.3 Graphics Processing Unit Implementation
3.4 Generating True Gradients
3.5 Performance

4 CUBIC INTERPOLATION IN BODY CENTERED CUBIC LATTICE USING HERMITE DATA
4.1 Cubic Interpolation in Body Centered Cubic Lattice
4.1.1 Interpolating Splines
4.1.2 Smoothness And Approximation Order
4.1.3 Isotropic Finite-Differences on the Lattice
4.2 Tricubic Interpolation in Cartesian Cubic Lattice

5 INTERPOLATION OF DISTANCE FIELDS AND EXPERIMENTS

6 CONCLUSION AND FUTURE WORK

REFERENCES

BIOGRAPHICAL SKETCH
LIST OF TABLES

3-1 Triangular meshes used for performance testing
3-2 Comparison of execution times. The time taken by the GPU implementation is shown in the column 'GPU' and that by the CPU implementation is shown in the column 'CPU'.
4-1 The L2 norm error in reconstruction of datasets sampled at a resolution of 111×111×222 on the BCC lattice using the proposed cubic interpolation. The weighted least-squares approach is designed to use a zero-mean Gaussian with σ² = 0.5 to estimate partial derivative values, which show marginal improvements in terms of reconstruction error.
LIST OF FIGURES

1-1 The BCC Lattice.
1-2 Hierarchy of threads, warps, blocks and grids.
1-3 CUDA memory model.
3-1 The st-plane partitioned into 7 regions.
3-2 Block diagram of the GPU implementation.
3-3 Two ways of adapting the GPU implementation for BCC lattices.
4-1 The Carp fish dataset.
4-2 Weights of the finite-differencing kernel on the 27-point neighborhood of the BCC lattice for $f_x$ and $f_{yz}$.
5-1 The triangular meshes used in our experiments.
5-2 The soccer ball dataset with approximately 512,000 samples.
5-3 The Stanford dragon dataset (side view) with approximately 512,000 samples.
5-4 The Stanford dragon dataset (front view) with approximately 512,000 samples.
5-5 The buddha dataset with approximately 512,000 samples.
5-6 The bunny dataset with approximately 615,000 samples.
5-7 The Pawn dataset with approximately 33,000 samples.
5-8 The Pawn dataset with approximately 200,000 samples.
5-9 The Pawn dataset with approximately 260,000 samples.
5-10 The Pawn dataset with approximately 512,000 samples.
5-11 The Pawn dataset with approximately 2,095,000 samples.
Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science
ON SAMPLING AND RECONSTRUCTION OF DISTANCE FIELDS
By
Nithin Pradeep Thazheveettil
May 2011
Chair: Alireza Entezari
Major: Computer Engineering
In this thesis, we examine sampling and reconstruction of distance fields from
surfaces represented by triangular meshes. Motivated by sampling theory, we
explore the application of optimal sampling lattices in this context. To sample exact
distance values from a triangular mesh, we propose a Graphics Processing Unit (GPU)
implementation of the brute force approach and show how to adapt it to the Body
Centered Cubic (BCC) lattice. The exact gradients of distance fields can be computed
with relative ease and we believe that incorporating these values could improve
the quality of reconstruction of discrete distance fields. Hence, we discuss ways of
modifying our implementation to sample exact gradients with relatively few additional
computations. The suitability of BCC as a sampling lattice for distance fields and the
merits of using exact gradient data are evaluated by reconstructing and visualizing
distance fields sampled from various triangular meshes. To reconstruct the data on
BCC lattices, we introduce a cubic spline construction that is exactly interpolating at the
lattice points and can utilize true gradient values where available. We also compare and
contrast the images rendered from these datasets to those rendered using Catmull-Rom
interpolation on distance fields sampled on Cartesian Cubic (CC) lattices.
CHAPTER 1
INTRODUCTION
1.1 Motivation
A distance field of an object is a scalar field around the object in which every point
holds the shortest distance from that point to the surface of the object.1 It can be
considered as an implicit representation of the object. On the surface of the object, the
field will have a value of 0, providing an implicit representation of the object as the zero
level set of the field. In addition to the distances themselves, distance fields can also
indicate other properties of a given point, such as the direction from that point to the
surface and whether the point is inside or outside the object. The gradient of the field at
any point gives the direction from that point to the closest point on the surface. Distance
fields can be signed or unsigned. In a signed distance field, every point is assigned a
sign in addition to the scalar value. This sign indicates whether the point is inside or
outside the object. Note that for this to be meaningful, the surface must be closed and
orientable, while distance fields in general do not require this property.
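As a concrete illustration (an example of ours, not from the original text): for a sphere of radius $r$ centered at $\mathbf{c}$, the signed distance field has the closed form

$$d(\mathbf{x}) = \|\mathbf{x} - \mathbf{c}\| - r, \qquad \nabla d(\mathbf{x}) = \frac{\mathbf{x} - \mathbf{c}}{\|\mathbf{x} - \mathbf{c}\|},$$

so that $d < 0$ inside, $d = 0$ on the surface (the zero level set), and the unit-length gradient points along the line to the closest surface point.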
Distance fields have applications in a variety of fields like computer vision, physics,
medical imaging and computer graphics. They are used for proximity computations [26, 37], collision detection [48, 49], morphing [5, 8], skeletal animation [21], path planning [29, 32] and constructive solid geometry (CSG) operations [6, 39]. In the field of volume
graphics, distance fields are used for modeling, manipulation and visualization of
geometric objects. Techniques of accelerating visualization, like skipping empty voxels
during ray tracing based on distance values, also make distance field representation
attractive. In this thesis, we are primarily concerned with using distance fields to
represent and visualize surfaces represented by triangular meshes.
1 Parts of Chapters 2, 4, 5 and 6 are adapted with permission from M. Mirzargar, N. Thazheveettil, W. Ye and A. Entezari. Cubic Interpolation On The Body Centered Cubic Lattice. Submitted to Transactions on Visualization and Computer Graphics.
Since distance fields of geometric models are almost always too complex to be
represented in analytical form, they are sampled and stored in discrete form. The
sample locations are usually chosen to be points of a regular grid though adaptive
schemes that are better suited for specific applications have been suggested as
well [2, 20]. Non-regular sampling schemes are attractive for the adaptivity features;
however, efficient reconstruction is difficult and expensive scattered-data interpolation
techniques have to be employed. Moreover, a uniform grid representation lends itself to
an easy way of analyzing signed distance functions in Fourier space through the Fast
Fourier Transform (FFT) of the sampled data. In 3D, the most commonly used uniform sampling
lattice is the Cartesian Cubic (CC) lattice, though other lattices like the Body Centered
Cubic (BCC) lattice and the Face Centered Cubic (FCC) lattice have been shown to be
more efficient from the sampling theoretic point of view. In Section 1.2, we discuss the
CC and BCC lattices and their sampling efficiencies.
Each sample point in a discrete distance field of a surface holds the minimum
distance from that point to the surface. Computing this distance involves identifying
the point on the surface that is closest to the sample point. Since this thesis focuses
on surfaces represented by triangular meshes ('surface' or 'object' hereafter refers
to a surface represented by a triangular mesh), this closest point can be identified by
identifying the triangle it is part of. Thus, a discrete distance field can be constructed
directly from the triangular mesh data using a brute force approach, i.e. computing the
distance from every sample point to every geometric primitive in the object. While this
method is accurate, it is also extremely slow, making it unsuitable for most applications.
Numerous other approaches have been suggested, mostly focusing more on speed than
accuracy. We discuss many of these approaches in Section 2.1. However, as there are
applications that require accurate values of distance fields, we try to address the issue
of generating accurate distance fields as fast as possible by pursuing techniques that
can accelerate the brute force method.
Distance fields so generated and the objects they represent are often consumed
visually. Visualization also helps in evaluating the quality of various discrete representations
of distance fields. Hence the topic of visualizing distance fields receives our attention
as well. Distance fields can either be visualized directly using a volume visualization
method like ray casting or be converted to a polygonal mesh using techniques like the
Marching Cubes algorithm (or the Marching Tetrahedra algorithm) and rendered using
a traditional surface rendering technique. In either case, the continuous distance field
must be reconstructed from the discrete values first, using some interpolation scheme.
While a wide variety of interpolation schemes could be used with distance fields, we
are particularly interested in those that make use of the availability of its true gradient
values, which can be computed along with the distance values themselves. Coming up
with such a scheme for the BCC lattice is another topic we address in this thesis.
The process of converting a polygonal mesh to a discrete distance field and back
almost always introduces errors in the mesh geometry. As the accuracy of this process
is often a key consideration in determining the applications it is suitable for, we devote
some attention to the accuracy of the sampling and visualization schemes we introduce
in this thesis.
The rest of this chapter is organized as follows. Section 1.2 talks about sampling
on the BCC lattice. Section 1.3 gives a brief introduction to the Compute Unified Device
Architecture (CUDA), and in Section 1.4 we list the major contributions of this thesis.
1.2 Body Centered Cubic Sampling Lattice
The goal of optimal sampling is to capture the entire spectrum of the underlying
signal using the least number of samples. For a specific given signal, there is a unique
choice of optimal sampling lattice, which defines the points in space where the signal
is sampled. This lattice can be computed based on the geometric knowledge of the
spectrum of the signal. But for generic data, where such knowledge is not available,
regular lattices are used for sampling. In 3D, the most commonly used regular sampling
lattice is the Cartesian Cubic (CC) lattice.
The CC lattice is formed by the tensor-product of uniform sampling in the lower
dimensions. However, while simple and popular, the CC lattice has been shown to
be inefficient in sampling generic multivariate signals. Both the Body Centered Cubic
(BCC) lattice and the Face Centered Cubic (FCC) lattice, which are 3D counterparts of the
hexagonal lattice, exhibit higher sampling efficiency (i.e. capture more information per
sample taken) than the CC lattice with the BCC lattice performing better on smooth
signals.
Figure 1-1. The BCC Lattice. (A) The BCC lattice is formed by adding a lattice point (in blue) to the center of each cubic element of the CC lattice. (B) The unique tetrahedralization of the BCC lattice is composed of the semi-regular congruent tetrahedra.
The BCC lattice can be constructed from the CC lattice by adding a lattice point
to the center of each cubic element formed by 8 neighboring lattice points (part (A) of
Figure 1-1). Its relative superiority in sampling generic signals can be explained by a
frequency domain analysis of generic signals which we assume to be isotropic (i.e. not
biased in any direction) and band limited. Sampling a signal in a regular manner in the
spatial domain corresponds to periodically replicating its spectrum in the frequency
domain. For the sampling to be optimal, it should be done such that the spectrum is
replicated as densely as possible without overlapping in the frequency domain. The
spectrum of an isotropic band limited signal has a spherical support. So, the optimal
sampling lattice is that whose dual in frequency domain allows spheres to be packed as
densely as possible without overlap. It can be seen that among the lattices discussed,
the FCC lattice enables the most optimal sphere packing. Hence, its dual in the spatial
domain, the BCC lattice, performs best as a sampling lattice.
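To put numbers on this (standard sphere-packing densities, quoted here for illustration, not from the original text), the fraction of space covered by equal spheres centered at the lattice points is

$$\rho_{\mathrm{CC}} = \frac{\pi}{6} \approx 0.524, \qquad \rho_{\mathrm{BCC}} = \frac{\sqrt{3}\,\pi}{8} \approx 0.680, \qquad \rho_{\mathrm{FCC}} = \frac{\pi}{3\sqrt{2}} \approx 0.740,$$

so an FCC arrangement of spectrum replicas in the frequency domain wastes the least room, which is why its spatial dual, the BCC lattice, is the preferred sampling lattice.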
Every lattice point in the lattice has a neighboring region where it is the only lattice
point. This region can be defined using its Delaunay diagram or, alternatively, its dual, the
Voronoi cells. A Voronoi cell of a lattice point is the set of points in its neighborhood
that are closest to it. For a given lattice point, the other lattice points whose Voronoi
cells share a face with its own Voronoi cell form its "first ring" of neighborhood. In
the case of the BCC lattice, this neighborhood forms a rhombic dodecahedron, which
can be decomposed into congruent Delaunay tetrahedra. As the space of BCC lattice
is composed of such rhombic dodecahedra, the entire space can be partitioned into
congruent Delaunay tetrahedra. The rhombic dodecahedron and its tetrahedral partition
are shown in part (B) of Figure 1-1.
It is worth noting here that the Delaunay tetrahedralization of the BCC lattice is
unique. If the tetrahedralization isn’t unique, as is the case with the CC lattice, the
reconstruction of the sampled signal could depend on the choice of the tetrahedra and
hence could result in arbitrary and inconsistent reconstruction.
1.3 Compute Unified Device Architecture
Compute Unified Device Architecture (CUDA) is a general purpose parallel
programming architecture developed by NVIDIA. It makes the massive parallel
processing capabilities of the graphics processing unit (GPU) accessible to the user
for general purpose computations. The CUDA architecture allows CUDA enabled
GPUs to be programmed using a high level programming language called C for CUDA
which is based on ANSI C with a few CUDA specific extensions. CUDA programs can
contain code segments targeted to run on the CPU (referred to as host) as well as code
segments targeted to run on the GPU (referred to as device). The code segments that
run on the device are written as C style functions called kernels. CUDA kernels, along
with accompanying host code, are written in .cu files which are compiled using the nvcc
tool. Additional host code can also be written in separate C/C++ files which must be
compiled separately and linked using a standard C/C++ compiler and linker.
When a kernel is launched, a user-specified number of CUDA threads are created
to execute the kernel. The number of threads to be created is specified in terms
of 3-dimensional arrays of threads called thread blocks. All threads within a block
can synchronize execution using a barrier primitive and share data through shared
memory. There is a limit on the number of threads that can be assigned to a block.
Multiple equally-shaped blocks are launched to generate the necessary number of
threads for the kernel. These blocks are arranged into a single 2-dimensional grid. The
number of threads assigned to a block (along with the memory resources each thread
requires) influences how well the processing power of the GPU is utilized, indicated by
a percentage value called occupancy. Thus, the number of threads should be carefully
chosen so as to maximize occupancy.
Threads are executed in groups of 32 called warps, in a Single Instruction Multiple
Thread (SIMT) manner. That is, at any given time, every thread in a warp executes
the same instruction. Parallelism is achieved by assigning different data units to each
thread. Thus, this is similar to the Single Instruction Multiple Data (SIMD) execution
model. The relationship between threads, warps, blocks and grids is shown in
Figure 1-2.
Figure 1-2. Hierarchy of threads, warps, blocks and grids.
Figure 1-3. CUDA memory model.
CUDA separates the host memory space from the device memory space. The
data that a kernel works with during execution must be in the device memory. A CUDA
enabled device has multiple types of memory as illustrated in Figure 1-3. Global,
constant and texture memory are accessible from the host and from every thread on the
device. Hence, data can be transferred directly from the host memory to these memory
spaces and accessed by any thread in the grid. All three are persistent (i.e. retain data)
across kernel launches by the same application. However, global memory can be written
to by the threads while constant and texture memory are read-only from the device.
Shared memory and registers are fast on-chip memory. Shared memory is allocated per
threadblock. Every thread in a threadblock has access to its shared memory but cannot
access the shared memory of other blocks. Registers are allocated per thread and are
used for local variables declared within the kernel and temporary variables used for
computations. The number of registers allocated to a thread depends on the number of
threads assigned to a block. Once the registers are used up, any additional variables are
stored in local memory, which is off-chip and considerably slower. Very large arrays and
data structures go into the local memory as well. Typically, data is loaded into registers
and shared memory before being processed to avoid large memory latencies.
For more detailed documentation of CUDA, please refer to the CUDA programming
guide [1].
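As a minimal illustration of the concepts above (a sketch of ours; the kernel and variable names are hypothetical, not from this thesis), the following program launches a grid of thread blocks, computes a global index per thread and moves data between host and device memory:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scaleKernel(float *data, float factor, int n)
{
    // Each thread derives its global index from its block and thread IDs.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)                 // guard threads past the end of the array
        data[idx] *= factor;
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *hostData = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) hostData[i] = 1.0f;

    float *devData;
    cudaMalloc(&devData, bytes);                           // global memory
    cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scaleKernel<<<blocks, threadsPerBlock>>>(devData, 2.0f, n);

    cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);
    printf("first element: %f\n", hostData[0]);            // prints 2.0

    cudaFree(devData);
    free(hostData);
    return 0;
}
```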
1.4 Contributions
The primary contributions of this thesis are as follows.
We introduce an accelerated, GPU based implementation of the brute force method
to construct a discrete signed distance field on a CC lattice from triangular meshes. We
present ways of adapting this to work with BCC lattices and also discuss computing the
true gradient values of the distance field at the lattice points.
We propose a local cubic interpolation scheme in the trivariate setting on the BCC
lattice that uses Hermite data (i.e. function values and partial derivatives). We also
briefly summarize a local tricubic interpolation scheme proposed by Lekien and Marsden
[31] that uses Hermite data on the CC lattice.
Finally, we generate distance fields for various triangular meshes on both CC and
BCC lattices using the sampling technique proposed in this thesis. Then, we compare
and contrast the results of their visualization using the two interpolation schemes
mentioned previously along with the well known Catmull-Rom scheme.
CHAPTER 2
RELATED WORK
2.1 Sampling Distance Fields
Jones et al. [27] present an excellent survey of the various methods used to
generate distance fields along with a comparison of their speed and accuracy. We
discuss a selection of those approaches that are relevant to the context of this thesis.
The naive approach to constructing a discrete 3D distance field from a triangular
mesh is to iterate through every grid point/voxel, compute the shortest distance from
it to every triangle in the mesh and store the minimum value. Payne and Toga in [41]
discuss this approach and suggest a few optimizations to accelerate it, like using
hierarchical bounding boxes in a tree structure to reduce the number of computations.
They also discuss an algorithm to find the shortest distance from a point to a triangle.
The Meshsweeper algorithm proposed by Gueziec [22] presents a dynamic algorithm
to find the shortest distance from a point to a polygonal mesh. It uses a hierarchy of
multilevel bounding boxes with the bounding boxes at each level completely enclosing
the mesh. These bounding regions are indexed into a priority queue based on the
minimum distance from the point to the region.
Mauch [34, 35] presented the Characteristic/Scan Conversion method that
computes a distance field around a polygonal mesh up to a certain distance using scan
conversion. The point on a triangular mesh closest to a voxel must lie on either the face
of a triangle, an edge or a vertex. For each of these features, the approach constructs
a polyhedron that holds all the points in 3D space that are closest to that feature, up to
a certain distance from the feature. These polyhedrons are similar to truncated Voronoi
regions and are called characteristics of the feature. Then, for each feature, only the
distance to the points within the corresponding polyhedron need be computed. The
polyhedrons are scan converted to determine the points inside them, and the distances
for the corresponding points are computed. Sigg et al. [46] improved this algorithm by
using the graphics hardware to scan convert slices of the grid. The distance values are
computed using a fragment program. As the slicing is done on the CPU, which can become
a bottleneck, the characteristics for the triangle face, edges and vertices are combined
to form a single polyhedron. Sud et al. [47] present another hardware based method
which exploits properties like connectivity and spatial coherence in Voronoi regions to
cull the number of primitives considered for distance field computations for each slice
and restrict the region of computation around each primitive. This reduces the number of
distance functions computed per slice.
A different approach to computing distance fields, which yields approximate values,
is Distance Transforms. Here, the distance values for a narrow band around the mesh
surface are first computed and then propagated through the rest of the volume. Mullikin
[38] discusses applying one particular distance transform called the Vector Distance
Transform to 3D images. In Vector Distance Transforms, the vector connecting a point
to the closest point on the object surface is computed along with the distance values
and these vectors are propagated to the neighboring voxels and used to compute the
distance values for them. Satherley and Jones [42] introduce a faster and more accurate
Vector Distance Transform and discuss how to generate distance fields using it. Breen
et al. [6] present a wavefront propagation technique to generate distance fields for
CSG models with sub-voxel accuracy. They compute the shortest distance and closest
surface point for a set of points in the narrow band and propagate them to the rest of
the volume using a Fast Marching Method [25, 44, 45, 52]. A critical review of various
Distance Transform methods can be found in Cuisenaire [12].
Finally, a CUDA based approach to computing adaptive distance fields has been
suggested recently [40]. Like our implementation, this approach also uses a GPU
implementation of the brute force method (the naive method described at the beginning
of this section) to compute distance fields. Each CUDA thread is assigned a mesh
element (triangle) and the sample points are fed to the GPU one after the other.
While each thread could be assigned a sample point instead of a mesh element, the
non-uniform nature of sampling grids prompts the authors to pick mesh elements as the
foci for parallelization. During each iteration, each thread computes the distance from
the input sample point to the triangle assigned to it. The shortest value among these
is then computed using a parallel reduction technique [24]. The sign of each distance
value is computed using the angle-weighted pseudonormal method [3]. Compared to a
single core CPU implementation of a kd-tree based nearest neighbor search algorithm,
the authors report speedups ranging from 10 to 65 for various meshes and sampling
resolutions.
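For illustration, a block-level parallel min-reduction in shared memory, in the spirit of the technique referenced above [24], might look like the following sketch (our code, assuming a power-of-two block size of 256; it is not the implementation of [40]):

```cuda
// Each block loads up to 256 distances into shared memory and halves the
// number of active threads each step until the block's minimum remains.
__global__ void minReduce(const float *dist, float *blockMin, int n)
{
    __shared__ float s[256];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // One value per thread; pad past the end with "infinity" (FLT_MAX).
    s[tid] = (idx < n) ? dist[idx] : 3.402823466e+38f;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            s[tid] = fminf(s[tid], s[tid + stride]);
        __syncthreads();
    }
    if (tid == 0)
        blockMin[blockIdx.x] = s[0];   // one partial minimum per block
}
```

A second pass (or a host-side loop) then reduces the per-block partial minima to a single value.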
2.2 Reconstruction of Volumetric Data
While a vast amount of literature is available on reconstruction of volumetric data,
we are primarily interested in approaches that make use of Hermite data, i.e. data
values and its exact derivative values, to achieve improved reconstruction. A few such
approaches are reviewed here.
Marching Cubes [33] is a well known algorithm for constructing a triangular mesh
representation of a surface from a grid based volumetric representation. Vertices of the
triangles are formed by the points of intersection of the surface of interest and the edges
of the grid. These intersection points are found using linear interpolation on the values at
the grid points. Then, for each cube in the grid, these vertices are connected according
to a predefined case table to form the triangles. One major drawback of this method is
the poor reconstruction of sharp features inside the grid cells.
Kobbelt et al. [30] propose an Extended Marching Cubes algorithm, along with
an enhanced distance field representation to improve the reconstruction of sharp
features. The enhanced representation involves sampling directed distance values in
x, y and z directions instead of just a scalar distance value. That is, at each sample
point, the distances to the closest surface point in each of positive x, y and z directions
are stored. For triangular meshes, these points can be found from the intersection of
the surface with the corresponding edge of the grid. During reconstruction, the surface
points obtained from these values will be more accurate than those obtained from linear
interpolation of the original scalar values. The Extended Marching Cubes algorithm tries
to identify cubes that hold sharp features (corners or edges). For this, at the points of
intersection of the surface and edge, the gradient of the distance field is sampled and
stored. As this point is on the surface, for triangular meshes, this gradient will be the
normal to the triangle at that point. During reconstruction, the opening angle of the cone
formed by these gradients is used to detect if the cube has a sharp feature. If it doesn’t,
the standard Marching Cubes table is used for the cube. If it does, the gradients are
used to construct tangents to the surface at the corresponding points and an additional
vertex is inserted at the intersection of these tangents. A triangle fan is then formed
connecting this vertex to all other vertices on the cube to try and approximate the
sharp feature. A post processing step of flipping edges is then applied to correct mesh
connectivity in the case of sharp edges.
Ju et al. [28] describe a method for contouring a signed grid that improves upon
the Extended Marching Cubes (EMC) algorithm. This approach uses an octree instead
of a uniform 3D grid and the edges of the octree’s leaves that have sign changes are
tagged using exact intersection and normal data. A Quadratic Error Function (QEF)
is formed for each leaf cube of the octree from the normal data. Then, for each cube
that exhibits a sign change, a vertex is placed at the minimizer of the QEF. This avoids
having to explicitly identify cubes that have sharp features. The QEF is chosen such
that the vertex that minimizes it best approximates the original geometry. Then, for each
edge exhibiting sign changes, the minimizer vertex of the cubes sharing the edge are
connected to generate a quad. Simplifications to the octree are also presented, which
avoid wasting space by collapsing leaf cubes that are homogeneous (i.e. have the
same sign for all vertices) and by forming QEFs for internal nodes.
The previous two approaches use Hermite data to construct polygonal mesh
representations of surfaces from their discrete implicit representation. Hermite data
can also be used in interpolating these discrete values more accurately to aid in the
visualization of the surfaces. Lekien and Marsden [31] present such an interpolation
scheme on the CC lattice. Their scheme locally approximates the sampled data using
tricubic splines that are interpolating. The Hermite data associated with the vertices of a
CC lattice is used to construct a linear system of equations from which the coefficients
of the interpolating polynomial are obtained. In reconstructing distance fields, the
availability of exact values of partial derivatives makes approaches using Hermite
data particularly attractive. The interpolation scheme we propose in this thesis is also
capable of making use of sampled derivative values where available.
2.3 Sampling and Reconstruction on Body Centered Cubic Lattice
Motivated by the sampling theoretic advantages of the BCC lattice (Section 1.2),
this thesis explores its application in sampling, and consequently reconstructing,
distance fields. In this section, we discuss some of the previous work carried out in
sampling and reconstructing volumetric data on the BCC lattice. Theußl et al. [50]
make a case for using BCC lattice in volume graphics by showing that BCC lattices
can achieve the same accuracy as CC lattices with 29.3% fewer samples (or, in turn,
samples on a BCC lattice retain about 30% more information than the same number
of samples on a CC lattice). They also demonstrate improved rendering rates on BCC
using a splatting technique adapted to BCC lattices. Entezari et al. [15, 16, 18] introduce
a set of box spline reconstruction schemes which are more suited to the geometry of
the BCC lattice and show how to obtain its optimal approximation order [17] using the
principle of quasi-interpolation [13]. Meng et al. [36] confirm that these advantages
significantly improve the visual quality of the visualization pipeline based on the BCC
sampling lattice. Finkbeiner et al. [19] have recently developed a GPU implementation
of a fast algorithm for convolution of BCC sampled data with the above-mentioned box
splines.
Based on the idea that the BCC lattice can be considered to be composed of
two overlapping CC lattices, Csebfalvi proposes a Gaussian reconstruction on BCC
using global prefiltering [9] and a prefiltered B-spline reconstruction scheme for
quasi-interpolation [10]. Decomposing the BCC lattice into two CC lattices allows for
efficient hardware implementations [11]. But this disregards the topological structure of
the lattice [23] and hence the neighborhood of each lattice point is distorted. Moreover,
neither the quasi-interpolation methods nor the box spline schemes (beyond the linear
$C^0$ case) are exactly interpolating. Our interpolation scheme addresses both these
shortcomings.
CHAPTER 3
SAMPLING DISTANCE FIELDS
This chapter discusses how the brute force method is used to sample accurate
distance fields from triangular meshes on CC and BCC lattices. It then describes
how the parallel processing capabilities of the GPU can be utilized to accelerate the
implementation of the algorithm. A method to obtain the true gradients of the distance
field at each lattice point is also discussed.
3.1 Brute Force Method
The simplest and most straightforward way to sample accurate distance fields from
triangular meshes is to employ the brute force method. Before we describe the brute
force method though, it is important to explain the notion of a lattice point. To sample
the distance field of a mesh, we overlay a lattice over the ambient space in which the
mesh lives. The samples are then taken at the points on this lattice, i.e. the lattice
points. Constructing a discrete distance field from a surface involves computing the
minimum distance from each lattice point to the surface. For a given lattice point, this
would be the distance from the lattice point to the point on the surface that is closest
to it. As triangular meshes are composed of a finite number of triangles, every point
on the surface falls on either the face of a triangle, an edge or a vertex. Hence, every
point on the surface can be considered to be part of one or more triangles. Then, the
task of computing the distance to the closest point on the surface can be simplified by
first finding the triangle that holds the closest point and then computing the minimum
distance to it. As the name implies, the brute force technique does this by computing the
minimum distance from each lattice point to every triangle in the mesh and storing the
minimum values corresponding to each lattice point.
To find the minimum distance from a lattice point to a triangle, we use a point-triangle
distance algorithm proposed by Eberly in [14]. This algorithm is described in Section 3.2.
Though simple and exact, the brute force technique is computationally very
expensive. It evaluates the distance between every possible lattice point-triangle
pairing, leading to a time complexity of O(mn), where m is the number of lattice points
and n the number of triangles. As a straightforward implementation of this method can
take a prohibitively long time, we propose an implementation that utilizes the parallel
processing capabilities of NVIDIA’s multi-core GPUs using the CUDA development
platform.
3.2 Point-Triangle Distance
This section gives a brief description of the Point-Triangle distance method
proposed by Eberly. To find the minimum distance between a point $P$ and a triangle $T$, $T$ is represented in the form

$$T(s, t) = B + sE_0 + tE_1 \quad \text{for } (s, t) \in D = \{(s, t) : s \in [0, 1],\, t \in [0, 1],\, s + t \le 1\},$$

where $B$ is one of the vertices of $T$, $E_0$ and $E_1$ are the vectors from $B$ to the other two vertices of $T$, and $s$ and $t$ are scalars. Each pair of values for $s$ and $t$ such that $(s, t) \in D$ describes a point that is on the triangle, i.e. on the face, edges or vertices of the triangle. When $(s, t) \notin D$, the point described is on the same plane as the triangle, but outside it. Our task, then, is to find the point on $T$ that is closest to $P$.

The squared distance between $P$ and any point on $T$ is given by

$$Q(s, t) = |T(s, t) - P|^2 = as^2 + 2bst + ct^2 + 2ds + 2et + f \quad \text{for } (s, t) \in D,$$

where $a = E_0 \cdot E_0$, $b = E_0 \cdot E_1$, $c = E_1 \cdot E_1$, $d = E_0 \cdot (B - P)$, $e = E_1 \cdot (B - P)$, and $f = (B - P) \cdot (B - P)$.
Then, the point on $T$ that is closest to $P$ is obtained by minimizing $Q$ over $D$. The
minimum can occur in any of the 7 regions shown in Figure 3-1. If the minimum
occurs in any of regions 1–6, the corresponding point on the boundary of the triangle is
computed.
Refer to [14] for a more detailed description of the algorithm and its implementation.
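The following sketch (ours) computes the same minimum squared distance; instead of Eberly's explicit seven-region case analysis, it tests the unconstrained minimizer of $Q$ and otherwise falls back to the three edges, which is equivalent for non-degenerate triangles:

```cuda
struct Vec3 { float x, y, z; };

__host__ __device__ inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
__host__ __device__ inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x+b.x, a.y+b.y, a.z+b.z}; }
__host__ __device__ inline Vec3 operator*(float s, Vec3 a) { return {s*a.x, s*a.y, s*a.z}; }
__host__ __device__ inline float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Squared distance from P to the segment A + u*(B - A), u in [0, 1].
__host__ __device__ float segDist2(Vec3 P, Vec3 A, Vec3 B)
{
    Vec3 AB = B - A;
    float u = dot(P - A, AB) / dot(AB, AB);  // assumes a non-degenerate edge
    u = fminf(fmaxf(u, 0.0f), 1.0f);         // clamp onto the segment
    Vec3 C = A + u * AB;
    return dot(P - C, P - C);
}

// Minimum squared distance from P to the triangle T(s,t) = B + s*E0 + t*E1.
__host__ __device__ float pointTriDist2(Vec3 P, Vec3 B, Vec3 E0, Vec3 E1)
{
    float a = dot(E0, E0), b = dot(E0, E1), c = dot(E1, E1);
    float d = dot(E0, B - P), e = dot(E1, B - P);

    // Unconstrained minimizer of Q(s,t): region 0 of Figure 3-1.
    float det = a * c - b * b;
    float s = (b * e - c * d) / det;
    float t = (b * d - a * e) / det;
    if (s >= 0.0f && t >= 0.0f && s + t <= 1.0f) {
        Vec3 C = B + s * E0 + t * E1;
        return dot(P - C, P - C);
    }

    // Otherwise the closest point lies on the boundary: test the three edges
    // (these cover regions 1-6 collectively).
    Vec3 V1 = B + E0, V2 = B + E1;
    float d2 = segDist2(P, B, V1);
    d2 = fminf(d2, segDist2(P, B, V2));
    d2 = fminf(d2, segDist2(P, V1, V2));
    return d2;
}
```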
Figure 3-1. The st-plane partitioned into 7 regions.
3.3 Graphics Processing Unit Implementation
The brute force method described above performs the same set of computations
on every point-triangle pair. We exploit this inherent potential for parallelization by
implementing these computations to run in parallel on a CUDA capable GPU. For a
small number of point-triangle pairs, it is possible to utilize the available parallelism
completely by using a separate thread for each pair. However, with larger datasets,
limitations in memory and parallel processing power of the GPU make it impossible
to avoid some amount of serialization. Moreover, with multiple threads computing the
distances from the same lattice points, identifying and retaining the shortest distance
would require multiple threads to compare and write to a common location. Since
atomic operations on floats are not available in CUDA (at the time of designing this
implementation), implementing this approach becomes quite complicated and inefficient.
Assigning a triangle to each thread and computing its distance to every lattice point
serially within the thread again involves multiple threads handling the same lattice point,
and suffers from the same drawback mentioned above. Hence, we follow an approach
where every thread is responsible for a specific lattice point and computes the distance
between every pair involving that point. These computations are performed in serial
within the thread, making it easy to locally keep track of the shortest distance for that
point. The only data shared between the threads is the triangle coordinates, which are
read-only as far as the threads are concerned.
Within each thread, a loop is used to iterate over the triangles and compute the
distances to them. The algorithm described in Section 3.2 is used to compute the
distances. This loop, along with a few memory operations, forms our kernel (i.e. the
function that is executed on the GPU by each thread). It takes the triangle data and
point coordinates as input and returns the shortest distance from that point to the mesh
surface as the output. As the kernel cannot access the CPU memory space, these
values must be transferred and stored on GPU memory. Figure 3-2 shows a block
diagram of the GPU implementation discussed here.
Figure 3-2. Block diagram of the GPU implementation. Every lattice point is assigned toa CUDA thread. The triangles are processed by these threads in a loop, oneafter the other.
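A sketch of the kernel just described might look as follows (our reconstruction, reusing pointTriDist2 from Section 3.2; the names, the batch size, and the 32³-region indexing are assumptions, not the exact code of this thesis):

```cuda
#define BATCH 1024                      // triangles per batch (an assumption)

struct Tri { Vec3 B, E0, E1; };         // triangle: base vertex plus two edges

__constant__ Tri cTris[BATCH];          // read-only triangle batch
__constant__ int cNumTris;              // number of triangles in this batch

__global__ void distanceFieldKernel(float *minDist2, int *closestTri,
                                    Vec3 origin, float spacing)
{
    // One thread per lattice point; 32^3 = 32,768 points per region.
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int xi = gid & 31, yi = (gid >> 5) & 31, zi = gid >> 10;
    Vec3 P = { origin.x + spacing * xi,
               origin.y + spacing * yi,
               origin.z + spacing * zi };

    // Global memory persists across launches, so start from the running
    // minimum left behind by the previous triangle batches.
    float best    = minDist2[gid];
    int   bestTri = closestTri[gid];

    for (int i = 0; i < cNumTris; ++i) {   // serial loop over the batch
        float d2 = pointTriDist2(P, cTris[i].B, cTris[i].E0, cTris[i].E1);
        if (d2 < best) { best = d2; bestTri = i; }
    }
    minDist2[gid]   = best;                // written back for the next launch
    closestTri[gid] = bestTri;             // batch-local index (see below)
}
```

A complete version would offset bestTri by the batch's starting index and would also record the minimizing $(s, t)$ pair, which the sign and gradient computations described later rely on.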
The triangle data is only read from within the kernel and every concurrent thread
reads the same triangle data at any given point. This makes the constant memory
on the GPU a good option for storing the triangle data. While computing the distance
to a particular triangle, the coordinates of that triangle are fetched from constant
memory and stored in shared memory so that it is faster to access those values during
computation. The shortest distance value for each lattice point is stored in a register
within its thread. To compute the sign of the distance, it is necessary to keep track of
the triangle and the point on the triangle corresponding to the shortest distance. These
values are stored in registers as well. Once all the triangles have been processed and
the shortest distance found, this value along with the corresponding triangle data is
transferred to the global memory so that it can then be transferred to the CPU memory
space.
The coordinates of the lattice point corresponding to each thread can be computed
within the thread using its thread ID and block ID. However, computations within the
kernel use registers, and sometimes it might be necessary to reduce register usage
to get better efficiency. In such cases, a part of the computations for the lattice point
coordinates can be done on the CPU and transferred to the GPU via constant memory.
We do this by computing the base x, y and z coordinates of each thread block on
the CPU and passing these values as arrays to the GPU. These base values are the
coordinates of the first thread in the corresponding thread block. For the other threads
in the block, the lattice point coordinates are calculated by adding the coordinates of the
thread ID to these base coordinates.
The limited memory on the GPU limits the number of lattice points and triangles that
can be processed in a single kernel launch. In the case of triangles, the capacity of the
constant memory limits the number of triangles that can be sent to the GPU per launch.
If the mesh has more triangles, we launch the kernel multiple times, until all triangles
have been processed. Since each kernel launch will find its own shortest distance (from
among the set of triangles it processed), we need a way to find the shortest distance
among them. This task is simplified by the fact that global memory on the GPU retains
data across kernel launches. Hence, at the time of a particular launch, the global
memory will be holding the shortest distance for each lattice point from the previous
launch. All that needs to be done is to fetch these values into registers at the start of
the kernel execution, as the current shortest distance, so that the distances computed
during that launch are compared against these values.
The number of lattice points that can be processed per launch is limited by the
global memory available on the GPU. This is because each lattice point being processed
during a launch stores its shortest distance and associated triangle data in the global
memory before transferring to the CPU. Therefore, large lattices (or grids) must be split
into smaller sections and processed over multiple kernel launches. For simplicity, we
assume that our lattices are Cartesian cubic and have resolutions that are a multiple of
32 along each axis. Then, we split the lattice into cubical regions of resolution 32×32×32
with each region being processed by a separate kernel launch. With such a split,
the computation of lattice point coordinates can be greatly simplified by choosing the
dimensions of the thread and block IDs appropriately. If the resolution is not an exact
multiple of 32, it can be padded to the next highest multiple. The only drawback is that
one or more kernel launches will have a few threads that perform no useful
work. However, this will not cause a significant performance hit. If neither the triangles
nor the lattice points fit within a single kernel launch, then the multiple launches are
structured so that all the triangles for a particular region of the grid are processed before
moving on to the next region. This is done so that the distance values retained by the
global memory on the GPU can be used by the following kernel launches.
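Continuing the sketch above, a hypothetical host-side driver would iterate over the regions and, within each region, stream the triangle list through constant memory in batches; the persistence of global memory across launches does the bookkeeping:

```cuda
// Reset the running minima before processing a new region.
__global__ void initRegion(float *minDist2, int *closestTri)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    minDist2[gid]   = 3.402823466e+38f;   // FLT_MAX: no triangle seen yet
    closestTri[gid] = -1;
}

void buildDistanceField(const Tri *tris, int numTris, float spacing,
                        const Vec3 *regionOrigins, int numRegions,
                        float *devMinDist2, int *devClosestTri,
                        float *hostMinDist2)
{
    const int points = 32 * 32 * 32;      // 32,768 lattice points per region
    for (int r = 0; r < numRegions; ++r) {
        // 128 blocks x 256 threads = 32,768 threads, one per lattice point
        // (the exact grouping into blocks is an assumption of this sketch).
        initRegion<<<128, 256>>>(devMinDist2, devClosestTri);
        for (int first = 0; first < numTris; first += BATCH) {
            int count = (numTris - first < BATCH) ? numTris - first : BATCH;
            cudaMemcpyToSymbol(cTris, tris + first, count * sizeof(Tri));
            cudaMemcpyToSymbol(cNumTris, &count, sizeof(int));
            distanceFieldKernel<<<128, 256>>>(devMinDist2, devClosestTri,
                                              regionOrigins[r], spacing);
        }
        // Global memory persists across launches, so devMinDist2 now holds
        // the minimum over all batches; copy the region out before reuse.
        cudaMemcpy(hostMinDist2 + (size_t)r * points, devMinDist2,
                   points * sizeof(float), cudaMemcpyDeviceToHost);
    }
}
```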
Since the lattice is split such that each region has 32,768 lattice points, there will
be as many threads per kernel launch. These threads must be grouped into blocks. The
number of threads a block can have is restricted by the number of registers used by
the thread. At the same time, a block should ideally have at least 192 threads to hide
memory latency. Considering these factors, we group our threads into 32 blocks, each
holding 256 threads. On GPUs that can run more than 32,768 threads concurrently,
we could process more than one region of the lattice per kernel launch by increasing
the number of thread blocks per launch. This makes sure that the available parallel
processing power is completely utilized.
While the approach described so far assumes a CC lattice, it can be adapted to
sampling on a BCC lattice with a few minor modifications. The easiest way to do this is
by sampling on a CC lattice that is twice the resolution of the required BCC lattice in x
and y directions. In other words, to sample on a BCC lattice of resolution $R_x \times R_y \times 2R_z$
(the resolution along the z axis is shown as twice $R_z$ to account for the additional points
at the center of each cube on a BCC lattice), we sample on a CC lattice of resolution
$2R_x \times 2R_y \times 2R_z$. This can be considered to be a BCC lattice of resolution $R_x \times R_y \times 2R_z$ with
additional data points at the center of every face and edge. These additional values can
then be thrown away to obtain the necessary distance field on a BCC lattice. This is
the approach we have followed and is illustrated in Figure 3-3(A) with the points to be
discarded shown in white. However, for large resolutions, this approach takes a large
number of unnecessary samples only to be discarded later. This can slow down the
application considerably. To improve performance, we can sample the distance field
on two CC lattices, of resolutions $R_x \times R_y \times R_z$ and $(R_x - 1) \times (R_y - 1) \times (R_z - 1)$, with the
second grid shifted along each axis by half a unit. In other words, for each point
$(x, y, z)$ on the first grid, the corresponding point on the second grid will be located at
$(x + h/2, y + h/2, z + h/2)$, where $h$ is the distance between two lattice points along
any axis. It can be seen easily that each point on the second grid falls at the center of a
cubical region formed by eight adjacent lattice points of the first grid. This is essentially a
BCC grid of resolution $R_x \times R_y \times 2R_z$. Figure 3-3(B) illustrates this method, with the shifted
CC lattice shown in blue. Combining the two sets of samples appropriately is more
complicated than simply discarding a set of samples. However, since we only take as
many samples as necessary for the BCC lattice here, the performance will be better for
grids of higher resolution.
Figure 3-3. Two ways of adapting the GPU implementation for BCC lattices. (A) Sampleon CC lattice of twice the resolution and discard the additional (white) points.(B) Sample on two CC lattices, the second (blue) shifted along each axis byhalf a cell length.
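For concreteness, the second scheme in Figure 3-3(B) amounts to enumerating the BCC sample locations as two interleaved CC grids (a sketch of ours, reusing the Vec3 type from Section 3.2; h is the cell length of the primary grid):

```cuda
// Fill 'out' with the Rx*Ry*Rz primary CC points followed by the
// (Rx-1)*(Ry-1)*(Rz-1) cube centers of the half-cell-shifted grid.
void bccSamplePoints(int Rx, int Ry, int Rz, float h, Vec3 *out)
{
    int n = 0;
    for (int z = 0; z < Rz; ++z)                 // primary CC grid
        for (int y = 0; y < Ry; ++y)
            for (int x = 0; x < Rx; ++x)
                out[n++] = { x * h, y * h, z * h };

    for (int z = 0; z < Rz - 1; ++z)             // shifted CC grid: centers
        for (int y = 0; y < Ry - 1; ++y)         // of the primary grid's cells
            for (int x = 0; x < Rx - 1; ++x)
                out[n++] = { x * h + h / 2, y * h + h / 2, z * h + h / 2 };
}
```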
The distance values obtained from the kernel are in fact squared distance values.
These are converted to the actual distance values by taking the square root. Then, the
values corresponding to points inside the triangle mesh are given a negative sign. To
identify points that are inside the mesh, we use a method proposed by Bærentzen and
Aanæs in [3] that uses angle-weighted pseudo-normals (originally proposed by Thürmer
and Wüthrich [51] and independently by Séquin [43]). Finally, these values are scaled to
the range [0, 255] such that a value of 127 represents points that are on the surface of
the mesh. Values in the range [0, 127) represent points outside the mesh, with lower
values indicating longer distances, and values in the range (127, 255] represent points
inside the mesh, with the value increasing with distance.
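A hypothetical post-processing routine implementing this mapping could look as follows (our sketch; 'inside' is the result of the pseudo-normal test of [3], and maxDist normalizes by the largest distance occurring in the volume):

```cuda
#include <math.h>

// Convert a squared distance to a signed, scaled 8-bit value with
// 127 on the surface, [0, 127) outside and (127, 255] inside.
unsigned char quantizeDistance(float dist2, bool inside, float maxDist)
{
    float d = sqrtf(dist2);                   // kernel returns squared values
    if (inside) d = -d;                       // negative sign for the interior
    float v = 127.0f - 127.0f * d / maxDist;  // surface (d = 0) maps to 127
    if (v < 0.0f)   v = 0.0f;                 // clamp to [0, 255]
    if (v > 255.0f) v = 255.0f;
    return (unsigned char)(v + 0.5f);
}
```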
3.4 Generating True Gradients
One characteristic of distance fields is that their true gradients at any point can be
computed precisely with relatively few extra computations. As this gradient data proves
useful in reconstructing distance fields, we describe how these values can be computed
as part of our implementation of the brute force method.
As previously mentioned, the value of a distance field at any point is the shortest
distance from that point to the surface of the triangular mesh under consideration. Then,
the true gradient of the distance field at that point is the vector from the point to the
point on the mesh closest to it. The aforementioned shortest distance is essentially
the absolute length of this vector. The first order partial derivative of the distance field
along each axis is the component of the gradient vector along that axis.
To compute the gradient value for a particular lattice point, we need the point on the
mesh closest to it. Our kernel, which computes the shortest distance corresponding to a
lattice point, identifies this closest mesh point during the course of its computations.
Recall that along with the shortest distance, our kernel also tracks and returns
associated triangle data for each lattice point. This triangle data includes the triangle
that holds the closest mesh point and the s and t values (as described in Section 3.2)
for that point. Using this data, the closest mesh point can be identified and the gradient
vector computed. This is then split into its x, y and z components and appropriately
scaled to give the true values of the first order partial derivatives.
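Putting this together, a sketch (ours, reusing the helpers from the earlier sketches) of the per-point gradient computation from the stored triangle and its $(s, t)$ parameters:

```cuda
// Recover the closest mesh point from the triangle and (s, t) returned by
// the kernel, then form the unit vector between the lattice point and that
// point. Up to the chosen sign convention, its x, y and z components are
// the first order partial derivatives of the distance field. Assumes P is
// not exactly on the surface (len > 0).
Vec3 distanceGradient(Vec3 P, Tri tri, float s, float t)
{
    Vec3 C = tri.B + s * tri.E0 + t * tri.E1;   // closest point, T(s, t)
    Vec3 g = P - C;                             // along the shortest path
    float len = sqrtf(dot(g, g));               // equals the unsigned distance
    return (1.0f / len) * g;                    // scale to unit length
}
```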
3.5 Performance
To analyze the speed-up achieved by our implementation, we coded up a pure CPU
implementation of the brute force algorithm, and generated distance fields at different
resolutions from the soccer ball, Stanford bunny and Stanford dragon meshes using
both implementations. The GPU implementation was executed on an NVIDIA GeForce
GTX 465 which has 352 CUDA cores. Table 3-1 shows the details of these meshes.
The resulting execution times are given in Table 3-2. For the datasets tested, the GPU
implementation achieves speedups ranging from 145 to 375. It can also be seen that the
performance gain of the GPU implementation improves as the mesh size and resolution
increases.
Table 3-1. Triangular meshes used for performance testing
Mesh              No. of triangles   No. of vertices
Soccer ball       3,516              1,760
Stanford bunny    69,666             34,835
Stanford dragon   100,000            50,000
Table 3-2. Comparison of execution times. The time taken by the GPU implementation is shown in the column 'GPU' and that by the CPU implementation is shown in the column 'CPU'.
Dataset           Sample points   GPU (sec)   CPU (sec)    Speed-up
Soccer ball       262k            0.44125     63.9975      145
Soccer ball       2,097k          2.6365      511.02       194
Stanford bunny    32k             0.90175     181.3325     201
Stanford bunny    262k            5.76275     1451.54      252
Stanford dragon   32k             1.2325      383.6        311
Stanford dragon   262k            8.16425     3063.5725    375
CHAPTER 4
CUBIC INTERPOLATION IN BODY CENTERED CUBIC LATTICE USING HERMITE DATA
In this chapter, we introduce a local cubic interpolation scheme on the BCC lattice
that uses Hermite data, i.e. function and derivative values. This scheme constructs
cubic polynomial interpolants locally in tetrahedral regions of the lattice using data
and derivative values in the neighborhood of the region. We then discuss a tricubic
interpolation scheme using Hermite data on the CC lattice proposed by Lekien and
Marsden in [31].
4.1 Cubic Interpolation in Body Centered Cubic Lattice
4.1.1 Interpolating Splines
Consider a function f sampled on the lattice points of a BCC lattice. We describe a
method to interpolate this function using piecewise polynomials over the BCC lattice. At
this point, we assume that the first and second order derivatives of f at the lattice points
are also available to us. In the next section, we discuss a finite-differencing scheme to
estimate the derivatives when they are not available.
As described previously in Section 1.2, the BCC lattice can be uniformly partitioned
into congruent tetrahedra with the lattice points acting as their corners. Within each
tetrahedron, the interpolating spline is defined by a polynomial of degree $n$, $p \in \Pi_n$.
In the trivariate setting, which is what we are interested in, this can be represented as

$$\Pi_n(\mathbb{R}^3) := \Big\{ p(\mathbf{x}) = \sum_{\substack{i+j+k \le n \\ i,j,k \ge 0}} a_{ijk}\, x^i y^j z^k \Big\}.$$

Such a polynomial has $\binom{n+3}{n}$ coefficients and hence can be uniquely determined by $(n + 3)(n + 2)(n + 1)/6$ constraints.
A polynomial of degree 1 can be determined uniquely by 4 constraints and hence
can be constructed by restricting the value it takes at the 4 corners of the tetrahedron to
the corresponding values of $f$:

$$f(v_i) = p(v_i), \quad v_i \in \delta, \quad i = 1, \dots, 4, \quad p \in \Pi_1, \qquad (4–1)$$

where the $v_i$ denote the vertices of a tetrahedron $\delta$.
This gives us a linear system of equations in 4 variables, solving for which yields
the 4 coefficients of the polynomial. This is essentially linear interpolation within each
tetrahedron and the spline formed by these polynomials is a piecewise linear interpolant
to the data given at the lattice points.
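Explicitly (a worked form of ours, writing $p(x, y, z) = a_0 + a_1 x + a_2 y + a_3 z$ and $v_i = (x_i, y_i, z_i)$), the four constraints in (4–1) read

$$\begin{pmatrix} 1 & x_1 & y_1 & z_1 \\ 1 & x_2 & y_2 & z_2 \\ 1 & x_3 & y_3 & z_3 \\ 1 & x_4 & y_4 & z_4 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} f(v_1) \\ f(v_2) \\ f(v_3) \\ f(v_4) \end{pmatrix},$$

which is nonsingular whenever the tetrahedron is non-degenerate.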
A polynomial of degree 2 requires 10 constraints. As the data values at the corners
can provide only 4 constraints, we use the first order partial derivatives along each axis
at the corners to form the remaining constraints. However, it is not possible to choose 6
constraints from 4 vertices in an unbiased manner (bias, here, refers to an asymmetric
choice of constraints per vertex). Since we require an isotropic choice of constraints,
we use degree 3 polynomials, which can be uniquely defined using 20 constraints. This
gives us 5 constraints per vertex. The data value (4–1) and the three first derivatives
(4–2) at each corner form 16 constraints.
$$f_x(v_i) = p_x(v_i), \quad f_y(v_i) = p_y(v_i), \quad f_z(v_i) = p_z(v_i), \qquad (4–2)$$

where $f_x$, $f_y$ and $f_z$ denote the partial derivatives with respect to $x$, $y$ and $z$ respectively.
For the remaining 4 constraints, we use second order partial derivatives.
While a single second order partial derivative, like $\frac{\partial^2 f}{\partial x^2}$, can be chosen from each
corner of the tetrahedron, such a choice would be biased along specific axes. This
can be avoided by choosing a constraint based on a symmetric sum of individual
second derivatives at each corner. One such sum is $\big(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}\big)$. However, an
interpolating constraint based on this sum is found to be linearly dependent on the other
16 constraints. The other choice is a constraint based on $\big(\frac{\partial^2 f}{\partial x \partial y} + \frac{\partial^2 f}{\partial y \partial z} + \frac{\partial^2 f}{\partial x \partial z}\big)$, which is
linearly independent and can be used to determine the polynomial.
However, this combination does not restrict the individual second order partial
derivative values taken by the polynomial. As the interpolation constraint is enforced
only on the sum, the values the individual second derivatives take at each corner could
disagree with the corresponding values of f . The polynomial so generated could turn
out to be a poor approximation of f . In our experiments, the individual second derivative
values had large deviations from corresponding values of f , leading to severe artifacts in
rendered images.
To get a better approximation of f and avoid such artifacts, each individual second
order partial derivative must be constrained separately. Enforcing interpolating
constraints on every second derivative value at the corners gives us 3 constraints
per corner, making it an over-determined system. Hence, we relax the interpolation
constraint on these values and opt to minimize the L2 norm of their error over the set of
all vertices of the tetrahedron instead. To this end, we define an error function over the
space of all cubic polynomials $p \in \Pi_3$ as follows:

$$E(p) := \sum_{i=1}^{4} \left( \left\| \frac{\partial^2 f}{\partial x \partial y}(v_i) - \frac{\partial^2 p}{\partial x \partial y}(v_i) \right\|^2 + \left\| \frac{\partial^2 f}{\partial x \partial z}(v_i) - \frac{\partial^2 p}{\partial x \partial z}(v_i) \right\|^2 + \left\| \frac{\partial^2 f}{\partial y \partial z}(v_i) - \frac{\partial^2 p}{\partial y \partial z}(v_i) \right\|^2 \right). \qquad (4–3)$$
Here, each term is the squared error in a specific second order partial derivative
at a specific corner. Also, this set of partial derivatives is invariant along the x, y and z
directions. Thus, there is no bias along a particular direction. Minimizing this error
function with respect to the coefficients of p accounts for the remaining 4 degrees of
freedom in determining the polynomial. This is a constrained minimization problem,
where the error function acts as the objective and the 16 constraints are formed by the
interpolating constraints defined in (4–1) and (4–2). This problem is solved for the 20
coefficients from which the polynomial is constructed.
The error function defined above is quadratic in the coefficients of $p$, which can be represented by a $20 \times 1$ vector $\mathbf{a} = [a_1, \ldots, a_{20}]^T$. The constrained minimization of this error function is carried out with respect to the coefficients of the cubic interpolant $p$, with constraints defined in (4–1) and (4–2). Since these interpolation constraints are linear in the polynomial coefficients, we can model our optimization problem as a specific case of a quadratic programming problem, known as an Equality QP [4]:
$$\min_{\mathbf{a} \in \mathbb{R}^{20}} \; E(\mathbf{a}) = \mathbf{a}^T G \mathbf{a} + \mathbf{h}^T \mathbf{a} + b \qquad (4–4)$$
$$\text{subject to} \quad M\mathbf{a} = \mathbf{f}. \qquad (4–5)$$
We introduce a vector notation for representing our polynomials that allows us to transform the function (4–3) into the quadratic form in (4–4) and the interpolation constraints (4–1) and (4–2) into the constraints in (4–5). The polynomial $p$ can be represented as an inner product, $p(\mathbf{x}) = \langle \mathbf{m}, \mathbf{a} \rangle(\mathbf{x})$, in which the column vector $\mathbf{a}$ encodes the coefficients of our polynomial. $\mathbf{m}$ is a column vector in which each element is one of the monomials (in variable $\mathbf{x}$) of the form $x^\alpha y^\beta z^\gamma$ with $\alpha, \beta, \gamma \geq 0$ and $\alpha + \beta + \gamma < 4$ that span $\Pi_3$. The inner product results in a typical power-form representation of a cubic polynomial which can be evaluated at a point $\mathbf{x} = v_i$ (i.e., one of the corners of tetrahedron $\delta$). As a simple example, one can write a generic univariate quadratic polynomial evaluated at the sample point 2 as $(a_1 + 2a_2 + 4a_3) = \langle [1, x, x^2]^T, [a_1, a_2, a_3]^T \rangle(2)$.
Interpolation constraints introduced in (4–1) and (4–2) can now be written in terms of the coefficient vector $\mathbf{a}$:
$$f(v_i) = \langle \mathbf{m}, \mathbf{a} \rangle(v_i), \quad f_x(v_i) = \langle \mathbf{m}_x, \mathbf{a} \rangle(v_i), \quad f_y(v_i) = \langle \mathbf{m}_y, \mathbf{a} \rangle(v_i), \quad f_z(v_i) = \langle \mathbf{m}_z, \mathbf{a} \rangle(v_i), \qquad (4–6)$$
that are defined for each vertex of the BCC tetrahedron, $v_i \in \delta$, $i = 1, \ldots, 4$. These 16 equations form the linear system of constraints in (4–5), where the $16 \times 20$ matrix $M$ is formed by the monomials in $\mathbf{m}$ and their partial derivatives evaluated at the vertices of the tetrahedron, $v_i \in \delta$. In other words, each row of $M$ corresponds to $\mathbf{m}$ or one of its three partial derivatives evaluated at a vertex $v_i$. As mentioned before, the column vector $\mathbf{a}$ represents the unknown coefficients of $p$, and finally, $\mathbf{f}$ holds the sample values of the underlying function and its partial derivatives.
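To make this concrete, a minimal sketch of assembling $\mathbf{m}$ and $M$ is given below (Python/NumPy). The monomial ordering, the helper names, and the example vertex coordinates are illustrative assumptions rather than the thesis implementation.

import numpy as np

# Exponents (alpha, beta, gamma) of the 20 monomials x^a y^b z^c that
# span cubic polynomials in R^3 (alpha + beta + gamma <= 3).
EXPONENTS = [(a, b, c) for a in range(4) for b in range(4) for c in range(4)
             if a + b + c <= 3]

def m_vec(v):
    """The monomial vector m evaluated at a point v = (x, y, z)."""
    x, y, z = v
    return np.array([x**a * y**b * z**c for a, b, c in EXPONENTS])

def m_partial(v, axis):
    """Partial derivative of m along one axis (0=x, 1=y, 2=z) at v."""
    vals = []
    for exps in EXPONENTS:
        e = list(exps)
        k = e[axis]
        if k == 0:
            vals.append(0.0)
            continue
        e[axis] -= 1
        vals.append(k * v[0]**e[0] * v[1]**e[1] * v[2]**e[2])
    return np.array(vals)

def constraint_matrix(vertices):
    """The 16 x 20 matrix M of (4-5)/(4-6): one row each for m, m_x,
    m_y and m_z evaluated at each of the 4 tetrahedron vertices."""
    rows = []
    for v in vertices:
        rows.append(m_vec(v))
        rows.extend(m_partial(v, axis) for axis in range(3))
    return np.array(rows)

# Example: a tetrahedron with BCC lattice vertices (an illustrative choice).
delta = [(0, 0, 0), (1, 1, 1), (2, 0, 0), (1, 1, -1)]
M = constraint_matrix(delta)  # shape (16, 20); full rank 16 per the text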
Moreover, by simple linear algebra one can reformulate the error function defined in (4–3) in the form of (4–4), using $\mathbf{u}\mathbf{v}^T$ to denote the outer product of two column vectors $\mathbf{u}$ and $\mathbf{v}$:
$$G = \sum_{i=1}^{4} \left( \mathbf{m}_{xy}\mathbf{m}_{xy}^T + \mathbf{m}_{xz}\mathbf{m}_{xz}^T + \mathbf{m}_{yz}\mathbf{m}_{yz}^T \right)(v_i)$$
$$\mathbf{h} = -2 \sum_{i=1}^{4} \left( f_{xy}\mathbf{m}_{xy} + f_{xz}\mathbf{m}_{xz} + f_{yz}\mathbf{m}_{yz} \right)(v_i) \qquad (4–7)$$
$$b = \sum_{i=1}^{4} \left( f_{xy}^2 + f_{xz}^2 + f_{yz}^2 \right)(v_i).$$
Solving the under-determined linear system of equations (4–5), one can find a particular solution (e.g., via the normal equations on $M$), which we call $\mathbf{a}_0$. Any solution to the system (4–5) can be written as the sum of the particular solution, $\mathbf{a}_0$, and an arbitrary element of the null space of $M$:
$$\mathbf{a} = \mathbf{a}_0 + Z\mathbf{t},$$
where the columns of $Z$ form a basis for the null space of $M$, and $\mathbf{t} \in \mathbb{R}^4$ since $M$ has full rank (i.e., 16) for the BCC tetrahedron. The basis for the null space can be computed via the reduced row-echelon form or via singular value decomposition.
Substituting this relation in $E(\mathbf{a})$ allows us to re-write the minimization as a function of $\mathbf{t}$, $E(\mathbf{t})$. The minimizer can then be explicitly derived by solving the linear system of equations obtained from differentiating $E(\mathbf{t})$:
$$E_{\mathbf{t}}(\mathbf{t}) = 0. \qquad (4–8)$$
The unique minimizer of the error functional is then obtained from:
$$\left( Z^T G Z \right) \mathbf{t} = -Z^T \left( \tfrac{1}{2}\mathbf{h} + G\mathbf{a}_0 \right). \qquad (4–9)$$
This linear system has a unique solution since $Z^T G Z$ is symmetric positive definite ($G$ is symmetric positive semidefinite, and it is positive definite when restricted to the null space of $M$).
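A minimal sketch of this null-space solve is given below (Python/NumPy), assuming $M$ has full row rank and $Z^T G Z$ is positive definite as argued above; it illustrates (4–9) rather than reproducing the thesis code.

import numpy as np
from scipy.linalg import null_space

def solve_equality_qp(G, h, M, f):
    """Minimize a^T G a + h^T a + const subject to M a = f,
    using the null-space method of (4-8)/(4-9)."""
    # A particular solution a0 of the under-determined constraints M a = f
    # (np.linalg.lstsq returns the least-norm solution when consistent).
    a0 = np.linalg.lstsq(M, f, rcond=None)[0]
    # Columns of Z form a basis for the null space of M (20 x 4 here).
    Z = null_space(M)
    # Reduced system (Z^T G Z) t = -Z^T (h/2 + G a0), cf. (4-9).
    t = np.linalg.solve(Z.T @ G @ Z, -Z.T @ (0.5 * h + G @ a0))
    return a0 + Z @ t

Here scipy.linalg.null_space computes $Z$ via the singular value decomposition, one of the two options mentioned above.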
Since the BCC lattice can be tetrahedralized into congruent tetrahedra, the solution operator that yields $\mathbf{t}$ for the optimal cubic interpolant can be pre-computed for the geometry of $\delta$ in the BCC lattice. Hence, we can pre-compute the coefficients of the cubic interpolant in terms of the samples of the underlying function and its first-order partial derivatives, as specified in (4–5). In other words, the computation of the optimal cubic interpolant can be implemented as a fast filter by considering the tetrahedron in the BCC lattice that contains the interpolation point.

In summary, the 20 degrees of freedom of a cubic interpolant on each tetrahedron are fixed by 16 (exact) interpolation constraints on the function values and the first-order partial derivative values at the lattice sites (i.e., the vertices of the containing tetrahedron). The remaining 4 degrees of freedom are chosen optimally to minimize the interpolation errors in the second-order partial derivatives at the lattice points. The quadratic programming problem has a unique minimizer that we use to construct the optimal cubic interpolant for the geometry of the BCC lattice.
We tested this approach on the carp dataset. The original dataset has a resolution of 256 × 256 × 256 (Figure 4-1(A)) and represents the ground truth; the low-resolution, sub-sampled datasets carry about 16% of the high-resolution data on the BCC and the CC lattices. The subsampled CC volume has a resolution of 140 × 140 × 140 (Figure 4-1(B)) and the subsampled BCC volume has a resolution of 111 × 111 × 222 (Figure 4-1(C)). The BCC dataset was rendered with our cubic splines and the CC dataset with tricubic Catmull-Rom splines. The rib area is mostly distorted in the CC image but is better preserved in the BCC image.
4.1.2 Smoothness And Approximation Order
The spline $s$, constructed as described in the previous section, is $C^1$ smooth across the faces of the tetrahedra.

Figure 4-1. The Carp fish dataset. The ground-truth Carp fish dataset (A), with 16,777k points, is sub-sampled to 16% on the Cartesian (B) and BCC (C) lattices for comparison. The Cartesian data is interpolated with Catmull-Rom splines and the BCC lattice is interpolated with our cubic spline. The tail fins and rib areas preserve their connectivity in the BCC dataset while they are distorted in the Catmull-Rom case.

First, we will note that the values of $s$ and its first derivatives on a face of a tetrahedron depend only on the data at the vertices of that face. This becomes
evident when the polynomial within the tetrahedron is represented in Bernstein-Bézier form, as the barycentric coordinate of the corner opposite the face is 0 everywhere on the face. Now, consider a face shared by two tetrahedra, $T_1$ and $T_2$, and denote the polynomial pieces of $s$ within them as $P_1$ and $P_2$ respectively. At the three vertices of the shared face, $P_1$ and $P_2$ have the same values and first derivatives, equal to the corresponding values of $f$, the underlying function. Thus, the values and first derivatives taken by $P_1$ and $P_2$ on the shared face agree, making $s$ $C^1$ smooth across the face.
The order of approximation can be thought of as the accuracy of an approximation or, alternatively, the order of magnitude of the error in the approximation. When an interpolating polynomial $p$ is used to represent a function $f$, an order of approximation of $n$ means that the approximation error can be expressed in terms of the sampling distance $h$ as $O(h^n)$.
The classical Strang-Fix condition relates the approximation order $\alpha$ of a spline space $S_n$ to its ability to exactly reproduce polynomials of degree up to, and including, $\alpha - 1$. In the multivariate setting, the notion of local reproduction of polynomials is needed for proving the approximation order.
We now show that our cubic interpolant can exactly reproduce all polynomials up to degree 3 within the corresponding tetrahedron. Recall that the coefficients of the polynomial are recovered by solving a constrained minimization problem, where the 16 constraints are formed by enforcing interpolation of the values the polynomial and its first derivatives take at the 4 vertices of the tetrahedron. The original polynomial $f \in \Pi_{\leq 3}(\mathbb{R}^3)$ can be recovered by recovering its 20 coefficients from this problem. It is easy to see that the coefficients of $f$ satisfy all 16 aforementioned constraints. Now consider the error function in (4–3). It is a sum of squared terms and hence cannot take a negative value. Using the coefficients of $f$ makes each individual term 0, attaining the minimum value of the function. Thus, $p = f$ is a valid solution of the constrained minimization problem, and since this problem has a unique minimizer (as shown above), the solution obtained must be $p = f$.
Since our interpolating spline can locally reproduce all polynomials up to degree 3,
it has an order of approximation of α = 4. This order of approximation is only possible
when exact partial derivatives of f are available. One area of application where the
exact first derivatives are available is the sampling and reconstruction of signed distance
fields, which is discussed in the next chapter. For scalar field data, where only the
function values are available, we need a finite-differencing scheme that meets the
approximation order of our construction. This is discussed in the next section.
4.1.3 Isotropic Finite-Differences on the Lattice
When the partial derivatives of the function $f$ are known (e.g., Hermite data), the spline construction interpolates $f$ and its first-order partial derivatives exactly. However, for scalar-field data where only function values are known, we need to employ finite differences to approximate the partial derivatives of $f$ (used in (4–2)). In the univariate setting, this approach leads directly to the Catmull-Rom splines, which are essentially Hermite interpolation with derivatives approximated by finite differences. Furthermore, in order to maintain the approximation order, the finite-differencing scheme must be exact on the polynomial space of interest. In other words, we need to design fourth-order finite differences on the BCC lattice that provide exact partial derivatives whenever $f \in \Pi_{\leq 3}(\mathbb{R}^3)$.
In the univariate setting, derivative estimation is easily derived using the Taylor series expansion. The idea behind central differencing is to use the expansion to evaluate $f$ at small distances $h$ from $x$ and obtain an estimate of $f'$ at $x$:
$$f(x + h) - f(x - h) = 2h f'(x) + \frac{h^3}{3} f'''(x) + O(h^5). \qquad (4–10)$$
The Taylor series analysis shows that the central-differencing estimate of the univariate derivative is a second-order approximant. Higher orders of approximation to the derivative $f'$ are obtained by employing a technique called Richardson's extrapolation [7], which scales $h$:
$$f(x + 2h) - f(x - 2h) = 4h f'(x) + \frac{8h^3}{3} f'''(x) + O(h^5). \qquad (4–11)$$
One can eliminate the $f'''(x)$ term between (4–10) and (4–11) and obtain a higher-order approximant. Therefore, the well-known five-point stencil for approximating the derivative is of order four:
$$f'(x) = \frac{8f(x + h) - 8f(x - h) - f(x + 2h) + f(x - 2h)}{12h} + O(h^4).$$
This approach can be repeated to obtain seven-point and nine-point stencils that constitute filters with increasing approximation orders for derivative estimation. The main observation here is that designing high-order finite differences involves a local Taylor series expansion of the function. A polynomial is formed by the first few terms of the Taylor expansion; the polynomial's derivative at the expansion point then approximates the derivative of the original function with an order of accuracy one greater than the degree of the polynomial. This polynomial can be constructed by polynomial interpolation using the neighboring sample points (i.e., $f(x \pm h), f(x \pm 2h), \ldots$). The larger the neighborhood, the higher the degree of the polynomial, which in turn determines the order of accuracy of the derivative estimate. The derivative of this polynomial interpolant then constitutes the finite-difference approximation to the true derivative.
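As a quick numerical check of this order of accuracy, the sketch below (the test function $\sin x$ and the step sizes are our own illustrative choices) applies the five-point stencil; halving $h$ should reduce the error by roughly a factor of $2^4 = 16$.

import numpy as np

def five_point(f, x, h):
    """Fourth-order approximation of f'(x) via the five-point stencil."""
    return (8*f(x + h) - 8*f(x - h) - f(x + 2*h) + f(x - 2*h)) / (12*h)

x = 1.0
for h in (1e-2, 5e-3):
    error = abs(five_point(np.sin, x, h) - np.cos(x))
    print(f"h = {h:g}  error = {error:.3e}")  # error scales like h**4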
Extending Richardson's extrapolation to the multivariate setting would involve the multivariate Taylor series expansion. Instead, we can employ Richardson's extrapolation on the BCC, or any other lattice, by leveraging the equivalence of Richardson's extrapolation with polynomial interpolation on a local neighborhood. The idea behind our approach is to employ a polynomial interpolation scheme that builds an interpolant on an isotropic neighborhood of a lattice point. This interpolant agrees with the terms of the Taylor series expansion up to its degree. If the underlying function $f$ is a polynomial itself (of the same degree as the interpolant), then the unique polynomial interpolant agrees with the underlying function and the derivative estimation will be exact. Hence, if $f \in \Pi_{\leq 3}(\mathbb{R}^3)$, then a local cubic polynomial interpolation at a lattice point will provide the exact derivative. In this approximation scheme, the partial derivatives are estimated in a non-separable fashion and one can choose an isotropic combination of the neighbors of a lattice point.
For a BCC lattice point, there are 8 neighbor points at offsets of (±1,±1,±1), and 6 neighbor points at offsets of (±2, 0, 0), (0,±2, 0) and (0, 0,±2). The next ring of neighbors is located at offsets of (±2,±2, 0), (±2, 0,±2) and (0,±2,±2); together with the original lattice point these form a 27-point neighborhood (see Figure 4-2). Considering that an interpolating polynomial in $\Pi_3(\mathbb{R}^3)$ needs 20 data points, the 27-point neighborhood over-determines the polynomial interpolation problem. The over-determined system of equations can be solved using a least-squares method (i.e., normal equations), which is detailed below. When the original function $f \in \Pi_{\leq 3}(\mathbb{R}^3)$, the least-squares solution coincides with $f$ and hence the derivative estimation becomes exact.
Let $x_1, x_2, \ldots, x_{27}$ denote the points in the 27-point neighborhood of a BCC lattice point. We fit a cubic polynomial interpolant $p(\mathbf{x}) = \langle \mathbf{m}, \mathbf{a} \rangle(\mathbf{x})$ on this 27-point neighborhood. Here $\mathbf{m}$ is, again, a vector that contains the monomials up to cubics and $\mathbf{a}$ denotes the coefficients of $p$. Then we can set up an interpolation problem to determine the local polynomial fit to the function $f$ by solving the minimization problem:
$$\min_{\mathbf{a}} \sum_{i=1}^{27} w_i \left( f_i - p(x_i) \right)^2. \qquad (4–12)$$
The scalar value $f_i$ here is the sample value of the function $f$ at the lattice point $x_i$, and $w_i$ is a weight that allows us to control the interpolation error at lattice point $x_i$. Let $\Phi$ denote the interpolation matrix, which is of dimension $27 \times 20$. Then the weighted least-squares solution to the linear system is given by:
$$(\Phi^T W \Phi)\,\mathbf{a} = \Phi^T W \mathbf{y}, \qquad (4–13)$$
where $W$ is a diagonal matrix with $W_{i,i} = w_i$, $y_i = f_i$ and $\Phi_{i,j} = m_j(x_i)$. We can set $W_{i,i} = 1$ for the ordinary least-squares solution, or make other choices for a weighted least-squares solution.
Since the monomial terms $m_j(\mathbf{x})$ are known, and the local coordinates of the 27-point neighborhood of a BCC lattice point are fixed, we can solve for the coefficients $\mathbf{a}$ from (4–13). When we want to estimate the derivatives of the function at any given lattice point, we perform a local fit and use the derivatives of that local polynomial at the lattice point (which is $\mathbf{x} = 0$ in the local coordinate system) as the approximated derivatives. For example, the estimated first-order derivative along $x$ at a given lattice point is $p_x(0)$.
Figure 4-2. Weights of the finite-differencing kernel on the 27-point neighborhood of the BCC lattice for (A) $f_x$ and (B) $f_{yz}$. The illustrated coefficients are divided by 24 in (A) and by 72 in (B). The 27-point neighborhood includes the red, blue, green and gray lattice points and excludes the yellow points at the corners.
When we consider the derivative with respect to $x$, $p_x(\mathbf{x}) = \langle \mathbf{m}_x, \mathbf{a} \rangle(\mathbf{x})$, we can construct the finite-differencing weights by evaluating $p_x(0)$:
$$p_x(0) = \mathbf{b}^T (\Phi^T W \Phi)^{-1} \Phi^T W \mathbf{y} = K_x \mathbf{y}. \qquad (4–14)$$
In this notation, $\mathbf{y}$ is a vector that contains the function values of the 27 neighborhood points and $\mathbf{b} = \mathbf{m}_x(0)$. The finite-differencing kernel $K_x$ is a stencil whose convolution with $\mathbf{y}$ gives the partial derivative of the underlying $f$ with respect to $x$. The finite-differencing weights for the other partial derivatives can be obtained similarly.
The spatial distribution of the finite-differencing weights for $f_x$ is shown in Figure 4-2(A) and that of the kernel for $f_{yz}$ in Figure 4-2(B). Similarly, by permuting the axes, we can obtain the kernels for the other first- and second-order derivatives as needed.
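A minimal sketch of this kernel construction is given below (Python/NumPy). The neighborhood ordering and the helper names are illustrative choices; the Gaussian weighting of (4–15) discussed next is included with $\sigma^2 = 0.5$, and an identity $W$ recovers the ordinary least-squares kernel.

import itertools
import numpy as np

# The 20 cubic-monomial exponents and the monomial vector at a point.
EXPONENTS = [(a, b, c) for a in range(4) for b in range(4) for c in range(4)
             if a + b + c <= 3]

def m_vec(p):
    x, y, z = p
    return np.array([x**a * y**b * z**c for a, b, c in EXPONENTS])

# The 27-point neighborhood of a BCC lattice point in local coordinates.
offsets = [(0, 0, 0)]
offsets += list(itertools.product((-1, 1), repeat=3))          # 8 nearest
for s in (-2, 2):
    offsets += [(s, 0, 0), (0, s, 0), (0, 0, s)]               # 6 axial
for sx in (-2, 2):
    for sy in (-2, 2):
        offsets += [(sx, sy, 0), (sx, 0, sy), (0, sx, sy)]     # 12 ring

Phi = np.array([m_vec(p) for p in offsets])                    # 27 x 20
w = np.exp(-np.array([np.dot(p, p) for p in offsets]) / 0.5)   # (4-15)
W = np.diag(w)

# b = m_x(0): at the origin only the monomial x itself has a nonzero
# x-derivative, so b is a unit vector at the exponent (1, 0, 0).
b = np.array([1.0 if e == (1, 0, 0) else 0.0 for e in EXPONENTS])

# Finite-differencing weights K_x of (4-14): fx at the center lattice
# point is Kx @ y for the 27 sample values y ordered like `offsets`.
Kx = b @ np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W)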
Table 4-1. The $L_2$ norm error in the reconstruction of datasets sampled at a resolution of 111 × 111 × 222 on the BCC lattice using the proposed cubic interpolation. The weighted least-squares approach uses a zero-mean Gaussian with $\sigma^2 = 0.5$ to estimate partial derivative values, which shows marginal improvements in terms of reconstruction error.

Dataset   LSQE    Weighted LSQE
ML        8.939   8.725
Carp      0.740   0.731
Bonsai    5.741   5.721
Lobster   8.538   8.467
Intuitively, the lattice points that are closer to the center are more important (with
respect to the residuals in the interpolation conditions in (4–12)) than the lattice points
further from the center. Therefore, one can assign higher weights to the error terms
corresponding to the lattice points closer to the center.
The choice of the weighting function is very flexible. An isotropic choice is the Gaussian function:
$$w(x_i) = \exp\left( -\frac{\|x_i\|^2}{\sigma^2} \right), \qquad (4–15)$$
where $\|x_i\|$ is the distance of the neighbor point $x_i$ from the center point, $0$, and $\sigma^2$ is the variance, which defines how fast the weighting function decays. When $\sigma^2$ is very large, the weighting function degenerates into a constant function, which gives the unweighted solution. A smaller $\sigma^2$ assigns higher weights to the residuals closer to the center. In our experiments, $\sigma^2 = 0.5$ showed the minimum error in estimating first- and second-order partial derivatives, but the improvement in interpolation performance from Gaussian weighted least squares was marginal. Table 4-1 summarizes the improvements obtained by the weighted least-squares approach.
4.2 Tricubic Interpolation in Cartesian Cubic Lattice
This section provides a summary of the local tricubic interpolation scheme proposed by Lekien and Marsden [31]. This scheme uses Hermite data to achieve full $C^1$ interpolation of a given function sampled on a CC lattice. Chapter 5 discusses the application of this method to interpolating distance fields sampled on CC lattices. Please note that this section only serves to briefly describe what has already been proposed in [31]; the contribution of this thesis is in applying it in the area of distance fields.
Consider a trivariate function $f$ sampled at the vertices of a regular grid. The interpolant is a piecewise polynomial which, within each cubic cell of the grid, can be represented in the general form
$$p(x, y, z) = \sum_{i,j,k=0}^{N} a_{ijk} \, x^i y^j z^k.$$
As the interpolant is tricubic, $N$ takes the value 3 and the polynomial has 64 coefficients $a_{ijk}$. These
coefficients must be determined in a way that achieves $C^1$ continuity across all faces of the cube. To that end, interpolation constraints are enforced on the values taken by $p$ and its three first derivatives at the 8 corners of the cube, giving 32 constraints. To recover 64 coefficients, an additional 32 constraints are required. These constraints are chosen such that they are isotropic, i.e., invariant under rotation of the axes, and in a manner that favors smoothness over accuracy. Smoothness is improved by using interpolating constraints on higher-order derivatives of $p$. Thus, we need 4 higher-order derivatives from each corner for the additional constraints. There are
only two such sets that are isotropic. Of these, the set $\left( \frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial y^2}, \frac{\partial^2 f}{\partial z^2}, \frac{\partial^3 f}{\partial x \partial y \partial z} \right)$ is linearly dependent on the first 32 constraints and hence cannot be used, leaving us with the set $\left( \frac{\partial^2 f}{\partial x \partial y}, \frac{\partial^2 f}{\partial y \partial z}, \frac{\partial^2 f}{\partial x \partial z}, \frac{\partial^3 f}{\partial x \partial y \partial z} \right)$. Thus, the 64 constraints are formulated by restricting the values taken by the functions in the following set at each corner of the cube to the corresponding values of $f$:
$$\left( p, \; \frac{\partial p}{\partial x}, \; \frac{\partial p}{\partial y}, \; \frac{\partial p}{\partial z}, \; \frac{\partial^2 p}{\partial x \partial y}, \; \frac{\partial^2 p}{\partial y \partial z}, \; \frac{\partial^2 p}{\partial x \partial z}, \; \frac{\partial^3 p}{\partial x \partial y \partial z} \right)$$
This gives a linear system of 64 equations in the 64 unknown coefficients, which can be represented in matrix form as $B\mathbf{x} = \mathbf{b}$, where $\mathbf{x}$ is the vector of the 64 coefficients and $\mathbf{b}$ is the vector of the values taken by $f$ and its derivatives at each of the 8 corners of the cube. As the $64 \times 64$ matrix $B$ has a determinant of 1, its inverse can be computed and the linear system solved as $\mathbf{x} = B^{-1}\mathbf{b}$. This gives the values of the coefficients from which the interpolant is constructed.
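A minimal sketch of assembling one such $64 \times 64$ system is given below (Python/NumPy). The ordering of corners, constraint operators and monomials is our own convention, so the resulting $B$ matches the matrix of [31] only up to row and column permutations; the structure of the constraints is the same.

import numpy as np
from itertools import product

def term(i, t, differentiate):
    """t^i, or its derivative i*t^(i-1), evaluated at t in {0, 1}."""
    if differentiate:
        return 0.0 if i == 0 else i * t**(i - 1)
    return float(t**i)

# The 8 constraint operators per corner: which of (x, y, z) get one
# derivative (identity, d/dx, d/dy, d/dz, dxy, dyz, dxz, dxyz).
OPS = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
       (1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]

B = np.zeros((64, 64))
for row, (corner, (dx, dy, dz)) in enumerate(
        (c, op) for c in product((0, 1), repeat=3) for op in OPS):
    for col, (i, j, k) in enumerate(product(range(4), repeat=3)):
        # Each entry is the operator applied to x^i y^j z^k at the corner.
        B[row, col] = (term(i, corner[0], dx) *
                       term(j, corner[1], dy) *
                       term(k, corner[2], dz))

# b stacks (f, f_x, f_y, f_z, f_xy, f_yz, f_xz, f_xyz) at the 8 corners in
# the same order; the 64 coefficients then follow from one call to
# a = np.linalg.solve(B, b), or B can be inverted once and reused per cell.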
For a detailed description of the method, the motivation behind it, and various proofs, we refer the reader to [31].
CHAPTER 5
INTERPOLATION OF DISTANCE FIELDS AND EXPERIMENTS
In Chapter 4, we proposed a piecewise cubic interpolation method on the BCC
lattice and described its construction, smoothness and approximation order. We also
summarized a local tricubic interpolation proposed by Lekien and Marsden in [31]. Both
these methods use Hermite data associated with the function being interpolated, i.e.
the values of the function and its partial derivatives at the respective lattice points. In
Section 4.1.3, we described a finite-differencing technique to estimate the derivatives
of the function from the function values sampled at the lattice points. However, when
the true values of the function derivatives are available, these can be incorporated
directly into both interpolation schemes to provide a more accurate reconstruction of the
underlying surface.
It has already been mentioned that the true gradients of distance fields can be
computed with relative ease. While sampling a distance field, it is possible to also
compute the true first derivatives of the field at the lattice points. This can be done on
both CC and BCC lattices using the technique explained in Section 3.4 and requires
almost no additional computation. The aforementioned interpolation schemes can
then be employed, along with the true derivative values, on these distance fields to
reconstruct or visualize the original function.
In this chapter, we study the effects of applying the proposed interpolation scheme
on distance fields sampled on BCC lattices. These distance fields are constructed at
different resolutions from triangular meshes of various 3D models (Figure 5-1), using
the GPU accelerated sampling method described in Chapter 3. We use a ray caster to
visualize the results of the interpolation. We are also interested in assessing how well
the BCC sampling lattice does compared to the CC lattice in the context of distance
fields. Hence, we sample the same triangular meshes on CC lattices and use the
tricubic scheme to render them. The resolutions of the CC and BCC lattices are chosen such that the total number of sample points is approximately the same in both cases.

Figure 5-1. The triangular meshes used in our experiments. The soccer ball (A) has 3,516 triangles, the bunny (D) has 69,666 triangles, the buddha (B) and dragon (C) have 100,000 triangles each, and the pawn (E) has 304 triangles.
Both the interpolation schemes are employed twice on each dataset, once using true
first order derivatives and once using estimated values. Since true values of the higher
order derivatives are not available, they are always estimated. On BCC, this is done
using the method described in Section 4.1.3 and on CC, a simple finite-differencing
scheme is used. As a base case for comparison, we also render the CC version of each
dataset using Catmull-Rom interpolation.
The results of these experiments are given below, along with relevant observations.
Each set of images is arranged as follows. The first row has the Catmull-Rom image,
the image using cubic interpolation on BCC with true derivatives and the image using
the same scheme with estimated derivatives, in that order. The second row has images
rendered using tricubic interpolation on CC with the first image using true derivatives
and the second using estimated derivatives.
The images in Figure 5-2 were rendered from distance fields sampled from the
soccer ball dataset. The resolutions used were 80 × 80 × 80 for CC and 64 × 64 × 128 for BCC. Compared to the Catmull-Rom image, the stitches on the ball appear much
sharper in the two images that use true first derivatives with the interpolation schemes
we are interested in. In the images that use the same interpolation schemes with
estimated derivatives, the stitches are about as blurred as in the Catmull-Rom case, and
these images are comparable in quality to each other.
Images in Figure 5-3 and Figure 5-4 were rendered from the Stanford dragon
dataset at resolutions of 80 × 80 × 80 for CC and 64 × 64 × 128 for BCC. Notice the scales on
the surface of the body and the fine details on the head in Figure 5-3 and the ridges on
the body just below the head in Figure 5-4. The Catmull-Rom image in Figure 5-4 has a
disconnected surface near the ear as well. These images show that the sharper features
are reproduced much better by the two interpolation methods using true derivative
values. Among the other three images, i.e. the ones that do not use true derivatives, the
Figure 5-2. The soccer ball dataset with approximately 512,000 samples. The stitches on the ball are significantly sharper in the images using true derivatives (B, D), while the images using estimated derivatives (C, E) are comparable in quality to the Catmull-Rom image (A).
narrow areas behind the head (Figure 5-3) and the ridges on the body (Figure 5-4) are
reproduced better in the BCC image. These phenomena can be observed in Figure 5-5
as well, which shows the Stanford buddha dataset sampled at the same resolutions as
the previous sets. This is a model that has a large amount of fine details on its surface
and the varying accuracy to which these details are reproduced by the different methods
is obvious from the images. The bunny dataset, rendered at resolutions of 85 × 85 × 85 for CC and 68 × 68 × 136 for BCC, is shown in Figure 5-6. Here, all CC images show staircase artifacts on the ear of the bunny. The same areas in the BCC images are mostly artifact free.
Figure 5-3. The Stanford dragon dataset (side view) with approximately 512,000 samples. The details on the head and the scales on the body are more visible in the images using the true derivatives (B, D).
Figure 5-7 shows images rendered from the pawn dataset. These were sampled at the extremely low resolutions of 32 × 32 × 32 for CC and 26 × 26 × 52 for BCC. At these
resolutions, it can be seen that the cubic and tricubic schemes using true derivatives do
a much better job of retaining the basic shape of the pawn. Figures 5-8, 5-9, 5-10 and
5-11 show the pawn dataset sampled at increasing resolutions. As the resolution
increases, the difference in quality between the different interpolation schemes
diminishes until all the images are comparable in quality as seen in the final set. This
shows that the advantage of using the true derivatives is more prominent at lower
resolutions where the sampled distance field might not have enough data to produce a
reasonably accurate reconstruction.
Figure 5-4. The Stanford dragon dataset (front view) with approximately 512,000 samples. The ridges on the underside of the belly are better reproduced by the images using the true derivatives (B, D). Moreover, the surface near the ear is disconnected in the Catmull-Rom image (A).
Figure 5-5. The buddha dataset with approximately 512,000 samples. Images with true derivatives (B, D) show much more detail compared to the rest.
Figure 5-6. The bunny dataset with approximately 615,000 samples. The staircase artifacts seen on the ear in the CC images (A, B, C) are mostly absent in the BCC images (D, E).
Figure 5-7. The Pawn dataset with approximately 33,000 samples.
Figure 5-8. The Pawn dataset with approximately 200,000 samples.
Figure 5-9. The Pawn dataset with approximately 260,000 samples.
Figure 5-10. The Pawn dataset with approximately 512,000 samples.
Figure 5-11. The Pawn dataset with approximately 2,095,000 samples.
CHAPTER 6
CONCLUSION AND FUTURE WORK
In this thesis, we studied sampling and reconstruction of distance fields for surfaces
represented by triangular meshes. We examined the idea of optimal sampling lattices in
this context and discussed the sampling theoretic motivation for using BCC lattices.
For sampling distance fields from triangular meshes, we proposed a GPU
implementation of the brute force approach. Since the brute force approach iterates
over all possible point-triangle pairs, the distance values obtained are exact. While a
CPU implementation of the brute force approach is prohibitively time consuming, our
GPU implementation, based on the CUDA architecture, uses the parallel processing
capabilities of the GPU to achieve significant acceleration. We then discussed ways
to adapt our implementation to BCC lattices. In addition to the distance values, our
implementation also calculates the exact gradients of the distance field at the lattice
points with relatively few additional computations. While our GPU implementation is
fairly basic, we believe that it paves the way for future implementations that can achieve
even higher speedups by utilizing the latest GPU hardware and architectures, and by
using acceleration techniques like hierarchical space partitioning.
We introduced a local cubic interpolation scheme on the BCC lattice that can be
used to reconstruct and visualize discrete distance fields. The constructed splines
are exactly interpolating at the lattice points and the cubic spline space leads to a
fourth-order method with $C^1$ continuity. Our interpolation scheme utilizes the exact
derivative values available for distance fields to give a more accurate reconstruction.
Where exact values of the derivatives are not available, it uses a finite-differencing
scheme to estimate the derivatives, which provides a generalization to Catmull-Rom
splines in the non-separable setting. We also devised a finite-differencing scheme on
the BCC lattice that guarantees the order of accuracy. Unlike super-splines, our spline
construction is possible without introducing intermediate (i.e. non-lattice) points. Its
low degree has advantages from the polynomial fitting point of view. It is also simple to
implement and efficient to compute.
Finally, we evaluated the merits of using the BCC lattice to sample distance fields by
conducting a series of experiments on distance fields sampled on CC and BCC lattices
of comparable density (i.e. number of samples). The measure of quality used was the
visual quality of images rendered from these distance fields. Our cubic interpolation
was used to reconstruct the samples on the BCC lattice while both Catmull-Rom and
the tricubic interpolation method (Lekien and Marsden [31]) were used on the CC lattice. We also examined the effects of using exact derivative values on the quality of reconstruction. Our experiments showed that the BCC datasets using true derivatives were able to reproduce sharp details of the original surfaces much more faithfully than the Catmull-Rom method on the CC lattice. Without exact derivatives, the BCC datasets produced images of quality comparable to, or better than, both the Catmull-Rom and the tricubic (without exact derivatives) images.
REFERENCES
[1] CUDA Programming Guide 3.1, 2010. URL http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/NVIDIA_CUDA_C_ProgrammingGuide_3.1.pdf.

[2] J. Bærentzen. Volumetric Manipulations with Applications to Sculpting. PhD thesis, IMM, Technical University of Denmark, 2001.

[3] J. A. Bærentzen and H. Aanæs. Signed distance computation using the angle weighted pseudonormal. IEEE Transactions on Visualization and Computer Graphics, 11:243–253, May 2005. ISSN 1077-2626. doi: 10.1109/TVCG.2005.49. URL http://dx.doi.org/10.1109/TVCG.2005.49.

[4] I. Bomze, V. Demyanov, R. Fletcher, T. Terlaky, and I. Polik. Nonlinear Optimization: Lectures Given at the CIME Summer School Held in Cetraro, Italy, July 1-7, 2007. Springer Verlag, 2010. ISBN 3642113389.

[5] D. Breen and R. Whitaker. A level-set approach for the metamorphosis of solid models. IEEE Transactions on Visualization and Computer Graphics, 7(2):173–192, 2002. ISSN 1077-2626.

[6] D. Breen, S. Mauch, and R. Whitaker. 3D scan conversion of CSG models into distance volumes. In Proceedings of the 1998 IEEE Symposium on Volume Visualization, pages 7–14. ACM, 1998. ISBN 1581131054.

[7] R. L. Burden and J. D. Faires. Numerical Analysis. Prindle, Weber and Schmidt series in mathematics. PWS-Kent Pub. Co., fifth edition, 1993. ISBN 0-534-93219-3.

[8] D. Cohen-Or, A. Solomovic, and D. Levin. Three-dimensional distance field metamorphosis. ACM Transactions on Graphics (TOG), 17(2):116–141, 1998. ISSN 0730-0301.

[9] B. Csebfalvi. Prefiltered Gaussian reconstruction for high-quality rendering of volumetric data sampled on a body-centered cubic grid. In Visualization, 2005. VIS 05. IEEE, pages 311–318. IEEE, 2005. ISBN 0780394623.

[10] B. Csebfalvi. An evaluation of prefiltered B-spline reconstruction for quasi-interpolation on the body-centered cubic lattice. Visualization and Computer Graphics, IEEE Transactions on, 16(3):499–512, 2010. ISSN 1077-2626.

[11] B. Csebfalvi and M. Hadwiger. Prefiltered B-spline reconstruction for hardware-accelerated rendering of optimally sampled volumetric data. In Vision, Modeling, and Visualization 2006: Proceedings, November 22-24, 2006, Aachen, Germany, page 325. IOS Press, 2006. ISBN 3898380815.
[12] O. Cuisenaire. Distance Transformations: Fast Algorithms and Applications to Medical Image Processing. PhD thesis, Catholic University of Leuven, Belgium, October 1999.

[13] C. de Boor. Quasi-interpolants and approximation power of multivariate splines. Computation of Curves and Surfaces, pages 313–345, 1990.

[14] D. H. Eberly. 3D Game Engine Design, Second Edition: A Practical Approach to Real-Time Computer Graphics (The Morgan Kaufmann Series in Interactive 3D Technology). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006. ISBN 0122290631.

[15] A. Entezari. Optimal sampling lattices and trivariate box splines. PhD thesis, Simon Fraser University, Vancouver, Canada, July 2007. URL http://www.cise.ufl.edu/~entezari/research/docs/dissertation.pdf.

[16] A. Entezari, R. Dyer, and T. Moller. Linear and cubic box splines for the body centered cubic lattice. In Proceedings of the Conference on Visualization '04, VIS '04, pages 11–18, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7803-8788-0. URL http://dx.doi.org/10.1109/VISUAL.2004.65.

[17] A. Entezari, M. Mirzargar, and L. Kalantari. Quasi-interpolation on the body centered cubic lattice. In Computer Graphics Forum, volume 28, pages 1015–1022. John Wiley & Sons, 2009.

[18] A. Entezari, D. Van De Ville, and T. Moller. Practical box splines for reconstruction on the body centered cubic lattice. Visualization and Computer Graphics, IEEE Transactions on, 14(2):313–328, March-April 2008. ISSN 1077-2626.

[19] B. Finkbeiner, A. Entezari, D. Van De Ville, and T. Moller. Efficient volume rendering on the body centered cubic lattice using box splines. Computers & Graphics, 34(4):409–423, 2010. ISSN 0097-8493. doi: 10.1016/j.cag.2010.02.002. URL http://www.sciencedirect.com/science/article/B6TYG-4YDYSHM-1/2/3641c9180893327a580f70549c1dd9c9. Procedural Methods in Computer Graphics; Illustrative Visualization.

[20] S. Frisken, R. Perry, A. Rockwood, and T. Jones. Adaptively sampled distance fields: A general representation of shape for computer graphics. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 249–254. ACM Press/Addison-Wesley Publishing Co., 2000. ISBN 1581132085.

[21] N. Gagvani and D. Silver. Parameter-controlled volume thinning. Graphical Models and Image Processing, 61(3):149–164, 1999. ISSN 1077-3169.

[22] A. Gueziec. Meshsweeper: Dynamic point-to-polygonal mesh distance and applications. IEEE Transactions on Visualization and Computer Graphics, 7(1):47, 2001.
[23] C. Hamitouche, L. Ibanez, and C. Roux. Discrete topology of (A_n*) optimal sampling grids: Interest in image processing and visualization. Journal of Mathematical Imaging and Vision, 23(3):401–417, 2005. ISSN 0924-9907.

[24] M. Harris. Optimizing parallel reduction in CUDA. NVIDIA Developer Technology, 2008. URL http://www.mendeley.com/research/optimizing-parallel-reduction-cuda/.

[25] J. Helmsen, E. Puckett, P. Colella, and M. Dorr. Two new methods for simulating photolithography development in 3D. In Proceedings of SPIE, the International Society for Optical Engineering, volume 2726, pages 253–261. Society of Photo-Optical Instrumentation Engineers, 1996.

[26] K. Hoff, A. Zaferakis, M. Lin, and D. Manocha. Fast 3D geometric proximity queries between rigid and deformable models using graphics hardware acceleration. UNC-CH Technical Report TR02-004, 2002.

[27] M. Jones, J. Bærentzen, and M. Sramek. 3D distance fields: A survey of techniques and applications. IEEE Transactions on Visualization and Computer Graphics, pages 581–599, 2006. ISSN 1077-2626.

[28] T. Ju, F. Losasso, S. Schaefer, and J. Warren. Dual contouring of hermite data. ACM Transactions on Graphics (TOG), 21(3):339–346, 2002. ISSN 0730-0301.

[29] R. Kimmel, N. Kiryati, and A. Bruckstein. Multivalued distance maps for motion planning on surfaces with moving obstacles. Robotics and Automation, IEEE Transactions on, 14(3):427–436, 1998. ISSN 1042-296X.

[30] L. Kobbelt, M. Botsch, U. Schwanecke, and H. Seidel. Feature sensitive surface extraction from volume data. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 57–66. ACM, 2001. ISBN 158113374X.

[31] F. Lekien and J. Marsden. Tricubic interpolation in three dimensions. International Journal for Numerical Methods in Engineering, 63(3):455–471, 2005.

[32] J. Lengyel, M. Reichert, B. Donald, and D. Greenberg. Real-time robot motion planning using rasterizing computer graphics hardware. ACM SIGGRAPH Computer Graphics, 24(4):327–335, 1990. ISSN 0097-8930.

[33] W. Lorensen and H. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pages 163–169. ACM, 1987. ISBN 0897912276.

[34] S. Mauch. A fast algorithm for computing the closest point and distance transform. URL http://www.acm.caltech.edu/seanm/software/cpt/cpt.pdf, 2000.
[35] S. Mauch. Efficient Algorithms for Solving Static Hamilton-Jacobi Equations. PhD thesis, California Institute of Technology, 2003.

[36] T. Meng, B. Smith, A. Entezari, A. E. Kirkpatrick, D. Weiskopf, L. Kalantari, and T. Moller. On visual quality of optimal 3D sampling and reconstruction. In Graphics Interface 2007, May 2007.

[37] T. Morvan, M. Reimers, and E. Samset. High performance GPU-based proximity queries using distance fields. In Computer Graphics Forum, volume 27, pages 2040–2052. John Wiley & Sons, 2008.

[38] J. Mullikin. The vector distance transform in two and three dimensions. CVGIP: Graphical Models and Image Processing, 54(6):526–535, 1992. ISSN 1049-9652.

[39] P. Novotny, L. Dimitrov, and M. Sramek. CSG operations with voxelized solids. In Computer Graphics International, 2004. Proceedings, pages 370–377. IEEE, 2005. ISBN 0769521711.

[40] T. Park, S. Lee, J. Kim, and C. Kim. CUDA-based signed distance field calculation for adaptive grids. In Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on, pages 1202–1206. IEEE, 2010.

[41] B. Payne and A. Toga. Distance field manipulation of surface models. Computer Graphics and Applications, IEEE, 12(1):65–71, 1992. ISSN 0272-1716.

[42] R. Satherley and M. Jones. Vector-city vector distance transform. Computer Vision and Image Understanding, 82(3):238–254, 2001. ISSN 1077-3142.

[43] C. Sequin. Procedural spline interpolation in Unicubix. In Proc. of the 3rd USENIX Computer Graphics Workshop, Monterey, CA, pages 63–83, 1987.

[44] J. Sethian. A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences of the United States of America, 93(4):1591, 1996.

[45] J. Sethian. Level Set Methods and Fast Marching Methods. Cambridge Monographs on Applied and Computational Mathematics, 1999.

[46] C. Sigg, R. Peikert, and M. Gross. Signed distance transform using graphics hardware. In Proceedings of the 14th IEEE Visualization 2003 (VIS'03), pages 12–. IEEE Computer Society, 2003. ISBN 0769520308.

[47] A. Sud, M. Otaduy, and D. Manocha. DiFi: Fast 3D distance field computation using graphics hardware. In Computer Graphics Forum, volume 23, pages 557–566. John Wiley & Sons, 2004.

[48] E. Sundholm. Distance fields accelerated with OpenCL, 2010.
[49] M. Teschner, S. Kimmerle, B. Heidelberger, G. Zachmann, L. Raghupathi, A. Fuhrmann, M. Cani, F. Faure, N. Magnenat-Thalmann, W. Strasser, et al. Collision detection for deformable objects. In Computer Graphics Forum, volume 24, pages 61–81. Wiley Online Library, 2005.

[50] T. Theußl, T. Moller, and M. Groller. Optimal regular volume sampling. In Visualization, 2001. VIS'01. Proceedings, pages 91–546. IEEE, 2001. ISBN 0780372018.

[51] G. Thurmer and C. Wuthrich. Computing vertex normals from polygonal facets. Journal of Graphics Tools, 3(1):43–46, 1998. ISSN 1086-7651.

[52] J. Tsitsiklis. Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control, 40(9):1528–1538, 1995. ISSN 0018-9286.
BIOGRAPHICAL SKETCH
Nithin Pradeep Thazheveettil received his bachelor’s in Electrical and Electronics
Engineering from TKM College of Engineering, Kollam, India in 2005. He then worked
as a software developer at Tata Consultancy Services Ltd, Bangalore, India for 3 years
before joining the University of Florida, Gainesville, to pursue his MS in computer engineering.
He has been doing research in the field of Computer Graphics as part of his master’s
program and is expected to graduate in May 2011.