PARALLEL GUARANTEED QUALITY DELAUNAY MESH GENERATION AND REFINEMENT: CURRENT STATUS
v1.0, 1/18/2003

Nikos Chrisochoides
Computer Science Department, College of William and Mary, Williamsburg, VA 23185
nikos@cs.wm.edu

Abstract: Unstructured mesh generation and refinement poses a number of difficult technical challenges, both in terms of algorithms and software. During the last five years we have directed our research effort toward mesh quality and parallel mesh generation and refinement. This paper summarizes our results, open problems, and future directions for parallel guaranteed quality Delaunay mesh generation and refinement.

Key Words: Parallel, Mesh, Generation, Refinement, Quality, Delaunay.

Introduction

Parallel mesh generation methods decompose the original meshing problem into smaller subproblems that can be solved (i.e., meshed) in parallel. The requirements for the solution of the subproblems on distributed memory computers are: (1) stability, i.e., distributed meshes should retain the good quality of elements and the decomposition properties of sequentially generated and partitioned meshes; (2) scalability; (3) fault-tolerance; and (4) code re-use, in order to leverage the ever-evolving basic sequential meshing techniques. The design and implementation of well tested and fine-tuned adaptive mesh refinement codes is a very expensive and time-consuming task. The same task becomes orders of magnitude more complex in the case of stable, scalable, and guaranteed-quality mesh generation.

In order to develop cost-effective (i.e., high code re-use), stable, and scalable guaranteed-quality parallel mesh generation codes, we are experimenting with three different approaches: (1) parallelization of existing sequential guaranteed quality Delaunay methods, (2) parallel Constrained Delaunay meshing techniques, and (3) parallel Domain Decoupling approaches. In the rest of the paper we present a short overview of the current state-of-the-art parallel methods for mesh generation, and then we briefly describe the three approaches we developed at the College of William and Mary, in collaboration with our colleagues at Cornell University, in the context of the crack propagation project [5].

1 Parallel Mesh Generation: An Overview

In [15] parallel mesh generation methods for distributed memory computers are classified in terms of the way and the order in which the artificial boundary surfaces (interfaces) of the subproblems are meshed. Specifically, parallel methods are classified into three large classes: (i) methods that mesh the interfaces simultaneously as they mesh the individual subproblems [14, 7, 12], (ii) methods that first mesh (in parallel or sequentially [22]) the interfaces of the subproblems and then mesh the individual subproblems in parallel, and (iii) methods that first solve the meshing problem in each of the subproblems in parallel and then mesh the interfaces [16] so that the global mesh is consistent. Other classifications are possible; for example, in [19] Lohner et al. classify parallel mesh generation methods in terms of the basic sequential meshing techniques used to mesh the individual subproblems.

In [10] we evaluate and classify parallel mesh generation methods in terms of two fundamental issues, concurrency and synchronization, which affect the stability, performance, and code re-use of parallel meshing methods. The level (coarse-grain or fine-grain) at which concurrency is exploited and the type (local or global) of synchronization required determine the degree to which code re-use is possible, and vice versa. For example, high code re-use of inherently tightly coupled methods reduces the opportunities for concurrency at a fine-grain level [22] or requires frequent global synchronization [19].

Concurrency and synchronization are independent of implementation choices such as the communication paradigm (shared memory or message passing) and the programming model (data-parallel or task-parallel) of parallel and distributed mesh generation codes, and therefore they can be used as key attributes to evaluate parallel mesh generation methods. Parallel mesh generation methods can exploit concurrency either at a coarse-grain level (i.e., the subdomain or submesh level of the geometry to be meshed) or at a fine-grain level (the node or element level of the mesh), and they synchronize either globally (all processors are involved) or locally (only a small set of processors is involved).

2 Parallel Guaranteed Quality Delaunay Mesh Generation

Most traditional parallel mesh generation methods exploit concurrency at a coarse-grain level using stop-and-repartition methods, which are based on global synchronization, a major source of overhead for large-scale parallel machines (see Figure 1). Also, traditional parallel mesh generation methods solve the mesh generation and partitioning problems separately; this leads to unnecessary and expensive memory access overheads, since parallel mesh generation is NOT a computation intensive application. Moreover, the boundaries (interfaces) of the subproblems are fixed; for some algorithms this affects the stability of the parallel mesh.

Figure 1: Performance data from the parallel advancing front technique [21] on a 128-processor cluster; both runs use 168K tets and 640 subdomains. In the left graph (no load-balancing), the blue color depicts the cycles lost due to work-load imbalance caused by refinement. In the right graph (stop-and-repartition load-balancing with 5 synchronizations), the red and magenta depict the cycles spent in communication (due to data movement) and global synchronization for traditional stop-and-repartition parallel mesh generation methods. The green color in both graphs depicts the actual computation time of the advancing front mesh generation.

In [13, 20] we presented the first provable 3-dimensional (3D) parallel guaranteed quality Delaunay mesh (PGQDM) generation and refinement algorithm and software for polyhedral domains. The PGQDM mathematically guarantees the generation of tetrahedra with circumradius-to-shortest-edge ratio less than 2, as long as the angle separating any two incident segments and/or facets is between 90 and 270 degrees. The PGQDM kernel is based on existing sequential algorithms [23, 9].

The PGQDM algorithm: (1) exploits concurrency at both coarse-grain and fine-grain levels in order to tolerate communication and local synchronization latencies, (2) couples the mesh generation and partitioning problems into a single optimization problem in order to eliminate redundant memory operations (loads/stores) from and to cache, local and remote memory, and disks, and (3) allows the submesh interfaces induced by an element-wise partitioning of an initial mesh of the domain to change as the mesh is refined, in order to mathematically guarantee the quality of the elements.

Figure 2a depicts an initial decomposition of a coarse mesh which is used for data distribution purposes only. Figure 2b shows that the Bowyer-Watson [4, 25] cavity can extend beyond the submesh interfaces in order to guarantee the quality of the mesh. The extension of the cavity beyond the interfaces is a source of intensive communication. However, as Figure 2 shows, we can tolerate almost 90% of the communication by concurrently refining other regions of the submeshes while we wait for remote data to arrive.
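The quality criterion quoted above is easy to state concretely. The following is a minimal 2D sketch of my own, not the PGQDM kernel (which operates on tetrahedra): it computes the circumradius-to-shortest-edge ratio of a triangle and, when the ratio exceeds a bound, reports the circumcenter that a Bowyer-Watson style refinement step would consider inserting. The bound of 2 is borrowed from the tetrahedral guarantee above; Ruppert-style 2D refinement typically uses a smaller bound (about the square root of 2).

    // Minimal 2D illustration of the radius-edge quality criterion used by
    // Delaunay refinement: if circumradius / shortest edge exceeds the bound,
    // the circumcenter is a candidate point for Bowyer-Watson insertion.
    // Sketch for exposition only, not the PGQDM implementation.
    #include <cmath>
    #include <cstdio>

    struct Point { double x, y; };

    static double dist(Point p, Point q) {
        return std::hypot(p.x - q.x, p.y - q.y);
    }

    // Circumcenter of triangle (a, b, c); assumes the triangle is not degenerate.
    static Point circumcenter(Point a, Point b, Point c) {
        double d  = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
        double a2 = a.x * a.x + a.y * a.y;
        double b2 = b.x * b.x + b.y * b.y;
        double c2 = c.x * c.x + c.y * c.y;
        return { (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d,
                 (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d };
    }

    int main() {
        Point a{0.0, 0.0}, b{1.0, 0.0}, c{0.5, 0.05};   // a thin, poor-quality triangle
        Point cc = circumcenter(a, b, c);
        double R = dist(cc, a);                          // circumradius
        double s = std::fmin(dist(a, b), std::fmin(dist(b, c), dist(c, a)));
        double ratio = R / s;                            // radius-edge ratio
        const double bound = 2.0;                        // illustrative; the paper's bound of 2 is for tets
        std::printf("radius-edge ratio = %.3f\n", ratio);
        if (ratio > bound)
            std::printf("refine: insert circumcenter (%.3f, %.3f)\n", cc.x, cc.y);
        return 0;
    }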

Figure 2: (a) Initial data distribution; (b) cavity extension beyond the submesh interfaces; (c) time diagram with concurrent point insertion for tolerating communication latencies; (d) an example that depicts a source of mesh inconsistency; (e) performance data (meshing time distribution on the tee2 geometry, 16 processors, 2M elements, SMGP0) showing the setbacks due to the elimination of mesh inconsistencies; and (f) the refinement of a cavity with simultaneous distribution of the newly created elements.

Unfortunately, the concurrent refinement can create a number of inconsistencies in the mesh, as Figure 2d shows. These inconsistencies are resolved at the cost of setbacks and frequent message polling, shown in Figure 2e. At a communication cost, the PGQDM can afford to be domain decomposition independent, and it allows the distribution of the new elements as they are generated (Figure 2f). Both characteristics eliminate the need to solve a challenging partitioning problem before, during, or after parallel meshing.

Our performance data from a 16-node SP2 and a 32-node cluster of Sparc workstations suggest that between 80% and 90% (in some instances even more) of the communication latency can be masked effectively, at the cost of increasing the communication overhead by up to 20% of the total run time. Despite the increase in communication overhead, the latency-tolerant PGQDM can generate elements for parallel finite element PDE solvers many times faster (depending on the size of the generated mesh) than the traditional approach, which is based on sequential generation followed by parallel partitioning and placement of elements.

Although PGQDM masks most of the communication and synchronization latencies, the communication overhead (cycles for pushing and pulling messages from the network) is around 50% of the total runtime. In summary, PGQDM is stable, but it is communication intensive with limited code re-use. Next, I describe a stable method we developed which eliminates local synchronization and minimizes communication while improving code re-use.

3 Parallel Constrained Delaunay Mesh Generation

Contrary to PGQDM, which ignores the interface edges or faces, the Parallel Constrained Delaunay Mesh (PCDM) generation method [7] treats each submesh as a mesh defined by its external boundary and by constrained edges (or faces in 3D), which are the edges of the interfaces between any pair of adjacent submeshes. PCDM simplifies and minimizes the communication required for the parallel generation of unstructured meshes.

Intuitively, the Constrained Delaunay Mesh (CDM) is as close as one can get to the Delaunay triangulation given that one needs to handle certain (constrained) edges. This idea can be described mathematically [8] and leads to Delaunay-based mesh generators that allow internal boundaries. It has been shown [8] that these internal boundaries do not affect the quality of the resulting mesh for 2-dimensional (2D) planar domains, although an observant user might infer the presence of such boundaries in the resulting mesh by noting the way in which triangle edges are aligned. Using the idea of a CDT, we can introduce into the mesh artificial constrained (interface) edges which decompose the mesh into submeshes that can be meshed independently.
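For reference, the standard criterion from [8] can be paraphrased as follows; the notation (circumdisk operator, vertex set $V$, constrained edge set $S$) is mine, not the paper's. A triangle $t$ of a triangulation of a planar straight line graph is constrained Delaunay when

$$\operatorname{circ}(t) \cap \{\, p \in V : p \text{ is visible from the interior of } t \,\} = \emptyset,$$

where $\operatorname{circ}(t)$ is the open circumdisk of $t$ and visibility is blocked by the edges in $S$. The triangulation is a constrained Delaunay triangulation (CDT) when every triangle satisfies this condition and every edge of $S$ appears as an edge of the triangulation. In other words, the empty-circle test is relaxed only for points hidden behind constrained (interface) edges, which is why the interfaces do not degrade element quality.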

Figure 3: Processor P1 inserts a new point (a) which encroaches upon an interface edge (b). P1 then discards the new point and inserts the midpoint of the encroached edge (c), while at the same time it sends a request to split the same interface edge on processor P0. Processor P0 computes the cavity of the midpoint (d). The triangulation of the cavities (d) and (e) of the midpoint of the interface edge results in a new, consistent, and distributed Delaunay (in the CDT sense) triangulation which guarantees the quality of the elements.
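The interaction depicted in Figure 3 is simple enough to sketch. The fragment below is a hypothetical illustration, not the PCDM implementation or its message format: it shows the encroachment test (the new point falls inside the diametral circle of a constrained interface edge) and the only message that must cross subdomain boundaries. The types, ids, and function names are assumptions for illustration.

    // Sketch of the PCDM-style "split this interface edge" interaction
    // (illustrative only; not the actual PCDM code or message format).
    #include <cstdio>

    struct Point { double x, y; };

    // A point p encroaches upon segment (a, b) if it lies inside the segment's
    // diametral circle, i.e., the angle a-p-b is non-acute.
    static bool encroaches(Point p, Point a, Point b) {
        double ux = a.x - p.x, uy = a.y - p.y;
        double vx = b.x - p.x, vy = b.y - p.y;
        return ux * vx + uy * vy <= 0.0;   // dot-product test
    }

    // Hypothetical message: only the edge identifier is needed, because an
    // interface edge is always split exactly at its midpoint.
    struct SplitRequest { int interface_edge_id; };

    static void send_split_request(int neighbor_rank, SplitRequest req) {
        // In a real code this would be a one-sided message to the neighbor
        // that owns the other copy of the interface edge.
        std::printf("to rank %d: split interface edge %d\n",
                    neighbor_rank, req.interface_edge_id);
    }

    int main() {
        Point a{0.0, 0.0}, b{1.0, 0.0};     // an interface (constrained) edge
        Point p{0.5, 0.2};                  // a newly inserted point nearby
        if (encroaches(p, a, b)) {
            // Discard p locally, insert the midpoint of (a, b) instead, and ask
            // the neighboring subdomain to split its copy of the same edge.
            Point mid{(a.x + b.x) / 2.0, (a.y + b.y) / 2.0};
            std::printf("insert midpoint (%.2f, %.2f) locally\n", mid.x, mid.y);
            send_split_request(/*neighbor_rank=*/1, SplitRequest{42});
        }
        return 0;
    }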

The interfaces between submeshes must be created during a preprocessing (domain decomposition) step. The domain decomposition problem is not trivial, since the interfaces must satisfy certain properties:

- The interfaces should not create any new features, such as segments and angles, that are smaller than the features of the original geometry.
- The interfaces should minimize the surface-to-volume ratio and the connectivity between the submeshes.

The interfaces do not have to be simple line segments. The boundaries can be polylines or even curves, although line segments are probably sufficient for most situations. For planar geometries we can achieve such a decomposition using the Medial Axis Transform (MAT). Figure 4 depicts the MAT of the cross section of a pipe and the corresponding decomposition (see Figure 4c).

Note that, by the definition of the CDM, points on one side of the interfaces have no effect on triangles on the other side; thus, no synchronization is required during the element creation process. In addition, inter-process communication is tremendously simplified: the only message between adjacent processes is of the form "split this interface (i.e., constrained) edge", sent when a newly inserted point encroaches upon an interface edge (i.e., violates the Delaunay property of a Delaunay edge). Since interface edges always split exactly in half, no additional information needs to be communicated.

Although PCDM eliminates the synchronization and minimizes the communication of parallel guaranteed quality Delaunay mesh generation, PCDM for 3D domains is still an open problem. The code re-use for PCDM is also not as high as we would like it to be.

4 Parallel Domain Decoupling Delaunay Mesh Generation

In order to maximize code re-use and to leverage the ever-evolving and mature technologies in sequential meshing, we developed the Parallel Domain Decoupling Delaunay (PD3) method [18]. PD3 works for 2D domains, and we are working on its extension to 3D domains.

PD3 is based on the idea of decoupling the individual submeshes so that they can be meshed independently, with zero communication and synchronization. Similar attempts to parallelize Delaunay triangulations and implement Delaunay-based mesh generation were presented in the past in [3, 17]; in [18] we address some of the drawbacks of, and improve upon, the previously published methods.

The two-dimensional, divide-and-conquer Delaunay triangulation (DCDT) algorithm and its parallel implementation presented in [3] assume that all points are known at the beginning of the triangulation. This is not a practical assumption for mesh generation and refinement, because new points are inserted on demand in order to improve element quality, perform mesh refinement, and recover boundary edges and faces.

Figure 4: (a) Initial boundary conforming mesh used to compute the Medial Axis Transformation (b), which in turn is used to achieve a high quality domain decomposition (c). For PD3 the interfaces of the subdomains are refined (d) in a pre-processing step.

The Parallel Projective Delaunay Meshing (P2DM) method presented in [17] requires pre-processing of the domain in order to compute the proper Delaunay separators sequentially. This by itself is not a major problem, since many methods require some pre-processing before parallel mesh generation. However, after that pre-processing the P2DM method may suffer setbacks and start all over again. The setbacks take the form of discarding the mesh created so far, because the separators do not always guarantee the Delaunay triangulation of the domain (i.e., they are not always Delaunay admissible) as new points are inserted [17]. The second problem is that the dynamic load balancing of the parallel mesh is based on a prediction of the computational load of the processors. However, the computation of Delaunay meshing is variable, and workload prediction without major corrections (during the mesh refinement) is very difficult and inaccurate.

In summary, both methods have two main problems: proper decomposition of the domain in the context of adaptivity, and dynamic load balancing. The construction of decompositions that can decouple the mesh, and the proof that these decompositions lead to stable parallel meshes, is a challenging problem.

We use a domain decomposition method which is based on the partition of a weighted graph and an approximation of the MAT. Before the construction of the graph, the initial mesh is pre-processed in order to prevent the creation of "bad" angles [18] between either the interfaces themselves or between the interfaces and the external boundary of the domain. Then the pre-processed mesh is used to construct a dual weighted graph of the sides of the triangles. This graph is simply connected, and its partition provides a decomposition of the domain.

After the decomposition of the domain, PD3 constructs a "zone" around the interfaces of the submeshes. In [18] we prove that Delaunay meshers will not insert any new points within this zone; i.e., the sequential Delaunay mesher on each individual submesh can terminate without inserting any new points on the interfaces, which eliminates communication and code modifications of the sequential codes. So the problem of parallel meshing is reduced to a "proper" domain decomposition and a discretization of the interfaces. Of course one has to show that: (1) the domain is not over-refined because of a predefined discretization, and (2) the quality of the elements is maintained. Our preliminary experimental data indicate that over-refinement is not a major factor of inefficiency in the geometries on which we tested our method; similarly, the high quality of the elements is preserved.

Table 1: Preliminary data on the number of elements and execution time (in seconds) for generating very large meshes using 1 to 128 processors.

Procs        1        8        16       32       48       64       128
Domains      12       96       192      384      576      768      1200
# elems      19M      150M     300M     600M     934M     1.2B     1.6B
Dec. time    0.20     0.31     0.35     0.54     0.90     1.27     3.08
Mesh time    124.18   131.57   132.57   135.04   136.78   135.81   106.31
Total time   124.38   131.88   132.92   135.58   137.68   137.08   109.39

At the end of the parallel mesh generation process some communication is required to compute the global topology of the mesh. Finally, PD3 achieves 100% code re-use.
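As a rough reading of Table 1 (my arithmetic, not a figure reported in the paper): since the mesh size grows with the processor count, throughput is a more meaningful metric than fixed-size speedup, and the comparison below ignores the fact that the per-processor mesh size is not held exactly constant.

$$\frac{19\times 10^{6}\ \text{elements}}{124.38\ \text{s}} \approx 1.5\times 10^{5}\ \text{elements/s (1 processor)}, \qquad \frac{1.6\times 10^{9}\ \text{elements}}{109.39\ \text{s}} \approx 1.5\times 10^{7}\ \text{elements/s (128 processors)},$$

a throughput increase of roughly 96x, i.e., about 75% of a perfect 128x increase, which is consistent with the nearly flat "Mesh time" row of the table.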

5 Parallel Implementation

Providing adequate software support for adaptive mesh refinement codes is a challenging task even for sequential implementations. The program complexity of parallel mesh generation codes that are scalable and stable increases by an order of magnitude, due to their dynamic and irregular computation, communication, and synchronization requirements. In order to handle the software complexity of managing task parallelism, we developed a Portable Runtime Environment for Multiprocessor Applications (PREMA).

5.1 Parallel Runtime Environment for Multiprocessor Applications

PREMA's design objective is to assist mesh generation practitioners in developing parallel meshing codes in an evolutionary (incremental) way, starting from well tested and fine-tuned sequential codes (preferably written in the C or C++ programming languages). PREMA supports a Simple Mesh Partitioning Library (SMPL) that runs on a single processor and takes as input a mesh defined in the standard format that most public domain Finite Element Analysis codes use. SMPL generates the submeshes as output, using the same format, but appends adjacency information at the subdomain level.

The next two layers handle communication [2, 11]. The Data Movement and Control Substrate (DMCS) implements one-sided, low-latency communication. DMCS is built on top of low-level, vendor-specific communication subsystems that bypass the operating system and access the network interface directly in order to minimize communication startup overheads. It is also implemented on widely available libraries like MPI for clusters of workstations and PCs. DMCS adds a small overhead (less than 10%) to the communication operations provided by the underlying communication system. In return, DMCS provides a flexible and easy to understand application program interface for one-sided communication operations, including put-ops and remote procedure calls. Furthermore, DMCS is designed so that it can be easily ported and maintained by non-experts. The second communication component is the Mobile Object Layer (MOL). MOL supports one-sided communication in the context of data (submesh) or object mobility. Specifically, MOL uses a global logical name space and supports automatic message forwarding in order to ease the implementation of dynamic load balancing methods for adaptive applications on distributed memory machines. MOL is implemented on top of DMCS and adds another 10% to 15% overhead. In return, the application programmer can perform data (submesh) or object migration, for load balancing, without changing a single line of code.

Finally, the third layer implements the dynamic Load Balancing Library (LBL) on top of DMCS and MOL. LBL supports both explicit and implicit load balancing methods. LBL uses the abstraction of a Schedulable Object (sometimes called a Work Unit) to represent application-defined data objects such as the submeshes, in the case of discrete DD methods. The Schedulable Object (SO) is the smallest unit of granularity managed by the runtime system. Balancing processor workload or memory (in the case of adaptive refinement) results in moving one or more Schedulable Objects. The SOs are migrated automatically by the Load Migrator, which is responsible for un-installing, moving, and installing SOs using MOL, so that messages to them are automatically forwarded. The decision about which SO to migrate, and where, is handled by the Scheduler. The Scheduler supports a wide spectrum of load-balancing methods and is described in detail in [1]. In the next version of the LBL, tools like the Load Monitor and System Monitor will provide to the Scheduler all the system (processor and network) information it needs for effective dynamic load balancing.

Figure 5: Performance data from the parallel advancing front technique [21] on a 32-processor Sun cluster; both runs use 168K tets and 640 subdomains. The left graph shows the run with no load-balancing and the right graph the run with explicit load-balancing. The explicit method is based on work-stealing [1] and does not require global synchronization.

By using PREMA, parallel mesh practitioners can be assisted at many different levels. First, they can customize one-sided message passing operations, like put_op where op is a user-defined function, to their code requirements. We found that this functionality simplifies code complexity substantially, since the application programmer does not have to be concerned about what to do when a message arrives. Second, since they know that at some point they will have to implement mesh refinement, i.e., that they will have to move submeshes (objects) among the processors for load balancing purposes, they can use MOL messages, which find an object independently of its physical location. Third, the programmer can get load balancing (and, in the future, an out-of-core implementation) for free by using the LBL's Scheduler.

Finally, once the parallel meshing code is built and ready for production, mesh practitioners can fine-tune its performance by directly using the Load Migrator to perform explicit load balancing and swapping of SOs (i.e., submeshes). This incremental or evolutionary approach is described in great detail in [1] for parallel Constrained Delaunay meshing. Figure 5 depicts some initial performance data from balancing a parallel advancing front method on 32 processors. More and better methods need to be developed for larger processor configurations; fortunately, the plug-and-play software design of PREMA makes this task very easy. From the last 15 years of experience we have seen that there is no single dynamic load balancing method suitable for all applications, or even for all parallel mesh generation techniques.
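To make the put_op idea from the preceding discussion concrete, here is a hypothetical, heavily simplified sketch. It is NOT the DMCS, MOL, or PREMA API; every name below is an assumption introduced for illustration. It shows why the application programmer does not write receive-side logic: the sender names a user-defined operation, and the runtime invokes the registered handler when the message arrives.

    // Hypothetical sketch of a one-sided "put_op" style interface
    // (illustrative only; not the DMCS/MOL/PREMA API).
    #include <cstdio>
    #include <functional>
    #include <map>
    #include <vector>

    using Handler = std::function<void(const std::vector<char>&)>;

    // Toy "runtime": a table of user-registered handlers keyed by operation id.
    static std::map<int, Handler> handler_table;

    void register_op(int op_id, Handler h) { handler_table[op_id] = h; }

    // Stand-in for a one-sided put_op: here we simply invoke the handler locally,
    // where a real substrate would first move the bytes to the destination node.
    void put_op(int /*dest_rank*/, int op_id, const std::vector<char>& payload) {
        handler_table.at(op_id)(payload);
    }

    int main() {
        const int OP_REFINE_SUBMESH = 7;   // illustrative operation id
        register_op(OP_REFINE_SUBMESH, [](const std::vector<char>& bytes) {
            // Application-defined reaction, e.g. unpack a submesh and refine it.
            std::printf("handler invoked with %zu bytes\n", bytes.size());
        });

        std::vector<char> payload(64, 0);  // pretend this is a packed submesh
        put_op(/*dest_rank=*/3, OP_REFINE_SUBMESH, payload);
        return 0;
    }

A mobile-object layer, as described above, would add one level of indirection on top of this: messages are addressed to a logical object id rather than a processor, and the runtime forwards them to whichever processor currently holds the (possibly migrated) object.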

Conclusions

We have described three parallel guaranteed quality Delaunay mesh generation methods: the PGQDM for 3D polyhedral domains and two methods (PCDM and PD3) for 2D planar geometries. In addition, we briefly described a parallel runtime system, PREMA, which we used to implement these methods. PREMA can be used to solve the dynamic load balancing problem, and thus it allows us to focus on yet another very important and still open problem for parallel meshing, the domain decomposition problem.

The PGQDM is the first parallel provable 3D meshing method, and it is independent of the quality of the domain decomposition. Its disadvantages are: (1) it involves complex data structures for maintaining the consistency and Delaunay properties of the distributed mesh, (2) it requires high communication, which reduces its fixed speedup to O(log P), where P is the number of processors, and (3) it has zero code re-use. However, despite these difficulties it is the only method that does not depend on domain decomposition, and thus we are working [24] to alleviate some of the problems mentioned above and to extend it to 3D geometries with curved boundaries.

Contrary to PGQDM, the PCDM is easier to implement, with an order of magnitude less communication and with linear speedup, but it depends on high quality domain decompositions. Current techniques for domain decomposition are not always capable of providing decompositions suitable for PCDM. We have solved the domain decomposition problem for 2D planar geometries and we are working [6] to solve this problem for 3D geometries.

Finally, we believe the PD3 method is the "holy grail" for parallel meshing. It requires zero communication and synchronization, with 100% code re-use. Unfortunately, like PCDM, it requires special domain decompositions which are very challenging to generate for general geometries. For 2D planar geometries we have been very successful in generating such decompositions. We have made very good progress in applying the same techniques to 3D polyhedral geometries, but for 3D domains with curved boundaries the problem is still open.

6 Acknowledgments

Many results that appear in this paper are due to my work with my students Andrey Chernikov, Andriy Fedorov, Demian Nave, Leonidas Linardakis and Chaman Verma, as well as my colleagues from Cornell University, especially Dr. Paul Chew. This research has been funded by NSF CISE Challenge #EIA-9726388, Career Award CCR-9876179, Research Infrastructure grant #EIA-9972853, ITR grant #ACI-0085969 and NGS grant EIA-0203974.

References

[1] K. Barker, N. Chrisochoides, A. Chernikov, and K. Pingali. Architecture and evaluation of a load balancing framework for adaptive and asynchronous applications. IEEE Trans. Parallel and Distributed Systems, under revision, 2003.

[2] Kevin Barker, Nikos Chrisochoides, Jeff Dobbelaere, Démian Nave, and Keshav Pingali. Data movement and control substrate for parallel, adaptive applications. Concurrency and Computation: Practice and Experience, 14, 2002.

[3] G. Blelloch, J. Hardwick, G. Miller, and D. Talmor. Design and implementation of a practical parallel Delaunay algorithm. Algorithmica, 24:243-269, 1999.

[4] Adrian Bowyer. Computing Dirichlet tessellations. The Computer Journal, 24(2):162-166, 1981.

[5] Bruce Carter, Chuin-Shan Chen, L. Paul Chew, Nikos Chrisochoides, Guang R. Gao, Gerd Heber, Antony R. Ingraffea, Chris Myers, Roland Krause, Démian Nave, Keshav Pingali, Paul Stodghill, Stephen Vavasis, and Paul A. Wawrzynek. Parallel FEM simulation of crack propagation: challenges, status. In Lecture Notes in Computer Science, volume 1800, pages 443-449. Springer-Verlag, 2000.

[6] Andrey Chernikov. Domain Decomposition for Guaranteed Quality Parallel Mesh Generation. PhD thesis, College of William and Mary, 2005.

[7] P. Chew, N. Chrisochoides, and F. Sukup. Parallel constrained Delaunay meshing. In Special Symposium on Trends in Unstructured Mesh Generation. ASME/ASCE/SES, 1997.

[8] Paul Chew. Constrained Delaunay triangulations. Algorithmica, 4:97-108, 1989.

[9] Paul Chew. Guaranteed-quality mesh generation for curved surfaces. In Proc. 9th Annu. ACM Sympos. Comput. Geom., pages 274-280, 1993.

[10] Nikos Chrisochoides. A survey of parallel unstructured mesh generation methods. SIAM Review, to be submitted, 2003.

[11] Nikos Chrisochoides, Kevin Barker, Demian Nave, and Chris Hawblitzel. Mobile object layer: A runtime substrate for parallel adaptive and irregular computations. Advances in Engineering Software, 31(8-9):621-637, 2000.

[12] Nikos Chrisochoides and Demian Nave. Simultaneous mesh generation and partitioning. Mathematics and Computers in Simulation, 54(4-5):321-339, 2000.

[13] Nikos Chrisochoides and Démian Nave. Parallel Delaunay mesh generation kernel. IJNME, to appear in 2003.

[14] Nikos Chrisochoides and Florian Sukup. Task parallel implementation of the Bowyer-Watson algorithm. In Proceedings of the Fifth International Conference on Numerical Grid Generation in Computational Fluid Dynamics and Related Fields, 1996.

[15] H. de Cougny and M. Shephard. CRC Handbook of Grid Generation, chapter Parallel unstructured grid generation, pages 24.1-24.18. CRC Press, Inc., 1999.

[16] H. de Cougny and M. Shephard. Parallel volume meshing using face removals and hierarchical repartitioning. Comp. Meth. Appl. Mech. Engng., 174(3-4):275-298, 1999.

[17] J. Galtier and P. L. George. Prepartitioning as a way to mesh subdomains in parallel. In Special Symposium on Trends in Unstructured Mesh Generation. ASME/ASCE/SES, 1997.

[18] L. Linardakis and N. Chrisochoides. Parallel Delaunay domain decoupling method for planar geometries. Algorithmica, to be submitted in 2003.

[19] R. Lohner and J. Cebral. Parallel advancing front grid generation. In International Meshing Roundtable. Sandia National Labs, 1999.

[20] D. Nave, N. Chrisochoides, and P. Chew. Guaranteed-quality parallel Delaunay refinement for restricted polyhedral domains. In Proceedings of the 18th ACM Symposium on Computational Geometry, pages 135-144, 2002.

[21] J. B. Cavalcante Neto, P. A. Wawrzynek, M. T. M. Carvalho, L. F. Martha, and A. R. Ingraffea. An algorithm for three-dimensional mesh generation for arbitrary regions with cracks. Engineering with Computers, 17(1):75-91, 2001.

[22] R. Said, N. Weatherill, K. Morgan, and N. Verhoeven. Distributed parallel Delaunay mesh generation. Comp. Methods Appl. Mech. Engrg., 177:109-125, 1999.

[23] Jonathan Richard Shewchuk. Delaunay Refinement Mesh Generation. PhD thesis, Carnegie Mellon University, School of Computer Science, May 1997. Available as Technical Report CMU-CS-97-137.

[24] Chaman Verma. Parallel guaranteed quality Delaunay mesh generation for 3D geometries with curved boundaries. Master's thesis, College of William and Mary, 2003.

[25] David F. Watson. Computing the n-dimensional Delaunay tessellation with application to Voronoi polytopes. The Computer Journal, 24(2):167-172, 1981.