Geometry Based Parallel Mesh Generation and Adaptation
Saurabh Tendulkar, Mark Beall, Rocco Nastasia
Introduction

• Q. Why parallel mesh generation/adaptation?
• A. Enables large-scale parallel adaptive simulations
  – Billions of elements, 10s/100s of thousands of processors.
  – If the mesh is serial, scaling is not possible.
  – Eliminate I/O where possible – I/O is very slow.
  – Seamless simulations that scale well without bottlenecks.
Topics

• Partitioned mesh
• Parallel mesh generation
  – Surface meshing.
  – Volume meshing.
• Parallel adaptation
  – Mesh modifications.
  – Predictive load balancing.
  – Anisotropic size fields.
  – Boundary layer adaptation.
• Distributed parallel geometry
• File-free adaptive analysis
• Multithreaded mesh generation/adaptation
Partitioned Mesh

• Mesh distributed among available processors.
• Each processor has part of the mesh.
• Entities on part boundaries:
  – Replicated on each part.
  – Know about their copies on other parts.
• Mesh classified on model.
Partitioned Mesh

• Partitioned mesh allows:
  – Communication at part boundaries
    • Operations independently performed on each part.
    • Communication of data so mesh is in sync.
  – Mesh migration
    • Migrate individual/groups of entities from part to part.
    • Localize given neighborhood around entities.
  – Partitioning
    • Load balance.
    • Parallel graph partitioner – ParMetis, user defined.
    • In volume mesh, regions/faces are graph nodes/edges.
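As a concrete illustration of the last bullet, the sketch below (not the Simmetrix API) builds the dual graph a parallel partitioner such as ParMetis consumes: each tetrahedral region is a graph node, and two regions sharing a face are connected by a graph edge.

```python
from collections import defaultdict

def build_dual_graph(regions):
    """regions: dict region_id -> tuple of 4 vertex ids (one tetrahedron).
    Returns adjacency dict region_id -> set of neighboring region ids."""
    face_to_regions = defaultdict(list)
    for rid, v in regions.items():
        # The four triangular faces of a tet, keyed order-independently.
        for face in (v[:3], v[1:], (v[0], v[1], v[3]), (v[0], v[2], v[3])):
            face_to_regions[frozenset(face)].append(rid)
    adj = defaultdict(set)
    for rids in face_to_regions.values():
        if len(rids) == 2:          # interior face shared by two regions
            a, b = rids
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

# Two tets sharing the face {1, 2, 3}:
graph = build_dual_graph({0: (0, 1, 2, 3), 1: (1, 2, 3, 4)})
```

The partitioner would then cut this graph so each part holds a balanced number of (possibly weighted) regions while minimizing cut edges.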
Parallel Mesh Generation

• Surface Meshing
  – Fully automatic.
  – Model faces automatically decomposed among processes.
  – Load balance not guaranteed, but scales well in practice (more faces than processors).
Parallel Mesh Generation

• Volume Meshing
  – Fully automatic.
  – Octree-based spatial decomposition for load balance.
  – Mesh local areas (away from part boundaries).
  – Hierarchical repartitioning to localize and mesh areas between part boundaries.
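A minimal sketch of the octree idea above (illustrative, not the actual decomposition code): recursively split a box into octants until each leaf holds at most `capacity` seed points, so the leaves become roughly balanced units of meshing work. It assumes no more than `capacity` coincident points.

```python
def octree_leaves(points, lo, hi, capacity):
    """points: list of (x, y, z); lo, hi: corners of the bounding box.
    Returns list of (lo, hi, points) leaf boxes with <= capacity points."""
    if len(points) <= capacity:
        return [(lo, hi, points)]
    mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
    # Assign every point to exactly one octant by comparing to the midpoint.
    buckets = {}
    for p in points:
        key = tuple(p[d] >= mid[d] for d in range(3))
        buckets.setdefault(key, []).append(p)
    leaves = []
    for key, pts in buckets.items():
        clo = tuple(mid[d] if key[d] else lo[d] for d in range(3))
        chi = tuple(hi[d] if key[d] else mid[d] for d in range(3))
        leaves += octree_leaves(pts, clo, chi, capacity)
    return leaves

# Three points, capacity 2: one split separates the cluster from the outlier.
leaves = octree_leaves([(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)],
                       (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), capacity=2)
```

Only leaves touching a part boundary need the later hierarchical repartitioning step; interior leaves can be meshed immediately.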
Distributed Volume Meshing

• Scaling
  – Hard problem to scale well, as the amount of work is unknown in advance.
  – Good speedup – roughly half the number of processes.
  – Focus on generation:
    • >300M element meshes generated in 10 min.
    • Generation time << I/O time – but no I/O is needed!
    • Reduces overall time.

(Figures: volume meshing scaling up to 12 processors and up to 64 processors.)
Parallel Mesh Generation

(Figure: 1/8 of a 180M element mesh on 64 processors.)
Parallel Mesh Adaptation

• Error estimation specifies a new mesh size field
  – Mesh size at vertices.
• Adaptation based on this size field involves:
  – Refinement (splits), coarsening (collapses).
  – Optimization to improve shape (swaps etc.).
• Maintain fidelity to geometry (snapping)
  – Motion.
  – Modifications.
  – Cavity remeshing.
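A sketch of how a vertex size field drives these decisions (illustrative thresholds, not the MeshSim algorithm): an edge is marked for splitting when it is long relative to the size requested at its vertices, and for collapsing when it is short.

```python
import math

def classify_edge(p0, p1, h0, h1,
                  split_at=math.sqrt(2), collapse_at=1 / math.sqrt(2)):
    """p0, p1: vertex coordinates; h0, h1: desired mesh size at each vertex.
    Returns 'split', 'collapse', or 'keep' based on the edge length
    normalized by the average requested size."""
    ratio = math.dist(p0, p1) / ((h0 + h1) / 2)
    if ratio > split_at:
        return 'split'
    if ratio < collapse_at:
        return 'collapse'
    return 'keep'
```

The sqrt(2) thresholds are a common choice that keeps split and collapse from oscillating; swaps and vertex motion then clean up element shape afterwards.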
Parallel Mesh Adaptation

• Modifications in parallel (at part boundaries)
  – Refinement
    • Split in parallel independently.
    • Communicate new data to keep in sync.
  – Coarsening, optimization, snapping
    • Localize mesh, then modify.
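One way independent splitting can stay consistent (an assumed scheme, sketched here for illustration only): every part holding a copy of a shared boundary edge derives the new midpoint vertex's global id deterministically from the edge's global vertex ids, so all copies agree without negotiation, and the communication step only needs to exchange attached data.

```python
def midpoint_vertex_id(v0, v1, id_space=10**9):
    """Deterministic global id for the vertex created by splitting the edge
    (v0, v1). Hypothetical scheme: assumes global vertex ids < 10**6, so the
    sorted pair packs into a unique id above id_space."""
    a, b = sorted((v0, v1))
    return id_space + a * 10**6 + b

# Two parts holding copies of the same edge compute the same id,
# regardless of the order in which they store its endpoints:
same = midpoint_vertex_id(42, 7) == midpoint_vertex_id(7, 42)
```

Coarsening and snapping cannot be made independent this way, which is why the slide says those operations first migrate the affected cavity onto one part.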
Parallel Mesh Adaptation

• Predictive load balancing
  – New size field may lead to heavy refinement on one part and coarsening on another.
  – Cannot go with the current partitioning – memory as well as work load could be unbalanced.
  – Use the new size field to set weights on regions.
  – Do weighted repartitioning before modifications.
  – Load/memory balance and suitable partitioning after modifications for the next analysis step.
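A sketch of one plausible weighting (an assumed formula, not necessarily the one used here): if the size field requests edge length h_new where the current local size is h_old, a region will turn into roughly (h_old/h_new)^3 tetrahedra after adaptation, so that ratio serves as its weight for the repartitioning done before any modification runs.

```python
def predictive_weights(regions):
    """regions: dict region_id -> (h_old, h_new), current and requested
    local mesh size. Returns per-region weights ~ predicted element count."""
    return {rid: (h_old / h_new) ** 3
            for rid, (h_old, h_new) in regions.items()}

weights = predictive_weights({
    0: (1.0, 0.5),   # refinement requested: ~8x more elements here
    1: (1.0, 2.0),   # coarsening requested: ~1/8 the elements here
})
```

Feeding these weights to the graph partitioner balances the *post*-adaptation load, so no part runs out of memory mid-refinement.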
Anisotropic Adaptation

• Anisotropic size field
  – Ellipsoidal sizes at vertices.

(Figure: transonic flow over the ONERA M6 wing.)
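The standard way to encode an ellipsoidal size (sketched below; the storage in the actual library is an assumption) is a symmetric 3x3 metric tensor M at each vertex: an edge vector e has metric length sqrt(e^T M e), and length 1 in the metric means "just right" for that direction.

```python
def metric_edge_length(e, M):
    """e: edge vector (3 components); M: symmetric 3x3 metric tensor
    given as nested lists. Returns sqrt(e^T M e)."""
    Me = [sum(M[i][j] * e[j] for j in range(3)) for i in range(3)]
    return sum(e[i] * Me[i] for i in range(3)) ** 0.5

# Metric requesting size 0.1 along x but 1.0 along y and z: diag(1/h_i**2).
M = [[100.0, 0.0, 0.0],
     [0.0,   1.0, 0.0],
     [0.0,   0.0, 1.0]]
```

With this metric, a 0.1-long edge along x and a 1.0-long edge along y both measure 1.0, so the same split/collapse thresholds used isotropically apply unchanged, which is what lets one adaptation driver handle both cases.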
Boundary Layer Adaptation

• Boundary layer mesh
  – Semistructured mesh.
  – Models high gradients normal to the surface, e.g. no-slip walls in CFD.
  – First layer height (t0), number of layers (n), total height (T) or gradation factor (g).
• Adaptation must maintain the structure.
• Parallel BL adaptation under development.
  – Serial available.
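The relation among the four parameters above is a geometric sum: layer i has height t0·g^i, so specifying any three of (t0, n, T, g) fixes the fourth. A small sketch (the bisection helper is illustrative, not a library routine):

```python
def bl_total_height(t0, n, g):
    """Total height T of n layers growing geometrically from t0 by factor g."""
    if g == 1.0:
        return t0 * n
    return t0 * (g ** n - 1) / (g - 1)

def bl_gradation(t0, n, T, tol=1e-10):
    """Recover g from t0, n, T by bisection; T is monotone in g for g > 1."""
    lo, hi = 1.0 + 1e-12, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bl_total_height(t0, n, mid) < T:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, t0 = 1, n = 3, g = 2 gives layers 1, 2, 4 and total height 7; inverting recovers g = 2.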
Boundary Layer Adaptation

• Separate normal and in-plane adaptation.
• In-plane
  – Size field same as for the unstructured mesh.
  – Mesh modifications propagate through the stack.
  – Keep stacks together in parallel.
• Normal
  – User specifies t0, n, and T or g on vertices.
  – Shrink/expand the BL, change the number of layers.
Boundary Layer Adaptation

(Figure: BL adaptation – pipe manifold example.)
Boundary Layer Adaptation

(Figure: pipe adaptation – close-up of a corner.)
Boundary Layer Adaptation

(Figure: normal BL adaptation.)
File-free Adaptive Analysis

• Direct interface between Simmetrix and solver codes.
  – All required data is in memory, no I/O.
  – In-place solution transfer during adaptation with FieldSim.
• In progress:
  – RPI/Colorado's PHASTA CFD code.
  – NASA's FUN3D CFD code.
(Diagram: FUN3D–MeshSim adaptive loop. FUN3D solves until "Iters done?"; error estimation then checks "Error ok?". If not, MeshSimAdapt adapts the mesh, with FieldSim transferring the solution in place, and control returns to FUN3D; if yes, the loop ends. Components are labeled by owner: FUN3D vs. RPI.)
Distributed Model Geometry

• Requiring the entire model on each process poses memory and I/O (or communication) overhead.
• Truly require only the geometry where the mesh is classified.
• Partitioned model representation
  – Similar to partitioned mesh.
  – Migrate model entities between processors.
  – Maintain enough data to properly hook up with adjacent entities.
• Driven by mesh migration
  – Geometry required by the receiving process is migrated first.
  – Then the mesh is migrated so it can be classified.
  – Geometry no longer required on the sending process (mesh was migrated away) is deleted.
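The lifetime rule above can be sketched as simple bookkeeping (an assumed model, not the Simmetrix implementation): a part keeps a model entity only while some local mesh entity is classified on it, so geometry travels ahead of the mesh and is dropped once the last classified mesh entity migrates away.

```python
class Part:
    def __init__(self):
        self.geom = set()   # model entities resident on this part
        self.mesh = {}      # mesh entity id -> model entity it is classified on

    def receive(self, ent_id, model_ent):
        self.geom.add(model_ent)       # geometry first, so classification resolves
        self.mesh[ent_id] = model_ent

    def send(self, ent_id, dest):
        model_ent = self.mesh.pop(ent_id)
        dest.receive(ent_id, model_ent)
        if model_ent not in self.mesh.values():
            self.geom.discard(model_ent)   # no local mesh uses it: delete

a, b = Part(), Part()
a.receive(1, 'face7')   # 'face7' is a hypothetical model face
a.receive(2, 'face7')
a.send(1, b)            # 'face7' still needed on a (entity 2 remains)
a.send(2, b)            # last user gone: a deletes its copy of 'face7'
```

In a real partitioned model the "hook up with adjacent entities" data would travel with the entity; here a string id stands in for it.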
Distributed Model Geometry

(Figure: partitioned mesh and model – model geometry in grey.)

• Substantial memory savings for models with a large number of model entities.
• Parts entirely in the interior require no geometry at all!
Multithreaded Meshing

• Utilize multicore machines to get results faster.
• Limit critical sections where threads need exclusive access.
• Initial goal: speedup of 1.5 on 2 threads and 2 on 4 threads.
• Hybrid distributed + multithreaded for modern parallel clusters:
  – For example, MPI (funneled calls via master) + pthreads.
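The "limit critical sections" point can be sketched as follows (illustrative pattern only, using Python threads in place of pthreads): workers generate elements for independent cells with no locking, and only the short merge into the shared mesh takes the lock.

```python
import threading

def mesh_cells(cells, n_threads, mesh_cell):
    """mesh_cell: pure function cell -> list of elements (no shared state)."""
    shared_mesh, lock = [], threading.Lock()

    def worker(my_cells):
        for cell in my_cells:
            elems = mesh_cell(cell)   # bulk of the work runs lock-free
            with lock:                # critical section kept minimal
                shared_mesh.extend(elems)

    threads = [threading.Thread(target=worker, args=(cells[i::n_threads],))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_mesh

# Hypothetical cell mesher producing 3 elements per cell:
result = mesh_cells(list(range(8)), 4, lambda c: [(c, k) for k in range(3)])
```

In the hybrid MPI-funneled model the same pattern applies per rank, with only the master thread additionally allowed to make MPI calls.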
Concluding Remarks

• Parallel mesh generation and adaptation enable large-scale parallel adaptive analyses.
• Parallel model representation allows scaling of the geometry as well.
• Work in progress
  – Better scaling for high numbers of processors.
    • Use RPI's CCNI supercomputer.
  – Better scaling for multithreaded (2, 4, 8... threads).
  – Parallel adaptation in the boundary layer.
  – File-free adaptive analyses.
  – Hybrid MPI + thread model.