LBM on the way to ExaScale — Ulrich Rüde
Lehrstuhl für Simulation Universität Erlangen-Nürnberg
www10.informatik.uni-erlangen.de
Ulrich Rüde (LSS Erlangen, [email protected])
Lattice Boltzmann Methods on the way to exascale
HIGH PERFORMANCE COMPUTING
From Clouds and Big Data to Exascale and Beyond
An International Advanced Workshop
Cetraro – Italy, June 27 – July 1, 2016
Outline
Goals:
• drive algorithms towards their performance limits (scalability is necessary but not sufficient)
• sustainable software: reproducibility & flexibility
• coupled multi physics
Three software packages:
1. Many body problems: rigid body dynamics
   2.8 × 10^10 non-spherical particles
2. Kinetic methods: Lattice Boltzmann - fluid flow
   >10^12 cells, adaptive, load balancing
3. Continuum methods: Finite element - multigrid
   fully implicit solves with >10^13 DoF
Real life applications
The workhorses
JUQUEEN
• Blue Gene/Q architecture
• 458,752 PowerPC A2 cores
• 16 cores (1.6 GHz) per node
• 16 GiB RAM per node
• 5D torus interconnect
• 5.8 PFlops peak
• TOP 500: #13
SuperMUC
• Intel Xeon architecture
• 147,456 cores
• 16 cores (2.7 GHz) per node
• 32 GiB RAM per node
• Pruned tree interconnect
• 3.2 PFlops peak
• TOP 500: #27
Building block I:
The Lagrangian View:
Granular media simulations
with the physics engine
• 1,250,000 spherical particles
• 256 processors
• 300,300 time steps
• runtime: 48h (including data output)
• texture mapping, ray tracing
Pöschel, T., & Schwager, T. (2005). Computational granular dynamics: models and algorithms. Springer Science & Business Media.
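Nonlinear Complementarity and Time Stepping

Continuous formulation, forces $\lambda$:
• Non-penetration (Signorini condition): $\xi \ge 0 \;\perp\; \lambda_n \ge 0$
  Coulomb friction (friction cone condition): $\|\lambda_{to}\|_2 \le \mu \lambda_n$
• if $\xi = 0$: $\dot{\xi}^+ \ge 0 \;\perp\; \lambda_n \ge 0$
  frictional reaction opposes slip: $\|\delta v^+_{to}\|_2 \, \lambda_{to} = -\mu \lambda_n \, \delta v^+_{to}$
• if $\dot{\xi}^+ = 0$: $\ddot{\xi}^+ \ge 0 \;\perp\; \lambda_n \ge 0$
  if $\|\delta v^+_{to}\|_2 = 0$: $\|\dot{\delta v}^+_{to}\|_2 \, \lambda_{to} = -\mu \lambda_n \, \dot{\delta v}^+_{to}$

Continuous formulation, impulses $\Lambda$:
• $\xi \ge 0 \;\perp\; \Lambda_n \ge 0$
  $\|\Lambda_{to}\|_2 \le \mu \Lambda_n$
• impact law, if $\xi = 0$: $\dot{\xi}^+ \ge 0 \;\perp\; \Lambda_n \ge 0$
  $\|\delta v^+_{to}\|_2 \, \Lambda_{to} = -\mu \Lambda_n \, \delta v^+_{to}$

Discrete time stepping:
• $\frac{\xi}{\delta t} + \delta v'_n(\lambda) \ge 0 \;\perp\; \lambda_n \ge 0$
• $\|\lambda_{to}\|_2 \le \mu \lambda_n$
• $\|\delta v'_{to}(\lambda)\|_2 \, \lambda_{to} = -\mu \lambda_n \, \delta v'_{to}(\lambda)$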
Moreau, J., Panagiotopoulos P. (1988): Nonsmooth mechanics and applications, vol 302. Springer, Wien-New York
Popa, C., Preclik, T., & UR (2014). Regularized solution of LCP problems with application to rigid body dynamics. Numerical Algorithms, 1-12.
Preclik, T. & UR (2015). Ultrascale simulations of non-smooth granular dynamics; Computational Particle Mechanics, DOI: 10.1007/s40571-015-0047-6
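The discrete non-penetration condition on this slide can be illustrated for the simplest possible case. The following is only a sketch under stated assumptions: a single frictionless point mass above a rigid plane, with gravity as the only external force. The function name `contact_step` and all parameter names are hypothetical and are not taken from the pe framework.

```python
def contact_step(gap, v, mass, dt, g=9.81):
    """One time step for a point mass above a rigid plane (frictionless).

    Solves  gap/dt + v_new(lam) >= 0  perp  lam >= 0  for the normal
    contact force lam, with v_new(lam) = v - g*dt + dt*lam/mass.
    """
    v_free = v - g * dt               # velocity if no contact force acts
    residual = gap / dt + v_free      # constraint value for lam = 0
    # Either the constraint already holds (lam = 0), or lam is chosen
    # so that the constraint is met with equality.
    lam = max(0.0, -residual * mass / dt)
    v_new = v_free + dt * lam / mass
    gap_new = gap + dt * v_new
    return gap_new, v_new, lam


# Particle close to the plane and moving towards it: contact force is active.
print(contact_step(gap=0.01, v=-1.0, mass=2.0, dt=0.1))
# Particle far away: free flight, contact force is zero.
print(contact_step(gap=1.0, v=0.0, mass=2.0, dt=0.1))
```

With many contacts and friction the conditions couple into a nonlinear complementarity problem, which needs an iterative solver instead of this closed-form case distinction.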
Dense granular channel flow with crystallization
[Figure: pie charts of the time-step profile; slice labels not recoverable]
(a) Time-step profile of the granular gas executed with 5 × 2 × 2 = 20 processes on a single node. Slice shares: 25.9%, 9.5%, 8.0%, 25.8%, 18.1%, 12.6%.
(b) Time-step profile of the granular gas executed with 8 × 8 × 5 = 320 processes on 16 nodes. Slice shares: 16.0%, 5.9%, 22.7%, 30.6%, 16.5%, 8.3%.
Scaling Results
• Solver algorithmically not optimal for dense systems, hence cannot scale unconditionally, but is highly efficient in many cases of practical importance
• Strong and weak scaling results for a constant number of iterations performed on SuperMUC and Juqueen
• Largest ensembles computed: 2.8 × 10^10 non-spherical particles, 1.1 × 10^10 contacts
[Figure: granular gas scaling results. (b) Weak-scaling graph on the Juqueen supercomputer.]
Breakdown of compute times on the Erlangen RRZE cluster Emmy
Building Block III:
Scalable Flow Simulations with the Lattice Boltzmann Method
Succi, S. (2001). The lattice Boltzmann equation: for fluid dynamics and beyond. Oxford University Press.
Feichtinger, C., Donath, S., Köstler, H., Götz, J., & Rüde, U. (2011). WaLBerla: HPC software design for computational engineering simulations. Journal of Computational Science, 2(2), 105-112.
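For readers unfamiliar with the method, a minimal sketch of one lattice Boltzmann time step may help. It assumes a two-dimensional D2Q9 lattice, a single-relaxation-time (BGK) collision and periodic boundaries; none of these choices is stated on the slides, and the code is not taken from waLBerla. The names `equilibrium` and `stream_collide` are illustrative.

```python
import numpy as np

# D2Q9 lattice: discrete velocities and their weights
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho, u):
    """Second-order equilibrium distributions, shape (9, nx, ny)."""
    cu = np.einsum("qd,xyd->qxy", C, u)      # c_q . u
    uu = np.einsum("xyd,xyd->xy", u, u)      # u . u
    return W[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu**2 - 1.5 * uu)


def stream_collide(f, tau=0.8):
    """One BGK collision followed by periodic streaming."""
    rho = f.sum(axis=0)                                   # density
    u = np.einsum("qd,qxy->xyd", C, f) / rho[..., None]   # velocity
    f = f - (f - equilibrium(rho, u)) / tau               # collide
    for q, (cx, cy) in enumerate(C):                      # stream
        f[q] = np.roll(f[q], shift=(int(cx), int(cy)), axis=(0, 1))
    return f


# Usage: a small density bump in a fluid at rest
rho0 = np.ones((16, 16))
rho0[8, 8] = 1.1
f = equilibrium(rho0, np.zeros((16, 16, 2)))
for _ in range(20):
    f = stream_collide(f)
print("total mass:", f.sum())
```

Each cell update touches only its own distributions and those of its nearest neighbours, which is the locality that makes the method attractive for block-wise parallelization.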
Partitioning and Parallelization
static load balancing
allocation of block data (→ grids)
static block-level refinement (→ forest of octrees)
separation of domain partitioning from simulation (optional)
compact (KiB/MiB) binary MPI IO
Parallel AMR load balancing
forest of octrees: octrees are not explicitly stored, but implicitly defined via block IDs
2:1 balanced grid (used for the LBM)
distributed graph: nodes = blocks, edges explicitly stored as <block ID, process rank> pairs
different views on domain partitioning
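One way to define an octree implicitly through block IDs is sketched below. The encoding is an assumption made for illustration (a leading marker bit followed by three bits per refinement level); the slides do not specify the actual bit layout. All function names are hypothetical.

```python
ROOT = 0b1  # marker bit only: the root block of one octree


def child_id(block_id, octant):
    """ID of the child in the given octant (0..7): append three bits."""
    assert 0 <= octant < 8
    return (block_id << 3) | octant


def parent_id(block_id):
    """ID of the parent block: drop the last three bits."""
    assert block_id > ROOT, "the root has no parent"
    return block_id >> 3


def level(block_id):
    """Refinement level, derived from the position of the marker bit."""
    return (block_id.bit_length() - 1) // 3


# The tree is never stored; relations follow from the IDs alone.
block = child_id(child_id(ROOT, 5), 2)
print(bin(block), level(block), bin(parent_id(block)))

# Distributed graph: each block stores its edges as
# <block ID, process rank> pairs (ranks here are made up).
neighbors = {block: [(child_id(child_id(ROOT, 5), 3), 0),
                     (child_id(ROOT, 4), 1)]}
```

Because parent, children and level follow from bit operations on the ID, a process only needs the IDs of its own blocks and of their neighbours, never the full tree.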
AMR and Load Balancing with waLBerla
Isaac, T., Burstedde, C., Wilcox, L. C., & Ghattas, O. (2015). Recursive algorithms for distributed forests of octrees. SIAM Journal on Scientific Computing, 37(5), C497-C531.
Meyerhenke, H., Monien, B., & Sauerwald, T. (2009). A new diffusion-based multilevel algorithm for computing graph partitions. Journal of Parallel and Distributed Computing, 69(9), 750-761.
Schornbaum, F., & Rüde, U. (2016). Massively Parallel Algorithms for the Lattice Boltzmann Method on NonUniform Grids. SIAM Journal on Scientific Computing, 38(2), C96-C126.
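The idea behind diffusion-based load balancing, as in the Meyerhenke et al. reference, can be sketched with a toy model. The assumptions here are strong: load is treated as a continuous quantity, the process graph is a fixed ring, and a plain first-order diffusion scheme is used. A real implementation has to move whole blocks and is not reproduced here. The function name `diffuse` and its parameters are illustrative.

```python
def diffuse(loads, edges, alpha=0.25, iterations=50):
    """First-order diffusion: along every edge, load flows from the
    more loaded process to the less loaded one."""
    loads = list(loads)
    for _ in range(iterations):
        change = [0.0] * len(loads)
        for i, j in edges:
            flow = alpha * (loads[i] - loads[j])  # positive: i sends to j
            change[i] -= flow
            change[j] += flow
        loads = [load + delta for load, delta in zip(loads, change)]
    return loads


# Four processes in a ring; all work starts on process 0.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
balanced = diffuse([8.0, 0.0, 0.0, 0.0], ring)
print(balanced)
```

Every process only needs the loads of its direct neighbours, so the scheme requires no global view of the partitioning.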
AMR Performance
During this refresh process …… all
Performance on Coronary Arteries Geometry
Godenschwager, C., Schornbaum, F., Bauer, M., Köstler, H., & UR (2013). A framework for hybrid parallel flow simulations with a trillion cells in complex geometries. In Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis (p. 35). ACM.
Weak scaling
• 458,752 cores of JUQUEEN
• over a trillion (10^12) fluid lattice cells
• cell sizes 1.27 μm (diameter of red blood cells: 7 μm)
• 2.1 × 10^12 cell updates per second
• 0.41 PFlops
Strong scaling
• 32,768 cores of SuperMUC
• cell sizes of 0.1 mm
• 2.1 million fluid cells
• 6000+ time steps per second
Color coded proc assignment
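The figures on this slide can be combined into a few derived quantities. The short calculation below only rearranges the numbers quoted above; it assumes that the 0.41 PFlops and the 2.1 × 10^12 cell updates per second refer to the same weak scaling run, and it takes "6000+" as exactly 6000 to obtain a lower bound.

```python
# Weak scaling on JUQUEEN (numbers from the slide)
flops_per_second = 0.41e15
updates_per_second = 2.1e12
juqueen_cores = 458_752

flops_per_update = flops_per_second / updates_per_second
updates_per_core = updates_per_second / juqueen_cores

# Strong scaling on SuperMUC (numbers from the slide)
fluid_cells = 2.1e6
supermuc_cores = 32_768
steps_per_second = 6000          # lower bound ("6000+")

cells_per_core = fluid_cells / supermuc_cores
strong_updates_per_second = fluid_cells * steps_per_second

print(f"flops per cell update:      {flops_per_update:.0f}")
print(f"updates per core and s:     {updates_per_core:.2e}")
print(f"fluid cells per core:       {cells_per_core:.0f}")
print(f"strong scaling updates / s: {strong_updates_per_second:.2e}")
```

The strong scaling case leaves only about 64 fluid cells per core, which shows how little work per core remains at 6000 time steps per second.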
Single Node Performance
[Figure: single node performance on SuperMUC and JUQUEEN for the standard, optimized, and vectorized kernels]
Pohl, T., Deserno, F., Thürey, N., UR, Lammers, P., Wellein, G., & Zeiser, T. (2004). Performance evaluation of parallel large-scale lattice Boltzmann applications on three supercomputing architectures. Proceedings of the 2004 ACM/IEEE conference on Supercomputing (p. 21). IEEE Computer Society.
Donath, S., Iglberger, K., Wellein, G., Zeiser, T., Nitsure, A., & UR (2008). Performance comparison of different parallel lattice Boltzmann implementations on multi-core multi-socket systems. International Journal of Computational Science and Engineering, 4(1), 3-11.
Flow through structure of thin crystals (filter)
work with Jose Pedro Galache and Antonio Gil CMT-Motores Termicos, Universitat Politecnica de Valencia
Direct numerical simulation of charged particles in flow
Masilamani, K., Ganguly, S., Feichtinger, C., & UR (2011). Hybrid lattice-Boltzmann and finite-difference simulation of electroosmotic flow in a microchannel. Fluid Dynamics Research, 43(2), 025501.
Bartuschat, D., Ritter, D., & UR (2012). Parallel multigrid for electrokinetic simulation in particle-fluid flows. In High Performance Computing and Simulation (HPCS), 2012 International Conference on (pp. 374-380). IEEE.
Bartuschat, D. & UR (2015). Parallel Multiphysics Simulations of Charged Particles in Microfluidic Flows, Journal of Computational Science, Volume 8, May 2015, Pages 1-19
Positively and negatively charged particles in flow subjected to a transversal electric field
Building Block IV (electrostatics)
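6-way coupling
[Figure: coupling diagram]
Components:
• LBM: treat BCs, stream-collide step
• Newtonian mechanics: collision response
• Finite volumes, MG: iterate (treat BCs, V-cycle)
• Lubrication correction
Exchanged quantities:
• LBM → Newtonian mechanics: hydrodynamic force
• Newtonian mechanics → LBM: object motion (velocity BCs)
• Finite volumes / MG → Newtonian mechanics: electrostatic force
• Newtonian mechanics → Finite volumes / MG: charge distribution
• Newtonian mechanics → Lubrication correction: object distance
• Lubrication correction → Newtonian mechanics: correction force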
Separation experiment
[Figure: LBM and MG performance over the number of nodes (0 to 2000); left axis: 10^3 MFLUPS (LBM), right axis: 10^3 MLUPS (MG)]
[Figure: total runtimes over the number of nodes (1 to 2048), split into LBM, Map, Lubr, HydrF, pe, MG, SetRHS, PtCm, ElectF]
• 240 time steps, fully 6-way coupled simulation
• 400 sec on SuperMUC
• weak scaling up to 32,768 cores
• 7.1 million particles
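A short calculation relates these numbers to the machine description given earlier in the talk (16 cores per SuperMUC node). It assumes that the 400 seconds and the 7.1 million particles both belong to the largest run on 32,768 cores, which the slide suggests but does not state explicitly.

```python
cores = 32_768
cores_per_node = 16          # SuperMUC, from the machine overview
time_steps = 240
runtime_seconds = 400
particles = 7.1e6

nodes = cores // cores_per_node
seconds_per_step = runtime_seconds / time_steps
particles_per_core = particles / cores

print(f"nodes:              {nodes}")
print(f"seconds per step:   {seconds_per_step:.2f}")
print(f"particles per core: {particles_per_core:.0f}")
```

The resulting node count matches the upper end of the node axis in the runtime plot.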
Volume of Fluids Method for Free Surface Flows
joint work with Regina Ammer, Simon Bogner, Martin Bauer, Daniela Anderl, Nils Thürey, Stefan Donath, Thomas Pohl, C. Körner, A. Delgado
Körner, C., Thies, M., Hofmann, T., Thürey, N., & UR (2005). Lattice Boltzmann model for free surface flow for modeling foaming. Journal of Statistical Physics, 121(1-2), 179-196.
Donath, S., Feichtinger, C., Pohl, T., Götz, J., & UR (2010). A Parallel Free Surface Lattice Boltzmann Method for Large-Scale Applications. Parallel Computational Fluid Dynamics: Recent Advances and Future Directions, 318.
Anderl, D., Bauer, M., Rauh, C., UR, & Delgado, A. (2014). Numerical simulation of adsorption and bubble interaction in protein foams using a lattice Boltzmann method. Food & Function, 5(4), 755-763.
Building Block V
Free Surface Flows
• Volume-of-Fluids like approach
• Flag field: Compute only in fluid
• Special “free surface” conditions in interface cells
• Reconstruction of curvature for surface tension
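A flag field of this kind can be sketched with a small array. The three cell states and the helper `interface_is_closed` are illustrative names, not waLBerla identifiers. The check encodes the usual requirement of such schemes that a layer of interface cells separates fluid from gas, so that no fluid cell touches a gas cell directly.

```python
import numpy as np

GAS, INTERFACE, FLUID = 0, 1, 2

# Small 2D example: gas on top, one interface layer, fluid below
flags = np.full((4, 6), GAS)
flags[1, :] = INTERFACE
flags[2:, :] = FLUID

# Only non-gas cells take part in the LBM update
compute_mask = flags != GAS
# Interface cells additionally get the free surface treatment
interface_mask = flags == INTERFACE


def interface_is_closed(flags):
    """True if no fluid cell is a direct (axis-aligned) neighbour of gas."""
    fluid = flags == FLUID
    gas = flags == GAS
    horizontal = (fluid[:, :-1] & gas[:, 1:]) | (gas[:, :-1] & fluid[:, 1:])
    vertical = (fluid[:-1, :] & gas[1:, :]) | (gas[:-1, :] & fluid[1:, :])
    return not (horizontal.any() or vertical.any())


print(compute_mask.sum(), "of", flags.size, "cells are computed")
print("closed interface:", interface_is_closed(flags))
```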
Free Surface Bubble Model
Data of a Bubble:
• Initial Volume (Density = 1)
• Current Volume
• Density/Pressure = initial volume / current volume
Update Management
• Each process logs change of volume due to cell conversions (Interface – Gas / Gas – Interface) and mass variations in Interface cells
• All volume changes are added to the volume of the bubble at the end of the timestep (which also has to be communicated)
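The bookkeeping described on this slide can be sketched as follows. The class `Bubble` and the function `end_of_timestep` are hypothetical names, and the plain Python `sum` stands in for the global reduction (for example an MPI all-reduce) that a distributed implementation would need; this is not the waLBerla code.

```python
class Bubble:
    """Bubble data: initial and current volume; pressure follows from both."""

    def __init__(self, initial_volume):
        self.initial_volume = initial_volume   # corresponds to density 1
        self.current_volume = initial_volume

    @property
    def pressure(self):
        # Density/Pressure = initial volume / current volume
        return self.initial_volume / self.current_volume


def end_of_timestep(bubble, logged_changes):
    """Apply the volume changes logged by all processes during one step.

    logged_changes holds one number per process; summing them here is a
    stand-in for the communication step of a parallel run.
    """
    bubble.current_volume += sum(logged_changes)


# Three processes log their local volume changes for the same bubble.
bubble = Bubble(initial_volume=100.0)
end_of_timestep(bubble, [-5.0, -15.0, 0.0])
print(bubble.current_volume, bubble.pressure)
```

Deferring the update to the end of the time step means every process sees the same bubble pressure throughout a step, at the cost of one reduction per step.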
Simulation for hygiene products (for Procter & Gamble)
capillary pressure, inclination
surface tension, contact angle
Additive Manufacturing
Fast Electron Beam Melting
Bikas, H., Stavropoulos, P., & Chryssolouris, G. (2015). Additive manufacturing methods and modelling approaches: a critical review. The International Journal of Advanced Manufacturing Technology, 1-17.
Klassen, A., Scharowsky, T., & Körner, C. (2014). Evaporation model for beam based additive manufacturing using free surface lattice Boltzmann methods. Journal of Physics D: Applied Physics, 47(27), 275303.
Electron Beam Melting Process
3D printing
EU-Project Fast-EBM
• ARCAM (Gothenburg)
• TWI (Cambridge)
• FAU Erlangen
Generation of powder bed
Energy transfer by electron beam
• penetration depth
• heat transfer
Flow dynamics
• melting
• melt flow
• surface tension
• wetting
• capillary forces
• contact angles
• solidification
Ammer, R., Markl, M., Ljungblad, U., Körner, C., & UR (2014). Simulating fast electron beam melting with a parallel thermal free surface lattice Boltzmann method. Computers & Mathematics with Applications, 67(2), 318-330.
Ammer, R., UR, Markl, M., Jüchter V., & Körner, C. (2014). Validation experiments for LBM simulations of electron beam melting. International Journal of Modern Physics C.
Simulation of Electron Beam Melting
Simulating powder bed generation using the PE framework
High speed camera shows melting step for manufacturing a hollow cylinder
WaLBerla Simulation
Conclusions
CSE research is done by teams
Harald Köstler
Christian Godenschwager
Kristina Pickl
Regina Ammer
Simon Bogner
Florian Schornbaum
Sebastian Kuckuk
Christoph Rettinger
Dominik Bartuschat
Martin Bauer
Thank you for your attention!
Videos, preprints, slides at https://www10.informatik.uni-erlangen.de