Timestepping and Parallel Computing in Highly Dynamic N-body Systems


Page 1: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Joachim Stadel
[email protected]
University of Zürich
Institute for Theoretical Physics

Page 2: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Astrophysical N-body Simulations

[Figure: particle number N, from ~10¹ up to ~10¹⁷, required by the different applications (LSS surveys, galaxy formation, solar system formation, SS-stability) and physics regimes (Gravity, Hydro, Collisions, Near Integrable).]

Page 3: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Outline

- Collisionless Simulations and Resolution
- Parallel Computers
- Tree Codes – Tree Codes on Parallel Computers
- PKDGRAV (and Gasoline)
- Applications – Various Movies
- Warm Dark Matter
- Multistepping Part 1
- New Parallelization Problems
- Multistepping Part 2
- Initial Conditions – Shells
- Black Hole "Mergers"
- Fast Multipole Method
- PKDGRAV2
- Cosmo Initial Conditions
- GHALO Simulation
- GHALO Prelim. Results: Density Profile, Phase-Space Density, Subhalos & Reionization
- What next?

Page 4: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

WMAP Satellite 2003

Fluctuations in the Microwave Background Radiation

The initial conditions for structure formation.

The Universe is completely smooth to one part in 1,000 at z=1000.

Page 5: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Greenbank radio galaxy survey (1990) 31,000 galaxies

At z=0 and on the very largest scales the distribution of galaxies is in fact homogeneous.

Page 6: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

On 'smaller' scales: redshift surveys

Page 7: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Numerical Simulation

From the microwave background fluctuations to the present-day structure seen in galaxy redshift surveys.

Page 8: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

N-body simulations as models of stellar systems

ẍᵢ = −∑_{j≠i}^N ∇Φ(xᵢ, xⱼ)

Typically N_simulation << N_real, so the equation above is NOT the one we should be solving. What we should solve is the Collisionless Boltzmann Equation (CBE):

∂ƒ/∂t + [ƒ,H] = 0;  ∫ƒ dz = 1

The CBE is a 1st-order non-linear PDE. Such equations can be solved by the method of characteristics. The characteristics are the paths along which information propagates; for the CBE they are defined by:

dx/dt = v;  dv/dt = −∇Φ

But these are the equations of motion we had above! ƒ is constant along the characteristics, thus each particle carries a piece of ƒ along its trajectory.

Page 9: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

The only difficulty is in evaluating Φ

In terms of the distribution function,

Φ(x) = −GM ∫dz′ ƒ(z′)/|x−x′|

Monte Carlo: for any reasonable function g(z),

∫dz g(z) = lim_{N→∞} (1/N) ∑_{i=1}^N g(zᵢ)/ƒₛ(zᵢ)

where the zᵢ are randomly chosen with sampling probability density ƒₛ. Apply this to the Poisson integral:

Φ(x) ≈ −(GM/N) ∑_{i=1}^N [ƒ(zᵢ)/ƒₛ(zᵢ)] · 1/|x−xᵢ|

In a conventional N-body simulation ƒₛ(z) = ƒ(z), so the particle density represents the underlying phase-space density.
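A minimal sketch of this Monte Carlo estimator, assuming a uniform-density unit sphere (where the central potential Φ(0) = −3GM/2R is known analytically) and conventional sampling with ƒₛ = ƒ; the seed and particle count are illustrative:

```python
# Monte Carlo estimate Phi(x) ~ -(GM/N) * sum_i 1/|x - x_i|,
# with positions sampled directly from the mass distribution (f_s = f).
import numpy as np

rng = np.random.default_rng(42)
G, M, R, N = 1.0, 1.0, 1.0, 100_000

# Sample positions uniformly inside the sphere: r = R * u^(1/3).
u = rng.random(N)
r = R * u ** (1.0 / 3.0)
costh = rng.uniform(-1.0, 1.0, N)
ang = rng.uniform(0.0, 2.0 * np.pi, N)
sinth = np.sqrt(1.0 - costh**2)
pos = (r * np.array([sinth * np.cos(ang),
                     sinth * np.sin(ang),
                     costh])).T

x = np.zeros(3)                        # field point: the center
d = np.linalg.norm(pos - x, axis=1)
phi_mc = -(G * M / N) * np.sum(1.0 / d)

print(phi_mc, -1.5 * G * M / R)        # estimate vs exact -1.5
```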

Page 10: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Softening

The singularity at x = x′ in the Poisson integral causes very large scatter in the estimation of Φ. This results in a fluctuation in the potential, δΦ, which has 2 effects.

1. Change in the particle's energy along its orbit:

dE/dt = ẋᵢ ∂E/∂xᵢ + v̇ᵢ ∂E/∂vᵢ + ∂E/∂t = ∂Φ/∂t

Fluctuations in Φ due to discrete sampling will cause a random walk in energy for the particle: this is two-body relaxation.

2. Mass segregation: if more and less massive particles are present, the less massive ones will typically recoil from an encounter with more velocity than a massive particle.

Softening, either explicitly introduced or as part of the numerical method, lessens these effects.
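A sketch of the common Plummer form of explicit softening, which replaces 1/r² by r/(r² + ε²)^(3/2) and so removes the singularity that drives the relaxation; the helper name and the choice of ε are illustrative, not PKDGRAV's:

```python
# Direct-sum accelerations with Plummer softening (illustrative sketch).
import numpy as np

def softened_accel(pos, mass, eps, G=1.0):
    """a_i = sum_j G m_j (x_j - x_i) / (|x_j - x_i|^2 + eps^2)^(3/2)."""
    acc = np.zeros_like(pos)
    for i in range(len(pos)):
        dx = pos - pos[i]                           # vectors to all others
        r2 = np.einsum("ij,ij->i", dx, dx) + eps**2
        w = G * mass / r2**1.5                      # softened 1/r^3 weight
        w[i] = 0.0                                  # exclude the self force
        acc[i] = w @ dx
    return acc
```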

Page 11: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

All N-body simulations of the CBE suffer from 2-body relaxation!

This is even more important for cosmological simulations where all structures formed from smaller initial objects.

All particles experienced a relatively large degree of relaxation in the past.

Diemand and Moore 2002

Page 12: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Increasing Resolution

- Cluster resolved: 67,500 particles
- Galaxy halos resolved: 1,300,000 particles
- Dwarf galaxy halos resolved: 10,500,000 particles

Page 13: Timestepping and Parallel Computing in Highly Dynamic N-body Systems
Page 14: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

zBox (Stadel & Moore, 2002): 288 AMD MP2200+ processors, 144 GB RAM, 10 TB disk.

Compact, easy to cool and maintain.

Very fast Dolphin/SCI interconnect: 4 Gbit/s with microsecond latency. A teraflop computer for $500,000 ($250,000 with MBit).

Roughly one cubic meter, one ton, and requires 40 kilowatts of power.

Page 15: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Parallel supercomputing

Page 16: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

500 CPUs / 640 GB RAM, ~100 TB of disk

A parallel computer is currently still mostly wiring. The human brain (Garry Kasparov) is no exception.

However, wireless CPUs are now under development, which will revolutionize parallel computer construction.

Page 17: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Spatial Binary Tree

k-D Tree: a spatial binary tree with squeeze (node bounds shrunk to the tight bounding box of the particles they contain).
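A minimal sketch of such a tree build, illustrative only (the bucket size, class and function names are assumptions, not PKDGRAV's implementation):

```python
# Build a spatial binary k-D tree with "squeeze": each node stores the
# tight bounding box of its particles, and splits its longest axis at
# the particle median until a bucket size is reached.
import numpy as np

class Node:
    def __init__(self, idx, lo, hi):
        self.idx, self.lo, self.hi = idx, lo, hi   # particle ids, bounds
        self.left = self.right = None

def build(pos, idx, bucket=16):
    lo, hi = pos[idx].min(axis=0), pos[idx].max(axis=0)   # the "squeeze"
    node = Node(idx, lo, hi)
    if len(idx) > bucket:
        dim = np.argmax(hi - lo)                   # split the longest axis
        order = idx[np.argsort(pos[idx, dim])]
        half = len(order) // 2
        node.left = build(pos, order[:half], bucket)
        node.right = build(pos, order[half:], bucket)
    return node

pos = np.random.default_rng(1).random((1000, 3))
root = build(pos, np.arange(len(pos)))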

Page 18: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Forces are calculated using 4th-order multipoles.

The Ewald summation technique is used to introduce periodic boundary conditions (also based on a 4th-order expansion).

Work is tracked and fed back into the domain decomposition.
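To show the structure of such a force calculation, here is a monopole-only, Barnes-Hut-style walk reusing the Node/build sketch above; PKDGRAV itself carries the expansion to 4th order and adds the Ewald periodic corrections, both omitted here:

```python
# Tree-walk sketch: accept a distant cell as a single monopole (center
# of mass) when size/r < theta, otherwise open it; leaves are summed
# directly with Plummer softening.
import numpy as np

def center_of_mass(node, pos, mass):
    """Cache total mass and center of mass on every node."""
    if node.left:
        center_of_mass(node.left, pos, mass)
        center_of_mass(node.right, pos, mass)
    node.m = mass[node.idx].sum()
    node.com = np.average(pos[node.idx], axis=0, weights=mass[node.idx])

def accel(node, x, pos, mass, theta=0.7, eps=1e-2, G=1.0):
    dx = node.com - x
    r = np.sqrt(dx @ dx)
    size = np.max(node.hi - node.lo)
    if node.left is not None and size >= theta * r:    # too close: open
        return (accel(node.left, x, pos, mass, theta, eps, G) +
                accel(node.right, x, pos, mass, theta, eps, G))
    if node.left is None:                 # leaf: direct softened sum
        d = pos[node.idx] - x             # (self term contributes zero)
        r2 = np.einsum("ij,ij->i", d, d) + eps**2
        return G * np.sum((mass[node.idx] / r2**1.5)[:, None] * d, axis=0)
    return G * node.m * dx / r**3         # distant cell: monopole only
```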

Page 19: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Compute time vs. Accuracy

Page 20: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Parallelizing Gravity (PKDGRAV)

Spatial locality = computational locality (1/r²). This means it is beneficial to divide space in order to achieve load balance; it also minimizes communication with other processors.

But we must add a constraint on the number of particles per processor: memory is limited!

Domain decomposition is a global optimization of these requirements, which is solved dynamically with every step.

Example division of space for 8 processors.
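One strategy consistent with this slide is orthogonal recursive bisection over measured work; a hedged sketch (the work weights would come from the per-particle timings fed back from the previous step, and all names here are illustrative):

```python
# ORB domain decomposition sketch: recursively split space so that each
# side carries half the total work, until one domain per processor.
import numpy as np

def orb(pos, work, idx, nproc):
    """Return a list of index arrays, one domain per processor."""
    if nproc == 1:
        return [idx]
    dim = np.argmax(pos[idx].max(axis=0) - pos[idx].min(axis=0))
    order = idx[np.argsort(pos[idx, dim])]
    # cut where the cumulative work reaches half of the total
    cut = np.searchsorted(np.cumsum(work[order]), work[order].sum() / 2)
    return (orb(pos, work, order[:cut], nproc // 2) +
            orb(pos, work, order[cut:], nproc - nproc // 2))

rng = np.random.default_rng(0)
pos = rng.random((10_000, 3))
work = np.ones(len(pos))          # e.g. interaction counts per particle
domains = orb(pos, work, np.arange(len(pos)), 8)
```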

Page 21: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Other decomposition strategies...

Page 22: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

How are non-local parts of the tree walked by PKDGRAV?

CPU i and CPU j exchange cells via low-latency message passing, with a local cache of remote data elements on each processor.

PKDGRAV does not attempt to determine in advance which data elements are going to be required in a step (as a Locally Essential Tree, LET, approach would).

The hit rate in the cache is very good with as little as 10 MB.
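A sketch of the idea of such a software cache; all names are illustrative (not PKDGRAV's API), and the remote fetch is faked with a callback standing in for the message-passing layer:

```python
# Read-only software cache for remote tree cells: on a miss, "message"
# the owning processor for the cell and keep the reply, so repeated
# walks over the same remote region hit locally.
class CellCache:
    def __init__(self, fetch, capacity=100_000):
        self.fetch = fetch            # fetch(cpu, cell_id) -> cell data
        self.capacity = capacity
        self.store = {}
        self.hits = self.misses = 0

    def get(self, cpu, cell_id):
        key = (cpu, cell_id)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        if len(self.store) >= self.capacity:
            self.store.pop(next(iter(self.store)))   # crude eviction
        self.store[key] = self.fetch(cpu, cell_id)
        return self.store[key]
```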

Page 23: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

PKDGRAV Scaling

On the T3E it was possible to obtain 80% of linear scaling on 512 processors.

PKDGRAV: Joachim Stadel, Thomas Quinn

Page 24: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

GASOLINE: Wadsley, Stadel & Quinn, NewA 2003

A fairly standard SPH formulation is used in GASOLINE (Evrard 88; Benz 89; Hernquist & Katz 89; Monaghan 92).

SPH is very well matched to a particle-based gravity code like PKDGRAV, since all the core data structures and many of the same algorithms can be used. For example, the neighbor searching can simply use the parallel distributed tree structure.

Page 25: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Algorithms within GASOLINE

We perform 2 nearest-neighbor (NN) operations:
1. Find the 32 NN and calculate densities.
2. Calculate forces in a second pass.

For active particles we do a gather on the k-NN, and a scatter from the k-inverse-NN. We never store the nearest neighbors. (Springel 2001 is similar.)

Cooling, heating and ionization are quite efficient.
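A sketch of the first pass (the density gather over 32 neighbors); a real code walks the distributed gravity tree for the neighbor search, but scipy's k-d tree is used here for brevity, and the kernel choice is an assumption:

```python
# SPH density pass: gather the 32 nearest neighbors of each particle
# and sum a cubic-spline (M4) kernel over them.
import numpy as np
from scipy.spatial import cKDTree

def m4_kernel(r, h):
    """Standard cubic-spline kernel in 3D, support 2h."""
    q = r / h
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
         np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return w / (np.pi * h**3)

def densities(pos, mass, k=32):
    tree = cKDTree(pos)
    d, j = tree.query(pos, k=k)      # distances and neighbor ids, (N, k)
    h = d[:, -1] / 2.0               # smoothing length from the k-th NN
    return np.sum(mass[j] * m4_kernel(d, h[:, None]), axis=1)
```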

Page 26: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

The Large Magellanic Cloud (LMC) in gas and stars

Chiara Mastropietro (University of Zürich)

With a fully dynamical Milky Way halo (dark matter, hot gas, stellar disk and bulge), which is not shown here. Both tidal and ram-pressure stripping of the gas is taking place.

Page 27: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Collisional Physics

Derek C. Richardson

Gravity with hard spheres including surface friction, coefficient of restitution and aggregates; the Euler equations for solid bodies.

Page 28: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Asteroid Collisions

Page 29: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Part of an asteroid disk, where the outcomes of the asteroid impact simulations are included.

Page 30: Timestepping and Parallel Computing in Highly Dynamic N-body Systems
Page 31: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Movies of 1000 years of evolution.

Page 32: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

The power spectrum of density fluctuations in three different dark matter models

[Figure: P(k) from the horizon scale and the CMB down to large scales (galaxy clusters) and small scales (dwarf galaxies).]

Page 33: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

40 Mpc, N = 10⁷ (Andrea Maccio et al.)

CDM, T = GeV

Page 34: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

40 Mpc, N = 10⁷ (Andrea Maccio et al.)

WDM, T = 2 keV

Page 35: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

40 Mpc, N = 10⁷ (Andrea Maccio et al.)

WDM, T = 0.5 keV

Page 36: Timestepping and Parallel Computing in Highly Dynamic N-body Systems
Page 37: Timestepping and Parallel Computing in Highly Dynamic N-body Systems
Page 38: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

CDM: ~500 satellites. 1 keV WDM: ~10 satellites.

Very strong constraint on the lowest-mass WDM candidate: we need to form at least one Draco-sized substructure halo.

Halo density profiles are unchanged; Liouville's constraint gives cores ≲ 50 pc.

Page 39: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

CDM: n(M) = M⁻². WDM: n(M) = M⁻¹. Data: n(L) = L⁻¹.

Page 40: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

With fixed timesteps these codes all scale very well.

However, this is no longer the only measure, since the scaling of a very "deep" multistepping run can be a lot worse.

How do we do multistepping now, and why does it have problems?

Page 41: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Drift-Kick-Drift (DKD) Multistepping Leapfrog

[Diagram: rungs 0, 1 and 2 along the time axis, each with its own Drift and Kick tick marks and Select operations between them.]

Note that none of the Kick tick marks align, meaning that gravity is calculated for a single rung at a time, despite the fact that the tree is built for all particles.

The Select operators are performed top-down until all particles end up on appropriate timestep rungs: 0: DSKD, 1: DS(DSKDDSKD)D, 2: DS(DS(DSKD...
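A sketch of the recursive structure behind this nested DSKD pattern; drift, kick and select are assumed callbacks acting on one rung's particles, and this is illustrative rather than PKDGRAV's exact operator ordering:

```python
# Recursive DKD multistepping sketch: each rung drifts a half step,
# selects (demotes) particles that need a smaller step, recurses into
# the finer rung on both halves of its interval, kicks, and closes
# with the second half drift.
def dkd_step(rung, dt, max_rung, drift, kick, select):
    drift(rung, dt / 2)           # D: first half drift of this rung
    select(rung)                  # S: move too-fast particles down a rung
    if rung < max_rung:
        dkd_step(rung + 1, dt / 2, max_rung, drift, kick, select)
        kick(rung, dt)            # K: full kick for this rung's particles
        dkd_step(rung + 1, dt / 2, max_rung, drift, kick, select)
    else:
        kick(rung, dt)
    drift(rung, dt / 2)           # D: second half drift
```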

Page 42: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Kick-Drift-Kick (KDK) Multistepping Leapfrog

This method is more efficient since it performs half the number of tree build operations.

It also exhibits somewhat lower errors than the standard DKD integrator.

It is the only scheme used in production at present.
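For reference, a minimal single-rung KDK step; accel() stands in for the (tree) gravity evaluation and is an assumption of the sketch:

```python
# KDK leapfrog: half kick, full drift, half kick. Only one force
# evaluation (and hence one tree) is needed per step, since the
# closing kick of one step reuses the opening kick of the next.
import numpy as np

def kdk_step(pos, vel, dt, accel):
    vel += 0.5 * dt * accel(pos)      # K: opening half kick
    pos += dt * vel                   # D: full drift
    vel += 0.5 * dt * accel(pos)      # K: closing half kick
    return pos, vel
```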

Page 43: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Choice of Timestep

We want a criterion which commutes with the Kick operator and is Galilean invariant, so it should not depend on velocities.

Δt = η/√(Gρ), η ≈ 0.03 (local)

Δt = η/√(Gρ_enc) (local, based on the enclosed density)

Δt ~ η√(ε/a), η ~ 0.2 (non-local, based on the maximum acceleration in moderate densities)

We can take the minimum of any or all of these criteria.
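A sketch of turning these criteria into power-of-two rungs, Δt_rung = Δt₀/2^rung; the η values follow the slide, while the function name and the per-particle arrays (ρ, ρ_enc, a from the force calculation) are assumptions:

```python
# Take the minimum over the velocity-free timestep criteria and map it
# to the block (power-of-two) timestep hierarchy.
import numpy as np

def choose_rung(rho, rho_enc, acc, eps, dt0,
                G=1.0, eta_rho=0.03, eta_acc=0.2):
    dt = np.minimum.reduce([
        eta_rho / np.sqrt(G * rho),        # local dynamical time
        eta_rho / np.sqrt(G * rho_enc),    # enclosed-density time
        eta_acc * np.sqrt(eps / acc),      # acceleration criterion
    ])
    rung = np.maximum(0, np.ceil(np.log2(dt0 / dt))).astype(int)
    return rung                            # particle steps dt0 / 2**rung
```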

Page 44: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Multistepping: the real parallel computing challenge.

- Δt ~ 1/√(Gρ); the range of timescales is even more dramatic in SPH.
- This implies N_active << N.
- A global approach to load balancing fails.
- Less compute per unit of communication.
- Too many synchronization points between all processors.

We want all algorithms of the simulation code to scale as O(N_active log N)! Everything that isn't introduces a fixed cost which limits the speed-up attainable from multistepping.

Page 45: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

The Trends

Parallel computers are getting ever more independent computing elements, e.g. BlueGene (100,000s of processors) and multicore CPUs.

Our simulations are always increasing in resolution, and hence we need many more timesteps than were required in the past.

Multistepping methods have ever more potential to speed up calculations, but introduce new complexities into codes, particularly for large parallel machines.

Page 46: Timestepping and Parallel Computing in Highly Dynamic N-body Systems

What can be done?

- Tree repair instead of rebuild.
- Don't drift all particles; only drift terms that appear on the interaction list!
- Do smart updates of local cache information instead of flushing at each timestep.
- Use some local form of achieving load balance, perhaps scheduling? Remote walks?
- Allow different parts of the simulation to get somewhat out of sync?
- Use O(N²) for very active regions.
- Hybrid methods: Block+Symba.