Astronomy toolkits and data structures
Adrian Jenkins
Durham University
Data requirements of cosmological simulations
Adrian Jenkins
Durham University
Talk outline
• DiRAC and its major users
• New astronomical instruments and missions
• Mock catalogues
• Millennium simulation and database
• Future directions for simulations
DiRAC 2 facility
• Cambridge HPC Service: data analytic cluster
• Cambridge COSMOS shared memory service
• Durham ICC Service: data centric cluster (6720-core iDataPlex)
• Edinburgh: 6144-node Blue Gene/Q
• Leicester IT services: complexity cluster
DiRAC2 facility used by
Time allocated by RAC. Supports large projects (up to 3 years), and smaller allocations.
• Large users: UKQCD, Virgo Consortium (UK), UKMHD, Horizon, Leicester …
JWST
Launch date: ~2017-18
Cost >$5 billion
EUCLID
Launch date: ~2019
Cost ~€500 million
Future large surveys
• Photometric, e.g. Pan-STARRS, DES, LSST, Euclid-VIS
• Spectroscopic, e.g. BOSS, BigBOSS, Euclid-NIS
• Multi-wavelength, e.g. SKA (HI)
Wide-field (>10,000 sq deg), wide redshift range (z=0-3)
Redshift surveys: 10-50 million galaxies
Imaging surveys: ~billions of galaxies
Why build a mock?
• Test galaxy formation models
• Test algorithms - validation
• Test processing pipelines
• Assess survey performance (FoM)
Large surveys need mocks now!
Mock catalogues need observables
• SFR, SFH, stellar mass, cold gas mass, black hole mass
• Images, full SED (UV, optical, FIR, radio)
• Galaxies: stars, gas, AGN
Euclid OU-LE3 requirements for simulations
CSWG OU-SIM
Cosmological simulators
Instrument simulators
Generic needs from Euclid
• Position, redshift
• Emission line properties/spectra: line flux, equivalent width
• Broad-band photometry to AB ~ 24-24.5:
  Euclid NIR
  Euclid VIS
  Pan-STARRS griz
  DES grizy
  CFHTLS ugriyz
  WFCAM ZYJHK
  SDSS ugriz
  VISTA-VHS-VIDEO ZYJHKs
• Photometric redshifts
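The photometric depth quoted above is in AB magnitudes, which are tied to flux density by the standard zero point of 3631 Jy. The following is a minimal sketch of that conversion, for example to apply a depth cut to mock photometry. The function names and the 24.5 default are illustrative choices, not part of any Euclid pipeline.

```python
import math

# Standard AB zero point: a source with f_nu = 3631 Jy has m_AB = 0.
AB_ZERO_POINT_JY = 3631.0


def ab_magnitude(flux_jy):
    """Convert a flux density in Jansky to an AB magnitude."""
    return -2.5 * math.log10(flux_jy / AB_ZERO_POINT_JY)


def flux_from_ab(mag):
    """Convert an AB magnitude to a flux density in Jansky."""
    return AB_ZERO_POINT_JY * 10.0 ** (-0.4 * mag)


def within_depth(mag, limit=24.5):
    """True if a mock galaxy is at least as bright as the survey limit.

    The default limit of 24.5 is taken from the AB ~ 24-24.5 range above.
    """
    return mag <= limit


# Flux density corresponding to the faint limit, in microjansky.
limit_flux_ujy = flux_from_ab(24.5) * 1e6
```

At AB = 24.5 the limiting flux density comes out at a little under 0.6 microjansky, which gives a feel for how faint the mock galaxies need to reach.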
Specific needs: clustering
• 1% P(k) accuracy
• Covariance estimates: P(k) etc
• Initial conditions for reconstruction
• Different cosmologies
• Different galaxy formation models (vary bias)
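One common way to obtain the covariance estimates listed above is to measure P(k) in many independent mocks and take the sample covariance across them. The sketch below illustrates that idea with the standard library only. The input "mocks" are synthetic random numbers generated around hypothetical band powers with an assumed 5% scatter; they are placeholders, not simulation output.

```python
import random


def pk_covariance(pk_mocks):
    """Sample mean and unbiased covariance of P(k) over a set of mocks.

    pk_mocks is a list of equal-length lists, one per mock, holding the
    measured band powers in each k bin.
    """
    n_mocks = len(pk_mocks)
    n_bins = len(pk_mocks[0])
    mean = [sum(m[i] for m in pk_mocks) / n_mocks for i in range(n_bins)]
    cov = [
        [
            sum((m[i] - mean[i]) * (m[j] - mean[j]) for m in pk_mocks)
            / (n_mocks - 1)
            for j in range(n_bins)
        ]
        for i in range(n_bins)
    ]
    return mean, cov


# Hypothetical band powers (arbitrary units) and an assumed 5% scatter.
rng = random.Random(0)
true_pk = [2.0e4, 1.5e4, 8.0e3, 3.0e3, 1.0e3]
mocks = [[p * (1.0 + 0.05 * rng.gauss(0.0, 1.0)) for p in true_pk]
         for _ in range(200)]

mean, cov = pk_covariance(mocks)
# Fractional 1-sigma error per k bin for a single mock.
frac_err = [cov[i][i] ** 0.5 / mean[i] for i in range(len(mean))]
```

The number of mocks must comfortably exceed the number of k bins for the estimated matrix to be usable, which is one reason large sets of simulations are needed.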
Specific needs – clusters of galaxies
• DM haloes with M > 10^13 Msun: radii r(Δ) for Δ = 2500, 500, 200; velocity dispersion along axes from DM particles
• For each galaxy: host halo ID, and whether it is a central or a satellite
• Simulated images for cluster detection and mass determination through weak/strong lensing
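The per-galaxy requirement above (a host halo ID plus a central/satellite flag) can be sketched as a simple record. The field names below are hypothetical and chosen for illustration; they do not reflect the schema of any existing mock catalogue.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MockGalaxy:
    """Minimal mock-galaxy record (hypothetical field names)."""
    galaxy_id: int
    host_halo_id: int       # ID of the DM halo hosting the galaxy
    is_central: bool        # True for the central, False for a satellite
    stellar_mass_msun: float


def group_by_halo(galaxies):
    """Map each host halo ID to the list of galaxies it contains."""
    haloes = defaultdict(list)
    for galaxy in galaxies:
        haloes[galaxy.host_halo_id].append(galaxy)
    return dict(haloes)


def satellite_counts(galaxies):
    """Number of satellite galaxies in each host halo."""
    return {
        halo_id: sum(not g.is_central for g in members)
        for halo_id, members in group_by_halo(galaxies).items()
    }


# Made-up example values, purely for illustration.
galaxies = [
    MockGalaxy(1, 100, True, 5e11),
    MockGalaxy(2, 100, False, 3e10),
    MockGalaxy(3, 100, False, 1e10),
    MockGalaxy(4, 200, True, 2e11),
]
counts = satellite_counts(galaxies)
```

Storing the halo ID on each galaxy keeps the catalogue flat while still allowing cluster membership to be rebuilt with a single pass.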
Specific needs: weak lensing
• Galaxies and DM to generate kappa map
• Galaxy shapes with noise (no IA)
• Galaxy shapes with IA
• Shear at each galaxy position
• Image properties: mask, bright stars, chip boundaries, CCD defects, ghosts, variations in depth & background
Infrastructure required to make mocks
• Require large simulations
• To date these have been simulations of dark matter in large cosmological volumes.
Input simulations
• Large N-body simulations
• Approaching a trillion particles
MXXL simulation
Future needs
Simulations for Euclid: multi-trillion particle simulations
Produce multi-petabyte datasets
Data growing faster than network capabilities
Need to scale databases up
Ideally would like to serve the raw simulation data - two or more orders of magnitude larger.
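The scale of the raw data can be illustrated with some back-of-the-envelope arithmetic. The sketch below assumes each particle stores single-precision position and velocity plus a 64-bit ID (32 bytes in total), that a run writes 64 snapshots, and that data moves over a 10 Gbit/s link. All three figures are assumptions made for illustration, not values from the talk.

```python
# Assumed storage per particle: 3 x 4-byte position, 3 x 4-byte velocity,
# and one 8-byte particle ID.
BYTES_PER_PARTICLE = 3 * 4 + 3 * 4 + 8

TB = 10**12
PB = 10**15


def snapshot_bytes(n_particles, bytes_per_particle=BYTES_PER_PARTICLE):
    """Raw size of one particle snapshot in bytes."""
    return n_particles * bytes_per_particle


def run_bytes(n_particles, n_snapshots):
    """Raw size of all snapshots written by one simulation."""
    return snapshot_bytes(n_particles) * n_snapshots


def transfer_seconds(n_bytes, gbit_per_s):
    """Time to move n_bytes over a link of the given speed, ignoring overheads."""
    return n_bytes * 8 / (gbit_per_s * 1e9)


one_trillion = 10**12
snap_tb = snapshot_bytes(one_trillion) / TB       # one snapshot, in TB
run_pb = run_bytes(one_trillion, 64) / PB         # 64 snapshots (assumed), in PB
hours_per_snapshot = transfer_seconds(snapshot_bytes(one_trillion), 10) / 3600
```

Under these assumptions a single trillion-particle snapshot is 32 TB, a 64-snapshot run is about 2 PB, and moving one snapshot over a 10 Gbit/s link takes roughly seven hours, which is consistent with the multi-petabyte scale quoted above.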
Current and future simulations
Summary
• Cosmological simulations are required to make the best use of observatories and space missions
• The size of the required simulations makes this a Big data problem
• Databases have proved a very successful way of presenting processed data
• Making the raw simulation data public is desirable, but very challenging given financial constraints.