The Big Bang of astronomical data. How to use Python to survive the data flood.
Jose Sabater Montes
Institute for Astronomy, University of Edinburgh
P. Best, W. Williams, R. van Weeren, S. Sanchez, J. Garrido, J. E. Ruiz, L. Verdes-Montenegro and the LOFAR surveys team
-
2016-10-09 PyConES 2016 Almería
The new astronomy
ALMA correlator
-
Astronomy and Python
● Python is currently the main language used for astronomy
● General Python computing libraries: numpy, scipy, matplotlib, pandas, emcee...
● Specific astronomical libraries (see http://www.astropython.org/packages/):
– astroML: machine learning and data mining
– astropy: main general library for astronomy
– etc.
-
The future of astronomy
● New state-of-the-art astronomical infrastructures that produce an overwhelming amount of data
● Examples:
– ESA Gaia
– Large Synoptic Survey Telescope
– ESA Euclid
– The Square Kilometre Array and its pathfinders (LOFAR, ASKAP, MeerKAT...)
– etc.
-
Large Synoptic Survey Telescope
● 8.4 m mirror
● Covers the full visible sky every two nights
● Under construction, operational in 2022
-
Large Synoptic Survey Telescope
● Camera: 189 x 16 Mpix
● Pipeline preprocessing: 3 GB/s
● 30 TB per night for 10 years
● 2 M events triggered per night
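To get a feel for these numbers, here is a rough back-of-the-envelope sketch. It reads "189 x 16 Mpix" as 189 sensors of 16 Mpix each and assumes, as an upper bound that is not stated on the slide, that data are taken every night of the year.

```python
# Back-of-the-envelope LSST figures, using the numbers quoted on the slide.
TB_PER_NIGHT = 30        # from the slide
SURVEY_YEARS = 10        # from the slide
SENSORS = 189            # from the slide (read as number of sensors)
MPIX_PER_SENSOR = 16     # from the slide
NIGHTS_PER_YEAR = 365    # assumption: every night is observed (upper bound)


def survey_volume_pb(tb_per_night, years, nights_per_year=NIGHTS_PER_YEAR):
    """Total data volume in PB, with 1 PB = 1000 TB."""
    return tb_per_night * nights_per_year * years / 1000


def camera_gpix(sensors, mpix_per_sensor):
    """Total pixel count of the camera in gigapixels."""
    return sensors * mpix_per_sensor / 1000


total_pb = survey_volume_pb(TB_PER_NIGHT, SURVEY_YEARS)
gpix = camera_gpix(SENSORS, MPIX_PER_SENSOR)
print(f"Camera: {gpix:.2f} Gpix")
print(f"Survey: {total_pb:.1f} PB over {SURVEY_YEARS} years")
```

Under these assumptions the survey ends up at roughly 100 PB; the real figure depends on weather, downtime and which data products are counted.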
-
Square Kilometre Array (SKA)
● Radio telescope with 1 km² of collecting area
● Phase 1 - 2020
-
SKA data
● Phase 1:
– 10 TB/s from the antennas to the correlator
– 40 GB/s of data → 70 PB per year
– 1 MW infrastructure and 10 MW processing
-
SKA data
● Phase 2:
– 160 TB/s from the antennas to the correlator
– > 100 GB/s of data → 4.6 EB per year
– 200 to 2000 dishes
– 130K to 1M antennas
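The relation between a data rate and a yearly volume can be checked with a small sketch like the one below. It assumes a continuous, year-round stream and decimal units (1 PB = 10^6 GB); neither assumption comes from the slides. Under it, 4.6 EB per year corresponds to about 146 GB/s, in line with the "> 100 GB/s" quoted for Phase 2. For Phase 1, 70 PB per year corresponds to about 2.2 GB/s sustained, so the 40 GB/s and 70 PB figures presumably describe different stages or duty cycles; the slides do not say which.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31 557 600 s


def annual_volume_pb(rate_gb_s):
    """PB accumulated in one year at a sustained rate given in GB/s."""
    return rate_gb_s * SECONDS_PER_YEAR / 1e6


def sustained_rate_gb_s(volume_pb_per_year):
    """Sustained rate in GB/s that would fill the given PB in one year."""
    return volume_pb_per_year * 1e6 / SECONDS_PER_YEAR


def reduction_factor(input_tb_s, output_gb_s):
    """Ratio between the antenna stream (TB/s) and the output stream (GB/s)."""
    return input_tb_s * 1000 / output_gb_s


print(f"Phase 2: 4.6 EB/yr ~ {sustained_rate_gb_s(4600):.0f} GB/s sustained")
print(f"Phase 1: 70 PB/yr  ~ {sustained_rate_gb_s(70):.1f} GB/s sustained")
print(f"Phase 1: 40 GB/s all year ~ {annual_volume_pb(40):.0f} PB")
print(f"Phase 1 correlator reduction: x{reduction_factor(10, 40):.0f}")
```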
-
LOFAR
● Low Frequency Array
● Software-defined radio interferometer working at low frequencies (30 to 240 MHz)
● One of the Square Kilometre Array pathfinders
-
LOFAR Stations
-
LOFAR frequencies
● LBA 30-80 MHz
● HBA 120-240 MHz
-
LOFAR science
● Origin and evolution of galaxies and supermassive black holes
● Epoch of reionization
● Solar science and space weather
● Transients
● Map the galaxy using pulsars
● Exoplanets, SETI
-
Radio galaxies
Hercules A. Credits: NASA and the NRAO
-
LOFAR aperture synthesis
● Field of view diameter of ~5 deg at 150 MHz
● Resolution < 5 arcsec (down to 0.1 arcsec)
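As a rough illustration of why long baselines are needed, the diffraction-limit rule of thumb (angular resolution ≈ wavelength / baseline) can be sketched as follows. The 100 km baseline in the example is a hypothetical value chosen for illustration, not a number from the slides, and real resolution also depends on array layout and data weighting.

```python
import math

C = 299_792_458.0                      # speed of light in m/s
ARCSEC_PER_RAD = 180 * 3600 / math.pi  # about 206265


def wavelength_m(freq_mhz):
    """Wavelength in metres for a frequency in MHz."""
    return C / (freq_mhz * 1e6)


def resolution_arcsec(freq_mhz, baseline_km):
    """Rule-of-thumb resolution (lambda / B) in arcsec."""
    return wavelength_m(freq_mhz) / (baseline_km * 1000) * ARCSEC_PER_RAD


def baseline_km(freq_mhz, target_arcsec):
    """Baseline length (km) for which lambda / B equals the target resolution."""
    return wavelength_m(freq_mhz) / (target_arcsec / ARCSEC_PER_RAD) / 1000


print(f"Wavelength at 150 MHz: {wavelength_m(150):.2f} m")
# Hypothetical 100 km baseline, for illustration only.
print(f"100 km baseline at 150 MHz: {resolution_arcsec(150, 100):.1f} arcsec")
print(f"Baseline for 5 arcsec at 150 MHz: {baseline_km(150, 5):.0f} km")
```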
-
LOFAR imaging
In 8 hours: ~40 sq. deg., 5000 sources
Calibration on IAA (Granada) cluster
-
Extended sources
-
Ionosphere
● Effect depends on frequency, length of the baselines and field of view (f.o.v.)
● LOFAR, worst case:
– Wide field of view
– Long distance baselines
– Low frequency
H. Intema
-
Ionosphere
https://www.youtube.com/watch?v=5KWGDx0fq50
-
Challenges for the astronomer
● User data calibration (remove the effect of the ionosphere and the RFI)
– 8 hours full resolution → ~20 TB
– Minimum of 2 CPU years to run the calibration
– Experimental pipeline
● LOFAR calibration software
– Difficult to install
– Continuous development
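To see what "2 CPU years" means in practice, the sketch below converts CPU time into wall-clock time for a given number of cores. The core counts and the parallel efficiency parameter are illustrative assumptions, not values from the talk.

```python
HOURS_PER_YEAR = 365.25 * 24


def wall_clock_days(cpu_years, cores, efficiency=1.0):
    """Wall-clock days to spend `cpu_years` of CPU time on `cores` cores.

    `efficiency` (0 < efficiency <= 1) is a hypothetical parallel
    efficiency; 1.0 means perfect scaling.
    """
    if cores < 1 or not 0 < efficiency <= 1:
        raise ValueError("cores must be >= 1 and efficiency in (0, 1]")
    return cpu_years * HOURS_PER_YEAR / (cores * efficiency) / 24


def mean_rate_mb_s(volume_tb, hours):
    """Average data rate in MB/s for a volume in TB recorded over `hours`."""
    return volume_tb * 1e6 / (hours * 3600)


print(f"1 core:    {wall_clock_days(2, 1):.0f} days")
print(f"360 cores: {wall_clock_days(2, 360):.1f} days (perfect scaling)")
print(f"20 TB in 8 h is about {mean_rate_mb_s(20, 8):.0f} MB/s on average")
```

Even with perfect scaling, a single 8-hour observation keeps a few hundred cores busy for days, which is why a parallel infrastructure is needed.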
-
Computational solution needed
● Parallelizable:
– Deal with a large amount of data in a reasonable time.
● Flexible:
– Adapt the infrastructure (“hardware”) to different calibration strategies
– Deal with quickly changing, temperamental software
– On-demand (optional but very useful)
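The "parallelizable" requirement can be illustrated with a minimal sketch using only the standard library. The function `calibrate_chunk` is a hypothetical stand-in for a real calibration step, and the choice of 36 chunks and 4 workers is arbitrary; this is not the pipeline used in the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def calibrate_chunk(chunk_id):
    """Hypothetical stand-in for one calibration step on one data chunk."""
    return chunk_id, f"chunk {chunk_id:03d} done"


def run_parallel(chunk_ids, max_workers=4):
    """Process independent chunks concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order, so results line up with chunk_ids.
        return dict(pool.map(calibrate_chunk, chunk_ids))


results = run_parallel(range(36))
print(len(results), "chunks processed")
```

Threads are usually enough when each step mostly waits on an external program; for CPU-bound pure-Python steps, `ProcessPoolExecutor` offers the same interface.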
-
HPC, HTC and cloud computing
● Tests in different infrastructures: clusters, GRID, cloud, etc.
● SKA-AWS astrocompute proposal
– Preparation of the base infrastructure (virtual machine images, check provisioning of spot instances, etc.)
– Data transfer: 50 TB
– Adapt calibration pipeline and run
http://www.lofarcloud.uk/
-
Experimental calibration pipeline
[Flow diagram of the pipeline; labels:]
● Data split: field, observation, frequency
● Calibrator data: 360 chunks (1 sb)
● Main target data: 360 chunks (1 sb)
● Preprocessed target data: 36 chunks (10 sb)
● Combined data: 9 chunks (40 sb)
● Steps: Pre-processing, Self-cal and subtraction, Facet calibration
● ~30 iterations
● Final image
● Processing / Calibration solutions
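The chunk counts in the diagram follow from grouping 360 units of data in blocks of 1, 10 and 40. A minimal sketch, assuming that "sb" stands for subband and that chunks are built from consecutive subbands (the slide does not say how they are grouped):

```python
def group_subbands(n_subbands, per_chunk):
    """Split subband indices 0..n_subbands-1 into consecutive chunks."""
    if per_chunk < 1:
        raise ValueError("per_chunk must be >= 1")
    return [
        list(range(start, min(start + per_chunk, n_subbands)))
        for start in range(0, n_subbands, per_chunk)
    ]


for per_chunk in (1, 10, 40):
    chunks = group_subbands(360, per_chunk)
    print(f"{per_chunk:2d} sb per chunk -> {len(chunks)} chunks")
```

Each chunk can then be handed to an independent worker, which is what makes the split by field, observation and frequency useful for parallel processing.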
-
The role of Python
[Diagram; labels:]
● LOFAR software: libraries, prog. 1 … prog. n, script 1 … script n
● Experimental pipelines: Pipeline 1 … Pipeline n (step 1, step 2, step 3, step 4, step 5, step 6, …)
● Infrastructure: control, pipeline chunk 1 … pipeline chunk n
● Legend: Python, Python wrapper, Python mixed, Other, Cython, Ansible
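The "Python wrapper" idea in the diagram, Python gluing together external programs as pipeline steps, can be sketched as below. This is a minimal illustration, not the actual LOFAR tooling: `run_step` and `run_pipeline` are hypothetical helpers, and the Python interpreter itself stands in for the external programs so that the example runs anywhere.

```python
import subprocess
import sys


def run_step(command, timeout=None):
    """Run one external program and return its standard output.

    Raises subprocess.CalledProcessError if the program exits with a
    non-zero status, so a failing step stops the pipeline.
    """
    completed = subprocess.run(
        command, check=True, capture_output=True, text=True, timeout=timeout
    )
    return completed.stdout


def run_pipeline(steps):
    """Run the steps in order and collect the output of each one."""
    return [run_step(command) for command in steps]


# Stand-in commands; a real pipeline would call the calibration programs.
steps = [
    [sys.executable, "-c", "print('step 1: pre-processing')"],
    [sys.executable, "-c", "print('step 2: calibration')"],
]
outputs = run_pipeline(steps)
for line in outputs:
    print(line.strip())
```

Keeping each step as a plain command list makes it easy to swap programs in and out while the software is still changing.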
-
Summary
● Big software and data management challenges associated with new astronomical infrastructures, even for final users.
● The role of Python:
– Quick prototyping: fundamental for experimental pipelines and testing.
– Multi-domain: can be used for a wide range of problems.
– Robust: enough to write “real” efficient software.
– Unifying tool: holds everything together.
https://www.youtube.com/watch?v=8BBoDw2qVD0