OSKAR: Simulating data from the SKA

Oxford e-Research Centre, 4 June 2014

Fred Dulwich, Ben Mort, Stef Salvini

Overview

• Simulating interferometer data for SKA:
  – Radio interferometry basics.
  – Measurement equation basics.
• Structure of OSKAR.
• Experiences moving from Fermi to Kepler GPU architecture.
• Some recent simulation results.

Radio interferometry

• VLA (1973-1980).
• One-Mile Telescope (1964): first to use Earth-rotation aperture synthesis.

Comparison with optical system

• Traditional optical telescope records image of the sky formed by lens (or mirror).

[Figure: EM radiation from the sky passes through a lens and forms an image in the image plane of the lens.]

Comparison with optical system

•  A radio interferometer samples the wave-front in the Fourier domain: Image formation done electronically.

[Figure: EM radiation from the sky reaches an array of detectors; after processing, the image is formed by Fourier transform (FT).]
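The idea of sampling in the Fourier domain and forming the image electronically can be illustrated with a minimal Python sketch. It assumes the usual small-field point-source relation V(u, v) = S exp(-2πi(ul + vm)), a sign convention chosen here for illustration, and a hypothetical regular grid of baselines; none of these names or values come from the slides.

```python
import cmath
import math

def visibility(sources, u, v):
    """Visibility on baseline (u, v), in wavelengths, for point sources (flux, l, m)."""
    return sum(flux * cmath.exp(-2j * math.pi * (u * l + v * m))
               for flux, l, m in sources)

def dirty_image_pixel(samples, l, m):
    """Direct inverse Fourier sum of samples (u, v, V) at sky position (l, m)."""
    total = sum(vis * cmath.exp(2j * math.pi * (u * l + v * m))
                for u, v, vis in samples)
    return total.real / len(samples)

# One hypothetical 1 Jy point source and a 9 x 9 grid of baselines.
sources = [(1.0, 0.01, -0.02)]
baselines = [(u, v) for u in range(-20, 21, 5) for v in range(-20, 21, 5)]
samples = [(u, v, visibility(sources, u, v)) for u, v in baselines]

peak = dirty_image_pixel(samples, 0.01, -0.02)  # at the source position
off = dirty_image_pixel(samples, 0.05, 0.03)    # away from the source
```

The image value at the source position recovers the source flux, while other positions only see sidelobe-level responses that depend on how the baselines are arranged.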

Aperture arrays as stations

• Advantages:
  – Cost effective at low frequency
  – No moving parts
  – Fast scanning
  – Multi-beaming capability
• Disadvantages:
  – Sparse at high frequency
  – Relatively high sidelobe levels
  – Continually variable beam shape
  – Continually variable beam polarisation
• Omni-directional antennas measure voltage signals from whole sky.
• Spatial filtering (electronic beam forming) to isolate a direction of interest.
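Electronic beam forming can be sketched as a phased sum of antenna signals. The following Python example is an illustration only: it assumes a hypothetical 1-D line of 16 identical omni-directional antennas with 0.75 m spacing and a narrow-band signal, which is far simpler than a real aperture array station.

```python
import cmath
import math

def array_factor(positions_m, wavelength_m, steer_deg, look_deg):
    """Complex response of a phased line array steered to steer_deg,
    evaluated for a signal arriving from look_deg (angles from zenith)."""
    k = 2.0 * math.pi / wavelength_m
    s_steer = math.sin(math.radians(steer_deg))
    s_look = math.sin(math.radians(look_deg))
    # Each antenna is weighted by the conjugate of the phase expected
    # from the steering direction, so signals from there add in phase.
    return sum(cmath.exp(1j * k * x * (s_look - s_steer)) for x in positions_m)

positions = [0.75 * n for n in range(16)]  # hypothetical antenna positions [m]
wavelength = 3.0                           # roughly 100 MHz

on_target = abs(array_factor(positions, wavelength, 30.0, 30.0))
off_target = abs(array_factor(positions, wavelength, 30.0, -20.0))
```

In the steered direction all 16 signals add coherently; elsewhere the sum is much smaller but not zero, which is the origin of the sidelobes listed among the disadvantages.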

Modelling Challenges (1)

• Aperture arrays (AA) have complex beam patterns that have to be modelled across the whole sky.

Modelling Challenges (2)

• Science goals demand very high sensitivity.
  – Require good understanding of instrumental characteristics.
• Need comprehensive models of sky and telescope.
  – Very large instruments and sky models require HPC.
• Design of SKA not yet finalised: simulator has to be flexible.

Why simulate the SKA?

• Imaging performance depends strongly on how the detector elements are arranged.
• Aperture arrays have unique problems.
  – Assess performance of evolving system design.
  – Simulations can produce data challenges for pipeline developers.
• Ideas for SKA design have changed in recent years:
  – Few large stations (11200 elements per station)
  – Many small stations (256 elements per station)

Measurement Equation formalism

• A radio interferometer makes measurements of radiation in the Fourier domain (visibilities) for the “true” sky after various corruption effects, for example:
  – Sky rotation (parallactic angle)
  – Ionosphere
  – Antenna pattern & shape of station beam
• The Hamaker-Bregman-Sault Measurement Equation of a radio interferometer can be used to simulate measured visibility data.
• Relies on concepts of:
  – Source coherency matrix
  – Jones matrix

Source coherency (brightness) matrix

• Source coherency matrix encapsulates source properties.
  – Stokes parameters I, Q, U and V completely describe average polarisation of radiation from a source.
• Coherency matrix defined as 2x2 complex quantity for each source, s.
  – Using linear polarisation basis:

$$ B_s = \begin{bmatrix} I + Q & U + iV \\ U - iV & I - Q \end{bmatrix} $$
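As a small illustration of the linear-basis definition above, the following Python sketch builds the 2x2 coherency matrix from the four Stokes parameters. The function name and the nested-list representation are choices made for this example, not part of OSKAR.

```python
def coherency_matrix(stokes_i, stokes_q, stokes_u, stokes_v):
    """Return the 2x2 source coherency matrix (linear polarisation basis)
    as nested lists of complex numbers."""
    return [
        [complex(stokes_i + stokes_q, 0.0), complex(stokes_u, stokes_v)],
        [complex(stokes_u, -stokes_v), complex(stokes_i - stokes_q, 0.0)],
    ]

# An unpolarised 1 Jy source gives the identity matrix.
unpolarised = coherency_matrix(1.0, 0.0, 0.0, 0.0)

# A partially polarised source (illustrative values only).
polarised = coherency_matrix(2.0, 0.5, 0.1, -0.2)
```

The matrix is Hermitian by construction: the off-diagonal terms are complex conjugates of each other and the diagonal is real.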

Jones matrix

• Describes some physical effect on the radiation.
  – For a single source, s, at a single receiving station, i.
• Jones matrix is another 2x2 complex quantity:
  – Allows intermixing of polarisations.
  – Allows modification of amplitude and phase of received electromagnetic wave.

$$ J_{s,i} = \begin{bmatrix} a_1 + i a_2 & b_1 + i b_2 \\ c_1 + i c_2 & d_1 + i d_2 \end{bmatrix} $$

Jones matrix and Measurement Equation

• Gives modularity and makes complex simulations tractable:
  – Jones matrices can be chained together.
  – Allows us to separate different physical effects.
• Multiply matrices in order in which things actually happen:

$$ J_{s,i} = X_{s,i} Y_{s,i} Z_{s,i} \cdots $$

• Visibility on baseline between stations i and j for all visible sources (s) is then:

$$ V_{i,j} = \sum_s J_{s,i} B_s J_{s,j}^H $$
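The sum over sources for one baseline can be written out directly. The Python sketch below is a plain reference evaluation under simple assumptions: matrices are nested lists of complex numbers, the combined Jones matrix for each source and station is already available, and the helper names are invented for this example.

```python
def matmul2(a, b):
    """Product of two 2x2 complex matrices."""
    return [[a[r][0] * b[0][c] + a[r][1] * b[1][c] for c in range(2)]
            for r in range(2)]

def hermitian(a):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[a[c][r].conjugate() for c in range(2)] for r in range(2)]

def baseline_visibility(jones_i, jones_j, brightness):
    """Sum of J_{s,i} B_s J_{s,j}^H over all sources s.
    Each argument is a list with one 2x2 matrix per source."""
    total = [[0j, 0j], [0j, 0j]]
    for j_i, j_j, b in zip(jones_i, jones_j, brightness):
        term = matmul2(matmul2(j_i, b), hermitian(j_j))
        for r in range(2):
            for c in range(2):
                total[r][c] += term[r][c]
    return total

IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
# Two unpolarised sources of 1 Jy and 2 Jy seen through ideal stations.
brightness = [[[1 + 0j, 0j], [0j, 1 + 0j]], [[2 + 0j, 0j], [0j, 2 + 0j]]]
vis = baseline_visibility([IDENTITY, IDENTITY], [IDENTITY, IDENTITY], brightness)
```

With identity Jones matrices the visibility is simply the sum of the source brightness matrices; any instrumental or propagation effect enters only through the per-station Jones terms.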

A pictorial Measurement Equation!

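[Figure: signal path from the source to the station, with stages labelled B, R, Z, E and K.]

$$ V_{i,j} = \sum_s K_{s,i} E_{s,i} Z_{s,i} R_{s,i} B_s R_{s,j}^H Z_{s,j}^H E_{s,j}^H K_{s,j}^H $$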

OSKAR overview (1)

• GPU-enabled software to produce simulated visibilities by direct evaluation of a measurement equation.
• Currently ~120000 lines of code, mostly C (some C++).
• Currently ~40 CUDA kernels/functions.
• Single or double precision computation available.
• Balance between highest performance and highest flexibility.
  – Problem sizes vary hugely.
  – Simulations need to run on many different systems.
• Minimize PCIe traffic:
  – Copy input sky and telescope models to GPU memory.
  – Intermediate data generated on the GPU and used without transfer to host.
  – Host keeps track of pointers to GPU memory.
  – Use GPU memory effectively as a giant cache.

OSKAR overview (2)

• Each source is independent with respect to all other sources.
• There are many sources in the sky…
  – Can trivially parallelise over sources.
  – In general, each GPU thread works on one source.
  – Easily guaranteed 10^4 – 10^5 threads for any given kernel launch.
• Most expensive steps:
  – Station beam evaluation, for all stations.
    • Compute limited (DFT).
  – “Cross-correlation” step (visibility evaluation per baseline).
    • Bandwidth limited (Kepler); register limited (double precision, Fermi).

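Jones matrix data structure

[Figure: grid of Jones matrices, one cell per source and station. Source s is the fastest-varying dimension; station i is the slowest-varying dimension.]

$$ J_{s,i} = \begin{bmatrix} a_1 + i a_2 & b_1 + i b_2 \\ c_1 + i c_2 & d_1 + i d_2 \end{bmatrix} $$

• OSKAR functions calculate each Jones matrix for each source at each station in GPU memory (used as “scratchpad”).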
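The layout described here (source index fastest, station index slowest) can be sketched as a flat array index. This Python example assumes a single contiguous array with station as the outer index, which is one natural reading of the slide rather than a statement about OSKAR's actual data structures; the names are hypothetical.

```python
# Four complex elements per Jones matrix, each a (real, imaginary) pair.
COMPLEX_PER_JONES = 4
FLOATS_PER_JONES = 2 * COMPLEX_PER_JONES

def jones_index(station, source, num_sources):
    """Flat index of the Jones matrix for (station, source) when the
    source index varies fastest and the station index slowest."""
    return station * num_sources + source

def float_offset(station, source, num_sources):
    """Offset of the first real value of that matrix in an array of floats."""
    return jones_index(station, source, num_sources) * FLOATS_PER_JONES

# Neighbouring sources at one station are adjacent in memory, so
# consecutive threads working on consecutive sources read consecutive cells.
first = jones_index(station=2, source=0, num_sources=1000)
second = jones_index(station=2, source=1, num_sources=1000)
```

With one thread per source, this ordering means neighbouring threads touch neighbouring matrices, which is the access pattern GPUs handle most efficiently.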

Joining Jones matrices

[Figure: two grids of Jones matrices (source s fastest varying, station i slowest varying) multiplied cell by cell to give a third grid.]

$$ J_{s,i} = X_{s,i} Y_{s,i} $$

Trivially parallel: each thread does one colour

Forming visibilities (“correlator”)

$$ V_{i,j} = \sum_s J_{s,i} B_s J_{s,j}^H $$

[Figure: Jones matrix grid with sources as the fastest-varying dimension, highlighting the columns for station i and station j and the brightness matrices B; the highlighted cells are numbered 1 to 3.]

• Exploits the fact that $(XY)^H = Y^H X^H$.
• Each thread block computes result for one baseline, or one correlation between two stations, for all sources.
• Each thread does a subset of sources.
  – Accumulates partial sum into shared memory.
  – Result of final accumulation into global memory.
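The per-baseline scheme (each thread sums a subset of sources, then the partial sums are combined) can be emulated serially. The Python sketch below is only a model of the data flow: the strided assignment of sources to threads and the thread count are assumptions made for the example, and the real work is done by a CUDA kernel.

```python
def matmul2(a, b):
    """Product of two 2x2 complex matrices."""
    return [[a[r][0] * b[0][c] + a[r][1] * b[1][c] for c in range(2)]
            for r in range(2)]

def hermitian(a):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[a[c][r].conjugate() for c in range(2)] for r in range(2)]

def correlate_baseline(jones_i, jones_j, brightness, num_threads=4):
    """Emulate one thread block forming the visibility for one baseline."""
    num_sources = len(brightness)
    # One partial sum per thread, standing in for shared memory.
    partial = [[[0j, 0j], [0j, 0j]] for _ in range(num_threads)]
    for t in range(num_threads):
        # Thread t handles sources t, t + num_threads, ... (assumed striding).
        for s in range(t, num_sources, num_threads):
            term = matmul2(matmul2(jones_i[s], brightness[s]),
                           hermitian(jones_j[s]))
            for r in range(2):
                for c in range(2):
                    partial[t][r][c] += term[r][c]
    # Final accumulation: add the per-thread sums into the single result
    # that would be written to global memory.
    result = [[0j, 0j], [0j, 0j]]
    for t in range(num_threads):
        for r in range(2):
            for c in range(2):
                result[r][c] += partial[t][r][c]
    return result

IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
# Ten unpolarised sources with fluxes 1..10 Jy and ideal stations.
fluxes = [float(k) for k in range(1, 11)]
brightness = [[[complex(f), 0j], [0j, complex(f)]] for f in fluxes]
jones = [IDENTITY for _ in fluxes]
vis = correlate_baseline(jones, jones, brightness)
```

Each baseline is independent, so on the GPU a separate thread block can run this procedure for every station pair.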

Forming visibilities (“correlator”)

$$ V_{i,j} = \sum_s J_{s,i} B_s J_{s,j}^H $$

[Figure: same Jones matrix grid for station i, station j and B, with the highlighted cells numbered 1 to 3.]

• Multiply together numbered cells.
• Accumulate results.
  – One shared memory location per colour/thread (partial sum).
  – Final step adds different colours, putting result into global memory.

Forming visibilities (“correlator”)

$$ V_{i,j} = \sum_s J_{s,i} B_s J_{s,j}^H $$

[Figure: same Jones matrix grid for station i, station j and B, with the highlighted cells numbered 1 to 3.]

• Next thread block does same again for another station pair.
• Why not just use some matrix math library?

Forming visibilities (“correlator”)

$$ V_{i,j} = \sum_s J_{s,i} B_s J_{s,j}^H $$

[Figure: same Jones matrix grid for station i, station j and B, with an additional term labelled f(s, i, j).]

• Not quite the whole story...
• Non-separable baseline-dependent effects must be modelled here too:
  – Smearing terms
  – Extended sources

Fermi to Kepler

• “Correlate” kernel (on compute capability 3.5 architecture, using CUDA 5.5)
  – 43 registers (single precision)
  – 68 registers (double precision)
• Must load from global memory:
  – Stokes parameters (4 values per source)
  – Direction cosines (3 values per source)
  – Extended source parameters (3 values per source)
  – Station coordinates (8 values per thread block)
  – Jones matrices (2 x 8 values per source)
• Computes rotation matrix, two sinc functions, one exponential, three vector products, and two Jones complex matrix products.
  – Not very operationally dense, but lots of data to store in registers.
  – Global memory load is bandwidth heavy! (N^2 reads for N stations)
• (Current baseline design makes this worse: 1024 stations!)
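The load counts listed above give a rough feel for why the kernel is bandwidth heavy. The Python arithmetic below is a back-of-envelope sketch that assumes one thread block per baseline (cross-correlations only) and no reuse through caches, so it is an upper bound for illustration rather than a measured figure.

```python
def num_baselines(num_stations):
    """Number of distinct station pairs, which grows as N^2 / 2."""
    return num_stations * (num_stations - 1) // 2

# Per-source loads: Stokes (4) + direction cosines (3)
# + extended source parameters (3) + two Jones matrices (2 x 8).
VALUES_PER_SOURCE = 4 + 3 + 3 + 2 * 8
# Station coordinates are loaded once per thread block.
VALUES_PER_BLOCK = 8

def values_loaded(num_stations, num_sources):
    """Values read from global memory for one full correlation pass,
    assuming nothing is served from cache."""
    per_block = VALUES_PER_BLOCK + VALUES_PER_SOURCE * num_sources
    return num_baselines(num_stations) * per_block

baselines_1024 = num_baselines(1024)
```

For 1024 stations there are 523776 baselines, and every one of them re-reads the per-source data, which is why the station count matters so much.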

Fermi to Kepler

• Expecting big performance gains from reduced register pressure.

                   Kernel time        Simulation time    Simulation time
Precision          double             double             single
M2090 (Emerald)    9.44 s (ECC off)   1125 s (ECC on)    197 s (ECC on)
K20c (Ruby)        5.02 s             516 s              231 s
Speedup            1.9 x              2.2 x              0.85 x (?)

Inside Kepler K20 family (slide from NVIDIA GTC 2012)

• L1 cache in Kepler no longer used for global memory loads!
  – Profiler showed that performance was limited by bandwidth to L2 cache.

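Jones matrix data structure

[Figure: grid of Jones matrices, one cell per source and station. Source s is the fastest-varying dimension; station i is the slowest-varying dimension. A brace marks each complex matrix element as one float2.]

$$ J_{s,i} = \begin{bmatrix} a_1 + i a_2 & b_1 + i b_2 \\ c_1 + i c_2 & d_1 + i d_2 \end{bmatrix} $$

• Using const __restrict__ not enough!
  – Data structure too complex for compiler to optimize load from global memory.
• Needed four explicit __ldg(float2) or __ldg(double2) instructions to make use of Kepler’s read only data cache.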

Fermi to Kepler

• Expecting big performance gains from reduced register pressure.
  – Profiler showing >150 GB/s global memory bandwidth on K20c (theoretical max 208 GB/s).

                   Kernel time        Simulation time    Simulation time    Simulation time    Simulation time
Precision          double             double             single             double             single
M2090 (Emerald)    9.44 s (ECC off)   1125 s (ECC on)    197 s (ECC on)     1125 s (ECC on)    197 s (ECC on)
K20c (Ruby)        5.02 s             516 s              231 s              292 s              124 s
Speedup            1.9 x              2.2 x              0.85 x (?)         3.9 x              1.6 x
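The quoted speedups follow directly from the timings in the table, as the short Python check below shows. The dictionary keys are labels invented for this example; the last two entries are taken to be the timings after the explicit __ldg loads were added, which is an inference from the slide order since the columns are not labelled.

```python
def speedup(m2090_seconds, k20c_seconds):
    """Ratio of the M2090 time to the K20c time."""
    return m2090_seconds / k20c_seconds

# (M2090 time, K20c time) in seconds. Note the M2090 kernel time was
# measured with ECC off, the simulation times with ECC on.
timings = {
    "kernel, double": (9.44, 5.02),
    "simulation, double": (1125.0, 516.0),
    "simulation, single": (197.0, 231.0),
    "simulation, double (later columns)": (1125.0, 292.0),
    "simulation, single (later columns)": (197.0, 124.0),
}
speedups = {name: speedup(old, new) for name, (old, new) in timings.items()}

# Fraction of the theoretical peak bandwidth implied by the profiler figure.
bandwidth_fraction = 150.0 / 208.0
```

A ratio below one, as in the first single-precision column, means the K20c run was slower than the M2090 run. The profiler figure of 150 GB/s corresponds to a little over 70% of the 208 GB/s theoretical peak.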

Example study: Modelling the impact of distant interfering sources

• AA have considerable sensitivity to sources outside primary beam.
  – Strong function of frequency: Can we image at 600 MHz?
• Understand impact of interfering sources on an AA snapshot observation.
• Metric called (far) side-lobe confusion noise.
  – With AA beams the signal from sources outside the field of interest is nonzero.
  – The power from these sources is spread into the main field through their PSF side-lobes.
  – Both the PSF and beam are a function of frequency and time.
  – Known as confusion noise: millions of point sources which cannot be individually corrected for.
• This is an important limit to the imaging performance of AAs.

[Figure: region of interest, with side lobes picking up interfering sources outside it.]

AA telescope configuration

[Figure, two panels. Telescope layout: 693 stations (courtesy K. Grainge), plotted over roughly ±800 m. Station layout: 256 antennas (courtesy N. Razavi), plotted over ±20 m. Axes in both panels: x (East) [metres], y (North) [metres].]

AA station beams

[Figure: station beam patterns at 100 MHz and 600 MHz.]

Sky model

• The SKA will be more sensitive than any current telescope, so no all-sky models exist with enough sources.
  – Generate a 2M source sky model with the correct statistics extrapolated from the VLSS catalogue (~68k sources).

[Figure: cumulative source counts. Axes: Log10 flux bin [Jy] (−1 to 2) against Log10 cumulative number count (0 to 7).]
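One simple way to draw source fluxes that follow a chosen cumulative count law is inverse-transform sampling from a power law. The Python sketch below assumes counts of the form N(>S) ∝ S^(-slope) between two flux limits; the slope of 1.5 and the 0.1 to 100 Jy range (read off the plot axis) are placeholders, not the statistics actually extrapolated from the VLSS catalogue.

```python
import random

def sample_fluxes(num_sources, flux_min_jy, flux_max_jy, slope, seed=1):
    """Draw fluxes whose cumulative counts follow N(>S) ~ S**(-slope)
    between flux_min_jy and flux_max_jy, by inverting the CDF."""
    rng = random.Random(seed)
    low = flux_min_jy ** (-slope)
    high = flux_max_jy ** (-slope)
    fluxes = []
    for _ in range(num_sources):
        u = rng.random()  # uniform in [0, 1)
        fluxes.append((low - u * (low - high)) ** (-1.0 / slope))
    return fluxes

# Illustrative parameters only; a real model would use fitted counts
# and would also need positions for each source.
fluxes = sample_fluxes(10000, 0.1, 100.0, 1.5)
faint_fraction = sum(1 for f in fluxes if f < 1.0) / len(fluxes)
```

As expected for steep source counts, the great majority of the sampled sources are faint, which is why millions of sources are needed to reach SKA sensitivity levels.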

Image of sidelobe confusion noise

[Figure: images of sidelobe confusion noise at 100 MHz (15 deg field) and 600 MHz (2.5 deg field). Axes: right ascension and declination.]

Interfering (FSC) snapshot noise as a function of frequency

[Figure: FSCN RMS [Jy/beam] (log scale, 10^-4 to 10^-2) against Frequency [MHz], from 100 to 600 MHz.]

Summary

• Large scale SKA simulations are challenging.
  – GPUs make them possible.
• Simulations are vital to
  – Assess the evolving system design.
  – Generate semi-realistic data products for tool-chain developers and for data flow testing.