1 KFPA Critical Design Review – Fri., Jan. 30, 2009 KFPA Data Pipeline Bob Garwood- NRAO-CV


1 KFPA Critical Design Review – Fri., Jan. 30, 2009

KFPA Data Pipeline

Bob Garwood- NRAO-CV


History

● Science and Data Pipeline Workshop – November 2007. Initial pipeline sketch.

● Conceptual Design Review – February 2008. Initial design.

● KFPA Data Analysis Meeting – June 2008.

● Memo describing possible KFPA observing modes. Pisano, August 2008.


Changes since Conceptual Design Review

● Basic design essentially unchanged

● Out-of-scope items (deferred)

– continuum

– cross-correlation (polarization)

– complicated calibration schemes (“basketweaving”)

● Baseline fitting added as an explicit step


[Pipeline flow diagram. Boxes, in the order extracted from the slide:]

● M&C Files: Backend, IF, GO, Antenna, LO1, Rcvr calib.

● Data Capture

● Calibration Database: default calibration values (Scal, Tcal)

● Total Power Calibration: TP and Tsys spectra

● Data Editing and Flagging: statistical; interactive

● Identify “OFF” data: automatic; interactive

● Calibration (OFFs)

● Data Editing and Flagging: statistical; interactive

● “basketweaving” calibration

● Imaging (gridding)

● Data Visualization

● FITS Cubes

● Baseline Removal
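The calibration step in the flow above combines "ON" spectra with the identified "OFF" spectra. A minimal sketch of that step, assuming the conventional single-dish relation Ta = Tsys × (ON − OFF) / OFF applied per channel; the function name `calibrate_on_off` and the synthetic values are illustrative and do not come from the slides:

```python
import numpy as np


def calibrate_on_off(on, off, tsys):
    """Return antenna temperature per channel: Tsys * (ON - OFF) / OFF.

    `tsys` may be a scalar or a per-channel array; NumPy broadcasting
    handles both cases.
    """
    on = np.asarray(on, dtype=float)
    off = np.asarray(off, dtype=float)
    return tsys * (on - off) / off


# Synthetic example: a flat OFF bandpass and an ON spectrum with
# extra power in channel 2 (values are made up for illustration).
off = np.array([100.0, 100.0, 100.0, 100.0])
on = np.array([100.0, 100.0, 110.0, 100.0])
ta = calibrate_on_off(on, off, tsys=30.0)
```

Because each channel is treated independently, the same function works whether Tsys is a single number or a spectrum.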


Existing GBT Data Analysis Software

● The sdfits tool produces an SDFITS file, associating raw data from a backend (DCR, SP, Spectrometer) with the meta data describing the observations. (data capture)

● GBTIDL – recommended spectral line analysis tool. Focused on single spectra processing and analysis, not on imaging. Used to prepare the data to be imaged elsewhere. (calibration, editing)

● AIPS is used to produce images.


We can reduce K-band data now

● K-band spectrometer data calibrated and imaged using existing tools.


Missing components

● None of the steps to an image are automated.

● Uses lab-measured Tcal values.

● Uses a scalar Tcal without regard to any structure in Tcal across the bandpass.

● Cross-correlation (polarization) data is not supported after the sdfits step.

● Poor support for continuum data.
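The difference between a scalar Tcal and a Tcal with structure across the bandpass can be illustrated with a short sketch. It assumes the conventional noise-diode relation Tsys = Tcal × P(cal off) / (P(cal on) − P(cal off)) + Tcal/2; the function names and the synthetic numbers are hypothetical and are not taken from GBTIDL or the slides:

```python
import numpy as np


def tsys_scalar(cal_on, cal_off, tcal):
    """One Tsys for the whole band, using a single (scalar) Tcal."""
    cal_on = np.asarray(cal_on, dtype=float)
    cal_off = np.asarray(cal_off, dtype=float)
    return tcal * cal_off.mean() / (cal_on - cal_off).mean() + tcal / 2.0


def tsys_vector(cal_on, cal_off, tcal_spectrum):
    """Per-channel Tsys, using a Tcal that varies across the bandpass."""
    cal_on = np.asarray(cal_on, dtype=float)
    cal_off = np.asarray(cal_off, dtype=float)
    tcal_spectrum = np.asarray(tcal_spectrum, dtype=float)
    return tcal_spectrum * cal_off / (cal_on - cal_off) + tcal_spectrum / 2.0


# Synthetic data: 4 counts per kelvin, 25 K with the cal off,
# and a Tcal that is not flat across the band.
tcal_spectrum = np.array([1.0, 1.5, 2.0, 1.5])
cal_off = np.full(4, 100.0)
cal_on = cal_off + 4.0 * tcal_spectrum

per_channel = tsys_vector(cal_on, cal_off, tcal_spectrum)
band_average = tsys_scalar(cal_on, cal_off, tcal_spectrum.mean())
```

The scalar version returns one number for the band, so any channel-to-channel structure in Tcal is lost; the vector version keeps it.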


Missing Components continued

● Only have prototype tool for visually interacting with large amounts of data (e.g. visual flagging).

● Only prototype tools for statistically flagging or editing the data (e.g. RFI rejection).
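As an illustration of what statistical flagging can look like, the sketch below marks channels that deviate strongly from the median, using the median absolute deviation (MAD) as a robust noise estimate. This is a generic technique, not the prototype tool mentioned above; the function name `flag_outliers` and the threshold of 5 are assumptions:

```python
import numpy as np


def flag_outliers(spectrum, threshold=5.0):
    """Return a boolean mask, True where a channel looks like an outlier.

    The noise level is estimated as 1.4826 * MAD, which matches the
    standard deviation for Gaussian noise but is insensitive to a few
    very bright (e.g. RFI-contaminated) channels.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    median = np.median(spectrum)
    mad = np.median(np.abs(spectrum - median))
    if mad == 0.0:
        # No measurable scatter: nothing can be called an outlier.
        return np.zeros(spectrum.shape, dtype=bool)
    robust_sigma = 1.4826 * mad
    return np.abs(spectrum - median) > threshold * robust_sigma


# Synthetic spectrum with one RFI-like spike in channel 4.
spectrum = np.array([1.0, 1.1, 0.9, 1.0, 25.0, 1.05, 0.95, 1.0])
flags = flag_outliers(spectrum)
```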


Goals of the Prototype Pipeline

● Support KFPA commissioning

● Explore new processing tools/techniques not yet widely available in GB (vector calibration, statistical data flagging and editing, visualization, parallel processing).

● Prototype an automated pipeline – add necessary meta data to capture user intent

● Prototype tools necessary to support larger focal plane array (e.g. parallel computing)


Goals continued

● Based on prototyped tools, estimate cost associated with delivering a pipeline and necessary computing hardware to handle the expected data rates for a larger focal plane array.

● Develop these tools and pipeline infrastructure for use with data from other backends.


Pipelines

● Crude pipeline can be assembled from existing components for quick-look images.

– Small modification to sdfits (data capture) to properly capture individual feed offsets from pointing position.

– Some additional meta data to capture default image parameters and associated “off” information.


Pipelines

● Imperative for large focal plane array.

– large data rates and volume

● Necessary for even a modest 7 element array.

● Useful for data from other GBT backends

– Users often end up creating partial pipelines

– The NRAO archive needs this to be able to provide more than just the raw GBT data.

– Other telescopes routinely provide roughly-calibrated data to their users – most institutions consider this the starting point of a data pipeline.


Pipelines

● Requires using a standard observing mode.

– Sufficient meta data needs to be captured to drive the pipeline (e.g. groups of scans that should be processed together, associated “off” information, etc).

● Individual components can be used outside of the pipeline – often with additional options.
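One way to picture the meta data that drives the pipeline is a small record per scan that names its processing group and says whether it is map data or an "off". The sketch below is purely illustrative: the `ScanMeta` fields, the role names, and the group labels are hypothetical and do not reflect the actual GBT meta data:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanMeta:
    scan: int    # scan number
    group: str   # hypothetical label for scans processed together
    role: str    # "map" or "off" (hypothetical role names)


def group_scans(scans):
    """Collect scan numbers by group, split into map and off scans."""
    groups = defaultdict(lambda: {"map": [], "off": []})
    for meta in scans:
        if meta.role not in ("map", "off"):
            raise ValueError(f"unknown role {meta.role!r} for scan {meta.scan}")
        groups[meta.group][meta.role].append(meta.scan)
    return dict(groups)


scans = [
    ScanMeta(10, "mapA", "off"),
    ScanMeta(11, "mapA", "map"),
    ScanMeta(12, "mapA", "map"),
    ScanMeta(20, "mapB", "off"),
    ScanMeta(21, "mapB", "map"),
]
groups = group_scans(scans)
```

With this information recorded at observing time, an automated pipeline would know which scans to image together and which "off" to calibrate them against, without asking the user.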


Pipeline

● None of those steps is unique to the KFPA

– KFPA-specific steps are likely to be part of the statistical flagging and editing component as well as data capture.

● Components are being developed independently.

– no dependencies between components

● Some components are likely to be useful interactively – especially flagging and editing.


Pipeline Design continued

● Eventually, continuum data will be extracted from the spectral line data at the appropriate point in the pipeline. This work is out-of-scope for the initial pipeline.

● Language – Python

– Experience with Python in Green Bank

– Same language used in the ALMA pipeline and in CASA.


Pipeline design, continued

● Data formats

– SDFITS up to imaging step.

  ● Currently produced by data capture (sdfits)

  ● Tools already exist to interact with this data.

  ● May be necessary to split data into multiple SDFITS files for parallel computing needs.

– Alternatives used as necessary – for speed or to take advantage of existing tools – e.g. AIPS


Parallel Computing

● Most of these steps are “embarrassingly parallel” - data from individual feeds can be processed independently

– exceptions: some statistical flagging and editing and cross-correlation data – these are out of scope for the initial pipeline.

● Parallel processing will be explored during KFPA pipeline development.
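Because data from each feed can be processed independently, the per-feed work maps naturally onto a pool of workers. The sketch below uses Python's standard `concurrent.futures` module; the per-feed step is a stand-in (a simple average over integrations), and the names `process_feed` and `process_all_feeds` are hypothetical. A thread pool is used here to keep the example portable; a process pool could be substituted for CPU-bound work:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def process_feed(item):
    """Stand-in for per-feed processing: average all integrations."""
    feed, spectra = item
    return feed, np.mean(spectra, axis=0)


def process_all_feeds(data_by_feed, max_workers=4):
    """Process every feed independently and collect results by feed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # No feed depends on any other, so a plain map is sufficient.
        return dict(pool.map(process_feed, data_by_feed.items()))


# Synthetic data for a 7-feed array: 3 integrations x 4 channels per feed.
data_by_feed = {feed: np.full((3, 4), float(feed)) for feed in range(7)}
results = process_all_feeds(data_by_feed)
```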


Development Priorities

● Calibration

– Complete GBTIDL vector Tcal and initial calibration database work.

– Design pipeline calibration database.

● Data Capture

– This is the current bottleneck. Work is underway to improve the processing speed. A new raw data format may be necessary.


Priorities continued

● Data capture (continued)

– ensure that feed offsets are used properly with pointing direction to get individual feed pointings

– put default calibration values into calibration database (GBTIDL model first, pipeline model when design completed).

– Add appropriate meta information as necessary to automate data flow through the pipeline.
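Combining a feed offset with the pointing direction can be sketched as follows. This assumes the offsets are given as small cross-elevation and elevation angles in degrees and uses the small-angle approximation, in which a cross-elevation offset is divided by cos(elevation) to obtain the azimuth change. The function name, argument names, and sign convention are assumptions; the slides do not specify how the offsets are stored:

```python
import math


def feed_pointing(az_deg, el_deg, xel_offset_deg, el_offset_deg):
    """Return (azimuth, elevation) in degrees for one feed.

    Small-angle approximation: valid for offsets of arcminutes and for
    elevations well away from the zenith, where 1/cos(el) diverges.
    """
    az = az_deg + xel_offset_deg / math.cos(math.radians(el_deg))
    el = el_deg + el_offset_deg
    return az % 360.0, el


# A feed offset by 0.01 deg in cross-elevation at 60 deg elevation
# corresponds to about 0.02 deg in azimuth.
az, el = feed_pointing(100.0, 60.0, 0.01, 0.0)
```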


Priorities, continued

● Pipeline design and implementation

– Automate flow of data between existing components.

– Initially this will be a simple script triggered off of the standard observing modes using default values and available meta information.

– It will be possible to re-run the pipeline using alternative parameters (e.g. baseline fits, additional statistical flags, interactive flagging and editing, etc).
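The idea of a script that runs with default values and can be re-run with alternative parameters can be sketched with a single step, polynomial baseline removal. Everything here is illustrative: the `DEFAULTS` dictionary, the parameter name `baseline_order`, and the function names are hypothetical, and the baseline fit uses NumPy's `polyfit` on channels assumed to be line-free:

```python
import numpy as np

# Hypothetical default parameters for an automated run.
DEFAULTS = {"baseline_order": 1}


def remove_baseline(spectrum, line_free, order):
    """Fit a polynomial to the line-free channels and subtract it."""
    channels = np.arange(spectrum.size)
    coeffs = np.polyfit(channels[line_free], spectrum[line_free], order)
    return spectrum - np.polyval(coeffs, channels)


def run_pipeline(spectrum, line_free, **overrides):
    """Run with defaults; keyword overrides allow a re-run."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    spectrum = np.asarray(spectrum, dtype=float)
    line_free = np.asarray(line_free, dtype=bool)
    result = remove_baseline(spectrum, line_free, params["baseline_order"])
    return result, params


# Synthetic spectrum: a sloping baseline plus a line in channel 5.
channels = np.arange(10)
spectrum = 2.0 + 0.5 * channels
spectrum[5] += 3.0
line_free = channels != 5

default_run, default_params = run_pipeline(spectrum, line_free)
rerun, rerun_params = run_pipeline(spectrum, line_free, baseline_order=0)
```

Returning the parameters alongside the result keeps a record of what each run actually used, which is useful when comparing a default run against a re-run.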


Priorities, continued

● Data Visualization

– Evaluate existing tools for viewing and interacting with GBT data in sdfits form.

  ● Data quality throughout the pipeline

  ● Interactive flagging

  ● Summer student project – 2008 – prototype data viewer. Can do interactive flagging, but is not sufficiently general.


Priorities, continued

● Investigate simple parallel processing options

– start with existing code (sdfits)

– take advantage of independence of data from each feed

– keep things simple


Priorities, continued

● Statistical data flagging

– Borrow from code developed by GBTIDL users

– Borrow from aips++/casa autoflagger

– Develop “basketweaving” equivalent for KFPA array.

  ● Use (near) crossing points on sky (same feed; multiple feeds) to adjust data.

  ● out of scope for initial pipeline development


Priorities, continued

● Algorithm development (calibration, continuum data handling, etc). Roberto Ricci, U. Calgary.


Resources

● Bob Garwood, NRAO – 1 FTE, component design and development

● Roberto Ricci, U. Calgary – algorithm development