NAASC data processing capabilities (including reprocessing scope)

Mark Lacy, Data Services Lead, NAASC, NRAO

ANASAC 13-14 Sept 2010

NAASC Data Services

• Data services group formed within the NAASC (other groups are User Support Services [Brogan] and JAO support [Hibbard]).

• Goal: to provide processed ALMA data and the tools to analyze it to NA users.

• Responsibilities:

– NA ALMA archive and user portal, including VO and interaction with VAO LLC

– Splatalogue

– Simdata

– Pipeline (implementation)

– “Advanced tools” (e.g. data cube visualization and marginalization).

Overview of ES processing

• JAO plans to process all Early Science (ES) data in order to perform Quality Assurance (QA2).

• Processing at SCO will be performed using desktop machines.

• Tests indicate that these will be able to deal with ES data rates (expected to be ~20 TB/year, ~1/10th of Full Science, but with ramp-up near end).

• NAASC will provide reprocessing capabilities for NA users.

– Already getting experience with CSV data processing through NAASC SV “Tiger team”
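To put the quoted volumes in perspective, the short sketch below converts a yearly data volume into an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly even spread over the year, which ignores the ramp-up mentioned above; the function name is illustrative only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mean_rate_mb_per_s(tb_per_year: float) -> float:
    """Average sustained rate in MB/s for a yearly volume given in TB."""
    bytes_per_year = tb_per_year * 1e12      # assumption: decimal terabytes
    return bytes_per_year / SECONDS_PER_YEAR / 1e6


es_tb_per_year = 20.0                        # slide value: ~20 TB/year in ES
full_tb_per_year = es_tb_per_year * 10       # slide value: ES is ~1/10th of Full Science

print(f"Early Science: {mean_rate_mb_per_s(es_tb_per_year):.2f} MB/s")
print(f"Full Science:  {mean_rate_mb_per_s(full_tb_per_year):.2f} MB/s")
```

Averaged this way, the ES volume is under 1 MB/s, which is consistent with the claim that desktop machines can keep up; peak rates during observing would be higher.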

Details of NAASC processing plans in ES

• We have recently written a computing plan for the NAASC covering ES through operations.

– A small cluster (~4-12 nodes), a forerunner of the NAASC pipeline machine, will be built up slowly, based on EVLA experience.

– In addition, we will purchase desktop machines for visitor use and evaluation, with the aim of producing recommendations to users for offline processing.

– We will thus have the ability to perform reprocessing of all NA data.

NAASC cluster - ES

NAASC cluster - operations

How users will reprocess

• Option 1: Come to the NAASC and use the cluster through a login on an NRAO desktop machine (or the desktop directly for small datasets).

• Option 2: Use VNC from their home institution to log in to the cluster.

• Option 3: Submit a pipeline job remotely to the NAASC via a webpage.

Which option we offer will depend on the level of support and interaction with the data that is required. We are likely to begin with option 1 and move to option 3 as algorithms (e.g. for automatic flagging) improve, with option 2 as a backup.

(Also likely to have ASDM to MS conversion implemented for users getting their data from the archive.)

Getting the data to NA

• Baseline plan is disk shipment for bulk data, but we will attempt to take advantage of improved links to Chile required by NOAO for DES and LSST.

• Have AUI/AURA agreement to share a fast data link from Chile to Florida International University (10 Gb/s).

• Thereafter data travels via Internet2 to Charlottesville/UVa.

• Should be adequate to move both bulk data and metadata without requiring shipping of disks.

• Archive replication tests to begin next year.
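A rough check of the claim that the network link should be adequate: the sketch below estimates how long a year of ES data (~20 TB, the figure quoted earlier in this talk) would take over a 10 Gb/s link. The `efficiency` parameter is a hypothetical knob for the share of the link actually available to ALMA traffic; the 10% value used here is an assumption for illustration, not a number from the talk.

```python
def transfer_hours(volume_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move volume_tb terabytes over a link_gbps link.

    efficiency is the assumed fraction of the nominal bandwidth that is
    actually usable (shared link, protocol overhead).
    """
    bits = volume_tb * 1e12 * 8                  # decimal terabytes to bits
    usable_bits_per_s = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_s / 3600


es_year_tb = 20.0     # ~20 TB/year expected in Early Science
link_gbps = 10.0      # Chile to Florida International University link

print(f"Full line rate:    {transfer_hours(es_year_tb, link_gbps):.1f} h")
print(f"10% of line rate:  {transfer_hours(es_year_tb, link_gbps, 0.1):.1f} h")
```

Even at a tenth of the nominal bandwidth, a full year of ES data would move in about two days, so under these assumptions the network has ample headroom compared with shipping disks.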

NAASC and related software systems

• Splatalogue

– Currently concentrating on documentation and database enhancement.

– Future plans include improvements to usability (new front end).

– Plan to make Splatalogue an “official” ALMA software project, working on a Splatalogue memo to ALMA describing the database and the plan for management and maintenance.

• Simdata (task in CASA)

– Simdata now largely complete, including single dish capability (in collaboration with NAOJ).

– CASA code freeze Sept 17th prior to October release.

– Working on new ES examples.

Simdata will allow us to demonstrate the limitations of the ES array both in terms of sensitivity and dynamic range/uv-coverage.

Example: ALMA Band 6 deep pointing

9x8hr 234 GHz ALMA track in continuum. Simulated using Oxford S-cubed simulations (Obreschkow & Rawlings 2009) for the model and simdata2 in CASA for the “observation”.

Figure panels: Model; Early Science (16 ants); Full Science (50 ants)

CASA/pipeline performance

• CASA currently has similar speed to other packages for ~10 GB datasets except for a few high nails being aggressively pursued (flagging, plotting).

• CASA’s architecture has been written with parallelization in mind.

• Channelization of radio data makes the problem “embarrassingly parallelizable”.

• However, particularly for imaging, the problem is I/O limited rather than CPU limited (~60:40 I/O:CPU), which makes it trickier.

• Pursuing mitigation through hardware solutions (fast file systems, e.g. Lustre with Infiniband interconnect) and software solutions (improving I/O efficiency in code).

• Nevertheless, parallelization efforts are of highest current risk and priority.

• Release of multi-core CASA functionality will be staged so that functionality becomes available for pipeline testing and the community as soon as possible.

• Simple imaging (single field or simple mosaic cube) well progressed, expected for October 2010 release.

• Multi-core flagging and more imaging cases (multi-frequency synthesis continuum) expected June 2011.
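The two points above about channelization and the I/O bottleneck can be illustrated with a small sketch. It is not CASA code: `split_channels` is a hypothetical helper showing how spectral channels could be divided among workers, and `amdahl_speedup` applies Amdahl's law under the pessimistic assumption that the ~60% I/O share of the runtime does not speed up at all while the ~40% CPU share scales perfectly.

```python
def split_channels(nchan: int, nworkers: int) -> list[tuple[int, int]]:
    """Divide channels 0..nchan-1 into contiguous (start, stop) ranges,
    one per worker, with sizes differing by at most one channel."""
    base, extra = divmod(nchan, nworkers)
    ranges = []
    start = 0
    for worker in range(nworkers):
        stop = start + base + (1 if worker < extra else 0)
        if stop > start:                 # skip idle workers when nchan < nworkers
            ranges.append((start, stop))
        start = stop
    return ranges


def amdahl_speedup(parallel_fraction: float, nworkers: int) -> float:
    """Amdahl's law: overall speedup when only parallel_fraction of the
    runtime is spread across nworkers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nworkers)


print(split_channels(10, 4))

cpu_fraction = 0.4                       # from the ~60:40 I/O:CPU split on the slide
for n in (2, 8, 64):
    print(n, "workers:", round(amdahl_speedup(cpu_fraction, n), 2))
print("upper bound:", round(1.0 / (1.0 - cpu_fraction), 2))
```

Under this assumption, adding cores alone cannot give more than a factor of about 1.7, which is why the mitigation list targets the I/O side (fast file systems, more efficient I/O in code) as well as multi-core execution.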

CASA development priorities

• Support of ALMA and EVLA commissioning needs

• Parallelization and cluster fine-tuning for imaging and flagging

• Working on combining Torque resource manager with Python scripting in CASA

• Improvements needed for polarization calibration of linear feeds

• Improvements to calibration table plotting (incorporate into plotms)

• Planet models for use as resolved calibrators

• Splatalogue search capabilities (including offline database) and overplotting

• Viewer improvements (especially for spectral line plotting and analysis)

• Improvements to image analysis tasks

• Improvements to “TV” based flagging in the Viewer (on-the-fly spectral and time averaging)

• A CARMA miriad filler (through partnership with Peter Teuben at U. Maryland)

• Expanded and more modularized simulation capabilities.
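As a rough idea of what combining Torque with Python scripting might look like, the sketch below builds a Torque/PBS batch script that runs a CASA script non-interactively. This is an assumption-laden illustration, not the NAASC implementation: the job name, script name, and resource values are hypothetical, and the command line assumes the `casapy` executable with its `--nologger`, `--nogui` and `-c` options is available on the cluster nodes.

```python
def make_pbs_script(job_name: str, casa_script: str,
                    nodes: int = 1, ppn: int = 8,
                    walltime: str = "02:00:00") -> str:
    """Return the text of a Torque/PBS job script that runs one CASA script."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                    # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",       # nodes and cores per node
        f"#PBS -l walltime={walltime}",           # wall-clock limit
        "cd $PBS_O_WORKDIR",                      # run from the submission directory
        # Assumed invocation: headless CASA executing a Python script
        f"casapy --nologger --nogui -c {casa_script}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job and script names, for illustration only
script = make_pbs_script("es_reimage", "reimage.py")
print(script)
```

The resulting text would be written to a file and submitted with Torque's `qsub`; generating it from Python keeps the resource request and the CASA script under the control of a single driver script.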

NAASC advanced tools

• The NAASC staff will push some of the ALMA-related software development items as Splatalogue & Simdata reach completion. We will also be hiring an additional developer.

– For example, image cube visualization and analysis are areas which will likely require work.

– Can’t do this all ourselves, so will aim to be responsive to community suggestions and contributions, incorporating some into CASA and posting others as “contributed software”.

Summary

• Within 1 year expect significant data from ALMA, at a data rate comparable to that from e.g. HST, Spitzer.

• Within 3 years, data rate will exceed by more than an order of magnitude that from any other PI-driven telescope apart from the EVLA.

• Must continue to be focused on the challenges and opportunities this presents.

Backup slides

CASA tutorial examples

3C391 polarization (EVLA)

M99 moment maps (CARMA)