ORIGAMI AND GIPS: RUNNING A HYPERSPECTRAL SOUNDER PROCESSING SYSTEM ON A LIGHTWEIGHT ON-DEMAND DISTRIBUTED COMPUTING FRAMEWORK

Maciej Smuga-Otto, Raymond Garcia, Graeme Martin, Bruce Flynn, Robert Knuteson
Space Science and Engineering Center, University of Wisconsin - Madison

Integration of Origami and GIPS drove the creation of an XML interface for algorithms hosted by the system.

[Diagram: algorithm integration and deployment flow. Science users provide algorithm source code together with an XML interface description. An interface file generator turns the description into algorithm interface files, which are compiled with the source code into algorithm binaries; a deployment script then deploys the binaries for use by the task dispatcher.]
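The poster does not reproduce an actual interface file, but a minimal sketch of what such an XML interface description might contain is shown below. All element and attribute names are assumptions chosen for illustration, not the GIPS schema; the point is that inputs, outputs, parameters, and resource needs are declared once and consumed both by the interface file generator and by the workflow engine.

    <!-- Hypothetical sketch of an XML algorithm interface description.
         Element and attribute names are illustrative only, not the GIPS schema. -->
    <algorithm name="radiometric_calibration" version="0.1">
      <inputs>
        <array id="complex_spectra" type="complex64" dims="spectrum,pixel"/>
        <record id="calibration_context" source="pipeline_reference_db"/>
      </inputs>
      <outputs>
        <array id="calibrated_spectra" type="float32" dims="spectrum,pixel"/>
      </outputs>
      <parameters>
        <parameter id="band" type="int"/>
        <parameter id="scan_direction" type="string" optional="true"/>
      </parameters>
      <resources cpu_cores="1" memory_mb="2048"/>
    </algorithm>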

Origami step-by-step: workflow for processing and visualizing large volumes of data

1. The user chooses a desired product and searches for input data.
2. A dispatcher process on the cluster distributes the work order among nodes.
3. Cluster nodes run the job, accessing raw data on the SAN.
4. The user is notified of the completed work order.
5. The user launches visualization of the product (e.g. McIDAS-V).

[Diagram: centralized computing resources (compute cluster, SAN, metadata server) and the user's visualization workstation.]
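To make the step-by-step flow concrete, here is a minimal client-side sketch of how a user-facing tool might drive such a dispatcher over HTTP. The host name, endpoint paths, and payload fields are all hypothetical, since the poster does not specify Origami's actual interface; only the sequence of interactions mirrors the numbered steps above.

    # Hypothetical sketch of a client driving an Origami-style dispatcher.
    # The URL, endpoints, and JSON fields are illustrative only.
    import time
    import requests

    DISPATCHER = "http://origami.example.edu:8080"   # hypothetical dispatcher address

    # 1. Choose a desired product and search for input data.
    inputs = requests.get(f"{DISPATCHER}/search",
                          params={"product": "calibrated_spectra",
                                  "start": "2006-08-01T00:00Z",
                                  "end": "2006-08-01T01:00Z"}).json()

    # 2. Submit a work order; the dispatcher distributes it among cluster nodes.
    order = requests.post(f"{DISPATCHER}/work_orders",
                          json={"product": "calibrated_spectra",
                                "inputs": inputs}).json()

    # 3-4. Cluster nodes run the job against raw data on the SAN;
    #      poll until the work order is reported complete.
    while True:
        status = requests.get(f"{DISPATCHER}/work_orders/{order['id']}").json()
        if status["state"] in ("complete", "failed"):
            break
        time.sleep(30)

    # 5. Hand the resulting product to a visualization tool such as McIDAS-V.
    print("work order", order["id"], "finished:", status["state"])
    print("product available at", status.get("product_url"))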

At the heart of GIPS is an algorithm pipeline that produces calibrated radiances from raw imaging interferometer data; it is a candidate heavyweight ground processing task.

[Diagram: GIPS calibration pipeline.] The stream of incoming observations (interferograms with metadata) is separated by band, and optionally by scan direction and pixel, and then passes through the following stages:

• Initial FFT stage (interferogram to spectrum): interferograms -> complex spectra.
• Nonlinearity and tilt correction (optional), paired with a Nonlinearity Study stage (research): complex spectra -> complex spectra.
• Radiometric Calibration stage, paired with a Radiometric Calibration Study stage: complex spectra -> real calibrated spectra.
• Spectral Resampling stage, paired with a Spectral Calibration Study stage (research): real calibrated spectra -> real calibrated resampled spectra.

The study stages create calibration contexts from reference observations such as blackbody views; each processing stage then uses its context to apply the correction or to calibrate the data. Pipeline Reference Database instances retrieve records for the time, band, scan direction, and (optionally) pixel index of incoming observations, and reference (audit) metadata is bundled with the outgoing calibrated spectra.
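Read as data flow, the pipeline amounts to a chain of per-band transforms, each parameterized by a context derived from reference observations. The sketch below is a schematic rendering of that flow in Python/NumPy; the function bodies are placeholders and all names are assumptions for illustration, not the GIPS implementation.

    # Schematic sketch of a GIPS-style calibration chain (illustrative only).
    import numpy as np

    def initial_fft(interferograms):
        """Ifg to Spectrum: transform interferograms to complex spectra."""
        return np.fft.rfft(interferograms, axis=-1).astype(np.complex64)

    def correct_nonlinearity(complex_spectra, context):
        """Optional nonlinearity/tilt correction using a study-derived context."""
        return complex_spectra * context["nonlinearity_gain"]

    def radiometric_calibration(complex_spectra, context):
        """Use a blackbody-derived context to produce real calibrated spectra."""
        calibrated = (complex_spectra - context["offset"]) * context["slope"]
        return calibrated.real.astype(np.float32)

    def spectral_resampling(calibrated_spectra, context):
        """Resample calibrated spectra onto a common wavenumber grid."""
        src, dst = context["source_grid"], context["target_grid"]
        return np.array([np.interp(dst, src, row) for row in calibrated_spectra],
                        dtype=np.float32)

    def run_pipeline(interferograms, contexts):
        """Chain the stages for one band (and optionally one scan direction/pixel)."""
        spectra = initial_fft(interferograms)
        spectra = correct_nonlinearity(spectra, contexts["nonlinearity"])
        spectra = radiometric_calibration(spectra, contexts["radiometric"])
        return spectral_resampling(spectra, contexts["spectral"])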

Component-level architecture of the distributed processing system contributes to individual buy vs. build decisions.

[Diagram: component-level architecture, mapping each tier's responsibilities to candidate off-the-shelf or custom technologies.]

• Workstation: data discovery, visualization (McIDAS-V; MATLAB, IDL).
• Service layer: scientific workflow, data provenance tracking, metadata management (web portals, custom databases, custom scripts, Globus and other grid technologies).
• Computing cluster: queueing and resource management (Sun Grid Engine, OpenPBS), computational payload (GIPS, GEOCAT, LEOCAT, individual binaries).
• Storage system: data storage management, spanning storage area networks (XSAN), federated data services (THREDDS, SRB), and distributed volumes (LUSTRE).
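One way the service layer might hand a computational payload to the cluster's queueing and resource management tier is sketched below, assuming a Sun Grid Engine installation whose qsub command is on the PATH. The job script contents, paths, and GIPS command-line flags are hypothetical; only the pattern (generate a job script, submit it to the queue) is the point.

    # Hypothetical sketch: service layer submitting a GIPS payload to a queueing
    # system (here Sun Grid Engine via qsub). Paths and flags are illustrative.
    import subprocess
    import tempfile

    JOB_TEMPLATE = """#!/bin/sh
    #$ -N {name}
    #$ -o {workdir}/job.out
    #$ -e {workdir}/job.err
    cd {workdir}
    {gips_binary} --interface {interface_xml} --input {input_path}
    """

    def submit_job(name, workdir, gips_binary, interface_xml, input_path):
        """Write a job script and submit it with qsub, returning qsub's reply."""
        script = JOB_TEMPLATE.format(name=name, workdir=workdir,
                                     gips_binary=gips_binary,
                                     interface_xml=interface_xml,
                                     input_path=input_path)
        with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
            f.write(script)
            path = f.name
        result = subprocess.run(["qsub", path], capture_output=True, text=True)
        return result.stdout.strip()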

The scientific workflow need we are addressing involves the retrieval, processing, and visualization of large volumes of data, often on the order of terabytes, associated with hyperspectral sounder-imager instruments.

Science workstations, compute clusters, and storage area networks are the hardware components available to us. Likewise, certain commodity software packages for visualization, queueing, data storage management, and grid resource management are readily available. While off-the-shelf scientific workflow management and scientific programming packages also exist, they are often proprietary and impose serious architectural constraints on the rest of the system.

Processing raw data from the Geosynchronous Imaging Fourier Transform Spectrometer (GIFTS) instrument was the principal motivation for this work. Initially the GIFTS Information Processing System (GIPS) was meant to cover various aspects of distributed computing as well as hosting the GIFTS algorithms. Eventually, it was down-scoped to handle provisioning of the algorithms with appropriate data and metadata. This form of GIPS is suitable for running on a single compute node, and the need arose to find a distributed computing solution to match.

Origami is the outcome of a parallel effort in distributed computing at the SSEC: it is an experiment in rapid prototyping of distributed science workflows using lightweight web components. As such, it can be extended to serve as a glue technology between GIPS running on individual nodes and the other components of a true distributed computing system.

To make GIPS compatible with an external distribution and workflow manager such as Origami, we found it beneficial to convert all algorithm interfaces to a common XML format. This XML interface file would be converted to an API-level interface for compiling into GIPS, and would also be used by the Origami engine to make workflow and scheduling decisions.
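As an illustration of how a workflow manager could consume the same interface file that is compiled into GIPS, the following sketch reads a hypothetical XML interface description and extracts the fields a scheduler might care about. The element names match the earlier XML sketch and are assumptions, not the actual schema.

    # Hypothetical sketch: reading an XML algorithm interface description to
    # inform workflow and scheduling decisions. Element names are illustrative.
    import xml.etree.ElementTree as ET

    def load_interface(path):
        """Return the inputs, outputs, and resource needs of one algorithm."""
        root = ET.parse(path).getroot()
        resources = root.find("resources")
        return {
            "name": root.get("name"),
            "inputs": [e.get("id") for e in root.find("inputs")],
            "outputs": [e.get("id") for e in root.find("outputs")],
            "cpu_cores": int(resources.get("cpu_cores", "1")),
            "memory_mb": int(resources.get("memory_mb", "0")),
        }

    # A dispatcher could chain algorithms by matching outputs to inputs, and
    # place each job on a node with at least cpu_cores and memory_mb free.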

This XML interface style has already spread to GEOCAT and LEOCAT, two other satellite data processing frameworks being developed at the SSEC. These frameworks use a fundamentally similar XML interface format, along with the suite of tools for managing it.
