

Overview

Introduction
Missions
Evolution
Philosophy
Software
Components and Subsystems:

> RDBMS
> Ingest
> VDC/Scheduler
> Distribution

Scientific Support
Browser

Ocean Data Processing System (ODPS)

Introduction

The ODPS is an automated data system that provides ingest, processing, archive, and distribution functions for legacy, operational, and future remote-sensing satellite missions.

Legacy Missions:

> CZCS Oct 1978 – Jun 1986

> OCTS Nov 1996 – Jun 1997

Operational Missions:

> Aqua-MODIS Jul 2002 – present

> MERIS Mar 2002 – present

> SeaWiFS Sep 1997 – present

> Terra-MODIS Feb 2000 – present

Future Missions:

> Aquarius

> Glory

> NPP VIIRS

Evolution

Originally developed between 1991 and 1996 to support SeaWiFS

Support for OCTS added in 1996

Delivered to MODIS project to serve as the MODIS Emergency Backup System (MEBS) in 1997

Complete system redesign and rewrite 2003-2004

Delivered to GISS in 2008 to support Glory mission

Multiple evolutionary cycles in response to changes in hardware infrastructure and support-function requirements

> Began on early multi-processor SGI IRIX systems
> Ported to Linux in 2000
> Processing concurrency increased from 30 to over 500
> Distribution functions added in 2004
> Storage evolution
> Validation targets

Philosophy

Adaptive framework that allows any standalone program to be incorporated as a system job

Loosely coupled, modular subsystems

> Ease of maintenance
> Development and testing alongside production
> Subsystem swapping

Standardized coding practices minimize impact of operating-system upgrades

> SGI IRIX to Linux
> 32-bit to 64-bit
> Strict GSFC IT requirements necessitate more-frequent OS updates

Software lifecycle of requirements analysis, rapid-prototype development, and refinement allows new concepts to be quickly developed and adopted for operational use

> Data subscriptions and orders

Ingest and Distribution Statistics

ODPS currently manages over 20 million files in its archive, about 1.06 petabytes

Daily ingests:

> 576 MODIS-L0 granules, 120 GB (60 GB each for Aqua and Terra)
> 2 SeaWiFS recorder dumps, 200 MB each
> 2-3 SeaWiFS HRPT (direct broadcast) passes, 50 MB each
> 5-6 MERIS-L1 granules, 1 GB each

Distribution (Oct 2010):

> 978 orders; 650,786 files; 5.2 TB
> 473 active subscriptions; 576,346 files staged

Proprietary Software

RDBMS

Sybase Adaptive Server Enterprise 15.0.3

Sybase Open Client CT Library

Sybase Transact-SQL

Processing

IDL (limited use)

Open Source Software

Framework

GCC 4.x

Perl 5

Perl DBI module with Sybase driver

OpenMotif 2.x

Bash

Image Generation

GMT

ImageMagick

NetPbm

Octave

Version Control

Subversion

Software Subsystems

> VDC/Scheduler
> Data Acquisition and Ingest
> Archive Device Manager
> Data Distribution
> RDBMS
> File Management and Migration
> Level-3 Scheduler

Components and Subsystems: RDBMS

Primary element that manages all system activity

Core databases support generic system framework, data ingest, processing, file management, and distribution functions

Mission databases house mission-specific data and procedures

High level of reuse possible for similar missions, e.g. MODIS Aqua/Terra, SeaWiFS, and OCTS are all ocean-color missions with similar product suites, data flows, and processing requirements

Database and transaction-log dumps performed regularly and stored in three different locations

Clone of database-server hardware and OS maintained as a warm backup
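The core/mission database split above can be pictured as a lookup with fallback: a mission database may supply its own data or procedures, and anything it does not override is inherited from the generic core. The ODPS databases are Sybase Transact-SQL; the sketch below is illustrative Python only, with all names hypothetical.

```python
# Illustrative only: mission-specific databases override or inherit
# generic-core behavior. A lookup tries the mission database first,
# then falls back to the shared core (all names hypothetical).

CORE_PROCEDURES = {
    "ingest_granule": "core.ingest_granule",
    "archive_file": "core.archive_file",
}

def resolve_procedure(name, mission_procedures):
    """Prefer a mission-specific procedure; fall back to the generic core."""
    return mission_procedures.get(name, CORE_PROCEDURES.get(name))
```

This is what makes a new ocean-color mission cheap to add: only the procedures that actually differ need a mission-specific copy.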


Generic Core Databases:

> Admin
> Catalog
> Dataflow
> Processing

Mission-Specific Databases:

> MODIS Aqua
> MODIS Terra
> OCTS
> SeaWiFS
> CZCS
> Aquarius
> VIIRS
> New Mission


Database Services Layer stack (top to bottom):

> Perl scripts and C programs
> Perl DBI module and C interface functions
> Database Services Layer
> Vendor client library and vendor library module
> RDBMS

Goal: Isolate the RDBMS from the system software. To use a different RDBMS vendor, swap in a new Database Services Layer.
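The vendor-isolation idea can be sketched as an abstract interface that the rest of the system codes against. The real layer is C and Perl over the Sybase client libraries; this is a minimal Python illustration with stubbed vendors, and every name in it is hypothetical.

```python
from abc import ABC, abstractmethod

class DatabaseServices(ABC):
    """System software talks only to this layer, never to a vendor API."""

    @abstractmethod
    def query(self, sql: str) -> list:
        ...

class SybaseServices(DatabaseServices):
    """Would wrap the Sybase Open Client CT library; stubbed here."""

    def query(self, sql: str) -> list:
        return [("sybase", sql)]

class PostgresServices(DatabaseServices):
    """Changing vendors means swapping in a class like this; callers are untouched."""

    def query(self, sql: str) -> list:
        return [("postgres", sql)]

def list_missions(db: DatabaseServices) -> list:
    # Application code depends only on the abstract interface.
    return db.query("SELECT name FROM missions")
```

Because `list_missions` sees only `DatabaseServices`, either vendor class can be passed in without changing application code.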

Subsystems: Ingest

Data types and sources are described in the database

Active, passive, and periodic notification methods

> Active method scans remote systems for new files
> Passive method handles messages for new files
> Periodic method schedules transfers of files at specified intervals

File transfers performed by ingest daemons and scheduler tasks

FTP, RCP, SCP, SFTP, and HTTP transfer protocols supported

Generic file transfer process hands off to data-specific post-transfer scripts
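The "active" notification method above can be sketched as a poll: compare a remote listing against the catalog of already-ingested files and queue anything new for transfer. The real ingest daemons are Perl/C; this Python sketch is illustrative and all names are hypothetical.

```python
# Hypothetical sketch of the "active" notification method: scan a
# remote listing for files not yet in the catalog and queue them
# for transfer over one of the supported protocols.

def scan_for_new_files(remote_listing, catalog):
    """Return files present remotely but not yet ingested."""
    return sorted(set(remote_listing) - set(catalog))

def queue_transfers(remote_listing, catalog, transfer_queue):
    """One polling pass: append (protocol, file) entries for new files."""
    for name in scan_for_new_files(remote_listing, catalog):
        transfer_queue.append(("ftp", name))  # protocol chosen per data source
    return transfer_queue
```

After each transfer completes, control would hand off to the data-specific post-transfer script, as described above.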

Ingest: Flowchart

Subsystems: VDC/Scheduler

Visual Database Cookbook (VDC)

> Prototype developed in 1991
> Four separate programs
> Originally a distributed model

Runs in a daemon-like state on each server on which processing or supporting jobs need to run

Two main functions:

Task Scheduler – Run high-level jobs (tasks) that support a variety of system functions

Processing Engine – Run processing streams, typically scientific programs, sequenced into steps such as L0->L1, L1->L2, etc.

Greedy client model adopted in 2004

Unification of task scheduler and processing engine in 2007

VDC Function: Scheduler

Primary system element responsible for coordinating most of the system activity

Monitors task records in a to-do list database table and runs tasks according to defined attributes

> Manual> Periodic> Timed> Triggered

Standard job-shell interface allows new programs to be quickly adapted for Scheduler control

Tasks may be bound to specific hosts or claimed by any available host in the processing group
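One pass of the Scheduler over its to-do list table can be sketched as follows, with one predicate per task attribute (manual, periodic, timed, triggered). This is an illustrative Python sketch, not the Perl/C implementation, and all field names are hypothetical.

```python
# Illustrative sketch of the Scheduler's to-do-list polling.
# Each task record carries a run attribute: manual, periodic,
# timed, or triggered.

def is_runnable(task, now, events):
    attr = task["attribute"]
    if attr == "manual":
        return task.get("requested", False)           # operator requested a run
    if attr == "periodic":
        return now - task["last_run"] >= task["interval"]
    if attr == "timed":
        return now >= task["run_at"]
    if attr == "triggered":
        return task["trigger"] in events              # e.g. a new-data event
    return False

def poll_todo_list(tasks, now, events):
    """One Scheduler pass: return the names of tasks to launch this cycle."""
    return [t["name"] for t in tasks if is_runnable(t, now, events)]
```

Each launched task would then go through the standard job-shell interface, which is what lets new programs come under Scheduler control quickly.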


Scheduler flowchart: Daily Tasks for the current day populate the To-do List; the VDC/Scheduler runs each task through the Task Shell; user input via the SCHEDMON GUI.

VDC Function: Processing Engine

Scalable infrastructure for concurrent processing of serial streams (e.g. L0 -> L1A -> L1B -> L2)

Each instance of the VDC Engine actively competes for jobs that it is allowed to run based on priority, length of time in the queue, and processing weight

Uses recipes to encapsulate data-specific processing schemes, parameters, and pre-processing rules

Virtual Processing Units (VPUs) serve as distinct processing resources and are allocated based on available time, current OS load, and processing weight

Comprehensive processing priorities allow high-priority real-time data to be handled ahead of lower-priority processing

Standard job-shell interface allows new scientific programs to be quickly adapted as recipe steps

Captures system boot time and monitors OS load

Invokes recipe steps and monitors step-execution time

Handles operator-requested stream actions

Performs flushing operations on completed tasks and streams
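The greedy client model described above can be sketched as each Engine instance claiming the best job it is entitled to run, ranked by priority and time in queue, subject to its free processing weight. This is an illustrative Python sketch only; all field names are hypothetical.

```python
# Hypothetical sketch of the greedy-client model: each VDC Engine
# instance competes for queued jobs it is allowed to run, ranked by
# priority and time in queue, limited by its free processing weight.

def claim_job(queue, free_weight, now):
    """Claim the best eligible job, removing it from the shared queue."""
    eligible = [j for j in queue if j["weight"] <= free_weight]
    if not eligible:
        return None
    # Higher priority first; among equals, longest-waiting first.
    best = max(eligible, key=lambda j: (j["priority"], now - j["queued_at"]))
    queue.remove(best)
    return best
```

In the real system the claim is arbitrated through the database so that exactly one Engine instance wins each job; that locking is omitted here.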

VDC: Rule Manager

Runs in a daemon-like state

Polls jobs in the processing queue and runs the pre-processing rule procedures

Promotes job status when all rule procedures complete successfully

Governed by currently configured processing priorities

Primarily used for matching proper ancillary data with granules in the processing queue
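The Rule Manager's promote-when-all-rules-pass behavior can be sketched as below; the ancillary check is a stand-in for the real matching of ancillary files to a granule's time range. Illustrative Python only; all names are hypothetical.

```python
# Illustrative sketch of the Rule Manager: run every pre-processing
# rule procedure for a job and promote its status only when all pass.

def ancillary_available(job):
    # Stand-in for matching ancillary data files to the granule.
    return job.get("ancillary") is not None

def run_rules(job, rules):
    """Apply each rule procedure; promote the job only if every rule passes."""
    if all(rule(job) for rule in rules):
        job["status"] = "ready"
    return job
```

Jobs promoted to "ready" are the ones MakeVDC later turns into VDC job files.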

VDC: MakeVDC

Polls processing queue for jobs that have met pre-processing requirements

Generates VDC job files from recipe templates according to configured priorities and populates the VDC queue

Runs as a Scheduler task, so it can easily be configured to run as often as needed to keep the VDC queue full

VDC: Flowchart

VDCMON GUI

Subsystems: Distribution

Interactive, web-based Data Ordering System, currently supporting Aqua and Terra MODIS, CZCS, OCTS, and SeaWiFS

Data Subscription System, currently supporting Aqua and Terra MODIS and SeaWiFS, allows users to define region and products of interest

Order and Subscription Manager daemons poll the order and subscription queues and stage files on FTP servers (stage rate ~12 GB/hr)

Near-real-time data extraction and image support

Web-CGI applications that allow users to view and update their orders and subscriptions
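One Subscription Manager pass can be sketched as matching newly processed granules against each active subscription's region and product list, then staging the matches. Illustrative Python only; field names and the point-in-box region test are simplifying assumptions.

```python
# Hypothetical sketch of a Subscription Manager pass: match new
# granules against each subscription's region and product list and
# return the (subscriber, file) pairs to stage on the FTP servers.

def granule_matches(granule, sub):
    lat, lon = granule["lat"], granule["lon"]
    s, n, w, e = sub["region"]           # (south, north, west, east)
    return (granule["product"] in sub["products"]
            and s <= lat <= n and w <= lon <= e)

def stage_pass(granules, subscriptions):
    """Return (subscriber, file) pairs to stage this cycle."""
    return [(sub["user"], g["file"])
            for sub in subscriptions
            for g in granules
            if granule_matches(g, sub)]
```

A real granule footprint is a polygon rather than a point, so the production match is more involved than this box test.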

Distribution: Flowchart

Users submit data orders, data subscriptions, and regional extraction and map requests; the Order Manager, Subscription Manager, and Extraction and Mapping Recipe stage results on local distribution servers, with data and images optionally pushed to users.

Scientific Support

24/7 operational support for forward-stream processing

> 9-to-5 staffing

> Extended lights-out periods

> No unscheduled downtime in the past year due to system-software faults

Support algorithm/calibration testing alongside production

> Product suites

> Test recipes

> Alternate tags in science-software repository

> Processing priorities

Non-standard processing requests

> Regional L3 processing

> Great Barrier Reef research

> Mozambique Whale Shark research

> GMT Intermediate Coastline

> Aquarius Simulation

OceanColor Web

oceancolor.gsfc.nasa.gov

Consolidated data access, information, services, and community feedback
