Space Physics Interactive Data Resource – SPIDR

Mikhail ZHIZHIN (Geophysical Center Russian Acad. Sci.)
Eric KIHN (National Geophysical Data Center NOAA)
Dmitry MEDVEDEV (Geophysical Center Russian Acad. Sci.)
Rob REDMON (National Geophysical Data Center NOAA)
Dmitry MISHIN (Institute of Physics of the Earth Russian Acad. Sci.)

Upload: harry-crispin

Post on 15-Dec-2015



50 years ago – International Geophysical Year – IGY1957

[Diagram labels: Mail; Sun and space; Solid Earth; Meteo; Satellites; World Data Center A; World Data Center B; World Data Center C]

Total data volume ~ 1 Gb

Exchange ~ 1 Mb/year

Yesterday – databases, Internet, web – Y2K

[Diagram: nine nodes, each labeled "Data Resource"]

Total data volume ~ 1 Tb

Exchange ~ 1 Gb/year

Tomorrow – Electronic Geophysical Year – EGY2007

[Diagram: eight nodes labeled "Data Resource" and one labeled "GRID"]

Total data volume ~ 1 Pb

Exchange ~ 1 Tb/year

SPIDR mission

SPIDR is a de facto standard data source on solar-terrestrial physics, functioning within the framework of the ICSU World Data Centers.

It is a distributed database and application server network, built to select, visualize and model historical space weather data distributed across the Internet.

SPIDR can work as a fully functional web application (portal) or as a grid of web services that provide functions for other applications to access its data holdings.

SPIDR databases

Currently SPIDR archives include:
• solar activity and solar wind data,
• geomagnetic variations and indices,
• ionospheric, cosmic ray, and radio-telescope ground observations,
• telemetry and images from NOAA, NASA, and DMSP satellites.

SPIDR database clusters and portals are installed in the USA, Russia, China, Japan, Australia, South Africa, and India.

SPIDR components

[Diagram labels:
• Web Portal: Workflow, Data Ingest, Mining, Visualization and Delivery
• Virtual Community of Registered Users
• Virtual Observatory Metadata
• Virtual Data Sources
• Authenticate; Find event; Get data
• User results; queries]

The SPIDR portal combines the central XML metadata repository with a set of distributed data web services and data file collections. A user can search for data using the metadata inventory, use a persistent data basket to save the selection for the next session, and plot or download the selected data in parallel in different formats, including XML and NetCDF.

Metadata catalog of data services

Selections from different data services plotted in parallel

Satellite orbits navigator

FTP data file repository viewer

Data service: common data model serialization + URL

[Diagram labels:
• Local user workstation: SpidrClient; Data request; Download; Save to disk; Local filename; local copy of Datafile
• Remote SPIDR server: WS DataService; SQL; Databases; Subsetting; Formatting; Datafile; Datafile URL]

All grid data services in SPIDR share the same Common Data Model and compatible metadata schema.
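A shared model with a common serialization can be sketched as follows. This is only an illustration under stated assumptions, not SPIDR's actual Common Data Model: the class `TimeSeries`, its fields, the XML element names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import xml.etree.ElementTree as ET


@dataclass
class TimeSeries:
    """Hypothetical common container that every data service would return."""
    parameter: str   # name of the measured quantity, e.g. "Kp"
    station: str     # observatory or satellite identifier
    units: str
    samples: List[Tuple[str, float]] = field(default_factory=list)  # (ISO time, value)

    def to_xml(self) -> str:
        """Serialize to a small XML document (element names are made up)."""
        root = ET.Element("timeseries", parameter=self.parameter,
                          station=self.station, units=self.units)
        for time, value in self.samples:
            ET.SubElement(root, "sample", time=time).text = repr(value)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "TimeSeries":
        """Rebuild the same object from its XML form."""
        root = ET.fromstring(text)
        samples = [(s.get("time"), float(s.text)) for s in root.findall("sample")]
        return cls(root.get("parameter"), root.get("station"),
                   root.get("units"), samples)


# Made-up sample values, used only to show the round trip.
series = TimeSeries("Kp", "global", "index",
                    [("2004-11-08T00:00:00Z", 8.0), ("2004-11-08T03:00:00Z", 8.7)])
restored = TimeSeries.from_xml(series.to_xml())
```

Because every service would emit the same structure, a client needs one parser regardless of which database answered the request.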

Local and/or remote data service: output data stream

[Diagram labels:
• Local database via JDBC: SPIDR Web application; JDBC; Table 1; Common Data Model
• Remote database via Web Service: SPIDR Web application; SPIDR WS client; SOAP; Service container; Data service; JDBC; Table 2; Common Data Model]

A local data source can be used through the JDBC protocol at the same time as a remote data service through the SOAP protocol. The SPIDR configuration defines which protocol is used for each source.
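The configuration-driven choice between local and remote access can be sketched like this. It is a minimal illustration, not SPIDR code: `sqlite3` stands in for a JDBC connection, a plain callable stands in for a SOAP client, and the class names, the table names, and the configuration format are assumptions.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, float]


class DataSource(ABC):
    """Common interface, so the application does not care where data lives."""
    @abstractmethod
    def fetch(self, table: str) -> List[Row]: ...


class LocalDatabaseSource(DataSource):
    """Direct database access; sqlite3 stands in for a JDBC connection."""
    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection

    def fetch(self, table: str) -> List[Row]:
        # Table names cannot be bound as SQL parameters, so validate them.
        if not table.isidentifier():
            raise ValueError(f"invalid table name: {table!r}")
        query = f"SELECT time, value FROM {table} ORDER BY time"
        return self.connection.execute(query).fetchall()


class RemoteServiceSource(DataSource):
    """Remote access; `call` stands in for a SOAP client invocation."""
    def __init__(self, call: Callable[[str], List[Row]]):
        self.call = call

    def fetch(self, table: str) -> List[Row]:
        return self.call(table)


def build_sources(config: Dict[str, str], local: DataSource,
                  remote: DataSource) -> Dict[str, DataSource]:
    """Map each table to the source type named in the configuration."""
    by_protocol = {"jdbc": local, "soap": remote}
    return {table: by_protocol[protocol] for table, protocol in config.items()}


# Demo with made-up data: table1 is local, table2 is served remotely.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE table1 (time TEXT, value REAL)")
db.execute("INSERT INTO table1 VALUES ('2004-11-08T00:00:00Z', 8.0)")
fake_remote = {"table2": [("2004-11-08T00:00:00Z", -120.0)]}

sources = build_sources({"table1": "jdbc", "table2": "soap"},
                        LocalDatabaseSource(db),
                        RemoteServiceSource(lambda name: fake_remote[name]))
results = {table: source.fetch(table) for table, source in sources.items()}
```

Both sources are queried in the same loop, which is the point of hiding the protocol behind one interface.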

Data upload and synchronization: input data stream

[Diagram labels:
• Local user workstation: FileClient; local Filename; local copy of Datafile; Loader options; Upload
• Remote SPIDR server: WS FileService; Datafile; Parser; Loader; Databases; Loading log
• Mirror SPIDR server: Web Service; Sync]

A database administrator can upload new files into the SPIDR databases using the web services directly or through the web portal. SPIDR databases are self-synchronizing via the web services.
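The slides do not describe the synchronization algorithm. One plausible scheme, shown here purely as an assumption, compares per-file checksums recorded in each server's loading log and copies only what the mirror lacks. The `Server` class, the log format, and the file names are hypothetical.

```python
import hashlib
from typing import Dict, List


def checksum(content: bytes) -> str:
    """Fingerprint of a data file, recorded in the loading log."""
    return hashlib.sha256(content).hexdigest()


class Server:
    """Hypothetical SPIDR node: stores uploaded files and a loading log."""
    def __init__(self, name: str):
        self.name = name
        self.files: Dict[str, bytes] = {}
        self.loading_log: Dict[str, str] = {}  # filename -> checksum

    def upload(self, filename: str, content: bytes) -> None:
        self.files[filename] = content
        self.loading_log[filename] = checksum(content)


def synchronize(primary: Server, mirror: Server) -> List[str]:
    """Copy to the mirror every file whose log entry is missing or different."""
    copied = []
    for filename, digest in sorted(primary.loading_log.items()):
        if mirror.loading_log.get(filename) != digest:
            mirror.upload(filename, primary.files[filename])
            copied.append(filename)
    return copied


# Demo with made-up file contents.
primary, mirror = Server("primary"), Server("mirror")
primary.upload("kp_2004.txt", b"kp data")
primary.upload("dst_2004.txt", b"dst data")
mirror.upload("kp_2004.txt", b"kp data")

first_pass = synchronize(primary, mirror)   # copies only the missing file
second_pass = synchronize(primary, mirror)  # nothing left to copy
```

This sketch is one-directional; a two-way scheme would also need a rule for resolving conflicting versions.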

SPIDR metadata “compromise”

XML database (high-level, low-granularity metadata) = Virtual Observatory (VxO)
– Hierarchy of the data categories, key words, textual descriptions
– Methods and credentials to access the data (web service, ftp directory)
– User Forum for data quality and usability support

SQL database (low-level, high-granularity metadata) = Data Inventory
– Parameters (name, physical meaning, units of measurement, virtual formula) or database schema
– Availability and accreditation of the data (inventory)
– Visualization details (type of the plot and coordinate system, scales, labels)
– Input-output formats
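The two metadata levels can be illustrated with a small sketch: a keyword search over an XML catalog, followed by an availability check in an SQL inventory. The XML layout, the table schema, the dataset identifiers, and the year ranges below are all made up for illustration and are not taken from SPIDR.

```python
import sqlite3
import xml.etree.ElementTree as ET

# High-level, low-granularity metadata (hypothetical catalog layout).
CATALOG_XML = """
<catalog>
  <dataset id="geomag_kp" category="geomagnetic indices">
    <keywords>Kp planetary index</keywords>
    <access method="web-service"/>
  </dataset>
  <dataset id="iono_fof2" category="ionosphere">
    <keywords>foF2 critical frequency</keywords>
    <access method="ftp-directory"/>
  </dataset>
</catalog>
"""
catalog = ET.fromstring(CATALOG_XML)


def find_datasets(catalog, keyword):
    """High-level search: match a keyword against category and key words."""
    keyword = keyword.lower()
    hits = []
    for dataset in catalog.findall("dataset"):
        text = (dataset.get("category") + " "
                + dataset.findtext("keywords", "")).lower()
        if keyword in text:
            hits.append(dataset.get("id"))
    return hits


# Low-level, high-granularity metadata (hypothetical inventory schema).
inventory = sqlite3.connect(":memory:")
inventory.execute("CREATE TABLE inventory (dataset TEXT, parameter TEXT, "
                  "units TEXT, first_year INTEGER, last_year INTEGER)")
inventory.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?, ?)", [
    ("geomag_kp", "Kp", "index", 1932, 2006),   # illustrative year ranges
    ("iono_fof2", "foF2", "MHz", 1957, 2006),
])


def available(dataset, year):
    """Low-level lookup: which parameters exist for this dataset and year?"""
    return inventory.execute(
        "SELECT parameter, units FROM inventory "
        "WHERE dataset = ? AND ? BETWEEN first_year AND last_year",
        (dataset, year)).fetchall()


hits = find_datasets(catalog, "ionosphere")
params = available(hits[0], 1960)
```

The XML level answers "what kinds of data exist and how do I reach them", while the SQL level answers "is this parameter actually available for my time interval".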

High-level metadata search

Low-level database inventory

Simplistic for novice users, to be driven by Guru

Advanced user interface

System administrator interface

SPIDR usage tutorial

Data description and help

Different workflows and interfaces for different User groups

SPIDR homepage: http://spidr.ngdc.noaa.gov

Real-time usage statistics for a given time interval

User sessions per day

Total ~20 000 registered users

Per database requests for plot (red) and export (blue)

Numerical modeling on the Grid: Space Weather Reanalysis (SWR)

Input: ground and satellite data from SPIDR data services

Space weather numerical models

Output: high-resolution rendering of the near-Earth space

[Diagram labels:
• Data inputs: IMF; Kp; Dst; 10.7 cm Flux; HPI; Magnetometer; GOES
• Models: AMIE; TIEGCM; MSM
• Links between models: High Lat Elec; Geostationary Magnetic Field, Kp; Init Conditions
• SWR DATA: TEC, FoF2, Neutral Winds; Magnetic, Electric Potential, Etc.; Particle Data]

SWR Computer Resources
• 768 Intel Pentium 4 Xeon nodes (dual 2.2 GHz processors)
• Myricom Myrinet CLOS64 (2.4 Gbs)
• ADIC Fileserve MSS (100 Tbytes)
• NGDC was the #2 JET user for 2004-2005
• The SWR consumed 400,000+ CPU hours
• The SWR has produced over 2.5 Tb of data, which exceeds all of NGDC’s non-satellite holdings!

JET Supercomputer, FSL/NOAA, Boulder

The SWR requires a tremendous array of computer support in order to meet its goals. Challenges include sufficient CPU power, integrating distributed model runs, and storage space for input and output data sets. The SWR project makes use of shared time on FSL’s JET supercomputer as well as RAID- and Tivoli-based storage systems at NGDC NOAA.

SPIDR integration with VxO and Grid infrastructure

[Diagram labels:
• Web Middleware: Tomcat
• VxO Application Layer
• Grid Middleware: OGSA-DAI
• Metadata Services; DataSource Services; Model Analysis Services
• SPIDR ConnectionManager; XML DB ConnectionManager; AMIE Model ConnectionManager
• SQL DB cluster: MySQL
• native XML DB: eXist
• Parallel-AMIE on computer cluster]

Two reasons to move to the Grid middleware:

1. The digital certificates for security and authentication simplify inter-site communication

2. Processing large environmental archives requires an asynchronous web-service call mechanism
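The second reason can be illustrated with a generic asynchronous call pattern: the client submits a long-running request, immediately receives a ticket, and collects the result later. This sketch uses a local thread pool rather than OGSA-DAI or any grid middleware; `AsyncService`, its methods, and the stand-in task are hypothetical.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional


class AsyncService:
    """Hypothetical service front end: submit() returns a ticket at once."""
    def __init__(self, workers: int = 2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, task: Callable[[], float]) -> str:
        """Start the task in the background and hand back a ticket."""
        ticket = uuid.uuid4().hex
        self._jobs[ticket] = self._pool.submit(task)
        return ticket

    def status(self, ticket: str) -> str:
        """Non-blocking check that a client can poll."""
        return "done" if self._jobs[ticket].done() else "running"

    def result(self, ticket: str, timeout: Optional[float] = None) -> float:
        """Block until the task finishes (or the timeout expires)."""
        return self._jobs[ticket].result(timeout=timeout)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def archive_mean() -> float:
    """Stand-in for a long scan over a large archive (synthetic values)."""
    values = [float(v % 10) for v in range(100_000)]
    return sum(values) / len(values)


service = AsyncService()
ticket = service.submit(archive_mean)      # returns without waiting
mean = service.result(ticket, timeout=30)  # collect when convenient
service.close()
```

With a synchronous call the client connection would have to stay open for the whole scan; with a ticket, the client can disconnect and poll `status` later.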

Some conclusions

• Grid (web) data services accessible from SPIDR portal and a number of clients in Java, C#, Matlab, MS Excel

• Near-real time IMF, ionosphere and geomagnetic data input streams

• Data accreditation, FTP file repository synchronized with the database

• Metadata service with high-level data description and low-level data inventory

• Virtual Observatory and User Community functionality: forum, bookmarks, i-mail, external metadata services

• Integration with Web Map Services

• “Fork” of the SPIDR-based data resource on solid Earth

• “Proprietary” SPIDR common data model becomes limiting; a generic model like NetCDF is needed

• SPIDR as a resource on the Space Physics Grid