Space Physics Interactive Data Resource – SPIDR
Mikhail ZHIZHIN (Geophysical Center Russian Acad. Sci.)
Eric KIHN (National Geophysical Data Center NOAA)
Dmitry MEDVEDEV (Geophysical Center Russian Acad. Sci.)
Rob REDMON (National Geophysical Data Center NOAA)
Dmitry MISHIN (Institute of Physics of the Earth Russian Acad. Sci.)
50 years ago – International Geophysical Year – IGY1957
[Diagram labels: Sun and space, Solid Earth, Meteo, Satellites]
World Data Center A
World Data Center B
World Data Center C
Total data volume ~ 1 Gb
Exchange ~ 1 Mb/year
Yesterday – databases, Internet, web – Y2K
[Diagram: multiple Data Resource nodes]
Total data volume ~ 1 Tb
Exchange ~ 1 Gb/year
Tomorrow – Electronic Geophysical Year – EGY2007
[Diagram: multiple Data Resource nodes and the GRID]
Total data volume ~ 1 Pb
Exchange ~ 1 Tb/year
SPIDR mission
SPIDR is a de facto standard data source on solar-terrestrial physics, functioning within the framework of the ICSU World Data Centers.
It is a distributed database and application server network, built to select, visualize and model historical space weather data distributed across the Internet.
SPIDR can work as a fully functional web application (portal) or as a grid of web services that give other applications access to its data holdings.
SPIDR databases
Currently SPIDR archives include
• solar activity and solar wind data,
• geomagnetic variations and indices,
• ionospheric, cosmic rays, radio-telescope ground observations,
• telemetry and images from NOAA, NASA, and DMSP satellites.
SPIDR database clusters and portals are installed in the USA, Russia, China, Japan, Australia, South Africa, and India.
SPIDR components
Virtual Community of Registered Users
Virtual Observatory Metadata
Virtual Data Sources
Authenticate
Find event
Get data
User results
queries
Web Portal: Workflow, Data Ingest, Mining, Visualization and Delivery
The SPIDR portal combines a central XML metadata repository with a set of distributed data web services and data file collections. A user can search for data using the metadata inventory, save the selection in a persistent data basket for the next session, and plot or download the selected data in parallel in different formats, including XML and NetCDF.
Data service: common data model serialization + URL
SpidrClient
Datafile
Subsetting
Formatting
WS DataService
Databases
Datafile URL
Save to disk
local copy of Datafile
Local filename
Download
Remote SPIDR server
Local user workstation
SQL
Data request
All grid data services in SPIDR share the same Common Data Model and compatible metadata schema.
Local and/or remote data service: output data stream
[Diagram labels: SPIDR Web application, SPIDR WS client, Service container, Data service, SOAP, JDBC, Common Data Model, Table 1, Table 2]
Local database via JDBC
Remote database via Web Service
A local data source (over JDBC) and a remote data service (over SOAP) can be used at the same time. The protocol type is defined by the SPIDR configuration.
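The configuration-driven choice of protocol can be illustrated with a minimal Python sketch. It is not SPIDR code: the `Record` fields, the class names, the configuration keys, and the `example.org` endpoint are all hypothetical, and the JDBC and SOAP calls are replaced by in-memory stand-ins. It only shows two sources returning the same record type while a configuration entry picks the access path per table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One observation in a shared record type (hypothetical fields)."""
    station: str
    time: str
    value: float


class LocalDatabaseSource:
    """Stand-in for a local database that would be reached over JDBC."""
    protocol = "jdbc"

    def __init__(self, rows):
        self._rows = rows  # dict: table name -> list of Record

    def fetch(self, table):
        return list(self._rows.get(table, []))


class RemoteServiceSource:
    """Stand-in for a remote data service that would be reached over SOAP."""
    protocol = "soap"

    def __init__(self, endpoint, transport):
        self.endpoint = endpoint
        self._transport = transport  # callable(endpoint, table) -> list of Record

    def fetch(self, table):
        return self._transport(self.endpoint, table)


def build_sources(config, local_rows, transport):
    """Create one source per table, as chosen by the configuration."""
    sources = {}
    for table, entry in config.items():
        if entry["protocol"] == "jdbc":
            sources[table] = LocalDatabaseSource(local_rows)
        elif entry["protocol"] == "soap":
            sources[table] = RemoteServiceSource(entry["endpoint"], transport)
        else:
            raise ValueError(f"unknown protocol: {entry['protocol']}")
    return sources


def fake_transport(endpoint, table):
    # Simulates the remote call; a real client would send a SOAP request here.
    return [Record("remote-station", "2005-01-01T00:00", 2.0)]


CONFIG = {  # hypothetical configuration layout
    "table1": {"protocol": "jdbc"},
    "table2": {"protocol": "soap", "endpoint": "https://example.org/spidr-ws"},
}
LOCAL_ROWS = {"table1": [Record("local-station", "2005-01-01T00:00", 1.0)]}

sources = build_sources(CONFIG, LOCAL_ROWS, fake_transport)
merged = [rec for table in sorted(sources) for rec in sources[table].fetch(table)]
```

Because both sources return the same record type, the caller can merge results without knowing which protocol served each table.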
Data upload and synchronization: input data stream
FileClient
Datafile
Loader
Parser
WS FileService
Databases
Datafile
local copy of Datafile
local Filename
Upload
Remote SPIDR server
Local user workstation
Loader options
Loading log
Web Service
Mirror SPIDR server
Sync
A database administrator can upload new files into the SPIDR databases using the web services directly or through the web portal. SPIDR databases are self-synchronizing via the web services.
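The slides do not say how the synchronization works. As one possible reading, the sketch below assumes a one-way comparison of file checksums between a primary server and a mirror, with in-memory dictionaries standing in for the two servers. The function names and file names are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Content fingerprint used to detect changed files."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(primary: dict[str, bytes], mirror: dict[str, bytes]) -> list[str]:
    """Return names of files the mirror lacks or holds with different content."""
    return sorted(
        name
        for name, data in primary.items()
        if name not in mirror or checksum(mirror[name]) != checksum(data)
    )


def synchronize(primary: dict[str, bytes], mirror: dict[str, bytes]) -> list[str]:
    """Copy missing or changed files to the mirror and return a loading log."""
    log = []
    for name in plan_sync(primary, mirror):
        mirror[name] = primary[name]  # a real system would call an upload service here
        log.append(f"uploaded {name}")
    return log


primary = {"kp_2005.txt": b"kp new", "dst_2005.txt": b"dst"}
mirror = {"kp_2005.txt": b"kp old"}

first_log = synchronize(primary, mirror)   # copies both files
second_log = synchronize(primary, mirror)  # nothing left to copy
```

Running the synchronization twice shows that it is idempotent: the second pass finds nothing to upload.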
SPIDR metadata “compromise”
XML database (high level, low-granularity metadata) = Virtual Observatory (VxO)
– Hierarchy of the data categories, key words, textual descriptions
– Methods and credentials to access the data (web-service, ftp-directory)
– User Forum for data quality and usability support
SQL database (low level, high-granularity metadata) = Data Inventory
– Parameters (name, physical meaning, units of measurement, virtual formula) or database schema
– Availability and accreditation of the data (inventory)
– Visualization details (type of the plot and coordinate system, scales, labels)
– Input-output formats
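The two metadata levels can be sketched with Python's standard library: an XML document for the high-level descriptions and an in-memory SQLite table for the parameter inventory. The element names, table columns, and all values below are illustrative placeholders, not SPIDR's actual schema or holdings.

```python
import sqlite3
import xml.etree.ElementTree as ET

# High-level metadata: categories, keywords, access method (hypothetical layout).
VXO_XML = """
<observatory>
  <dataset id="geomag" access="web-service">
    <keyword>geomagnetic</keyword>
    <keyword>indices</keyword>
    <description>Geomagnetic variations and indices</description>
  </dataset>
  <dataset id="iono" access="ftp-directory">
    <keyword>ionosphere</keyword>
    <description>Ionospheric ground observations</description>
  </dataset>
</observatory>
"""


def find_datasets(xml_text, keyword):
    """Return ids of datasets tagged with the given keyword."""
    root = ET.fromstring(xml_text)
    return [
        ds.get("id")
        for ds in root.iter("dataset")
        if any(k.text == keyword for k in ds.iter("keyword"))
    ]


def build_inventory():
    """Low-level metadata: one row per parameter with placeholder coverage years."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        "dataset TEXT, parameter TEXT, units TEXT, "
        "first_year INTEGER, last_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?, ?, ?)",
        [
            ("geomag", "Kp", "index", 1932, 2006),
            ("geomag", "Dst", "nT", 1957, 2006),
            ("iono", "foF2", "MHz", 1950, 2006),
        ],
    )
    return conn


def parameters_available(conn, dataset, year):
    """Return (parameter, units) pairs that cover the requested year."""
    rows = conn.execute(
        "SELECT parameter, units FROM inventory "
        "WHERE dataset = ? AND ? BETWEEN first_year AND last_year "
        "ORDER BY parameter",
        (dataset, year),
    )
    return rows.fetchall()


datasets = find_datasets(VXO_XML, "geomagnetic")
conn = build_inventory()
available = parameters_available(conn, datasets[0], 1940)
```

A search first narrows the choice by keyword in the XML layer, then asks the SQL layer which parameters actually exist for the requested time.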
Simplistic for novice users to be driven by Guru
Advanced user interface
System administrator interface
SPIDR usage tutorial
Data description and help
Different workflows and interfaces for different User groups
SPIDR homepage: http://spidr.ngdc.noaa.gov
Real-time usage statistics for a given time interval
User sessions per day
Total ~20 000 registered users
Per database requests for plot (red) and export (blue)
IMF, Kp, Dst
10.7 cm Flux, HPI
Magnetometer, GOES
AMIE
TIEGCM
MSM
High Lat Elec
Geostationary Magnetic Field, Kp
Init Conditions
SWR DATA
TEC, FoF2, Neutral Winds
Magnetic, Electric Potential, Etc.
Particle Data
Input: ground and satellite data from SPIDR data services
Space weather numerical models
Output: high-resolution rendering of the near-Earth space
Numerical modeling on the Grid: Space Weather Reanalysis - SWR
SWR Computer Resources
• 768 Intel Pentium 4 Xeon Nodes (Dual 2.2 GHz Processors)
• Myricom Myrinet CLOS64 (2.4 Gbs)
• ADIC Fileserve MSS (100 Tbytes)
• NGDC was the #2 JET user for 2004-2005
• The SWR consumed 400,000+ CPU Hours
• The SWR has produced over 2.5 Tb of data, which exceeds all of NGDC’s non-satellite holdings!
JET Supercomputer, FSL/NOAA, Boulder
The SWR requires a tremendous array of computer support in order to meet its goals. Challenges include sufficient CPU power, integrating distributed model runs, and storage space for input and output data sets. The SWR project makes use of shared time on FSL’s JET supercomputer as well as RAID and Tivoli based storage systems at NGDC NOAA.
SPIDR integration with VxO and Grid infrastructure
Web Middleware: Tomcat
VxO Application Layer
Grid Middleware: OGSA-DAI
Metadata Services
DataSource Services
SPIDR ConnectionManager
XML DB ConnectionManager
AMIE Model ConnectionManager
SQL DB cluster: MySQL
native XML DB: eXist
ModelAnalysis Services
Parallel-AMIE on computer cluster
Two reasons to move to the Grid middleware:
1. The digital certificates for security and authentication simplify inter-site communication
2. Processing large environmental archives requires an asynchronous web-service call mechanism
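The second reason can be illustrated with a generic submit-then-collect pattern: the caller receives a job identifier immediately and fetches the result later, instead of holding a connection open while a large archive is processed. This is a minimal Python sketch using a thread pool, not OGSA-DAI or SPIDR code. `AsyncJobService` and `yearly_mean` are hypothetical names.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncJobService:
    """Accepts long-running requests and hands back a job id immediately."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}

    def submit(self, func, *args):
        """Start the work in the background and return its job id."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id):
        """Report 'running', 'done', or 'failed' without blocking."""
        future = self._jobs[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() else "done"

    def result(self, job_id, timeout=None):
        """Block until the job finishes (or the timeout expires) and return its value."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def yearly_mean(values):
    # Stand-in for an expensive pass over a large archive.
    return sum(values) / len(values)


service = AsyncJobService()
job = service.submit(yearly_mean, [1.0, 2.0, 3.0])
mean = service.result(job, timeout=10)
final_status = service.status(job)
service.shutdown()
```

In a web-service setting the job id would travel over the network and the client would poll `status` between other work. The thread pool here only stands in for that server-side queue.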
Some conclusions
• Grid (web) data services accessible from SPIDR portal and a number of clients in Java, C#, Matlab, MS Excel
• Near-real time IMF, ionosphere and geomagnetic data input streams
• Data accreditation, FTP file depositary synchronous with the database
• Metadata service with high-level data description and low-level data inventory
• Virtual Observatory and User Community functionality: forum, bookmarks, i-mail, external metadata services
• Integration with Web Map Services
• “Fork” of the SPIDR-based data resource on solid Earth
• “Proprietary” SPIDR common data model becomes limiting; a generic one like NetCDF is needed
• SPIDR as a resource on the Space Physics Grid