Post on 20-Dec-2015
SAN DIEGO SUPERCOMPUTER CENTER, UCSD SciR&D
A SCALABLE SYSTEM FOR ONLINE ACCESS TO NATIONAL AND LOCAL
REPOSITORIES OF HYDROLOGIC TIME SERIES
Ilya Zaslavsky, Reza Wahadj, David Valentine, Blair Jennings (San Diego Supercomputer Center, UCSD)
David Maidment (CRWR, UT-Austin)
and other HIS development partners from UT-Austin, Utah State U, Drexel U, Duke U
The Grid is becoming the backbone for collaborative science and data sharing
CI (cyberinfrastructure) is about RE-USING data and research resources!
CI Vision for Hydrologic Science
• Leverage ongoing cyberinfrastructure projects:
  • Geosciences Network (GEON)
  • Share data between Earth disciplines
  • Secure access to Grid resources, single sign-on authentication/authorization, distributed data management, data publication, search, information integration, knowledge management, scientific workflows, archiving
• Integrate with common COTS (commercial off-the-shelf) software:
  • Excel, ArcGIS, Matlab…
  • and Fortran… mostly on Windows…
  • Interesting survey of CUAHSI partners by David Tarboton!
HIS User Assessment (Chapter 4 in Status Report)
[Chart: responses to "Which of the four HIS goals is most important to you?" Options: Data Access, Science, Observatory support, Education]
Tuning to unique features of hydrology
• Hydrologic observations:
  • Reliance on federally-organized data collection (NWIS, STORET, Ameriflux, etc.) with huge and complex nomenclatures; simplifying access to federal repositories; relatively lower emphasis on data ownership
  • Handling time in both UTC and local
  • Various spatial offsets
  • Multiple data types: time series, fields, spatial data
• Integrative discipline:
  • Interoperation with atmospheric, ocean, soils, geomorphology, social datasets and services…
• Community:
  • Organized by “natural boundaries”; natural object hierarchy; networks of relatively autonomous self-managed data nodes
  • Partnership with public sector water management
Ontologies
Problems
• Microsoft and .NET vs Linux and J2EE
• Open source vs proprietary
• Free vs not free
Open architecture, web services, well-defined interfaces
Main Components
• Web services for accessing hydrologic repositories
• Hydrologic Observations Data Model
• Hydrologic Data Access System + Time Series Viewer
• Collection of CUAHSI nodes
[Diagram: CUAHSI Web Services as a common layer between data sources (NWIS, STORET, NCDC, NCAR, Unidata, NASA, Ameriflux) and applications (ArcGIS, Excel, Matlab, Access, SAS, Fortran, Visual Basic, C/C++)]
[Diagram: HIS service architecture. Numbered components:
(1) Core grid services: monitoring nodes, scheduling, data transfer, replication, collection management, …
(2) Resource drivers
(3) Sensor management services
(4) Sensor data filtering
(5) Hosted data services
(6) Data registration / search / query rewriting & orchestration
(7) Ontology source and services
(8) Application services: analysis, mapping, charting, models, workflow, integration
(9) User registration / authentication / authorization
(10) Web services registry and related services
Other elements: data nodes (NWIS, NAWQA, STORET, . . .) and sensors; external data resources registry metadata; service consumers (Matlab, ArcGIS, Excel, web browser, Fortran/C/VB/Java codes) and portal; RServer, ArcGIS Server, conversion engine, certificate authority]
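The architecture lists a web services registry (10) next to data registration and search (6). A minimal sketch of what a registry lookup could look like, assuming each service is described only by a name, an endpoint URL and a set of keywords; the classes, fields and example.org endpoints are hypothetical and not part of the HIS design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical registry entry describing one web service."""
    name: str
    endpoint: str
    keywords: frozenset = field(default_factory=frozenset)


class ServiceRegistry:
    """Toy in-memory registry: register services, then search by keyword."""

    def __init__(self):
        self._services = {}

    def register(self, record: ServiceRecord) -> None:
        # Re-registering the same name replaces the earlier entry.
        self._services[record.name] = record

    def search(self, keyword: str) -> list:
        # Case-insensitive exact match against each service's keywords,
        # returned in name order so results are stable.
        kw = keyword.lower()
        hits = (r for r in self._services.values()
                if kw in {k.lower() for k in r.keywords})
        return sorted(hits, key=lambda r: r.name)


registry = ServiceRegistry()
registry.register(ServiceRecord("NWIS daily values", "https://example.org/nwis",
                                frozenset({"streamflow", "USGS"})))
registry.register(ServiceRecord("STORET water quality", "https://example.org/storet",
                                frozenset({"water quality", "EPA"})))
print([r.name for r in registry.search("usgs")])
```

A real registry would also hold service signatures and metadata for query rewriting; keyword search is only the simplest discovery path.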
Some operational services
[Diagram: data sources (NWIS, NCAR, Unidata, NASA, STORET, NCDC, Ameriflux) connected through CUAHSI Web Services to applications (ArcGIS, Excel, Matlab, Access, SAS, Fortran, Visual Basic, C/C++), with Extract, Transform, Load steps]
http://www.cuahsi.org/his/
Database Sizes
Agency   Records       Stations      Time range
EPA      200 million   800,000       100 years
NWS      ?             19,000        100 years
USGS     250 million   1.5 million   100 years
(From Jon Goodall, Duke U.)
Language for Data Representation
Concept                                  EPA                                   NWS                   USGS
Unique identifier for a station          Station ID                            COOPID                site_no
Latitude, longitude                      Station Latitude, Station Longitude   LATITUDE, LONGITUDE   dec_lat_va, dec_long_va
Time of measurement                      Activity Start                        YEAR, MO, DA, TIME    dv_dt
Lots of semantic differences in parameter names, methods, etc.
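To make the semantic differences concrete, here is a minimal sketch that renames each agency's fields to one shared vocabulary. The agency field names come from the table above; the target names ("site_code", "latitude", "longitude", "time"), the HHMM format assumed for the NWS TIME column, and the sample records are all hypothetical.

```python
# Agency field names from the table, mapped to a hypothetical common vocabulary.
FIELD_MAP = {
    "EPA": {"Station ID": "site_code",
            "Station Latitude": "latitude",
            "Station Longitude": "longitude",
            "Activity Start": "time"},
    "NWS": {"COOPID": "site_code",
            "LATITUDE": "latitude",
            "LONGITUDE": "longitude"},
    "USGS": {"site_no": "site_code",
             "dec_lat_va": "latitude",
             "dec_long_va": "longitude",
             "dv_dt": "time"},
}


def normalize(agency: str, record: dict) -> dict:
    """Rename one agency record's fields to the common vocabulary."""
    mapping = FIELD_MAP[agency]
    out = {mapping[k]: v for k, v in record.items() if k in mapping}
    if agency == "NWS":
        # NWS spreads the timestamp over YEAR, MO, DA and TIME columns.
        # Assumption for this sketch: TIME is an "HHMM" string.
        t = record["TIME"]
        out["time"] = (f"{int(record['YEAR']):04d}-{int(record['MO']):02d}-"
                       f"{int(record['DA']):02d}T{t[:2]}:{t[2:]}")
    return out


# Made-up sample records.
usgs = normalize("USGS", {"site_no": "00000001", "dec_lat_va": 30.0,
                          "dec_long_va": -97.0, "dv_dt": "2005-06-01"})
nws = normalize("NWS", {"COOPID": "000001", "LATITUDE": 30.0, "LONGITUDE": -97.0,
                        "YEAR": 2005, "MO": 6, "DA": 1, "TIME": "0700"})
print(usgs)
print(nws)
```

Renaming fields only addresses the structural part; differences in methods and parameter semantics still need a shared vocabulary or ontology.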
Core Web Services
Service          Input                                                   Output
GetSites         Observation network, filter                             Station codes in network
GetSiteInfo      Station code                                            Lat/long, station name
GetVariables     Observation network or data source, filter              Variable codes
GetVariableInfo  Variable code                                           Description of variable
GetValues        Station code or lat/long point, variable code,          A time series of values
                 begin date, end date
GetChart         As for GetValues                                        A chart plotting the values
CUAHSI Web Services: http://www.cuahsi.org/his/webservices.html
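The value of the core services is that every repository answers the same small set of calls. A sketch of those signatures as a Python abstract class with a toy in-memory backend follows; the class and method names are hypothetical renderings of the table, not the published CUAHSI service API, and the filter arguments, the lat/long form of GetValues, and GetChart are left out.

```python
from abc import ABC, abstractmethod
from datetime import date


class HydroService(ABC):
    """Hypothetical Python rendering of the core service signatures."""

    @abstractmethod
    def get_sites(self, network: str) -> list:
        """GetSites: station codes in an observation network."""

    @abstractmethod
    def get_site_info(self, site_code: str) -> dict:
        """GetSiteInfo: lat/long and station name."""

    @abstractmethod
    def get_variables(self, network: str) -> list:
        """GetVariables: variable codes available in a network."""

    @abstractmethod
    def get_variable_info(self, variable_code: str) -> str:
        """GetVariableInfo: description of a variable."""

    @abstractmethod
    def get_values(self, site_code: str, variable_code: str,
                   begin: date, end: date) -> list:
        """GetValues: a time series of (date, value) pairs."""


class InMemoryService(HydroService):
    """Toy backend holding (date, value) pairs per site and variable."""

    def __init__(self, network, sites, variables, values):
        self.network = network
        self.sites = sites          # site_code -> {"name", "lat", "lon"}
        self.variables = variables  # variable_code -> description
        self.values = values        # (site_code, variable_code) -> [(date, value)]

    def get_sites(self, network):
        return sorted(self.sites) if network == self.network else []

    def get_site_info(self, site_code):
        return self.sites[site_code]

    def get_variables(self, network):
        return sorted(self.variables) if network == self.network else []

    def get_variable_info(self, variable_code):
        return self.variables[variable_code]

    def get_values(self, site_code, variable_code, begin, end):
        # Inclusive date-range filter, mirroring the begin/end date inputs.
        series = self.values.get((site_code, variable_code), [])
        return [(d, v) for d, v in series if begin <= d <= end]


svc = InMemoryService(
    network="DEMO",
    sites={"S1": {"name": "Demo site", "lat": 30.0, "lon": -97.0}},
    variables={"Q": "Discharge (made-up example)"},
    values={("S1", "Q"): [(date(2005, 6, 1), 10.0), (date(2005, 6, 2), 12.5),
                          (date(2005, 6, 3), 11.0)]},
)
print(svc.get_values("S1", "Q", date(2005, 6, 2), date(2005, 6, 3)))
```

Any client written against HydroService works unchanged with a backend that wraps NWIS, STORET or a local HODM database.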
NCEP North American Forecast Model: 12 km grid for the continental US
CUAHSI Point Hydrologic Observations Data Model
• A relational database stored in Access, PostgreSQL, MS SQL Server, …
• Stores observation data made at points
• Consistent format for storage of observations from many different sources and of many different types
[Diagram: observation types feeding the model: streamflow, flux tower data, precipitation & climate, groundwater levels, water quality, soil moisture data] (D. Tarboton, USU)
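A minimal sketch of the idea behind a point observations data model, using Python's built-in sqlite3: one table of sites, one of variables, and one table of values that any observation type can share. The three-table layout and column names are a simplified, hypothetical illustration, not the published HODM schema; the local time plus UTC offset columns reflect the "time in both UTC and local" requirement noted earlier.

```python
import sqlite3

# Simplified, hypothetical layout; not the published HODM schema.
SCHEMA = """
CREATE TABLE Sites (
    SiteID    INTEGER PRIMARY KEY,
    SiteCode  TEXT UNIQUE NOT NULL,
    SiteName  TEXT,
    Latitude  REAL,
    Longitude REAL
);
CREATE TABLE Variables (
    VariableID   INTEGER PRIMARY KEY,
    VariableCode TEXT UNIQUE NOT NULL,
    VariableName TEXT,
    Units        TEXT
);
CREATE TABLE DataValues (
    ValueID       INTEGER PRIMARY KEY,
    SiteID        INTEGER NOT NULL REFERENCES Sites(SiteID),
    VariableID    INTEGER NOT NULL REFERENCES Variables(VariableID),
    LocalDateTime TEXT NOT NULL,
    UTCOffset     REAL NOT NULL,  -- hours; local time = UTC + offset
    DateTimeUTC   TEXT NOT NULL,
    DataValue     REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Sites VALUES (1, 'S1', 'Demo gauge', 30.0, -97.0)")
conn.execute("INSERT INTO Variables VALUES (1, 'Q', 'Discharge', 'cfs')")
conn.execute("INSERT INTO Variables VALUES (2, 'P', 'Precipitation', 'mm')")
# Made-up values; 07:00 local at offset -6 h is 13:00 UTC.
conn.executemany(
    "INSERT INTO DataValues (SiteID, VariableID, LocalDateTime, UTCOffset, "
    "DateTimeUTC, DataValue) VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "2005-06-01 07:00", -6, "2005-06-01 13:00", 10.0),
        (1, 1, "2005-06-02 07:00", -6, "2005-06-02 13:00", 12.5),
        (1, 2, "2005-06-01 07:00", -6, "2005-06-01 13:00", 3.2),
    ],
)

# The same query shape serves every observation type: filter by site and variable.
rows = conn.execute(
    """SELECT d.DateTimeUTC, d.DataValue
       FROM DataValues d
       JOIN Sites s ON s.SiteID = d.SiteID
       JOIN Variables v ON v.VariableID = d.VariableID
       WHERE s.SiteCode = ? AND v.VariableCode = ?
       ORDER BY d.DateTimeUTC""",
    ("S1", "Q"),
).fetchall()
print(rows)
```

Because streamflow and precipitation land in the same DataValues table, adding a new observation type means adding a Variables row, not a new table.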
Community design requirements (22 reviewers)
[Diagram: HODM and Arc Hydro. The Hydrologic Observations Data Model is independent of, but coupled to, the geographic representation (Arc Hydro).
• Arc Hydro classes (also shown: Feature, ComplexEdgeFeature, SimpleJunctionFeature, HydroNetwork):
  • Waterbody: HydroID, HydroCode, FType, Name, AreaSqKm, JunctionID
  • HydroPoint: HydroID, HydroCode, FType, Name, JunctionID
  • Watershed: HydroID, HydroCode, DrainID, AreaSqKm, JunctionID, NextDownID
  • HydroEdge (EdgeType: Flowline, Shoreline): HydroID, HydroCode, ReachCode, Name, LengthKm, LengthDown, FlowDir, FType, EdgeType, Enabled
  • HydroJunction: HydroID, HydroCode, NextDownID, LengthDown, DrainArea, FType, Enabled, AncillaryRole
• CouplingTable: WaterID (GUID), HydroID (Integer)
• MonitoringPoint (HODM side): WaterID, HydroCode, Name, Latitude, Longitude, …]
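"Independent of, but coupled to" can be read as: the observations side keys its sites by a GUID (WaterID), Arc Hydro keys its features by an integer HydroID, and a separate coupling table links the two. A minimal sketch of that join follows; using HydroJunction as the Arc Hydro side is an assumption for brevity, the diagram does not say which feature class is coupled, and all example data is made up.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MonitoringPoint:
    """Observations-side record, keyed by a GUID (WaterID)."""
    water_id: uuid.UUID
    hydro_code: str
    name: str
    latitude: float
    longitude: float


@dataclass(frozen=True)
class HydroJunction:
    """Arc Hydro-side feature, keyed by an integer HydroID."""
    hydro_id: int
    hydro_code: str
    next_down_id: int


def couple(points: List[MonitoringPoint],
           junctions: List[HydroJunction],
           coupling_table: List[Tuple[uuid.UUID, int]]
           ) -> List[Tuple[MonitoringPoint, Optional[HydroJunction]]]:
    """Pair each point with its Arc Hydro feature via (WaterID, HydroID) rows.

    Points without a coupling row get None, so the observations side still
    works with no geographic representation at all.
    """
    by_hydro_id = {j.hydro_id: j for j in junctions}
    water_to_hydro = dict(coupling_table)
    return [(p, by_hydro_id.get(water_to_hydro.get(p.water_id))) for p in points]


# Made-up example data.
w1, w2 = uuid.UUID(int=1), uuid.UUID(int=2)
points = [MonitoringPoint(w1, "G1", "Gauge 1", 30.0, -97.0),
          MonitoringPoint(w2, "G2", "Gauge 2", 30.1, -97.1)]
junctions = [HydroJunction(101, "J101", 102), HydroJunction(102, "J102", 0)]
pairs = couple(points, junctions, [(w1, 101)])
for point, junction in pairs:
    print(point.hydro_code, junction.hydro_id if junction else None)
```

Keeping the link in its own table means either model can change its keys or be used alone without altering the other.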
Uses and tools for HODM
• HODM is central to HIS infrastructure, but lacks tools
• Testing HODM with two types of data: federal repositories, and external databases (Panola); personal and enterprise versions
• Mapping wizard for loading Excel observation data into an HODM database:
  • Can save mapping files for subsequent runs on similarly formatted spreadsheets
• Local data analysis can be done: charts and stats
• HDAS is one interface to HODM datasets, but it shall not be the only one, so HODM is being exposed as web services
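The saved mapping files are what make repeated loads cheap: the column-to-field mapping is defined once, then reused for any spreadsheet with the same layout. A minimal sketch, assuming the spreadsheet has been exported to CSV (the standard library has no Excel reader) and that mappings are stored as JSON; the target field names and the sample sheet are hypothetical.

```python
import csv
import io
import json


def save_mapping(mapping: dict) -> str:
    """Serialize a column mapping (spreadsheet header -> target field) to JSON."""
    return json.dumps(mapping, sort_keys=True)


def load_rows(csv_text: str, saved_mapping: str) -> list:
    """Apply a saved mapping to a CSV export of a similarly formatted sheet."""
    mapping = json.loads(saved_mapping)
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(mapping) - set(reader.fieldnames or [])
    if missing:
        # A differently formatted sheet should fail loudly, not load garbage.
        raise ValueError(f"spreadsheet lacks mapped columns: {sorted(missing)}")
    # Unmapped columns (e.g. free-text notes) are dropped.
    return [{mapping[col]: row[col] for col in mapping} for row in reader]


# First run: the mapping is defined once (interactively, in a wizard) and saved.
saved = save_mapping({"Gauge": "SiteCode", "Date": "LocalDateTime",
                      "Flow (cfs)": "DataValue"})

# Later runs: a new spreadsheet with the same layout reuses the saved mapping.
sheet = "Gauge,Date,Flow (cfs),Notes\nG1,2005-06-01,10.0,\nG1,2005-06-02,12.5,ice\n"
rows = load_rows(sheet, saved)
print(rows)
```

Values stay as strings here; a real loader would also convert types and units before inserting into the database.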
Hydrologic Data Access System
http://river.sdsc.edu/hdas/
Cross-platform design
[Diagram: a Central CUAHSI HIS Node (Windows), a GEON Data Node (Linux) and several Remote CUAHSI HIS Nodes (Windows), linked through web service proxies.
• Central and Remote CUAHSI HIS Nodes (Windows): IIS Web Server, ASP.NET, SQL Server, ArcGIS technologies; HDAS, HODM, web services, web service proxies; data
• GEON Data Node (Linux): Apache Tomcat, GEON software stack, proxy; data]
HIS Scalability
• Adding data types and datasets, processing models and services, servers, users and roles shall not create unmanageable bottlenecks that require system re-engineering
• Designing for scalability:
  • Distilling a generic set of web service signatures; resolving semantic and structural heterogeneities
  • Using HODM as a common generic format for time series data, for ease of coding and uniform search interfaces
  • HDAS GUI design to abstract specifics of disparate repositories
  • Leveraging common CI components developed in GEON
• Need to work with agencies to remove the web services bottleneck
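One way to read "a generic set of web service signatures" in scalability terms: if every repository sits behind an adapter with the same call shape, a central node can fan a request out to any number of nodes, and adding a repository means registering one more adapter rather than re-engineering the caller. A minimal sketch using a thread pool; the adapter functions and their canned records are hypothetical stand-ins for real service calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Generic signature shared by every node: (site_code, variable_code) -> series,
# where a series is a list of (timestamp string, value) pairs.
Series = List[Tuple[str, float]]
Node = Callable[[str, str], Series]


def usgs_like_node(site: str, variable: str) -> Series:
    # Hypothetical adapter: a real one would call the repository's service.
    # Its job is translating native field names (dv_dt) into the common form.
    # The variable argument is ignored in these toy adapters.
    raw = [{"dv_dt": "2005-06-01", "value": 10.0}] if site == "S1" else []
    return [(r["dv_dt"], r["value"]) for r in raw]


def epa_like_node(site: str, variable: str) -> Series:
    # Same idea for a STORET-style record keyed by "Activity Start".
    raw = [{"Activity Start": "2005-06-01 09:30", "Result": 7.1}] if site == "S1" else []
    return [(r["Activity Start"], r["Result"]) for r in raw]


def federated_get_values(nodes: Dict[str, Node], site: str,
                         variable: str) -> Dict[str, Series]:
    """Send one request to every registered node concurrently; group by node."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = {name: pool.submit(fn, site, variable) for name, fn in nodes.items()}
        return {name: fut.result() for name, fut in futures.items()}


nodes = {"USGS": usgs_like_node, "EPA": epa_like_node}
print(federated_get_values(nodes, "S1", "Q"))
```

Concurrency hides per-node latency on the central side, but it cannot fix a slow or rate-limited agency service, which is why the agency-side bottleneck still needs to be addressed.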
Future Work
• Updating and standardizing web services; developing services for additional repositories
• Adopting HODM for storing time series observations, and developing tools for loading data, querying, analyzing and visualizing data in HODM
• Finalizing the Windows-based CUAHSI Node, and preparing it for distribution, along with documentation
• “Digital Watershed” conceptualization