Web Services-Based Mediator of Distributed Data Flow and Processing
Project Coordinators:
Software Architecture: R. Husar
Software Implementation: K. Höijärvi
Data and Applications: S. Falke, R. Husar
Center for Air Pollution Impact and Trend Analysis (CAPITA)
Washington University, St. Louis, MO 63130
DataFed Description
DataFed Vision
Better air quality management and science through effective use of relevant data
DataFed Goals
• Facilitate the access and flow of atmospheric data from providers to users
• Support the development of user-driven data processing value chains
• Participate in specific application projects
Approach: Mediation Between Users and Data Providers
• DataFed assumes spontaneous, autonomous emergence of AQ data (à la Internet)
• Non-intrusively wraps datasets for access by web services
• WS-based mediators provide homogeneous data views, e.g. geo-spatial, time...
• End-user programming of data access and processing through WS composition (limited)
Applications
• Building browsers and analysis tools for distributed monitoring data
• Serving as a data gateway for user programs: web pages, GIS, science tools
• DataFed is currently focused on the mediation of air quality data
DataFed Multidimensional Data Model
4D Geo-Environmental Data Cube (X, Y, Z, T)
Environmental data represent measurements in the physical world, which has space (X, Y, Z) and time (T) as its dimensions.
The specific inherent dimensions for geo-environmental data are: Longitude X, Latitude Y, Elevation Z and DateTime T.
Finding, sharing and integrating geo-environmental data requires that the data be ‘coded’ in this 4D data space, at a minimum.
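As a minimal sketch of what coding data in the 4D space could look like, the Python below stores each measurement with its X, Y, Z and T coordinates and selects a space-time box from the collection. The `Observation` class, the `slice_cube` function and the sample values are hypothetical illustrations, not part of DataFed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One measurement located in the 4D cube (hypothetical record layout)."""
    lon: float      # X, degrees east
    lat: float      # Y, degrees north
    elev: float     # Z, metres
    time: datetime  # T
    value: float


def slice_cube(observations, lon_range, lat_range, time_range):
    """Return the observations inside a lon/lat/time box (bounds inclusive)."""
    (x0, x1), (y0, y1), (t0, t1) = lon_range, lat_range, time_range
    return [o for o in observations
            if x0 <= o.lon <= x1 and y0 <= o.lat <= y1 and t0 <= o.time <= t1]


# Made-up sample values, for illustration only.
obs = [
    Observation(-90.2, 38.6, 150.0, datetime(2004, 8, 27, 12), 21.5),
    Observation(-71.1, 42.4, 10.0, datetime(2004, 8, 27, 12), 35.0),
    Observation(-90.2, 38.6, 150.0, datetime(2004, 8, 28, 12), 18.0),
]
selected = slice_cube(
    obs,
    lon_range=(-100, -80),
    lat_range=(30, 45),
    time_range=(datetime(2004, 8, 27), datetime(2004, 8, 27, 23, 59)),
)
```

Because every record carries the same four coordinates, one selection function serves any dataset, whatever its provider.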
Data Flow & Processing in Air Quality Management
Primary Data (Diverse Providers)
• AQ Data: EPA Networks, IMPROVE Visibility, Satellite-PM Pattern
• Meteorology: Met. Data, Satellite-Transport, Forecast model
• Emissions: National Emissions, Local Inventory, Satellite Fire Locs
Data ‘Refining’ Processes
• Filtering, Aggregation, Fusion
‘Knowledge’ Derived from Data (AQ Management Reports)
• Status and Trends
• AQ Compliance
• Exposure Assess.
• Network Assess.
• Tracking Progress
Mediator-Based Integration Architecture (Wiederhold, 1992)
• The job of the mediator is to provide an answer to a user query (Ullman, 1997)
• In the database-theory sense, a mediator is a view of the data found in one or more sources
• Heterogeneous sources are wrapped by software that translates between the local and the global language
• Mediators (web services) obtain data from wrappers or other mediators and process it …
Diagram labels: User Query; Views; Service, Service; Wrapper, Wrapper; Heterogeneous Data
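The wrapper and mediator roles described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DataFed code: the class names, the two source formats and the `query` method are all hypothetical. Each wrapper translates its source's local format into one global record layout, and the mediator answers a query by drawing on wrappers or on other mediators.

```python
class CsvWrapper:
    """Wraps a hypothetical source that stores 'site,lon,lat,value' text rows."""

    def __init__(self, text):
        self.text = text

    def query(self):
        for line in self.text.strip().splitlines():
            site, lon, lat, value = line.split(",")
            yield {"site": site, "lon": float(lon), "lat": float(lat),
                   "value": float(value)}


class DictWrapper:
    """Wraps a hypothetical source whose records use local field names."""

    def __init__(self, records):
        self.records = records

    def query(self):
        for r in self.records:
            yield {"site": r["id"], "lon": r["x"], "lat": r["y"], "value": r["v"]}


class Mediator:
    """A view over several sources; sources may be wrappers or other mediators."""

    def __init__(self, sources):
        self.sources = sources

    def query(self, min_value=float("-inf")):
        for source in self.sources:
            for record in source.query():
                if record["value"] >= min_value:
                    yield record


mediator = Mediator([
    CsvWrapper("A,-90.0,38.0,12.5\nB,-80.0,35.0,3.0"),
    DictWrapper([{"id": "C", "x": -70.0, "y": 42.0, "v": 20.0}]),
])
high_sites = [r["site"] for r in mediator.query(min_value=10)]
nested = Mediator([mediator])  # a mediator used as a source for another one
```

The user query sees only the global record layout; the differences between the sources stay inside the wrappers.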
Generic Data Flow and Processing in DataFed
Diagram labels: Physical Data; View Wrapper (Abstract Data Access); Abstract Data; Process Data; Processed Data; Portrayal/Render; Portrayed Data; DataView 1, DataView 2, DataView 3
Physical Data
Resides in autonomous servers; accessed by view-specific wrappers which yield abstract data ‘slices’
Abstract Data
Abstract data slices are requested by viewers; uniform data are delivered by wrapper services
Processed Data
Data passed through filtering, aggregation, fusion and other web services
View Data
Processed data are delivered to the user as multi-layer views by portrayal and overlay web services
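The stages above (access, processing, portrayal) behave like a chain in which each stage consumes the output of the previous one. The Python below is a hedged sketch of that chaining using plain functions in place of web services. The stage names, the row format and the `-999` missing-value marker are assumptions made for illustration.

```python
from collections import defaultdict
from functools import reduce


def access(physical_rows):
    """Wrapper step: parse provider-specific text into uniform (day, value) pairs."""
    return [(day, float(value))
            for day, value in (row.split(",") for row in physical_rows)]


def drop_missing(pairs, missing=-999.0):
    """Filtering step: discard the assumed missing-value marker."""
    return [(day, value) for day, value in pairs if value != missing]


def daily_mean(pairs):
    """Aggregation step: average the values reported for each day."""
    groups = defaultdict(list)
    for day, value in pairs:
        groups[day].append(value)
    return {day: sum(values) / len(values) for day, values in groups.items()}


def portray(means):
    """Portrayal step: render the processed data as text lines."""
    return [f"{day}: {mean:.1f}" for day, mean in sorted(means.items())]


def compose(*stages):
    """Chain stages left to right; each stage receives the previous output."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


view = compose(access, drop_missing, daily_mean, portray)
rows = ["2004-08-27,10.0", "2004-08-27,20.0", "2004-08-28,-999", "2004-08-28,8.0"]
lines = view(rows)
```

Swapping or adding a stage changes the view without touching the other stages.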
Anatomy of a Wrapper Service: TOMS Satellite Image Data
• Given the URL template and the image description, the wrapper service can access the image for any day and any spatial subset using an HTTP URL or the SOAP protocol.
• Wrapper classes are available for geo-spatial (incl. satellite) images, SQL servers, text files, etc. The mediator classes are implemented as web services for uniform data access, transformation and portrayal.
[Figure: source image annotated with src_img_width and src_img_height; the margins src_margin_left, src_margin_right, src_margin_top and src_margin_bottom; and the corner coordinates src_lon_min, src_lat_max, src_lat_min, src_lon_max]
Image Description for Data Access:
src_image_width=502 src_image_height=329
src_margin_bottom=105 src_margin_left=69 src_margin_right=69 src_margin_top=46
src_lat_min=-70 src_lat_max=70 src_lon_min=-180 src_lon_max=180
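The image description above is enough to geo-reference the picture. The sketch below converts a longitude/latitude pair to pixel coordinates, assuming that the map area inside the margins is a plain linear (equirectangular) stretch of the stated longitude and latitude ranges. That projection assumption, and the function names, are not confirmed by the source.

```python
# Values copied from the image description.
DESC = {
    "src_image_width": 502, "src_image_height": 329,
    "src_margin_bottom": 105, "src_margin_left": 69,
    "src_margin_right": 69, "src_margin_top": 46,
    "src_lat_min": -70, "src_lat_max": 70,
    "src_lon_min": -180, "src_lon_max": 180,
}


def lonlat_to_pixel(lon, lat, d=DESC):
    """Map (lon, lat) to (x, y) pixels, origin at the image's top-left corner.

    Assumes a linear mapping inside the margins (hypothetical).
    """
    map_w = d["src_image_width"] - d["src_margin_left"] - d["src_margin_right"]
    map_h = d["src_image_height"] - d["src_margin_top"] - d["src_margin_bottom"]
    fx = (lon - d["src_lon_min"]) / (d["src_lon_max"] - d["src_lon_min"])
    fy = (d["src_lat_max"] - lat) / (d["src_lat_max"] - d["src_lat_min"])
    return d["src_margin_left"] + fx * map_w, d["src_margin_top"] + fy * map_h


def crop_box(lon_min, lat_min, lon_max, lat_max, d=DESC):
    """Pixel box (left, top, right, bottom) covering a geographic subset."""
    left, top = lonlat_to_pixel(lon_min, lat_max, d)
    right, bottom = lonlat_to_pixel(lon_max, lat_min, d)
    return tuple(round(v) for v in (left, top, right, bottom))


top_left = lonlat_to_pixel(-180, 70)
center = lonlat_to_pixel(0, 0)
full_map = crop_box(-180, -70, 180, 70)
```

A box of this kind is what a wrapper would need in order to cut a spatial subset out of the source image.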
The daily TOMS images reside on the FTP archive, e.g. ftp://toms.gsfc.nasa.gov/pub/eptoms/images/aerosol/y2000/ea000820.gif
URL template: ftp://toms.gsfc.nasa.gov/pub/eptoms/images/aerosol/y[yyyy]/ea[yy][mm][dd].gif
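Expanding the URL template for a given day is a simple substitution of the bracketed date fields. The following is a minimal sketch; the `expand` function is a hypothetical helper, and only the four placeholders that appear in the template are handled.

```python
from datetime import date

TEMPLATE = ("ftp://toms.gsfc.nasa.gov/pub/eptoms/images/aerosol/"
            "y[yyyy]/ea[yy][mm][dd].gif")


def expand(template, day):
    """Fill the [yyyy], [yy], [mm] and [dd] placeholders for one day."""
    return (template
            .replace("[yyyy]", f"{day.year:04d}")
            .replace("[yy]", f"{day.year % 100:02d}")
            .replace("[mm]", f"{day.month:02d}")
            .replace("[dd]", f"{day.day:02d}"))


url = expand(TEMPLATE, date(2000, 8, 20))
```

For 20 August 2000 this reproduces the example archive path given above.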
Transparent colors for overlays
RGB(89,140,255) RGB(41,117,41) RGB(23,23,23) RGB(0,0,0)
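One way to apply the transparent-color list when building overlays is to add an alpha channel that is zero for exactly those colors. The sketch below works on a plain list of RGB tuples; the `add_alpha` function is a hypothetical illustration, not the wrapper's actual implementation.

```python
# Colors copied from the list above.
TRANSPARENT = {(89, 140, 255), (41, 117, 41), (23, 23, 23), (0, 0, 0)}


def add_alpha(pixels, transparent=TRANSPARENT):
    """Turn (r, g, b) pixels into (r, g, b, a); a is 0 for listed colors, else 255."""
    return [(r, g, b, 0 if (r, g, b) in transparent else 255)
            for r, g, b in pixels]


rgba = add_alpha([(0, 0, 0), (200, 50, 50), (89, 140, 255)])
```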
An Application Program: Voyager Data Browser
• The web program consists of a stable core and adaptive input/output layers
• The core maintains the state and executes the data selection, access and render services
• The adaptive, abstract I/O layers connect the core to evolving web data, flexible displays and a configurable user interface:
– Wrappers encapsulate the heterogeneous external data sources and homogenize the access
– Device Drivers translate generic, abstract graphic objects to specific devices and formats
– Ports connect the internal parameters of the program to external controls
– WSDL web service description documents
Diagram labels: Data Sources, Controls, Displays; I/O Layer (Wrappers, Device Drivers, Ports); Core (App State, Data Flow Interpreter); Web Services (WSDL)
SeaWiFS Satellite
SeaWiFS Satellite
Aerosol Chemical
Air Trajectory
Map Border
VIEW by Web Service Composition
Air Quality Datasets
• Data are accessed from autonomous, distributed providers
• DataFed ‘wrappers’ provide uniform geo-time referencing
• Tools allow space/time overlay, comparisons and fusion
Near Real Time Data Integration
Delayed Data Integration
Surface Air Quality
• AIRNOW: O3, PM25
• ASOS_STI: Visibility, 300 sites
• METAR: Visibility, 1200 sites
• VIEWS_OL: 40+ Aerosol Parameters
Satellite
• MODIS_AOT: AOT, Idea Project
• GASP: Reflectance, AOT
• TOMS: Absorption Indx, Refl.
• SEAW_US: Reflectance, AOT
Model Output
• NAAPS: Dust, Smoke, Sulfate, AOT
• WRF: Sulfate
Fire Data
• HMS_Fire: Fire Pixels
• MODIS_Fire: Fire Pixels
Surface Meteorology
• RADAR: NEXRAD
• SURF_MET: Temp, Dewp, Humidity…
• SURF_WIND: Wind vectors
• ATAD: Trajectory, VIEWS locs.
Some of the Tools of DataFed
Consoles: Data from diverse sources are displayed to create a rich context for exploration and analysis
CATT: Combined Aerosol Trajectory Tool for browsing back trajectories for specified chemical conditions
Viewer: General purpose spatio-temporal data browser and view editor applicable for all DataFed datasets
Sulfate in the Northeast
Sahara Dust in the Gulf
Fires in the Southeast
Time Series Console: Southeast
Analyst Console Applications:
Sulfate Episode: 8/27/04