Enabling User-Oriented Data Access in a Satellite Data Portal
Rajesh Kalyanam, Lan Zhao
Taezoon Park, Carol X. Song
RCAC, Purdue University, West Lafayette, IN 47907
Larry Biehl
PTO, Purdue University, West Lafayette, IN 47907
Outline
• Background
• Motivation
• System Design
• Data Production
• Data Subscription
• Data Delivery
• Future Work
Background
• Overview of Purdue Terrestrial Observatory (PTO)
– Remote-sensing research facility
– GOES-12 GVAR, AVHRR, and MVISR sensor systems
– AQUA/TERRA satellites
– Component of the TeraGrid data provider framework
• Satellite data products
– Land, ocean and atmosphere data
– Provide trends on local or continental scales
– Used in climatology, hydrology, agriculture and transportation
March 20, 2007
Example MODIS Products
• Level 1A (MOD01)
• Level 1B (MOD02) with/without bowtie correction
• Geolocation (MOD03)
• Aerosol (MOD04)
• Water Vapor (MOD05)
• Clouds (MOD06)
• Atmospheric Profiles (MOD07)
• Reflectance (MOD09)
• Snow (MOD10)
• Fire Detection (MOD14)
• Ocean Color (MOD18)
• Sea Surface Temperature (MOD28)
• Sea Ice (MOD29)
• Cloud Mask (MOD35)
• Multiday composites of the above
Note that each data product may contain anywhere from a few to many variables.
Variables in MOD04 Product:
Aerosol_Type_Land, Angstrom_Exponent_1_Ocean, Angstrom_Exponent_2_Ocean, Angstrom_Exponent_Land, Asymmetry_Factor_Average_Ocean, Asymmetry_Factor_Best_Ocean, Backscatter_Ratio_Average_Ocean, Backscatter_Ratio_Best_Ocean, Cloud_Condensation_Nuclei_Ocean, Cloud_Fraction_Land, Cloud_Fraction_Ocean, Cloud_Mask_QA, Continental_Optical_Depth_Land, Corrected_Optical_Depth_Land, Critical_Reflectance_Land, Effect_Optical_Depth_Ave_Ocean, Effect_Optical_Depth_Best_Ocean, Effect_Radius_Ocean, Error_Critical_Reflectance_Land, Error_Path_Radiance_Land, Estimated_Uncertainty_Land, Least_Squares_Error_Ocean, Mass_Concentration_Land, Mass_Concentration_Ocean, Mean_Reflectance_Land, Mean_Reflectance_Land_All, Mean_Reflectance_Ocean, Number_Pixels_Percentile_Land, Number_Pixels_Used_Ocean, OptDepth_Ratio_Small_Land, OptDepth_Ratio_Small_Land_Ocean, OptDepth_Ratio_Small_Ocean, Optical_Depth_Land_And_Ocean, Optical_Depth_Large_Ave_Ocean, Optical_Depth_Large_Best_Ocean, Optical_Depth_Small_Ave_Ocean, Optical_Depth_Small_Best_Ocean, Optical_Depth_by_models_ocean, Path_Radiance_Land, QualityWt_Critical_Reflect_Land, QualityWt_Path_Radiance_Land, Quality_Assurance_Crit_Ref_Land, Quality_Assurance_Land, Quality_Assurance_Ocean, Reflected_Flux_Average_Ocean, Reflected_Flux_Best_Ocean, Reflected_Flux_Land, Reflected_Flux_Land_And_Ocean, STD_Reflectance_Land, STD_Reflectance_Ocean, Scan_Start_Time, Scattering_Angle, Sensor_Azimuth, Sensor_Zenith, Solar_Azimuth, Solar_Zenith, Solution_Index_Ocean_Large, Solution_Index_Ocean_Small, Std_Dev_Reflectance_Land_All, Transmitted_Flux_Average_Ocean, Transmitted_Flux_Best_Ocean, Transmitted_Flux_Land, Latitude, Longitude
Motivation
• User Requirements
– Custom-tailored data configurations
– Receive continuous data updates
– Real-time or near-real-time access
• Current Systems
– Impossible to pre-generate the complete range of data products
– Requests have to be routed through the support staff
– Manual process that is time-consuming and error-prone
Motivation
“Web-based data configuration, subscription and delivery system”
System Design
• Processing and Storage Backbone
– PTO infrastructure
– PTO data processing cluster
– SDSC SRB middleware
• Publish-Subscribe manager
– Interface between the client side and the data processing backend
– Manages user subscriptions
– Handles enabling/disabling data production
• Client-side applications
– Subscription interface
– Data access portal
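The slides do not say how the Pub-Sub manager decides when to enable or disable production. Below is a minimal in-memory sketch that assumes a simple reference count per product. Production is switched on when a product gains its first subscriber and switched off when it loses its last one. The class name `PubSubManager` and the `enable`/`disable` hooks are hypothetical. In the described system, these hooks would call into the data production backend.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


class PubSubManager:
    """Hypothetical in-memory publish-subscribe manager (sketch only)."""

    def __init__(self, enable: Callable[[str], None], disable: Callable[[str], None]) -> None:
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)
        self._enable = enable    # called when a product gains its first subscriber
        self._disable = disable  # called when a product loses its last subscriber

    def subscribe(self, user: str, product: str) -> None:
        users = self._subscribers[product]
        if not users:
            self._enable(product)  # first subscriber: start producing
        users.add(user)            # repeated subscriptions are idempotent

    def unsubscribe(self, user: str, product: str) -> None:
        users = self._subscribers.get(product)
        if not users or user not in users:
            return  # unknown subscription: nothing to change
        users.remove(user)
        if not users:
            del self._subscribers[product]
            self._disable(product)  # last subscriber gone: stop producing

    def subscribers(self, product: str) -> Set[str]:
        """Return a copy of the current subscriber set for a product."""
        return set(self._subscribers.get(product, set()))
```

Reference counting keeps production tied to actual demand. A second subscriber to an already-running product costs nothing extra.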
System Design
[Figure: System architecture. Layers: Data Manufacturer, Pub-Sub Manager, Data Management, Portlets. Components: PTO Satellite Ground Station (tracking antenna, stationary antenna); PTO Processing Cluster (TeraScan system); SRB Data Grid (MCAT, SRB server); Subscription Information Manager (user information table, product table, predicate mapper); Subscription Workflow (Web Services modules, pre-composed workflow); Data Services (enable/disable data subscription, SRB data access and query, monitoring component); SATPro Portal (GridSphere, JSR 168 portlets; Web 2.0: AJAX, tagging, RSS) with portlets for Discovery (metadata search), Subscription (on demand, user controlled), Access (HTTP/FTP, email, RSS), Visualization (animation, QuickView, Google Earth) and Predicate Sharing (tagging).]
System Design
• User-driven publish/subscribe model
– Dynamic data generation
– User specifies, controls, and receives custom-tailored data
– Continuous data updates in near-real-time
– Multiple ways to access the data
Data Production
• Data production software
– SeaSpace TeraScan software
– Configuration variables
– Various projections and output formats
• On-demand data production
– User-choice-driven production
– “configproc” file mechanism
– Automatic enabling and disabling
– scp-based data transfer to the SRB archive and web server
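Here is one hypothetical way to implement automatic enabling and disabling with configproc files. It assumes the processing daemon picks up every `*.configproc` file in a designated directory. Enabling a product writes its file there, and disabling deletes it. The directory layout, the `.configproc` suffix and the function names are illustrative assumptions, not TeraScan conventions.

```python
from pathlib import Path
from typing import List

SUFFIX = ".configproc"  # hypothetical naming convention


def enable_product(config_dir: Path, product_key: str, configproc_text: str) -> Path:
    """Place a product's configproc file where the daemon will find it."""
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / f"{product_key}{SUFFIX}"
    tmp = config_dir / f".{product_key}.tmp"
    tmp.write_text(configproc_text)
    tmp.replace(target)  # rename so the daemon never sees a half-written file
    return target


def disable_product(config_dir: Path, product_key: str) -> bool:
    """Remove the product's configproc file; True if one was removed."""
    target = config_dir / f"{product_key}{SUFFIX}"
    if target.exists():
        target.unlink()
        return True
    return False


def active_products(config_dir: Path) -> List[str]:
    """List product keys whose configproc files are currently in place."""
    return sorted(p.name[: -len(SUFFIX)] for p in config_dir.glob(f"*{SUFFIX}"))
```

Writing to a temporary name and then renaming is a common way to avoid a reader picking up a partially written file. On the same filesystem, the rename is atomic on POSIX systems.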
Data Production
• Example configproc file:
  input_directory: products/tdf/Local/modis/ndvi
  input_files: %yyyy.%mmdd.%hhmm.%satel.MYD_NDVI
  image_variable: EVI
  image_format: jpeg
  scale_range: -0.25 1.00
  color_palette: modis_ndvi
  grid_delta: 0
  boundaries: dcw.coast dcw.states
  max_width: 256
  output_template: %yyyy.%mmdd.t_evi.jpg
  save_directory: products/images/modis
  save_files: 20??.????.t_evi.jpg
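The example file is a list of `key: value` pairs, and its templates use date and satellite tokens. Below is a minimal Python sketch that reads such a file and expands `output_template` for a given pass time. Both the parsing rules and the reading of `%satel` as the satellite name are assumptions based only on the example above, not on TeraScan documentation.

```python
import fnmatch
from datetime import datetime
from typing import Dict


def parse_configproc(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines into a dict, keeping values as raw strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            config[key.strip()] = value.strip()
    return config


def expand_template(template: str, when: datetime, satellite: str = "") -> str:
    """Expand the %-tokens seen in the example (%yyyy, %mmdd, %hhmm, %satel)."""
    tokens = {
        "%yyyy": when.strftime("%Y"),
        "%mmdd": when.strftime("%m%d"),
        "%hhmm": when.strftime("%H%M"),
        "%satel": satellite,  # assumed to be the satellite name
    }
    for token, value in tokens.items():
        template = template.replace(token, value)
    return template
```

For a pass at 14:05 on March 20, 2007, `output_template` expands to `2007.0320.t_evi.jpg`. That name matches the `save_files` glob `20??.????.t_evi.jpg` under `fnmatch.fnmatch`, which is one way a monitor could recognize new output.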
Data Subscription
• Data Subscription Components
– Publish-Subscribe based subscription manager
– Subscription interface
• Publish-Subscribe subscription manager
– Simulates the operation of a PubScribe system
– Implemented as an Apache Axis web service
• Subscription interface
– Available on a web-based scientific gateway portal
– Naïve and advanced user interfaces
Data Subscription
• Advanced user interface
– Requires knowledge of the variables involved in a data product
– Choice-list based configuration
– AJAX dynamic filtering of choice lists
– Will support advanced configuration variables governed by strict logical composition rules
• Naïve user interface
– Plain-English description: “bimonthly composite of vegetation data”
– Scoring mechanism for selecting candidate products
– Learning mechanism for improving performance over time
– Work in progress
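In the advanced interface, AJAX filtering means that choosing one value narrows the other choice lists to combinations that are still valid. Below is a minimal server-side sketch. It assumes the valid combinations are held in a hypothetical `CATALOG` list, whose entries, field names and formats are invented for illustration. An AJAX handler could serialize the returned sets to JSON after each selection.

```python
from typing import Dict, List, Set

# Hypothetical catalog: each entry is one valid combination of choices.
CATALOG: List[Dict[str, str]] = [
    {"product": "MOD04", "variable": "Optical_Depth_Land_And_Ocean", "format": "hdf"},
    {"product": "MOD04", "variable": "Optical_Depth_Land_And_Ocean", "format": "jpeg"},
    {"product": "MYD_NDVI", "variable": "EVI", "format": "jpeg"},
    {"product": "MYD_NDVI", "variable": "NDVI", "format": "jpeg"},
]


def remaining_choices(selected: Dict[str, str]) -> Dict[str, Set[str]]:
    """For every field, return the values still consistent with the selection."""
    matches = [row for row in CATALOG
               if all(row.get(field) == value for field, value in selected.items())]
    fields = {field for row in CATALOG for field in row}
    return {field: {row[field] for row in matches} for field in fields}
```

For example, selecting `product = MYD_NDVI` leaves `{EVI, NDVI}` in the variable list and only `jpeg` in the format list.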
Data Subscription
• Predicate matching
– Keyword definitions for each data product: “BIMONTHLY COMPOSITE of VEGETATION data”
– Score captures the degree of correlation between descriptions and products
– Additional keywords are added to a list for further consideration; their scores are updated based on how often they recur
– Successful product descriptions are tagged
– Tags can be reused by other users to search for common products
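The slides do not specify the scoring function. This sketch substitutes Jaccard similarity as one possible measure of correlation. It compares the words of a description with each product's keyword set. Unmatched words are counted, so frequently repeated ones could later be promoted to keywords. The product keys, keyword sets and stopword list are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical keyword definitions for three products.
PRODUCT_KEYWORDS: Dict[str, Set[str]] = {
    "ndvi_bimonthly": {"bimonthly", "composite", "vegetation"},
    "ndvi_daily": {"daily", "vegetation"},
    "sst_daily": {"daily", "sea", "surface", "temperature"},
}
STOPWORDS = {"a", "an", "the", "of", "for", "data"}


class PredicateMatcher:
    def __init__(self, product_keywords: Dict[str, Set[str]]) -> None:
        self.product_keywords = product_keywords
        self.known = set().union(*product_keywords.values())
        self.candidates: Counter = Counter()  # unmatched word -> times seen

    def score(self, description: str) -> List[Tuple[str, float]]:
        """Rank products by Jaccard similarity with the description's words."""
        words = {w for w in re.findall(r"[a-z0-9]+", description.lower())
                 if w not in STOPWORDS}
        self.candidates.update(words - self.known)  # keep for later review
        ranked = []
        for product, keywords in self.product_keywords.items():
            overlap = len(words & keywords)
            if overlap:
                ranked.append((product, overlap / len(words | keywords)))
        return sorted(ranked, key=lambda item: item[1], reverse=True)
```

With these keywords, “bimonthly composite of vegetation data” scores 1.0 for `ndvi_bimonthly` and 0.25 for `ndvi_daily`. A tag attached to a successful match could then be looked up directly by other users.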
Data Subscription
• Subscription Manager
– Subscription data management
– Receives updates from the data generator
– Distributes notifications to subscribed users
– Enables and disables data generation
• Subscription data management
– MySQL database
– Product information – product key, generation frequency, configuration variables, filename pattern, webserver path
– User subscription information – userid, product key, date range, email address
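The two tables described above can be sketched as follows. SQLite, through Python's built-in `sqlite3`, stands in for MySQL so that the example runs without a server. The column names, the minute unit for the generation period and the date-range query are assumptions derived from the field lists above. The same SQL should need only minor changes for MySQL.

```python
import sqlite3
from datetime import date
from typing import List, Tuple

SCHEMA = """
CREATE TABLE product (
    product_key      TEXT PRIMARY KEY,
    gen_period_min   INTEGER NOT NULL,  -- generation period in minutes (assumed unit)
    config_vars      TEXT,
    filename_pattern TEXT NOT NULL,
    webserver_path   TEXT NOT NULL
);
CREATE TABLE subscription (
    userid      TEXT NOT NULL,
    product_key TEXT NOT NULL REFERENCES product(product_key),
    start_date  TEXT NOT NULL,          -- ISO dates compare correctly as text
    end_date    TEXT NOT NULL,
    email       TEXT NOT NULL,
    PRIMARY KEY (userid, product_key)
);
"""


def active_subscribers(conn: sqlite3.Connection, product_key: str,
                       today: date) -> List[Tuple[str, str]]:
    """Return (userid, email) for subscriptions whose date range covers today."""
    day = today.isoformat()
    rows = conn.execute(
        "SELECT userid, email FROM subscription "
        "WHERE product_key = ? AND start_date <= ? AND end_date >= ? "
        "ORDER BY userid",
        (product_key, day, day),
    )
    return rows.fetchall()
```

Storing dates as ISO-8601 strings keeps the range comparison correct in SQLite. MySQL would normally use a `DATE` column instead.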
Data Subscription
• Pull-based notifications
– Simpler approach
– Perl script tracks updates to the data repository
– Loops through all data products at the highest generation frequency
– Trade-off between performance and notification delays
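Below is a Python rendering of the pull-based approach (the original uses a Perl script). Each pass lists the web directory and matches new files against each product's filename pattern. Between passes, the loop sleeps for the period of the most frequently generated product. The directory layout, the pattern strings and the minute units are assumptions.

```python
import fnmatch
import os
import time
from typing import Callable, Dict, List, Set


def poll_once(web_dir: str, patterns: Dict[str, str], seen: Set[str]) -> Dict[str, List[str]]:
    """One pass: map each product key to unseen files matching its pattern."""
    new_files: Dict[str, List[str]] = {key: [] for key in patterns}
    for name in sorted(os.listdir(web_dir)):
        if name in seen:
            continue
        for key, pattern in patterns.items():
            if fnmatch.fnmatch(name, pattern):
                new_files[key].append(name)
        seen.add(name)  # every file is examined once, matched or not
    return new_files


def polling_interval(periods_minutes: Dict[str, int]) -> int:
    """Poll at the rate of the most frequently generated product."""
    return min(periods_minutes.values())


def monitor(web_dir: str, patterns: Dict[str, str], periods_minutes: Dict[str, int],
            notify: Callable[[str, str], None]) -> None:
    """Run forever, calling notify(product_key, filename) for new files."""
    seen = set(os.listdir(web_dir))  # do not announce files present at startup
    interval_s = 60 * polling_interval(periods_minutes)
    while True:
        time.sleep(interval_s)
        for key, files in poll_once(web_dir, patterns, seen).items():
            for name in files:
                notify(key, name)
```

A shorter interval reduces notification delay but costs more directory scans. This is the trade-off between performance and notification delays noted above.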
[Figure: Pull-based notification architecture. Components: Satellite, Ground Station, PTO Cluster, SeaSpace Data Processing Daemon, Config Files, P1+, Remote Sensing Data, Web Server, Web Dir, Monitoring Agent, Pub-Sub Manager, WS1, WS2, Subscription Database, Sub Form, User Workstation.]
Data Subscription
• Push-based notifications
– Requires tight integration with the data generation process
– Included as an entry in the configproc file
– Product name argument is used to query the list of subscribed users
– Constraints on the execution node and environment
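Below is a sketch of the push-based side. It assumes the configproc entry invokes a small script with the product name and the new file name as arguments. `SUBSCRIBERS` is a hypothetical stand-in for the subscription-database query, and the addresses and URL are placeholders. The sketch only builds the messages. Handing them to an SMTP server depends on the execution node, which is the environment constraint noted above.

```python
from email.message import EmailMessage
from typing import Dict, List, Tuple

# Hypothetical stand-in for the subscription-database lookup by product key.
SUBSCRIBERS: Dict[str, List[Tuple[str, str]]] = {
    "evi_daily": [("alice", "alice@example.org")],
}
BASE_URL = "http://www.example.org/products/images/modis"  # placeholder


def build_notifications(product: str, filename: str, base_url: str = BASE_URL) -> List[EmailMessage]:
    """Build one e-mail per subscriber of `product` announcing `filename`."""
    messages = []
    for userid, address in SUBSCRIBERS.get(product, []):
        msg = EmailMessage()
        msg["From"] = "pto-notify@example.org"  # placeholder sender
        msg["To"] = address
        msg["Subject"] = f"New {product} data available"
        msg.set_content(f"Hello {userid},\n\nA new file is available:\n{base_url}/{filename}\n")
        messages.append(msg)
    return messages


def main(argv: List[str]) -> int:
    """Entry point, e.g. argv = ["notify", "evi_daily", "2007.0320.t_evi.jpg"]."""
    if len(argv) != 3:
        print("usage: notify PRODUCT FILENAME")
        return 2
    for msg in build_notifications(argv[1], argv[2]):
        print(msg["To"], msg["Subject"])  # a deployment would pass msg to smtplib
    return 0
```

In a script, `main(sys.argv)` would be called from the configproc entry. Because the product name is the only lookup key, the script needs no state of its own.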
[Figure: Push-based notification architecture. Components: Satellite, Ground Station, PTO Cluster, SeaSpace Data Processing Daemon, Config Files, P1+, Remote Sensing Data, Web Server, Web Dir, Monitoring Agent, Pub-Sub Manager, WS1, WS2, WS3, Subscription Database, Sub Form, User Workstation.]
Data Delivery
• HTTP access
– Users can download images from the web server
– No way to verify whether users are interested in an image
– Images cannot be kept on the web server for long
• RSS feed based access
– Thumbnails are sent as RSS feed items when new images are available
– Based on the thumbnail, users can download the actual image from the feed link
• Data portal access of archived data
– Archived data can be accessed from the SRB server
– Difficult to sift through the large number of images
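Below is a minimal sketch of an RSS 2.0 feed for one product, using only the standard library. Each item's `link` points at the full image, and its `description` carries an HTML `<img>` tag for the thumbnail. The feed layout, the URLs and the choice to embed the thumbnail in the description are all assumptions. A Media RSS extension would be one alternative.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import List, Tuple

# Each item: (title, full_image_url, thumbnail_url, published_time)
Item = Tuple[str, str, str, datetime]


def build_feed(title: str, link: str, items: List[Item]) -> str:
    """Return an RSS 2.0 document announcing new images."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"New images for {title}"
    for item_title, image_url, thumb_url, published in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = image_url  # full-size image
        # Thumbnail as HTML inside the description (escaped by ElementTree).
        ET.SubElement(item, "description").text = f'<img src="{thumb_url}" alt="{item_title}"/>'
        ET.SubElement(item, "guid").text = image_url
        ET.SubElement(item, "pubDate").text = format_datetime(published)  # RFC 2822 date
    return ET.tostring(rss, encoding="unicode")
```

A feed reader shows the thumbnail, and the user follows the item link only for images of interest. This also gives a rough signal of which images users actually want.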
RSS Feed notification
Future Work
• Future Direction
– Explore the advantages of standard PubScribe models
– Utilize current state-of-the-art ontology-based methods for predicate mapping
– Performance studies for scalability
– Transfer data automatically to a user-specified location
Conclusion
“A user-oriented subscription framework that will encourage broader access from the grid user community”
Acknowledgements
This work was made possible by the National Science Foundation, TeraGrid Resource Partners grant OCI-0503992.
Questions?