GADS: A Web Service for accessing large environmental data sets

Jon Blower, Keith Haines, Adit Santokhee

Reading e-Science Centre

University of Reading

http://www.resc.rdg.ac.uk

[email protected]

Background

At Reading we hold copies of various datasets (~2 TB)

– Mainly from models of oceans and atmosphere

– Also some observational data (e.g. satellite data)

– From Met Office, SOC, ECMWF, more

We serve these datasets to many end users

– Scientists (1000s of hits per year)

– Industry (e.g. British Maritime Technology)

Datasets are in a variety of formats

– netCDF, GRIB, HDF, HDF5 …

Data do not conform to naming conventions

– E.g. “temp” instead of “sea_water_potential_temperature”

Background (2)

There is a clear need to make access to these datasets easier

– Users shouldn’t have to know details of how data are stored

Hence development of GADS (Grid Access Data Service)

Developed as part of GODIVA project

– Grid for Ocean Diagnostics, Interactive Visualisation and Analysis

– NERC e-Science pilot project

Originally developed by Woolf et al (2003)

Allows richer queries and more flexibility than DODS standard

– Although we plan to implement a DODS translation layer

GODIVA Web Portal

• Allows users to interactively select data for download using a GUI

• Users can create movies on the fly

• cf. Live Access Server

Advantages of GADS

Users don’t need to know anything about storage details

Can expose data with conventional names without changing data files

Users can choose their preferred data format, irrespective of how data are stored

Behaves as aggregation server

– Delivers single file, even if original data spanned several files

Deployed as a Web Service

– Can be called from any platform/language

– Can be called programmatically (easily incorporated into larger systems, workflows)

– Java / Apache Axis / Tomcat
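
One way to picture "conventional names without changing data files" is a lookup table kept in the service's metadata, separate from the files. The sketch below is illustrative only: the `ALIASES` table and the `resolve_variable` function are hypothetical, not GADS code, and pairing the alias with the FOAM_NINTH dataset is an assumption. Only the "temp" / "sea_water_potential_temperature" pair comes from the slides.

```python
# Hypothetical metadata table: (dataset, conventional name) -> name used in the files.
# Only the "temp" / "sea_water_potential_temperature" pair is taken from the slides;
# attaching it to FOAM_NINTH is an assumption made for illustration.
ALIASES = {
    ("FOAM_NINTH", "sea_water_potential_temperature"): "temp",
}


def resolve_variable(dataset: str, requested: str) -> str:
    """Return the name stored in the data files for a requested variable.

    Falls back to the requested name when no alias is registered, so files
    that already use conventional names need no entry in the table.
    """
    return ALIASES.get((dataset, requested), requested)


if __name__ == "__main__":
    print(resolve_variable("FOAM_NINTH", "sea_water_potential_temperature"))
    print(resolve_variable("FOAM_NINTH", "salinity"))
```

Because the mapping lives outside the files, renaming a variable for users is a metadata change only.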

Architecture

[Figure: architecture diagram. Labelled components: Client, GADS Web Service (dataQuery, dataRequest), Metadata Interface, Metadata Manager Utility, metadata, data files]

Metadata structure

GADS Methods

dataQuery() is used for querying the data holdings

– “What datasets are there?”

– “What variables are there in the dataset X?”

dataRequest() is used for downloading data

– User can choose the data format

– Can easily download subsets of data

– Uses start-stride-count semantics (familiar in community)

dataRequestNatural()

– Same as dataRequest() but in natural units (degrees, metres …)
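
For readers less familiar with start-stride-count, the sketch below shows which grid indices one such triple selects along a single axis. It is a generic illustration of the convention, not GADS code; the function name `selected_indices` and the bounds check are assumptions.

```python
from typing import List


def selected_indices(start: int, stride: int, count: int, axis_length: int) -> List[int]:
    """Indices picked along one axis by a (start, stride, count) triple.

    start  - index of the first point to take
    stride - step between consecutive points
    count  - number of points to take
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if start < 0 or count < 0:
        raise ValueError("start and count must be non-negative")
    indices = [start + i * stride for i in range(count)]
    if indices and indices[-1] >= axis_length:
        raise IndexError("selection runs past the end of the axis")
    return indices


if __name__ == "__main__":
    # Every 4th point, three points in total, starting at index 100.
    print(selected_indices(100, 4, 3, axis_length=1000))
```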

dataQuery – examples of use

dataQuery(dataset, variable, axis) – general form

dataQuery(“”, “”, “”) – gets all dataset names in the catalogue

dataQuery(“FOAM_NINTH”, “”, “”) – gets all the variable names in the FOAM_NINTH dataset

dataQuery(“FOAM_NINTH”, “temperature”, “”) – gets the details of the grid for the temperature variable

dataQuery(“FOAM_NINTH”, “temperature”, “z”) – gets all values that the z coordinate can take

dataQuery(“”, “temperature”, “”) – gets all datasets that contain the “temperature” variable
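
The empty-string wildcard pattern in these calls can be mimicked with a small in-memory catalogue. This is a sketch under stated assumptions: the `CATALOGUE` contents (including the dataset name EXAMPLE_ATMOS and all coordinate values), the function name `data_query` and the return types are hypothetical, and the real service's response format is not shown in the slides.

```python
# Hypothetical in-memory catalogue; real GADS holdings and grids will differ.
CATALOGUE = {
    "FOAM_NINTH": {
        "temperature": {"z": [5.0, 15.0, 25.0]},  # made-up depth levels
        "salinity": {"z": [5.0, 15.0, 25.0]},
    },
    "EXAMPLE_ATMOS": {  # made-up dataset name
        "temperature": {"z": [1000.0, 850.0]},
    },
}


def data_query(dataset: str, variable: str, axis: str):
    """Mimic the wildcard behaviour of the five example calls."""
    if dataset == "" and variable == "":
        return sorted(CATALOGUE)  # all dataset names
    if dataset == "":
        # all datasets that contain the named variable
        return sorted(name for name, variables in CATALOGUE.items() if variable in variables)
    if variable == "":
        return sorted(CATALOGUE[dataset])  # all variable names in one dataset
    grid = CATALOGUE[dataset][variable]
    if axis == "":
        return grid  # grid details for one variable
    return grid[axis]  # values one coordinate can take


if __name__ == "__main__":
    print(data_query("", "", ""))
    print(data_query("FOAM_NINTH", "temperature", "z"))
```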

dataRequest – example of use

dataRequest(“FOAM_NINTH”, “temperature”, “CDF”,
    “t”, 0, 1, 20,
    “z”, 0, 1, -1,
    “y”, 100, 4, 400,
    “x”, 300, 4, 600)

dataRequestNatural(“FOAM_NINTH”, “temp”, “CDF”,
    “t”, “2004-06-01 00:00:00”, “2004-06-22 00:00:00”,
    “z”, “0”, “10”,
    “y”, “42”, “64”,
    “x”, “-26”, “9”)

Returns URL to extracted dataset
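
A request in natural units has to be turned into grid indices somewhere. The sketch below shows one plausible way to do that for a single axis: locate the requested interval in the coordinate values and express it as a start-stride-count triple. It is not taken from GADS; the function name, the stride of 1 and the half-degree latitude axis are assumptions, and it only covers a regular axis sorted in ascending order.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def natural_to_index_range(coords: List[float], lo: float, hi: float) -> Tuple[int, int, int]:
    """Map the closed interval [lo, hi] in natural units onto (start, stride, count).

    Assumes `coords` is sorted in ascending order and that every grid point
    inside the interval is wanted, hence a stride of 1.
    """
    start = bisect_left(coords, lo)   # first index with coords[i] >= lo
    stop = bisect_right(coords, hi)   # one past the last index with coords[i] <= hi
    return start, 1, max(stop - start, 0)


if __name__ == "__main__":
    # Made-up latitude axis: 40.0, 40.5, ..., 69.5 degrees north.
    latitudes = [40.0 + 0.5 * i for i in range(60)]
    print(natural_to_index_range(latitudes, 42, 64))
```

An interval that misses the axis entirely yields a count of 0, which the caller can treat as an empty selection.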

Metadata manager (in progress)

e.g. Adding a dataset – can “harvest” metadata from netCDF file headers

Limitations

Assumes one timestep per file

– Hence doesn’t handle timeseries well

Long queries can cause problems (synchronous)

– Needs a queuing system

Rotated grids a problem (esp. for dataRequestNatural())

Could have richer metadata queries
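
As an illustration of the queuing idea mentioned above, the sketch below accepts a request, returns a job identifier straight away, and lets the client poll for the result while a background worker does the slow extraction. Everything here is hypothetical (the function names, the request dictionary and the example.org URL); it shows a general pattern, not a planned GADS design.

```python
import queue
import threading
import uuid

jobs = queue.Queue()            # pending extraction requests
results = {}                    # job id -> result URL (or error text)
results_lock = threading.Lock()


def submit(request: dict) -> str:
    """Queue a request and return a job id immediately."""
    job_id = uuid.uuid4().hex
    jobs.put((job_id, request))
    return job_id


def extract(request: dict) -> str:
    """Placeholder for the slow extraction step; returns a made-up URL."""
    return "http://example.org/extracted/%s.nc" % request["dataset"]


def worker() -> None:
    """Process queued jobs one at a time, recording each outcome."""
    while True:
        job_id, request = jobs.get()
        try:
            outcome = extract(request)
        except Exception as exc:  # keep the worker alive on bad requests
            outcome = "error: %s" % exc
        with results_lock:
            results[job_id] = outcome
        jobs.task_done()


def status(job_id: str):
    """Return the result if ready, else None so the client can poll again."""
    with results_lock:
        return results.get(job_id)


# Daemon thread so the worker does not block interpreter exit.
threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    job = submit({"dataset": "FOAM_NINTH"})
    jobs.join()                 # wait for the worker (a real client would poll)
    print(status(job))
```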

Application: Search and Rescue

Search And Rescue Information System (SARIS)

– British Maritime Technology (BMT)

Used by Coastguard to locate people who have fallen overboard

Runs a model using wind and surface current data

– Forecasts where person will be by the time rescue arrives

By incorporating GADS, SARIS can consume up-to-date Met Office forecasts on demand

– Should improve quality of prediction

Spatial Databases

Database systems now including capability for storing geospatial data

– IBM Informix, Oracle 10g, PostgreSQL, MySQL …

ReSC is evaluating some of these

– Informix with Grid DataBlade looks promising (www.barrodale.com)

We need capability to store raster data (i.e. gridded data)

– Many only store vector data

– Gotcha – some vendors use “raster” to mean “photograph”, not “model data”

We also need to store 3-D data

– Some only have native understanding of 2-D data

Future plans

Interact more with GIS community

– There are already some relevant initiatives out there (e.g. MarineGIS)

– Use of databases may help (some are OGC compliant)

– But have problem that GIS tends to talk in 2-D

Develop DODS (=OPeNDAP) layer

Encourage others to install GADS

– We don’t want to hold lots of data in Reading!

– POL, Met Office, ECMWF all expressed interest

– Software needs “hardening” first…

Find more applications!