aggregation and subsetting in erddap

Aggregation and Subsetting in ERDDAP (a middleman data server) http://coastwatch.pfeg.noaa.gov/erddap Bob Simons <[email protected]> NOAA NMFS SWFSC ERD

Upload: the-hdf-eos-tools-and-information-center

Post on 08-Jul-2015


Page 1: Aggregation and Subsetting in ERDDAP

Aggregation and Subsetting in ERDDAP (a middleman data server)

http://coastwatch.pfeg.noaa.gov/erddap

Bob Simons <[email protected]>

NOAA NMFS SWFSC ERD

Page 9: Aggregation and Subsetting in ERDDAP

Aggregating Gridded Data

• Aggregating time points:
  10,000s of data files: sst[latitude][longitude]
  become one virtual dataset: sst[time][latitude][longitude]

• Aggregating variables:
  Many files with one variable per file
  become one virtual dataset with all variables
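As a rough illustration of both kinds of aggregation (not ERDDAP's actual implementation), the sketch below uses plain Python lists as stand-ins for data files. The function names `aggregate_time` and `aggregate_variables`, the `chl` variable, and all sample values are hypothetical.

```python
# Hypothetical stand-ins for data files: each "file" holds one time point
# of one variable as a 2-D grid indexed [latitude][longitude].
sst_files = {
    "2012-08-11": [[10.0, 11.0], [12.0, 13.0]],
    "2012-08-12": [[10.5, 11.5], [12.5, 13.5]],
}
chl_files = {
    "2012-08-11": [[0.1, 0.2], [0.3, 0.4]],
    "2012-08-12": [[0.5, 0.6], [0.7, 0.8]],
}

def aggregate_time(files):
    """Stack per-time 2-D grids into one [time][latitude][longitude] array."""
    times = sorted(files)  # ISO dates sort chronologically as strings
    return times, [files[t] for t in times]

def aggregate_variables(**per_variable_files):
    """Combine one-variable-per-file collections into one virtual dataset."""
    dataset = {}
    for name, files in per_variable_files.items():
        times, grid = aggregate_time(files)
        dataset["time"] = times  # assumes every variable shares one time axis
        dataset[name] = grid
    return dataset

dataset = aggregate_variables(sst=sst_files, chl=chl_files)
print(dataset["sst"][1][0][1])  # sst[time=1][latitude=0][longitude=1] -> 11.5
```

The user sees a single dataset with a time axis and all variables, regardless of how many files sit behind it.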

Page 10: Aggregation and Subsetting in ERDDAP

Subsetting Gridded Data

• OPeNDAP Projection Constraints
  sst[57:57][121:2:141][163:2:183]
  ERDDAP: sst[(2012-08-12)][(20):2:(40)][(-140):2:(-120)]

• Huge time-saver: User can just request what she needs (1%).

• Aggregated datasets need to be subset-able.
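The difference between the two projection-constraint forms above can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the 1-degree latitude axis is an assumed example, so the resulting indices differ from the ones on the slide.

```python
def nearest_index(axis_values, target):
    """Return the index of the axis value closest to target."""
    return min(range(len(axis_values)), key=lambda i: abs(axis_values[i] - target))

def value_constraint(start, stop, stride=1):
    """Format one dimension as [(start):stride:(stop)], with values in parentheses."""
    return f"[({start}):{stride}:({stop})]"

def index_constraint(axis_values, start, stop, stride=1):
    """Format the same request with plain indices, [start:stride:stop]."""
    i = nearest_index(axis_values, start)
    j = nearest_index(axis_values, stop)
    return f"[{i}:{stride}:{j}]"

# Assumed axis for illustration: 1-degree latitudes from -90 to 90.
latitude = [-90 + i for i in range(181)]

by_value = "sst[(2012-08-12)]" + value_constraint(20, 40, 2) + value_constraint(-140, -120, 2)
by_index = index_constraint(latitude, 20, 40, 2)

print(by_value)  # sst[(2012-08-12)][(20):2:(40)][(-140):2:(-120)]
print(by_index)  # [110:2:130]
```

With the parenthesized form, the user never needs to look up where 20 degrees falls on a particular dataset's axis.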

Page 11: Aggregation and Subsetting in ERDDAP

Aggregating In-Situ and Tabular Data

• A database-like table with rows and columns
  E.g., one file has data for one buoy for one month.
  It isn't a multi-dimensional grid. There are no dimensions.

• Aggregating features and time points:
  Features: stations, trajectories, profiles, ...
  Append into a giant virtual table.
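The "append into a giant virtual table" idea can be sketched in a few lines. The file names, column names, and values below are hypothetical, and lists of tuples stand in for real files.

```python
from itertools import chain

# Hypothetical per-file contents: one buoy, one month per file.
files = {
    "buoyA_2012-07.csv": [("A", "2012-07-01", 14.2), ("A", "2012-07-02", 14.5)],
    "buoyA_2012-08.csv": [("A", "2012-08-01", 15.1)],
    "buoyB_2012-07.csv": [("B", "2012-07-01", 22.8)],
}
COLUMNS = ("station", "time", "sst")

def virtual_table(files):
    """Lazily yield every row of every file as one long table."""
    return chain.from_iterable(files[name] for name in sorted(files))

rows = list(virtual_table(files))
print(len(rows))  # 4
```

Because the rows are chained lazily, nothing is copied until a request actually reads the table.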

Page 12: Aggregation and Subsetting in ERDDAP

Subsetting In-Situ and Tabular Data

• OPeNDAP Selection Constraints
  (no indices, because no multi-dimensional grids)
  longitude,latitude,time,sst&sst>35
  Easy to create. Uses domain units (degC).
  Very flexible. (Based on database's SQL SELECT.)

• Huge time-saver: User can just request what she needs (1%).

• Aggregated datasets need to be subset-able.
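A selection-constraint request like the one above could be assembled into a tabledap URL roughly as follows. This is a sketch: the dataset ID `exampleBuoys` and the helper `tabledap_url` are hypothetical, and percent-encoding the comparison operator is an assumption about what a client should send.

```python
from urllib.parse import quote

BASE = "http://coastwatch.pfeg.noaa.gov/erddap"  # server URL from the title slide

def tabledap_url(dataset_id, variables, constraints, file_type="csv"):
    """Build a URL: requested columns first, then one &constraint per filter."""
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as > and <, but keep = readable.
        query += "&" + quote(constraint, safe="=")
    return f"{BASE}/tabledap/{dataset_id}.{file_type}?{query}"

# "exampleBuoys" is a made-up dataset ID used only for illustration.
url = tabledap_url("exampleBuoys", ["longitude", "latitude", "time", "sst"], ["sst>35"])
print(url)
```

The constraint is written in domain units (sst in degC), so the same request works no matter how the rows are ordered on the server.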

Page 13: Aggregation and Subsetting in ERDDAP

Don't Treat In-Situ/Tabular Data Like Gridded Data

• CF DSG (Discrete Sampling Geometries) stores in-situ data as gridded .nc
  Fine for storage, not for subsetting.

• Problem: Indices aren't domain units. How do you request sst>35 with indices?

• Problem: Indices aren't a real-world sequence.
  Grid: lat[] is a sequence. lat[42:53] has meaning.
  Table: Buoy number isn't. &lat>20&lat<40 is buoy #2,14,26,109, not buoy[42:53]

• Problem: 5 CF DSG data structures.
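The index problem can be shown with a toy example. The axis values, buoy numbers, and buoy latitudes below are invented for illustration.

```python
# Grid: the latitude axis is sorted, so an index range is a contiguous region.
# (Python slices exclude the end index; OPeNDAP ranges include it.)
lat_axis = [10, 20, 30, 40, 50]
grid_subset = lat_axis[1:4]
print(grid_subset)  # [20, 30, 40]

# Table: buoy numbers are arbitrary labels, unrelated to location.
buoy_lat = {1: 55.0, 2: 25.0, 3: 5.0, 14: 33.0, 26: 21.5, 109: 39.0}

# Equivalent of the selection constraint &lat>20&lat<40
selected = sorted(b for b, lat in buoy_lat.items() if 20 < lat < 40)
print(selected)  # [2, 14, 26, 109]
```

No single index range over buoy numbers picks out exactly those four buoys, which is why value-based constraints fit tabular data better.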

Page 14: Aggregation and Subsetting in ERDDAP

Option: Treat Gridded Data Like Tabular Data

• Standard request: time, lat, lon bounding box
  What about unusual requests of gridded data,
  e.g., SST>35 ("Select by value")?

• ERDDAP's EDDTableFromEDDGrid creates a giant virtual table from a gridded dataset.
  Columns: longitude, latitude, time, sst
  Query: e.g., longitude,latitude,time,sst&sst>35
  Response: a table (one data point per row)

• Risk: huge effort for server.
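The grid-as-table idea, and why it can be expensive, can be sketched as below. This is not how EDDTableFromEDDGrid is implemented; the function `grid_as_table` and the sample values are hypothetical.

```python
def grid_as_table(times, lats, lons, sst):
    """Yield (longitude, latitude, time, sst) rows from sst[time][lat][lon]."""
    for t, time in enumerate(times):
        for i, lat in enumerate(lats):
            for j, lon in enumerate(lons):
                yield (lon, lat, time, sst[t][i][j])

# Tiny invented grid: 1 time point x 2 latitudes x 2 longitudes.
times = ["2012-08-12"]
lats = [20, 22]
lons = [-140, -138]
sst = [[[34.0, 36.5], [35.0, 37.2]]]

# Equivalent of the query longitude,latitude,time,sst&sst>35
hot = [row for row in grid_as_table(times, lats, lons, sst) if row[3] > 35]
print(hot)  # [(-138, 20, '2012-08-12', 36.5), (-138, 22, '2012-08-12', 37.2)]
```

A select-by-value query has to visit every grid cell in the requested region, which is the source of the server-effort risk noted above.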

Page 15: Aggregation and Subsetting in ERDDAP

Summary: Huge Advantages of Aggregation and Subsetting

• Users can find and deal with one aggregated dataset.

• Users can make one subset request to one aggregated dataset.
  Grids: indices to get a temporal and spatial subset.
  Tables (selection constraints): any subset you want.
  (Not: one subset request to each unaggregated file,
  or worse, using FTP to download lots of entire files.)

• Don't treat tabular/in-situ data like gridded data.
