![Page 1: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/1.jpg)
The Live Access Server (Access to observational data)
Jonathan Callahan (University of Washington)
Steve Hankin (NOAA/PMEL – PI)
Roland Schweitzer, Kevin O’Brien, Ansley Manke, Steve Du, Xiaoping Wang, Joe Mclean, Joe Sirott, Jerry Davison
![Page 2: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/2.jpg)
Gridded vs. Observational Data
Gridded data:
• Clean
• Organized
• Labeled
• Voluminous
• Handled by machines
Observational data:
• Dirty
• Messy
• Often un/mis-labeled
• Increasingly voluminous
• Previously handled by hand
![Page 3: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/3.jpg)
Live Access Server (LAS)
• Web-based, common interface to diverse sources of climate data
• Single interface for subsetting, download, visualization, comparison
• Easy access to metadata and documentation
• Unified access to distributed data holdings
• Uniform user interface to existing back end visualization packages
![Page 4: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/4.jpg)
LAS Data Model
For data access users must specify:
Dataset
Variable
4D Region
‘Constraints’
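As a rough sketch, the four items a request has to carry could be modelled as below. The class and field names (`LASRequest`, `Region4D`) and the sample dataset, variable and constraint values are illustrative assumptions, not the request format LAS actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Region4D:
    """A 4D bounding box: longitude, latitude, depth and time ranges (hypothetical layout)."""
    lon: Tuple[float, float]
    lat: Tuple[float, float]
    depth: Tuple[float, float]
    time: Tuple[str, str]  # ISO dates, kept as strings so they compare in date order


@dataclass(frozen=True)
class LASRequest:
    """The four things the slide says a user must specify."""
    dataset: str
    variable: str
    region: Region4D
    constraints: Dict[str, str] = field(default_factory=dict)

    def is_complete(self) -> bool:
        # Dataset and variable must be named; each range must run low..high.
        ranges = (self.region.lon, self.region.lat, self.region.depth, self.region.time)
        return bool(self.dataset and self.variable) and all(lo <= hi for lo, hi in ranges)


request = LASRequest(
    dataset="example_ocean_profiles",      # hypothetical dataset name
    variable="temperature",
    region=Region4D(lon=(-160.0, -150.0), lat=(20.0, 30.0),
                    depth=(0.0, 500.0), time=("1990-01-01", "1990-12-31")),
    constraints={"quality_flag": "good"},  # hypothetical constraint
)
print(request.is_complete())
```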
![Page 5: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/5.jpg)
Dataset
![Page 6: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/6.jpg)
Dataset
![Page 7: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/7.jpg)
Variable
![Page 8: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/8.jpg)
4D Region
Constraints
![Page 9: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/9.jpg)
Output
![Page 10: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/10.jpg)
LAS Architecture
LAS is three-tiered
![Page 11: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/11.jpg)
Access to Remote Data
Ferret back end is linked with OPeNDAP
![Page 12: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/12.jpg)
Data Server Details
Java servlet redesign
![Page 13: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/13.jpg)
Server Side Functionality
After parsing the user request LAS must:
Access & Subset the data
Perform analysis
Create Visualization
For interactive results each task should take <5 sec.
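One way to picture the three tasks and the 5-second target is a small pipeline that times each stage. This is only a sketch: the stage bodies are placeholders and the function names are hypothetical, not part of LAS.

```python
import time

BUDGET_SECONDS = 5.0  # per-task target quoted on the slide


def access_and_subset(request):
    # Placeholder stage: pretend to fetch only the requested values.
    return [v for v in range(100) if v % request["stride"] == 0]


def perform_analysis(values):
    # Placeholder stage: a simple mean.
    return sum(values) / len(values)


def create_visualization(result):
    # Placeholder stage: a text "plot".
    return f"mean = {result:.1f}"


def run_with_budget(request):
    """Run the three stages in order, recording the wall-clock time of each."""
    timings = {}
    data = request
    for stage in (access_and_subset, perform_analysis, create_visualization):
        start = time.perf_counter()
        data = stage(data)
        timings[stage.__name__] = time.perf_counter() - start
    over_budget = [name for name, t in timings.items() if t > BUDGET_SECONDS]
    return data, timings, over_budget


output, timings, over_budget = run_with_budget({"stride": 10})
print(output, over_budget)
```

Timing each stage separately makes it visible which of the three tasks breaks the interactive budget for a given request.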
![Page 14: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/14.jpg)
The Hard Part
After parsing the user request LAS must:
Access & Subset the data
Perform analysis
Create Visualization
![Page 15: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/15.jpg)
Classes of Observational Climate Data
Station time series (Eulerian)
– Oceanic
• tide gauges (1D)
• moored thermistor chains (2D)
– Atmospheric
• surface weather stations (1D)
• profilers (2D)
![Page 16: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/16.jpg)
Classes of Observational Climate Data
Profile data
– Oceanic
• CTD casts, bottle data (ordered by cruise track, quasi-scattered)
• repeat stations (ordered by cruise track or station location)
– Atmospheric
• profilers (station based)
• balloons (2D, quasi-Lagrangian)
![Page 17: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/17.jpg)
Classes of Observational Climate Data
Tracks (Lagrangian)
– Oceanic
• ship underway data (surface)
• drifting buoys (surface)
• ARGO floats (surface tracks, scattered profiles)
• instrumented animals (depth)
– Atmospheric
• airplane underway data (altitude)
• balloons (altitude, quasi-stationary, quasi-profile)
![Page 18: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/18.jpg)
Classes of Observational Climate Data
Random Scatter
– Oceanic
• surface ship observations
• profile locations
– Atmospheric
• surface weather obs
![Page 19: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/19.jpg)
Example Dataset
NOAA/NODC/OCL World Ocean Database 2001
– data collected from ocean cruises and moorings
– scattered profiles, Lagrangian drifters
– physical, chemical and biological data
– dozens (hundreds?) of variables
– > 7 million profiles (1792-present, global)
– > 10 Gigabytes of data (accelerating every year)
![Page 20: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/20.jpg)
Example Dataset
NOAA/NODC/OCL World Ocean Database 2001
Current access:
• Choose either temporally or spatially sorted data
• Choose year(s) or 10x10 degree box
• Choose instrument
• Retrieve data for all variables from that ‘file’
Problems:
• Cannot subset data (1 year x 1 instrument ≈ 7 Mbytes)
• Data returned in impenetrable compressed ASCII files
• Associated metadata is lost
![Page 21: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/21.jpg)
Example Dataset
NOAA/NODC/OCL World Ocean Database 2001
Our attempt at synoptic/cross-instrument data access
– Store data by variable
• Plan for those getting data out, not putting data in.
• What do scientific analysis and visualization packages need?
– Store data for minimum # of disk seeks
• Memory is fast (and cheap!), disk seeks are slow.
• Multi-stage process for determining data blocks needed.
• Read excess data into memory, then winnow.
![Page 22: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/22.jpg)
Example Dataset
NOAA/NODC/OCL World Ocean Database 2001
Step 1: synoptic meta-pointer file (0.3 MByte)
a) load synoptic meta-pointer file into memory
b) subset to extract metadata pointers
[Figure: grid with Longitude, Latitude and Time axes; each cell = number of profiles, pointer into NetCDF metadata file]
10deg x 10deg x 50 irregular timesteps = 260 Kbytes
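The 260 Kbyte figure is consistent with a global grid of 10-degree boxes (36 x 18) over 50 time steps, holding two values per cell at 4 bytes each. The 4-byte width and the grid origin at (-180, -90) are assumptions, not stated on the slide. The sketch below checks that arithmetic and shows what the Step 1 subset might look like, using a small in-memory dictionary as a hypothetical stand-in for the real meta-pointer file.

```python
N_LON, N_LAT, N_TIME = 36, 18, 50   # 10-degree boxes, 50 irregular time steps
VALUES_PER_CELL = 2                 # number of profiles, pointer into metadata file
BYTES_PER_VALUE = 4                 # assumption: 32-bit integers

index_bytes = N_LON * N_LAT * N_TIME * VALUES_PER_CELL * BYTES_PER_VALUE
print(index_bytes)  # 259200 bytes, about 260 Kbytes


def cell_range(lo, hi, origin, n_cells, size=10.0):
    """Indices of the 10-degree boxes that overlap the interval [lo, hi]."""
    first = max(0, int((lo - origin) // size))
    last = min(n_cells - 1, int((hi - origin) // size))
    return range(first, last + 1)


def select_cells(index, lon, lat, time_steps):
    """Step 1b: keep (count, pointer) entries for non-empty cells in the requested box."""
    hits = []
    for i in cell_range(lon[0], lon[1], -180.0, N_LON):
        for j in cell_range(lat[0], lat[1], -90.0, N_LAT):
            for k in time_steps:
                count, pointer = index.get((i, j, k), (0, -1))
                if count > 0:
                    hits.append((count, pointer))
    return hits


# Toy index held in memory: (lon box, lat box, time step) -> (profile count, pointer).
toy_index = {(2, 11, 7): (12, 4096), (3, 11, 7): (5, 8192), (20, 4, 7): (9, 1024)}
selected = select_cells(toy_index, lon=(-160.0, -145.0), lat=(20.0, 29.0), time_steps=[7])
print(selected)  # [(12, 4096), (5, 8192)]
```

Because the whole index fits in memory, this first stage costs no disk seeks beyond the initial load.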
![Page 23: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/23.jpg)
Example Dataset
NOAA/NODC/OCL World Ocean Database 2001
Step 2: metadata/data-pointer file (200 Mbyte)
a) read blocks of profile metadata into memory
b) subset by X/Y/T to obtain valid data pointers
[Figure: profile record in T/X/Y space = Julian day, Lat, Lon, Cruise ID, # of levels, Var_ptr, Var_QC; x N variables]
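A minimal sketch of the Step 2 subset follows, assuming each metadata record carries the fields named on the slide. The record class, the dictionary representation of `Var_ptr` and `Var_QC`, and the sample values are all hypothetical; the real file layout is not described in the slides.

```python
from typing import Dict, NamedTuple


class ProfileMeta(NamedTuple):
    """One profile's metadata record, with the fields listed on the slide."""
    julian_day: float
    lat: float
    lon: float
    cruise_id: str
    n_levels: int
    var_ptr: Dict[str, int]  # variable name -> offset into that variable's data file
    var_qc: Dict[str, int]   # variable name -> profile-level quality flag


def subset_metadata(block, lon, lat, days, variable):
    """Step 2b: keep pointers for profiles inside the X/Y/T box that hold the variable."""
    pointers = []
    for rec in block:
        inside = (lon[0] <= rec.lon <= lon[1]
                  and lat[0] <= rec.lat <= lat[1]
                  and days[0] <= rec.julian_day <= days[1])
        if inside and variable in rec.var_ptr:
            pointers.append((rec.var_ptr[variable], rec.n_levels))
    return pointers


# A toy block of metadata already read into memory (step 2a).
block = [
    ProfileMeta(2448000.5, 25.0, -155.0, "C001", 3, {"temp": 0}, {"temp": 0}),
    ProfileMeta(2448010.5, 45.0, -155.0, "C001", 4, {"temp": 3}, {"temp": 0}),  # outside lat range
    ProfileMeta(2448020.5, 22.0, -152.0, "C002", 2, {"salt": 0}, {"salt": 0}),  # no temperature
    ProfileMeta(2448030.5, 28.0, -158.0, "C002", 5, {"temp": 7}, {"temp": 1}),
]
data_pointers = subset_metadata(block, (-160.0, -150.0), (20.0, 30.0),
                                (2448000.0, 2448100.0), "temp")
print(data_pointers)  # [(0, 3), (7, 5)]
```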
![Page 24: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/24.jpg)
Example Dataset
NOAA/NODC/OCL World Ocean Database 2001
Step 3: data files (10 - 2000 Mbyte)
a) read profile data
b) subset by depth/quality flag to obtain valid data
[Figure: 1D profile at a T/X/Y location; each level along Z = Depth, Value, Quality flag; x N depths]
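The final winnowing in Step 3 could look like the sketch below. The `Level` record, the convention that flag 0 means good data, and the sample values are assumptions made for illustration only.

```python
from typing import NamedTuple


class Level(NamedTuple):
    """One depth level of a 1D profile, with the fields listed on the slide."""
    depth: float       # metres
    value: float
    quality_flag: int  # assumption: 0 means "good"


def subset_profile(levels, depth_range, accepted_flags=(0,)):
    """Step 3b: keep (depth, value) pairs in the depth range with an accepted flag."""
    lo, hi = depth_range
    return [(lv.depth, lv.value) for lv in levels
            if lo <= lv.depth <= hi and lv.quality_flag in accepted_flags]


# A toy profile already read from the data file (step 3a).
profile = [
    Level(0.0, 24.1, 0),
    Level(50.0, 22.8, 0),
    Level(100.0, 99.9, 4),   # flagged as bad
    Level(200.0, 15.2, 0),
    Level(500.0, 7.3, 0),    # deeper than requested
]
valid = subset_profile(profile, (0.0, 250.0))
print(valid)  # [(0.0, 24.1), (50.0, 22.8), (200.0, 15.2)]
```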
![Page 25: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/25.jpg)
Example Dataset
NOAA/NODC/OCL World Ocean Database 2001
Our attempt at synoptic/cross-instrument data access
Successes:
• Able to subset without accessing (much) unwanted data
• Access to (<1 Mbyte) subsets in seconds
• Access to metadata (“What profiles exist?”) even faster
Problems:
• Only set up for most important variables
• Data cannot be updated, must be rewritten
• Must reinvent logic for relational queries
• Funky, home-built solution
![Page 26: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/26.jpg)
Other data streams
• METAR obs (station time series)
– 1700 US weather stations report hourly data
– 25 variables = 120 Mbytes/month
• ARGO floats (profiles)
– 4000 floats reporting profiles every 10 days
– 50 levels x 10 variables = 24 Mbytes/month
• Tagging Of Pacific Pelagics (TOPP) (Lagrangian tracks)
– 50 animals per year tagged with 1 min data recorders
– 5 variables = 0.8 Mbytes/month
• Voluntary Observing Ships (random scatter)
– 3000 surface ship reports per day
– 25 variables = 9 Mbytes/month
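Three of these monthly volumes can be reproduced with simple arithmetic, assuming 4 bytes per value and a 30-day month (both assumptions; the slide does not state them). The TOPP figure is left out because the slide does not say how many recorders report in a given month.

```python
BYTES_PER_VALUE = 4   # assumption: 32-bit values
DAYS_PER_MONTH = 30   # assumption


def mbytes_per_month(reports_per_month, values_per_report):
    """Monthly volume in Mbytes (10^6 bytes) for a stream of fixed-size reports."""
    return reports_per_month * values_per_report * BYTES_PER_VALUE / 1e6


# METAR: 1700 stations, hourly reports, 25 variables each.
metar = mbytes_per_month(1700 * 24 * DAYS_PER_MONTH, 25)
# ARGO: 4000 floats, one profile every 10 days, 50 levels x 10 variables each.
argo = mbytes_per_month(4000 * (DAYS_PER_MONTH // 10), 50 * 10)
# Voluntary Observing Ships: 3000 reports per day, 25 variables each.
vos = mbytes_per_month(3000 * DAYS_PER_MONTH, 25)

print(metar, argo, vos)  # 122.4, 24.0 and 9.0 Mbytes/month
```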
![Page 27: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/27.jpg)
Observational Data Access Requirements
• Subset based on X, Y, Z, T or metadata (e.g. quality flag or station/ship/platform/animal_ID).
• Only return requested data. (Reduced volume for remote data access.)
• For near-real-time, daily updates are acceptable. (Can recreate static files on a daily basis if necessary.)
• Use standards wherever possible.
• Make the creation of the database as simple as possible. (Non-experts can follow cookbook examples.)
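To make the first two requirements concrete, here is one possible reading expressed as a relational query, using Python's built-in `sqlite3` module with an in-memory table. The table layout, column names and sample rows are hypothetical, and this is not the storage approach the slides settle on; it only illustrates subsetting on X, Y, Z, T and metadata while returning just the requested columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE obs (
        platform_id TEXT, lon REAL, lat REAL, depth REAL, time TEXT,
        variable TEXT, value REAL, quality_flag INTEGER
    )
""")
rows = [
    ("ship_A", -155.0, 25.0, 0.0, "2001-03-01", "temp", 24.1, 0),
    ("ship_A", -155.0, 25.0, 100.0, "2001-03-01", "temp", 19.5, 0),
    ("ship_B", -120.0, 35.0, 0.0, "2001-03-02", "temp", 14.0, 0),    # outside X/Y box
    ("ship_A", -154.0, 26.0, 0.0, "2001-03-05", "temp", 99.9, 4),    # bad quality flag
    ("float_7", -152.0, 22.0, 50.0, "2001-04-10", "temp", 22.3, 0),  # other platform, later date
]
conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Subset on X, Y, Z, T plus metadata (quality flag, platform ID).
query = """
    SELECT time, depth, value FROM obs
    WHERE variable = ?
      AND lon BETWEEN ? AND ? AND lat BETWEEN ? AND ?
      AND depth BETWEEN ? AND ? AND time BETWEEN ? AND ?
      AND quality_flag = 0 AND platform_id = ?
    ORDER BY time, depth
"""
params = ("temp", -160.0, -150.0, 20.0, 30.0, 0.0, 200.0,
          "2001-03-01", "2001-03-31", "ship_A")
result = conn.execute(query, params).fetchall()
print(result)  # [('2001-03-01', 0.0, 24.1), ('2001-03-01', 100.0, 19.5)]
```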
![Page 28: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin](https://reader036.vdocuments.net/reader036/viewer/2022081518/5515ed8e550346d46f8b525b/html5/thumbnails/28.jpg)
Conclusion
• Efficient access to observational data is an unsolved problem.
• Data volumes are increasing exponentially.
• Data access problems hinder the development of interactive visualization tools.