Enabling technologies for facilitating access and use of data
Russ Rew and John Caron, Unidata. Workshop on Ensuring Access and Trustworthiness of Climate Observations and Models for Society, NCDC, Asheville, 2010-03-09.
![Page 1: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/1.jpg)
Enabling technologies for facilitating access and use of data
Russ Rew and John Caron, Unidata
Workshop on Ensuring Access and Trustworthiness of Climate Observations and Models for Society
NCDC, Asheville, 2010-03-09
![Page 2: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/2.jpg)
Goal: N + M instead of N * M things on your TODO List
[Diagram: N file formats (File Format #1, File Format #2, ..., File Format #N) connect through the CDM to M uses (Visualization & Analysis, NetCDF file, Data Server, Web Service)]
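The arithmetic behind this goal can be sketched in a few lines. This is only an illustrative count: the function names and the sample values of N and M are assumptions made for the example, not figures from the talk.

```python
def adapters_without_common_model(n_formats: int, m_consumers: int) -> int:
    """Each consumer needs its own reader for each format: N * M pieces of code."""
    return n_formats * m_consumers


def adapters_with_common_model(n_formats: int, m_consumers: int) -> int:
    """Each format maps to the common model once, and each consumer
    reads the common model once: N + M pieces of code."""
    return n_formats + m_consumers


if __name__ == "__main__":
    n, m = 20, 4  # hypothetical counts of file formats and consumers
    print("direct readers:", adapters_without_common_model(n, m))
    print("via common model:", adapters_with_common_model(n, m))
```

The gap widens as either count grows, which is the case for routing every format through one shared model.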
![Page 3: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/3.jpg)
Common Data Model
• What is it?
• Capabilities for observational data
• Current status
![Page 4: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/4.jpg)
What is it?
• Abstract Data Model for scientific data
• Implemented by Netcdf-Java library
• Core of the THREDDS Data Server
• Co-evolving with the CF Conventions
![Page 5: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/5.jpg)
Abstract Data Model, aka Object Model
• Data Access Layer
  – NetCDF / HDF5 / OPeNDAP
  – Subset in index space
• Coordinate System Layer
  – CF, VisAD, HDF-EOS, GRIB
  – Georeferencing
• Feature Type Layer
  – OGC WxS, ISO, CSML
  – Subset in coordinate space
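The difference between subsetting in index space and subsetting in coordinate space can be illustrated with a small, self-contained sketch. It does not use the Netcdf-Java API; the axis values, data values, and function names are hypothetical, and a single sorted 1-D latitude axis stands in for a full coordinate system.

```python
from bisect import bisect_left, bisect_right

# Hypothetical 1-D latitude axis (degrees north) and one data value per latitude.
lat = [10.0, 20.0, 30.0, 40.0, 50.0]
temperature = [300.1, 295.4, 288.9, 280.2, 271.5]


def subset_by_index(values, start, stop):
    """Index-space subset: the caller must already know the array positions."""
    return values[start:stop]


def subset_by_coordinate(values, axis, lo, hi):
    """Coordinate-space subset: the caller gives a coordinate range and the
    axis is used to translate it into indices. Assumes `axis` is sorted in
    ascending order; both bounds are inclusive."""
    start = bisect_left(axis, lo)
    stop = bisect_right(axis, hi)
    return values[start:stop]


if __name__ == "__main__":
    print(subset_by_index(temperature, 1, 3))
    print(subset_by_coordinate(temperature, lat, 20.0, 40.0))
```

In this sketch the coordinate-space call needs the axis in order to answer, which is why that kind of subsetting sits in a layer above raw data access.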
![Page 6: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/6.jpg)
Abstract Data Model
• Turns a collection of bytes into a collection of objects called features
  – E.g.: grids, swaths, profiles, radial sweeps
• These objects play the same role as a schema does in a database
• Defines the things (nouns) and what operations (verbs) are possible
![Page 7: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/7.jpg)
Netcdf-Java library implementation
• 100 % pure Java, open source, developed and maintained by Unidata
• Object oriented, strongly typed, garbage collected, huge open-source libraries, runtime configurable == highly productive
• Many different file formats
• Many different coordinate system conventions
• Library is used by many other software packages
![Page 8: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/8.jpg)
Netcdf-Java File Formats
• General: NetCDF-3, NetCDF-4, HDF5, HDF4, OPeNDAP
• Gridded: GRIB-1, GRIB-2, GEMPAK, McIDAS, UAMIV CAMx
• Point: BUFR, GEMPAK
• Radar: NEXRAD 2&3, DORADE, CINRAD, UF
• Satellite: DMSP, GINI, McIDAS, FYSAT, HDF-EOS
• Misc: GTOPO, NLDN, USPLN, etc.
• Write your own IOServiceProvider Java class
![Page 9: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/9.jpg)
Transforms (CF)
Projections
• albers_conical_equal_area, lambert_azimuthal_equal_area, lambert_conformal_conic, mcidas_area, mercator, orthographic, rotated_pole, stereographic (including polar), transverse_mercator, UTM (ellipsoidal), vertical_perspective
Vertical Transforms
• atmosphere_sigma, atmosphere_hybrid_sigma_pressure, ocean_s, ocean_sigma, existing3DField
Write your own CoordTransBuilderIF Java class
![Page 10: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/10.jpg)
Used by other applications
– Integrated Data Viewer, ToolsUI (Unidata)
– Panoply (NASA)
– ncBrowse (EPIC/NOAA)
– Java NEXRAD Viewer (NCDC/NOAA)
– MyWorld GIS (Northwestern)
– EDC for ArcGIS, ERDDAP (SFSC/NOAA)
– Live Access Server (PMEL/NOAA)
– ncWMS (Reading)
– Matlab plug-in (USGS)
![Page 11: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/11.jpg)
Core of the THREDDS Data Server
[Diagram: a THREDDS Server running in a Servlet Container on motherlode.ucar.edu. Labels: Datasets, catalog.xml, configCatalog.xml, NetCDF-Java library, IDD Data, Remote Access Client. Services: HTTPServer, WMS, WCS, OPeNDAP]
![Page 12: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/12.jpg)
THREDDS Data Server (TDS)
• Web server for scientific data
• 100% Java servlet
• Provides remote data access
  – OPeNDAP
  – Open Geospatial Consortium (OGC) WMS and WCS
  – HTTP file transfer
  – Experimental data access protocols
• Infrastructure, not a portal
![Page 13: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/13.jpg)
TDS and NcML
• Embed NcML into the TDS configuration catalog
• Server serves a virtual dataset defined by NcML
  – NcML hidden from the client
• Can "fix" metadata problems
• Can augment metadata
• General Aggregations
  – joinNew, joinExisting, Union
• Specialized Aggregations
  – Forecast Model Run Collection (FMRC)
  – Point Feature Collections (version 4.2)
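The three general aggregation types differ in how the member datasets are combined. The sketch below mimics that with plain Python lists and dicts rather than NcML or Netcdf-Java; the function names and sample data are hypothetical, and keeping the first variable on a name clash in the union case is an assumption of this sketch.

```python
def join_existing(datasets):
    """Concatenate along a dimension that already exists in every member
    (for example, each file already has a time axis)."""
    combined = []
    for data in datasets:
        combined.extend(data)
    return combined


def join_new(datasets):
    """Stack members along a new outer dimension: each member becomes
    one slice of the result (for example, one file per time step)."""
    return [list(data) for data in datasets]


def union(datasets):
    """Merge the variables of all members into one dataset. On a name clash
    this sketch keeps the first occurrence (an assumption)."""
    merged = {}
    for variables in datasets:
        for name, values in variables.items():
            merged.setdefault(name, values)
    return merged


if __name__ == "__main__":
    # Two files that each hold a (time, x) array: join along the existing time axis.
    print(join_existing([[[1, 2], [3, 4]], [[5, 6]]]))
    # Two files that each hold a single (x) array: a new time axis is created.
    print(join_new([[1, 2], [3, 4]]))
    # Two files that hold different variables on the same grid.
    print(union([{"temp": [1, 2]}, {"salt": [3, 4]}]))
```

In all three cases the client sees one virtual dataset, which is the sense in which the NcML stays hidden from the client.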
![Page 14: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/14.jpg)
TDS / NcML: Modify all files in datasetScan
<datasetScan name="Ocean Satellite Data" path="/data/ocean/sat/"
    location="/data/ncdc/impacts/scenario4b/run1234">
  <netcdf>
    <attribute name="NCS:Provenence" value="NCDC assimilation prog4gd from GOES-10"/>
  </netcdf>
</datasetScan>
![Page 15: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/15.jpg)
TDS / NcML aggregation
<dataset name="WEST-CONUS_4km Aggregation" urlPath="satellite/3.9/WEST-CONUS_4km">
  <netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinNew">
      <scan location="/data/satellite/WEST-CONUS_4km/" suffix=".gini" />
    </aggregation>
  </netcdf>
</dataset>
![Page 16: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/16.jpg)
Co-evolving with the CF Conventions
• Implementation of the CF Conventions
• Strong feedback (in both directions) between CF and CDM
• CF is the recommended way to write datasets
• CDM also deals with legacy datasets and other file formats besides netCDF
![Page 17: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/17.jpg)
CF
• CF has mostly focused on model gridded data
  – Driven by IPCC work
• Has a general coordinate system model
  – :coordinates = "lat lon alt time";
  – Sufficient for swath, some in-situ data
• Current efforts
  – Radial data (NCAR/EOL)
  – Discrete Sample data (aka point, in-situ data)
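How a `coordinates` attribute ties a data variable to its coordinate variables can be sketched as follows. The in-memory dataset, the variable name `temperature`, and the helper function are hypothetical; the sketch only shows the lookup, not how any library implements it.

```python
# Hypothetical in-memory dataset: each variable has values and attributes.
variables = {
    "lat": {"values": [40.0, 40.1], "attributes": {"units": "degrees_north"}},
    "lon": {"values": [-105.0, -105.2], "attributes": {"units": "degrees_east"}},
    "alt": {"values": [1600.0, 1650.0], "attributes": {"units": "m"}},
    "time": {"values": [0, 60], "attributes": {"units": "seconds since 2010-03-09"}},
    "temperature": {
        "values": [271.3, 270.8],
        "attributes": {"coordinates": "lat lon alt time"},
    },
}


def coordinate_variables(name, dataset):
    """Return the names listed in the variable's `coordinates` attribute,
    checking that each one exists in the dataset."""
    listed = dataset[name]["attributes"].get("coordinates", "").split()
    missing = [coord for coord in listed if coord not in dataset]
    if missing:
        raise KeyError(f"coordinate variables not found: {missing}")
    return listed


if __name__ == "__main__":
    for coord in coordinate_variables("temperature", variables):
        print(coord, variables[coord]["values"])
```

Because the attribute is just a list of names, the same mechanism works whether the data lie on a grid, along a swath, or at scattered in-situ locations.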
![Page 18: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/18.jpg)
Discrete Sample Data Categorization
• Point: measured at one point in time and space
• Station: a time-series of points at the same location
• Profile: points along a vertical line
• TimeSeries of Profiles: a time-series of profiles at the same location
• Trajectory: points along a 1D curve in time/space
• Trajectory of Profiles: a collection of profile features which originate along a trajectory
![Page 19: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/19.jpg)
Proposed Encoding Variations
• Rectangular Array
  – Multidimensional
  – Single: one feature in the file
• Ragged Array: features of different lengths
  – Contiguous
  – Non-Contiguous
  – Flattened
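A ragged array stores features of different lengths in one flat array plus some bookkeeping. The sketch below shows two plausible layouts: a contiguous one, where each feature's observations are adjacent and a per-feature count is stored, and a non-contiguous one, where observations are interleaved and each carries the index of its parent feature. The function and parameter names are hypothetical, and the reading of "non-contiguous" as a parent-index layout is an assumption of this sketch rather than the text of the proposal.

```python
from itertools import accumulate


def split_contiguous(values, row_sizes):
    """Contiguous layout: feature i owns the next row_sizes[i] observations."""
    bounds = [0] + list(accumulate(row_sizes))
    if bounds[-1] != len(values):
        raise ValueError("row sizes do not add up to the number of observations")
    return [values[start:stop] for start, stop in zip(bounds, bounds[1:])]


def split_by_parent_index(values, parent_index, n_features):
    """Non-contiguous layout: observation k belongs to feature parent_index[k]."""
    features = [[] for _ in range(n_features)]
    for value, parent in zip(values, parent_index):
        features[parent].append(value)
    return features


if __name__ == "__main__":
    # Three profiles with 2, 1 and 3 observations stored back to back.
    print(split_contiguous([1, 2, 3, 4, 5, 6], [2, 1, 3]))
    # Two stations whose reports arrive interleaved in time.
    print(split_by_parent_index([10, 20, 30, 40], [0, 1, 0, 1], 2))
```

The contiguous form is compact and fast to read once a file is complete, while the parent-index form lets observations be appended in arrival order.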
![Page 20: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/20.jpg)
Current CDM Status
• Discrete Sample Data proposal
  – Almost finalized (Caron/Gregory/Hankin)
  – CDM implementation now in 4.1
  – Collections of files to be in 4.2
• Forecast Model Run Collection refactor
  – Also using Collection
  – Caching on the server
  – Scale to much larger collections (NCDC/Nomads)
  – Scheduled for 4.2
![Page 21: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/21.jpg)
CDM funding status
• CDM/THREDDS work competes with many other priorities at Unidata
• THREDDS is most used by large data centers (NOAA/NASA/USGS/EPS, EU)
• Important (but indirect) benefits to NSF ATM constituency (US academic meteorology)
• Unidata is fully committed but not much chance of expanded base funding from NSF
![Page 22: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/22.jpg)
CDM funding status (cont)
• Have a proposal in to NSF Cyber-Infrastructure solicitation
  – Integration of TDS and IDD/LDM data streams
  – Explore use of Hadoop (Map/Reduce) for very large collections
• Need commitment of resources from you
  – ($$) Custom work when compatible
  – In-kind contribution == time and attention for CF/CDM from domain experts and engineers
![Page 23: Enabling technologies for facilitating access and use of data](https://reader036.vdocuments.net/reader036/viewer/2022062408/56814371550346895daff0ce/html5/thumbnails/23.jpg)
Thank You!