
7 +/- 2 Maybe Good Ideas

John Caron, June 2011


(1)

• NetCDF-Java (aka CDM) has lots of functionality, but it is only available in Java
– NcML aggregation
– Access to lots of other file formats
– Feature types (e.g., collections of point data)
– Ironically, some functionality (e.g., aggregation) is already available for remote datasets through OPeNDAP
– But not for local datasets

• How can we get the CDM into other languages?
– Replicate it in C and maintain two software stacks
– Use reverse JNI (call Java from C)
– Or …


CdmRemote Server (aka TDS Lite)

• Lightweight server for CDM datasets
– Zero configuration: use queries to configure
– Local filesystem
– Cache expensive objects
– Allow non-Java applications access to the CDM stack
– Create virtual datasets: aggregations, logical views
– Coordinate space queries
– Feature Type subsetting
– New API (!)
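"Coordinate space queries" means a client asks for data by coordinate values (a latitude range, a time range) rather than by array indices, and the server uses the dataset's coordinate system to translate. The sketch below shows that translation for the simplest case, a 1-D coordinate axis sorted in increasing order. The function names are hypothetical and this is not the CDM's actual implementation, which also has to handle 2-D, descending, and irregular axes.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple


def coord_range_to_index_range(coord: Sequence[float], lo: float, hi: float) -> Tuple[int, int]:
    """Map a coordinate-space range [lo, hi] onto a half-open index range [start, stop).

    Assumes `coord` is a 1-D axis sorted in increasing order. The result
    satisfies: coord[start:stop] holds every value v with lo <= v <= hi.
    """
    if lo > hi:
        lo, hi = hi, lo              # accept the bounds in either order
    start = bisect_left(coord, lo)   # first index with coord >= lo
    stop = bisect_right(coord, hi)   # one past the last index with coord <= hi
    return start, stop


def subset(values: Sequence[float], coord: Sequence[float], lo: float, hi: float) -> list:
    """Return the data values whose coordinates fall inside [lo, hi]."""
    start, stop = coord_range_to_index_range(coord, lo, hi)
    return list(values[start:stop])


if __name__ == "__main__":
    lat = [30.0, 32.5, 35.0, 37.5, 40.0, 42.5]
    temp = [290.1, 288.4, 286.0, 284.2, 281.9, 279.5]
    print(coord_range_to_index_range(lat, 33.0, 41.0))  # (2, 5)
    print(subset(temp, lat, 33.0, 41.0))
```

Binary search keeps the lookup at O(log n) per axis, so the server can answer coordinate queries without scanning the coordinate arrays.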


CdmRemote Server (aka TDS Lite)

[Diagram labels: Data, cdmRemote Server, Coordinate Systems, Data Access, CDM Point Feature API, cdmRemote, C Client, Python / ?, Application]


(2) Ncstream as a netCDF file format

• Write-optimized
• Append-only
• Encodes the full CDM object model
• Uses Google’s protobuf for serialization
• Java and C libraries can read it and access it through the standard netCDF API
• Tools to convert to netCDF-3 and netCDF-4 formats
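A write-optimized, append-only format never rewrites earlier bytes: each new piece is framed and added at the end, and readers walk the messages sequentially. A minimal sketch of that framing, assuming a base-128 varint length prefix (the same convention protobuf uses for length-delimited messages). Payloads here are opaque bytes where real ncstream would carry serialized protobuf messages; this is not the actual ncstream layout, and the function names are hypothetical.

```python
import io
from typing import BinaryIO, Iterator, Optional


def write_varint(out: BinaryIO, n: int) -> None:
    """Write a non-negative int as a base-128 varint, low 7 bits first."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.write(bytes([byte | 0x80]))  # high bit set: more bytes follow
        else:
            out.write(bytes([byte]))
            return


def read_varint(inp: BinaryIO) -> Optional[int]:
    """Read one varint; return None at a clean end of stream."""
    shift = 0
    result = 0
    while True:
        b = inp.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def append_message(out: BinaryIO, payload: bytes) -> None:
    """Append one length-prefixed message; earlier bytes are never touched."""
    write_varint(out, len(payload))
    out.write(payload)


def read_messages(inp: BinaryIO) -> Iterator[bytes]:
    """Yield messages in the order they were appended."""
    while True:
        n = read_varint(inp)
        if n is None:
            return
        payload = inp.read(n)
        if len(payload) != n:
            raise EOFError("truncated message")
        yield payload
```

Usage: open a file in append mode (`open(path, "ab")`) and call `append_message` for each header or data chunk; a reader opens it with `"rb"` and iterates `read_messages`. An `io.BytesIO` works the same way for testing.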


(3) BUFR/GRIB table registration

• Unidata-sponsored web service
• Registered users can upload BUFR/GRIB tables
– A unique id is assigned (MD5 16-byte checksum?)
– Convince producers to include the id in the data, so it is unambiguous which table was used
– Anyone can download

• GRIB and BUFR decoding
– Using the CDM: find bugs!
– Might become an (ad hoc) reference library
– Might spur objections from “the experts”
– Turn over to WMO if they want it

• Survival of the human race is at stake here
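The registration idea rests on a content-derived id: hashing the exact bytes of a table gives the same id for identical uploads and a different id for any change. A minimal in-memory sketch, assuming MD5 as the slide suggests; `TableRegistry` and its methods are hypothetical names, not an existing Unidata service.

```python
import hashlib
from typing import Dict, Optional, Set


class TableRegistry:
    """Hypothetical registry of uploaded BUFR/GRIB tables keyed by MD5 digest."""

    def __init__(self) -> None:
        self._users: Set[str] = set()
        self._tables: Dict[str, bytes] = {}

    def register(self, user: str) -> None:
        self._users.add(user)

    def upload(self, user: str, table_bytes: bytes) -> str:
        # Only registered users may upload.
        if user not in self._users:
            raise PermissionError(f"{user!r} is not a registered user")
        # MD5 yields a 16-byte digest, shown here as 32 hex characters.
        table_id = hashlib.md5(table_bytes).hexdigest()
        # Identical uploads map to the same id, so re-uploading is harmless.
        self._tables.setdefault(table_id, table_bytes)
        return table_id

    def download(self, table_id: str) -> Optional[bytes]:
        # Anyone can download; unknown ids return None.
        return self._tables.get(table_id)
```

MD5 is adequate for telling accidentally different tables apart, but it is not collision-resistant against deliberate tampering; a SHA-256 digest (32 bytes) would be the safer choice if that matters.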


(4) Streaming data / standing queries

• The proposal Dennis and I submitted last year
• “As soon as it arrives on IDD, send me PrecipTotal from the NCEP/RUC2 model, subsetted by a lat/lon bounding box, in netCDF-4/CF format”
• “As it arrives, send me GTS BUFR data in a lat/lon bounding box, in CSV”
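The second example request can be read as a filter that runs on every batch of arriving data rather than once against an archive. A minimal sketch, assuming the BUFR reports have already been decoded into simple observation records; `BBox`, `Obs`, `StandingRequest`, and the record fields are hypothetical, and the box test does not handle boxes that cross the dateline.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BBox:
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple lat/lon box; no dateline handling.
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


@dataclass(frozen=True)
class Obs:
    station: str
    time: str
    lat: float
    lon: float
    temp_k: float


class StandingRequest:
    """Hypothetical standing query: filter each arriving batch, emit CSV."""

    def __init__(self, bbox: BBox) -> None:
        self.bbox = bbox

    def on_arrival(self, batch: Iterable[Obs]) -> str:
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["station", "time", "lat", "lon", "temp_k"])
        for ob in batch:
            if self.bbox.contains(ob.lat, ob.lon):
                writer.writerow([ob.station, ob.time, ob.lat, ob.lon, ob.temp_k])
        return buf.getvalue()
```

The request is registered once and `on_arrival` is invoked for each batch the feed delivers, so the client never has to poll.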


Current IDD data access

[Diagram labels: IDD Data, LDM Push (header), FILE, Dataset (×4), CDM library, TDS, Pull requests]


Content-based filtering (standing requests)

[Diagram labels: IDD Data, LDM, Push (content), PIPE, ContentFilter service, Standing request, Request (×3)]

Message Service
• Content filtering
• Change encoding
• Protocol?
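The message service combines two jobs: match each arriving product against every standing request, and re-encode the matching content in whatever form each subscriber asked for. A minimal sketch under those assumptions; `ContentFilterService`, `Subscription`, the product keys, and the record dictionaries are all hypothetical, and the delivery protocol (the open question on the slide) is left out by collecting output in a per-subscriber list.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def encode(records: List[dict], fmt: str) -> str:
    """Re-encode decoded records for a subscriber ("csv" or "json")."""
    if fmt == "json":
        return json.dumps(records)
    if fmt == "csv":
        if not records:
            return ""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported encoding: {fmt}")


@dataclass
class Subscription:
    product: str                       # hypothetical product key, e.g. "GTS_BUFR"
    predicate: Callable[[dict], bool]  # content filter applied to each record
    fmt: str                           # requested output encoding
    outbox: List[str] = field(default_factory=list)  # stands in for delivery


class ContentFilterService:
    """Fan each arriving product out to every matching standing request."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Subscription]] = {}

    def subscribe(self, sub: Subscription) -> None:
        self._subs.setdefault(sub.product, []).append(sub)

    def on_product(self, product: str, records: List[dict]) -> None:
        for sub in self._subs.get(product, []):
            matched = [r for r in records if sub.predicate(r)]
            if matched:
                sub.outbox.append(encode(matched, sub.fmt))
```

Indexing subscriptions by product means a product nobody asked for costs a single dictionary lookup, and the data is decoded once no matter how many subscribers want it.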


(5) Python

• Unidata should choose a scripting language to support, and give scientists full access to all of our tools in it

• Python wants to be the open-source MATLAB
• DOE and BADC have bought into Python
• Python is a safe choice


(6) NetCDF management tools

• Develop a consistent set of tools for managing collections of netCDF files
– Use existing tools (ncgen, nccopy, ncdump, NCO, etc.) under the covers
– But don’t be constrained by their interfaces

• Look at RDBMS management languages
• Use a scripting language like Python
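Using the existing tools "under the covers" could look like a Python layer that works on whole collections and shells out to `nccopy` for each file. A minimal sketch: `-k` (output file kind) and `-d` (deflate level 0 to 9) are real `nccopy` options, and recent netCDF-C releases accept `nc4` as a kind name (check `nccopy` on your system). The function names are hypothetical, and `convert_collection` needs `nccopy` on the PATH.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Iterable, List


def nccopy_command(src: Path, dst: Path, kind: str = "nc4", deflate: int = 0) -> List[str]:
    """Build an nccopy command line that converts `src` to file kind `kind`."""
    if not 0 <= deflate <= 9:
        raise ValueError("deflate level must be 0-9")
    cmd = ["nccopy", "-k", kind]
    if deflate:
        cmd += ["-d", str(deflate)]  # compression requires a netCDF-4 output kind
    cmd += [str(src), str(dst)]
    return cmd


def convert_collection(files: Iterable[Path], out_dir: Path,
                       kind: str = "nc4", deflate: int = 0) -> List[Path]:
    """Convert every file in a collection, one nccopy call per file."""
    if shutil.which("nccopy") is None:
        raise RuntimeError("nccopy not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in files:
        dst = out_dir / src.name
        subprocess.run(nccopy_command(src, dst, kind, deflate), check=True)
        outputs.append(dst)
    return outputs
```

Usage: `convert_collection(sorted(Path("archive").glob("*.nc")), Path("nc4"), deflate=4)`. Keeping command construction separate from execution makes the collection-level interface independent of the underlying tool's flags.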


(7) Hadoop

– Open source, started by Doug Cutting (Lucene) and Yahoo
– Based on Google’s MapReduce for parallel processing
– Lots of industry use; part of new data ecosystems
– Objects in a distributed, replicated file system
– Commodity, shared-nothing hardware nodes
– Simple key-value store
– Append-only, sequential reading
– Scales to arbitrarily large amounts of data (batch)
– Gather many queries and run them over the data
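The MapReduce model behind Hadoop has three phases: mappers turn each input record into key-value pairs, the framework groups ("shuffles") the pairs by key, and reducers aggregate each group. A single-process sketch of that model in plain Python, computing the maximum temperature per station; the `station,time,temp` record format is hypothetical, and Hadoop's own API (Java) is not shown here.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, float]]:
    """Emit (station, temperature) from a 'station,time,temp' record."""
    station, _time, temp = line.split(",")
    yield station, float(temp)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[float]) -> Tuple[str, float]:
    """Aggregate one key's values; here, the maximum."""
    return key, max(values)


def map_reduce(splits: Iterable[List[str]]) -> Dict[str, float]:
    # In Hadoop each split would be mapped on a different node;
    # here the splits are simply processed in sequence.
    mapped = chain.from_iterable(mapper(line) for split in splits for line in split)
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())
```

Because mappers see one record at a time and reducers see one key at a time, both phases parallelize across shared-nothing nodes, which is what lets the approach scale with the data.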


(8) SciDB

• Michael Stonebraker, David DeWitt

– “SciDB will be optimized for data management of big data and for big analytics.”

– “The scientists that are participating in our open source project believe that the SciDB database — when completed — will dramatically impact their ability to conduct their experiments faster and more efficiently and further improve the quality of life on our planet by enabling them to run experiments that were previously impossible due to the limitations of existing database systems and infrastructure.”

• Getting involved:
1. Load netCDF/HDF5 into SciDB
2. “Native mode”: leave data in netCDF/HDF5