TRANSCRIPT
Datacubes :: © 2017 rasdaman
INSPIRE Conference, Strasbourg, 2017-09-08
Peter Baumann
Jacobs University | rasdaman GmbH
rasdaman: Big Datacubes At Your Fingertips
EarthServer: Datacubes At Your Fingertips
Agile Analytics on x/y/t + x/y/z/t Earth & Planetary datacubes
- Rigorously standards-based: OGC WMS + WCS + WCPS
- EU rasdaman + US NASA WorldWind
- 700+ TB → 1+ PB
Intercontinental initiative, 3+3 years: EU + US + AUS
www.earthserver.eu
MEA: Sample Datacubes Coverages
[MEEO, using rasdaman]
MEA: NDVI Timeseries Extraction
[MEEO, using rasdaman]
MEA: Land Surface Temperature, Cloudfree
[MEEO, using rasdaman]
ECMWF: River Discharge
[ECMWF, using rasdaman]
Ortho Image Timeseries
[Diedrich et al 2002, using rasdaman]
rasdaman: Agile Datacube Analytics
= "raster data manager": SQL + n-D arrays
- Scalable parallel "tile streaming" architecture
- Cross-domain: Earth, space, life science & engineering
Mature, in operational use
- Blueprint for Big Datacube standards:
  • ISO SQL/MDA ("Multi-Dimensional Arrays")
  • OGC, ISO, INSPIRE coverage service suite
- OGC & INSPIRE Reference Implementation
Architecture
[Diagram labels: Web clients (m2m, browser); Internet; rasdaman geo services; rasserver; database / file system; alternative storage [SSTD 2013]; optional compression [SSTD 2013]; external archives]
- Distributed query processing
- No single point of failure
Parallel, Distributed Processing
1 query → 1,000+ cloud nodes [ACM SIGMOD DanaC 2014] [VLDB BOSS 2016]
[Diagram labels: Dataset A, Dataset B, Dataset C, Dataset D]
select max((A.nir - A.red) / (A.nir + A.red))
     - max((B.nir - B.red) / (B.nir + B.red))
     - max((C.nir - C.red) / (C.nir + C.red))
     - max((D.nir - D.red) / (D.nir + D.red))
from A, B, C, D
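The query above combines one NDVI maximum per dataset, and each maximum can be evaluated independently of the others. The following is a minimal Python sketch of that decomposition, not a description of how rasdaman actually distributes work; the toy datasets, the function names, and the use of a thread pool are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def max_ndvi(cells):
    """Maximum of (nir - red) / (nir + red) over all cells of one dataset."""
    return max((nir - red) / (nir + red) for nir, red in cells)

# Hypothetical toy datasets: each cell is a (nir, red) pair.
datasets = {
    "A": [(80, 20), (60, 40)],
    "B": [(50, 50), (60, 20)],
    "C": [(30, 10), (40, 40)],
    "D": [(20, 20), (30, 30)],
}

def combined(data):
    # Each per-dataset maximum is independent, so it can run on its own worker.
    with ThreadPoolExecutor() as pool:
        maxima = dict(zip(data, pool.map(max_ndvi, data.values())))
    # Only the four scalar results have to be brought together at the end.
    return maxima["A"] - maxima["B"] - maxima["C"] - maxima["D"]

result = combined(datasets)
```

The point of the sketch is that only four scalars cross worker boundaries, while the large arrays stay where they are stored.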
OGC WCPS: Elevation & Image Fusion
for $s in (SatImage), $d in (DEM)
where $s/metadata/@region = "Glasgow"
return
  encode(
    struct {
      red:   (char) $s.b7[x0:x1,x0:x1],
      green: (char) $s.b5[x0:x1,x0:x1],
      blue:  (char) $s.b0[x0:x1,x0:x1],
      alpha: (char) scale( $d, 20 )
    },
    "image/png"
  )
[JacobsU, Fraunhofer; data courtesy BGS, ESA]
OGC WMS via WCPS
for $p in (OrthoPhoto),
    $wl in (WaterLines),
    $wa in (WaterAreas),
    $d in (DEM)
return
  encode(
    (unsigned char) (
      $p * { 1, 1, 1 }
      overlay $wl * { 0, 128, 255 }
      overlay $wa * { 191, 255, 255 }
      overlay switch $d
        case $d > 264 return { red:0, green:0, blue:255 }
        case $d > 262 return { red:0, green:255, blue:0 }
        case $d > 260 return { red:255, green:0, blue:0 }
        default return { red:0, green:0, blue:0 }
      end
    ),
    "image/png"
  )
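The elevation colouring in the switch expression can be sketched in Python as follows. This is an illustrative model only: it assumes that the first matching case wins, and it therefore tests the highest threshold first so that every elevation band is reachable. The function name and the sample values are hypothetical.

```python
def classify(elevation):
    """Map one DEM value to an (red, green, blue) triple; first match wins."""
    cases = [
        (264, (0, 0, 255)),   # above 264: blue
        (262, (0, 255, 0)),   # above 262 up to 264: green
        (260, (255, 0, 0)),   # above 260 up to 262: red
    ]
    for threshold, colour in cases:
        if elevation > threshold:
            return colour
    return (0, 0, 0)          # default: black

# Hypothetical row of DEM values, classified cell by cell.
dem_row = [259, 261, 263, 265]
coloured = [classify(value) for value in dem_row]
```

If the lowest threshold were tested first, every value above 260 would already match that case and the two higher bands would never be selected.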
...But That's Not What You Want to See
Let users remain in the comfort zone of well-known tools
- Map navigation: OpenLayers, Leaflet, ...
- Virtual globe: NASA WorldWind, Cesium, ...
- Web GIS: MapServer, GeoServer, QGIS, ArcGIS, ...
- Analysis: GDAL, R, Python (OWSLib, Jupyter notebooks), ...
...via WCS / WCPS / WMS as standard client/server APIs
[screenshots: rasdaman-based portals]
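As a rough illustration of WCPS as a client/server API, a Python client could package a query into a key-value-pair request. The sketch below only builds the URL and sends nothing. The endpoint is a placeholder, the coverage name DEM is borrowed from the examples in this talk, and the parameter names follow what is, to my understanding, the ProcessCoverages request of the WCS Processing extension; check them against the target server before relying on them.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint (assumption); substitute the service URL of a real deployment.
ENDPOINT = "https://example.org/rasdaman/ows"

def wcps_request_url(query, endpoint=ENDPOINT):
    """Build a GET URL that carries a WCPS query as a percent-encoded parameter."""
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": query,
    }
    return endpoint + "?" + urlencode(params)

# Minimal query: deliver the coverage named DEM as a PNG image.
query = 'for $c in (DEM) return encode( $c, "image/png" )'
url = wcps_request_url(query)

# Decoding the URL again recovers the original query text unchanged.
roundtrip = parse_qs(urlsplit(url).query)["query"][0]
```

A tool such as a notebook or a web map would then fetch `url` with any HTTP client and receive the encoded image as the response body.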
Conclusion
Spatio-temporal datacubes for analysis-ready data
- "one cube says more than a million images"
Standards are available and proven
- OGC / ISO / INSPIRE coverages + WCS
- Conformance tests → interoperable
rasdaman: pioneer scalable datacube technology
- Source code + ample tutorials: www.rasdaman.org
- Mature: 700+ TB, 1,000+ cloud node parallelization, peer federation
"The RASDAMAN product is currently the world leading environment in this domain and the standard working horse for OGC standardisation on these innovative data access interfaces." -- G. Landgraf, ESA, Jan 2017
Backup
Image Pyramids, Revisited
2-D pyramids are 1-D!
- x/y scaling coupled
...not so with height or time!
What is an n-D pyramid?
Multi-dimensional pre-aggregation [Garcia Gutierrez 2010]
- Generalization of pyramids: keep lower-resolution scalings in stock
- Combinatorial explosion → which combinations of x/y/z/t scale factors to keep?
→ workload-based
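The combinatorial explosion can be made concrete with a small Python sketch. It counts the scale-factor combinations when every axis scales independently, and then applies one naive reading of "workload-based" selection: keep the combinations requested most often. The candidate factors, the request log, and the selection rule are assumptions for illustration and are not the method of [Garcia Gutierrez 2010].

```python
from collections import Counter
from itertools import product

def scale_combinations(axes, factors):
    """All combinations of one scale factor per axis."""
    return list(product(factors, repeat=len(axes)))

factors = (1, 2, 4, 8)                       # hypothetical candidate factors
coupled_2d = len(factors)                    # x and y share one factor: 4 levels
free_4d = len(scale_combinations(("x", "y", "z", "t"), factors))  # 4**4 = 256

def pick_by_workload(requests, budget):
    """Keep the `budget` scale combinations that were requested most often."""
    return [combo for combo, _ in Counter(requests).most_common(budget)]

# Hypothetical request log of (x, y, z, t) scale factors.
log = [(4, 4, 1, 1)] * 5 + [(8, 8, 1, 4)] * 3 + [(2, 2, 2, 1)] * 1
kept = pick_by_workload(log, budget=2)
```

With four candidate factors, a classic 2-D pyramid has 4 levels while an independently scaled 4-D cube has 256 candidate pre-aggregates, which is why some selection criterion is needed.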
Inset: INSPIRE Time Handling
OGC Coverages: time just another axis
INSPIRE (WaterML): timeseries = time slices
- WaterML extended: scalars → images
[UML diagram: «FeatureType» Coverage with domainSet («Union» DomainSet), rangeSet («Union» RangeSet), and rangeType («type» DataRecord)]
Linear Algebra Ops
Histogram
select marray bucket in [0:255] values count_cells( img = bucket )
from img
Matrix multiplication
select marray i in [0:m], j in [0:p]
       values condense + over k in [0:n] using a[ i, k ] * b[ k, j ]
from matrix as a, matrix as b
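For readers unfamiliar with the marray and condense constructs, the two queries correspond to the following plain Python. This is only a sketch of the semantics on nested lists, with hypothetical function names: marray builds a new array by evaluating an expression per index, and condense folds an expression over an index range with the given operator. Note that rasql intervals such as [0:255] include both bounds.

```python
def histogram(img, buckets=256):
    """marray bucket in [0:255] values count_cells( img = bucket )"""
    flat = [value for row in img for value in row]
    # One output cell per bucket: count the cells equal to that bucket value.
    return [sum(1 for value in flat if value == bucket) for bucket in range(buckets)]

def matmul(a, b):
    """marray i, j values condense + over k using a[i, k] * b[k, j]"""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

counts = histogram([[0, 1], [1, 255]])
product_ab = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```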
Outlook: WCPS 2.0
Merge WCPS with XQuery FLWOR
Implementation: Jacobs University & Athena Research Lab
- federation of eXist + rasdaman
let $r := doc("WCPS")//coverage/metadata/region/[ @name = "Austria" ]
for $c in doc("WCPS")//coverage/[ some( $c.nir > 127 and $r ) ]
return encode( abs( $c.red - $c.nir ), "image/tiff" )
for $c in doc("WCPS")//coverage/[ some( $c.nir > $c.red ) ]
return
  <id> { $c/@id } </id>
  <area> { $c/boundedBy } </area>
Standards: ISO Array SQL [SSDBM 2014]
create table LandsatScenes(
  id: integer not null,
  acquired: date,
  scene: row( band1: integer, ..., band7: integer ) mdarray [ 0:4999, 0:4999 ]
)
select id,
       encode( (scene.band1 - scene.band2) / (scene.band1 + scene.band2), 'image/tiff' )
from LandsatScenes
where acquired between '1990-06-01' and '1990-06-30'
  and avg( (scene.band3 - scene.band4) / (scene.band3 + scene.band4) ) > 0
Application 1: EOfarm Startup
Big Data Analytics for farmers
- rasdaman via OGC WCS & WCPS
- similar framework deployed for water quality monitoring
Data: Landsat 8, Sentinels, RapidEye
Functionality:
- Color Composites, Band Ratios and Indices
- Vegetation Detection
- Canopy Greenness Estimation
- Land Surface Temperature
- Time series over AOI
MEA: Daily Sentinel 2A Availability
Cloud Demo
Big Datacube Standards
Open Geospatial Consortium (OGC):
- WCS "Big Geo Data" standards suite
- rasdaman WCS Core Reference Implementation
ISO:
- TC211: coverages & WCS
  • ISO 19123 → 19123-1
  • OGC CIS → 19123-2
  • OGC WCS → ISO WCS
- SC32: SQL/MDA ("Multi-Dimensional Arrays")
INSPIRE:
- coverages & WCS
- rasdaman INSPIRE Reference Implementation
Research Data Alliance:
- Big Data Interest Group, Geospatial Interest Group, Array Database Assessment WG
ESA Tech Harm Strat Meetg, 2017-feb-21
The Datacube Manifesto
datacube = massive n-D array ("raster data", "gridded data") with data values at grid points in n-D grid
- plus metadata describing geo reference, value semantics, etc.
6 core requirements:
- Req 1: at least 1..4 D space/time support
- Req 2: treat all axes alike
- Req 3: trimming, slicing along axes in single request
- Req 4: similar performance along all axes
- Req 5: adaptive partitioning, invisible to user
- Req 6: datacube processing language
www.earthserver.eu/datacube-manifesto
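Requirement 3 distinguishes trimming from slicing. The Python sketch below illustrates the difference on a tiny t/y/x cube held as nested lists: trimming narrows an axis but keeps it, while slicing fixes an axis at one position and so removes that dimension. The cube contents and function names are invented for illustration, and bounds are taken as inclusive.

```python
def trim(cube, t_range, y_range, x_range):
    """Trimming: keep all three axes, narrow each to an inclusive interval."""
    (t0, t1), (y0, y1), (x0, x1) = t_range, y_range, x_range
    return [
        [row[x0:x1 + 1] for row in plane[y0:y1 + 1]]
        for plane in cube[t0:t1 + 1]
    ]

def slice_time(cube, t):
    """Slicing on t: the result is a 2-D y/x image."""
    return cube[t]

def timeseries(cube, y, x):
    """Slicing on y and x in one request: the result is a 1-D series over t."""
    return [plane[y][x] for plane in cube]

# Hypothetical 3 x 3 x 3 cube whose cell value encodes its own position.
cube = [[[100 * t + 10 * y + x for x in range(3)] for y in range(3)] for t in range(3)]

sub_cube = trim(cube, (0, 1), (1, 2), (0, 0))   # still 3-D, extent 2 x 2 x 1
image = slice_time(cube, 2)                     # 2-D
series = timeseries(cube, 1, 2)                 # 1-D
```

Extracting a time series at one location, as in the NDVI example earlier in the talk, is a slice along both spatial axes.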
Domains Investigated
Geo
• Environmental sensor data, 1-D
• Satellite / seafloor maps, 2-D [VLDB 1999]
• Geophysics (3-D x/y/z)
• Climate modelling (4-D, x/y/z/t)
Life science
• Gene expression simulation (3-D) [InfSys 2003]
• Human brain imaging (3-D / 4-D) [TiNS 2001]
Other
• Computational Fluid Dynamics (3-D)
• Astrophysics (4-D)
• Statistics (n-D)
Server-Side Processing: Federation
A Brief History of Datacubes
Related Work: Hadoop – one size does not fit all
- "Since it was not originally designed to leverage the structure its performance is suboptimal" [Daniel Abadi]
- U Madison / GMU benchmark confirms [AGU 2015] [C. Scheele, F. Hu, M. Yu, M. Xu, K. Liu, Q. Huang, C. Yang 2015]