![Page 1: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/1.jpg)
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
June 14th, 2016
BRIN indexes on geospatial databases
www.2ndquadrant.com
![Page 2: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/2.jpg)
~$ whoami
Giuseppe Broccolo, PhD
PostgreSQL & PostGIS consultant
@giubro gbroccolo7
gbroccolo gemini__81
![Page 3: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/3.jpg)
Indexes & PostgreSQL
• Binary structures on disk:
– Index organised into nodes stored in 8kB pages
– Nodes point to table rows
• Speed up data access: ~O(log N)
– High performance as long as the index fits in RAM
• Size: ~O(N)
![Page 4: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/4.jpg)
Geospatial indexes in PostgreSQL
• All datatypes can be indexed in principle...
– ...just define the needed Operator Class!
• Operators & support functions
– ...define how the data has to be stored in the index nodes
• Geospatial data:
– can be larger than 8kB
– is generally a varlena
– a single node cannot be contained in a single page
![Page 5: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/5.jpg)
PostGIS datatypes & GiST indexing
In PostGIS (since v0.6) geometry/geography are converted into their float-precision bounding boxes
gidx/box2df are then used as the storage datatype to populate the index’s nodes
![Page 6: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/6.jpg)
GIS & BigData
LiDAR surveys
Maps of entire countries
It is easy to end up working with terabytes of geospatial data…
...can the indexes still fit in RAM?
![Page 7: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/7.jpg)
A new index: Block Range Indexing (PG 9.5)
CREATE INDEX idx ON my_table
USING brin(my_column) WITH (pages_per_range=10);
data must be physically sorted on disk!
GiST: index node → row
BRIN: index node → block range
less specific than GiST!
Really small!
(S. Riggs, A. Herrera)
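The block-range idea can be illustrated with a toy model. This is only a sketch in plain Python, not PostgreSQL internals: the names `build_brin` and `candidate_ranges`, and the 100 rows per page, are assumptions made for the example. Each index entry stores just the min and max of one range of pages, so the index stays tiny and a search returns whole ranges to recheck rather than single rows.

```python
# Toy model of a BRIN index: one (min, max) summary per block range.
# Names and sizes are illustrative assumptions, not PostgreSQL internals.

def build_brin(values, rows_per_page, pages_per_range):
    """Summarise each block range of the 'heap' with its min and max value."""
    rows_per_range = rows_per_page * pages_per_range
    summaries = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        summaries.append((start, min(chunk), max(chunk)))
    return summaries

def candidate_ranges(summaries, lo, hi):
    """Return the start offsets of ranges whose summary overlaps [lo, hi]."""
    return [start for start, vmin, vmax in summaries
            if vmin <= hi and vmax >= lo]

heap = list(range(10_000))  # a physically sorted column
index = build_brin(heap, rows_per_page=100, pages_per_range=10)
hits = candidate_ranges(index, 2_500, 2_600)
print(len(index), hits)  # 10 summaries; only the range starting at row 2000 matches
```

With sorted data a 10,000-row column is described by 10 summaries, and the search touches a single range. Every row in that range still has to be rechecked, which is why BRIN is less specific than GiST.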
![Page 8: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/8.jpg)
BRIN support for PostGIS
https://github.com/dalibo/postgis/tree/for_upstream
Julien Rouhaud, Ronan Dunklau, me
![Page 9: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/9.jpg)
BRIN support for PostGIS
– Different OpClass for
• 2D (default), 3D, 4D geometry
• geography
• box2d/box3d
• cross-operators defined in the OpFamily
– Storage datatype: float-precision (as in GiST)
• gidx (3D, 4D geometry)
• box2df (2D geometry, geography)
– Operators: &&, @, ~ (2D); &&& (3D, 4D)
NO kNN SUPPORT!!
![Page 10: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/10.jpg)
1st example: “sorted” geometry points
~10M, 3D, SRID=0, distribution range: 0÷100k units
=# CREATE INDEX idx_gist ON points USING gist
-# (geom gist_geometry_ops_nd);
=# CREATE INDEX idx_brin_128 ON points USING brin
-# (geom brin_geometry_inclusion_ops_3d);
=# CREATE INDEX idx_brin_10 ON points USING brin
-# (geom brin_geometry_inclusion_ops_3d)
-# WITH (pages_per_range=10);
Sizes:
BRIN(128)/GiST 1/2000
BRIN(10)/GiST 1/1000
Building times:
BRIN(128)/GiST 1/30
BRIN(10)/GiST 1/20
![Page 11: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/11.jpg)
1st example: “sorted” geometry points
~10M, 3D, SRID=0, distribution range: 0÷100k units
=# SELECT * FROM points
-# WHERE 'BOX3D(10 10 10, 12 12 12)'::box3d &&& geom;
Execution times:
BRIN(128)/GiST 50/1
BRIN(10)/GiST 6/1
BRIN(128)/SeqScan 1/60
BRIN(10)/SeqScan 1/350
![Page 12: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/12.jpg)
2nd example: “unsorted” geometry points
~10M, 3D, SRID=0, distribution range: 0÷100k units
=# CREATE TABLE unsorted_points AS
-# SELECT * FROM points ORDER BY random();
=# SELECT * FROM unsorted_points
-# WHERE 'BOX3D(10 10 10, 12 12 12)'::box3d &&& geom;
Execution times:
BRIN(128)/GiST 2800/1
BRIN(10)/GiST 400/1
BRIN(128)/SeqScan 1/1
BRIN(10)/SeqScan 1/12
![Page 13: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/13.jpg)
2nd example: “unsorted” geometry points
=# CREATE EXTENSION pageinspect;
=# SELECT blknum, value FROM brin_page_items(get_raw_page('idx_brin_128', 2), 'idx_brin_128')
-# LIMIT 5;
 blknum | value
--------+------------------------------------------------------------------------
      0 | {GIDX( 1.11394846439 1.64980053902 1.11335003376, 99982.2578125 99982.640625 99982.1171875 ) .. f .. f}
    128 | {GIDX( 0.224781364202 0.184096381068 0.798862099648, 99995.859375 99995.171875 99995.0390625 ) .. f .. f}
    256 | {GIDX( 6.25802087784 6.50119876862 6.8031873703, 99993.15625 99993.8203125 99993.2109375 ) .. f .. f}
    384 | {GIDX( 1.9203556776 1.11907613277 1.55043530464, 99995.421875 99995.234375 99995.421875 ) .. f .. f}
    512 | {GIDX( 3.44181084633 3.4120452404 3.41762590408, 99996.3125 99996.7109375 99996.59375 ) .. f .. f}
(5 rows)
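The output above shows every block range summarised by a box spanning almost the whole 0÷100k extent, so each range matches any query box. A small simulation makes the effect concrete. It is a sketch under assumed numbers (100,000 diagonal points, 1,000 rows per range) with hypothetical helper names `range_bboxes` and `overlaps`; it is not the PostGIS implementation.

```python
import random

def range_bboxes(points, rows_per_range):
    """One 3D bounding box (mins, maxs) per consecutive group of rows."""
    boxes = []
    for start in range(0, len(points), rows_per_range):
        chunk = points[start:start + rows_per_range]
        mins = tuple(min(p[i] for p in chunk) for i in range(3))
        maxs = tuple(max(p[i] for p in chunk) for i in range(3))
        boxes.append((mins, maxs))
    return boxes

def overlaps(box, qmin, qmax):
    """True when box and query box intersect on every axis (like &&&)."""
    mins, maxs = box
    return all(mins[i] <= qmax[i] and maxs[i] >= qmin[i] for i in range(3))

# Points along the diagonal, coordinates 0.00 .. 999.99, stored in order.
sorted_pts = [(i / 100.0, i / 100.0, i / 100.0) for i in range(100_000)]
shuffled_pts = sorted_pts[:]
random.Random(0).shuffle(shuffled_pts)  # same rows, random physical order

qmin, qmax = (10.0, 10.0, 10.0), (12.0, 12.0, 12.0)
results = {}
for name, pts in (("sorted", sorted_pts), ("shuffled", shuffled_pts)):
    boxes = range_bboxes(pts, rows_per_range=1_000)
    results[name] = sum(overlaps(b, qmin, qmax) for b in boxes)
    print(name, results[name], "of", len(boxes), "ranges must be scanned")
```

In the sorted layout a single range overlaps the query box. After shuffling, nearly every range does, which matches the BRIN(128)/SeqScan ratio of 1/1 measured on the unsorted table.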
![Page 14: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/14.jpg)
...what about INSERTs?
INSERT INTO points
SELECT ST_MakePoint(a, a, a)
FROM generate_series(1, 1000) AS f(a);
GiST: 6 times slower than insertions without an index
BRIN (10): 3 times slower than insertions without an index
BRIN (128): 2 times slower than insertions without an index
...and the blocks may no longer be “sorted”!
![Page 15: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/15.jpg)
Manage LiDAR data with PostgreSQL: is it possible?
Is the relational approach valid for point clouds?
![Page 16: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/16.jpg)
PostgreSQL & LiDAR
• PostgreSQL is a relational DBMS with an extension for LiDAR data: pg_pointcloud
• Part of the OpenGeo suite, completely compatible with PostGIS
– two new datatypes: pcpoint, pcpatch (compressed set of points)
• N points (with all attributes from the survey) → 1 patch → 1 record
– compatible with PDAL drivers to import data directly from .las
https://github.com/pgpointcloud/pointcloud
![Page 17: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/17.jpg)
Relational approach to LiDAR data with PG
• GiST indexing in PostGIS
• GiST performances:
– storage: 1TB RAID1, RAM 16GB, 8 CPU @3.3GHz, PostgreSQL 9.3
– index size ~O(table size)
– index was used:
• up to ~300M points in bbox inclusion searches
• up to ~10M points in kNN searches
LiDAR size: ~O(10^9÷10^11) → only a few % can be properly indexed!
http://www.slideshare.net/GiuseppeBroccolo/gbroccolo-foss4-geugeodbindex
![Page 18: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/18.jpg)
The LiDAR dataset: the ahn2 project
• 3D point cloud, coverage: almost the whole Netherlands
– EPSG: 28992, ~8 points/m²
• 1.6TB, ~250G points in ~560M patches (compression: ~10x)
– PDAL driver – filter.chipper
• available RAM: 16GB
• the point structure:
field:  X    Y    Z    scan  LAS  time  RGB  chipper
size:   32b  32b  32b  40b   16b  64b   48b  32b
the “indexed” part (can be converted to PostGIS datatype)
(thanks to Tom Van Tilburg)
![Page 19: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/19.jpg)
Typical searches on ahn2 - x3d_viewer
• Intersection with a polygon (PostGIS)
– the longest-running part: indexes are needed here!
• Patch “explosion” + NN sorting (pg_PointCloud+PostGIS)
• constrained Delaunay Triangulation (SFCGAL)
![Page 20: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/20.jpg)
WITH patches AS (
  SELECT patches FROM ahn2
  WHERE patches && ST_GeomFromText('POLYGON(...)')
), points AS (
  SELECT ST_Explode(patches) AS points
  FROM patches
), sorted_points AS (
  SELECT points,
    (ST_DumpPoints(ST_GeomFromText('POLYGON(...)'))).geom AS poly_pt
  FROM points ORDER BY points <#> poly_pt LIMIT 1
), sel AS (
  SELECT points FROM sorted_points
  WHERE points && ST_GeomFromText('POLYGON(...)')
)
SELECT ST_Dump(ST_Triangulate2DZ(ST_Collect(points))) FROM sel;
All in the DB & just with one query!
![Page 21: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/21.jpg)
patches && polygons - GiST performance
index building: GiST 2.5 days
index size: GiST 29GB
searches based on GiST:
polygon size   timing
~O(10m)        ~20ms
~O(100m)       ~60ms
~O(1km)        ~3s
~O(10km)       hours
index not contained in RAM anymore (~5G points → ~3%)
![Page 22: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/22.jpg)
patches && polygons - BRIN performance
index building: BRIN 4 h
index size: BRIN 15MB
searches based on BRIN:
polygon size   timing
~O(m)          ~150s
![Page 23: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/23.jpg)
patches && polygons - BRIN performance
index building: BRIN 5 h
index size: BRIN 14MB
searches based on BRIN:
polygon size   timing
~O(m)          ~150s
How data was inserted...
LAS files → chipper → PDAL driver (parallel processes) → ahn2
![Page 24: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/24.jpg)
Spatial sorting of a LiDAR dataset
CREATE INDEX patch_geohash ON ahn2
USING btree (ST_GeoHash(ST_Transform(Geometry(patch), 4326), 20));
CLUSTER ahn2 USING patch_geohash;
Morton order [http://geohash.org/]
Need more than 2X table size free space
Relatively fast!
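Why a geohash gives a usable physical order: it interleaves the bits of the two coordinates, so cells that are close in space tend to sort close together. A minimal sketch of that interleaving on integer grid cells follows. The function `morton_code` and the 4×4 grid are assumptions for illustration; ST_GeoHash itself works on longitude/latitude and returns a base-32 string.

```python
def morton_code(x, y, bits=16):
    """Interleave the bits of two non-negative integers (x in even positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # bit i of x -> position 2i
        code |= ((y >> i) & 1) << (2 * i + 1)  # bit i of y -> position 2i+1
    return code

# Sort the cells of a 4x4 grid along the Morton (Z-order) curve.
cells = [(x, y) for y in range(4) for x in range(4)]
ordered = sorted(cells, key=lambda c: morton_code(*c))
print(ordered[:8])
# [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0), (2, 1), (3, 1)]
```

The first four cells in the order fill one 2×2 quadrant before the curve moves on to the next, which is the locality that keeps each BRIN block range's bounding box small after clustering.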
![Page 25: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/25.jpg)
![Page 26: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/26.jpg)
Spatial sorting of a LiDAR dataset
CREATE TABLE ahn2_sorted AS
SELECT * FROM ahn2
ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)
COLLATE "C";
Morton order [http://geohash.org/]
Just need size for a second table
Slower than clustering using the index!
45 hours
![Page 27: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/27.jpg)
patches && polygons – BRIN performance
polygon size   BRIN timing   GiST timing
~O(10m)        ~380ms        ~20ms
~O(100m)       ~400ms        ~60ms
~O(1km)        ~2.7s         ~3s
~O(10km)       ~4.7s         hours
• After sorting: 150s → 380ms
• GiST x20 faster than BRIN [r~O(10m)]
– BRIN x200 faster than SeqScan
• BRIN x1000 faster than SeqScan [r~O(100m)]
• BRIN ~ GiST [r~O(1km)]
• Only BRIN is used for searches with r > 1km
![Page 28: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/28.jpg)
is the drop in performance acceptable?
BRIN searches = x20 GiST searches
patch explosion + NN sorting = x10÷x100 GiST searches
constrained triangulation = x10÷x100 GiST searches
![Page 29: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/29.jpg)
the future of the project... add kNN support
ORDER BY geoms <-> point LIMIT N
get blocks gradually included within the k parameter, as already done in GiST with single tuples
WIP… sponsorships are welcome!
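One way to picture "blocks gradually included" is a generic best-first search over block-range bounding boxes. The sketch below is an assumption made for illustration only, not the planned PostGIS implementation: `min_dist`, `knn` and the three sample ranges are hypothetical. Ranges are visited nearest-first, and the scan stops once no unvisited range can contain a point closer than the current k-th neighbour.

```python
import heapq
import math

def min_dist(box, q):
    """Smallest possible distance from point q to any point inside box."""
    (xmin, ymin), (xmax, ymax) = box
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def knn(ranges, q, k):
    """ranges: list of (bbox, points). Returns (neighbours, ranges scanned)."""
    order = sorted(ranges, key=lambda r: min_dist(r[0], q))
    best = []  # max-heap through negated distances, at most k entries
    scanned = 0
    for box, pts in order:
        # No remaining range can beat the current k-th best distance.
        if len(best) == k and min_dist(box, q) >= -best[0][0]:
            break
        scanned += 1
        for p in pts:
            d = math.hypot(p[0] - q[0], p[1] - q[1])
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best), scanned

ranges = [
    (((0.0, 0.0), (1.0, 1.0)), [(0.0, 0.0), (1.0, 1.0)]),
    (((5.0, 5.0), (6.0, 6.0)), [(5.0, 5.0), (6.0, 6.0)]),
    (((10.0, 10.0), (11.0, 11.0)), [(10.0, 10.0), (11.0, 11.0)]),
]
neighbours, scanned = knn(ranges, q=(0.5, 0.5), k=2)
print([p for _, p in neighbours], scanned)  # [(0.0, 0.0), (1.0, 1.0)] 1
```

As with bbox searches, this only pays off when the data is spatially sorted: overlapping range boxes would force most ranges to be scanned.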
![Page 30: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/30.jpg)
Conclusions
• BRINs can be successfully used in geo DBs based on PostgreSQL
– full support for PostGIS datatypes
• A patch is almost ready for the next PostGIS release (2.3.0)
– easier index maintenance, low indexing time, low impact on concurrent INSERTs
– less specific than GiST... but not too much!
• Really small indexes!
– GiST performance drops as soon as the index cannot be fully contained in RAM
• Can the relational approach to LiDAR data be valid with PostgreSQL?
– Yes, at least for bbox inclusion searches
– make sure that the data has the same sequentiality as the .las files
![Page 31: Gbroccolo foss4 guk2016_brin4postgis](https://reader030.vdocuments.net/reader030/viewer/2022020301/58ae45841a28abad338b5a63/html5/thumbnails/31.jpg)
Creative Commons license
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
http://creativecommons.org/licenses/by-nc-sa/2.5/it/
© 2016 2ndQuadrant Italia – http://www.2ndquadrant.it