![Page 1: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/1.jpg)
On Beyond Data Types
Jonathan S. Katz
PostgreSQL España February 16, 2015
![Page 2: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/2.jpg)
About
• CTO, VenueBook
• Co-Organizer, NYC PostgreSQL User Group (NYCPUG)
• Director, United States PostgreSQL Association
• First time in Spain!
• @jkatz05
![Page 3: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/3.jpg)
A Brief Note on NYCPUG
• Active since 2010
• Over 1,300 members
• Monthly Meetups
• PGConf NYC 2014
– 259 attendees
• PGConf US 2015:
– Mar 25 - 27 @ New York Marriott Downtown
– Already 160+ registrations
![Page 5: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/5.jpg)
Community Updates
![Page 6: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/6.jpg)
Community Updates
![Page 7: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/7.jpg)
![Page 8: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/8.jpg)
![Page 9: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/9.jpg)
Data Types
• Fundamental
– 0 => 1
– 00001111
• Building Blocks
– 0x41424344
• Accessibility
– 1094861636
– 'ABCD'
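The values on this slide are the same four bytes seen three ways. A minimal sketch in PostgreSQL, assuming a UTF8 database; the queries are illustrative and not part of the original deck:

```sql
-- The bit pattern 00001111 read as an integer
SELECT B'00001111'::int;                                  -- 15

-- The four bytes 0x41 0x42 0x43 0x44 read as a 32-bit integer
SELECT x'41424344'::int;                                  -- 1094861636

-- The same four bytes read as characters
SELECT convert_from(decode('41424344', 'hex'), 'UTF8');   -- ABCD
```

The data type decides which reading applies, which is why it acts as both a building block and an accessibility aid.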
![Page 10: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/10.jpg)
C
• char
• int
• float, double
• bool (C99)
• (short, long, signed, unsigned)
![Page 11: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/11.jpg)
PostgreSQL
• char, varchar, text
• smallint, int, bigint
• real, double precision
• bool
![Page 12: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/12.jpg)
I kid you not, I can spend close to an hour on just those data types
![Page 13: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/13.jpg)
PostgreSQL Primitives Oversimplified Summary
• Strings
– Use "text" by default; use "varchar" only when you need an actual length limit on strings
– Don't use "char"
• Integers
– Use "int"
– If you seriously have big numbers, use "bigint"
• Numerical types
– Use "numeric" almost always
– If you have an IEEE 754 data source you need to record, use "float"
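A few one-liners that hint at why these rules of thumb exist. This is a sketch with illustrative literals, not material from the talk:

```sql
-- An explicit cast to varchar(n) silently truncates; text has no such limit
SELECT 'abcdef'::varchar(3);                          -- abc

-- int overflows at 2^31 - 1; bigint keeps going
SELECT 2147483647::int + 1;                           -- ERROR:  integer out of range
SELECT 2147483647::bigint + 1;                        -- 2147483648

-- float is binary IEEE 754, so decimal fractions are approximate
SELECT 0.1::float8 + 0.2::float8 = 0.3::float8;       -- f

-- numeric stores exact decimal values
SELECT 0.1::numeric + 0.2::numeric = 0.3::numeric;    -- t
```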
![Page 14: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/14.jpg)
And If We Had More Time
• (argh no pun intended)
• timestamp with time zone, timestamp without time zone
• date
• time with time zone, time without time zone
• interval
![Page 15: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/15.jpg)
Summary of PostgreSQL Date/Time Types
• They are AWESOME
• Flexible input that you can customize
• Can perform mathematical operations in native format
• Thank you intervals!
• IMO better support than most programming languages have, let alone databases
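To make "mathematical operations in native format" concrete, here is a small sketch. The dates are chosen for illustration, and the output formats shown assume the default ISO DateStyle:

```sql
-- Flexible input: several spellings parse to the same date
SELECT 'February 16, 2015'::date;                              -- 2015-02-16

-- date + interval yields a timestamp
SELECT DATE '2015-02-16' + INTERVAL '1 month';                 -- 2015-03-16 00:00:00

-- date - date yields a number of days
SELECT DATE '2015-03-25' - DATE '2015-02-16';                  -- 37

-- timestamp + interval
SELECT TIMESTAMP '2015-02-16 09:00' + INTERVAL '90 minutes';   -- 2015-02-16 10:30:00
```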
![Page 16: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/16.jpg)
![Page 17: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/17.jpg)
PostgreSQL is an ORDBMS
• Designed to support more complex data types
• Complex data types => additional functionality
• Data Integrity
• Performance
![Page 18: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/18.jpg)
Let's Start Easy: Geometry
![Page 19: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/19.jpg)
PostgreSQL Geometric Types

| Name | Size | Representation | Format |
|---|---|---|---|
| point | 16 bytes | point on a plane | (x,y) |
| lseg | 32 bytes | finite line segment | ((x1, y1), (x2, y2)) |
| box | 32 bytes | rectangular box | ((x1, y1), (x2, y2)) |
| path | 16 + 16n bytes | closed path (similar to polygon), n = total points | ((x1, y1), (x2, y2), …, (xn, yn)) |
| path | 16 + 16n bytes | open path, n = total points | [(x1, y1), (x2, y2), …, (xn, yn)] |
| polygon | 40 + 16n bytes | polygon | ((x1, y1), (x2, y2), …, (xn, yn)) |
| circle | 24 bytes | circle (center point and radius) | <(x, y), r> |

http://www.postgresql.org/docs/current/static/datatype-geometric.html
![Page 20: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/20.jpg)
Geometric Operators
• 31 different operators built into PostgreSQL

Translation
obdt=# SELECT point(1,1) + point(2,2);
----------
 (3,3)

Equivalence
obdt=# SELECT point(1,1) ~= point(2,2);
----------
 f

obdt=# SELECT point(1,1) ~= point(1,1);
----------
 t

Distance
obdt=# SELECT point(1,1) <-> point(4,4);
------------------
 4.24264068711929

http://www.postgresql.org/docs/current/static/functions-geometry.html
![Page 21: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/21.jpg)
Geometric Operators

Overlapping
obdt=# SELECT '((0,0),5)'::circle && '((2,2),3)'::circle;
----------
 t

Containment
obdt=# SELECT '((0,0),5)'::circle @> point(2,2);
----------
 t

Is Parallel?
obdt=# SELECT '((0,0), (1,1))'::lseg ?|| '((1,-1), (2,0))'::lseg;
----------
 t

Intersection
obdt=# SELECT '((0,0), (1,1))'::lseg ?# '((0,0), (5,5))'::box;
----------
 t

http://www.postgresql.org/docs/current/static/functions-geometry.html
![Page 22: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/22.jpg)
Geometric Functions
• 13 non-type conversion functions built into PostgreSQL

Area
obdt=# SELECT area('((0,0),5)'::circle);
------------------
 78.5398163397448

Center
obdt=# SELECT center('((0,0),(5,5))'::box);
-----------
 (2.5,2.5)

Length
obdt=# SELECT length('((0,0),(5,5))'::lseg);
------------------
 7.07106781186548

Width
obdt=# SELECT width('((0,0),(3,2))'::box);
-------
 3

Height
obdt=# SELECT height('((0,0),(3,2))'::box);
--------
 2
![Page 23: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/23.jpg)
Geometric Performance
• Size on Disk
• Consider I/O on reads
• But indexing should help!!
![Page 24: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/24.jpg)
Geometric Performance

CREATE TABLE houses (plot box);

INSERT INTO houses
SELECT box(
    point((500 * random())::int, (500 * random())::int),
    point((750 * random() + 500)::int, (750 * random() + 500)::int)
)
FROM generate_series(1, 1000000);

obdt=# CREATE INDEX houses_plot_idx ON houses (plot);
ERROR:  data type box has no default operator class for access method "btree"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.
![Page 25: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/25.jpg)
Solution #1: Expression Indexes

obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE area(plot) BETWEEN 50000 AND 75000;
-------------
 Seq Scan on houses (cost=0.00..27353.00 rows=5000 width=32) (actual time=0.077..214.431 rows=26272 loops=1)
   Filter: ((area(plot) >= 50000::double precision) AND (area(plot) <= 75000::double precision))
   Rows Removed by Filter: 973728
 Total runtime: 215.965 ms

obdt=# CREATE INDEX houses_plot_area_idx ON houses (area(plot));

obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE area(plot) BETWEEN 50000 AND 75000;
------------
 Bitmap Heap Scan on houses (cost=107.68..7159.38 rows=5000 width=32) (actual time=5.433..14.686 rows=26272 loops=1)
   Recheck Cond: ((area(plot) >= 50000::double precision) AND (area(plot) <= 75000::double precision))
   -> Bitmap Index Scan on houses_plot_area_idx (cost=0.00..106.43 rows=5000 width=0) (actual time=4.300..4.300 rows=26272 loops=1)
         Index Cond: ((area(plot) >= 50000::double precision) AND (area(plot) <= 75000::double precision))
 Total runtime: 16.025 ms

http://www.postgresql.org/docs/current/static/indexes-expressional.html
![Page 26: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/26.jpg)
Solution #2: GiST Indexes

obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE plot @> '((100,100),(300,300))'::box;
------------
 Seq Scan on houses (cost=0.00..19853.00 rows=1000 width=32) (actual time=0.009..96.680 rows=40520 loops=1)
   Filter: (plot @> '(300,300),(100,100)'::box)
   Rows Removed by Filter: 959480
 Total runtime: 98.662 ms

obdt=# CREATE INDEX houses_plot_gist_idx ON houses USING gist(plot);

obdt=# EXPLAIN ANALYZE SELECT * FROM houses WHERE plot @> '((100,100),(300,300))'::box;
------------
 Bitmap Heap Scan on houses (cost=56.16..2813.20 rows=1000 width=32) (actual time=12.053..24.468 rows=40520 loops=1)
   Recheck Cond: (plot @> '(300,300),(100,100)'::box)
   -> Bitmap Index Scan on houses_plot_gist_idx (cost=0.00..55.91 rows=1000 width=0) (actual time=10.700..10.700 rows=40520 loops=1)
         Index Cond: (plot @> '(300,300),(100,100)'::box)
 Total runtime: 26.451 ms

http://www.postgresql.org/docs/current/static/indexes-types.html
![Page 27: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/27.jpg)
Solution #2+: KNN-GiST

obdt=# CREATE TABLE locations (geocode point);

obdt=# INSERT INTO locations
SELECT point(90 * random(), 180 * random())
FROM generate_series(1, 1000000);

obdt=# EXPLAIN ANALYZE SELECT * FROM locations ORDER BY geocode <-> point(41.88853,-87.628852) LIMIT 10;
------------
 Limit (cost=39519.39..39519.42 rows=10 width=16) (actual time=319.306..319.309 rows=10 loops=1)
   -> Sort (cost=39519.39..42019.67 rows=1000110 width=16) (actual time=319.305..319.307 rows=10 loops=1)
         Sort Key: ((geocode <-> '(41.88853,-87.628852)'::point))
         Sort Method: top-N heapsort Memory: 25kB
         -> Seq Scan on locations (cost=0.00..17907.38 rows=1000110 width=16) (actual time=0.019..189.687 rows=1000000 loops=1)
 Total runtime: 319.332 ms

obdt=# CREATE INDEX locations_geocode_gist_idx ON locations USING gist(geocode);

obdt=# EXPLAIN ANALYZE SELECT * FROM locations ORDER BY geocode <-> point(41.88853,-87.628852) LIMIT 10;
------------
 Limit (cost=0.29..1.06 rows=10 width=16) (actual time=0.098..0.235 rows=10 loops=1)
   -> Index Scan using locations_geocode_gist_idx on locations (cost=0.29..77936.29 rows=1000000 width=16) (actual time=0.097..0.234 rows=10 loops=1)
         Order By: (geocode <-> '(41.88853,-87.628852)'::point)
 Total runtime: 0.257 ms

http://www.slideshare.net/jkatz05/knn-39127023
![Page 28: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/28.jpg)
Solution #3: PostGIS
• For when you are doing real things with shapes
• (and geographic information systems)
![Page 29: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/29.jpg)
For more on PostGIS, please go back in time to yesterday and see Regina & Leo's tutorial
![Page 30: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/30.jpg)
Let's Take a Break With UUIDs
2024e06c-44ff-5047-b1ae-00def276d043
![Page 31: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/31.jpg)
UUID + PostgreSQL
• Universally Unique Identifiers
• 16 bytes on disk
• Acceptable input formats include:
– A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
– {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
– a0eebc999c0b4ef8bb6d6bb9bd380a11
– a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
– {a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}
http://www.postgresql.org/docs/current/static/datatype-uuid.html
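A quick way to convince yourself that the input formats above are only spellings of one 16-byte value. This is a sketch built from the slide's own literals:

```sql
-- Any accepted spelling is normalized to the canonical lowercase form on output
SELECT 'A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11'::uuid;
-- a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11

-- Different spellings compare equal once parsed
SELECT 'A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11'::uuid
     = '{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}'::uuid;                 -- t

-- The value itself occupies 16 bytes
SELECT pg_column_size('a0eebc999c0b4ef8bb6d6bb9bd380a11'::uuid);      -- 16
```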
![Page 32: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/32.jpg)
UUID Functions

obdt=# CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

obdt=# SELECT uuid_generate_v1();
           uuid_generate_v1
--------------------------------------
 d2729728-3d50-11e4-b1af-005056b75e1e

obdt=# SELECT uuid_generate_v1mc();
          uuid_generate_v1mc
--------------------------------------
 e04668a2-3d50-11e4-b1b0-1355d5584528

obdt=# SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresopen.org');
           uuid_generate_v3
--------------------------------------
 d0bc1ba2-bf07-312f-bf6a-436e18b5b046

obdt=# SELECT uuid_generate_v4();
           uuid_generate_v4
--------------------------------------
 0809d8fe-512c-4f02-ba37-bc2e9865e884

obdt=# SELECT uuid_generate_v5(uuid_ns_url(), 'http://www.postgresopen.org');
           uuid_generate_v5
--------------------------------------
 d508c779-da5c-5998-bd88-8d76d446754e

http://www.postgresql.org/docs/current/static/uuid-ossp.html
![Page 33: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/33.jpg)
Network Address Types
• inet (IPv4 & IPv6)
– SELECT '192.168.1.1'::inet;
– SELECT '192.168.1.1/32'::inet;
– SELECT '192.168.1.1/24'::inet;
• cidr (IPv4 & IPv6)
– SELECT '192.168.1.1'::cidr;
– SELECT '192.168.1.1/32'::cidr;
– SELECT '192.168.1.1/24'::cidr;
• macaddr
– SELECT '08:00:2b:01:02:03'::macaddr;
http://www.postgresql.org/docs/current/static/datatype-net-types.html
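The slide lists the same literals for inet and cidr, but the two types treat them differently. A sketch of what the casts return, as I understand the documented behavior; the results are not shown in the original deck:

```sql
-- inet keeps host bits and hides a /32 mask on output
SELECT '192.168.1.1'::inet;      -- 192.168.1.1
SELECT '192.168.1.1/32'::inet;   -- 192.168.1.1
SELECT '192.168.1.1/24'::inet;   -- 192.168.1.1/24

-- cidr describes a network, so the mask is always shown...
SELECT '192.168.1.1'::cidr;      -- 192.168.1.1/32
SELECT '192.168.1.0/24'::cidr;   -- 192.168.1.0/24

-- ...and host bits to the right of the mask are rejected
SELECT '192.168.1.1/24'::cidr;
-- ERROR:  invalid cidr value: "192.168.1.1/24"
-- DETAIL:  Value has bits set to right of mask.
```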
![Page 34: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/34.jpg)
Networks can do Math
http://www.postgresql.org/docs/current/static/functions-net.html
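The examples on this slide did not survive extraction. As a stand-in, here is a hedged sketch of the kind of arithmetic the linked documentation page describes; the addresses are illustrative:

```sql
-- Offset an address by an integer
SELECT '192.168.1.1'::inet + 5;                          -- 192.168.1.6

-- Distance between two addresses
SELECT '192.168.1.43'::inet - '192.168.1.19'::inet;      -- 24

-- Is the address contained within the network?
SELECT '192.168.1.5'::inet << '192.168.1.0/24'::inet;    -- t

-- Derive network, broadcast address and netmask
SELECT network('192.168.1.5/24'::inet);                  -- 192.168.1.0/24
SELECT broadcast('192.168.1.5/24'::inet);                -- 192.168.1.255/24
SELECT netmask('192.168.1.5/24'::inet);                  -- 255.255.255.0
```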
![Page 35: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/35.jpg)
Postgres Can Help Manage Your Routing Tables
...with a foreign data wrapper and a background worker, perhaps it can fully manage your routing tables?
http://www.postgresql.org/docs/current/static/functions-net.html
![Page 36: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/36.jpg)
Arrays
• ...because a database is an "array" of tuples
• ...and a "tuple" is kind of like an array
• ...can we have an array within a tuple?
![Page 37: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/37.jpg)
![Page 38: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/38.jpg)
Array Facts

Arrays are 1-indexed
obdt=# SELECT (ARRAY[1,2,3])[1];
-------
 1

obdt=# SELECT (ARRAY[1,2,3])[0];
-------


Size constraints not enforced
obdt=# CREATE TABLE lotto (
    numbers int[3]
);

obdt=# INSERT INTO lotto VALUES (
    ARRAY[1,2,3,4]
);

obdt=# SELECT * FROM lotto;
-----------
 {1,2,3,4}
![Page 39: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/39.jpg)
Arrays Are Malleable

obdt=# UPDATE lotto SET numbers = ARRAY[1,2,3];

obdt=# SELECT * FROM lotto;
---------
 {1,2,3}

obdt=# UPDATE lotto SET numbers[3] = '7';

obdt=# SELECT * FROM lotto;
---------
 {1,2,7}

obdt=# UPDATE lotto SET numbers[1:2] = ARRAY[6,5];

obdt=# SELECT * FROM lotto;
---------
 {6,5,7}
![Page 40: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/40.jpg)
Array Operations
• <, <=, =, >=, >, <>
– full array comparisons
– B-tree indexable

Containment
SELECT ARRAY[1,2,3] @> ARRAY[1,2];
SELECT ARRAY[1,2] <@ ARRAY[1,2,3];

Concatenation
SELECT ARRAY[1,2,3] || ARRAY[3,4,5];
SELECT ARRAY[ARRAY[1,2]] || ARRAY[3,4];
SELECT ARRAY[1,2,3] || 4;

Overlaps
SELECT ARRAY[1,2,3] && ARRAY[3,4,5];
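For reference, these are the results I would expect from the operators on this slide. The first comparison is an added illustration; the rest repeat the slide's own queries:

```sql
-- Full-array comparison proceeds element by element
SELECT ARRAY[1,2,3] < ARRAY[1,2,4];        -- t

-- Containment
SELECT ARRAY[1,2,3] @> ARRAY[1,2];         -- t
SELECT ARRAY[1,2] <@ ARRAY[1,2,3];         -- t

-- Concatenation (duplicates are kept; dimensions are merged where possible)
SELECT ARRAY[1,2,3] || ARRAY[3,4,5];       -- {1,2,3,3,4,5}
SELECT ARRAY[ARRAY[1,2]] || ARRAY[3,4];    -- {{1,2},{3,4}}
SELECT ARRAY[1,2,3] || 4;                  -- {1,2,3,4}

-- Overlap: at least one element in common
SELECT ARRAY[1,2,3] && ARRAY[3,4,5];       -- t
```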
![Page 41: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/41.jpg)
Integer Arrays Use GIN

obdt=# CREATE INDEX int_arrays_data_gin_idx ON int_arrays USING GIN(data);

obdt=# EXPLAIN ANALYZE SELECT *
FROM int_arrays
WHERE 5432 = ANY (data);
---------------
 Seq Scan on int_arrays (cost=0.00..30834.00 rows=5000 width=33) (actual time=1.237..157.397 rows=3 loops=1)
   Filter: (5432 = ANY (data))
   Rows Removed by Filter: 999997
 Total runtime: 157.419 ms

obdt=# EXPLAIN ANALYZE SELECT * FROM int_arrays
WHERE ARRAY[5432] <@ data;
---------------
 Bitmap Heap Scan on int_arrays (cost=70.75..7680.14 rows=5000 width=33) (actual time=0.023..0.024 rows=3 loops=1)
   Recheck Cond: ('{5432}'::integer[] <@ data)
   -> Bitmap Index Scan on int_arrays_data_gin_idx (cost=0.00..69.50 rows=5000 width=0) (actual time=0.019..0.019 rows=3 loops=1)
         Index Cond: ('{5432}'::integer[] <@ data)
 Total runtime: 0.090 ms
![Page 42: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/42.jpg)
Array Functions
• modification
SELECT array_append(ARRAY[1,2,3], 4); -- {1,2,3,4}
SELECT array_prepend(1, ARRAY[2,3,4]); -- {1,2,3,4}
SELECT array_cat(ARRAY[1,2], ARRAY[3,4]); -- {1,2,3,4}
SELECT array_remove(ARRAY[1,2,1,3], 1); -- {2,3}
SELECT array_replace(ARRAY[1,2,1,3], 1, -4); -- {-4,2,-4,3}
• size
SELECT array_length(ARRAY[1,2,3,4], 1); -- 4
SELECT array_ndims(ARRAY[ARRAY[1,2], ARRAY[3,4]]); -- 2
SELECT array_dims(ARRAY[ARRAY[1,2], ARRAY[3,4]]); -- [1:2][1:2]
http://www.postgresql.org/docs/current/static/functions-array.html
![Page 43: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/43.jpg)
Array Functions

Array to String
obdt=# SELECT array_to_string(ARRAY[1,2,NULL,4], ',', '*');
-----------------
 1,2,*,4

Array to Set
obdt=# SELECT unnest(ARRAY[1,2,3]);
 unnest
--------
      1
      2
      3

http://www.postgresql.org/docs/current/static/functions-array.html
![Page 44: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/44.jpg)
array_agg
• useful for variable-length lists or "unknown # of columns"

obdt=# SELECT
    t.title,
    array_agg(s.full_name)
FROM talk t
JOIN speakers_talks st ON st.talk_id = t.id
JOIN speaker s ON s.id = st.speaker_id
GROUP BY t.title;

     title      |        array_agg
----------------+--------------------------
 Data Types     | {Jonathan, Jim}
 Administration | {Bruce}
 User Groups    | {Josh, Jonathan, Magnus}

http://www.postgresql.org/docs/current/static/functions-array.html
![Page 45: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/45.jpg)
Ranges
• Scheduling
• Probability
• Measurements
• Financial applications
• Clinical trial data
• Intersections of ordered data
![Page 46: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/46.jpg)
Why Range Overlaps Are Difficult
![Page 47: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/47.jpg)
Before Postgres 9.2
• OVERLAPS

SELECT
    ('2013-01-08'::date, '2013-01-10'::date) OVERLAPS ('2013-01-09'::date, '2013-01-12'::date);

• Limitations:
– Only date/time
– Bounds are fixed: Start <= x < End
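PostgreSQL documents each OVERLAPS period as the half-open interval start <= time < end, so the bound behavior cannot be chosen per query. A sketch contrasting that with a range type, where it can; the dates are illustrative:

```sql
-- Periods that share interior days overlap
SELECT (DATE '2013-01-08', DATE '2013-01-10')
       OVERLAPS (DATE '2013-01-09', DATE '2013-01-12');   -- t

-- Periods that merely touch at an endpoint do not
SELECT (DATE '2013-01-08', DATE '2013-01-10')
       OVERLAPS (DATE '2013-01-10', DATE '2013-01-12');   -- f

-- With range types the bounds are explicit; inclusive ranges do overlap here
SELECT daterange('2013-01-08', '2013-01-10', '[]')
    && daterange('2013-01-10', '2013-01-12', '[]');       -- t
```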
![Page 48: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/48.jpg)
Postgres 9.2+
• INT4RANGE (integer)
• INT8RANGE (bigint)
• NUMRANGE (numeric)
• TSRANGE (timestamp without time zone)
• TSTZRANGE (timestamp with time zone)
• DATERANGE (date)
http://www.postgresql.org/docs/current/static/rangetypes.html
![Page 49: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/49.jpg)
Range Type Size
• Size on disk = 2 * (data type) + 1
• sometimes magic if bounds are equal

obdt=# SELECT pg_column_size(daterange(CURRENT_DATE, CURRENT_DATE));
----------------
 9

obdt=# SELECT pg_column_size(daterange(CURRENT_DATE,CURRENT_DATE + 1));
----------------
 17
![Page 50: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/50.jpg)
Range Bounds
• Ranges can be inclusive, exclusive or both
– [2,4] => 2 ≤ x ≤ 4
– [2,4) => 2 ≤ x < 4
– (2,4] => 2 < x ≤ 4
– (2,4) => 2 < x < 4
• Can also be empty
![Page 51: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/51.jpg)
Infinite Ranges
• Ranges can be infinite
– [2,) => 2 ≤ x < ∞
– (,2] => -∞ < x ≤ 2
• CAVEAT EMPTOR
– “infinity” has special meaning with timestamp ranges
– [CURRENT_TIMESTAMP,) = [CURRENT_TIMESTAMP,]
– [CURRENT_TIMESTAMP, 'infinity') <> [CURRENT_TIMESTAMP, 'infinity']
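The caveat can be checked directly. This sketch reflects my reading of the range type documentation: an omitted bound is unbounded, while 'infinity' is an ordinary timestamp value that can be included or excluded.

```sql
-- An unbounded upper end contains every larger value
SELECT '[2,)'::int4range @> 1000000;                      -- t

-- Inclusive and exclusive notation agree when the bound is missing
SELECT '[2,)'::int4range = '[2,]'::int4range;             -- t

-- upper_inf() tests for a missing bound, not for the value 'infinity'
SELECT upper_inf(tstzrange(now(), NULL));                 -- t
SELECT upper_inf(tstzrange(now(), 'infinity'));           -- f

-- So inclusivity matters once 'infinity' is the bound
SELECT tstzrange(now(), 'infinity', '[)')
     = tstzrange(now(), 'infinity', '[]');                -- f
```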
![Page 52: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/52.jpg)
Constructing Ranges

obdt=# SELECT '[1,10]'::int4range;
-----------
 [1,11)
![Page 53: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/53.jpg)
Constructing Ranges
• Constructor defaults to '[)'

obdt=# SELECT numrange(9.0, 9.5);
------------
 [9.0,9.5)
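The constructor functions also accept an optional third argument that overrides the default bounds. A short sketch with illustrative values:

```sql
-- Default bounds are '[)'
SELECT int4range(1, 10);                                  -- [1,10)

-- Discrete ranges are canonicalized to '[)' whatever bounds you ask for
SELECT int4range(1, 10, '[]');                            -- [1,11)

-- Continuous ranges keep the requested bounds
SELECT numrange(9.0, 9.5, '(]');                          -- (9.0,9.5]

-- Accessors and simple tests
SELECT lower(int4range(1, 10)), upper(int4range(1, 10));  -- 1 | 10
SELECT int4range(1, 10) @> 10;                            -- f
SELECT isempty(int4range(1, 1));                          -- t
```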
![Page 54: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/54.jpg)
Finding Overlapping Ranges

obdt=# SELECT *
FROM cars
WHERE cars.price_range && int4range(13000, 15000, '[]')
ORDER BY lower(cars.price_range);
-----------
 id |        name         |  price_range
----+---------------------+---------------
  5 | Ford Mustang        | [11000,15001)
  6 | Lincoln Continental | [12000,14001)

http://www.postgresql.org/docs/current/static/functions-range.html
![Page 55: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/55.jpg)
Ranges + GiST

obdt=# CREATE INDEX ranges_bounds_gist_idx ON ranges USING gist (bounds);

obdt=# EXPLAIN ANALYZE SELECT * FROM ranges WHERE int4range(500,1000) && bounds;
------------
 Bitmap Heap Scan on ranges (actual time=0.283..0.370 rows=653 loops=1)
   Recheck Cond: ('[500,1000)'::int4range && bounds)
   -> Bitmap Index Scan on ranges_bounds_gist_idx (actual time=0.275..0.275 rows=653 loops=1)
         Index Cond: ('[500,1000)'::int4range && bounds)
 Total runtime: 0.435 ms
![Page 56: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/56.jpg)
Large Search Range?

test=# EXPLAIN ANALYZE SELECT * FROM ranges WHERE int4range(10000,1000000) && bounds;
 QUERY PLAN
-------------
 Bitmap Heap Scan on ranges (actual time=184.028..270.323 rows=993068 loops=1)
   Recheck Cond: ('[10000,1000000)'::int4range && bounds)
   -> Bitmap Index Scan on ranges_bounds_gist_idx (actual time=183.060..183.060 rows=993068 loops=1)
         Index Cond: ('[10000,1000000)'::int4range && bounds)
 Total runtime: 313.743 ms
![Page 57: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/57.jpg)
SP-GiST
• space-partitioned generalized search tree
• ideal for non-balanced data structures
– k-d trees, quad-trees, suffix trees
– divides search space into partitions of unequal size
• matching partitioning rule = fast search
• traditionally for "in-memory" transactions, converted to play nicely with I/O
http://www.postgresql.org/docs/9.3/static/spgist.html
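Creating an SP-GiST index on a range column looks just like the GiST case. This is a sketch assuming PostgreSQL 9.3 or later, where range types gained SP-GiST support. The table definition, the sample data and the index name ranges_bounds_spgist_idx are assumptions for illustration, since the deck does not show how its ranges table was built:

```sql
-- Hypothetical stand-in for the deck's "ranges" table
CREATE TABLE IF NOT EXISTS ranges (bounds int4range);

-- Illustrative data: 100,000 ranges of width 100 at random positions
INSERT INTO ranges
SELECT int4range(lo, lo + 100)
FROM (
    SELECT (random() * 1000000)::int AS lo
    FROM generate_series(1, 100000)
) AS s;

-- Same column, different access method
CREATE INDEX ranges_bounds_spgist_idx ON ranges USING spgist (bounds);

-- The overlap operator can use either index type
EXPLAIN ANALYZE SELECT * FROM ranges WHERE int4range(500, 1000) && bounds;

-- Check the on-disk size of the new index
SELECT pg_size_pretty(pg_relation_size('ranges_bounds_spgist_idx'));
```

Which access method wins depends on how the data is distributed, so measure on your own data.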
![Page 58: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/58.jpg)
GiST vs SP-GiST: Space

| | GiST Clustered | SP-GiST Clustered | GiST Sparse | SP-GiST Sparse |
|---|---|---|---|---|
| 100K Size | 6MB | 5MB | 6MB | 11MB |
| 100K Time | 0.5s | 0.4s | 2.5s | 7.8s |
| 250K Size | 15MB | 12MB | 15MB | 28MB |
| 250K Time | 1.5s | 1.1s | 6.3s | 47.2s |
| 500K Size | 30MB | 25MB | 30MB | 55MB |
| 500K Time | 3.1s | 3.0s | 13.9s | 192s |
| 1MM Size | 59MB | 52MB | 60MB | 110MB |
| 1MM Time | 5.1s | 5.7s | 29.2s | 777s |
![Page 59: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/59.jpg)
Scheduling

obdt=# CREATE TABLE travel_log (
    id serial PRIMARY KEY,
    name varchar(255),
    trip_range daterange,
    EXCLUDE USING gist (trip_range WITH &&)
);

obdt=# INSERT INTO travel_log (name, trip_range) VALUES ('Chicago', daterange('2012-03-12', '2012-03-17'));

obdt=# INSERT INTO travel_log (name, trip_range) VALUES ('Austin', daterange('2012-03-16', '2012-03-18'));
ERROR:  conflicting key value violates exclusion constraint "travel_log_trip_range_excl"
DETAIL:  Key (trip_range)=([2012-03-16,2012-03-18)) conflicts with existing key (trip_range)=([2012-03-12,2012-03-17)).
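A common variation is "no overlaps per resource" rather than "no overlaps at all". A sketch following the pattern in the PostgreSQL range type documentation; the table and column names here are hypothetical, and the btree_gist extension is needed so that plain equality can take part in a GiST exclusion constraint:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical table: one row per booking of a room
CREATE TABLE room_booking (
    room_id int,
    during  tstzrange,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);

INSERT INTO room_booking VALUES (1, '[2015-02-16 09:00, 2015-02-16 10:00)');

-- Same time slot, different room: accepted
INSERT INTO room_booking VALUES (2, '[2015-02-16 09:00, 2015-02-16 10:00)');

-- Overlapping slot in the same room: rejected by the exclusion constraint
INSERT INTO room_booking VALUES (1, '[2015-02-16 09:30, 2015-02-16 10:30)');
```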
![Page 60: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/60.jpg)
Extending Ranges

obdt=# CREATE TYPE inetrange AS RANGE (
    SUBTYPE = inet
);

obdt=# SELECT '192.168.1.8'::inet <@ inetrange('192.168.1.1', '192.168.1.10');
----------
 t

obdt=# SELECT '192.168.1.20'::inet <@ inetrange('192.168.1.1', '192.168.1.10');
----------
 f
![Page 61: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/61.jpg)
Now For Something Unrelated
Let's talk non-relational data in PostgreSQL
![Page 62: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/62.jpg)
hstore
• key-value store in PostgreSQL
• binary storage
• key / values represented as strings when querying

CREATE EXTENSION hstore;

SELECT 'jk=>1, jm=>2'::hstore;
--------------------
 "jk"=>"1", "jm"=>"2"

http://www.postgresql.org/docs/current/static/hstore.html
![Page 63: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/63.jpg)
Making hstore objects

obdt=# SELECT hstore(ARRAY['jk', 'jm'], ARRAY['1', '2']);
---------------------
 "jk"=>"1", "jm"=>"2"

obdt=# SELECT hstore(ARRAY['jk', '1', 'jm', '2']);
---------------------
 "jk"=>"1", "jm"=>"2"

obdt=# SELECT hstore(ROW('jk', 'jm'));
---------------------
 "f1"=>"jk", "f2"=>"jm"
![Page 64: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/64.jpg)
Accessing hstore

obdt=# SELECT ('jk=>1, jm=>2'::hstore) -> 'jk';
----------
 1

obdt=# SELECT ('jk=>1, jm=>2'::hstore) -> ARRAY['jk','jm'];
----------
 {1,2}

obdt=# SELECT delete('jk=>1, jm=>2'::hstore, 'jm');
-----------
 "jk"=>"1"
![Page 65: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/65.jpg)
hstore operators

obdt=# SELECT ('jk=>1, jm=>2'::hstore) @> 'jk=>1'::hstore;
----------
 t

obdt=# SELECT ('jk=>1, jm=>2'::hstore) ? 'sf';
----------
 f

obdt=# SELECT ('jk=>1, jm=>2'::hstore) ?& ARRAY['jk', 'sf'];
----------
 f

obdt=# SELECT ('jk=>1, jm=>2'::hstore) ?| ARRAY['jk', 'sf'];
----------
 t
![Page 66: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/66.jpg)
hstore Performance

obdt=# EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? '3';
-----------------------
 Seq Scan on keypairs (cost=0.00..19135.06 rows=950 width=32) (actual time=0.071..214.007 rows=1 loops=1)
   Filter: (data ? '3'::text)
   Rows Removed by Filter: 999999
 Total runtime: 214.028 ms

obdt=# CREATE INDEX keypairs_data_gin_idx ON keypairs USING gin(data);

obdt=# EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? '3';
--------------
 Bitmap Heap Scan on keypairs (cost=27.75..2775.66 rows=1000 width=24) (actual time=0.046..0.046 rows=1 loops=1)
   Recheck Cond: (data ? '3'::text)
   -> Bitmap Index Scan on keypairs_data_gin_idx (cost=0.00..27.50 rows=1000 width=0) (actual time=0.041..0.041 rows=1 loops=1)
         Index Cond: (data ? '3'::text)
 Total runtime: 0.073 ms
![Page 67: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/67.jpg)
JSON and PostgreSQL
• Started in 2010 as a Google Summer of Code Project
– https://wiki.postgresql.org/wiki/JSON_datatype_GSoC_2010
• Goal:
– be similar to XML data type functionality in Postgres
– be committed as an extension for PostgreSQL 9.1
![Page 68: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/68.jpg)
What Happened?
• Different proposals over how to finalize the implementation
– binary vs. text
– Core vs Extension
• Discussions between “old” vs. “new” ways of packaging for extensions
![Page 69: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/69.jpg)
Foreshadowing
![Page 70: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/70.jpg)
Foreshadowing
![Page 71: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/71.jpg)
PostgreSQL 9.2: JSON
• JSON data type in core PostgreSQL
• based on RFC 4627
• only “strictly” follows if your database encoding is UTF-8
• text-based format
• checks for validity
![Page 72: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/72.jpg)
PostgreSQL 9.2: JSON

obdt=# SELECT '[{"PUG": "NYC"}]'::json;
------------------
 [{"PUG": "NYC"}]

obdt=# SELECT '[{"PUG": "NYC"]'::json;
ERROR:  invalid input syntax for type json at character 8
DETAIL:  Expected "," or "}", but found "]".
CONTEXT:  JSON data, line 1: [{"PUG": "NYC"]

http://www.postgresql.org/docs/current/static/datatype-json.html
![Page 73: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/73.jpg)
PostgreSQL 9.2: JSON
• array_to_json

obdt=# SELECT array_to_json(ARRAY[1,2,3]);
---------------
 [1,2,3]
![Page 74: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/74.jpg)
PostgreSQL 9.2: JSON
• row_to_json

obdt=# SELECT row_to_json(category) FROM category;
------------
 {"cat_id":652,"cat_pages":35,"cat_subcats":17,"cat_files":0,"title":"Continents"}
![Page 75: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/75.jpg)
PostgreSQL 9.2: JSON
• In summary, within core PostgreSQL, it was a starting point
75
![Page 76: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/76.jpg)
PostgreSQL 9.3: JSON Ups its Game
• Added operators and functions to read / prepare JSON
• Added casts from hstore to JSON
76
![Page 77: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/77.jpg)
PostgreSQL 9.3: JSON
 Operator | Description | Example
----------+-------------+---------
 -> | return JSON array element OR JSON object field | '[1,2,3]'::json -> 0; '{"a": 1, "b": 2, "c": 3}'::json -> 'b';
 ->> | return JSON array element OR JSON object field AS text | '[1,2,3]'::json ->> 0; '{"a": 1, "b": 2, "c": 3}'::json ->> 'b';
 #> | return JSON object using path | '{"a": 1, "b": 2, "c": [1,2,3]}'::json #> '{c, 0}';
 #>> | return JSON object using path AS text | '{"a": 1, "b": 2, "c": [1,2,3]}'::json #>> '{c, 0}';
77
http://www.postgresql.org/docs/current/static/functions-json.html
![Page 78: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/78.jpg)
Operator Gotchas
SELECT * FROM category_documents
WHERE data->'title' = 'PostgreSQL';
ERROR: operator does not exist: json = unknown
LINE 1: ...ECT * FROM category_documents WHERE data->'title' = 'Postgre...
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
78
![Page 79: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/79.jpg)
Operator Gotchas
SELECT * FROM category_documents
WHERE data->>'title' = 'PostgreSQL';
-----------------------
 {"cat_id":252739,"cat_pages":14,"cat_subcats":0,"cat_files":0,"title":"PostgreSQL"}
(1 row)
79
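The difference between the two slides above comes down to return types: -> yields json, which has no equality operator, while ->> yields text. A minimal sketch that makes this visible, using a literal value rather than the category_documents table (the literal is only an illustration):

```sql
-- pg_typeof reports the type each operator returns.
SELECT pg_typeof('{"title": "PostgreSQL"}'::json ->  'title') AS arrow_type,        -- json
       pg_typeof('{"title": "PostgreSQL"}'::json ->> 'title') AS double_arrow_type; -- text
```

Because ->> returns text, it can be compared directly with a string literal in a WHERE clause.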
![Page 80: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/80.jpg)
For the Upcoming Examples
• Wikipedia English category titles – all 1,823,644 that I downloaded
• Relation looks something like:
   Column    |  Type   |     Modifiers
-------------+---------+--------------------
 cat_id      | integer | not null
 cat_pages   | integer | not null default 0
 cat_subcats | integer | not null default 0
 cat_files   | integer | not null default 0
 title       | text    |
80
![Page 81: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/81.jpg)
Performance?
EXPLAIN ANALYZE SELECT * FROM category_documents
WHERE data->>'title' = 'PostgreSQL';
---------------------
 Seq Scan on category_documents (cost=0.00..57894.18 rows=9160 width=32) (actual time=360.083..2712.094 rows=1 loops=1)
   Filter: ((data ->> 'title'::text) = 'PostgreSQL'::text)
   Rows Removed by Filter: 1823643
 Total runtime: 2712.127 ms
81
![Page 82: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/82.jpg)
Performance?
CREATE INDEX category_documents_idx ON category_documents (data);
ERROR: data type json has no default operator class for access method "btree"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
82
![Page 83: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/83.jpg)
Let's Be Clever
• json_extract_path, json_extract_path_text
• Like the #> and #>> operators, but the path is passed as a list of arguments
SELECT json_extract_path(
    '{"a": 1, "b": 2, "c": [1,2,3]}'::json,
    'c', '0');
--------
 1
83
![Page 84: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/84.jpg)
Performance Revisited
CREATE INDEX category_documents_data_idx
ON category_documents
    (json_extract_path_text(data, 'title'));

obdt=# EXPLAIN ANALYZE
SELECT * FROM category_documents
WHERE json_extract_path_text(data, 'title') = 'PostgreSQL';
------------
 Bitmap Heap Scan on category_documents (cost=303.09..20011.96 rows=9118 width=32) (actual time=0.090..0.091 rows=1 loops=1)
   Recheck Cond: (json_extract_path_text(data, VARIADIC '{title}'::text[]) = 'PostgreSQL'::text)
   -> Bitmap Index Scan on category_documents_data_idx (cost=0.00..300.81 rows=9118 width=0) (actual time=0.086..0.086 rows=1 loops=1)
        Index Cond: (json_extract_path_text(data, VARIADIC '{title}'::text[]) = 'PostgreSQL'::text)
 Total runtime: 0.105 ms
84
![Page 85: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/85.jpg)
The Relation vs JSON
• Size on Disk
• category (relation) - 136MB
• category_documents (JSON) - 238MB
• Index Size for “title”
• category - 89MB
• category_documents - 89MB
• Average Performance for looking up “PostgreSQL”
• category - 0.065ms
• category_documents - 0.070ms
85
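The slides do not say how these sizes were measured. One way to obtain comparable figures, sketched here on the assumption that the table and index names match those used in the earlier slides, is PostgreSQL's built-in size functions:

```sql
-- Table size, including TOAST storage but excluding indexes.
SELECT pg_size_pretty(pg_table_size('category'))           AS category_size,
       pg_size_pretty(pg_table_size('category_documents')) AS category_documents_size;

-- Size of a single index (the expression index on "title" from the previous slide).
SELECT pg_size_pretty(pg_relation_size('category_documents_data_idx')) AS title_index_size;
```

Results will vary with PostgreSQL version, fill factor, and how recently the tables were vacuumed, so they may not match the slide's numbers exactly.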
![Page 86: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/86.jpg)
JSON Aggregates
• (this is pretty cool)
• json_agg
http://www.postgresql.org/docs/current/static/functions-json.html
SELECT b, json_agg(stuff)
FROM stuff
GROUP BY b;

  b   |             json_agg
------+----------------------------------
 neat | [{"a":4,"b":"neat","c":[4,5,6]}]
 wow  | [{"a":1,"b":"wow","c":[1,2,3]}, +
      |  {"a":3,"b":"wow","c":[7,8,9]}]
 cool | [{"a":2,"b":"cool","c":[4,5,6]}]
86
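The definition of the stuff table is not shown on the slide. A minimal sketch that would reproduce the json_agg output above, with column types inferred from that output (an assumption, not the author's actual schema):

```sql
-- Hypothetical table definition inferred from the aggregated output.
CREATE TABLE stuff (a int, b text, c int[]);

INSERT INTO stuff (a, b, c) VALUES
    (1, 'wow',  ARRAY[1,2,3]),
    (2, 'cool', ARRAY[4,5,6]),
    (3, 'wow',  ARRAY[7,8,9]),
    (4, 'neat', ARRAY[4,5,6]);

-- Passing the table name to json_agg aggregates each whole row as a JSON object.
SELECT b, json_agg(stuff)
FROM stuff
GROUP BY b;
```

Without an ORDER BY, the order of the groups is not guaranteed and may differ from the slide.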
![Page 87: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/87.jpg)
hstore gets in the game
• hstore_to_json
• converts hstore to json, treating all values as strings
• hstore_to_json_loose
• converts hstore to json, but also tries to distinguish between data types and “convert” them to proper JSON representations
SELECT hstore_to_json_loose('"a key"=>1, b=>t, c=>null, d=>12345, e=>012345, f=>1.234, g=>2.345e+4');
----------------
 {"b": true, "c": null, "d": 12345, "e": "012345", "f": 1.234, "g": 2.345e+4, "a key": 1}
87
![Page 88: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/88.jpg)
Next Steps?
• In PostgreSQL 9.3, JSON became much more useful, but…
• Difficult to search within JSON
• Difficult to build new JSON objects
88
![Page 89: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/89.jpg)
89
![Page 90: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/90.jpg)
“Nested hstore”
• Proposed at PGCon 2013 by Oleg Bartunov and Teodor Sigaev
• Hierarchical key-value storage system that supports arrays too and stored in binary format
• Takes advantage of GIN indexing mechanism in PostgreSQL
• “Generalized Inverted Index”
• Built to search within composite objects
• Arrays, fulltext search, hstore
• …JSON?
90
http://www.pgcon.org/2013/schedule/attachments/280_hstore-pgcon-2013.pdf
![Page 91: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/91.jpg)
How JSONB Came to Be
• JSON is the “lingua franca for transmitting data on the web”
• The PostgreSQL JSON type was in a text format and preserved text exactly as input
• e.g. duplicate keys are preserved
• Create a new data type that builds on the nested hstore work to provide a JSON type stored in a binary format: JSONB
91
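The point about the text format preserving input exactly can be seen by casting the same literal to both types. A minimal sketch, assuming PostgreSQL 9.4 or later (where jsonb is available):

```sql
-- json stores the input text as given, so the duplicate key survives.
-- jsonb parses into a binary structure and keeps only the last value for a repeated key.
SELECT '{"a": 1, "a": 2}'::json  AS as_json,
       '{"a": 1, "a": 2}'::jsonb AS as_jsonb;   -- as_jsonb is {"a": 2}
```

jsonb also does not preserve insignificant whitespace or the original key order.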
![Page 92: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/92.jpg)
JSONB ≠ BSON
BSON is a data type created by MongoDB as a “superset of JSON”
JSONB lives in PostgreSQL and is just JSON that is stored in a binary format on disk
92
![Page 93: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/93.jpg)
JSONB Gives Us More Operators
• a @> b - is b contained within a?
• '{"a": 1, "b": 2}'::jsonb @> '{"a": 1}'::jsonb -- TRUE
• a <@ b - is a contained within b?
• '{"a": 1}'::jsonb <@ '{"a": 1, "b": 2}'::jsonb -- TRUE
• a ? b - does the key b exist in JSONB a?
• '{"a": 1, "b": 2}'::jsonb ? 'a' -- TRUE
• a ?| b - do any of the keys in array b exist in JSONB a?
• '{"a": 1, "b": 2}'::jsonb ?| ARRAY['b', 'c'] -- TRUE
• a ?& b - do all of the keys in array b exist in JSONB a?
• '{"a": 1, "b": 2}'::jsonb ?& ARRAY['a', 'b'] -- TRUE
93
![Page 94: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/94.jpg)
JSONB Gives Us GIN
• Recall - GIN indexes are used to "look inside" objects
• JSONB has two flavors of GIN:
• Standard - supports @>, ?, ?|, ?&
• "Path Ops" - supports only @>
CREATE INDEX category_documents_data_idx ON category_documents USING gin(data);
CREATE INDEX category_documents_path_data_idx ON category_documents USING gin(data jsonb_path_ops);
94
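To make the difference between the two operator classes concrete, here is a sketch of which queries each index can serve. It assumes the category_documents table has a jsonb column named data, as in the surrounding slides; whether the planner actually picks an index depends on table statistics.

```sql
-- Containment: can be served by either the default GIN index or the jsonb_path_ops index.
SELECT * FROM category_documents
WHERE data @> '{"title": "PostgreSQL"}';

-- Key existence: only the default operator class supports ?, ?| and ?&.
SELECT * FROM category_documents
WHERE data ? 'title';

SELECT * FROM category_documents
WHERE data ?| ARRAY['title', 'cat_id'];
```

If only containment queries are needed, the jsonb_path_ops index is the smaller choice, which matches the index sizes reported later in the deck.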
![Page 95: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/95.jpg)
JSONB Gives Us Flexibility
obdt=# SELECT * FROM category_documents WHERE
    data @> '{"title": "PostgreSQL"}';
----------------
 {"title": "PostgreSQL", "cat_id": 252739, "cat_files": 0, "cat_pages": 14, "cat_subcats": 0}

obdt=# SELECT * FROM category_documents WHERE
    data @> '{"cat_id": 5432 }';
----------------
 {"title": "1394 establishments", "cat_id": 5432, "cat_files": 0, "cat_pages": 4, "cat_subcats": 2}
95
![Page 96: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/96.jpg)
JSONB Gives Us Speed
EXPLAIN ANALYZE SELECT * FROM category_documents
    WHERE data @> '{"title": "PostgreSQL"}';
------------
 Bitmap Heap Scan on category_documents (cost=38.13..6091.65 rows=1824 width=153) (actual time=0.021..0.022 rows=1 loops=1)
   Recheck Cond: (data @> '{"title": "PostgreSQL"}'::jsonb)
   Heap Blocks: exact=1
   -> Bitmap Index Scan on category_documents_path_data_idx (cost=0.00..37.68 rows=1824 width=0) (actual time=0.012..0.012 rows=1 loops=1)
        Index Cond: (data @> '{"title": "PostgreSQL"}'::jsonb)
 Planning time: 0.070 ms
 Execution time: 0.043 ms
96
![Page 97: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/97.jpg)
JSONB + Wikipedia Categories: By the Numbers
• Size on Disk
• category (relation) - 136MB
• category_documents (JSON) - 238MB
• category_documents (JSONB) - 325MB
• Index Size for “title”
• category - 89MB
• category_documents (JSON with one key using an expression index) - 89MB
• category_documents (JSONB, all GIN ops) - 311MB
• category_documents (JSONB, just @>) - 203MB
• Average Performance for looking up “PostgreSQL”
• category - 0.065ms
• category_documents (JSON with one key using an expression index) - 0.070ms
• category_documents (JSONB, all GIN ops) - 0.115ms
• category_documents (JSONB, just @>) - 0.045ms
97
![Page 98: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/98.jpg)
Wow
• That was a lot of material
98
![Page 99: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/99.jpg)
In Summary
• PostgreSQL has a lot of advanced data types
• They are easy to access
• They have a lot of functionality around them
• They are durable
• They perform well (but of course must be used correctly)
• Furthermore, you can extend PostgreSQL to:
• Better manipulate your favorite data type
• Create more data types
• ...well, do basically what you want it to do
99
![Page 100: On Beyond (PostgreSQL) Data Types](https://reader030.vdocuments.net/reader030/viewer/2022020218/55a931c61a28ab2b368b45fd/html5/thumbnails/100.jpg)
And That's All
• Thank You!
• Questions?
• @jkatz05
100