teaching postgresql to new people
TRANSCRIPT
PostgreSQL – Tomasz Borek
Teaching PostgreSQL to new people
@LAFK_plConsultant @
About me
@LAFK_plConsultant @
Tomasz Borek
What will I tell you?
● About me (done)● Show of hands● Who „new people” might be
– And usually – in my case – are
● About teaching– Comfort zone, learners, stepping back
● Chosen approaches, features, gotchas and the like● Why, why, why● And yes, this’ll be about Postgres, but in an unusual way
Show of hands
● Developers (not PL/SQL ones)
Show of hands
● Developers● Developers (PL/SQL ones)
Show of hands
● Developers● Developers (PL/SQL ones)● DBA (Admin, Architect)
Show of hands
● Developers● Developers (PL/SQL ones)● DBA (Admin, Architect)● DevOps
Show of hands
● Developers● Developers (PL/SQL ones)● DBA (Admin, Architect)● DevOps● SysAdmin
Show of hands
● Developers● Developers (PL/SQL ones)● DBA (Admin, Architect)● DevOps● SysAdmin● Trainers / consultants
Show of hands
● Developers● Developers (PL/SQL ones)● DBA (Admin, Architect)● DevOps● SysAdmin● Trainers / consultants● Other?
„New” people
Surprisingly
● Often your colleagues● Sometimes older● Sometimes more senior● Experienced● With success under their belts
Surprisingly
● Often your colleagues● Sometimes older● Sometimes more senior● Experienced● With success under their belts● Basically: FORMED already
– Or MADE, if you will
Developers are problem solvers
● Your colleagues have certain problems● Is Postgres the solution?
– Or „a solution” at least?
● And how is the learning curve– Time including
Developers are not SQL people!
● Not many know JOINs very well● Not many know how indexes work● Not many know indexes weaknesses● CTEs, window functions, procedures, cursors…● They „omit” this● Comfort zone is nice
Do not abandon them
Or they’ll abandon you
Do not abandon them
● Docs● Materials● Tools● Links to good content● Pictures, pictures, pictures● They can edit / comment (Wiki)● Your (colleagues) time
Teaching
What is YOUR problem?
● DBA wanting respite for your DB?● Malpractice in SQL queries?● Why don’t they use XYZ feature?● From tomorrow on, teach them some SQL● Migration from X to Postgres● Guidelines creation
Xun Kuang once said
不闻不若闻之 , 闻之不若见之 , 见之不若知之 , 知之不若行之
Xunzi book 8: Ruxiao, chapter 11
Xun Kuang once said
不闻不若闻之 , 闻之不若见之 , 见之不若知之 , 知之不若行之
“Not having heard something is not as good as having heard it; having heard it is not as good as having seen it; having seen it is not as good as
knowing it; knowing it is not as good as putting it into practice.”
Xunzi book 8: Ruxiao, chapter 11
Xun Kuang paraphrase would be
不闻不若闻之 , 闻之不若见之 , 见之不若知之 , 知之不若行之
“Not having heard something < having heard it; having heard it < having seen it;
having seen it < knowing it; knowing it < putting it into practice.”
Xunzi book 8: Ruxiao, chapter 11
How do they learn?
● „Practice makes master”– Except it doesn’t
● Learning styles● Docs still relevant
– If well-placed, accessible and easy to get in
Repetitio est mater studiorum
● Crash course● Workshop● Problem solving on their own● Docs to help● Code reviews
Comfort zone
Comfort zone
● Setup / install● Moving around● Logs, timing queries● EXPLAIN + ANALYZE● Indexes● PgSQL and variants● NoSQL + XML
Chosen features, gotchas etc.
so
How to teach Postgres?
In short
● History – battle-tested, feature-rich, used● Basics – moving around, commands, etc.● Prepare your bait accordingly
– My faves
– Advanced features
– NoSQL angle
– …
● Don’t just drink the KoolAid!
Battle-tested
● Matures since 1987● Comes in many flavours (forks)● Largest cluster – 2PBs in Yahoo ● Skype, NASA, Instagram● Stable:
– Many years on one version
– Good version support
– Every year something new
– Follows ANSI SQL standards
https://www.postgresql.org/about/users/
In-/Postgres forks
Support?
Great angles
● Procedures: Java, Perl, Python, CTEs...● Enterprise / NoSQL - handles XMLs and JSONs● Index power – spatial or geo or your own● CTEs and FDWs => great ETL or µservice● Pure dev: error reporting / logging, MVCC (dirty
read gone), own index, plenty of data types, Java/Perl/… inside
● Solid internals: processes, sec built-in,
Basics
● Setup● Psql
– Moving around
– What’s in
● Indexes● Joins● Query path● Explain, Explain Analyze
Query Path
http://www.slideshare.net/SFScon/sfscon15-peter-moser-the-path-of-a-query-postgresql-internals
Parser
● Syntax checks, like FRIM is not a keyword– SELECT * FRIM myTable;
● Catalog lookup– MyTable may not exist
● In the end query tree is built– Query tokenization: SELECT (keyword)
employeeName (field id) count (function call)...
Grammar and a query tree
Planner
● Where Planner Tree is built● Where best execution is decided upon
– Seq or index scan? Index or bitmap index?
– Which join order?
– Which join strategy (nested, hashed, merge)?
– Inner or outer?
– Aggregation: plain, hashed, sorted…
● Heuristic, if finding all plans too costly
Full query path
Example to explain EXPLAIN
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
Explaining EXPLAIN - what
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
● Startup cost – time before output phase begins● Total cost – in page fetches, may change, assumed to
run node to completion● Rows – estimated number to scan (but LIMIT etc.)● Estimated average width of output from that node (in
bytes)
Explaining EXPLAIN - how
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1'; //358|10k
● No WHERE, no index● Cost = disk pages read * seq page cost + rows scanned
* cpu tuple cost● 358 * 1.0 + 10000 * 0.01 = 458 // default values
Analyzing EXPLAIN ANALYZEEXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)
Recheck Cond: (unique1 < 10)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)
Index Cond: (unique1 < 10)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)
Index Cond: (unique2 = t1.unique2)
Planning time: 0.181 ms
Execution time: 0.501 ms
● Actually runs the query● More info: actual times, rows removed by filter,
sort method used, disk/memory used...
Analyzing EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)
Recheck Cond: (unique1 < 10)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)
Index Cond: (unique1 < 10)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)
Index Cond: (unique2 = t1.unique2)
Planning time: 0.181 ms
Execution time: 0.501 ms
Analyzing EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)
Recheck Cond: (unique1 < 10)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)
Index Cond: (unique1 < 10)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)
Index Cond: (unique2 = t1.unique2)
Planning time: 0.181 ms
Execution time: 0.501 ms
Analyzing EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)
Recheck Cond: (unique1 < 10)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)
Index Cond: (unique1 < 10)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)
Index Cond: (unique2 = t1.unique2)
Planning time: 0.181 ms
Execution time: 0.501 ms
My Faves
● Error reporting● PL/xSQL – feel free to use Perl, Python, Ruby, Java,
LISP...● Data types
– XML and JSON handling
● Foreign Data Wrappers (FDW)● Windowing functions● Common table expressions (CTE) and recursive queries● Power of Indexes
Will DB eat your cake?
● Thanks @anandology
Will DB eat your cake?
● Thanks @anandology
Will DB eat your cake?
● Thanks @anandology
The cake is a lie!
Will DB eat your cake?
● Thanks @anandology
Will DB eat your cake?
● Thanks @anandology
Will DB eat your cake?
● Thanks @anandology
Consider password VARCHAR(8)
Logging, ‘gotchas’
● Default is to stderr only● Set on CLI or in config, not through sets● Where is it?
● How to log queries… or turning log_collector on
Where is it?
● Default– data/pg_log
● Launchers can set it (Mac Homebrew/plist)● Version and config dependent
Ask DB
Logging, turn it on
● Default is to stderr only● In PG:
logging_collector = on
log_filename = strftime-patterned filename
[log_destination = [stderr|syslog|csvlog] ]
log_statement = [none|ddl|mod|all] // all
log_min_error_statement = ERROR
log_line_prefix = '%t %c %u ' # time sessionid user
Log line prefix
PL/pgSQL
● Stored procedure dilemma– Where to keep your logic?
– How your logic is NOT in your SCM
PL/pgSQL
● Stored procedure dilemma– Where to keep your logic?
– How your logic is NOT in your SCM
● Over dozen of options: – Perl, Python, Ruby,
– pgSQL, Java,
– TCL, LISP…
PL/pgSQL
● Stored procedure dilemma– Where to keep your logic?
– How your logic is NOT in your SCM
● Over dozen of options: – Perl, Python, Ruby,
– pgSQL, Java,
– TCL, LISP…
● DevOps, SysAdmins, DBAs… ETLs etc.
PL/pgSQL
● Stored procedure dilemma– Where to keep your logic?
– How your logic is NOT in your SCM
● Over dozen of options: – Perl, Python, Ruby,
– pgSQL, Java,
– TCL, LISP…
● DevOps, SysAdmins, DBAs… ETLs etc.
Perl function example
CREATE FUNCTION perl_max (integer, integer) RETURNS integer AS $$
my ($x, $y) = @_;
if (not defined $x) {
return undef if not defined $y;
return $y;
}
return $x if not defined $y;
return $x if $x > $y;
return $y;
$$ LANGUAGE plperl;
XML or JSON support
● Parsing and retrieving XML (functions)● Valid JSON checks (type)● Careful with encoding!
– PG allows only one server encoding per database
– Specify it to UTF-8 or weep
● Document database instead of OO or rel– JSON, JSONB, HSTORE – noSQL fun welcome!
HSTORE?
CREATE TABLE example (
id serial PRIMARY KEY,
data hstore);
HSTORE?
CREATE TABLE example (
id serial PRIMARY KEY,
data hstore);
INSERT INTO example (data) VALUES
('name => "John Smith", age => 28, gender => "M"'),
('name => "Jane Smith", age => 24');
HSTORE?
CREATE TABLE example (
id serial PRIMARY KEY,
data hstore);
INSERT INTO example (data) VALUES
('name => "John Smith", age => 28, gender => "M"'),
('name => "Jane Smith", age => 24');
SELECT id,
data->'name'
FROM example;
SELECT id, data->'age' FROM example
WHERE data->'age' >= '25';
XML and JSON datatype
CREATE TABLE test (
...,
xml_file xml,
json_file json,
...
);
XML functions example
XMLROOT (
XMLELEMENT (
NAME gazonk,
XMLATTRIBUTES (
’val’ AS name,
1 + 1 AS num
),
XMLELEMENT (
NAME qux,
’foo’
)
),
VERSION ’1.0’,
STANDALONE YES
)
<?xml version=’1.0’ standalone=’yes’ ?>
<gazonk name=’val’ num=’2’>
<qux>foo</qux>
</gazonk>
xml '<foo>bar</foo>''<foo>bar</foo>'::xml
Architecture and internals
Check out processes
● pgrep -l postgres● htop > filter: postgres● Whatever you like / use usually● Careful with kill -9 on connections
– kill -15 better
Summary
Before
● Who are they?● What is your problem?● How large comfort zone, how to push them out?● Materials, docs, workshop preparation● How much time for training?● How much time after?● How many people will it be?● What indicates that problem is solved?
During
● Establish the goal – And – if possible – learning styles
● Promise support (and tell how!)– Push out from comfort zone!
● Ask for hard work and stupid questions● Show documentation, do live tour● Do the workshop● Involve, find best ones
– You will have them help you later
● Expect questions, make them ask– Again, push out from comfort zone!
After
● Where are the docs?– Are they using them?
● Answer the questions– Again, and again
● Code reviews– Deliver on support promise!
– Involve promising students
● Is the problem gone / better?
Don’t omit the basics
● Joins● Indexes – how they work● Query path (EXPLAIN, EXPLAIN ANALYZE)● Moving around (psql)● Setup and getting to DB
Postgres is cool
● Goodies like error reporting or log line prefix● Processes thought out● Good for µservices and enterprise● Not only SQL (XML, JSON, Perl, Python...)● Ask DB● Indexes● Powerful: CTEs, recursive queries, FDWs...● Battle tested and always high
Teaching Postgres – Tomasz Borek
Teaching Postgres to new people
@LAFK_plConsultant @