teaching postgresql to new people

PostgreSQL – Tomasz Borek

Teaching PostgreSQL to new people

@LAFK_plConsultant @

About me


Tomasz Borek

What will I tell you?

● About me (done)● Show of hands● Who „new people” might be

– And usually – in my case – are

● About teaching– Comfort zone, learners, stepping back

● Chosen approaches, features, gotchas and the like● Why, why, why● And yes, this’ll be about Postgres, but in an unusual way

Show of hands

● Developers (not PL/SQL ones)

Show of hands

● Developers● Developers (PL/SQL ones)

Show of hands

● Developers● Developers (PL/SQL ones)● DBA (Admin, Architect)

Show of hands

● Developers● Developers (PL/SQL ones)● DBA (Admin, Architect)● DevOps

Show of hands

● Developers● Developers (PL/SQL ones)● DBA (Admin, Architect)● DevOps● SysAdmin

Show of hands

● Developers● Developers (PL/SQL ones)● DBA (Admin, Architect)● DevOps● SysAdmin● Trainers / consultants

Show of hands

● Developers● Developers (PL/SQL ones)● DBA (Admin, Architect)● DevOps● SysAdmin● Trainers / consultants● Other?

„New” people

Surprisingly

● Often your colleagues● Sometimes older● Sometimes more senior● Experienced● With success under their belts

Surprisingly

● Often your colleagues● Sometimes older● Sometimes more senior● Experienced● With success under their belts● Basically: FORMED already

– Or MADE, if you will

Developers are problem solvers

● Your colleagues have certain problems● Is Postgres the solution?

– Or „a solution” at least?

● And how is the learning curve– Time including

Developers are not SQL people!

● Not many know JOINs very well● Not many know how indexes work● Not many know indexes weaknesses● CTEs, window functions, procedures, cursors…● They „omit” this● Comfort zone is nice

Do not abandon them

Or they’ll abandon you

Do not abandon them

● Docs● Materials● Tools● Links to good content● Pictures, pictures, pictures● They can edit / comment (Wiki)● Your (colleagues) time

Teaching

What is YOUR problem?

● DBA wanting respite for your DB?● Malpractice in SQL queries?● Why don’t they use XYZ feature?● From tomorrow on, teach them some SQL● Migration from X to Postgres● Guidelines creation

Xun Kuang once said

不闻不若闻之 , 闻之不若见之 , 见之不若知之 , 知之不若行之

Xunzi book 8: Ruxiao, chapter 11

Xun Kuang once said


“Not having heard something is not as good as having heard it; having heard it is not as good as having seen it; having seen it is not as good as

knowing it; knowing it is not as good as putting it into practice.”


Xun Kuang paraphrase would be


“Not having heard something < having heard it; having heard it < having seen it;

having seen it < knowing it; knowing it < putting it into practice.”


How do they learn?

● „Practice makes master”– Except it doesn’t

● Learning styles● Docs still relevant

– If well-placed, accessible and easy to get in

Repetitio est mater studiorum

● Crash course● Workshop● Problem solving on their own● Docs to help● Code reviews

Comfort zone

Comfort zone

● Setup / install● Moving around● Logs, timing queries● EXPLAIN + ANALYZE● Indexes● PgSQL and variants● NoSQL + XML

Chosen features, gotchas etc.

so

How to teach Postgres?

In short

● History – battle-tested, feature-rich, used● Basics – moving around, commands, etc.● Prepare your bait accordingly

– My faves

– Advanced features

– NoSQL angle

– …

● Don’t just drink the KoolAid!

Battle-tested

● Matures since 1987● Comes in many flavours (forks)● Largest cluster – 2PBs in Yahoo ● Skype, NASA, Instagram● Stable:

– Many years on one version

– Good version support

– Every year something new

– Follows ANSI SQL standards

https://www.postgresql.org/about/users/

In-/Postgres forks

Support?

Great angles

● Procedures: Java, Perl, Python, CTEs...● Enterprise / NoSQL - handles XMLs and JSONs● Index power – spatial or geo or your own● CTEs and FDWs => great ETL or µservice● Pure dev: error reporting / logging, MVCC (dirty

read gone), own index, plenty of data types, Java/Perl/… inside

● Solid internals: processes, sec built-in,

Basics

● Setup● Psql

– Moving around

– What’s in

● Indexes● Joins● Query path● Explain, Explain Analyze

Query Path

http://www.slideshare.net/SFScon/sfscon15-peter-moser-the-path-of-a-query-postgresql-internals

Parser

● Syntax checks, like FRIM is not a keyword– SELECT * FRIM myTable;

● Catalog lookup– MyTable may not exist

● In the end query tree is built– Query tokenization: SELECT (keyword)

employeeName (field id) count (function call)...

Grammar and a query tree

Planner

● Where Planner Tree is built● Where best execution is decided upon

– Seq or index scan? Index or bitmap index?

– Which join order?

– Which join strategy (nested, hashed, merge)?

– Inner or outer?

– Aggregation: plain, hashed, sorted…

● Heuristic, if finding all plans too costly

Full query path

Example to explain EXPLAIN

EXPLAIN SELECT * FROM tenk1;

QUERY PLAN

------------------------------------------------------------

Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)

Explaining EXPLAIN - what


QUERY PLAN

------------------------------------------------------------


● Startup cost – time before output phase begins● Total cost – in page fetches, may change, assumed to

run node to completion● Rows – estimated number to scan (but LIMIT etc.)● Estimated average width of output from that node (in

bytes)

Explaining EXPLAIN - how


QUERY PLAN

------------------------------------------------------------


SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1'; //358|10k

● No WHERE, no index● Cost = disk pages read * seq page cost + rows scanned

* cpu tuple cost● 358 * 1.0 + 10000 * 0.01 = 458 // default values

Analyzing EXPLAIN ANALYZEEXPLAIN ANALYZE SELECT *

FROM tenk1 t1, tenk2 t2

WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)

-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)

Recheck Cond: (unique1 < 10)

-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)

Index Cond: (unique1 < 10)

-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)

Index Cond: (unique2 = t1.unique2)

Planning time: 0.181 ms

Execution time: 0.501 ms

● Actually runs the query● More info: actual times, rows removed by filter,

sort method used, disk/memory used...

Analyzing EXPLAIN ANALYZE

EXPLAIN ANALYZE SELECT *

FROM tenk1 t1, tenk2 t2

WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)

-> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)

Recheck Cond: (unique1 < 10)

-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)

Index Cond: (unique1 < 10)

-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)

Index Cond: (unique2 = t1.unique2)

Planning time: 0.181 ms

Execution time: 0.501 ms

My Faves

● Error reporting● PL/xSQL – feel free to use Perl, Python, Ruby, Java,

LISP...● Data types

– XML and JSON handling

● Foreign Data Wrappers (FDW)● Windowing functions● Common table expressions (CTE) and recursive queries● Power of Indexes

Will DB eat your cake?

● Thanks @anandology

The cake is a lie!



Consider password VARCHAR(8)

Logging, ‘gotchas’

● Default is to stderr only● Set on CLI or in config, not through sets● Where is it?

● How to log queries… or turning log_collector on

Where is it?

● Default– data/pg_log

● Launchers can set it (Mac Homebrew/plist)● Version and config dependent

Ask DB

Logging, turn it on

● Default is to stderr only● In PG:

logging_collector = on

log_filename = strftime-patterned filename

[log_destination = [stderr|syslog|csvlog] ]

log_statement = [none|ddl|mod|all] // all

log_min_error_statement = ERROR

log_line_prefix = '%t %c %u ' # time sessionid user

Log line prefix

PL/pgSQL

● Stored procedure dilemma– Where to keep your logic?

– How your logic is NOT in your SCM

PL/pgSQL



● Over dozen of options: – Perl, Python, Ruby,

– pgSQL, Java,

– TCL, LISP…

PL/pgSQL



● Over dozen of options: – Perl, Python, Ruby,

– pgSQL, Java,

– TCL, LISP…

● DevOps, SysAdmins, DBAs… ETLs etc.

Perl function example

CREATE FUNCTION perl_max (integer, integer) RETURNS integer AS $$

my ($x, $y) = @_;

if (not defined $x) {

return undef if not defined $y;

return $y;

}

return $x if not defined $y;

return $x if $x > $y;

return $y;

$$ LANGUAGE plperl;

XML or JSON support

● Parsing and retrieving XML (functions)● Valid JSON checks (type)● Careful with encoding!

– PG allows only one server encoding per database

– Specify it to UTF-8 or weep

● Document database instead of OO or rel– JSON, JSONB, HSTORE – noSQL fun welcome!

HSTORE?

CREATE TABLE example (

id serial PRIMARY KEY,

data hstore);

HSTORE?



data hstore);

INSERT INTO example (data) VALUES

('name => "John Smith", age => 28, gender => "M"'),

('name => "Jane Smith", age => 24');

HSTORE?



data hstore);

INSERT INTO example (data) VALUES

('name => "John Smith", age => 28, gender => "M"'),

('name => "Jane Smith", age => 24');

SELECT id,

data->'name'

FROM example;

SELECT id, data->'age' FROM example

WHERE data->'age' >= '25';

XML and JSON datatype

CREATE TABLE test (

...,

xml_file xml,

json_file json,

...

);

XML functions example

XMLROOT (

XMLELEMENT (

NAME gazonk,

XMLATTRIBUTES (

’val’ AS name,

1 + 1 AS num

),

XMLELEMENT (

NAME qux,

’foo’

)

),

VERSION ’1.0’,

STANDALONE YES

)

<?xml version=’1.0’ standalone=’yes’ ?>

<gazonk name=’val’ num=’2’>

<qux>foo</qux>

</gazonk>

xml '<foo>bar</foo>''<foo>bar</foo>'::xml

Architecture and internals

Check out processes

● pgrep -l postgres● htop > filter: postgres● Whatever you like / use usually● Careful with kill -9 on connections

– kill -15 better

Summary

Before

● Who are they?● What is your problem?● How large comfort zone, how to push them out?● Materials, docs, workshop preparation● How much time for training?● How much time after?● How many people will it be?● What indicates that problem is solved?

During

● Establish the goal – And – if possible – learning styles

● Promise support (and tell how!)– Push out from comfort zone!

● Ask for hard work and stupid questions● Show documentation, do live tour● Do the workshop● Involve, find best ones

– You will have them help you later

● Expect questions, make them ask– Again, push out from comfort zone!

After

● Where are the docs?– Are they using them?

● Answer the questions– Again, and again

● Code reviews– Deliver on support promise!

– Involve promising students

● Is the problem gone / better?

Don’t omit the basics

● Joins● Indexes – how they work● Query path (EXPLAIN, EXPLAIN ANALYZE)● Moving around (psql)● Setup and getting to DB

Postgres is cool

● Goodies like error reporting or log line prefix● Processes thought out● Good for µservices and enterprise● Not only SQL (XML, JSON, Perl, Python...)● Ask DB● Indexes● Powerful: CTEs, recursive queries, FDWs...● Battle tested and always high

Teaching Postgres – Tomasz Borek

Teaching Postgres to new people


teaching postgresql to new people

Education