postgres vienna db meetup 2014

PostgreSQL...awesome?

So, Philip gave me the title for the talk and I have to run with it! ;)

Michael Renner

@terrorobe

https://pganalyze.com

My name is Michael Renner.

Twitter handle - which has already gotten me into trouble.

Web operations, with a strong interest in databases, scaling and performance.

PG enthusiast since 2004.

If you've got questions - please just ask!



Quick poll!

Who's using Postgres?

...with replication?

Postgres. A free RDBMS done right

Relational database management system

It does SELECT, INSERT, UPDATE, DELETE

In a sane & maintainable way.

Tries hard to not surprise users, hype resistant.

Community-driven.

No single commercial entity behind the project.

Multiple consulting companies, distros and large companies employ core developers with commit access.

One major release per year

Five years of maintenance

Multiple maintenance releases per year

Does...

Friendly & Competent Community

• http://www.postgresql.org/list/

• Freenode: #postgresql(-de)

• http://pgconf.(de|eu|us)

More often than not, consultants from various companies are hanging out in the channels.


9.4 ante portas

~Sep 2014

http://www.postgresql.org/docs/devel/static/release-9-4.html

That being said, the next major release will come after the summer. Extrapolating from past releases, it should be here around September.

It'll bring quite a few new features; I selected a few interesting ones.



"ordered-set aggregate functions"

http://www.postgresql.org/docs/devel/static/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE

... are aggregate functions over ordered sets!

Aggregate functions are things like sum or count, which can operate on arbitrary, unordered sets of data.

If the set is ordered you can do additional things like...





Calculate 95th percentile

postgres=# SELECT percentile_disc(0.95) WITHIN GROUP (ORDER BY i) FROM generate_series(1,100) AS s(i);
 percentile_disc
-----------------
              95
(1 row)

...calculate percentiles
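To make the semantics concrete, here is a minimal Python sketch of what percentile_disc computes, based on the documented definition (the first value in the ordering whose position equals or exceeds the requested fraction). The Python function is a hypothetical stand-in for illustration, not how Postgres implements the aggregate.

```python
def percentile_disc(fraction, values):
    """Return the first value in sorted order whose relative position
    (rank / count) equals or exceeds `fraction`."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        return None  # an aggregate over zero rows yields NULL
    count = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        if rank / count >= fraction:
            return value
    return ordered[-1]


# Same input as the psql example: the integers 1..100
print(percentile_disc(0.95, range(1, 101)))  # 95
```

Because the result is always one of the input values, this is the "discrete" variant; no interpolation between neighbours takes place.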

json(b)

http://www.postgresql.org/docs/devel/static/datatype-json.html

Most importantly - native datatype with jsonb

In the past, JSON was stored only as text which was validated as correct JSON.

Now there is a separate on-disk representation format.

It is a bit more expensive while writing (serialization), but much faster while querying, since the JSON doesn't need to be reparsed on each access.
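The trade-off can be illustrated with a small Python analogy. Both classes are hypothetical and only count how often the document gets parsed; the real jsonb on-disk format is a binary layout inside Postgres, not a Python dict.

```python
import json


class TextJsonColumn:
    """Analogy for json: validate on write, keep the text, reparse on read."""

    def __init__(self):
        self.parse_count = 0
        self._text = None

    def write(self, text):
        json.loads(text)  # validation only, the parsed result is discarded
        self.parse_count += 1
        self._text = text

    def get(self, key):
        self.parse_count += 1  # every access parses the text again
        return json.loads(self._text)[key]


class ParsedJsonColumn:
    """Analogy for jsonb: parse once on write, read the stored structure."""

    def __init__(self):
        self.parse_count = 0
        self._doc = None

    def write(self, text):
        self.parse_count += 1  # the one-time cost is paid while writing
        self._doc = json.loads(text)

    def get(self, key):
        return self._doc[key]


document = '{"name": "pi", "value": 3.141}'
text_col, parsed_col = TextJsonColumn(), ParsedJsonColumn()
for column in (text_col, parsed_col):
    column.write(document)
    for _ in range(3):
        column.get("value")

print(text_col.parse_count, parsed_col.parse_count)  # 4 1
```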



New JSON functions

$ SELECT * FROM json_to_recordset('[{"name":"e","value":2.718},{"name":"pi","value":3.141},{"name":"tau","value":6.283}]', TRUE) AS x (name text, value numeric);
 name | value
------+-------
 e    | 2.718
 pi   | 3.141
 tau  | 6.283
(3 rows)

http://www.postgresql.org/docs/devel/static/functions-json.html

http://www.depesz.com/2014/01/30/waiting-for-9-4-new-json-functions/

...and to complement the new data type, there are also new accessor functions


Replication features...

...covered later

...and quite a few new replication features, which we'll cover later.

Database Replication

Which brings us right to replication.

A tale of sorrows

or: "Brewer hates us"

If you've got a strong stomach, read through: http://aphyr.com/tags/jepsen

Replication is a tale of sorrows, and this is not limited to Postgres or SQL databases.

Getting distributed database systems right is _HARD_.

And even the distributed database poster children get it wrong.


Brewer's CAP Theorem

• It is impossible for a distributed system to simultaneously provide all three of these guarantees:

  • Consistency

  • Availability

  • Partition tolerance

In a nutshell

Consistency - all nodes see the same data at the same time

Availability - a guarantee that every request receives a response about whether it was successful or failed

Partition tolerance - the system continues to operate despite arbitrary message loss or failure of part of the system

Brewer says: It's impossible to get all three

Managers like things available & partition tolerant

PG Mantra: Scale up, not out

Postgres, in the past, solved this problem by not dealing with it in the first place!

To avoid having to bother with this, most people will usually tell you to just scale up.

Throw more/bigger hardware at the problem and be done with it.

Real world says: "NO"

But that's not always possible.

You might need geo-redundant database servers, or you might run in an environment where "scaling up" is not a feasible option (hello EC2!).

So we need replication.

What are our options?

So we need replication... Postgres has a bit of a Perl problem - TMTOWTDI (there's more than one way to do it).

Shared storage

...one of the oldest options.

Usually achieved by using a SAN or DRBD.

An HA solution is tacked on top of it: if one server goes down, the other starts up.

Trigger-based

Add a trigger to all replicated tables

Changes get written to a separate table

Daemon reads changes from source DB and writes to destination DB
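The three steps can be sketched end to end in Python. This is a toy model under loud assumptions: SQLite in-memory databases stand in for the Postgres source and destination, the table and trigger names are hypothetical, and only INSERTs are captured. Tools such as Slony or Bucardo handle far more than this.

```python
import sqlite3

source = sqlite3.connect(":memory:")
destination = sqlite3.connect(":memory:")

SCHEMA = "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)"
source.execute(SCHEMA)
destination.execute(SCHEMA)

# Steps 1 and 2: a trigger on the replicated table writes every change
# into a separate changelog table (hypothetical names, INSERT only).
source.executescript("""
    CREATE TABLE changelog (
        seq     INTEGER PRIMARY KEY AUTOINCREMENT,
        item_id INTEGER,
        name    TEXT
    );
    CREATE TRIGGER items_capture AFTER INSERT ON items
    BEGIN
        INSERT INTO changelog (item_id, name) VALUES (NEW.id, NEW.name);
    END;
""")


def replicate_once(src, dst, last_seq):
    """Step 3: one daemon pass. Read changes newer than last_seq from the
    source and apply them to the destination; return the new position."""
    rows = src.execute(
        "SELECT seq, item_id, name FROM changelog WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, item_id, name in rows:
        dst.execute("INSERT INTO items (id, name) VALUES (?, ?)", (item_id, name))
        last_seq = seq
    dst.commit()
    return last_seq


source.execute("INSERT INTO items (name) VALUES ('first')")
source.execute("INSERT INTO items (name) VALUES ('second')")
source.commit()

position = replicate_once(source, destination, 0)
replicated = destination.execute("SELECT id, name FROM items ORDER BY id").fetchall()
print(replicated)  # [(1, 'first'), (2, 'second')]
```

Keeping the last applied sequence number is what lets the daemon resume after a restart without applying a change twice.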

Statement-based

or "The proxy approach"

Connect to middleware instead of real database

All queries executed on middleware will be sent to many databases

That's fine until one of the servers isn't reachable!
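The failure mode is easy to reproduce in a toy model. The sketch below assumes a hypothetical StatementProxy class and uses SQLite in-memory databases in place of real backends; an outage is simulated by marking a backend as unreachable. It is not how pgpool works internally.

```python
import sqlite3


class StatementProxy:
    """Toy middleware: forwards every statement to all reachable backends."""

    def __init__(self, backends):
        self.backends = backends  # name -> connection
        self.unreachable = set()  # names of backends that are currently down

    def execute(self, sql, params=()):
        for name, conn in self.backends.items():
            if name in self.unreachable:
                continue  # this backend never sees the statement
            conn.execute(sql, params)
            conn.commit()


backends = {name: sqlite3.connect(":memory:") for name in ("db1", "db2")}
proxy = StatementProxy(backends)

proxy.execute("CREATE TABLE t (v TEXT)")
proxy.execute("INSERT INTO t VALUES (?)", ("a",))

proxy.unreachable.add("db2")      # simulated outage
proxy.execute("INSERT INTO t VALUES (?)", ("b",))
proxy.unreachable.discard("db2")  # db2 is back, but it missed a write

counts = {
    name: conn.execute("SELECT count(*) FROM t").fetchone()[0]
    for name, conn in backends.items()
}
print(counts)  # {'db1': 2, 'db2': 1}
```

Once the backends have diverged like this, the proxy alone has no record of what db2 missed, so the node has to be resynchronized by other means.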

(Write Ahead) Log-based

And the most common one.

* Postgres writes all changes it makes to the table & index files into a log, which would be used during crash recovery

* Send log contents to a secondary server

* Secondary server does "continuous crash recovery"
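A deliberately simplified Python model of this idea follows. The Primary and Standby classes are hypothetical, and a log record here is just a (key, value) pair rather than a real WAL record describing page-level changes.

```python
class Primary:
    """Toy primary: every change is appended to the log before it is applied."""

    def __init__(self):
        self.wal = []   # append-only list of (key, value) records
        self.data = {}

    def set(self, key, value):
        self.wal.append((key, value))  # write ahead: log first...
        self.data[key] = value         # ...then modify the data


class Standby:
    """Toy standby: replays log records it has not applied yet."""

    def __init__(self):
        self.data = {}
        self.replayed = 0  # number of log records already applied

    def replay(self, wal):
        for key, value in wal[self.replayed:]:
            self.data[key] = value
        self.replayed = len(wal)


primary, standby = Primary(), Standby()
primary.set("a", 1)
primary.set("b", 2)
standby.replay(primary.wal)   # first shipment of log contents

primary.set("a", 3)
standby.replay(primary.wal)   # only the new record is applied

print(standby.data == primary.data)  # True
```

The standby runs the same replay loop a crashed server would run on startup; it simply never stops, which is the "continuous crash recovery" from the list above.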

What should you use?

With all those options the question that comes up is...

and since "it depends" is probably not a sufficient answer for most of you

For now:

log-based

asynchronous

master→slave

I'd recommend looking at log-based replication first and only reconsidering once you're sure it won't fit your needs.

It has its own bag of things to look out for, but it's where most development and operations resources are spent nowadays.


Two flavors

• Log-Shipping

  • Completed WAL segments are copied to the slave and applied there

• Streaming replication

  • Transactions are streamed to slave servers

  • Can also be configured for synchronous replication

Log-based replication in Postgres comes in two flavors

On WAL handling

• Server generates WAL for every modifying operation, in 16 MB segments

• Segments normally get rotated after a successful checkpoint

• Lots of conditions and config settings can change this behaviour

• Slave needs a base copy from the master + all WAL files to reach a consistent state

Master config

$ $EDITOR pg_hba.conf

host replication replication 192.0.2.0/24 trust

$ $EDITOR postgresql.conf

wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32

http://wiki.postgresql.org/wiki/Streaming_Replication

http://www.postgresql.org/docs/current/static/warm-standby.html

This is a strict streaming replication example, no log archiving

If the slave server is offline too long, it needs to be freshly initialized from the master.
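How long is "too long"? A rough back-of-the-envelope calculation in Python, using the 16 MB segment size and wal_keep_segments = 32 from the slides. The write rate is a made-up figure for illustration, and wal_keep_segments is only a lower bound on what the master keeps, so treat the result as a pessimistic estimate.

```python
SEGMENT_SIZE_MB = 16      # WAL segment size mentioned on the slides
WAL_KEEP_SEGMENTS = 32    # value from the example postgresql.conf

# Minimum amount of WAL the master keeps around for lagging slaves
retained_mb = SEGMENT_SIZE_MB * WAL_KEEP_SEGMENTS


def offline_budget_seconds(wal_mb_per_second):
    """Rough time a slave can be offline before needed segments are gone."""
    return retained_mb / wal_mb_per_second


print(retained_mb)                  # 512
print(offline_budget_seconds(1.0))  # 512.0, assuming a hypothetical 1 MB/s of WAL
```

On a busy master the budget shrinks quickly, which is why pure streaming setups are often combined with log archiving.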


Slave config

$ pg_basebackup -R -D /path/to/cluster --host=master --port=5432

$ $EDITOR postgresql.conf

hot_standby = on

$ $EDITOR recovery.conf

standby_mode = 'on'
primary_conninfo = 'host=master port=5432 user=replication'
trigger_file = '/path/to/trigger'

Caveats

• Slaves are 100% identical to master

• No selective replication (DBs, Tables, etc.)

• No slave-only indexes

• WAL segment handling can be tricky

• Slave Query conflicts due to master TXs

• Excessive disk space usage on master

• Broken replication due to already-recycled segments on master

But when running with log-based replication, there are things to look out for.

Coming in 9.4

Q3 2014

All of the stuff works out of the box with 9.3

There are a few new things coming in postgres 9.4

Logical decoding

One of the most interesting additions is logical decoding

Master server generates a list of tuple modifications

Similar to trigger-based replication, but much more efficient and easier to maintain

Almost identical to the "row based replication" format in MySQL

$ INSERT INTO z (whatever) VALUES ('row2');
INSERT 0 1
$ SELECT * FROM pg_logical_slot_get_changes('depesz', null, null, 'include-xids', '0');
  location  | xid |                            data
------------+-----+------------------------------------------------------------
 0/5204A858 | 932 | BEGIN
 0/5204A858 | 932 | table public.z: INSERT: id[integer]:1 whatever[text]:'row2'
 0/5204A928 | 932 | COMMIT
(3 rows)

http://www.depesz.com/2014/03/06/waiting-for-9-4-introduce-logical-decoding/

Here's an example of what logical decoding will produce

You can find more extensive examples on Hubert "depesz" Lubaczewski's blog.
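To give an idea of what a consumer has to do with this output, here is a small Python sketch that parses the data column of the example. The regular expressions are assumptions derived only from the one line shown on the slide; they are not a complete parser for the decoding plugin's text format, and the function name is hypothetical.

```python
import re

# Patterns modelled on the single example line from the slide (assumption)
CHANGE_RE = re.compile(r"^table (?P<table>\S+): (?P<op>[A-Z]+): (?P<columns>.*)$")
COLUMN_RE = re.compile(r"(\w+)\[([^\]]+)\]:('(?:[^']|'')*'|\S+)")


def parse_change(data):
    """Parse one 'data' value; return None for BEGIN/COMMIT and other lines."""
    match = CHANGE_RE.match(data)
    if match is None:
        return None
    columns = {
        name: (type_name, raw_value)  # values are kept as raw text
        for name, type_name, raw_value in COLUMN_RE.findall(match.group("columns"))
    }
    return {"table": match.group("table"), "op": match.group("op"), "columns": columns}


change = parse_change("table public.z: INSERT: id[integer]:1 whatever[text]:'row2'")
print(change["table"], change["op"])   # public.z INSERT
print(change["columns"]["whatever"])   # ('text', "'row2'")
print(parse_change("BEGIN"))           # None
```

A real consumer would also track transaction boundaries (BEGIN/COMMIT) so that changes are applied atomically on the receiving side.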





Replication slots

Replication slots are an additional feedback mechanism between slave and master to communicate which WAL files are still needed

Also the backbone for logical replication
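The feedback idea can be modelled in a few lines of Python. This is a toy sketch: segment numbers are plain integers, the slot names are hypothetical, and the function only decides which segments are safe to remove.

```python
def removable_segments(segments_on_disk, slot_positions):
    """Return the segments no slot needs any more.

    slot_positions maps a slot name to the oldest segment number that
    its consumer still needs. Without any slots this toy model holds
    nothing back.
    """
    oldest_needed = min(slot_positions.values(), default=None)
    if oldest_needed is None:
        return sorted(segments_on_disk)
    return sorted(s for s in segments_on_disk if s < oldest_needed)


segments = [1, 2, 3, 4, 5]

# Both slaves are reasonably current: segments 1 and 2 can go
print(removable_segments(segments, {"slave_a": 4, "slave_b": 3}))  # [1, 2]

# slave_b is stuck at segment 1: the master has to keep everything
print(removable_segments(segments, {"slave_a": 4, "slave_b": 1}))  # []
```

The second call shows the flip side: a slot whose consumer never catches up pins WAL on the master indefinitely, so slots trade "broken replication" for "excessive disk space usage" and need monitoring.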

Time-delayed replication

Time-delayed replication provides an additional safeguard against operational accidents...

Commit/checkpoint records are only applied after a configured amount of time has passed since the TX was completed.
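The rule can be sketched as a small Python function. The record layout (xid, commit time) and the function name are hypothetical; in 9.4 the delay itself is configured on the slave with the recovery_min_apply_delay setting.

```python
from datetime import datetime, timedelta


def xids_ready_to_apply(commit_records, delay, now):
    """Return the xids whose commit may be replayed at time `now`.

    commit_records is a list of (xid, committed_at) in commit order.
    """
    ready = []
    for xid, committed_at in commit_records:
        if now - committed_at < delay:
            break  # replay is sequential, so later records have to wait too
        ready.append(xid)
    return ready


records = [
    (932, datetime(2014, 5, 1, 10, 30)),
    (933, datetime(2014, 5, 1, 11, 30)),  # e.g. an accidental DROP TABLE
]
now = datetime(2014, 5, 1, 12, 0)

print(xids_ready_to_apply(records, timedelta(hours=1), now))  # [932]
```

With a one-hour delay, the operator has until 12:30 to stop replay on the slave before transaction 933 is applied there.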

What's coming in 9.5+?

Those were the things that are already included in 9.4.

For the coming development cycles, there are already a few things in the pipeline.

Logical replication cont'd

What's currently missing is a reliable consumer for the data generated by 9.4 logical replication.

People, mostly Andres Freund from 2ndQuadrant, are working on this topic and I expect that there will be more to talk about next year with 9.5.

It will be possible to build Galera-like systems on top of this infrastructure.

SQL MERGE

"Upserts"

http://wiki.postgresql.org/wiki/SQL_MERGE

http://www.postgresql.org/message-id/CAM3SWZTG4pnn5DfVm0J6e_f[email protected]

...or INSERT ON DUPLICATE KEY ...

It was planned for 9.4, but turned out to be more complicated than anticipated.

There is a developer meeting later this year where the course of action will be decided.







Questions?

Ideas?

Closing remarks?

That's all for now

Any questions, ideas?

Thanks!

Michael Renner

@terrorobe

https://pganalyze.com

Thanks!

You can hit me up on Twitter or via mail.



Link Collection

Trigger-based:

http://bucardo.org/wiki/Bucardo

http://slony.info/

Statement-based:

http://www.pgpool.net/mediawiki/index.php/Main_Page

Log-Shipping/Streaming replication:

https://github.com/2ndQuadrant/repmgr

https://github.com/omniti-labs/omnipitr

Backup:

http://dalibo.github.io/pitrery/

http://www.pgbarman.org/

http://www.postgresql.org/docs/current/static/app-pgbasebackup.html

http://www.postgresql.org/docs/current/static/app-pgreceivexlog.html

And there's also a link collection of tools and projects to look at when you're building your own replication setup


