WHO WANTS A SERVICE WITH ZERO DOWNTIME?
… EVERYBODY
IS IT THAT GOOD?
NOT JUST TECHNOLOGY. RISKS, PROCEDURES, PEOPLE
FROM 0 TO ~100: BUSINESS CONTINUITY WITH POSTGRESQL
Giulio Calacoci, Senior Developer @ 2ndQuadrant
DataOps 2019, Barcelona
2ndquadrant.com
@asdmaster @2ndQuad #PostgreSQL #DataOps #Barcellona #BusinessContinuity
ABOUT MYSELF
▸ Passionate about open source since the early 2000s
▸ Member of the Italian and European PostgreSQL community
▸ Lean and DevOps practitioner
▸ Open Source Developer
▸ Member of the Barman team
▸ Continuous Delivery Architect @2ndQuadrant
▸ 24/7 support engineer @2ndQuadrant
BUSINESS CONTINUITY
▸ Disaster Recovery
▸ High Availability
▸ Types of disaster/failures
▸ Availability = Uptime / (Uptime + Downtime)
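The availability formula above can be turned into a small calculation. This is a minimal sketch: the function names and the example figures are illustrative assumptions, not something taken from the talk.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = Uptime / (Uptime + Downtime)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def downtime_budget_minutes(target: float, period_hours: float = 365 * 24) -> float:
    """Maximum downtime (in minutes) that an availability target allows over a period."""
    return (1.0 - target) * period_hours * 60


if __name__ == "__main__":
    # Hypothetical year with 4 hours of total downtime.
    print(f"availability: {availability(365 * 24 - 4, 4):.5f}")
    # Downtime allowed in a 365-day year by a 99.99% target.
    print(f"budget: {downtime_budget_minutes(0.9999):.2f} minutes")
```

Over a 365-day year, a 99.99% target leaves roughly 52.56 minutes of downtime in total, which helps put any recovery time into perspective.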
OBJECTIVES
▸ Recovery Point Objective (RPO)
▸ How much data can I afford to lose?
▸ Recovery Time Objective (RTO)
▸ How long will it take me to recover?
SERVICE RELIABILITY
▸ Cost of downtime
▸ How many €/$/£/AUD/AED/…?
▸ Risk management
▸ SLI, SLO and SLA
SOME NOTES FOR THIS PRESENTATION
▸ PostgreSQL on Linux
▸ Servers can be either physical or virtual
▸ Storage must be redundant
▸ RAID is required
▸ VOLUME: redundant disk mounted on a system
LET’S START
0. ONE POSTGRES SERVER
ARCHITECTURE
Server name: HOPE
RECAP
▸ Why is RPO = ∞?
▸ Why is RTO = n/a?
▸ “Hope is not a strategy” (cit. Google)
▸ More common than you’d expect
10. ONE POSTGRES SERVER + LOGICAL BACKUPS
ARCHITECTURE
Add systematic backups with pg_dump
Logical backups: Day 0 at 4AM, Day -1 at 4AM, Day -2 at 4AM, …
RECAP
▸ How do you feel now?
▸ Still: RPO = ∞ and RTO = n/a. Why?
▸ A backup is valid only if you have tested it
▸ Unfortunately, this is very common
20. ONE POSTGRES SERVER + LOGICAL BACKUPS + LOGICAL RESTORES
ARCHITECTURE
Test your backups with pg_restore
Logical backup: Day 0 at 4AM
DEFINING SOME OBJECTIVES
▸ Measure time for pg_restore
▸ RPO = backup frequency
▸ RTO = maximum time of recovery
▸ Provision another server
▸ Configure another server (automated, right?)
▸ Time to restore the last backup (measure it)
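The objectives above can be estimated with simple arithmetic once each step has been measured. The following is a minimal sketch under stated assumptions: `measure_seconds` and `estimate_rto` are hypothetical helper names, and the durations in the example are made-up numbers. In practice the measured action would wrap the actual pg_restore run.

```python
import time
from typing import Callable


def measure_seconds(action: Callable[[], object]) -> float:
    """Run an action once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def estimate_rto(provision_s: float, configure_s: float, restore_s: float) -> float:
    """Worst-case recovery time, assuming the steps run one after the other."""
    return provision_s + configure_s + restore_s


if __name__ == "__main__":
    # Placeholder action; a real test would time the restore of the last backup.
    restore_s = measure_seconds(lambda: time.sleep(0.01))
    print(f"measured restore: {restore_s:.3f} s")
    # Hypothetical figures: 10 min provisioning, 5 min configuration, 1 h restore.
    print(f"estimated RTO: {estimate_rto(600, 300, 3600) / 60:.0f} minutes")
```

With a daily backup, the RPO in this architecture is up to 24 hours regardless of how fast the restore is, so the two objectives have to be tracked separately.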
HAVE WE REALLY THOUGHT ABOUT EVERYTHING?
TIME OF REACTION
RECAP
▸ Can this architecture work for you?
▸ We need reliable monitoring
▸ From now on, we assume we have it in place!
▸ We need to reduce both RPO and RTO
HOW? POINT-IN-TIME RECOVERY
Using a time machine
POSTGRESQL’S PITR
▸ Part of core (fully open source)
▸ Rebuild a cluster at a point in time
▸ From crash recovery to synchronous streaming replication (physical/logical)
▸ RPO = 0 (zero data loss)
▸ Hot base backup, continuous WAL archiving, Recovery
▸ API
BASIC CONCEPTS
▸ Continuous copy of WAL data (continuous archiving)
▸ Physical base backups
▸ Recovery:
▸ copy base backup to another location
▸ recovery mode (replay of WALs until target)
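The recovery idea above (start from a copy of the base backup, then replay WAL until the target) can be illustrated with a toy model. This is only a conceptual sketch: `WalRecord` and `recover` are hypothetical names, and real WAL is a binary stream of physical changes, not key/value pairs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class WalRecord:
    """Toy stand-in for a WAL record: a timestamped key/value change."""
    committed_at: datetime
    key: str
    value: str


def recover(base_backup: Dict[str, str], wal: Iterable[WalRecord],
            target: datetime) -> Dict[str, str]:
    """Copy the base backup, then replay WAL records up to the target time."""
    state = dict(base_backup)  # work on a copy, never on the backup itself
    for record in sorted(wal, key=lambda r: r.committed_at):
        if record.committed_at > target:
            break  # stop replaying once the recovery target is passed
        state[record.key] = record.value
    return state


if __name__ == "__main__":
    backup = {"balance": "100"}
    wal = [
        WalRecord(datetime(2019, 6, 1, 10, 0), "balance", "80"),
        WalRecord(datetime(2019, 6, 1, 12, 0), "balance", "0"),  # the "mistake"
    ]
    # Recover to 11:00, just before the unwanted change.
    print(recover(backup, wal, datetime(2019, 6, 1, 11, 0)))
```

The same base backup can be recovered to any point covered by the archived WAL, which is what makes the "time machine" analogy work.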
BARMAN
▸ Latest version: Barman 2.8
▸ Open Source (GNU GPL 3)
▸ Written in Python
▸ Developed and maintained by 2ndQuadrant
▸ Available at www.pgbarman.org
40. ONE POSTGRES SERVER + ONE BARMAN SERVER
ARCHITECTURE
Continuous backup
BASIC CONCEPTS
▸ Remote backup and recovery
▸ Multiple server management
▸ Backup catalogue and WAL archive
▸ Retention policies
COPY METHOD
▸ PostgreSQL streaming
▸ Practical/Windows/Docker
▸ Rsync/SSH
▸ Incremental backup and recovery (via hard links)
▸ Parallel backup and recovery
▸ Network compression and bandwidth limitation
WAL SHIPPING METHOD
▸ “archiving”, through “archive_command”:
▸ RPO ~ 16MB of WAL data, or
▸ “archive_timeout”
▸ “streaming”, through streaming replication:
▸ “pg_receivewal” (named “pg_receivexlog” before PostgreSQL 10)
▸ continuous stream, RPO ~ 0
▸ PostgreSQL 9.2+ required
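For the "archiving" method, the exposure window can be approximated from the WAL generation rate. The sketch below is a rough estimate under stated assumptions: the function name is hypothetical, the segment size is the 16MB default mentioned above, and the time needed to transfer the file is ignored.

```python
from typing import Optional

WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16MB)


def archiving_rpo_seconds(wal_bytes_per_second: float,
                          archive_timeout_s: Optional[float] = None) -> float:
    """Approximate worst-case window of WAL not yet archived, in seconds.

    A segment is shipped when it is full or, if archive_timeout is set,
    when the timeout forces a segment switch, whichever comes first.
    """
    if wal_bytes_per_second <= 0:
        fill_time = float("inf")  # an idle server never fills a segment
    else:
        fill_time = WAL_SEGMENT_BYTES / wal_bytes_per_second
    if archive_timeout_s is None or archive_timeout_s <= 0:
        return fill_time  # a timeout of 0 means the feature is disabled
    return min(fill_time, archive_timeout_s)


if __name__ == "__main__":
    # Hypothetical quiet database writing 1 KiB/s of WAL.
    print(archiving_rpo_seconds(1024))        # without archive_timeout
    print(archiving_rpo_seconds(1024, 300))   # with archive_timeout = 300
```

On a quiet server the window without `archive_timeout` can be hours long, which is why the streaming method is what brings the RPO close to zero.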
EXAMPLE FROM POSTGRESQL.CONF
archive_mode = on
wal_level = logical
max_wal_senders = 10
max_replication_slots = 10
archive_command = 'rsync -a %p barman@HOST:/var/lib/barman/ID/incoming'
EXAMPLE FROM BARMAN.CONF
[stark]
description = "Tony Stark database"
ssh_command = ssh postgres@stark
conninfo = user=barman-avengers dbname=postgres host=stark
retention_policy = RECOVERY WINDOW OF 6 MONTHS
copy_method = rsync
reuse_backup = link
parallel_jobs = 4
archiver = true
streaming_archiver = true
slot_name = barman_streaming
RECAP
▸ How do you feel now?
▸ Still: RPO = ∞ and RTO = n/a. Why?
▸ A backup is valid only if you have tested it
▸ Barman reduces backup risks, does not exclude them
▸ Systematic tests (especially custom scripts)
▸ Business risk is very high
60. ONE POSTGRES SERVER + ONE BARMAN SERVER + ONE RECOVERY SERVER
ARCHITECTURE
Test your backups with barman recover
WHAT A WASTE!
HAVE YOU EVER THOUGHT OF USING IT FOR TESTING OR BI?
HOOK SCRIPTS
▸ Barman has hook scripts:
▸ pre and post backup
▸ pre and post archiving
▸ with retry option (until the script returns SUCCESS)
EXAMPLE OF RECOVERY SCRIPT
▸ Write a bash script that:
▸ connects to a remote server via SSH
▸ stops the PostgreSQL server
▸ issues a “barman recover” with target “immediate”
▸ starts the PostgreSQL server
▸ Set it as post-backup script
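The talk suggests a bash script; the same steps can be sketched in Python by building the commands first, so they can be reviewed before anything runs. Everything specific here is an assumption for illustration: the recovery host name, the data directory, the use of `pg_ctl` over SSH, and the function name. Only `barman recover` with its `--target-immediate` and `--remote-ssh-command` options and the `latest` backup shortcut are existing Barman features.

```python
import shlex
from typing import List


def recovery_test_commands(barman_server: str, recovery_host: str,
                           pgdata: str) -> List[List[str]]:
    """Build, without running them, the commands for a post-backup restore test."""
    ssh = ["ssh", f"postgres@{recovery_host}"]
    return [
        # 1. Stop PostgreSQL on the recovery server.
        ssh + ["pg_ctl", "-D", pgdata, "stop", "-m", "fast"],
        # 2. Restore the latest backup, stopping as soon as it is consistent.
        ["barman", "recover", "--target-immediate",
         "--remote-ssh-command", f"ssh postgres@{recovery_host}",
         barman_server, "latest", pgdata],
        # 3. Start PostgreSQL again on the recovery server.
        ssh + ["pg_ctl", "-D", pgdata, "start"],
    ]


if __name__ == "__main__":
    # "recovery" and the data directory are hypothetical values.
    for cmd in recovery_test_commands("stark", "recovery", "/var/lib/pgsql/data"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

A real script would run each command with `subprocess.run(cmd, check=True)`, record how long the restore takes, and handle cases such as the server already being stopped on the first run.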
SOME FOOD FOR THOUGHT
▸ Outcomes:
▸ Systematically test your backup
▸ Measure your recovery time
▸ Identical server? This is a backup server ready to start
▸ You can use a different data centre
▸ Be creative, PostgreSQL gives you infinite freedom!
RECAP
▸ RPO ~ 0 (your backups work, every time)
▸ RTO = Time of reaction + Recovery time
▸ Example: RPO ~0 and RTO < 1 day
▸ Acceptable or not acceptable?
▸ Entry level architecture for business continuity
▸ Priority now: improve RTO
HOW? REPLICATION
POSTGRESQL’S REPLICATION
▸ Part of core (fully open source)
▸ One master, multiple standby servers
▸ Evolution of PITR
▸ Standby server is in continuous recovery mode
▸ Hot standby (read-only)
▸ Both streaming (9.0+) and file based pulling of WAL
▸ Cascading from a standby
SYNCHRONOUS REPLICATION
▸ Fine control (from global down to transaction level)
▸ 2-safe replication
▸ COMMIT of a write transaction waits until it is written on both the master and a standby (or on more standbys from 9.6)
▸ Read consistency of a cluster
▸ RPO = 0 (zero data loss)
80. TWO POSTGRES SERVERS + ONE BARMAN SERVER + ONE RECOVERY SERVER
ARCHITECTURE
Symmetric cluster: STARK (master) and ROGERS (standby)
Diagram labels: barman-wal-restore, barman recover
EXCERPT FROM THE POSTGRESQL CONFIGURATION OF ROGERS
postgresql.conf:
hot_standby = on
recovery.conf:
standby_mode = 'on'
# Streaming
primary_conninfo = 'host=stark user=replica application_name=ha sslmode=require'
# Fallback via Barman
restore_command = 'barman-wal-restore -U barman avengers stark %f %p'
SWITCHOVER (PLANNED)
▸ Applications are paused (start of downtime)
▸ Shut down the master
▸ Allow the standby to catch up with the master
▸ Promote the standby
▸ Switch virtual IPs
▸ Resume applications (end of downtime)
▸ Reconfigure the former master as standby
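The planned switchover above is an ordered procedure where each step must succeed before the next one starts. A minimal sketch of that idea follows; the runner and its names are hypothetical, and the placeholder actions stand in for the real commands, which the talk suggests automating with bash or Ansible.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]

# Step names taken from the switchover slide, in order.
SWITCHOVER_STEPS = [
    "pause applications",
    "shut down the master",
    "let the standby catch up",
    "promote the standby",
    "switch virtual IPs",
    "resume applications",
    "reconfigure the former master as standby",
]


def run_procedure(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the completed steps; raises if a step fails,
    so that an operator can take over from a known point.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"step failed: {name} (completed: {completed})")
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Dry run: every action is a placeholder that reports success.
    dry_run = [(name, lambda: True) for name in SWITCHOVER_STEPS]
    for done in run_procedure(dry_run):
        print("ok:", done)
```

Keeping the procedure as data makes it easy to rehearse it as a dry run during drills before wiring in the real actions.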
FAILOVER (UNPLANNED)
▸ The master is down (start of downtime)
▸ Promote the standby
▸ Change the virtual IP
▸ DEGRADED SYSTEM
MANUAL SWITCHOVER AND FAILOVER
▸ Manual switchover != manual switchover procedure
▸ Manual switchover = manually triggered
▸ Automate the procedure!!!
▸ bash (good)
▸ Ansible (better)
▸ Enhance gradually
RECAP
▸ RPO ~ 0 (your backups work, every time)
▸ RTO = Time of reaction + Time of promotion
▸ Criticality: manual intervention
▸ Reliable monitoring
▸ Trained people (practice & docs!)
MANUAL FAILOVER VS AUTOMATED FAILOVER
▸ Risk management
▸ Split brain nightmare
▸ Automated is built on manual (test!)
▸ Your choice
▸ Very good solution for business continuity
▸ Uptime > 99.99% in a year
90. TWO POSTGRES SYNC SERVERS + ONE BARMAN SERVER + ONE RECOVERY SERVER
ARCHITECTURE
Diagram labels: barman-wal-restore, barman recover, Synchronous
ZERO DATA LOSS
SYNCHRONOUS REPLICATION
▸ Primary: Barman
▸ Zero data loss backup
▸ Primary: Standby
▸ Zero data loss cluster (reduce RTO)
▸ Just one configuration line in PostgreSQL
▸ synchronous_standby_names = '1 (ha, barman_receive_wal)'
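The value of that configuration line follows a simple pattern: the number of synchronous standbys required, then a priority-ordered list of application names. A small sketch that renders it is shown below; the helper name is hypothetical, and it covers only this priority-based form of the setting.

```python
from typing import Sequence


def synchronous_standby_names(num_sync: int, names: Sequence[str]) -> str:
    """Render the priority-based 'N (name, ...)' form of the setting."""
    if not names:
        raise ValueError("at least one standby name is required")
    if not 1 <= num_sync <= len(names):
        raise ValueError("num_sync must be between 1 and the number of names")
    return f"{num_sync} ({', '.join(names)})"


if __name__ == "__main__":
    value = synchronous_standby_names(1, ["ha", "barman_receive_wal"])
    print(f"synchronous_standby_names = '{value}'")
```

In the slide's value, `ha` matches the `application_name` used in the standby's `primary_conninfo`, and `barman_receive_wal` appears to be the name Barman's WAL streaming connection uses by default. With `1` as the count, the first connected name in the list acts as the synchronous standby and the other remains a candidate.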
~100. TWO POSTGRES SYNC SERVERS + ONE BARMAN SERVER + ONE RECOVERY SERVER + REPMGR (AUTO-FAILOVER)
ARCHITECTURE
Diagram labels: Synchronous, Potential synchronous, repmgr (×2), repmgr witness
WHAT’S MORE?
PUSH THE BOUNDARIES
▸ Repeatable architectures in multiple data centres
▸ PgBouncer
▸ Virtual IPs
▸ S3 relay via Barman hook scripts
▸ Multiple standby servers and cascading replication
▸ Docker containers
▸ Logical replication backups
CONCLUSIONS
▸ Baby steps and KISS
▸ New? Explore and learn
▸ Practice is the only way to mastery (drills)
▸ Plan regular healthy downtimes
▸ Use switchovers to perform PostgreSQL updates
▸ Smart downtimes increase long-term uptime
ANY QUESTIONS?
▸ PostgreSQL: www.postgresql.org
▸ Barman: www.pgbarman.org #pgbarman
▸ PgBouncer: pgbouncer.github.io
▸ Repmgr: www.repmgr.org
▸ Our blog: blog.2ndquadrant.com
LICENCE
Attribution 4.0 International (CC BY 4.0)
You are free to:
▸ Share — copy and redistribute the material in any medium or format
▸ Adapt — remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.