etcd based PostgreSQL HA cluster
TL;DR: github.com/compose/template-etcd-based-postgres-ha
Introduction
Chris Winslett
@winsletts
compose.io
reading the top 5 comments on Imgur since 2012
How we started using PostgreSQL
MongoDB was a primary datastore
launched project to understand financial metrics
required data exploration, which is brutal in MongoDB
Our database product
our platform runs databases
these databases scale automatically as a customer's data size grows
Our database product
could we run PostgreSQL on our platform?
Database operational requirements
• replicated
• highly available
• no human interaction for failover
• minimize core-engine modifications
• customers use the entire deployment
Tools investigated
repmgr with pgpool II
required human interaction for failover
did not use PostgreSQL streaming replication
pgpool was flaky on failover
Tools investigated
PostgreSQL streaming replication
no automatic failover
Tools investigated
bi-directional replication, i.e. master-master
only runs on one database per cluster
requires a patch on core engine
is automated failover too ambitious with PostgreSQL?
Learned from tools investigation
PostgreSQL should not be the canonical store of its own state; we investigated:
serf - not consensus based
consul - runs with consensus
etcd - runs with consensus
Consul
we built the prototype on Consul using:
locking sessions
health checks
code at: https://github.com/MongoHQ/consul_ha
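Consul's locking sessions tie a KV lock to a session that is invalidated when the node's health checks fail, which is what keeps unhealthy secondaries from holding leadership. A hypothetical sketch of the two HTTP calls involved (host, port, and key names are illustrative, not Compose's code):

```python
# Hypothetical sketch of Consul's locking-session flow:
# 1. create a session tied to the node's health checks
# 2. acquire the leader key with that session
# If the health checks fail, Consul invalidates the session and
# releases the lock automatically.

def session_create_url(base):
    # a PUT here (with a JSON body naming the health checks) returns a session ID
    return f"{base}/v1/session/create"

def acquire_url(base, key, session_id):
    # a PUT here succeeds for exactly one session: that is the lock acquisition
    return f"{base}/v1/kv/{key}?acquire={session_id}"

create = session_create_url("http://127.0.0.1:8500")
acquire = acquire_url("http://127.0.0.1:8500",
                      "service/postgres/leader", "sess-123")
```

The acquire call is atomic on the Consul server, so only one node ever believes it holds the leader key at a time.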
Tight coupling between:
Consul interaction and
HA decision loop
Consul Diagram 1
Final Consul Diagram
Consul Results
amazing
automatically growing and shrinking Consul clusters
health checks to prevent unhealthy secondaries from acquiring locks
Consul
until we ran into massive swap allocation.
40 GB swap allocation.
fine for prototypes, not for production.
Results from Consul
HA PostgreSQL is possible
but, we need a tool which uses our resources more wisely.
Switch to etcd
because of what we'd learned with Consul, the switch to etcd took only a day to produce a working sample
Modern etcd diagram

Start-up process:
- connect to etcd
- is the data directory empty?
  - yes: race to set the initialization key
    - won the race: initialize the database, take over the leader TTL key, start PostgreSQL
    - lost the race: once the leader owns the key, pg_basebackup from the leader, then start PostgreSQL as a leaderless secondary
  - no: start PostgreSQL as a leaderless secondary

Running loop:
- do I own the leader key?
  - yes: update the leader TTL lock; wait 5 seconds and repeat
  - no: is the leader key owned?
    - yes: am I following the correct leader? if not, follow the proper leader; wait 5 seconds and repeat
    - no: am I the healthiest member?
      - yes: acquire the leader lock? if acquired and I am not already the leader, promote to leader
      - no: wait 30 seconds and repeat
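The running loop above boils down to a few compare-and-swap decisions against etcd. A minimal sketch of one pass of that loop — `etcd_cas` stands in for an etcd v2 compare-and-swap PUT (prevValue/prevExist guards plus a TTL) and is faked with an in-memory dict so the sketch is self-contained; names are illustrative, not Compose's code:

```python
# In-memory stand-in for the etcd key space.
store = {}

def etcd_cas(key, value, prev_value=None, prev_exist=None):
    """Set key only if the guard holds; return True on success.

    Mimics etcd v2 prevValue / prevExist semantics (TTL expiry is
    simulated in tests by deleting the key).
    """
    current = store.get(key)
    if prev_exist is False and current is not None:
        return False
    if prev_value is not None and current != prev_value:
        return False
    store[key] = value
    return True

def run_once(me):
    """One pass of the running loop for node `me`."""
    leader = store.get("leader")
    if leader == me:
        # refresh the leader TTL lock; prevValue guards against another
        # node having taken the key in the meantime
        etcd_cas("leader", me, prev_value=me)
        return "leader"
    if leader is None:
        # leader key expired: race to acquire the lock and promote
        if etcd_cas("leader", me, prev_exist=False):
            return "promoted"
        return "waiting"
    # someone else holds the key: make sure we follow that leader
    return "following " + leader
```

Because every transition goes through a guarded write, a node that pauses (say, during a long GC or swap storm) simply fails its next compare-and-swap instead of creating a split brain.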
etcd features used
consensus, recursive, ttl, prevValue, prevExist
https://coreos.com/docs/distributed-configuration/etcd-api/
etcd: recursive
used to find all members known to a cluster
etcd: ttl
used with the keep-alive from our PostgreSQL runner
etcd: prevValue
used in conjunction with TTL to ensure the leader remains the leader when updating the TTL
etcd: prevExist
used to create a deployment initialization race
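These four features map onto plain HTTP calls in the etcd v2 keys API. A hypothetical helper that builds the request URLs — the query parameter names match the v2 API linked above, but the key names and helper are illustrative:

```python
def etcd_url(base, key, **params):
    """Build an etcd v2 keys-API URL with sorted query parameters."""
    query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"{base}/v2/keys/{key}" + (f"?{query}" if query else "")

# recursive: list every member known to the cluster
members = etcd_url("http://127.0.0.1:4001", "cluster/members",
                   recursive="true")

# ttl + prevValue: refresh the leader lock only if we still hold it
refresh = etcd_url("http://127.0.0.1:4001", "cluster/leader",
                   ttl=30, prevValue="node-1")

# prevExist=false: the initialization race -- exactly one PUT succeeds
init = etcd_url("http://127.0.0.1:4001", "cluster/init",
                prevExist="false")
```

The refresh and init URLs would be issued as PUTs with the desired value in the body; etcd rejects the write atomically if the guard fails.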
Improved with etcd
removed tight coupling in classes:
HA decision process
etcd state interaction
PostgreSQL handler
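One way that decoupling might look: three collaborators with narrow interfaces, so the decision loop can be tested without a live etcd or PostgreSQL (a hypothetical sketch; class and method names are illustrative, not Compose's actual classes):

```python
class EtcdState:
    """All etcd reads/writes live here."""
    def __init__(self):
        self.keys = {}
    def leader(self):
        return self.keys.get("leader")
    def try_acquire(self, me):
        # stand-in for a prevExist=false compare-and-swap
        if self.keys.get("leader") is None:
            self.keys["leader"] = me
            return True
        return False

class PostgresHandler:
    """All PostgreSQL actions live here."""
    def __init__(self):
        self.role = "secondary"
    def promote(self):
        self.role = "leader"

class HADecision:
    """Pure decision loop: reads state, drives the handler."""
    def __init__(self, me, state, pg):
        self.me, self.state, self.pg = me, state, pg
    def tick(self):
        if self.state.leader() == self.me:
            return  # already leading; nothing to do
        if self.state.try_acquire(self.me):
            self.pg.promote()
```

With the etcd interaction and the PostgreSQL handler behind small interfaces, either one can be swapped out (or stubbed in tests) without touching the HA logic.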
Issues with etcd
overly aggressive about consensus
instructions for optimization at https://coreos.com/docs/cluster-management/debugging/etcd-tuning/
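The tuning guide's advice amounts to relaxing the Raft timing so a slow disk or a network blip does not trigger spurious leader elections. A hypothetical invocation (flag names vary by etcd version; these match the etcd 2.x guide, and the values are illustrative):

```shell
# lengthen the heartbeat interval and election timeout so transient
# slowness does not cause needless re-elections
etcd --heartbeat-interval=300 --election-timeout=3000
```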
Issues with etcd
overly aggressive about consensus
we quit running etcd alongside PostgreSQL because we wanted expanding PostgreSQL clusters
Time for live demo?