© Continuent 2010
Building Tungsten Clusters with PostgreSQL Hot Standby and Streaming Replication
Linas Virbalas and Alex Alexander
Continuent, Inc.
Agenda
/ Introductions
/ PG 9 Hot Standby and Streaming Replication
/ What is Tungsten?
/ Tungsten + PostgreSQL Hot Standby and Streaming Replication
/ Demo!
/ Questions and Comments
Introductions
About Continuent
/ Our Business: Continuous Data Availability
/ Our Solution
• Continuent Tungsten (Master/Slave Database Replication)
/ Our Value:
• Ensure data are available when and where you need them
• TCO less than 20% of comparable solutions
/ Our Technical Expertise
• Database replication
• Database cluster management
• Application connectivity
• Software-as-a-Service (SaaS)
PostgreSQL 9: Hot Standby and Log Streaming
PostgreSQL 8.4 Warm Standby
[Diagram: PostgreSQL Master (WAL Files, pg_xlogs Directory) -> Archive Directory (Archived WAL Files) -> rsync to standby -> pg_standby -> PostgreSQL Standby (WAL Files, pg_xlogs Directory) in continuous recovery]
Limitations of Warm Standby
1. Utilization -- Cannot open the standby
• To bring up the standby for queries you must end recovery
• Standby hardware is idle
• Difficult to track state of recovery since you cannot query log position
2. Data Loss -- Warm standby transfers only full WAL files
• Can bound loss using archive_timeout
• Low values create large numbers of WAL files; complicate point-in-time recovery
• Workarounds using DRBD, etc. are complex
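A rough worked calculation shows why low archive_timeout values pile up WAL files. The sketch below assumes PostgreSQL's default 16 MB WAL segment size and a master that switches segments every time the timeout fires (an upper bound for a lightly loaded server); the function name is illustrative, not part of PostgreSQL or Tungsten.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DEFAULT_SEGMENT_MB = 16  # assumed: PostgreSQL's default WAL segment size


def wal_files_per_day(archive_timeout_s, segment_mb=DEFAULT_SEGMENT_MB):
    """Return (files, megabytes) archived per day if every timeout forces a switch.

    A segment closed early by the timeout is still archived at full size,
    so the volume grows with the number of forced switches.
    """
    if archive_timeout_s <= 0:
        raise ValueError("archive_timeout_s must be positive")
    files = SECONDS_PER_DAY // archive_timeout_s
    return files, files * segment_mb


if __name__ == "__main__":
    for timeout in (60, 300, 3600):
        files, megabytes = wal_files_per_day(timeout)
        print(f"archive_timeout={timeout}s: {files} files/day, {megabytes} MB/day")
```

Under these assumptions, archive_timeout = 60 bounds data loss to about a minute but archives up to 1440 files (23040 MB) per day, which is the trade-off the slide points at.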
Introducing Hot Standby
/ Allows users to connect to standby in read-only mode
• Allowed: SELECT, SET, LOAD, COMMIT/ROLLBACK
• Disallowed: INSERT, UPDATE, DELETE, CREATE, 2PC, SELECT … FOR SHARE/UPDATE, nextval(), LISTEN, LOCK
• No admin commands: ANALYZE, VACUUM, REINDEX, GRANT
/ Can come out of recovery while queries are running
/ Thanks to Simon Riggs for this description
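The allowed/disallowed split above can be turned into a rough pre-check before sending a statement to a read-only standby. This is only a heuristic sketch built from the keywords on the slide: it looks at the leading keyword and two patterns, does not parse SQL, and can be fooled by string literals or comments. The function name is hypothetical.

```python
import re

# Leading keywords taken from the slide's "Allowed" list.
ALLOWED_KEYWORDS = {"SELECT", "SET", "LOAD", "COMMIT", "ROLLBACK"}

# Constructs the slide lists as disallowed even though they start with SELECT.
_LOCKING_CLAUSE = re.compile(r"\bFOR\s+(SHARE|UPDATE)\b", re.IGNORECASE)
_NEXTVAL_CALL = re.compile(r"\bnextval\s*\(", re.IGNORECASE)


def can_run_on_standby(sql):
    """Guess whether a statement is acceptable on a hot standby (heuristic only)."""
    stripped = sql.strip()
    if not stripped:
        return False
    keyword = stripped.split(None, 1)[0].upper().rstrip(";")
    if keyword not in ALLOWED_KEYWORDS:
        return False
    if keyword == "SELECT" and (
        _LOCKING_CLAUSE.search(stripped) or _NEXTVAL_CALL.search(stripped)
    ):
        return False
    return True
```

A router could use such a check only as a first filter; the standby itself remains the authority and rejects anything it cannot execute during recovery.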
Introducing Log Streaming
[Diagram: PostgreSQL Master (WAL Sender) -> continuous replication to standby -> PostgreSQL Standby (WAL Receiver, Recovery); Archiving to the Archive Directory (Archived WAL Files)]
Configuration and Usage
/ Log streaming layers on top of existing warm standby log shipping
/ Multiple standby servers allowed
/ Failure of one standby does not affect others
/ Management is not simple - must coordinate provisioning & WAL shipping to set up/restart
What Is Tungsten?
/ Tungsten implements master/slave clusters to:
• Protect data
• Maintain high availability
• Improve resource utilization
• Raise performance
/ Install and set up in a few minutes
/ Integrated backup/restore and data integrity checks
/ Efficient failover operations
/ Distributed, rule-driven management
/ No/minimal application changes
/ Highly pluggable
/ No specialized hardware requirements
What’s Inside Tungsten?
/ Replication -- Making copies
• Tungsten Replicator -- Database-neutral, platform independent master/slave replication
/ Connectivity -- Finding databases
• Tungsten Connector -- Fast MySQL/PostgreSQL client to JDBC proxying
• Tungsten SQL Router -- JDBC wrapper for high-performance and transparent failover, load-balancing, and partitioning (no proxy required)
/ Management -- Administering the database
• Tungsten Manager -- Distributed administration with autonomic, rule-based configuration and no single point of failure
• Tungsten Monitor -- Track resource status and
Tungsten Clustering In Action
[Diagram: two Application Servers, each with a SQL Router/Connector, and two Management Clients in front of a Master Host (Master DB) and a Slave Host (Slave DB); each host runs a Replicator, a Monitor, and a Manager]
Distributed Rule-Based Management
[Diagram: Managers (one acting as Coordinator), each with Local Services and Business Rules, exchange broadcast commands and monitoring data over Group Communications; Admin Clients connect to the Managers]
Multiple Routes to Databases
[Diagram: routes into the Tungsten Cluster: Java App Server with Tungsten SQL Router and PostgreSQL JDBC Driver; PHP Application with libpq.a through Tungsten Connector; Virtual IP Address; Admin & Monitoring]
Tungsten + PostgreSQL Hot Standby and Streaming Replication
Moving Tungsten to PostgreSQL
/ Problem: We can’t read PostgreSQL logs (yet)
The Tungsten solution is to manage:
/ Warm Standby + WAL Shipping (for PostgreSQL <9)
• Good basic availability/fast failover
• Slaves open up for reads only after failover
• No load balancing/scaling
/ Hot Standby + Streaming Replication (for PostgreSQL >=9)
• Slaves opened up for reads = Tungsten scaling facilities work
• Add Streaming Replication = minimal delay in replicating data
• A fully fledged clustering solution
What is Tungsten’s Added Value To PostgreSQL?
/ 15 minute cluster installation
/ Single commands to:
• View cluster status
• Provision a new standby
• Confirm liveness of replication
• Switch servers safely for maintenance
• Failover a dead server to most current replica
/ Automatic discovery of new database replicas
/ Automatic failover when databases fail
/ Simple procedures for provisioning
/ Transparent application routing
/ Easy scaling
Streaming Replication Setup (By Hand)
/ Configure master postgresql.conf and restart
  archive_mode = on
  max_wal_senders = 10
  recovery_connections = on
  archive_command = 'rsync -cz $1 ${STANDBY}:${PGHOME}/archive/$2 %p %f'
  archive_timeout = 60
/ Set up standby recovery.conf
  standby_mode = 'on'
  primary_conninfo = 'host=${MASTER} port=5432 user=postgres'
  trigger_file = '/tmp/pgsql_stop_recovery'
/ Provision standby
  psql# select pg_switch_xlog();
  psql# select pg_xlogfile_name(pg_start_backup('base_backup'));
  rsync -azv --delete --exclude=*pg_xlog* --exclude=postgresql.conf ${PGHOME}/ $STANDBY:$PGHOME/archive
  psql# select pg_xlogfile_name(pg_stop_backup());
/ Start standby, recovery starts
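Because the standby's recovery.conf has to be regenerated every time a standby is provisioned or roles change, it helps to produce it from parameters instead of editing it by hand. The sketch below renders the three settings shown on the slide; the function names and defaults are illustrative, and this is not how Tungsten itself writes the file.

```python
def quote_conf_value(value):
    """Single-quote a value for recovery.conf, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def render_recovery_conf(master_host, port=5432, user="postgres",
                         trigger_file="/tmp/pgsql_stop_recovery"):
    """Return recovery.conf text for a streaming-replication standby."""
    conninfo = f"host={master_host} port={port} user={user}"
    settings = [
        ("standby_mode", "on"),
        ("primary_conninfo", conninfo),
        ("trigger_file", trigger_file),
    ]
    return "".join(
        f"{name} = {quote_conf_value(value)}\n" for name, value in settings
    )


if __name__ == "__main__":
    # "db1.example.com" is a placeholder master host name.
    print(render_recovery_conf("db1.example.com"), end="")
```

Writing the result into the standby's data directory and starting the server are left to the caller, since paths and start commands differ between installations.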
Manual Failover (By Hand)
/ Standby’s postgresql.conf should be prepared from the start to act as a master when needed:
  archive_mode = on
  max_wal_senders = 10
/ Touch /tmp/pgsql_stop_recovery on a standby to snap out of recovery mode
/ Wait for recovery to finish
• ERROR: recovery is in progress …
• LOG: database system is ready to accept connections
/ Reroute applications to the new master
And to return to the cluster’s original state:
/ Fix the failed master. Bring it online as a standby
/ Switch master with standby roles
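The promotion steps above (touch the trigger file, then wait for recovery to finish) can be sketched as a small helper. The recovery check is passed in as a callable so the sketch stays independent of any database driver; `is_in_recovery` is a hypothetical hook that the caller would implement, for example by running a query against the standby.

```python
import time
from pathlib import Path


def promote_standby(trigger_file, is_in_recovery, timeout_s=60.0, poll_s=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Touch the trigger file, then poll until the server leaves recovery.

    is_in_recovery: caller-supplied callable returning True while the standby
    is still replaying WAL (hypothetical hook, not a PostgreSQL API).
    Returns True if recovery ended within timeout_s, otherwise False.
    """
    Path(trigger_file).touch()
    deadline = clock() + timeout_s
    while True:
        if not is_in_recovery():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Applications should be rerouted only after this returns True; on False the operator has to inspect the standby's log before deciding what to do next.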
Gotchas (1/2)
/ Wait for standby to finish initial recovery before routing applications to it
/ Matching the text “psql: FATAL: the database system is starting up” is a bad check for that, because the message is localized (e.g. try it under a French locale)
/ After a switch/failover, need to wait for the new master to come up from recovery too
/ rsync of master data folder might return a non-zero exit code: 24 - “Partial transfer due to vanished source files”
/ Exclude pg_xlog, pg_log and postgresql.conf from rsync!
/ Careful: queries might get canceled on a standby! (E.g. DROP TABLE on a master doesn’t wait for SELECT queries to finish on a standby)
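The rsync points above lend themselves to a small wrapper: build the command with the excludes in one place, and treat exit code 24 as acceptable when copying a live data directory. The helper names are illustrative; the flags used (-a, -z, --delete, --exclude) are standard rsync options, and the default exclude list simply repeats the names from the slide.

```python
RSYNC_OK = 0
RSYNC_VANISHED_SOURCE = 24  # "Partial transfer due to vanished source files"

# Names taken from the slide; adjust to the actual layout of the data directory.
DEFAULT_EXCLUDES = ("pg_xlog", "pg_log", "postgresql.conf")


def build_rsync_command(source_dir, destination, excludes=DEFAULT_EXCLUDES):
    """Return the argument list for copying a master data directory."""
    command = ["rsync", "-az", "--delete"]
    command += [f"--exclude={pattern}" for pattern in excludes]
    # A trailing slash makes rsync copy the directory contents, not the directory.
    command += [source_dir.rstrip("/") + "/", destination]
    return command


def rsync_acceptable(returncode, tolerate_vanished=True):
    """Decide whether an rsync exit code should count as success."""
    if returncode == RSYNC_OK:
        return True
    return tolerate_vanished and returncode == RSYNC_VANISHED_SOURCE
```

A caller would pass the list to `subprocess.run(...)` and feed the resulting `returncode` to `rsync_acceptable`; any other non-zero code should still abort provisioning.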
Gotchas (2/2)
/ Need to know current progress?
• Use pg_current_xlog_location() on a master (“ERROR: recovery is in progress” if used on a standby), and pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on a standby.
/ Using pg_standby? Ensure it is on the PATH of the user that runs it
• If that user is root, checking with ‘sudo which pg_standby’ is not enough!
/ Running Debian? Check PostgreSQL status before starting it (the start fails if the server is already running):
• sudo /etc/init.d/postgresql-8.4 start
  Starting PostgreSQL 8.4:
  pg_ctl: another server might be running; trying to start server anyway
  pg_ctl: could not start server
/ Etc.
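The progress functions mentioned above return locations as text in the form of two hexadecimal fields separated by a slash, such as 0/3000020. Comparing them as plain strings gives wrong answers when the fields differ in length, so a monitoring script should parse them first. The sketch below only orders locations (it does not compute a byte lag), and the function names are illustrative.

```python
def parse_xlog_location(text):
    """Parse 'XXXXXXXX/YYYYYYYY' (two hex fields) into an orderable tuple."""
    high, separator, low = text.strip().partition("/")
    if not separator:
        raise ValueError(f"not an xlog location: {text!r}")
    return int(high, 16), int(low, 16)


def standby_caught_up(master_location, standby_replay_location):
    """True if the standby has replayed at least up to the master's location.

    master_location: result of pg_current_xlog_location() on the master.
    standby_replay_location: result of pg_last_xlog_replay_location() on the standby.
    """
    return (parse_xlog_location(standby_replay_location)
            >= parse_xlog_location(master_location))
```

For example, "0/10000000" is ahead of "0/9000000", although a string comparison would say the opposite.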
Ultimately
/ Hot Standby & Streaming Replication is awesome, but
/ Set-up/management is harder than it looks.
/ Monitoring is critical.
/ Transparent application rerouting is essential.
How does Tungsten cover all this?
[Diagram: Tungsten Manager and Monitor around the DBMS; Replicator JMX Interface and Replication State Model; Open Script Plugin with Pg-wal Scripts (postgresql.conf, recovery.conf, pg_standby, rsync); Backup Storage Plugin with pg_dump/pg_restore Plug-In; DBMS Checker Plugin]
DEMO or A Summary in Action
Questions?
Contact Information
HQ and Americas
560 S. Winchester Blvd., Suite 500
San Jose, CA 95128
Tel (866) 998-3642
Fax (408) 668-1009
e-mail: [email protected], [email protected]
EMEA and APAC
Lars Sonckin kaari 16
02600 Espoo, Finland
Tel +358 50 517 9059
Fax +358 9 863 0060
Continuent Web Site: http://www.continuent.com