Copyright 2009 EnterpriseDB Corporation. All rights Reserved.<presentation title goes here, edit in the slide master > Slide: 1
PostgreSQL Write-Ahead Log
Heikki Linnakangas
What is Write-Ahead Log?
• Also known as the transaction log or redo log
• Practically every database management system has one
• Also used by journaling file systems, transaction managers etc.
Write-Ahead Log concepts
• A transaction log is a sequence of log records
• One log record for every change
• Each WAL record is assigned an LSN (Log Sequence Number)
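A minimal sketch of these concepts, not PostgreSQL's actual implementation: an append-only log in which each record's LSN is simply the byte offset where the record starts. The names `LogRecord` and `ToyLog` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int        # position of the record in the log (a byte offset here)
    payload: bytes  # description of one change


class ToyLog:
    """Append-only log; each record's LSN is the offset where it starts."""

    def __init__(self):
        self.records = []
        self.next_lsn = 0

    def append(self, payload):
        record = LogRecord(self.next_lsn, payload)
        self.records.append(record)
        self.next_lsn += len(payload)  # the next record starts after this one
        return record.lsn


log = ToyLog()
first = log.append(b"heap insert")
second = log.append(b"btree insert")
third = log.append(b"commit")
print(first, second, third)  # LSNs only ever increase
```

Because the LSN grows with every appended record, comparing two LSNs tells you which change happened first.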
PostgreSQL Write-Ahead-Log
• Introduced in version 7.1 by Vadim B. Mikheev
  – Released in 2001
• REDO log only
  – No UNDO log
  – Instantaneous rollbacks
  – No limit on transaction size
PostgreSQL Example
INSERT INTO tbl (id, data) VALUES (123, 'foo')
@ 0/D023940: xid 734: Heap - insert: rel 1663/12040/16402; tid 0/2
@ 0/D023988: xid 734: Btree - insert: rel 1663/12040/16404; tid 1/2
@ 0/D0239D0: xid 734: Transaction - commit: 2012-03-30 19:08:10.262228+03
LOG: xlog flush request 0/D023A00
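The LSNs in this output are printed as two hexadecimal numbers separated by a slash. As a rough sketch, assuming the first number is the high 32 bits and the second the low 32 bits of a byte position, they can be turned into integers and compared. `parse_lsn` is a helper written for this example, not a PostgreSQL function.

```python
def parse_lsn(text):
    """Convert an LSN such as '0/D023940' to a single integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


heap_insert = parse_lsn("0/D023940")
btree_insert = parse_lsn("0/D023988")
commit = parse_lsn("0/D0239D0")
flush_request = parse_lsn("0/D023A00")

# Distance in bytes between the starts of the first two records.
print(btree_insert - heap_insert)
# The flush request points past the start of the commit record.
print(flush_request > commit)
```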
PostgreSQL WAL structure
• WAL is divided into WAL segments
  – Each segment is a file in pg_xlog directory
$ ls -l pgsql/data/pg_xlog/
-rw------- 1 heikki heikki 239 2009-11-03 10:39 000000010000000000000034.00000020.backup
-rw------- 1 heikki heikki 16777216 2009-11-03 15:26 00000001000000000000003A
-rw------- 1 heikki heikki 16777216 2009-11-03 15:26 00000001000000000000003B
-rw------- 1 heikki heikki 16777216 2009-11-03 10:39 00000001000000000000003C
-rw------- 1 heikki heikki 16777216 2009-11-03 13:49 00000001000000000000003D
-rw------- 1 heikki heikki 16777216 2009-11-03 15:14 00000001000000000000003E
drwx------ 2 heikki heikki 160 2009-11-03 15:26 archive_status
Tools
• Developer tools
  – WAL_DEBUG compile option
  – Xlogdump http://xlogviewer.projects.postgresql.org/
WAL-first rule
• The log record for an operation is always written to disk before the affected data pages
  – WAL first rule!
• In case of a crash, the WAL is replayed to reconstruct the unsaved changes
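The rule and the replay step can be sketched with a toy key-value store. This is an illustration under simplifying assumptions (one value per "page", a Python list standing in for the durable log), and `ToyStore` is a made-up name, not anything from PostgreSQL.

```python
class ToyStore:
    """Toy store that logs every change before touching its data pages."""

    def __init__(self):
        self.wal = []    # durable: survives a crash
        self.disk = {}   # durable data pages
        self.cache = {}  # in-memory pages, lost on a crash

    def put(self, key, value):
        self.wal.append((key, value))  # 1. the log record is written first
        self.cache[key] = value        # 2. only then is the page modified

    def write_page(self, key):
        # Safe at any time: the matching log record is already durable.
        self.disk[key] = self.cache[key]

    def crash(self):
        self.cache = {}  # unsaved page changes disappear

    def recover(self):
        for key, value in self.wal:  # replay the log in order
            self.disk[key] = value
        self.cache = dict(self.disk)


store = ToyStore()
store.put("id=123", "foo")
store.put("id=124", "bar")
store.write_page("id=123")  # only one page made it to disk
store.crash()
store.recover()
print(store.disk)  # both changes are back after replay
```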
Checkpoints
• Left unchecked, the WAL grows indefinitely
• Checkpoints allow us to truncate it
1. Flush all data pages to disk
2. Write a checkpoint record
3. Truncate away old WAL
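The three steps can be sketched on toy data structures. This is a deliberately simplified illustration: it assumes nothing else modifies pages while the checkpoint runs and that the log can be cut at any record, and the `checkpoint` function and its arguments are invented for the example.

```python
def checkpoint(wal, cache, disk):
    """Run the three checkpoint steps; return the truncated WAL.

    wal   -- list of (lsn, key, value) records, oldest first
    cache -- dict of modified in-memory pages
    disk  -- dict of on-disk pages (updated in place)
    """
    # 1. Flush all data pages to disk.
    disk.update(cache)
    # 2. Write a checkpoint record.
    next_lsn = wal[-1][0] + 1 if wal else 0
    checkpoint_record = (next_lsn, "CHECKPOINT", None)
    # 3. Truncate away old WAL: every earlier record describes a change
    #    that is already present in the data pages.
    return [checkpoint_record]


wal = [(0, "a", 1), (1, "b", 2)]
cache = {"a": 1, "b": 2}
disk = {}
wal = checkpoint(wal, cache, disk)
print(disk, wal)
```

After the checkpoint, crash recovery only needs to replay records written after the checkpoint record.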
WAL filenames
00000001000000000000003E
TLI (timeline ID): 00000001
LSN: Log id 00000000, Seg no 0000003E
• Timeline ID is used to distinguish WAL generated before and after a PITR recovery
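A small sketch of how such a file name breaks down into its three 8-digit hexadecimal fields. The helper names are made up, and the 16 MB segment size is an assumption taken from the 16777216-byte files in the directory listing.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: 16 MB segments, as in the listing


def parse_wal_filename(name):
    """Split a 24-hex-digit segment name into (timeline, log id, segment no)."""
    if len(name) != 24:
        raise ValueError("expected 24 hexadecimal digits")
    tli = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg_no = int(name[16:24], 16)
    return tli, log_id, seg_no


def segment_start_lsn(log_id, seg_no):
    """Format the LSN of the first byte of a segment as 'log/offset' in hex."""
    return f"{log_id:X}/{seg_no * SEGMENT_SIZE:X}"


tli, log_id, seg_no = parse_wal_filename("00000001000000000000003E")
print(tli, log_id, seg_no)
print(segment_start_lsn(log_id, seg_no))
```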
Crash-safety
• Many database actions involve modifying more than one page.
  – Example: Splitting an index page.
• Writing two data pages A and B is not atomic.
• One write must happen before the other.
  – We need to give the operating system and disk controller freedom to reorder the writes; we don't even know which one will happen first.
• We could crash in between.
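To make the problem concrete, here is a toy version of the index page split, assuming a single log record describes the new contents of both pages. The record layout and `apply_split` are invented for the example and are far simpler than real B-tree WAL records.

```python
def apply_split(pages, record):
    """Redo a page split: install every page image described by the record."""
    for page_id, contents in record["pages"].items():
        pages[page_id] = list(contents)


# One log record describes the whole split of page A into pages A and B.
split_record = {"op": "split", "pages": {"A": [1, 2], "B": [3, 4]}}

disk = {"A": [1, 2, 3, 4]}  # state before the split
wal = [split_record]        # the record is durable before any page write

disk["A"] = [1, 2]          # page A reaches disk ...
# ... and the system crashes before page B is written: keys 3 and 4 are gone.
crashed_state = {page: list(keys) for page, keys in disk.items()}

for record in wal:          # crash recovery replays the log
    apply_split(disk, record)

print(crashed_state)
print(disk)
```

The same replay would repair the pages if B had been written first and A had not, so the order of the two page writes no longer matters.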
WAL Summary
Write-Ahead Logging is critical for:
• Durability
  – Once you commit, your data is safe
• Consistency
  – No corruption on crash
Advanced features
But there's more!
The Write-Ahead Log also allows:
• Online backup
• Point-in-time Recovery
• Replication
WAL Archiving
• Normally, old WAL is deleted at a checkpoint.
• But it can also be archived
• Archive can be anything
  – Directory on another server
  – Tape drive
WAL Archiving
In your postgresql.conf file:
archive_mode = on
wal_level = archive
archive_command = 'cp %p /mnt/walarchive/%f'
Restart the server, and it will copy all WAL files to /mnt/walarchive as they're generated.
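In archive_command, %p stands for the path of the WAL file to archive and %f for its file name alone. The sketch below imitates that substitution to show what command ends up being run; `expand_archive_command` is a stand-in written for this example, not the server's own code, and the example path is an assumption.

```python
def expand_archive_command(template, wal_path):
    """Substitute %p (path of the WAL file) and %f (file name only); %% gives %."""
    file_name = wal_path.rsplit("/", 1)[-1]
    substitutions = {"p": wal_path, "f": file_name, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        is_escape = (
            template[i] == "%"
            and i + 1 < len(template)
            and template[i + 1] in substitutions
        )
        if is_escape:
            out.append(substitutions[template[i + 1]])
            i += 2
        else:
            out.append(template[i])  # ordinary character, copied as-is
            i += 1
    return "".join(out)


command = expand_archive_command(
    "cp %p /mnt/walarchive/%f",
    "pg_xlog/00000001000000000000003A",  # assumed example path
)
print(command)
```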
WAL archive
~$ ls -l /mnt/walarchive/
total 114688
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000001
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000002
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000003
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000004
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000005
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000006
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000007
Online Backup
• Normally, you need to shut down the database while you take a backup
  – or use a filesystem snapshot (e.g. ZFS or a SAN)
• WAL archiving makes that unnecessary
Online Backup
1. SELECT pg_start_backup('label');
2. rsync / tar / whatever
3. SELECT pg_stop_backup();
4. Include all archived WAL files from the archive directory in the backup
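The ordering of the four steps can be sketched as a small driver. The three callables are hypothetical helpers supplied by the caller (for example, one that runs a statement through psql); only pg_start_backup and pg_stop_backup come from the steps above.

```python
def online_backup(run_sql, copy_data_dir, copy_wal_archive, label):
    """Run the four backup steps in order.

    run_sql          -- hypothetical helper that executes one SQL statement
    copy_data_dir    -- hypothetical helper that copies the data directory
    copy_wal_archive -- hypothetical helper that copies the archived WAL files
    """
    # The label is interpolated directly: fine for a sketch,
    # not for untrusted input.
    run_sql(f"SELECT pg_start_backup('{label}')")  # step 1
    try:
        copy_data_dir()                            # step 2
    finally:
        run_sql("SELECT pg_stop_backup()")         # step 3, even if the copy failed
    copy_wal_archive()                             # step 4


steps = []
online_backup(
    run_sql=steps.append,
    copy_data_dir=lambda: steps.append("tar czf ~/backup.tar.gz data"),
    copy_wal_archive=lambda: steps.append("tar czf ~/backup-xlog.tar.gz archivedir/*"),
    label="my first backup",
)
print("\n".join(steps))
```

Calling pg_stop_backup in a finally clause is a design choice of this sketch, so that a failed copy does not leave the server in backup mode.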
Online backup: Step 1
/mnt$ ls -l
total 8
drwxr-xr-x 2 heikki heikki 4096 30.3. 18:11 archivedir
drwx------ 14 heikki heikki 4096 30.3. 18:10 data
/mnt$ psql -c "SELECT pg_start_backup('my first backup')"
 pg_start_backup
-----------------
 0/C000020
(1 row)
Online backup: Step 2
/mnt$ tar czf ~/backup.tar.gz data
/mnt$ ls -l ~/backup.tar.gz
-rw-r--r-- 1 heikki heikki 39670143 30.3. 18:33 /home/heikki/backup.tar.gz
Online backup: Step 3
/mnt$ psql -c "SELECT pg_stop_backup()"
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
 pg_stop_backup
----------------
 0/C0000A8
(1 row)
Online backup: Step 4
/mnt$ tar czf ~/backup-xlog.tar.gz archivedir/*
/mnt$ ls -l /home/heikki/backup*.tar.gz
-rw-r--r-- 1 heikki heikki 39670143 30.3. 18:33 /home/heikki/backup.tar.gz
-rw-r--r-- 1 heikki heikki 41937910 30.3. 18:38 /home/heikki/backup-xlog.tar.gz
Point-in-Time Recovery
postgres=# DROP TABLE customers;
DROP TABLE
Oops!
• Once you've enabled WAL archiving and taken a backup, you can use the archived WAL to restore to any point in time
Point-in-Time Recovery
recovery.conf:
# By default, recovery will rollforward to the end of the WAL log.
# If you want to stop rollforward at a specific point, you
# must set a recovery target.
...
#recovery_target_name = '' # e.g. 'daily backup 2011-01-26'
#
#recovery_target_time = '' # e.g. '2004-07-14 22:39:00 EST'
#
#recovery_target_xid = ''
#
#recovery_target_inclusive = true
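How a time target limits the rollforward can be sketched on a toy list of commits. The function below is an illustration only: `target_time` and `inclusive` stand in for recovery_target_time and recovery_target_inclusive, and the timestamps of the DROP TABLE and the later commit are invented for the example.

```python
from datetime import datetime


def records_to_replay(commits, target_time=None, inclusive=True):
    """Return the commits that recovery would replay.

    commits     -- list of (commit_time, description), in WAL order
    target_time -- stand-in for recovery_target_time; None replays everything
    inclusive   -- stand-in for recovery_target_inclusive
    """
    replayed = []
    for commit_time, description in commits:
        if target_time is not None:
            if inclusive:
                past_target = commit_time > target_time
            else:
                past_target = commit_time >= target_time
            if past_target:
                break  # stop rolling forward here
        replayed.append(description)
    return replayed


commits = [
    (datetime(2012, 3, 30, 19, 8, 10), "INSERT INTO tbl"),
    (datetime(2012, 3, 30, 19, 30, 0), "DROP TABLE customers"),  # invented time
    (datetime(2012, 3, 30, 19, 45, 0), "later work"),            # invented time
]
before_drop = records_to_replay(commits, datetime(2012, 3, 30, 19, 29, 59))
print(before_drop)
```

Choosing a target just before the accidental DROP TABLE brings back the table, at the price of losing everything committed after the target.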
Replication
• You can transmit WAL as it is generated to a standby server
  – Replication!
• This can be done using the WAL archive, or by streaming directly from the server over a TCP connection
• Standby server can be used for read-only queries (Hot Standby)
Thank you
Write-Ahead Logging makes it possible for a database to maintain consistency. It also allows many advanced features: online backup, PITR and replication.
Questions?