backup and recovery strategy - percona and recovery strategy. ... § mysql enterprise backup §...
TRANSCRIPT
Backup and Recovery Strategy
About Stacy
▪ 10+ years of experience on various flavors of relational databases.▪ Focus on performance tuning, code reviews, database deployment
and infrastructure management for MySQL▪ In her spare time, she enjoys reading books and doing some
volunteer work.
About Kushal
▪ 4+ years of experience on relational databases.▪ Focus on infrastructure management for MySQL and Oracle RAC,
code reviews, troubleshooting and capacity planning▪ In spare time, like to play indoor and outdoor games
MySQL at Yahoo
▪ MySQL powers many mission-critical products at Yahoo▪ Different configurations based on product requirements ▪ Products: Yahoo Sports, Flickr, Finance, Tumblr, Daily Fantasy, Ad
System, YDS▪ MySQL Flavor: Percona Server 5.5 to 5.6 including XtraDB Cluster▪ Operating systems : RHEL 6.x
What are the next 50 minutes about?
§ Backups• XtraBackup and its options
§ Recovery• File based• GTID enabled• MTS without GTID
§ Backup Monitoring
§ Multi Source Replication Setup
Backups
§ Why we need backup?
MySQL Database Backup Method
• Hot Backup§ mysqldump§ MySQL Enterprise Backup§ Xtrabackup§ File System
•Cold Backup§ File System
Why go for XtraBackup
XtraBackup is an open-source hot backup utility§ Optimized
● Uses posix_fadvise() not to cache data in memory● Perform more aggressive read-ahead optimizations on the source files
§ Reliable● Checks for page corruption on each page with InnoDB’s buf_page_is_corrupted() function.
§ No interruption for writes
Xtrabackup ImprovementsPercona MySQL 5.6.16-64+ use backup locks instead of “FLUSH TABLES WITH READ LOCK “It is a light-weight lock.● Block updates to non-transactional tables and
DDL statements for all tables● New backup lock will not flush tables
Storage engines are not forced to close tables and tables are not expelled from the table cache. ● Only waits for conflicting statements to complete (i.e. DDL and updates to non-
transactional tables). It never waits for SELECTs, or UPDATEs to InnoDB tables to complete
https://www.percona.com/doc/percona-server/5.6/management/backup_locks.html
XtraBackup
PRESENTED BY Brian Koch ⎪ March 2014
• Records the log sequence number(LSN) when it starts• Copies all InnoDB files which is internally in inconsistent state• Read transaction log files to create its own change log file• Backup locks
o Use LOCK TABLES FOR BACKUP to copy non-InnoDB fileso Use Lock BINLOG FOR BACKUP to block operation and gets binary log coordinates
• Apply xtrabackup logs to make data files consistent state
Yahoo MySQL Database Architecture
▪Dual masters single writer setup▪Each master carries one or more slave
VIP for Write
Prod master BCP master
Prod slave BCP slave
VIP for Read-only
Backup server
Current Backup Strategy
PRESENTED BY Brian Koch ⎪ March 2014
• Full backup on Sunday and incremental backup from Monday to Saturday• The backups are stored in external storage volume• The backup job runs on dedicated slave or prod slave • The binary logs are saved periodically
Prod masterDedicated
slaveXtrabackup job Backup filer
Copy binary logs
§ xtrabackup.addl_args: --slave-info§ xtrabackup.backup_dir: /mnt/backup/§ xtrabackup.cron_bkup_ret_hour: 0
§ xtrabackup.cron_bkup_ret_min: 0§ xtrabackup.cron_bkup_ret_wday: 1
§ xtrabackup.retain_extra_days: 14§ xtrabackup.cron_day: *§ xtrabackup.cron_hour: 7
§ xtrabackup.cron_min: 0§ xtrabackup.cron_wday: 0
§ xtrabackup.incremental: --incremental§ xtrabackup.cron_incremental_day: *§ xtrabackup.cron_incremental_hour: 7
§ xtrabackup.cron_incremental_min: 0§ xtrabackup.cron_incremental_wday: 1-6
§ xtrabackup.crontab: on§ xtrabackup.email: page-oncall-email§ xtrabackup.backup_vol_threshold: 80
§ xtrabackup.backup_user: root§ xtrabackup.backup_password: <path to pw file>
Xtrabackup Wrapper
Xtrabackup OptionsOptions Comments
slave-info Useitwhentakingbackupisfromtheslave.Recorditsmaster'sbinarylogfilenameanditspositioninthefilextrabackup_slave_info
parallel Itspecifiesthenumberofthreadsthextrabackup childprocessshouldusetobackupfilesconcurrently,speedupbackupprocess
compress Itinstructsxtrabackup tocompressbackupcopiesofInnoDB datafiles.
use-memory Speeduptheapplylogprocess
safe-slave-backup
stopslaveSQLthreadbeforerunningFLUSHTABLESWITHREADLOCKandwaittostartbackupuntilSlave_open_temp_tables inSHOWSTATUSiszero.
incremental Instructsxtrabackup totakeincrementalbackupinsteadoffullbackup
Xtrabackup with Network Streaming
PRESENTED BY Brian Koch ⎪ March 2014
Target mkdir /home/tmp/backup
screen -S backup
nc -l 9999 | tar -ixf - -C /home/tmp/backup
Source screen -S backup
sudo innobackupex --user=root –password=** --slave-info --stream=tar ./ | nc <target> 9999
Target The backups will be in target:/home/tmp/backup
sudo innobackupex --apply-log --use-memory=16G --parallel=8 /home/tmp/backup
sudo innobackupex --move-back /home/tmp/backupsudo chown -R mysql:mysql /home/y/…/mysqlstart mysql_server
Backup Option: --compress
PRESENTED BY Brian Koch ⎪ March 2014
§ Saved backup space 2/3§ Take more time to complete backup§ Take less time if multiple compress-threads are used
Database Server
Full backup size
Compressed size
Full bkuptime
Compressed bkup time
Compress-thread
db1 1.3 TB 360 GB 220 mins 255 mins 1
db2 1.3 TB 360 GB 220 mins 180 mins 8
Database Restore Comparison
PRESENTED BY Brian Koch ⎪ March 2014
DB Backup size
Transfer backup to target server
Decompress Apply log
Total time
Space saved
Time saved
Full backup 1.3TB 210 mins 0 15 mins
225 mins
Full compressed backup
360GB 75 mins 65 mins 15 mins
155 mins
~70% ~30%
The Issues Could Happen
PRESENTED BY Brian Koch ⎪ March 2014
• The database that has a lot of partitions could run into Error number 24 means 'Too many open files’
Added “ulimit -n 65535“ in the backup script
• Transaction logs could be smallxtrabackup: error: log block numbers mismatch:xtrabackup: error: expected log block no. 98032273, but got no. 102226569 from the log file.xtrabackup: error: it looks like InnoDB log has wrapped around before xtrabackup could process all records
due to either log copying being too slow, or log files being too small.xtrabackup: Error: xtrabackup_copy_logfile() failed.
Increase transaction log size or take backup from off-peak time.
CAVEATS
PRESENTED BY Brian Koch ⎪ March 2014
• The backup folder should be dedicated.If it contains other folders or other files, the incremental backup may not be right.
• The backup purging time and backup time shall be different.
• Information_schema is not backed up. So the table create time stamp could be lost.
CAVEATS --- continued
▪ eventmaster : status is ENABLED slave: status is SLAVESIDE_DISABLED
▪ Master and slave are NOT sameTaking backup from master to rebuild slave, event is enabled causes issue
Create event cross all database servers
▪ Login to each serverset sql_log_bin=0;create event event_name ….BEGINIF @@global.read_only=0 THEN
call del_test_data();end if;
END
Ensures master and slave identical
Backup and Recovery Strategy -- continue
Take Database Backup
PRESENTED BY Brian Koch ⎪ March 2014
0 12 * * 1 root find /backup/dbbk -maxdepth 1 -type d -ctime +14 | xargs rm -rf0 0 * * 0 root /home/y/bin/innobackupex_y0 0 * * 1-6 root /home/y/bin/innobackupex_y --incremental
Point-Time-Recovery
PRESENTED BY Brian Koch ⎪ March 2014
Assume there are compressed backups:full backup: /mnt/backup/saca/2016-04-07_05-05-01
Incremental backup: /mnt/backup/saca/2016-04-08_05-00-00
1. Transfer backups to the restore server2. Decompress the backups first§ sudo innobackupex /home/syuan/backup/2016-04-07_05-05-01 --use-memory=32G --decompress --parallel=16
§ sudo innobackupex /home/syuan/backup/2016-04-08_05-00-00 --use-memory=8G --decompress --parallel=8
3. Apply logs§ sudo innobackupex --apply-log --redo-only /home/syuan/backup/2016-04-07_05-05-01 --use-memory=32G --
parallel=16§ sudo innobackupex --apply-log --redo-only /home/syuan/backup/2016-04-07_05-05-01 --incremental-
dir=/home/syuan/backup/2016-04-08_05-00-00 --use-memory=8G --parallel=8 § sudo innobackupex --apply-log /home/syuan/backup/2016-04-07_05-05-01 --use-memory=32G --parallel=16
Point-Time-Recovery - continued
PRESENTED BY Brian Koch ⎪ March 2014
4. Move the backups into right directorysudo innobackupex --move-back /home/syuan/backup/2016-04-07_05-05-01
5. Start database● sudo chown -R mysql:mysql /home/y/var/mysql
● yinst start mysql_server
6. Update root password
7. mysqlbinlog-stop-datetime
-stop-position
Slave Rebuild
PRESENTED BY Brian Koch ⎪ March 2014
Get the replication coordinates● If the backup is taken from the master, check the file xtrabackup_binlog_info in the backup folder
● If the backup is from slave, please check the file xtrabackup_slave_info
Set up the replicationCHANGE MASTER TO
MASTER_HOST='master-host-name.yahoo.com',
MASTER_USER='replicant',
MASTER_PASSWORD='pass-word',
MASTER_LOG_FILE='mysqld-bin.004464',
MASTER_LOG_POS=49402644;
START SLAVE;
SHOW SLAVE STATUS\G
Slave Rebuild - continued
PRESENTED BY Brian Koch ⎪ March 2014
The binary logs that is needed have been purged.
1. Transfer the needed binary logs to the restore server2. Use mysqlbinlog utility to apply them to the database.
mysqlbinlog --start-position=49402644 mysqld-bin.004464 mysqld-bin.004465 .......| mysql -u root –p
Caution:If data contains BLOB, please use binary-modeUsing one db connection to process all binary logs to avoid temporary table issue.
Alternative Way to Apply Binary Logs
PRESENTED BY Brian Koch ⎪ March 2014
● Set up a fake master or binlog server● Copy all necessary binary logs to the fake master● Make sure the system recognize all these binary logs
Edit file names in mysqld-bin.indexRestart MySQL
● Set up replications between this master and restore server● Change master to
………
Advantages
PRESENTED BY Brian Koch ⎪ March 2014
Much safer and robust.
● Recover speed is much faster20% faster comparing with using mysqlbinlog utility
● Best way for partial replication database to be recoveredAdd ignore, replicate-do-db in my.cnf, start replication
Clone Replication Slave
PRESENTED BY Brian Koch ⎪ March 2014
● Requirements for my.cnfrelay_log_info_repository = TABLEmaster_info_repository = TABLErelay_log_recovery = 1
● Xtrabackup option slave-infoTake xtrabackup from the sever, restore it in target serverStart slave;The replication is automatically enabled.
Start Slave Issues
PRESENTED BY Brian Koch ⎪ March 2014
• Start slave failsError log: Slave failed to initialize relay log info structure from the repository
Explanation: Relay log info is messed up.
reset slave all;change master to…start slave;
Start Slave Issues - GTIDEnable Replication in Slave Issue
mysql> SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533';ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
How to fix▪ RESET MASTER;
▪ SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533’;
▪ CHANGE MASTER TO MASTER_HOST="mastername", master_user='rep_user', master_password='rep_password', MASTER_AUTO_POSITION = 1;
▪ START SLAVE;
Special backup case – MTS without GTID enabled
PRESENTED BY Brian Koch ⎪ March 2014
● Can not add --slave-info in MTS without GTID.● How to get the consistent backup
1. Normal way: STOP SLAVE; START SLAVE UNTIL SQL_AFTER_MTS_GAP;2. The alternative way: Backup the database as normal
Recovery:
1. The table slave_worker_info records each worker detail, the required relay log from backup server or corresponding mysql binary logs can be found from its master.2. START SLAVE UNTIL SQL_AFTER_MTS_GAPS; to fill all gaps,3. Then direct new slave to the existing master as usual.
PITR– MTS without GTID enabled
PRESENTED BY Brian Koch ⎪ March 2014
Follow previous instructions to restore the database backup.
mysql> select ID,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info;
+----+------------------------------------------------+---------------+-------------------+----------------+
| ID | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos |
+----+------------------------------------------------+---------------+-------------------+----------------+
| 1 | /home/y/var/mysql/data/mysqld-relay-bin.000187 | 946879583 | mysqld-bin.000059 | 946879419 |
| 2 | /home/y/var/mysql/data/mysqld-relay-bin.000187 | 947931963 | mysqld-bin.000059 | 947931799 |
| 3 | /home/y/var/mysql/data/mysqld-relay-bin.000187 | 947515423 | mysqld-bin.000059 | 947515259 |
+----+------------------------------------------------+---------------+-------------------+----------------+
3 rows in set (0.00 sec)
Please copy mysqld-relay-bin.000187 from the backup slave to the restore server.
PITR– MTS without GTID enabled --Continued
PRESENTED BY Brian Koch ⎪ March 2014
If the relay log file exists1. Edit mysqld-relay-bin.index with this new relay log file.
2. Edit file permission chown mysql:mysql mysqld-relay-bin.000187 Example:
[root@bf1-trfdb-001 data]# cat mysqld-relay-bin.index/home/y/var/mysql/data/mysqld-relay-bin.000001/home/y/var/mysql/data/mysqld-relay-bin.000187
3. Restart mysqld to make the system recognized the new relay log
PITR– MTS without GTID enabled --Continued
PRESENTED BY Brian Koch ⎪ March 2014
Use mysql binary log which is from the master1. Rename mysqld-bin.000059 to mysqld-relay-bin.000002
Edit mysqld-relay-bin.index with this new relay log file.
2. Edit file permission chown mysql:mysql mysqld-relay-bin.000002
Example:
[root@bf1-trfdb-001 data]# cat mysqld-relay-bin.index/home/y/var/mysql/data/mysqld-relay-bin.000001
/home/y/var/mysql/data/mysqld-relay-bin.000002
PITR– MTS without GTID enabled --Continued
PRESENTED BY Brian Koch ⎪ March 2014
3. update mysql.slave_worker_infoset relay_log_name= ‘…./mysqld-relay-bin.000002',relay_log_pos=master_log_pos;mysql> select ID,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info;
+----+------------------------------------------------+---------------+-------------------+----------------+
| ID | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos |+----+------------------------------------------------+---------------+-------------------+----------------+| 1 | /home/y/var/mysql/data/mysqld-relay-bin.000002 | 946879419 | mysqld-bin.000059 | 946879419 |
| 2 | /home/y/var/mysql/data/mysqld-relay-bin.000002 | 947931799 | mysqld-bin.000059 | 947931799 || 3 | /home/y/var/mysql/data/mysqld-relay-bin.000002 | 947515259 | mysqld-bin.000059 | 947515259 |
+----+------------------------------------------------+---------------+-------------------+----------------+
PITR– MTS without GTID enabled --Continued
PRESENTED BY Brian Koch ⎪ March 2014
4. Fill all holes.START SLAVE UNTIL SQL_AFTER_MTS_GAPS;mysql> select ID,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info;
+----+----------------+---------------+-----------------+----------------+
| ID | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos |
+----+----------------+---------------+-----------------+----------------+
| 1 | | 0 | | 0 |
| 2 | | 0 | | 0 |
| 3 | | 0 | | 0 |
+----+----------------+---------------+-----------------+----------------+
3 rows in set (0.00 sec)
5. Change master to ….; start slave;
PITR– MTS without GTID enabled --Continued
PRESENTED BY Brian Koch ⎪ March 2014
TABLE mysql.slave_relay_log_info is NOT updatedrelay_log_recovery=1 starts auto recovery
Relay_Log_File: mysqld-relay-bin.000001 Relay_Log_Pos: 4
Replication is screwed!!
https://bugs.mysql.com/bug.php?id=81385
MySQL Backup Monitoring
PRESENTED BY Brian Koch ⎪ March 2014
Email alert if backup failsBackup dashboard
Multi-source Replication
PRESENTED BY Brian Koch ⎪ March 2014
● Enable replication slave to receive transactions from multiple sources simultaneously● Merge table shards, data aggregation for reporting purpose● Back up one database server instead of of different multitenant clusters.● Automate database restore from prod to non-prod.
master1
master2
master3
slave
Binlog channel 1
Binlog channel 2
Binlog channel 3
Binlogs
Multi-source Replication
PRESENTED BY Brian Koch ⎪ March 2014
● Pre-condition in target slave:master-info-repository=TABLE relay-log-info-repository=TABLEslave is MySQL57
● Replication topology can be (GTID) based or binary log position-based● Change master to master_host=‘master1’,… FOR CHANNEL 'master-1';● Start slave FOR CHANNEL 'master-1';
Stop slave FOR CHANNEL 'master-1';Show slave status FOR CHANNEL 'master-1’\G;
SELECT * FROM performance_schema.replication_connection_status\G
Multi-source Replication -setup
PRESENTED BY Brian Koch ⎪ March 2014
• Source: MySQL 5.6 Target: MySQL 5.7• Source: take xtrabackup using --slave-info in the master, transfer the backup to the slave
generate a file contains create database/table sql statement without data• Target: innobackupex --apply-log --export /home/backup
Load database structure sql script in the slaveGenerate “ALTER TABLE TABLENAME DISCARD|IMPORT TABLESPACE;” sql scripts
Target: Run drop tablespace scriptCopy backups(*.ibd,*.exp, *.cfg) into right database foldersRun import tablespace scriptGet the coordinates from xtrabackup_slave_info to setup replication
Multi-source Replication -setup
PRESENTED BY Brian Koch ⎪ March 2014
Target slave:Reset master;SET GLOBAL gtid_purged=……change master to
master_host=’master1', master_port=3306, master_user='rep',master_password='pass' ,MASTER_AUTO_POSITION = 1for channel 'ch1_master1’;
start slave for channel 'ch1_master1';
Multi-source Replication - Issues
PRESENTED BY Brian Koch ⎪ March 2014
Pre-check:Any temporal time columns Any antelope tables
Rebuild all those tables
Issuemysql> alter table application IMPORT TABLESPACE; ERROR 1808 (HY000): Schema mismatch (Table flags don't match, server table has 0x6 and the meta-data file has 0x1)Drop table and recreate table with ROW_FORMAT=compactCopy backup and import this tablespace
Multi-source Replication - restrictions
PRESENTED BY Brian Koch ⎪ March 2014
1. All tables should have the newer format for timestamp columns2. innodb_file_per_table should be used3. All files should be in Barracuda format4. Database name has to be different crossing multiple masters
Summary
▪ Percona Xtrabackup Options▪ Issues and Cautions▪ Point-time recovery using different ways▪ Backup/Recovery for MTS without GTID▪ Database backup monitoring▪ Multi-Source Replication Setup
QA