Getting InnoDB compression ready for Facebook scale
TRANSCRIPT
InnoDB Compression: Getting it ready for Facebook scale
Nizam Ordulu [email protected], Engineer, database engineering @ Facebook, 4/11/12
Why use compression
▪ Save disk space.
▪ Buy fewer servers.
▪ Buy better disks (SSD) without too much increase in cost.
▪ Reduce IOPS.
[Chart: Database Size]
[Chart: IOPS]
Sysbench Benchmarks
Sysbench: Default table schema for sysbench
CREATE TABLE `sbtest` (
`id` int(10) unsigned NOT NULL auto_increment,
`k` int(10) unsigned NOT NULL default '0',
`c` char(120) NOT NULL default '',
`pad` char(60) NOT NULL default '',
PRIMARY KEY (`id`),
KEY `k` (`k`)
);
In-memory benchmark: Configuration
▪ Buffer pool size = 1G.
▪ 16 tables.
▪ 250K rows in each table.
▪ Uncompressed db size = 1.1G.
▪ Compressed db size = 600M.
▪ In-memory benchmark.
▪ 16 threads.
In-memory benchmark: Load time
[Chart: load time in seconds, 0-80, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed]
In-memory benchmark: Database size after load
[Chart: database size in MB, 0-1200, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed]
In-memory benchmark: Transactions per second for reads (oltp.lua, read-only)
[Chart: transactions per second, 0-8000, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed]
In-memory benchmark: Inserts per second (insert.lua)
[Chart: inserts per second, 0-60000, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed; fb-mysql-compressed is 4X mysql-compressed]
IO-bound benchmark for inserts: Inserts per second (insert.lua)
[Chart: inserts per second, 0-60000, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed; fb-mysql-compressed is 3.8X mysql-compressed]
InnoDB Compression
InnoDB Compression: Basics
▪ 16K pages are compressed to 1K, 2K, 4K, or 8K blocks.
▪ Block size is specified during table creation.
▪ 8K is safest if data is not too compressible.
▪ BLOBs and VARCHARs increase compressibility.
▪ In-memory workloads may require a larger buffer pool.
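One way to see how much buffer pool memory the compressed copies occupy is the INNODB_CMPMEM table, which reports the buddy-allocator usage per block size (this table ships with the stock InnoDB plugin and MySQL 5.5+; the column set may vary slightly by version):

SELECT page_size,
       pages_used,
       pages_free,
       relocation_ops
FROM information_schema.INNODB_CMPMEM;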
InnoDB Compression: Example
CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
InnoDB Compression: Page Modification Log (mlog)
▪ InnoDB does not recompress a page on every update.
▪ Updates are appended to the modification log.
▪ The mlog is located at the bottom of the compressed page.
▪ When the mlog is full, the page is recompressed.
InnoDB Compression: Page Modification Log Example
InnoDB Compression: Compression failures are bad
▪ Compression failures:
▪ waste CPU cycles,
▪ cause mutex contention.
InnoDB Compression: Unzip LRU
▪ A compressed block is decompressed when it is read.
▪ The compressed and uncompressed copies are both kept in memory.
▪ Any update on the page is applied to both copies.
▪ When it is time to evict a page:
▪ Evict only the uncompressed copy if the system is IO-bound.
▪ Evict a page from the normal LRU if the system is CPU-bound.
InnoDB Compression: Compressed pages written to redo log
▪ Compressed pages are written to the redo log.
▪ Reasons for doing this:
▪ Reuse redo logs even if the zlib version changes.
▪ Guard against nondeterminism in compression.
▪ Cost: increased redo log writes.
▪ Cost: increased checkpoint frequency.
InnoDB Compression: Official advice on tuning compression
If the number of “successful” compression operations (COMPRESS_OPS_OK) is a high percentage of the total number of compression operations (COMPRESS_OPS), then the system is likely performing well. If the ratio is low, then InnoDB is reorganizing, recompressing, and splitting B-tree nodes more often than is desirable. In this case, avoid compressing some tables, or increase KEY_BLOCK_SIZE for some of the compressed tables. You might turn off compression for tables that cause the number of “compression failures” in your application to be more than 1% or 2% of the total. (Such a failure ratio might be acceptable during a temporary operation such as a data load).
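The ratio the manual describes can be read per block size from the INNODB_CMP table (available since the InnoDB plugin / MySQL 5.5); for example:

SELECT page_size,
       compress_ops,
       compress_ops_ok,
       100.0 * compress_ops_ok / compress_ops AS ok_pct
FROM information_schema.INNODB_CMP
WHERE compress_ops > 0;

A low ok_pct for a given page_size suggests raising KEY_BLOCK_SIZE or turning compression off for the tables using that block size.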
Facebook Improvements
Facebook Improvements: Finding bugs and testing new features
▪ Expanded mtr test suite with crash-recovery and stress tests.
▪ Simulate compression failures.
▪ Fixed the bugs revealed by the tests and production servers.
Facebook Improvements: Table level compression statistics
▪ Added the following columns to table_statistics:
▪ COMPRESS_OPS,
▪ COMPRESS_OPS_OK,
▪ COMPRESS_USECS,
▪ UNCOMPRESS_OPS,
▪ UNCOMPRESS_USECS.
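A hypothetical query against these columns might look as follows; the column names come from the slide, but table_statistics is an fb-mysql/Percona patch, so its exact location and schema depend on the build:

-- Assumed: table_statistics exposed via information_schema in fb-mysql.
SELECT table_schema, table_name,
       compress_ops, compress_ops_ok, compress_usecs,
       uncompress_ops, uncompress_usecs
FROM information_schema.table_statistics
ORDER BY compress_usecs DESC
LIMIT 10;

This ranks tables by time spent compressing, which points at the tables worth uncompressing or padding more aggressively.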
Facebook Improvements: Removal of compressed pages from redo log
▪ Removed compressed page images from redo log.
▪ Introduced a new log record for compression.
Facebook Improvements: Adaptive padding
▪ Put less data on each page to prevent compression failures.
▪ pad = 16K – (maximum data size allowed on the uncompressed copy)
▪ Algorithm to determine the pad per table:
▪ Increase the pad until the compression failure rate reaches the specified level.
▪ Decrease padding if the failure rate is too low.
▪ Adapts to the compressibility of data over time.
Facebook Improvements: Adaptive padding on insert benchmark
▪ Padding value for sbtable is 2432.
▪ Compression failure rate:
▪ mysql: 41%.
▪ fb-mysql: 5%.
[Chart: inserts per second, 0-35000, for mysql-compressed and fb-mysql-compressed]
Facebook Improvements: Compression ops in insert benchmark
[Chart: compress_ops_ok and compress_ops_fail counts, 0-1400000, for mysql-compressed and fb-mysql-compressed]
Facebook Improvements: Time spent for compression ops in insert benchmark
[Chart: compress and decompress time in seconds, 0-1200, for mysql-compressed and fb-mysql-compressed]
Facebook Improvements: Other improvements
▪ Reduced empty space in allocated pages from 10-15% to 2-5%.
▪ Cache memory allocations for:
▪ compression buffers,
▪ decompression buffers,
▪ buffer page descriptors.
▪ Hardware accelerated checksum for compressed pages.
▪ Remove adler32 calls from zlib functions.
Facebook Improvements: Future work
▪ Make page_zip_compress() more efficient.
▪ Test larger page sizes: 32K, 64K.
▪ Prefix compression.
▪ Other compression algorithms: Snappy, QuickLZ, etc.
▪ 3X compression in production.
Questions