MySQL Innovation Work: InnoSQL
TRANSCRIPT
![Page 2: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/2.jpg)
About Me
7+ years working with different databases: SQL Server, MySQL, Oracle
Now working at the Netease Development and Research Center Lab on MySQL kernel development
Author of <<Inside MySQL: InnoDB Storage Engine>> and <<Inside MySQL: SQL Programming>> (coming soon, 2012.3)
![Page 3: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/3.jpg)
What is InnoSQL
A new MySQL branch
Open source
High performance (flash cache)
Ease of use
Fully compatible with the original MySQL
Collects creative ideas for MySQL and makes them happen
MySQL Innovation Works: http://www.innomysql.org
![Page 4: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/4.jpg)
InnoSQL Features
Flash Cache for InnoDB: provides higher performance than just using SSD as durable storage
Shared memory (SHM) for the InnoDB Buffer Pool: quick warm-up of the InnoDB buffer pool, in less than 1 second
InnoDB IO Statistics: get each SQL statement's physical and logical reads
Page Cleaner Thread: removes blocking in the user query thread
![Page 5: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/5.jpg)
InnoSQL Flash Cache
InnoSQL Flash Cache: uses SSD as a cache
Other flash cache solutions: Facebook flash cache, Oracle flash cache, Secondary Buffer Pool for InnoDB (InnoSQL 5.5.8)
![Page 6: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/6.jpg)
Facebook Flash Cache
A general solution
Open source: https://github.com/facebook/flashcache
Integrates with file systems; built using the Linux Device Mapper
Not optimized for databases
Good for read-intensive workloads
Worse for write-intensive workloads
Needs time to warm up
![Page 7: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/7.jpg)
Oracle Flash Cache
Works for Oracle 11g
Page writes to the flash cache are slow (not so aggressive)
Needs warm-up
![Page 8: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/8.jpg)
Secondary Buffer Pool
Supported in InnoSQL 5.5.8
Good for read-intensive workloads
Also not good for write-intensive workloads such as TPC-C
Can warm up the database at startup, but this makes each start slow
The cache is not persistent storage
![Page 9: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/9.jpg)
Why is warm-up needed?
Capacity: SSD >> Memory
Speed: SSD << Memory
![Page 10: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/10.jpg)
Flash Cache in InnoSQL 5.5.13
Can cache both read and write operations
Sequential writes on SSD, no random writes
Merge write
The cache is persistent
![Page 11: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/11.jpg)
Why not use SSD as durable storage?
SSD is good for random reads: 7000+ IOPS, versus 100 ~ 150 IOPS for disk
SSD life cycle
SSD write performance: writes are per page, but wipes are per extent (128~256 pages)
The database is not fully optimized for SSD: read-ahead algorithm, 512-byte aligned writes for the log file, random writes
![Page 12: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/12.jpg)
Why use SSD as Cache
Cache is everywhere
Register, L1 cache, L2 cache, L3 cache, Memory (volatile)
Disk, Tape (non-volatile)
SSD
![Page 13: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/13.jpg)
Question: are you using your SSD as volatile or non-volatile storage?
![Page 14: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/14.jpg)
Analysis
If SSD is used as durable storage
  Non-volatile
  But the database is not yet fully optimized for it
If the Secondary Buffer Pool or Oracle Flash Cache is used
  Volatile
  Performance degrades: pages must be written twice (flash cache and durable storage)
If Facebook flash cache is used
  Volatile or non-volatile, depending on the cache mode: writethrough, writearound, writeback
  Performance degrades: pages must still be written twice, but with some optimizations
  Not fully optimized for databases
![Page 15: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/15.jpg)
Cache in MySQL InnoDB: InnoDB Buffer Pool
Caches pages
Page operations are asynchronous: pages are read in the buffer pool first and modified in the buffer pool first, then written to disk by a fuzzy or sharp checkpoint
Needs a log manager for recovery
A larger buffer pool gives better performance, because of the speed gap between disk and memory
However, we cannot get enough memory to cache the whole database
![Page 16: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/16.jpg)
Cache in MySQL InnoDB: Insert Buffer
The insert buffer is a B+ tree
MySQL version < 4.1.x: one insert buffer tree per table, with records of the form (page_no, fields_type_info, actual record)
MySQL version >= 4.1: only one insert buffer tree, with records of the form (space_id, one-byte-marker, page_no, fields_type_info, actual record), indexed by (space_id, page_no)
Works for non-unique secondary indexes
Writes go to the insert buffer if the page is not in the buffer pool
An insert buffer bitmap page tracks the free space of each page (2 bits per page)
Merge write: delaying page writes raises write performance, but increases read operations
MySQL 5.5: Change Buffer (insert, purge, delete mark)
![Page 17: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/17.jpg)
InnoDB Insert Buffer
mysql> show engine innodb status\G;
*************************** 1. row ***************************
Status:
=====================================
090922 11:52:51 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 15 seconds
……
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 2249, free list len 3346, seg size 5596,
374650 inserts, 51897 merged recs, 14300 merges
Hash table size 4980499, node heap has 1246 buffer(s)
1640.60 hash searches/s, 3709.46 non-hash searches/s
size: used pages; free list len: free pages; seg size = size + free list len + 1
merged recs : merges = insert buffer efficiency
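The two relations on this slide can be checked against the sample output. The sketch below is only an illustration: the function names are hypothetical, and the regular expression assumes the `Ibuf:` line has exactly the layout shown above.

```python
import re

# Ibuf line copied from the SHOW ENGINE INNODB STATUS output on this slide.
IBUF_LINE = ("Ibuf: size 2249, free list len 3346, seg size 5596, "
             "374650 inserts, 51897 merged recs, 14300 merges")


def parse_ibuf(line):
    """Extract the insert buffer counters from an 'Ibuf:' status line."""
    pattern = (r"size (\d+), free list len (\d+), seg size (\d+),\s+"
               r"(\d+) inserts, (\d+) merged recs, (\d+) merges")
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("not an Ibuf status line")
    keys = ("size", "free_list_len", "seg_size",
            "inserts", "merged_recs", "merges")
    return dict(zip(keys, map(int, match.groups())))


def merge_efficiency(stats):
    """Records merged per merge operation (merged recs : merges)."""
    return stats["merged_recs"] / stats["merges"]


stats = parse_ibuf(IBUF_LINE)
# seg size = size + free list len + 1
print(stats["size"] + stats["free_list_len"] + 1 == stats["seg_size"])
print(round(merge_efficiency(stats), 2))
```

For this sample each merge applies between three and four buffered records on average, which is the saving the insert buffer is after.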
![Page 18: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/18.jpg)
Cache in MySQL InnoDB
Cache can increase performance
Delays write operations
Speed gap between disk and cache
However, there is another cache in InnoDB: the doublewrite
![Page 19: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/19.jpg)
What is Doublewrite?
Doublewrite avoids the partial write problem: a 512-byte write is always OK, but a 16K write is not
Doublewrite buffer: 2M
Doublewrite file: 2M, in the shared tablespace (ibdata1)
![Page 20: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/20.jpg)
Doublewrite Architecture
Stores all data twice, first to the doublewrite buffer, and then to the actual data files
Can be disabled with --skip-innodb_doublewrite
mysql> show global status like 'innodb_dbl%'\G;
************** 1. row ************************
Variable_name: Innodb_dblwr_pages_written
Value: 152362
************** 2. row ************************
Variable_name: Innodb_dblwr_writes
Value: 1465
2 rows in set (0.00 sec)
![Page 21: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/21.jpg)
Doublewrite Features
Size: 2M
All pages must be written here first
Sequential write
Cache write
Hence, what about having a 100G or 300G doublewrite? This is what makes the flash cache happen
![Page 22: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/22.jpg)
Flash Cache in InnoSQL 5.5.13
Replaces the original doublewrite
Users can now have a large doublewrite
Page writes are sequential, which suits SSD write characteristics
The doublewrite can now be read, which uses SSD random read performance
Caches both read and write operations
Persistent cache
Merge write: 60 ~ 70% in workloads like TPC-C
Supports AIO reads on the flash cache (not supported in the Secondary Buffer Pool)
![Page 23: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/23.jpg)
Flash Cache Architecture
![Page 24: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/24.jpg)
Flash Cache Data Structure
/** Flash cache block struct */
struct trx_flashcache_block_struct{
unsigned space:32; /*!< tablespace id */
unsigned offset:32; /*!< page number */
unsigned fil_offset:32; /*!< flash cache page number */
unsigned state:2; /*!< flash cache state*/
trx_flashcache_block_t* hash; /*!< hash chain */
};
Four states: BLOCK_NOT_USED, BLOCK_READY_FOR_FLUSH, BLOCK_READ_CACHE, BLOCK_FLUSHED
![Page 25: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/25.jpg)
Flash Cache Data Structure
struct trx_flashcache_struct{
	mutex_t fc_mutex; /*!< mutex protecting flash cache */
	hash_table_t* fc_hash; /*!< hash table of flash cache pages */
	ulint fc_size; /*!< flash cache size */
	ulint write_off; /*!< write to flash cache offset */
	ulint flush_off; /*!< flush to disk this offset */
	ulint write_round; /* write round */
	ulint flush_round; /* flush round */
	trx_flashcache_block_t* block; /* flash cache block */
	byte* read_buf_unalign; /* unaligned read buf */
	byte* read_buf; /* read buf */
};
![Page 26: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/26.jpg)
From the Developer's Perspective
[Figure: the flash cache file is a sequence of flash cache blocks; an in-memory flash cache hash table is used for lookup; writes go to write_offset and flushes start at flush_offset; the flash cache log file records write_offset, flush_offset, write_round and flush_round]
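How the pieces in this picture fit together can be sketched with a toy model. This is not InnoSQL code: the class and method names are hypothetical, a Python dict stands in for the in-memory hash table, and the rule that a block is skipped at flush time when a newer copy of the same page exists is an assumption based on the merge-write feature described elsewhere in this talk.

```python
class FlashCacheModel:
    """Toy model of a circular, log-structured flash cache (not InnoSQL code)."""

    def __init__(self, fc_size):
        self.fc_size = fc_size            # number of blocks in the cache file
        self.blocks = [None] * fc_size    # slot -> ((space, offset), data)
        self.fc_hash = {}                 # (space, offset) -> newest slot
        self.write_off = self.flush_off = 0
        self.write_round = self.flush_round = 0

    def pending(self):
        """Number of blocks written but not yet flushed to disk."""
        rounds_ahead = self.write_round - self.flush_round
        return rounds_ahead * self.fc_size + self.write_off - self.flush_off

    def write(self, space, offset, data):
        """Append a page at write_off; writes are always sequential."""
        if self.pending() == self.fc_size:
            raise RuntimeError("cache full: flush before writing more pages")
        old = self.blocks[self.write_off]
        if old is not None and self.fc_hash.get(old[0]) == self.write_off:
            del self.fc_hash[old[0]]      # the overwritten slot held the newest copy
        self.blocks[self.write_off] = ((space, offset), data)
        self.fc_hash[(space, offset)] = self.write_off
        self.write_off += 1
        if self.write_off == self.fc_size:
            self.write_off, self.write_round = 0, self.write_round + 1

    def lookup(self, space, offset):
        """Return the newest cached copy of a page, or None on a miss."""
        slot = self.fc_hash.get((space, offset))
        return None if slot is None else self.blocks[slot][1]

    def flush_one(self, disk):
        """Flush the block at flush_off to `disk` (a dict standing in for data files)."""
        if self.pending() == 0:
            return False
        key, data = self.blocks[self.flush_off]
        if self.fc_hash.get(key) == self.flush_off:
            disk[key] = data              # only the newest copy reaches disk
        self.flush_off += 1
        if self.flush_off == self.fc_size:
            self.flush_off, self.flush_round = 0, self.flush_round + 1
        return True
```

The round counters are what let the model tell a full cache from an empty one when write_off and flush_off are equal, which is presumably why both offsets and both rounds are kept in the log file.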
![Page 27: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/27.jpg)
Flash Cache Flush Algorithms
Flushes pages in the flash cache to disk
Takes over the flush from the master thread
Flushing runs in the flash cache background thread
Algorithm
  Less than innodb_flash_cache_write_cache_pct (default 10): no flush
  Less than innodb_flash_cache_do_full_io_pct (default 90): flush at 10% of innodb_io_capacity
  Else: flush at 100% of innodb_io_capacity
  If idle: flush at 100% of innodb_io_capacity
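The thresholds above amount to a small decision function. The sketch below is a hypothetical helper, not the InnoSQL implementation. It assumes the compared quantity is the percentage of the flash cache holding pages not yet flushed to disk, and that the idle case overrides the other rules; the slide states neither explicitly.

```python
def flash_cache_flush_pages(pending_pct, io_capacity, idle=False,
                            write_cache_pct=10, do_full_io_pct=90):
    """Return how many pages to flush in this round of the background thread.

    pending_pct     -- assumed meaning: percent of the cache awaiting flush
    io_capacity     -- value of innodb_io_capacity
    write_cache_pct -- innodb_flash_cache_write_cache_pct (default 10)
    do_full_io_pct  -- innodb_flash_cache_do_full_io_pct (default 90)
    """
    if idle:
        return io_capacity            # idle server: flush at full capacity
    if pending_pct < write_cache_pct:
        return 0                      # keep pages cached, no flush
    if pending_pct < do_full_io_pct:
        return io_capacity // 10      # gentle flush: 10% of io capacity
    return io_capacity                # cache nearly full: 100% of io capacity


for pct in (5, 50, 95):
    print(pct, flash_cache_flush_pages(pct, io_capacity=200))
```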
![Page 28: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/28.jpg)
Merge Write in Flash Cache
[Figure: flash cache blocks (7,7) (2,6) (0,6) (3,7) …… (3,7) (2,6) (4,8), with write_offset and flush_offset markers]
Pages (2,6) and (3,7) can be merged
This is much like the insert buffer: it delays write operations
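A rough way to see the saving: count how many of the cached writes are repeats of a page that appears again in the log. The sketch uses only the blocks visible on the slide (the elided middle part is left out), and it assumes the merge ratio is the share of cached writes that never need to reach disk; the talk does not define the ratio formally.

```python
def merge_write_stats(pending_pages):
    """Return (disk_writes, merged_writes, merge_ratio) for a list of
    (space_id, page_no) entries waiting between the two offsets."""
    total = len(pending_pages)
    disk_writes = len(set(pending_pages))   # one disk write per distinct page
    merged = total - disk_writes            # repeated copies are merged away
    ratio = merged / total if total else 0.0
    return disk_writes, merged, ratio


# Blocks visible in the figure; the "……" part of the slide is omitted.
visible = [(7, 7), (2, 6), (0, 6), (3, 7), (3, 7), (2, 6), (4, 8)]
print(merge_write_stats(visible))
```

Here seven cached writes turn into five disk writes, because (2,6) and (3,7) each appear twice.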
![Page 29: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/29.jpg)
Flash Cache Benchmark
Sysbench OLTP: read intensive
TPC-C: write intensive
Blogbench: blog-like, application oriented; developed by Netease
![Page 30: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/30.jpg)
Sysbench OLTP
InnoDB Buffer Pool: 6G; DB Size: 19G; innodb_flush_method = O_DIRECT; innodb_flush_log_at_trx_commit = 1
![Page 31: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/31.jpg)
TPC-C
SSD: 3607.183 Tpm; Flash Cache: 7230.05 Tpm; Merge Write Ratio: 65.47%
InnoDB Buffer Pool: 12G; DB Size: 39G; innodb_flush_method = O_DIRECT; innodb_flush_log_at_trx_commit = 1; Flash Cache: 100G
![Page 32: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/32.jpg)
Blogbench
InnoDB Buffer Pool: 4G; DB Size: 21G; innodb_flush_method = O_DIRECT; innodb_flush_log_at_trx_commit = 1; Merge write ratio: 60%
![Page 33: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/33.jpg)
Conclusion
Flash Cache works for both read and write workloads
Works better than using SSD as durable storage
Optimized for SSD in the database kernel: no additional writes in the flash cache, merge write support
![Page 34: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/34.jpg)
SHM for InnoDB Buffer Pool
Uses shared memory to allocate the InnoDB buffer pool
Why use shared memory? To speed up warm-up
How fast is warm-up otherwise? Random reads run at 10~20M/sec, so a 30G buffer pool needs 30~60 minutes
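The estimate can be reproduced with simple arithmetic. The helper below is hypothetical and assumes "G" and "M" mean GiB and MiB, and that the whole buffer pool is filled at the quoted random-read rate.

```python
def warmup_minutes(buffer_pool_gib, read_mib_per_sec):
    """Minutes needed to fill the buffer pool at a given read throughput."""
    seconds = buffer_pool_gib * 1024 / read_mib_per_sec
    return seconds / 60


for rate in (10, 20):
    print(rate, "MiB/s:", round(warmup_minutes(30, rate), 1), "minutes")
```

Under these assumptions a 30G pool takes about 26 to 51 minutes, in line with the rough 30~60 minutes quoted on the slide.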
![Page 35: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/35.jpg)
Warm-up Methods
Use SQL to warm up
  SELECT count(*) FROM tbl_name FORCE INDEX (PRIMARY)
  Warm-up reads become sequential reads
  But this cannot restore the database to its previous workload environment
Dump the buffer pool to a file
  Supported in MySQL 5.6+
  Warm-up reads become sequential reads
  Restores the database to its previous workload environment
  The dump file is big
  What if the database crashes?
![Page 36: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/36.jpg)
Warm-up Methods: Percona Server
Exports the (space_id, page_no) pairs in the LRU list to a file
Loads this file ordered by (space_id, page_no) at MySQL startup, to make the reads sequential
Restores the database to its previous workload environment
Still needs a long time to warm up if you have a big buffer pool: 128G, 256G
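Why sorting helps can be shown with a small sketch. The function names and the sample page list are made up for illustration and are not Percona Server code: sorting the dumped pairs turns scattered page reads into a few runs of consecutive pages.

```python
def load_order(lru_pages):
    """Dumped (space_id, page_no) pairs in ascending order, without duplicates."""
    return sorted(set(lru_pages))


def count_runs(pages):
    """Number of runs of consecutive pages; each run can be read sequentially."""
    runs, prev = 0, None
    for space_id, page_no in pages:
        if prev is None or prev != (space_id, page_no - 1):
            runs += 1
        prev = (space_id, page_no)
    return runs


# Hypothetical LRU dump: pages appear in access order, not disk order.
lru_dump = [(5, 90), (0, 3), (5, 88), (0, 4), (5, 89)]
print(count_runs(lru_dump), "runs in LRU order")
print(count_runs(load_order(lru_dump)), "runs after sorting")
```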
![Page 37: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/37.jpg)
Warm-up in InnoSQL
Uses shared memory: --innodb_use_shm_preload=1
Shared memory is configured as for Oracle: /proc/sys/kernel/shmmax, /proc/sys/kernel/shmall
Warm-up takes less than 1 second: all pages are already in memory
![Page 38: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/38.jpg)
SHM for InnoDB Buffer Pool
# list shared memory info
innosql@db-62:~$ ipcs -a
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0008c231 4653056 innosql 600 549715968 0
------ Semaphore Arrays --------
key semid owner perms nsems
------ Message Queues --------
key msqid owner perms used-bytes messages
# remove shared memory
innosql@db-62:~$ ipcrm -m 4653056
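The segment size reported by ipcs can be turned into minimum kernel settings. On Linux, shmmax is the maximum size of a single segment in bytes and shmall is the system-wide total in pages. The helper below is hypothetical; it assumes a 4096-byte page size and that the buffer pool lives in a single segment, as the listing above suggests.

```python
def required_shm_settings(segment_bytes, page_size=4096):
    """Smallest shmmax (bytes) and shmall (pages) that fit one segment."""
    shmall_pages = -(-segment_bytes // page_size)   # ceiling division
    return {"shmmax": segment_bytes, "shmall": shmall_pages}


SEGMENT_BYTES = 549715968   # "bytes" column of the ipcs output above
print(required_shm_settings(SEGMENT_BYTES))
print(SEGMENT_BYTES / 2**20, "MiB")
```

Real systems need headroom for other shared memory users, so these are lower bounds rather than recommended values.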
![Page 39: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/39.jpg)
InnoDB IO Statistics
Gets read IO statistics, like SET STATISTICS IO ON in SQL Server
InnoSQL implements it in the slow query log, for both file and table output
Helps SQL developers: 10 reads may not be good in an OLTP application
Helps DBAs: know a statement's real IO statistics, not only the time it consumes
Still in development; you can preview this feature
![Page 40: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/40.jpg)
InnoDB IO Statistics
# Time: 111103 13:29:06
# User@Host: root[root] @ localhost [::1]
# Query_time: 119.293823 Lock_time: 119.274822 Rows_sent: 1 Rows_examined: 1 Logical_reads: 198 Physical_reads: 3
use tpcc;
SET timestamp=1320298146;
select * from warehouse where w_id=1;
# Time: 111103 13:31:28
# User@Host: root[root] @ localhost [::1]
# Query_time: 0.335019 Lock_time: 0.333019 Rows_sent: 1 Rows_examined: 1 Logical_reads: 164 Physical_reads: 50
SET timestamp=1320298288;
select * from history;
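The extra fields are easy to pull out of the log with a script. The sketch below is a hypothetical helper that assumes the `Logical_reads` and `Physical_reads` fields appear exactly as in the sample entries above. The physical share it computes is an illustrative metric, not something InnoSQL reports.

```python
import re

# The two header lines carrying IO statistics, copied from the sample above.
SLOW_LOG = """\
# Query_time: 119.293823 Lock_time: 119.274822 Rows_sent: 1 Rows_examined: 1 Logical_reads: 198 Physical_reads: 3
# Query_time: 0.335019 Lock_time: 0.333019 Rows_sent: 1 Rows_examined: 1 Logical_reads: 164 Physical_reads: 50
"""

IO_FIELDS = re.compile(r"Logical_reads: (\d+)\s+Physical_reads: (\d+)")


def io_stats(log_text):
    """Return a list of (logical_reads, physical_reads) per logged statement."""
    return [(int(logical), int(physical))
            for logical, physical in IO_FIELDS.findall(log_text)]


def physical_share(logical, physical):
    """Fraction of logical reads that had to go to disk."""
    return physical / logical if logical else 0.0


for logical, physical in io_stats(SLOW_LOG):
    print(logical, physical, round(physical_share(logical, physical), 3))
```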
![Page 41: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/41.jpg)
Configuration
long_query_time
io_slow_query
slow_query_type: 0 = long_query_time, 1 = io_slow_query, 2 = both
![Page 42: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/42.jpg)
Page Cleaner Thread
Pages are flushed in the master thread: adaptive flush, IO capacity
Problem: the master thread has a lot to cope with, and async flush can block the user query thread
Page cleaner thread: supported in MySQL 5.6; InnoSQL supports it in MySQL 5.5; can also help with FLUSH_LRU_LIST flushing
![Page 43: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/43.jpg)
Flush Algorithms in InnoDB
checkpoint_age: current_lsn - checkpoint_lsn
async_water_mark: ~78% * Log_Group_Size
sync_water_mark: ~90% * Log_Group_Size
For example, with log file size 1G and 2 log files: async_water_mark = ~1.5G, sync_water_mark = ~1.8G
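The example figures follow directly from the two percentages. The function below is a hypothetical helper that takes the approximate 78% and 90% ratios from the slide at face value; the exact ratios inside InnoDB differ slightly.

```python
def water_marks(log_file_size_gib, n_log_files,
                async_ratio=0.78, sync_ratio=0.90):
    """Return (async_water_mark, sync_water_mark) in GiB for a log group."""
    log_group_size = log_file_size_gib * n_log_files
    return log_group_size * async_ratio, log_group_size * sync_ratio


async_mark, sync_mark = water_marks(1, 2)
print(round(async_mark, 2), round(sync_mark, 2))
```

With two 1G log files the marks land at about 1.56G and 1.8G, matching the rounded figures on the slide.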
![Page 44: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/44.jpg)
Flush Algorithms in InnoDB
checkpoint_age < async_water_mark: adaptive flushing, 5% of innodb_io_capacity
async_water_mark < checkpoint_age < sync_water_mark: async flush, blocks one user query thread
checkpoint_age > sync_water_mark: sync flush, blocks all user query threads
n_dirty_pages > innodb_max_dirty_pages_pct: flush at innodb_io_capacity
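These rules can be read as one decision function. The sketch below is hypothetical and simplified: the order in which the rules are checked, the handling of values exactly on a water mark, and the return format are assumptions, and the slide gives no page count for async and sync flushes, so None is returned there.

```python
def choose_flush(checkpoint_age, async_water_mark, sync_water_mark,
                 dirty_pct, max_dirty_pct, io_capacity):
    """Return (flush_kind, pages_to_flush) for the current state."""
    if checkpoint_age > sync_water_mark:
        return ("sync", None)          # blocks all user query threads
    if checkpoint_age > async_water_mark:
        return ("async", None)         # blocks one user query thread
    if dirty_pct > max_dirty_pct:
        return ("max_dirty", io_capacity)
    return ("adaptive", io_capacity * 5 // 100)


# Water marks from the 2 x 1G log file example, in GiB.
print(choose_flush(1.0, 1.56, 1.8, dirty_pct=10, max_dirty_pct=75, io_capacity=200))
print(choose_flush(1.6, 1.56, 1.8, dirty_pct=10, max_dirty_pct=75, io_capacity=200))
print(choose_flush(1.9, 1.56, 1.8, dirty_pct=10, max_dirty_pct=75, io_capacity=200))
```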
![Page 45: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/45.jpg)
Page Cleaner Thread
Reduces the master thread's burden
Async flush moves to this background thread
No blocking happens in the user query thread
![Page 46: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/46.jpg)
However
Flushing does not only happen in the master thread
FLUSH_LRU_LIST: checks whether at least 64 pages can be used
In this situation, the flush happens almost entirely in the user query thread, so adaptive flush and innodb_io_capacity do not help
InnoSQL also moves this flush to the page cleaner thread
MySQL 5.6 does not support this
Still needs more optimization
![Page 47: My sql innovation work -innosql](https://reader037.vdocuments.net/reader037/viewer/2022110119/555c2557d8b42a09438b4c15/html5/thumbnails/47.jpg)
Q & A