![Page 1: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/1.jpg)
Transactional Storage for MySQL
FAST. RELIABLE. PROVEN.
InnoDB Internals: InnoDB File Formats and Source Code Structure
Heikki Tuuri
CEO Innobase Oy
Vice President, Development
Oracle Corporation
MySQL Conference, April 2009
Calvin Sun
Principal Engineer
Oracle Corporation
![Page 2: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/2.jpg)
Today’s Topics
• Goals of InnoDB
• Key Functional Characteristics
• InnoDB Design Considerations
• InnoDB Architecture
• InnoDB On Disk Format
• Source Code Structure
• Q & A
![Page 3: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/3.jpg)
Goals of InnoDB
• OLTP oriented
• Performance, Reliability, Scalability
• Data Protection
• Portability
![Page 4: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/4.jpg)
InnoDB Key Functional Characteristics
• Full transaction support
• Row-level locking
• MVCC
• Crash recovery
• Efficient IO
![Page 5: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/5.jpg)
Design Considerations
• Modeled on Gray & Reuter’s “Transaction Processing: Concepts and Techniques”
• Also emulated the Oracle architecture
• Added unique subsystems
  • Doublewrite
  • Insert buffering
  • Adaptive hash index
• Designed to evolve with changing hardware & requirements
![Page 6: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/6.jpg)
InnoDB Architecture
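[Figure: layered architecture diagram. The Server and Applications reach InnoDB through the Handler API and the Embedded InnoDB API. The internal components shown are: Transaction, Cursor / Row, B-tree, Lock, Mini-transaction, Page, Buffer, File Space Manager, and IO.]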
![Page 7: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/7.jpg)
InnoDB On Disk Format
• InnoDB Database Files
• InnoDB Tablespaces
• InnoDB Pages / Extents
• InnoDB Rows
• InnoDB Indexes
• InnoDB Logs
• File Format Design Considerations
![Page 8: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/8.jpg)
InnoDB Database Files
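[Figure: InnoDB database files in the MySQL data directory]
• ibdata files: the system tablespace, holding the internal data dictionary, undo logs, the insert buffer, and InnoDB tables
• OR, with innodb_file_per_table: InnoDB tables stored in .ibd files
• .frm files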
![Page 9: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/9.jpg)
InnoDB Tablespaces
• A tablespace consists of multiple files and/or raw disk partitions.
  • Data file specification syntax: file_name:file_size[:autoextend[:max:max_file_size]]
• A file/partition is a collection of segments.
• A segment consists of fixed-length pages.
• The page size is always 16KB in uncompressed tablespaces, and 1KB-16KB in compressed tablespaces (for both data and index).
![Page 10: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/10.jpg)
System Tablespace
• Internal Data Dictionary
• Undo
• Insert Buffer
• Doublewrite Buffer
• MySQL Replication Info
![Page 11: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/11.jpg)
InnoDB Tablespaces
[Figure: storage hierarchy inside a tablespace]
• Tablespace: contains segments (leaf node segment, non-leaf node segment, rollback segment)
• Segment: made up of extents
• Extent: an extent = 64 pages
• Page: holds rows
• Row: Trx id, Roll pointer, Field pointers, Field 1, Field 2, ... Field n
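The extent size on this slide combines with the 16KB page size from the tablespace slide into simple arithmetic. The following is a minimal sketch, assuming an uncompressed tablespace; the macro and function names are made up for illustration and are not taken from the InnoDB source.

```c
#include <stdint.h>

/* Values taken from the slides: 16KB uncompressed pages, 64 pages per extent. */
#define DEMO_PAGE_SIZE_BYTES  (16u * 1024u)
#define DEMO_PAGES_PER_EXTENT 64u

/* Size of one extent in bytes. */
static uint64_t extent_size_bytes(void)
{
    return (uint64_t)DEMO_PAGE_SIZE_BYTES * DEMO_PAGES_PER_EXTENT;
}

/* Index of the extent that contains a given page number (hypothetical helper). */
static uint32_t extent_of_page(uint32_t page_no)
{
    return page_no / DEMO_PAGES_PER_EXTENT;
}

/* Position of the page inside its extent, in the range 0..63 (hypothetical helper). */
static uint32_t page_pos_in_extent(uint32_t page_no)
{
    return page_no % DEMO_PAGES_PER_EXTENT;
}
```

With these values one extent covers 1 MB of the file, and page 130 falls at position 2 of extent 2.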
![Page 12: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/12.jpg)
InnoDB Pages
InnoDB Page Types

| Symbol | Value | Notes |
|---|---|---|
| FIL_PAGE_INODE | 3 | File segment inode |
| FIL_PAGE_INDEX | 17855 | B-tree node |
| FIL_PAGE_TYPE_BLOB | 10 | Uncompressed BLOB page |
| FIL_PAGE_TYPE_ZBLOB | 11 | 1st compressed BLOB page |
| FIL_PAGE_TYPE_ZBLOB2 | 12 | Subsequent compressed BLOB page |
| FIL_PAGE_TYPE_SYS | 6 | System page |
| FIL_PAGE_TYPE_TRX_SYS | 7 | Transaction system page |
| others | | Insert buffer bitmap, insert buffer free list, file space header, extent descriptor page, newly allocated page |
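The page type codes listed on this slide can be written down as a C enum with a small lookup. This is only a sketch: the symbol names and values come from the slide, while `page_type_note` is a hypothetical helper, not an InnoDB function.

```c
/* Page type codes as listed on the slide. */
enum fil_page_type {
    FIL_PAGE_INODE        = 3,
    FIL_PAGE_TYPE_SYS     = 6,
    FIL_PAGE_TYPE_TRX_SYS = 7,
    FIL_PAGE_TYPE_BLOB    = 10,
    FIL_PAGE_TYPE_ZBLOB   = 11,
    FIL_PAGE_TYPE_ZBLOB2  = 12,
    FIL_PAGE_INDEX        = 17855
};

/* Hypothetical helper: map a page type code to the note from the slide. */
static const char *page_type_note(unsigned type)
{
    switch (type) {
    case FIL_PAGE_INODE:        return "File segment inode";
    case FIL_PAGE_INDEX:        return "B-tree node";
    case FIL_PAGE_TYPE_BLOB:    return "Uncompressed BLOB page";
    case FIL_PAGE_TYPE_ZBLOB:   return "1st compressed BLOB page";
    case FIL_PAGE_TYPE_ZBLOB2:  return "Subsequent compressed BLOB page";
    case FIL_PAGE_TYPE_SYS:     return "System page";
    case FIL_PAGE_TYPE_TRX_SYS: return "Transaction system page";
    default:                    return "other";
    }
}
```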
![Page 13: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/13.jpg)
InnoDB Pages
A page consists of a page header, a page trailer, and a page body (rows or other contents).
[Figure: page layout. The page header comes first, followed by the rows; the row offset array sits at the end of the body, just before the page trailer.]
![Page 14: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/14.jpg)
Page Declares
typedef struct /* a space address */
{
    ulint pageno;   /* page number within the file */
    ulint boffset;  /* byte offset within the page */
} fil_addr_t;

/* Conceptual layout of a page; the last two members form the page
   trailer, which follows the variable-length data area */
typedef struct
{
    ulint      checksum;        /* checksum of the page (since 4.0.14) */
    ulint      page_offset;     /* page offset inside space */
    fil_addr_t previous;        /* offset or fil_addr_t */
    fil_addr_t next;            /* offset or fil_addr_t */
    dulint     page_lsn;        /* lsn of the end of the newest
                                   modification log record to the page */
    PAGE_TYPE  page_type;       /* file page type */
    dulint     file_flush_lsn;  /* the file has been flushed to disk
                                   at least up to this lsn */
    int        space_id;        /* space id of the page */
    char       data[];          /* will grow */
    ulint      end_page_lsn;    /* the last 4 bytes of page_lsn */
    ulint      end_checksum;    /* page checksum, or checksum magic, or 0 */
} PAGE;
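To make the header part of this declaration concrete, here is a minimal sketch of reading those fields out of a raw page buffer. The byte offsets and the big-endian encoding are assumptions that follow the field order above and should be checked against fil0fil.h; `page_header_view` and `parse_page_header` are hypothetical names, and prev/next are treated here as 4-byte page numbers.

```c
#include <stdint.h>

/* Assumed byte offsets of the header fields within a page. */
enum {
    OFF_CHECKSUM  = 0,   /* 4 bytes */
    OFF_PAGE_NO   = 4,   /* 4 bytes */
    OFF_PREV      = 8,   /* 4 bytes */
    OFF_NEXT      = 12,  /* 4 bytes */
    OFF_LSN       = 16,  /* 8 bytes */
    OFF_TYPE      = 24,  /* 2 bytes */
    OFF_FLUSH_LSN = 26,  /* 8 bytes */
    OFF_SPACE_ID  = 34   /* 4 bytes */
};

/* Decoded header fields (hypothetical struct for illustration). */
struct page_header_view {
    uint32_t checksum;
    uint32_t page_no;
    uint32_t prev;
    uint32_t next;
    uint64_t lsn;
    uint16_t type;
    uint32_t space_id;
};

/* Read big-endian integers from a byte buffer. */
static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
    return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode the header fields; the buffer must hold at least 38 bytes. */
static struct page_header_view parse_page_header(const unsigned char *page)
{
    struct page_header_view h;
    h.checksum = read_be32(page + OFF_CHECKSUM);
    h.page_no  = read_be32(page + OFF_PAGE_NO);
    h.prev     = read_be32(page + OFF_PREV);
    h.next     = read_be32(page + OFF_NEXT);
    h.lsn      = read_be64(page + OFF_LSN);
    h.type     = read_be16(page + OFF_TYPE);
    h.space_id = read_be32(page + OFF_SPACE_ID);
    return h;
}
```

A caller would read one page from a data file into a buffer and pass it to `parse_page_header`; a `type` of 17855 would then indicate a B-tree node.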
![Page 15: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/15.jpg)
InnoDB Compressed Pages
• InnoDB keeps a “modification log” in each page
• Updates & inserts of small records are written to the log without page reconstruction; deletes don’t even require uncompression
• Log also tells InnoDB if the page will compress to fit page size
• When log space runs out, InnoDB uncompresses the page, applies the changes and recompresses the page
[Figure: compressed page layout. Regions shown: page header, compressed data, modification log, empty space, BLOB pointers, page directory, page trailer.]
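The modification-log behaviour described on this slide can be modelled with a few lines of bookkeeping. This is a toy sketch, not the page0zip implementation: it only tracks how full the log is, assumes recompression always succeeds, and every name in it is hypothetical.

```c
#include <stddef.h>

/* Toy model of a compressed page: tracks only the modification log fill level. */
struct zip_page_model {
    size_t   log_capacity;    /* bytes reserved for the modification log */
    size_t   log_used;        /* bytes of the log already filled */
    unsigned recompressions;  /* times the page had to be rebuilt */
};

/* Record a change of rec_size bytes.
   Returns 1 if the change fit into the modification log.
   Returns 0 if log space ran out, in which case the model counts one
   uncompress / apply / recompress cycle and starts with an empty log. */
static int zip_page_log_change(struct zip_page_model *p, size_t rec_size)
{
    if (rec_size <= p->log_capacity - p->log_used) {
        p->log_used += rec_size;
        return 1;
    }
    p->recompressions++;
    p->log_used = 0;
    return 0;
}
```

The point of the design is that the cheap path (appending to the log) is taken most of the time, and the expensive rebuild happens only when the log fills up.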
![Page 16: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/16.jpg)
InnoDB Rows
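Row layout: Record hdr | Trx ID | Roll ptr | Fld ptrs | overflow-page ptr .. | Field values
• COMPACT format: a 768-byte prefix of a long column stays in the row, followed by a pointer to the overflow page
• DYNAMIC format: the row keeps only a 20-byte pointer to the overflow page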
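The difference between the two row formats comes down to how many bytes of a long column stay inside the B-tree record. Below is a minimal sketch using the 768-byte prefix and 20-byte figures from this slide; it assumes the column has already been chosen for overflow storage and that the 20-byte pointer is the same size in both formats. The enum and function names are hypothetical.

```c
#include <stddef.h>

enum demo_row_format { DEMO_ROW_COMPACT, DEMO_ROW_DYNAMIC };

#define DEMO_OVERFLOW_PTR_BYTES   20u   /* pointer to the overflow page */
#define DEMO_COMPACT_PREFIX_BYTES 768u  /* column prefix kept in the row by COMPACT */

/* Bytes that remain in the record for one column stored on an overflow page. */
static size_t in_row_bytes_for_overflow_column(enum demo_row_format fmt)
{
    if (fmt == DEMO_ROW_COMPACT) {
        return DEMO_COMPACT_PREFIX_BYTES + DEMO_OVERFLOW_PTR_BYTES;
    }
    return DEMO_OVERFLOW_PTR_BYTES;
}
```

Under these assumptions a COMPACT row spends 788 bytes per overflowing column and a DYNAMIC row spends 20, which is why DYNAMIC fits more rows with large columns on one page.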
![Page 17: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/17.jpg)
InnoDB Indexes - Primary
●Data rows are stored in the B-tree leaf nodes of a clustered index
●B-tree is organized by primary key or non-null unique key of table, if defined; else, an internal column with 6-byte ROW_ID is added.
[Figure: Primary Index. A clustered (primary key) index B-tree. The root covers PK values 001 - nnn; the non-leaf nodes cover the ranges 001 - 500, 500 - 800, and 801 - nnn; the leaf nodes hold the key ranges 001-275, 276-500, 501-630, 631-768, 769-800, 801-949, 950-xxx, ..., xxx-nnn. Each leaf node holds key values (for example 501-630) + data for corresponding rows.]
![Page 18: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/18.jpg)
InnoDB Indexes - Secondary
● Secondary index B-tree leaf nodes contain, for each key value, the primary keys of the corresponding rows, which are used to access the clustered index to obtain the data
[Figure: Secondary Index. Two secondary index B-trees, each with key values A - Z and leaf nodes containing PKs, point into the clustered (primary key) index, which has PK values 001 - nnn and leaf nodes containing data.]
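The two-step lookup through a secondary index can be sketched with two sorted arrays standing in for the leaf levels of the two B-trees. This is an illustration of the idea only; the table contents, struct names, and `lookup_by_name` are all made up, and binary search over an array replaces real B-tree traversal.

```c
#include <stdlib.h>
#include <string.h>

struct demo_row  { int pk; const char *name; const char *city; };
struct sec_entry { const char *key; int pk; };

/* Clustered index leaf level: full rows, sorted by primary key. */
static const struct demo_row clustered[] = {
    {1, "zoe", "Oslo"}, {2, "ann", "Rome"}, {3, "max", "Lima"}
};

/* Secondary index on name: sorted by key, each entry holds only the PK. */
static const struct sec_entry secondary[] = {
    {"ann", 2}, {"max", 3}, {"zoe", 1}
};

static int cmp_row(const void *key, const void *elem)
{
    int k = *(const int *)key;
    const struct demo_row *r = elem;
    return (k > r->pk) - (k < r->pk);
}

static int cmp_sec(const void *key, const void *elem)
{
    const struct sec_entry *e = elem;
    return strcmp((const char *)key, e->key);
}

/* Step 1: find the PK in the secondary index.
   Step 2: use the PK to fetch the row from the clustered index. */
static const struct demo_row *lookup_by_name(const char *name)
{
    const struct sec_entry *s = bsearch(name, secondary,
        sizeof secondary / sizeof secondary[0], sizeof secondary[0], cmp_sec);
    if (s == NULL) {
        return NULL;
    }
    return bsearch(&s->pk, clustered,
        sizeof clustered / sizeof clustered[0], sizeof clustered[0], cmp_row);
}
```

Every secondary index lookup that needs non-key columns pays for both searches, which is one reason a short primary key keeps secondary indexes compact.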
![Page 19: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/19.jpg)
InnoDB Logging
[Figure: InnoDB logging data flow. Components shown: Buffer Pool, Log Buffer, rollback segments, redo log, rollback data, log thread, write thread, Log File #1 and Log File #2 (log files), and DATA (ibdata files).]
![Page 20: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/20.jpg)
InnoDB Redo Log
Redo log structure: Space id | PageNo | OpCode | Data
[Figure: redo log with these positions marked: start of log, last checkpoint, min LSN, end of log.]
![Page 21: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/21.jpg)
File Format Management
• Built-in InnoDB format: “Antelope”
• New “Barracuda” format enables compression and ROW_FORMAT=DYNAMIC
• Fast index creation and other features do not require the Barracuda file format
• Built-in InnoDB can access “Antelope” databases, but not “Barracuda” databases
• Check file format tag in system tablespace on startup
• Enable a file format with new dynamic parameter innodb_file_format
• Preserves ability to downgrade easily
[Figure: .ibd data files (file per table)]
![Page 22: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/22.jpg)
InnoDB File Format Design Considerations
• Durability
  • Logging, doublewrite, checksum
• Performance
  • Insert buffering, table compression
• Efficiency
  • Dynamic row format, table compression
• Compatibility
  • File format management
![Page 23: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/23.jpg)
Source Code Structure
• 31 subdirectories
• Relevant InnoDB source files on file formats
  • Tablespace: fsp0fsp {.c, .ic, .h}
  • Page: page0page, page0zip {.c, .ic, .h}
  • Log: log0log {.c, .ic, .h}
![Page 24: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/24.jpg)
Source Code Subdirectories
• buf
• data
• db
• dict
• dyn
• eval
• fil
• fsp
• fut
• ha
• handler
• ibuf
• include
• lock
• log
• mach
• mem
• mtr
• os
• page
• pars
• que
• read
• rem
• row
• srv
• sync
• thr
• trx
• usr
• ut
![Page 25: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/25.jpg)
Summary: Durability, Performance, Compatibility & Efficiency
• InnoDB is the leading transactional storage engine for MySQL
• InnoDB’s architecture is well-suited to modern online transactional applications as well as embedded applications
• InnoDB’s file format is designed for high durability, better performance, and easy management
![Page 26: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/26.jpg)
For More Information …
2009 MySQL User Conference
InnoDB Birds of a Feather
Wed 7:30pm
Ballroom C
• Heikki Tuuri: Concurrency Control: How it Really Works, Thurs, 2:50pm
Please visit www.innodb.com, blogs.innodb.com and forums.innodb.com
![Page 27: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/27.jpg)
QUESTIONS & ANSWERS
![Page 28: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/28.jpg)
Embedded
Hot Backup
Plugin
![Page 29: Inno Db Internals Inno Db File Formats And Source Code Structure](https://reader031.vdocuments.net/reader031/viewer/2022013111/54c6d6aa4a7959a4578b45a5/html5/thumbnails/29.jpg)
InnoDB Size Limits
• Max # of tables: 4 G
• Max size of a table: 32TB
• Columns per table: 1000
• Max row size: n*4 GB
  • 8 kB if stored on the same page
  • n*4 GB with n BLOBs
• Max key length: 3500 bytes
• Maximum tablespace size: 64 TB
• Max # of concurrent trxs: 1023
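One way to read the 64 TB tablespace limit is as the product of the number of addressable pages and the page size. The sketch below assumes page numbers are 32 bits wide and uses the 16KB page size from the tablespace slide; `max_tablespace_bytes` is a hypothetical helper for the arithmetic only.

```c
#include <stdint.h>

/* Assumption for illustration: a tablespace can address 2^32 pages. */
static uint64_t max_tablespace_bytes(uint64_t page_size_bytes)
{
    const uint64_t max_pages = (uint64_t)1 << 32;
    return max_pages * page_size_bytes;
}
```

With 16KB pages this gives 2^32 * 2^14 = 2^46 bytes, which is the 64 TB figure on the slide.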