Chapter 5. How InnoDB Stores Variable-Length Columns

Table of Contents

5.1. Overview of InnoDB Row Storage
5.2. Specifying the Row Format for a Table
5.3. Barracuda File Format: DYNAMIC and COMPRESSED Row Formats
5.4. Antelope File Format: COMPACT and REDUNDANT Row Formats

This section discusses how certain InnoDB features, such as table compression and off-page storage of long columns, are controlled by the ROW_FORMAT clause of the CREATE TABLE statement. It discusses considerations for choosing the right row format and compatibility of row formats between MySQL releases.

5.1. Overview of InnoDB Row Storage

The storage for rows and associated columns affects performance for queries and DML operations. As more rows fit into a single disk page, queries and index lookups can work faster, less cache memory is required in the InnoDB buffer pool, and less I/O is required to write out updated values for the numeric and short string columns.

All data in InnoDB is stored in database pages that make up a B-tree index (the clustered index organized according to the primary key columns). Table data and indexes both use this type of structure. The nodes of the index data structure contain the values of all the columns in that row (for the clustered index) or the index columns and the primary key columns (for secondary indexes).

Variable-length columns are an exception to this rule. Columns such as BLOB and VARCHAR that are too long to fit on a B-tree page are stored on separately allocated disk pages called overflow pages. We call such columns off-page column. The values of these columns are stored on singly-linked lists of overflow pages, and each such column has its own list of one or more overflow pages. In some cases, all or a prefix of the long column value is stored in the B-tree, to avoid wasting storage and eliminating the need to read a separate page.

The Barracuda file format provides a new option (KEY_BLOCK_SIZE) to control how much column data is stored in the clustered index, and how much is placed on overflow pages.

5.2. Specifying the Row Format for a Table

You specify the row format for a table with the ROW_FORMAT clause of the CREATE TABLE and ALTER TABLE statements.

5.3. Barracuda File Format: DYNAMIC and COMPRESSED Row Formats

When innodb_file_format is set to Barracuda and a table is created with ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, long column values are stored fully off-page, and the clustered index record contains only a 20-byte pointer to the overflow page.

Whether any columns are stored off-page depends on the page size and the total size of the row. When the row is too long, InnoDB chooses the longest columns for off-page storage until the clustered index record fits on the B-tree page.

The DYNAMIC row format maintains the efficiency of storing the entire row in the index node if it fits (as do the COMPACT and REDUNDANT formats), but this new format avoids the problem of filling B-tree nodes with a large number of data bytes of long columns. The DYNAMIC format is based on the idea that if a portion of a long data value is stored off-page, it is usually most efficient to store all of the value off-page. With DYNAMIC format, shorter columns are likely to remain in the B-tree node, minimizing the number of overflow pages needed for any given row.

The COMPRESSED row format uses similar internal details for off-page storage as the DYNAMIC row format, with additional storage and performance considerations from the table and index data being compressed and using smaller page sizes. For full details about the COMPRESSED row format, see Chapter 3, InnoDB Data Compression.

5.4. Antelope File Format: COMPACT and REDUNDANT Row Formats

Early versions of InnoDB used an unnamed file format (now called Antelope) for database files. With that format, tables were defined with ROW_FORMAT=COMPACT (or ROW_FORMAT=REDUNDANT) and InnoDB stored up to the first 768 bytes of variable-length columns (such as BLOB and VARCHAR) in the index record within the B-tree node, with the remainder stored on the overflow pages.

To preserve compatibility with those prior versions, tables created with the newest InnoDB use the prefix format, unless one of ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED is specified (or implied) on the CREATE TABLE statement.

With the Antelope file format, if the value of a column is 768 bytes or less, no overflow page is needed, and some savings in I/O may result, since the value is in the B-tree node. This works well for relatively short BLOB values, but may cause B-tree nodes to fill with data rather than key values, reducing their efficiency. Tables with many BLOB columns could cause B-tree nodes to become too full of data, and contain too few rows, making the entire index less efficient than if the rows were shorter or if the column values were stored off-page.