Table of Contents
All data in InnoDB is stored in database pages comprising a B-tree index (the so-called clustered index or primary key index). The essential idea is that the nodes of the B-tree contain, for each primary key value (whether user-specified or generated or chosen by the system), the values of the remaining columns of the row as well as the key. In some other database systems, a clustered index is called an “index-organized table”. Secondary indexes in InnoDB are also B-trees, containing pairs of values of the index key and the value of the primary key, which acts as a pointer to the row in the clustered index.
There is an exception to this rule. Variable-length columns (such
as BLOB
and VARCHAR
) that
are too long to fit on a B-tree page are stored on separately
allocated disk (“overflow”) pages. We call these
“off-page columns”. The values of such columns are
stored on singly-linked lists of overflow pages, and each such
column has its own list of one or more overflow pages. In some
cases, all or a prefix of the long column values is stored in the
B-tree, to avoid wasting storage and eliminating the need to read
a separate page.
The new “Barracuda” file format provides a new option
(KEY_BLOCK_SIZE
) to control how much column
data is stored in the clustered index, and how much is placed on
overflow pages.
Previous versions of InnoDB used an unnamed file format (now
called “Antelope”) for database files. With that format, tables
were defined with ROW_FORMAT=COMPACT
(or ROW_FORMAT=REDUNDANT
)
and InnoDB stored up to the first 768 bytes of variable-length
columns (such as BLOB
and
VARCHAR
) in the index record within the B-tree
node, with the remainder stored on the overflow page(s).
To preserve compatibility with those prior versions, tables
created with the InnoDB Plugin use the prefix format, unless one
of ROW_FORMAT=DYNAMIC
or ROW_FORMAT=COMPRESSED
is specified
(or implied) on the CREATE TABLE
command.
With the “Antelope” file format, if the value of a column is not
longer than 768 bytes, no overflow page is needed, and some
savings in I/O may result, since the value is in the B-tree node.
This works well for relatively short BLOB
s, but
may cause B-tree nodes to fill with data rather than key values,
thereby reducing their efficiency. Tables with many
BLOB
columns could cause B-tree nodes to become
too full of data, and contain too few rows, making the entire
index less efficient than if the rows were shorter or if the
column values were stored off-page.
When innodb_file_format
is set to “Barracuda” and a table is
created with ROW_FORMAT=DYNAMIC
or ROW_FORMAT=COMPRESSED
, long
column values are stored fully off-page, and the clustered index
record contains only a 20-byte pointer to the overflow page.
Whether any columns are stored off-page depends on the page size and the total size of the row. When the row is too long, InnoDB chooses the longest columns for off-page storage until the clustered index record fits on the B-tree page.
The DYNAMIC
row format maintains the efficiency of storing the
entire row in the index node if it fits (as do the COMPACT
and
REDUNDANT
formats), but this new format avoids the problem of
filling B-tree nodes with a large number of data bytes of long
columns. The DYNAMIC
format is predicated on the idea that if a
portion of a long data value is stored off-page, it is usually
most efficient to store all of the value off-page. With DYNAMIC
format, shorter columns are likely to remain in the B-tree node,
minimizing the number of overflow pages needed for any given row.
The row format used for a table is specified with the ROW_FORMAT
clause of the CREATE TABLE
and ALTER TABLE
commands. Note that
COMPRESSED
format implies DYNAMIC
format. See
Section 3.2, “Enabling Compression for a Table” for more details on the
relationship between this clause and other clauses of these
commands.