B-tree Storage Structure

B-tree is the keyed storage structure in which data is sorted by value in the key column for fast access on the exact value and range retrievals, and the index is dynamic. It is the most versatile storage structure. The B-tree structure allows for keyed access and supports range searches and pattern matching. The B-tree index is dynamic, growing as the table grows. This eliminates the overflow problems that static structures like ISAM and hash present as they grow. B-tree also allows for maximum concurrent use of the table.

B-tree design incorporates a sparse index that points to pages in a leaf level. The leaf level is a dense index that points to the rows on the data pages in the table. The benefit of this indexing approach is that it minimizes splitting cost: when splitting does occur, the actual data rows need not move. Only the leaf and index levels require reorganization, as described in Index Growth in a B-tree Table.

Structure of a B-tree Table

A B-tree can be viewed as four separate parts:

A free list header page, which is used to keep track of allocated pages that are not currently being used
One or more index pages, which contain leaf page numbers and the range of key values to expect on each leaf page
One or more leaf pages, which identify the data page and row where the data is stored
One or more data pages, where the user data is actually stored

The smallest B-tree has four pages, one of each type.

Note: If a secondary index is modified to B-tree, it cannot contain data pages. Instead, the leaf pages of the secondary index reference the main table's data pages. For more information, see Secondary Indexes.

The index level is similar to the ISAM index, except that the ISAM index points to data pages, whereas the B-tree index level points to leaf pages. The number of index pages is dependent on the width of the key and the number of leaf pages, because eventually the index pages point to a particular leaf page. Usually the index level is small, because it needs to point to only the leaf pages.

The leaf page level is considered a dense index because it tells the location of every row in the table. In dense indexes, rows on data pages do not move during a split; that causes their tids to change. Tids identify every row on every data page. For a complete discussion of tids, see Tids.

The index level is considered a sparse index, because it contains only a key value and a pointer to a page.

The following diagram illustrates the three B-tree levels: index page, leaf page, and data page. It illustrates the relationship between the three levels, but cannot be realistic. In actuality, if the key width name were only 30 characters, the row width were 500 bytes, and there were only 31 employees, this B-tree has only a free list header page, one index page, one leaf page, and 8 data pages (instead of 4 leaf pages and 3 index pages).

+----------------------------------+

| ROOT |

| |

| <= McShane |

INDEX PAGE +----------------------------------+

LEVEL / \

/ \

+-------------------------------------+ +---------------------------------+

| Key Leaf Page | | Key Leaf Page |

| | | |

| <= Giller 1 | | > McShane <= Shigio 3 |

| > Giller <= McShane 2 | | > Shigio 4 |

+-------------------------------------+ +---------------------------------+

LEAF PAGE LEVEL

Leaf Page 1 Leaf Page 2 Leaf Page 3 Leaf Page 4

Aitken 1,0 Gordon 3,0 McTigue 5,0 Smith 7,0

Blumberg 1,1 Green 3,3 Ming 5,1 Stannich 7,1

Brodie 1,3 Gregori 3,2 Ramos 5,2 Stein 7,2

Cameron 1,2 Huber 3,1 Robinson 5,3 Stover 7,3

Clark 2,0 Kay 4,0 Ross 6,0 Sullivan 8,0

Curan 2,1 Kreseski 4,1 Sabel 6,1 Verducci 8,1

Curry 2,2 Mandic 4,2 Saxena 6,2 Zimmerman 8,2

Giller 2,3 McShane 4,3 Shigio 6,3

DATA PAGE LEVEL

+-------------------------------------------+

Page 1 0 |Aitken | 1| 49| 50000.000 |

1 |Blumberg | 2| 33| 32000.000 |

2 |Cameron | 4| 37| 35000.000 |

3 |Brodie | 3| 42| 40000.000 |

|-------------------------------------------|

Page 2 0 |Clark | 5| 43| 40000.000 | Associated Data

1 |Curan | 6| 30| 30000.000 | Page for Leaf

2 |Curry | 7| 34| 32000.000 | Page 1

3 |Giller | 8| 47| 46000.000 |

|-------------------------------------------|

Page 3 0 |Gordon | 9| 28| 27000.000 |

1 |Huber | 12| 35| 32000.000 |

2 |Gregori | 11| 32| 31000.000 |

3 |Green | 10| 27| 26000.000 |

|-------------------------------------------|

Page 4 0 |Kay | 13| 41| 38000.000 | Associated Data

1 |Kreseski | 14| 25| 24000.000 | Page for Leaf

2 |Mandic | 15| 46| 43000.000 | Page 2

3 |McShane | 16| 22| 22000.000 |

|-------------------------------------------|

Page 5 0 +McTigue | 17| 44| 41000.000 +

To look for an employee named Kay, the search starts from the root node, where a name that precedes McShane in the alphabet directs you down the left side of the index.

The index page on the left shows that leaf Page 2 is the appropriate page on which to look, because Kay comes between Giller and McShane in the alphabet.

On leaf Page 2, Kay's record is identified as being on data Page 4, row 0. Going directly to data Page 4, row 0, Kay's record is located.

Associated Data Pages in a B-tree Table

Every leaf page has an associated data page. The associated data page is where new rows are added. A leaf page can actually point to several different pages, but new data is only added to the associated data page. When an associated data page fills up, a new associated data page is attached to the leaf page. If you delete rows that exist on the current associated data page, the deleted space is reused.

Having one associated data page per leaf page provides a good chance for rows with similar key ranges to exist on the same data page, thereby increasing the likelihood that data references occur on the same data page.

Index Growth in a B-tree Table

The major difference between ISAM and B-tree is that the B-tree index grows as the table grows. If you added these five new employees to the ISAM employee table, keyed on name: Zanadu, Zentura, Zilla, Zorro, Zumu, these names are put on the last page of the ISAM table. Because they do not all fit on the last page, they are put onto an overflow page attached to the last page.

If you added these five new employees to the B-tree table, you add the new names to the appropriate leaf page (Page 4, in this case) and their records go on the associated data page for leaf Page 4. Because the associated data page fills up, a new associated data page is assigned to Page 4. If the leaf page is full, and cannot hold all five names, the leaf page splits into two leaf pages, and a reference to the new leaf page in the index is added. If the index page can no longer hold a reference to another leaf page, the index is split as well.

Splitting in a B-tree Table

Splitting occurs fairly frequently while the table is small and growing. As the table gets larger, splitting occurs less frequently (unless a sequential key is used) and usually only in the leaf or lowest index level.

Repeated inserts into the right-most leaf of a B-tree table create empty leaf pages rather than half-full ones. This improves insert and retrieval performance, and increases disk space efficiency.

Locking and B-tree Tables

During normal B-tree traversal, leaf and data pages are logically locked until the end of the transaction. B-tree index pages are only temporarily locked during query execution. The index page lock is released after the page has been searched.

When searching the B-tree index, ladder locking is used: a lock is taken on the first index page, which points to another index page. The next index page is locked and, once it is locked, the first lock is dropped, and so on down the index to the leaf level.

The locking system always locks the leaf and data pages when accessing B-tree tables. Because of this, locking in a B-tree table requires twice as many locks as locking an ISAM or hash table. It is wise to set the maxlocks escalation factor higher than the default when using the B-tree storage structure. For details, see the set lockmode statement in the SQL Reference Guide.

Sorted Order in a B-tree Table

In the diagram in Structure of a B-tree Table, rows for Huber and Green are not in sorted order on the data page. This happens if Huber's record was appended before Green's. They both end up on the same data page, but slightly out of order. This happens in ISAM as well. However, if you tried the following retrieval, you retrieve the rows in sorted order if the employee table was a B-tree. This is because the leaf pages are used to point to the data rows, and the leaf pages maintain a sorted sequence:

select * from employee
where employee.name like 'G%';

The data on the data pages is not guaranteed to be sorted, but the access, which is always through the leaf pages, guarantees that the data is retrieved in sorted order. (This is not true for ISAM.)

Because the leaf entries are in sorted order, the maximum aggregate for a B-tree key does not require a table scan. Instead the index is read backwards.

Deleted Rows in a B-tree Table

If rows are deleted on the associated data page, the space is reused the next time a row is appended to that page. If rows are deleted from a data page that is no longer associated, the space is not reused. If all the rows on a non-associated data page are deleted, the page is immediately added to the free list and becomes available for reuse.

Note: The only way to free up unused data pages completely and return disk space to the operating system is to change the storage structure to B-tree. You can do this using the Modify Table Structure dialog or using the modify statement.

The reason that deleted space on a non-associated data page is not automatically reused is to speed the append operation. Appending to one particular page (the "associated data page") is faster than tracking and checking all the available free space on non-associated data pages; appending to the associated data page also provides better key clustering when data addition occurs in sorted key order. Because appends generally occur more frequently than deletes, preserving the performance of the append operation seems wiser than reusing deleted space from non-associated data pages.

When to Use B-tree

B-tree is the most versatile storage structure, as it supports both exact match and range retrievals and includes a dynamic index, so that frequent remodification is not necessary.

B-tree is a good storage structure to use in any of these cases:

The table is growing at a rapid rate.
You use pattern matching.
You retrieve ranges of key values.
You retrieve using only the leftmost part of a multi-column key.

B-tree is a poor storage structure to use if:

The table is relatively static.
The table is small, static, and access is heavily concurrent.

B-tree Troubleshooting

The following are problems encountered with the B-tree storage structure, and their solutions:

Problem	Solution
You tried to use pattern matching, but did not specify the leftmost character.	Specify the leftmost part of the key; F does not use the B-tree index, but F* does. If you cannot modify the search condition, the entire table must be scanned.
You tried to use just part of a multi-column key, but did not specify the leftmost column.	Specify the leftmost column of the multi-column key. If you cannot modify the search condition, create a secondary index with only the columns on which you are searching.
You are deleting frequently, as well as adding data.	To reclaim space, periodically select Shrink B-tree Index in the Modify Table Structure dialog, or use the modify to merge or modify statements.