Overflow Management

Overflow chains can slow down performance considerably. Overflow must be monitored and prevented as much as possible.

Preventing or reducing overflow requires you to do the following:

Carefully monitor overflow in both primary tables and secondary indexes
Avoid the use of repetitive keys, including both primary keys and secondary index keys
Modify table structure to redistribute poorly distributed overflow
Understand the overflow implications when choosing a particular storage structure

Measure the Amount of Overflow

You can monitor overflow using either VDBA or an SQL statement.

In VDBA, select a table or secondary index in the Database Object Manager window, and click the Pages tab.

In SQL, you can monitor overflow with the help table statement. For more information, see the SQL Reference Guide.

For tables, overflow data is displayed in red in the pie chart, as indicated in the legend. Heap tables are considered as one main page, with an overflow chain attached to the main page. For B-tree tables, overflow occurs only at the leaf level and only with duplicate keys.

The iitables catalog (a view into the iirelation catalog) includes one row for each table in the database. It contains pertinent information for evaluating overflow.

For example, the following query results in the information shown in the table:

select table_name, storage_structure,
number_pages, overflow_pages
from iitables

table_name	storage_structure	number_pages	overflow_pages
manager	hash	22	4
department	B-tree	5	0
parts	B-tree	5	0
orders	heap	3	0

The above figures are approximate; they are updated only when they change by a certain percentage (5%) to prevent performance degradation by continuously updating these catalogs. Also, if transactions that involve many new pages are backed out during a recovery, the page counts cannot be updated. Page counts are guaranteed to be exact only after modification.

In evaluating overflow, if the number of overflow pages is greater than 10-15% of the number of data pages, expect performance degradation. Overflow must be regularly monitored to ensure that performance does not degrade as rows are appended to tables.

Repetitive Key Overflow

Storage structures other than heap that have a high degree of duplication in the key values are likely to have overflow because duplicate keys are stored in overflow pages. Keys with a high degree of duplication are not recommended. This applies to secondary index keys as well as primary keys.

Repetitive key overflow occurs, for example, if the emp table is keyed on sex, resulting in two primary pages for the values "M" and "F." The remainder of the pages are overflow pages to these two primary pages.

Consider if the following query is run:

select * from student
where student.sex = 'F'
and student.name = 'Baker';

The key is used to find the first primary page. The search goes down the entire overflow chain for "F" looking for all names Baker. Every page is checked. Because this query looks restrictive, the locking system probably chooses to page level lock. The query locks 10 pages and eventually escalates to a table level lock. Wait for the table level lock if other users are updating. Finally, the search finishes scanning the overflow chain and returns the row.

Retrieval performance with a duplicate key is still better than for a heap table because only half the table is scanned.

However, update performance suffers. If a user wants to append a new female student, the locking system starts by exclusively locking pages in the "F" overflow chain. If another 10 pages need to be locked eventually, the locking system attempts to escalate to an exclusive table level lock. If only one user is updating the table, the lock is easily obtained. If multiple users are trying to update the table at the same time, deadlock is likely.

User1 and User2 both exclusively hold 10 pages in the table. User1 wants to escalate to an exclusive table level lock so the query can continue, but User1 cannot proceed until User2 drops the exclusive page level locks User2 holds. User2 also wants to obtain an exclusive table level lock, but cannot proceed until User1 releases the locks. This is deadlock, which can seriously degrade update performance. For more information, see the chapter "Understanding the Locking System."

Poorly Distributed Overflow

Overflow that is not uniformly distributed, that is, it is concentrated around one or two primary pages, is poorly distributed. A classic example of poorly distributed overflow occurs when new rows are added to a table with a key that is greater than all the keys that already exist in the table (for example, a time stamp). If this table has an ISAM structure, the table builds up overflow in the last primary page, and all operations involving this overflow chain can exhibit poor performance. This type of table is best stored as a B-tree or hash.

Overflow and ISAM and Hash Tables

In hash and ISAM tables that have had a large amount of data added and have not been remodified, overflow and the resulting performance degradation is easy to understand. A keyed retrieval that normally touches one page now has to look through not only the main data page, but also every overflow page associated with the main data page. For every retrieval, the amount of disk I/O increases as the number of overflow pages increases.

Overflow pages are particular to a main data page for ISAM and hash tables, not to the table itself. If a table has 100 main pages and 100 overflow pages, it is likely that the overflow pages are distributed over many main data pages (that is, each main data page has perhaps one overflow page). A keyed retrieval on such a table possibly causes only one additional I/O rather than 100 additional I/Os.

For more information on overflow in hash tables, see Alternate Fill Factors.

For ISAM tables, because the ISAM index is static, if you append a large number of rows, the table can begin to overflow. If there is no room on a page to append a row, an overflow page is attached to the data page. For example, if you wanted to insert empno #33, there is no more room on the data page, so an overflow page is allocated for the data page as shown in the following diagram:

Page 8 Overflow Page for Primary Page 8

|--------------------------- | |-------------------------------|

| 29 |Ramos | 31| 30000.000 | | 33 |Quinn | 33| 20000.000 |

| 30 |Brodie | 42| 40000.000 | ---> | |

| 31 |Smith | 20| 10000.000 | | |

| 32 |Horst | 26| 50000.000 | | |

|--------------------------- | |-------------------------------|

For hash and ISAM tables, one way of looking at overflow is by looking at the tids of rows and analyzing the way the tids grow in a sequential scan through the table. For more information, see Tids.

Example: Showing Overflow Distribution

The sample code shown here can be customized to show overflow distribution. Each time a primary page is encountered, the tid's value grows by 512. If a primary page has associated overflow pages, the tid's value jumps by more than 512. So if you run the embedded SQL/C program shown in Sample Code to Show Overflow, the output looks like that shown in Output from Sample Code.

Sample Code to Show Overflow

page_val = 0;
exec sql select key, tid
    into :key_val, :tid_val
    from tablename
exec sql begin;
  if (tid_val == page_val)
  {
    printf("Primary Page %d, tid = %d,",(page_val/512)+1, tid_val);
    printf(" Starting key value = %d0", key_val);
    page_val = page_val + 512;
    old_tid_val = tid_val;
    overflow_page = 0;
  }
  else
  {
    if (tid_val > old_tid_val + 1)
  {
      overflow_page++;
      printf("\n Overflow page %d,tid = %d0",over_page,tid_val);
    }
    old_tid_val = tid_val;
  }
exec sql end;

Output from Sample Code

Primary Page 1, tid = 0, Starting Key Value = 123
Overflow page 1,tid = 2048
Overflow page 2,tid = 2560
Overflow page 3,tid = 3072
Overflow page 4,tid = 3584
Primary Page 2, tid = 512, Starting Key Value = 456
Overflow page 1,tid = 4096
Overflow page 2,tid = 4608
Overflow page 3,tid = 5120
Overflow page 4,tid = 5632

B-tree Tables and Overflow

Eliminating overflow is one of the major benefits of the B-tree storage structure. Overflow in a B-tree occurs only at the leaf level and only if you have a significant number of duplicate keys.

For example, if 30 new employees all joined the company and all had the last name Aitken, the attempt is made to add their records to leaf page 1. In this case, because leaf page 1 can hold only 8 keys (remember that the leaf page can actually hold 2000/(key_size + 6)), an overflow leaf page is added to hold all the duplicate values. This is different than splitting the leaf page, because the same index pointer can still point to the same leaf page and be accurate. There are no additional key/leaf page entry added to the index.

In B-tree tables, you can look at overflow in the leaf level by running a query of the following type, substituting your B-tree table name for t, your B-tree keys for the keycol values, and the width of the key for key_width:

select keycol1, keycol2, overflow =
  (count(*)/keys_per_page)-1
  from tablename t
  group by keycol1, keycol2;

Notes:

This query is not needed for a B-tree index, in which the automatic inclusion of the tidp column in the key prevents overflow.
For B-tree tables with key compression selected, in the select statement you can substitute an estimate of the average key size for key_width.
For keys_per_page calculations, see the chapter "Calculating Disk Space."

The results of this query give an approximation of the amount of overflow at the leaf level, per key value. The query works by calculating the number of keys that fit on a page and dividing the total number of particular key incidents—grouped by key—by this value. For instance, if there are 100 occurrences of a particular key and 10 keys fit on each page, there are nine overflow pages at the leaf level.

Other tables can incur overflow pages for reasons other than duplicate keys; hence, overflow distribution can involve more than simply running a query.

Secondary Indexes and Overflow

Overflow must be monitored in secondary indexes, as well as in the primary tables. Even if the base table has a low overflow percentage, the secondary indexes can badly overflow. Except when the base table is a heap or B-tree table, the base table generally overflows before the secondary index.

Secondary indexes need to be monitored and modified at interim points—even between base table modifications—to ensure a low percentage of overflow pages. More information is provided in Modifying Secondary Indexes.