Maintaining good performance is one of the major responsibilities of the DBA. Performance-enhancing tasks related to storage structures include modifying the database tables, compressing storage structures, and managing overflow.
This chapter discusses how and when to use the modify procedures to change storage structures for tables and secondary indexes. As part of regular system maintenance, you should use modify procedures to eliminate overflow pages and recover disk space for deleted rows.
This chapter describes how to perform modify procedures using VDBA, rather than SQL statements.
For additional information on database performance, see the chapter "Improving Database and Query Performance."
The data for each table is stored in a file on disk. Tables consist of pages whose size you define when you create the table. For example, you can specify a page size of 2 KB, 4 KB, 8 KB, and so on, in powers of two up to 64 KB. Each page has a certain amount of overhead, which depends on the page size. The relevant values, and how they are calculated for each possible page size, are described in Space Requirements for Tables.
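For example, a table with a non-default page size can be created with a statement like the following. This is a sketch only; the table and column names are illustrative, and a buffer cache for the chosen page size must be configured in your installation:
create table parts (partno integer not null, description varchar(40)) with page_size = 8192;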
Each page stores a number of rows. The number of rows per page varies according to the row width, the storage structure of the table, whether or not the table is compressed, and how much data has been added or deleted since the table was last modified. Rows cannot span pages, which limits the maximum row width to the per-page data size.
The page is an important concept in understanding query performance because it affects the amount of disk I/O a query does, as well as the amount of CPU resources required to read through a table.
To see how many pages are in a table, you can use either VDBA or an SQL statement.
In VDBA, select a table and select the Pages tab.
In SQL, use the help table statement. For more information, see the SQL Reference Guide.
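For example, the following statement, entered in a terminal monitor session, produces the page information for the emp table used throughout this chapter:
help table emp;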
A display for a B-tree table is shown in this example:
Name: emp
Owner: ingres
Created: 22-sep-2006 10:27:00
Location: ii_database
Type: user table
Version: II9.0
Page size: 2048
Cache priority: 0
Alter table version: 0
Alter table totwidth: 70
Row width: 70
Number of rows: 32
Storage structure: B-tree
Compression: none
Duplicate Rows: not allowed
Number of pages: 6
Overflow data pages: 0
Journaling: enabled
Base table for view: yes
Permissions: none
Integrities: none
Optimizer statistics: none
Column Information:
                                                  Key
Column Name      Type      Length Nulls Defaults  Seq
name             varchar       20 no    no          1
title            varchar       15 no    yes
hourly_rate      money            no    yes
manager          varchar       20 yes   null
Secondary indexes: none
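The page counts are also available through the standard catalog interface. The following query is a sketch that assumes the iitables catalog view, with its number_pages and overflow_pages columns, is available in your installation:
select number_pages, overflow_pages from iitables where table_name = 'emp';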
Without help from the storage structure, retrieving a particular row from a table requires searching through every row to see whether it qualifies. (Searching through every row is called scanning the table.) Stopping at the first row that qualifies is not enough, because multiple rows can qualify.
Consider the data shown in a sample heap table:
empno name age salary comment
+-------------------------------------------
Page 0 | 17 | Shigio | 29| 28000.000|
| 9 | Blumberg | 33| 32000.000|
| 26 | Stover | 38| 35000.000|
| 1 | Mandic | 46| 43000.000|
|-------------------------------------------
Page 1 | 18 | Giller | 47| 46000.000|
| 10 | Ming | 23| 22000.000|
| 27 | Curry | 34| 32000.000|
| 2 | Ross | 50| 55000.000|
|-------------------------------------------
Page 2 | 19 | McTigue | 44| 41000.000|
| 11 | Robinson | 64| 80000.000|
| 28 | Kay | 41| 38000.000|
| 3 | Stein | 44| 40000.000|
|-------------------------------------------
Page 3 | 20 | Cameron | 37| 35000.000|
| 12 | Saxena | 24| 22000.000|
| 29 | Ramos | 31| 30000.000|
| 4 | Stannich | 36| 33000.000|
|-------------------------------------------
Page 4 | 21 | Huber | 35| 32000.000|
| 13 | Clark | 43| 40000.000|
| 30 | Brodie | 42| 40000.000|
| 5 | Verducci | 55| 55000.000|
|-------------------------------------------
Page 5 | 22 | Zimmerman | 26| 25000.000|
| 14 | Kreseski | 25| 24000.000|
| 31 | Smith | 20| 10000.000|
| 6 | Aitken | 49| 50000.000|
|-------------------------------------------
Page 6 | 23 | Gordon | 28| 27000.000|
| 15 | Green | 27| 26000.000|
| 7 | Curan | 30| 30000.000|
| 24 | Sabel | 21| 21000.000|
|-------------------------------------------
Page 7 | 16 | Gregori | 32| 31000.000|
| 8 | McShane | 22| 22000.000|
| 25 | Sullivan | 38| 35000.000|
|
+-------------------------------------------
With this heap structure, a retrieval such as the following looks at every page in the emp table:
select * from emp where emp.name = 'Sullivan';
Although the Shigio record is the first row in the table, the following retrieval also looks at every row in the table:
select * from emp where emp.name = 'Shigio';
Because the table is not sorted, the entire table must be scanned in case there is another employee named Shigio on another page in the table.
Retrieval from a large table can be costly in time and system resources. To understand the performance consequences of a scan of a large table, assume that the emp table is actually 300,000 pages, rather than 8. Further, assume the disks can manage approximately 30 disk I/Os per second. Assume one disk I/O per page. With a heap storage structure, the example select operation takes 300,000 / 30 = 10,000 seconds (or 2 hours, 46 minutes) in disk access time alone, not counting the CPU time taken to scan each page once it is brought in from disk, and assuming no other system activity.
For a large table, a different storage structure is needed; a production system cannot tolerate a three-hour wait to retrieve a row. The solution is a storage structure that allows keyed access, such as hash, ISAM, or B-tree.
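In SQL, the equivalent of the VDBA modify procedure is the modify statement. As a sketch, keying the emp table on the name column, so that retrievals such as the ones above no longer scan every page, might look like this:
modify emp to btree on name;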