Maintaining good performance is one of the major responsibilities of the DBA. Performance-enhancing tasks related to storage structures include modifying the database tables, compressing storage structures, and managing overflow.
This chapter discusses how and when to use the modify procedures to change storage structures for tables and secondary indexes. As part of regular system maintenance, you should use modify procedures to eliminate overflow pages and recover disk space for deleted rows.
This chapter describes how to perform modify procedures using VDBA, rather than SQL statements.
For additional information on database performance, see the chapter "Improving Database and Query Performance."
The data for each table is stored in a file on disk. Tables consist of pages whose size you define when you create the table. For example, you can specify a page size of 2 KB, 4 KB, 8 KB, and so on, in powers of two up to 64 KB. Each page has a certain amount of overhead, which depends on the page size. The relevant values, and how they are calculated for each possible page size, are described in Space Requirements for Tables.
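For example, a table with a non-default page size can be created with a statement like the following. This is a sketch only; the table and column names are illustrative, and a buffer cache for the chosen page size must be configured in your installation:
create table parts (partno integer not null, description varchar(40)) with page_size = 8192;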
Each page stores a number of rows. The number of rows per page varies according to the row width, the storage structure of the table, whether or not the table is compressed, and how much data has been added or deleted since the table was last modified. Rows cannot span pages, which limits the maximum row width to the per-page data size.
The page is an important concept in understanding query performance because it affects the amount of disk I/O a query does, as well as the amount of CPU resources required to read through a table.
To see how many pages are in a table, you can use either VDBA or an SQL statement.
In VDBA, select a table and select the Pages tab.
In SQL, use the help table statement. For more information, see the SQL Reference Guide.
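For example, the following statement, entered in a terminal monitor session, produces the page information for the emp table used throughout this chapter:
help table emp;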
A display for a B-tree table is shown in this example:
Name: emp
Owner: ingres
Created: 22-sep-2006 10:27:00
Location: ii_database
Type: user table
Version: II9.0
Page size: 2048
Cache priority: 0
Alter table version: 0
Alter table totwidth: 70
Row width: 70
Number of rows: 32
Storage structure: B-tree
Compression: none
Duplicate Rows: not allowed
Number of pages: 6
Overflow data pages: 0
Journaling: enabled
Base table for view: yes
Permissions: none
Integrities: none
Optimizer statistics: none
Column Information:
                                                  Key
Column Name      Type      Length Nulls Defaults  Seq
name             varchar       20 no    no          1
title            varchar       15 no    yes
hourly_rate      money            no    yes
manager          varchar       20 yes   null
Secondary indexes: none
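The page counts are also available through the standard catalog interface. The following query is a sketch that assumes the iitables catalog view, with its number_pages and overflow_pages columns, is available in your installation:
select number_pages, overflow_pages from iitables where table_name = 'emp';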
Without help from the storage structure, retrieving a particular row from a table requires searching through every row to see whether it qualifies. (Searching through every row is called scanning the table.) Stopping at the first row that qualifies is not enough, because multiple rows can qualify.
Consider the data shown in a sample heap table:
empno name age salary comment
+-------------------------------------------
Page 0 | 17 | Shigio | 29| 28000.000|
| 9 | Blumberg | 33| 32000.000|
| 26 | Stover | 38| 35000.000|
| 1 | Mandic | 46| 43000.000|
|-------------------------------------------
Page 1 | 18 | Giller | 47| 46000.000|
| 10 | Ming | 23| 22000.000|
| 27 | Curry | 34| 32000.000|
| 2 | Ross | 50| 55000.000|
|-------------------------------------------
Page 2 | 19 | McTigue | 44| 41000.000|
| 11 | Robinson | 64| 80000.000|
| 28 | Kay | 41| 38000.000|
| 3 | Stein | 44| 40000.000|
|-------------------------------------------
Page 3 | 20 | Cameron | 37| 35000.000|
| 12 | Saxena | 24| 22000.000|
| 29 | Ramos | 31| 30000.000|
| 4 | Stannich | 36| 33000.000|
|-------------------------------------------
Page 4 | 21 | Huber | 35| 32000.000|
| 13 | Clark | 43| 40000.000|
| 30 | Brodie | 42| 40000.000|
| 5 | Verducci | 55| 55000.000|
|-------------------------------------------
Page 5 | 22 | Zimmerman | 26| 25000.000|
| 14 | Kreseski | 25| 24000.000|
| 31 | Smith | 20| 10000.000|
| 6 | Aitken | 49| 50000.000|
|-------------------------------------------
Page 6 | 23 | Gordon | 28| 27000.000|
| 15 | Green | 27| 26000.000|
| 7 | Curan | 30| 30000.000|
| 24 | Sabel | 21| 21000.000|
|-------------------------------------------
Page 7 | 16 | Gregori | 32| 31000.000|
| 8 | McShane | 22| 22000.000|
| 25 | Sullivan | 38| 35000.000|
|
+-------------------------------------------
With this heap structure, a retrieval such as the following looks at every page in the emp table:
select * from emp where emp.name = 'Sullivan';
Although the Shigio record is the first row in the table, the following retrieval also looks at every row in the table:
select * from emp where emp.name = 'Shigio';
Because the table is not sorted, the entire table must be scanned in case there is another employee named Shigio on another page in the table.
Retrieval from a large table can be costly in time and system resources. To understand the performance consequences of a scan of a large table, assume that the emp table is actually 300,000 pages, rather than 8. Further, assume the disks can manage approximately 30 disk I/Os per second. Assume one disk I/O per page. With a heap storage structure, the example select operation takes 300,000 / 30 = 10,000 seconds (or 2 hours, 46 minutes) in disk access time alone, not counting the CPU time taken to scan each page once it is brought in from disk, and assuming no other system activity.
For a large table, a different storage structure is needed; a production system cannot tolerate a three-hour wait to retrieve a row. The solution is a storage structure that allows keyed access, such as hash, ISAM, or B-tree.
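In SQL, the equivalent of the VDBA modify procedure is the modify statement. As a sketch, keying the emp table on the name column, so that retrievals such as the ones above no longer scan every page, might look like this:
modify emp to btree on name;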