Secondary Indexes

Secondary indexes provide a mechanism for specifying an additional key to the base table.

For instance, assume that an employee table containing name (employee's name) and empno (employee number) columns is hashed on empno, but occasionally data must be retrieved based on the employee's name rather than the employee number. You can create a secondary index on the name column of the table.

Ways to Work with Indexes

You can create and work with indexes using either VDBA or SQL statements.

In VDBA, using the Indexes branch for a particular table in the Database Object Manager window, you can:

Create index objects
View existing index objects, including the detailed properties of each individual object
Drop index objects

Indexes are dropped automatically when the base table is destroyed. Indexes are also dropped when the base table is modified, unless the Persistence option is specified for the index.

For the detailed steps for performing these procedures in VDBA, see the Procedures section of online help.

In SQL, you can accomplish these tasks using the create index, drop, and help index statements. For more information, see the SQL Reference Guide.

Implementation and Overhead of Secondary Indexes

Secondary indexes are actually tables that are automatically tied to the base table. Secondary indexes must be updated whenever the base table is changed, so they must be used sparingly. The user need not explicitly reference the secondary index for it to be used in a query. In fact, you cannot directly update a secondary index and probably never reference it. If the Ingres optimizer sees that an index is available to help solve the query, generally the index is used.

By default, secondary indexes are created as ISAM tables. You can change the storage structure of the index by modifying the secondary index once it is created, or by specifying another structure when you create the index.

In VDBA, you create indexes using the Create Indexes dialog and modify them using the Modify Index Structure dialog. For more information on modifying an existing index, see the chapter "Maintaining Storage Structures."

The following example shows the relationship of a secondary index to a base table:

Select * from xnameselect name,tid from employee
|name                  |tidp||name                |tid |
|-------------------------- ||-------------------------|
|Aitken                |3072||Gregori             |   0|
|Blumberg              | 512||Sabel               |   1|
|Brodie                |3584||Blumberg            | 512|
|Cameron               |1024||Kay                 | 513|
|Clark                 |4096||Shigio              | 514|
|Curan                 |1536||Cameron             |1024|
|Curry                 |4608||Mandic              |1025|
|Giller                |2048||Stannich            |1026|
|Gordon                |5120||Curan               |1536|
|Green                 |2560||McTigue             |1537|
|Gregori               |   0||Stover              |1538|
|Huber                 |3073||Giller              |2048|
|Kay                   | 513||Ramos               |2049|
|Kreseski              |3585||Verducci            |2050|
|Mandic                |1025||Green               |2560|
|McShane               |4097||Ross                |2561|
|McTigue               |1537||Aitken              |3072|
|Ming                  |4609||Huber               |3073|
|Ramos                 |2049||Saxena              |3074|
|Robinson              |5121||Brodie              |3584|
|Ross                  |2561||Kreseski            |3585|
|Sabel                 |   1||Smith               |3586|
|Saxena                |3074||Clark               |4096|
|Shigio                | 514||McShane             |4097|
|Smith                 |3586||Stein               |4098|
|Stannich              |1026||Curry               |4608|
|Stein                 |4098||Ming                |4609|
|Stover                |1538||Sullivan            |4610|
|Sullivan              |4610||Gordon              |5120|
|Verducci              |2050||Robinson            |5121|
|Zimmerman             |5122||Zimmerman           |5122|
|---------------------------||-------------------------|

There is a row in the secondary index xname for every row in the employee table. There is also a column called tidp. This is the tid of the row in the base table. Tids identify every row on every data page. For a complete discussion of tids, see Tids. The tidp entry for an employee is the tid of the employee's record in the base table.

There are no limits to the number of secondary indexes that can be created on a table. However, there is overhead involved in the maintenance and use of a secondary index that you must be aware of:

When you add a row to the base table, you add an entry into every secondary index on the table as well.
When a row in the base table moves, causing the tid to change, every secondary index must be updated to reflect this change. In a base table, rows move when the key is updated or if the table is compressed and a row is replaced that no longer fits in the same page.
Note: For a compressed table, when a varchar(width) column is updated and then recompressed, the row size can change.
When the base table is updated, so that there is a change of the value in a column, which is used as the key of a secondary index, the key of the secondary index has to be updated as well.
When processing a query execution plan for a query, the more indexes and plans possible for the query, the longer it takes to decide what query execution plan to use.

R-tree Secondary Index

An R-tree storage structure is a secondary index for multi-dimensional object management extension data types that can provide the requisite functions (nbr and hilbert).

The R-tree index is a secondary index only. The access method of the base table is B-tree, hash, heap, or ISAM. The R-tree index uses two functions to describe and sort its data. The R-tree index is built on the nbr (normalized bounding rectangle) function of the original object, not the object itself. The nbr function describes the location of each object. The hilbert function sorts the nbr values so that nbr records describing close locations are close to one another in the R-tree index table.

For more information on the nbr and hilbert functions and for more information on handling objects, see the Object Management Extension User Guide.

An R-tree index allows Ingres to answer range queries, such as: "find all records where its position overlaps this spot," quickly. Without an R-tree index, the whole database must be read. Consider two tables: Table A is a table of houses, and Table B contains park information and location. The query, "select all houses where the house intersects a park" is an example of a spatial join. Without an R-tree index, the spatial join reads Table B entirely for each row in Table A.

When creating an R-tree index (for example, using the Create Indexes dialog or the create index statement) you must include range values, which specify the minimum and maximum values of the index column.

The following example illustrates an R-tree index:

select shape, hex(hilbert), tidp from xfio_shape_ix;

+-----------------------------------------------+------------+-------+
|shape                                          |col2        |tidp   |
+-----------------------------------------------+------------+-------+
|((6644550,2412235),(6651911,2425562))          |182343433792|      0|
|((5711593,7469490),(5720615,7473074))          |2CBBAFC085E6|   1541|
|((5755540,7431379),(5765798,7468084))          |2CBC38CC815C|   1543|
|((5764642,7468084),(5776333,7489652))          |2CCEABAE4E25|   1542|
|((5760044,7471142),(5775065,7492024))          |2CCEAC433EF1|   1544|
|((4392392,7367220),(4392773,7368251))          |2F0514CC452B|      3|
|((4393222,7381338),(4393696,7382470))          |2F05ECE43CA5|   1536|
|((6105365,8716914),(6119516,8719411))          |7BC8B02F74CE|   1539|
|((6104208,8719411),(6123227,8733088))          |7BC8B47DB378|   1538|
|((6082882,8707086),(6104747,8708099))          |7BCA043955D6|   1540|
|((8995748,12135179),(8999981,12144160))        |8F8235359771|   1537|
|((9289826,13632441),(9325335,13663808))        |9356B03B9AA0|      1|
|((9268185,13666317),(9286628,13724240))        |93591514F7A8|      4|
|((9396304,16145868),(9397279,16148181))        |95C328081C95|      2|
|((11623892,4873084),(11624345,4874079))        |DF6722ADDB47|      7|
|((11624186,4871079),(11624855,4871713))        |DF6727B0C6D0|      6|
|((11622165,4875404),(11624949,4877801))        |DF672D336FDD|      8|
|((11621206,4874079),(11624345,4876640))        |DF672D738440|     10|
|((11621807,4874417),(11624499,4877759))        |DF672D7B50C1|      9|
|((11610646,4875871),(11612145,4878603))        |DF67321EEFB6|      5|
+-----------------------------------------------+------------+-------+
(20 rows)

The shape column contains the nbr coordinates. The col2 column contains the hilbert number for the nbr. The tidp column corresponds to the tid value of the object in the base table. Tids identify every row on every data page.

Secondary Indexes and Performance

Secondary indexes are generally used to index into the base table they see, although if the query can be executed in the secondary index alone, the base table need not be visited. Using secondary indexes to help complete queries that are otherwise executed on the base table can dramatically reduce the query execution time.

For example, assume a secondary index exists on the name column for the employee table, and the following query is executed:

select empno, age, name
from employee
where name like 'A%';

First, records beginning with an "A" in the secondary index are located, and using the tidp column, each tidp is used to do a tid lookup into the employee table, to get the rest of the information about the employee, namely empno and age. Tids identify every row on every data page.

Both the secondary index and the base table are used in this query. However, had the retrieval asked only for employee.name rather than empno and age, the base table is not used, and the number of disk I/Os executed is reduced by more than 50%.

Even in some situations requiring scans of the entire table, you can dramatically improve performance by loading the columns retrieved into the secondary index, so that probing the base table is not necessary. An example is shown in Example: Loading Retrieved Columns into a Secondary Index to Improve Performance.

Example: Load Retrieved Columns into a Secondary Index to Improve Performance

In this example, the table bigtable contains 100,000 rows and 20,000 pages.

First, follow these steps to modify the bigtable to use a B-tree structure keyed on three columns:

In VDBA, open the Modify Table Structure dialog for bigtable. For more information, see the chapter "Maintaining Storage Structures" and online help.
Enable Change Storage Structure and click Structure.
The Structure of Table dialog opens.
Select B-tree in the Structure drop-down list, enable col1, col2, and col3 in the Columns group box to specify them as keys, and then click OK.
The Structure of Table dialog closes.
Click OK
The Modify Table Structure dialog closes.

Next, a select statement is issued in which the key columns are specified in the where clause. This search requires a full table scan, even though the three columns in question are key columns in the bigtable structure:

select col1, col2, col3 from bigtable
where col1 = 'Colorado', col2 = 17, col3 = 'repo';

Creating a secondary index on the three columns alleviates this problem.

Follow these steps to create a secondary index, with name xbig:

In VDBA, open the Create Indexes dialog for bigtable. For more information, see online help.
Enter xbig in the Index Name edit control.
For each of the key columns, col1, col2, and col3, select the column in the Base Table Columns list box, and click the double-right arrow (>>) to add it to the Index Columns list box, and then click OK.

The index xbig is 500 pages. Issuing the exact same query as before (shown again below) now uses the secondary index, thereby reducing the scan from 20,000 pages to 500 pages:

select col1, col2, col3 from bigtable
where col1 = 'Colorado', col2 = 17, col3 = 'repo';

Aggregates on secondary indexes can be more efficient, because the index is so much smaller than the base table. For example, if there was a secondary index on col1, this aggregate is processed in much less time:

select avg(col1) from bigtable;

Forced Use of Secondary Indexes

You can force a secondary index to be used by referencing it in the query, but the optimizer must ensure that this is never necessary. For example, consider the following query:

select * from emp
where emp.name = 'Shigio';

To force it to use a secondary index, change it to the following:

select * from emp, xname
where xname.tidp = emp.tid
and xname.name = 'Shigio';

Two Secondary Indexes

There is no reason for having two secondary indexes on the same column, for example, one hash and one ISAM. Instead, use the index giving you the most versatile access path because the overhead of maintaining and using two indexes is more than the disk I/O saved for a few queries.

If you need two access paths, and you want one to be hash and the other to be ISAM or B-tree, you can use ISAM (or B-tree) for the base table access method and hash for the index. ISAM and B-tree cluster similar data on the same data page, while hash randomizes data, so that ranges of values are not clustered. With the base table as ISAM or B-tree, range retrievals find the physical rows clustered on the same data pages, reducing the amount of disk I/O needed to execute range queries. If the base table is hash, the ISAM index points to the qualifying rows, but these rows are spread randomly about the table instead of being clustered on the same data pages.