Modifying the Catalogs to Extend Indices

Look back at Figure 13-1. The right half shows the catalogs that we must modify in order to tell PostgreSQL how to use a user-defined type and/or user-defined operators with an index (that is, pg_am, pg_amop, pg_amproc, pg_operator and pg_opclass). Unfortunately, there is no simple command to do this. We will demonstrate how to modify these catalogs through a running example: a new operator class for the B-tree access method that stores and sorts complex numbers in ascending absolute value order.

The pg_am table contains one row for every user defined access method. Support for the heap access method is built into PostgreSQL, but every other access method is described here. The schema is shown in the following table:

Table 13-2. pg_am Schema

ColumnDescription
amnamename of the access method
amowneruser id of the owner
amstrategiesnumber of strategies for this access method (see below)
amsupportnumber of support routines for this access method (see below)
amorderstrategyzero if the index offers no sort order, otherwise the strategy number of the strategy operator that describes the sort order
amgettuple 
aminsert 
...procedure identifiers for interface routines to the access method. For example, regproc ids for opening, closing, and getting rows from the access method appear here.

The object ID of the row in pg_am is used as a foreign key in many other tables. You do not need to add a new row to this table; all that you need is the object ID of the row of the access method you want to extend:
SELECT oid FROM pg_am WHERE amname = 'btree';

 oid
-----
 403
(1 row)
   
We will use this query in a WHERE clause later.

The amstrategies column exists to standardize comparisons across data types. For example, B-trees impose a strict ordering on keys, lesser to greater. Since PostgreSQL allows the user to define operators, but it cannot tell by looking at the name of an operator (for example, ">" or "<") what kind of comparison it is. In fact, some access methods do not impose any ordering at all. For example, R-trees express a rectangle-containment relationship, whereas a hashed data structure expresses only bitwise similarity based on the value of a hash function. PostgreSQL needs some consistent way of taking a qualification in your query, looking at the operator and then deciding if a usable index exists. This implies that Red Hat Database needs to know, for example, that the "<=" and ">" operators partition a B-tree. Red Hat Database uses strategies to express these relationships between operators and the way they can be used to scan indices.

Defining a new set of strategies is beyond the scope of this discussion, but we will explain how B-tree strategies work because you will need to know that is done in order to add a new operator class. In the pg_am table, the amstrategies column is the number of strategies defined for this access method. For B-trees, this number is 5. These strategies are shown in the following table:

Table 13-3. B-tree Strategies

OperationIndex
less than1
less than or equal2
equal3
greater than or equal4
greater than5

The idea is that you will need to add procedures corresponding to the comparisons above to the pg_amop relation (see below). The access method code can use these strategy numbers, regardless of data type, to figure out how to partition the B-tree, compute selectivity, and so on. Do not worry about the details of adding procedures yet; just understand that there must be a set of these procedures for int2, int4, oid, and every other data type on which a B-tree can operate.

Sometimes, strategies are not enough information for the system to figure out how to use an index. Some access methods require other support routines in order to work. For example, the B-tree access method must be able to compare two keys and determine whether one is greater than, equal to, or less than the other. Similarly, the R-tree access method must be able to compute intersections, unions, and sizes of rectangles. These operations do not correspond to user qualifications in SQL queries; they are administrative routines used by the access methods, internally.

In order to manage diverse support routines consistently across all PostgreSQL access methods, pg_am includes a column called amsupport. This column records the number of support routines used by an access method. For B-trees, this number is one -- the routine to take two keys and return -1, 0, or +1, depending on whether the first key is less than, equal to, or greater than the second.

Note

Strictly speaking, this routine can return a negative number (< 0), 0, or a non-zero positive number (> 0).

The amstrategies entry in pg_am is just the number of strategies defined for the access method in question. The procedures for less than, less than or equal, and so on do not appear in pg_am. Similarly, amsupport is just the number of support routines required by the access method. The actual routines are listed elsewhere.

The amorderstrategy entry tells whether the access method supports ordered scan. Zero means it does not; if it does, amorderstrategy is the number of the strategy routine that corresponds to the ordering operator. For example, btree has amorderstrategy = 1 which is the number of its "less than" strategy.

The next table of interest is pg_opclass. This table exists only to associate an operator class name and perhaps a default type with an operator class oid. Some existing opclasses are int2_ops, int4_ops, and oid_ops. You need to add a row with your opclass name (for example, complex_abs_ops) to pg_opclass. The oid of this row will be a foreign key in other tables, notably pg_amop.
INSERT INTO pg_opclass (opcname, opcdeftype)
    SELECT 'complex_abs_ops', oid FROM pg_type 
    WHERE typname = 'complex';

SELECT oid, opcname, opcdeftype
    FROM pg_opclass
    WHERE opcname = 'complex_abs_ops';

  oid   |     opcname     | opcdeftype
--------+-----------------+------------
 277975 | complex_abs_ops |     277946
(1 row)
   
Note that the oid for your pg_opclass row will be different! Do not worry about this though. The number will be obtained from the system later just like we got the oid of the type here (that is, through a query).

The above example assumes that you want to make this new opclass the default index opclass for the complex datatype. If you do not, just insert zero into opcdeftype, rather than inserting the datatype's oid:
INSERT INTO pg_opclass (opcname, opcdeftype) 
   VALUES ('complex_abs_ops', 0);
   

So now we have an access method and an operator class. We still need a set of operators. The procedure for defining operators is discussed in the Red Hat Database SQL Guide and Reference. For the complex_abs_ops operator class on Btrees, the operators we require are:
        absolute value less-than
        absolute value less-than-or-equal
        absolute value equal
        absolute value greater-than-or-equal
        absolute value greater-than
   

Suppose the code that implements the functions defined is stored in the file $PGROOT/src/tutorial/complex.c

Part of the C code looks like this: (note that we will show only the equality operator for the rest of the examples. The other four operators are very similar. Refer to complex.c or complex.source for the details.)
#define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y)

         bool
         complex_abs_eq(Complex *a, Complex *b)
         {
             double amag = Mag(a), bmag = Mag(b);
             return (amag==bmag);
         }
   

We make the function known to PostgreSQL:
CREATE FUNCTION complex_abs_eq(complex, complex)
              RETURNS bool
              AS 'PGROOT/src/tutorial/complex.so'
              LANGUAGE 'c';
   

There are some important things to note:

Now we are ready to define the operators:
CREATE OPERATOR = (
     leftarg = complex, rightarg = complex,
     procedure = complex_abs_eq,
     restrict = eqsel, join = eqjoinsel
         )
   
The important things here are the procedure names (which are the C functions defined above) and the restriction and join selectivity functions. You should just use the selectivity functions used in the example (see complex.source). Note that there are different functions for the less-than, equal, and greater-than cases. These must be supplied, or the optimizer will be unable to make effective use of the index.

The next step is to add entries for these operators to the pg_amop relation. To do this, we will need the oids of the operators we just defined. We will look up the names of all the operators that take two complexes, and pick the ones that we want:
    SELECT o.oid AS opoid, o.oprname
     INTO TABLE complex_ops_tmp
     FROM pg_operator o, pg_type t
     WHERE o.oprleft = t.oid and o.oprright = t.oid
      and t.typname = 'complex';

 opoid  | oprname
--------+---------
 277963 | +
 277970 | <
 277971 | <=
 277972 | =
 277973 | >=
 277974 | >
(6 rows)
   
(Again, some of your oid numbers will almost certainly be different.) The operators we are interested in are those with oids 277970 through 277974. The values you get will probably be different, and you should substitute them for the values below. We will do this with a select statement.

Now we are ready to update pg_amop with our new operator class. The most important thing in this entire discussion is that the operators are ordered, from less than through greater than, in pg_amop. We add the rows we need:
    INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy)
        SELECT am.oid, opcl.oid, c.opoid, 1
        FROM pg_am am, pg_opclass opcl, complex_ops_tmp c
        WHERE amname = 'btree' AND
            opcname = 'complex_abs_ops' AND
            c.oprname = '<';
   
Now do this for the other operators substituting for the "1" in the third line above and the "<" in the last line. Note the order: "less than" is 1, "less than or equal" is 2, "equal" is 3, "greater than or equal" is 4, and "greater than" is 5.

The next step is registration of the "support routine" previously described in our discussion of pg_am. The oid of this support routine is stored in the pg_amproc table, keyed by the access method oid and the operator class oid. First, we need to register the function in PostgreSQL. (We put the C code that implements this routine in the bottom of the file in which we implemented the operator routines.):
    CREATE FUNCTION complex_abs_cmp(complex, complex)
     RETURNS int4
     AS 'PGROOT/src/tutorial/complex.so'
     LANGUAGE 'c';

    SELECT oid, proname FROM pg_proc
     WHERE proname = 'complex_abs_cmp';

  oid   |     proname
--------+-----------------
 277997 | complex_abs_cmp
(1 row)
   
(Again, your oid number will probably be different.) We can add the new row as follows:
    INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum)
        SELECT a.oid, b.oid, c.oid, 1
            FROM pg_am a, pg_opclass b, pg_proc c
            WHERE a.amname = 'btree' AND
                b.opcname = 'complex_abs_ops' AND
                c.proname = 'complex_abs_cmp';
   

Now that we are done it should be possible to create and use btree indexes on complex columns.