Chapter 10. Compiled (C) Language Functions

Functions written in C can be compiled into dynamically loadable objects (also called shared libraries), and used to implement user-defined SQL functions. The first time a user-defined function in a particular loadable object file is called in a backend session, the dynamic loader loads that object file into memory so that the function can be called. The CREATE FUNCTION command with the "LANGUAGE 'C'" clause is used to register a C function so that it can be invoked (or referenced) in other SQL commands. (Refer to the Red Hat Database SQL Guide and Reference for the syntax and other information.)

Calling Conventions for C-Language Functions

The calling convention relies on macros to suppress most of the complexity of passing arguments and results. The C declaration of a function is always
Datum funcname(PG_FUNCTION_ARGS)
In addition, the macro call
PG_FUNCTION_INFO_V1(funcname);
must appear in the same source file (conventionally it is written just before the function itself). This macro call is not needed for "internal"-language functions. However, it is required for dynamically-loaded functions.

Note

Two different calling conventions are currently accepted for C functions. The newer "version 1" calling convention is indicated by writing a PG_FUNCTION_INFO_V1() macro call for the function, as illustrated above. Lack of such a macro indicates an old-style ("version 0") function. The language name specified in CREATE FUNCTION is 'C' in either case. Old-style functions are now deprecated because of portability problems and lack of functionality, but they are still supported for compatibility reasons.

In a function, each actual argument is fetched using a PG_GETARG_xxx() macro that corresponds to the argument's datatype, and the result is returned using a PG_RETURN_xxx() macro for the return type.

Here are some examples:
#include "postgres.h"
#include <string.h>
#include "utils/geo_decls.h"
#include "fmgr.h"

/* By Value */

PG_FUNCTION_INFO_V1(add_one);
         
Datum
add_one(PG_FUNCTION_ARGS)
{
    int32   arg = PG_GETARG_INT32(0);

    PG_RETURN_INT32(arg + 1);
}

/* By Reference, Fixed Length */

PG_FUNCTION_INFO_V1(add_one_float8);

Datum
add_one_float8(PG_FUNCTION_ARGS)
{
    /* The macros for FLOAT8 hide 
     * its pass-by-reference nature 
     */
    float8   arg = PG_GETARG_FLOAT8(0);

    PG_RETURN_FLOAT8(arg + 1.0);
}

PG_FUNCTION_INFO_V1(makepoint);

Datum
makepoint(PG_FUNCTION_ARGS)
{
    /* Here, the pass-by-reference nature
     * of Point is not hidden            
     */
    Point  *pointx = PG_GETARG_POINT_P(0);
    Point  *pointy = PG_GETARG_POINT_P(1);
    Point  *new_point = (Point *) palloc(sizeof(Point));

    new_point->x = pointx->x;
    new_point->y = pointy->y;
       
    PG_RETURN_POINT_P(new_point);
}

/* By Reference, Variable Length */

PG_FUNCTION_INFO_V1(copytext);

Datum
copytext(PG_FUNCTION_ARGS)
{
    text *t = PG_GETARG_TEXT_P(0);
    /*
     * VARSIZE is the total size 
     *  of the struct in bytes.
     */
    text *new_t = (text *) palloc(VARSIZE(t));
    VARATT_SIZEP(new_t) = VARSIZE(t);
    /*
     * VARDATA is a pointer to the 
     * data region of the struct.
     */
    memcpy((void *) VARDATA(new_t), /* destination */
           (void *) VARDATA(t),     /* source */
           VARSIZE(t)-VARHDRSZ);    /* how many bytes */
    PG_RETURN_TEXT_P(new_t);
}

PG_FUNCTION_INFO_V1(concat_text);

Datum
concat_text(PG_FUNCTION_ARGS)
{
    text  *arg1 = PG_GETARG_TEXT_P(0);
    text  *arg2 = PG_GETARG_TEXT_P(1);
    int32 new_text_size = VARSIZE(arg1) + 
       VARSIZE(arg2) - VARHDRSZ;
    text *new_text = (text *) palloc(new_text_size);

    VARATT_SIZEP(new_text) = new_text_size;
    memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
    memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
       VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
    PG_RETURN_TEXT_P(new_text);
}

Supposing that the above code has been prepared in file funcs_new.c and compiled into a shared object, we could define the functions to PostgreSQL with commands like this:
CREATE FUNCTION add_one(int4) RETURNS int4
     AS 'PGROOT/tutorial/funcs_new.so' LANGUAGE 'c'
     WITH (isStrict);

-- note overloading of SQL function name add_one()
CREATE FUNCTION add_one(float8) RETURNS float8
     AS 'PGROOT/tutorial/funcs_new.so',
        'add_one_float8'
     LANGUAGE 'c' WITH (isStrict);

CREATE FUNCTION makepoint(point, point) RETURNS point
     AS 'PGROOT/tutorial/funcs_new.so' LANGUAGE 'c'
     WITH (isStrict);
                         
CREATE FUNCTION copytext(text) RETURNS text
     AS 'PGROOT/tutorial/funcs_new.so' LANGUAGE 'c'
     WITH (isStrict);

CREATE FUNCTION concat_text(text, text) RETURNS text
     AS 'PGROOT/tutorial/funcs_new.so' LANGUAGE 'c'
     WITH (isStrict);

Here PGROOT stands for the full path to the Red Hat Database source tree. Note that depending on your system, the filename for a shared object might not end in .so, but in .sl or something else; adapt accordingly.

Notice that we have specified the functions as "strict", meaning that the system should automatically assume a NULL result if any input value is NULL. By doing this, we avoid having to check for NULL inputs in the function code. Without this, we would have to check for NULLs explicitly.

At first glance, the coding conventions may appear to be just pointless obscurantism. However, they do offer a number of improvements, because the macros can hide unnecessary detail. An example is that in coding add_one_float8, we do not need to be aware that float8 is a pass-by-reference type. Another example is that the GETARG macros for variable-length types hide the need to deal with fetching "toasted" (compressed or out-of-line) values.

The macro PG_ARGISNULL(n) allows a function to test whether each input is NULL (of course, doing this is necessary only in functions not declared "strict"). As with the PG_GETARG_xxx() macros, the input arguments are counted beginning at zero. To return a NULL result, execute PG_RETURN_NULL(); this works in both strict and non-strict functions.

The function call conventions make it possible to return "set" results and implement trigger functions and procedural-language call handlers. The code is also portable because it does not break ANSI C restrictions on function call protocol.

Dynamic Loading

The first time a user-defined function in a particular loadable object file is called in a backend session, the dynamic loader loads that object file into memory so that the function can be called. The CREATE FUNCTION for a user-defined C function must therefore specify two pieces of information for the function: the name of the loadable object file, and the C name (link symbol) of the specific function to call within that object file. If the C name is not explicitly specified then it is assumed to be the same as the SQL function name.

The following algorithm is used to locate the shared object file based on the name given in the CREATE FUNCTION command:

  1. If the name is an absolute path, the given file is loaded.

  2. If the name starts with the string $libdir, that part is replaced by the PostgreSQL package library directory name, which is determined at build time.

  3. If the name does not contain a directory part, the file is searched for in the path specified by the configuration variable dynamic_library_path.

  4. Otherwise (the file was not found in the path, or it contains a non-absolute directory part), the dynamic loader will try to take the name as given, which will most likely fail. (It is unreliable to depend on the current working directory.)

If this sequence does not work, the platform-specific shared library file name extension (often .so) is appended to the given name and this sequence is tried again. If that fails as well, the load will fail.

Note

The user ID the PostgreSQL server runs as must be able to traverse the path to the file you intend to load. Making the file or a higher-level directory not readable and/or not executable by the "postgres" user is a common mistake.

In any case, the file name that is given in the CREATE FUNCTION command is recorded literally in the system catalogs, so if the file needs to be loaded again the same procedure is applied.

Note

PostgreSQL will not compile a C function automatically. The object file must be compiled before it is referenced in a CREATE FUNCTION command. See the Section called Compiling and Linking Dynamically-Loaded Functions for additional information.

After it is used for the first time, a dynamically loaded object file is retained in memory. Future calls in the same session to the functions in that file will only incur the small overhead of a symbol table lookup. If you need to force a reload of an object file, for example after recompiling it, use the LOAD command or begin a fresh session.

It is recommended that you locate shared libraries either relative to $libdir or through the dynamic library path. This simplifies version upgrades if the new installation is at a different location. The actual directory that $libdir stands for can be found out with the command pg_config --pkglibdir.

Note

Before PostgreSQL release 7.2, only exact absolute paths to object files could be specified in CREATE FUNCTION. This approach is now deprecated since it makes the function definition unnecessarily unportable. It is best to specify just the shared library name with no path nor extension, and let the search mechanism provide that information instead.