Functions written in C can be compiled into dynamically loadable objects (also called shared libraries), and used to implement user-defined SQL functions. The first time a user-defined function in a particular loadable object file is called in a backend session, the dynamic loader loads that object file into memory so that the function can be called. The CREATE FUNCTION for a user-defined function must therefore specify two pieces of information for the function: the name of the loadable object file, and the C name (link symbol) of the specific function to call within that object file. If the C name is not explicitly specified then it is assumed to be the same as the SQL function name.
Note: After it is used for the first time, a dynamically loaded user function is retained in memory, and future calls to the function in the same session will only incur the small overhead of a symbol table lookup.
The string that specifies the object file (the first string in the AS clause) should be the full path of the object code file for the function, bracketed by single quote marks. If a link symbol is given in the AS clause, the link symbol should also be bracketed by single quote marks, and should be exactly the same as the name of the function in the C source code. On Unix systems the command nm will print all of the link symbols in a dynamically loadable object.
Note: Postgres will not compile a function automatically; it must be compiled before it is used in a CREATE FUNCTION command. See below for additional information.
Two different calling conventions are currently used for C functions. The newer "version 1" calling convention is indicated by writing a PG_FUNCTION_INFO_V1() macro call for the function, as illustrated below. Lack of such a macro indicates an old-style ("version 0") function. The language name specified in CREATE FUNCTION is 'C' in either case. Old-style functions are now deprecated because of portability problems and lack of functionality, but they are still supported for compatibility reasons.
The following table gives the C type required for parameters in the C functions that will be loaded into Postgres. The "Defined In" column gives the actual header file (in the .../src/backend/ directory) that the equivalent C type is defined. Note that you should always include postgres.h first, and that in turn includes c.h.
Table 13-1. Equivalent C Types for Built-In Postgres Types
Built-In Type | C Type | Defined In |
---|---|---|
abstime | AbsoluteTime | utils/nabstime.h |
bool | bool | include/c.h |
box | (BOX *) | utils/geo-decls.h |
bytea | (bytea *) | include/postgres.h |
"char" | char | N/A |
cid | CID | include/postgres.h |
datetime | (DateTime *) | include/c.h or include/postgres.h |
int2 | int2 or int16 | include/postgres.h |
int2vector | (int2vector *) | include/postgres.h |
int4 | int4 or int32 | include/postgres.h |
float4 | (float4 *) | include/c.h or include/postgres.h |
float8 | (float8 *) | include/c.h or include/postgres.h |
lseg | (LSEG *) | include/geo-decls.h |
name | (Name) | include/postgres.h |
oid | oid | include/postgres.h |
oidvector | (oidvector *) | include/postgres.h |
path | (PATH *) | utils/geo-decls.h |
point | (POINT *) | utils/geo-decls.h |
regproc | regproc or REGPROC | include/postgres.h |
reltime | RelativeTime | utils/nabstime.h |
text | (text *) | include/postgres.h |
tid | ItemPointer | storage/itemptr.h |
timespan | (TimeSpan *) | include/c.h or include/postgres.h |
tinterval | TimeInterval | utils/nabstime.h |
xid | (XID *) | include/postgres.h |
Internally, Postgres regards a base type as a "blob of memory." The user-defined functions that you define over a type in turn define the way that Postgres can operate on it. That is, Postgres will only store and retrieve the data from disk and use your user-defined functions to input, process, and output the data. Base types can have one of three internal formats:
pass by value, fixed-length
pass by reference, fixed-length
pass by reference, variable-length
By-value types can only be 1, 2 or 4 bytes in length (also 8 bytes, if sizeof(Datum) is 8 on your machine). You should be careful to define your types such that they will be the same size (in bytes) on all architectures. For example, the long type is dangerous because it is 4 bytes on some machines and 8 bytes on others, whereas int type is 4 bytes on most Unix machines (though not on most personal computers). A reasonable implementation of the int4 type on Unix machines might be:
/* 4-byte integer, passed by value */ typedef int int4;
On the other hand, fixed-length types of any size may be passed by-reference. For example, here is a sample implementation of a Postgres type:
/* 16-byte structure, passed by reference */ typedef struct { double x, y; } Point;
Only pointers to such types can be used when passing them in and out of Postgres functions. To return a value of such a type, allocate the right amount of memory with palloc(), fill in the allocated memory, and return a pointer to it. (Alternatively, you can return an input value of the same type by returning its pointer. Never modify the contents of a pass-by-reference input value, however.)
Finally, all variable-length types must also be passed by reference. All variable-length types must begin with a length field of exactly 4 bytes, and all data to be stored within that type must be located in the memory immediately following that length field. The length field is the total length of the structure (i.e., it includes the size of the length field itself). We can define the text type as follows:
typedef struct { int4 length; char data[1]; } text;
Obviously, the data field shown here is not long enough to hold all possible strings; it's impossible to declare such a structure in C. When manipulating variable-length types, we must be careful to allocate the correct amount of memory and initialize the length field. For example, if we wanted to store 40 bytes in a text structure, we might use a code fragment like this:
#include "postgres.h" ... char buffer[40]; /* our source data */ ... text *destination = (text *) palloc(VARHDRSZ + 40); destination->length = VARHDRSZ + 40; memmove(destination->data, buffer, 40); ...
Now that we've gone over all of the possible structures for base types, we can show some examples of real functions.
We present the "old style" calling convention first --- although this approach is now deprecated, it's easier to get a handle on initially. In the version-0 method, the arguments and result of the C function are just declared in normal C style, but being careful to use the C representation of each SQL data type as shown above.
Here are some examples:
#include "postgres.h" #include <string.h> /* By Value */ int add_one(int arg) { return arg + 1; } /* By Reference, Fixed Length */ float8 * add_one_float8(float8 *arg) { float8 *result = (float8 *) palloc(sizeof(float8)); *result = *arg + 1.0; return result; } Point * makepoint(Point *pointx, Point *pointy) { Point *new_point = (Point *) palloc(sizeof(Point)); new_point->x = pointx->x; new_point->y = pointy->y; return new_point; } /* By Reference, Variable Length */ text * copytext(text *t) { /* * VARSIZE is the total size of the struct in bytes. */ text *new_t = (text *) palloc(VARSIZE(t)); VARATT_SIZEP(new_t) = VARSIZE(t); /* * VARDATA is a pointer to the data region of the struct. */ memcpy((void *) VARDATA(new_t), /* destination */ (void *) VARDATA(t), /* source */ VARSIZE(t)-VARHDRSZ); /* how many bytes */ return new_t; } text * concat_text(text *arg1, text *arg2) { int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ; text *new_text = (text *) palloc(new_text_size); VARATT_SIZEP(new_text) = new_text_size; memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ); memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ), VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ); return new_text; }
Supposing that the above code has been prepared in file funcs.c and compiled into a shared object, we could define the functions to Postgres with commands like this:
CREATE FUNCTION add_one(int4) RETURNS int4 AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c' WITH (isStrict); -- note overloading of SQL function name add_one() CREATE FUNCTION add_one(float8) RETURNS float8 AS 'PGROOT/tutorial/funcs.so', 'add_one_float8' LANGUAGE 'c' WITH (isStrict); CREATE FUNCTION makepoint(point, point) RETURNS point AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c' WITH (isStrict); CREATE FUNCTION copytext(text) RETURNS text AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c' WITH (isStrict); CREATE FUNCTION concat_text(text, text) RETURNS text AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c' WITH (isStrict);
Here PGROOT stands for the full path to the Postgres source tree. Note that depending on your system, the filename for a shared object might not end in .so, but in .sl or something else; adapt accordingly.
Notice that we have specified the functions as "strict", meaning that the system should automatically assume a NULL result if any input value is NULL. By doing this, we avoid having to check for NULL inputs in the function code. Without this, we'd have to check for NULLs explicitly, for example by checking for a null pointer for each pass-by-reference argument. (For pass-by-value arguments, we don't even have a way to check!)
Although this calling convention is simple to use, it is not very portable; on some architectures there are problems with passing smaller-than-int data types this way. Also, there is no simple way to return a NULL result, nor to cope with NULL arguments in any way other than making the function strict. The version-1 convention, presented next, overcomes these objections.
The version-1 calling convention relies on macros to suppress most of the complexity of passing arguments and results. The C declaration of a version-1 function is always
Datum funcname(PG_FUNCTION_ARGS)In addition, the macro call
PG_FUNCTION_INFO_V1(funcname);must appear in the same source file (conventionally it's written just before the function itself). This macro call is not needed for "internal"-language functions, since Postgres currently assumes all internal functions are version-1. However, it is required for dynamically-loaded functions.
In a version-1 function, each actual argument is fetched using a PG_GETARG_xxx() macro that corresponds to the argument's datatype, and the result is returned using a PG_RETURN_xxx() macro for the return type.
Here we show the same functions as above, coded in version-1 style:
#include "postgres.h" #include <string.h> #include "fmgr.h" /* By Value */ PG_FUNCTION_INFO_V1(add_one); Datum add_one(PG_FUNCTION_ARGS) { int32 arg = PG_GETARG_INT32(0); PG_RETURN_INT32(arg + 1); } /* By Reference, Fixed Length */ PG_FUNCTION_INFO_V1(add_one_float8); Datum add_one_float8(PG_FUNCTION_ARGS) { /* The macros for FLOAT8 hide its pass-by-reference nature */ float8 arg = PG_GETARG_FLOAT8(0); PG_RETURN_FLOAT8(arg + 1.0); } PG_FUNCTION_INFO_V1(makepoint); Datum makepoint(PG_FUNCTION_ARGS) { /* Here, the pass-by-reference nature of Point is not hidden */ Point *pointx = PG_GETARG_POINT_P(0); Point *pointy = PG_GETARG_POINT_P(1); Point *new_point = (Point *) palloc(sizeof(Point)); new_point->x = pointx->x; new_point->y = pointy->y; PG_RETURN_POINT_P(new_point); } /* By Reference, Variable Length */ PG_FUNCTION_INFO_V1(copytext); Datum copytext(PG_FUNCTION_ARGS) { text *t = PG_GETARG_TEXT_P(0); /* * VARSIZE is the total size of the struct in bytes. */ text *new_t = (text *) palloc(VARSIZE(t)); VARATT_SIZEP(new_t) = VARSIZE(t); /* * VARDATA is a pointer to the data region of the struct. */ memcpy((void *) VARDATA(new_t), /* destination */ (void *) VARDATA(t), /* source */ VARSIZE(t)-VARHDRSZ); /* how many bytes */ PG_RETURN_TEXT_P(new_t); } PG_FUNCTION_INFO_V1(concat_text); Datum concat_text(PG_FUNCTION_ARGS) { text *arg1 = PG_GETARG_TEXT_P(0); text *arg2 = PG_GETARG_TEXT_P(1); int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ; text *new_text = (text *) palloc(new_text_size); VARATT_SIZEP(new_text) = new_text_size; memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ); memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ), VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ); PG_RETURN_TEXT_P(new_text); }
The CREATE FUNCTION commands are the same as for the version-0 equivalents.
At first glance, the version-1 coding conventions may appear to be just pointless obscurantism. However, they do offer a number of improvements, because the macros can hide unnecessary detail. An example is that in coding add_one_float8, we no longer need to be aware that float8 is a pass-by-reference type. Another example is that the GETARG macros for variable-length types hide the need to deal with fetching "toasted" (compressed or out-of-line) values. The old-style copytext and concat_text functions shown above are actually wrong in the presence of toasted values, because they don't call pg_detoast_datum() on their inputs. (The handler for old-style dynamically-loaded functions currently takes care of this detail, but it does so less efficiently than is possible for a version-1 function.)
One big improvement in version-1 functions is better handling of NULL inputs and results. The macro PG_ARGISNULL(n) allows a function to test whether each input is NULL (of course, doing this is only necessary in functions not declared "strict"). As with the PG_GETARG_xxx() macros, the input arguments are counted beginning at zero. To return a NULL result, execute PG_RETURN_NULL(); this works in both strict and non-strict functions.
The version-1 function call conventions make it possible to return "set" results and implement trigger functions and procedural-language call handlers. Version-1 code is also more portable than version-0, because it does not break ANSI C restrictions on function call protocol. For more details see src/backend/utils/fmgr/README in the source distribution.
Composite types do not have a fixed layout like C structures. Instances of a composite type may contain null fields. In addition, composite types that are part of an inheritance hierarchy may have different fields than other members of the same inheritance hierarchy. Therefore, Postgres provides a procedural interface for accessing fields of composite types from C. As Postgres processes a set of rows, each row will be passed into your function as an opaque structure of type TUPLE. Suppose we want to write a function to answer the query
SELECT name, c_overpaid(emp, 1500) AS overpaid FROM emp WHERE name = 'Bill' OR name = 'Sam';In the query above, we can define c_overpaid as:
#include "postgres.h" #include "executor/executor.h" /* for GetAttributeByName() */ bool c_overpaid(TupleTableSlot *t, /* the current row of EMP */ int32 limit) { bool isnull; int32 salary; salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull)); if (isnull) return (false); return salary > limit; } /* In version-1 coding, the above would look like this: */ PG_FUNCTION_INFO_V1(c_overpaid); Datum c_overpaid(PG_FUNCTION_ARGS) { TupleTableSlot *t = (TupleTableSlot *) PG_GETARG_POINTER(0); int32 limit = PG_GETARG_INT32(1); bool isnull; int32 salary; salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull)); if (isnull) PG_RETURN_BOOL(false); /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary */ PG_RETURN_BOOL(salary > limit); }
GetAttributeByName is the Postgres system function that returns attributes out of the current row. It has three arguments: the argument of type TupleTableSlot* passed into the function, the name of the desired attribute, and a return parameter that tells whether the attribute is null. GetAttributeByName returns a Datum value that you can convert to the proper datatype by using the appropriate DatumGetXXX() macro.
The following query lets Postgres know about the c_overpaid function:
CREATE FUNCTION c_overpaid(emp, int4) RETURNS bool AS 'PGROOT/tutorial/obj/funcs.so' LANGUAGE 'c';
While there are ways to construct new rows or modify existing rows from within a C function, these are far too complex to discuss in this manual.
We now turn to the more difficult task of writing programming language functions. Be warned: this section of the manual will not make you a programmer. You must have a good understanding of C (including the use of pointers and the malloc memory manager) before trying to write C functions for use with Postgres. While it may be possible to load functions written in languages other than C into Postgres, this is often difficult (when it is possible at all) because other languages, such as FORTRAN and Pascal often do not follow the same calling convention as C. That is, other languages do not pass argument and return values between functions in the same way. For this reason, we will assume that your programming language functions are written in C.
The basic rules for building C functions are as follows:
The relevant header (include) files are installed under /usr/local/pgsql/include or equivalent. You can use pg_config --includedir to find out where it is on your system (or the system that your users will be running on). For very low-level work you might need to have a complete PostgreSQL source tree available.
When allocating memory, use the Postgres routines palloc and pfree instead of the corresponding C library routines malloc and free. The memory allocated by palloc will be freed automatically at the end of each transaction, preventing memory leaks.
Always zero the bytes of your structures using memset or bzero. Several routines (such as the hash access method, hash join and the sort algorithm) compute functions of the raw bits contained in your structure. Even if you initialize all fields of your structure, there may be several bytes of alignment padding (holes in the structure) that may contain garbage values.
Most of the internal Postgres types are declared in postgres.h, while the function manager interfaces (PG_FUNCTION_ARGS, etc.) are in fmgr.h, so you will need to include at least these two files. For portability reasons it's best to include postgres.h first, before any other system or user header files. Including postgres.h will also include c.h, elog.h and palloc.h for you.
Symbol names defined within object files must not conflict with each other or with symbols defined in the PostgreSQL server executable. You will have to rename your functions or variables if you get error messages to this effect.
Compiling and linking your object code so that it can be dynamically loaded into Postgres always requires special flags. See Section 13.4.6 for a detailed explanation of how to do it for your particular operating system.
Before you are able to use your PostgreSQL extension function written in C they need to be compiled and linked in a special way in order to allow it to be dynamically loaded as needed by the server. To be precise, a shared library needs to be created.
For more information you should read the documentation of your operating system, in particular the manual pages for the C compiler, cc, and the link editor, ld. In addition, the PostgreSQL source code contains several working examples in the contrib directory. If you rely on these examples you will make your modules dependent on the availability of the PostgreSQL source code, however.
Creating shared libraries is generally analoguous to linking executables: first the source files are compiled into object files, then the object files are linked together. The object files need to be created as position-independent code (PIC), which conceptually means that they can be placed at an arbitrary location in memory when they are loaded by the executable. (Object files intended for executables are not compiled that way.) The command to link a shared library contains special flags to distinguish it from linking an executable. --- At least this is the theory. On some systems the practice is much uglier.
In the following examples we assume that your source code is in a file foo.c and we will create an shared library foo.so. The intermediate object file will be called foo.o unless otherwise noted. A shared library can contain more than one object file, but we only use one here.
The compiler flag to create PIC is -fpic. The linker flag to create shared libraries is -shared.
gcc -fpic -c foo.c ld -shared -o foo.so foo.oThis is applicable as of version 4.0 of BSD/OS.
The compiler flag to create PIC is -fpic. To create shared libraries the compiler flag is -shared.
gcc -fpic -c foo.c gcc -shared -o foo.so foo.oThis is applicable as of version 3.0 of FreeBSD.
The compiler flag of the system compiler to create PIC is +z. When using GCC it's -fpic. The linker flag for shared libraries is -b. So
cc +z -c foo.cor
gcc -fpic -c foo.cand then
ld -b -o foo.sl foo.oHP-UX uses the extension .sl for shared libraries, unlike most other systems.
PIC is the default, no special compiler options are necessary. The linker option to produce shared libraries is -shared.
cc -c foo.c ld -shared -o foo.so foo.o
The compiler flag to create PIC is -fpic. On some platforms in some situations -fPIC must be used if -fpic does not work. Refer to the GCC manual for more information. The compiler flag to create a shared library is -shared. A complete example looks like this:
cc -fpic -c foo.c cc -shared -o foo.so foo.o
The compiler flag to create PIC is -fpic. For ELF systems, the compiler with the flag -shared is used to link shared libraries. On the older non-ELF systems, ld -Bshareable is used.
gcc -fpic -c foo.c gcc -shared -o foo.so foo.o
The compiler flag to create PIC is -fpic. ld -Bshareable is used to link shared libraries.
gcc -fpic -c foo.c ld -Bshareable -o foo.so foo.o
PIC is the default, so the compilation command is the usual one. ld with special options is used to do the linking:
cc -c foo.c ld -shared -expect_unresolved '*' -o foo.so foo.oThe same procedure is used with GCC instead of the system compiler; no special options are required.
The compiler flag to create PIC is -KPIC with the Sun compiler and -fpic with GCC. To link shared libraries, the compiler option is -G with either compiler or alternatively -shared with GCC.
cc -KPIC -c foo.c cc -G -o foo.so foo.oor
gcc -fpic -c foo.c gcc -G -o foo.so foo.o
The compiler flag to create PIC is -K PIC with the SCO compiler and -fpic with GCC. To link shared libraries, the compiler option is -G with the SCO compiler and -shared with GCC.
cc -K PIC -c foo.c cc -G -o foo.so foo.oor
gcc -fpic -c foo.c gcc -shared -o foo.so foo.o
Tip: If you want to package your extension modules for wide distribution you should consider using GNU Libtool for building shared libraries. It encapsulates the platform differences into a general and powerful interface. Serious packaging also requires considerations about library versioning, symbol resolution methods, and other issues.
The resulting shared library file can then be loaded into Postgres. When specifying the file name to the CREATE FUNCTION command, one must give it the name of the shared library file (ending in .so) rather than the simple object file.
Note: Actually, Postgres does not care what you name the file as long as it is a shared library file.