# Creating User-Defined Functions in PostgreSQL

## Introduction

In this post, I am going to share some knowledge on defining custom functions in PostgreSQL, with a special focus on the syntax of the CREATE FUNCTION command. Although the body of a user defined function can be written in many languages (SQL, PL/pgSQL, PL/Python, etc.), the knowledge of this post should generally apply to functions written in any language.

The following code has been tested with PostgreSQL 10.4 on Ubuntu 18.04.

The CREATE FUNCTION command can be used to define custom functions, and the following toy example defines a function that: 1. has a name f; 2. requires two arguments base and exponent, both of which are of type decimal; 3. returns $base^{exponent}$, also of type decimal, as the output:

CREATE FUNCTION f_exp (base decimal, exponent decimal) RETURNS decimal AS $BODY$
BEGIN
RETURN base ^ exponent;
END;
$BODY$ LANGUAGE plpgsql;

Note the body of the function is quoted by the dollar signs (i.e., ). The BODY tag between the dollar signs are optional unless nested dollar sign quotings exist in the function body. The clause LANGUAGE plpgsql following the enclosing dollar signs indicates that the body of the function is written using PL/pgSQL.

In the example above, the arguments are referenced by their names in the function body. Note that if a column has the same name as an argument, the column name will take precedence1. With the older approach, arguments are referenced using positional parameters $n (e.g., $1 refers to the first argument), which is safer but less readable.

The above function can be called with the SELECT command just like other built-in functions, using any of the three notations2:

SELECT f_exp(2, 3); -- positional notation
SELECT f_exp(exponent => 3, base => 2); -- named notation
SELECT f_exp(2, exponent => 3); -- mixed notation (named arguments cannot precede positional arguments)

## More on the input

A function can have an arbitrary number of arguments (e.g., 0, 1, 2, …) as input. If there are more than two arguments, each should be separated by a comma. The arguments need not to be named (in which case they should be referenced by positional parameters in the function body), but must has a type definition.

Each input parameter can have a default value, which can be defined with the DEFAULT keyword following the data type definition (e.g., base decimal DEFAULT 2).

## More on the output

In the example above, the function returns a scalar of type decimal. If the output is a vector, then the SETOF keyword should be placed before the output type. In the example above, the statement would be SETOF decimal. When a function is called for it side effect, it is not necessary to return any result. In this case, the function an be defined with RETURN VOID.

Other than base types like decimal, the input or output can also be of a composite type. A composite type can be created with the CREATE TYPE command, and then be used in the function definition just like base types. However, this approach is quite tedious, and it creates additional data types to be stored in the system3.

An alternative approach is to define the function with output parameters using the OUT keyword, which creates an anonymous composite type, as in this example:

CREATE FUNCTION f_exp (base decimal, exponent decimal, OUT formula text, OUT result decimal) RETURNS record AS $BODY$
BEGIN
formula := base::text || '^' || exponent::text;
result := base ^ exponent;
RETURN;
END;
$BODY$ LANGUAGE plpgsql;

SELECT * FROM f_exp(2, 3); -- Or equivalently: SELECT (f_exp(2, 3)).*;

Note that the output parameters should be assigned with values to be returned inside the function body, before the RETURN statement.

The above approach only returns one row of the result. If there are multiple rows to return, define the function with RETURNS SETOF record or, use the more SQL-compliant way (i.e., RETURNS TABLE ()) shown below:

CREATE FUNCTION f_cum_sum(sum_from integer, sum_to integer) RETURNS TABLE (n integer, cum_sum integer) AS $BODY$
BEGIN
cum_sum := 0;
FOR i in sum_from..sum_to LOOP
n := i;
cum_sum := cum_sum + i;
RETURN NEXT;
END LOOP;
END;
$BODY$ LANGUAGE plpgsql;

SELECT * FROM f_cum_sum(1, 2);

## Other considerations

1. Dropping a function. A function can be dropped using the DROP FUNCTION command. Besides the function name, data types of the input parameters also need to be specified. For example, to drop f_cum_sum() defined previously, use4:
DROP FUNCTION f_cum_sum(integer, integer);
1. Replacing a function. More often than not, developing a function requires many trials. During the process of development, it is more convenient to use the CREATE OR REPLACE FUNCTION command, which saves the trouble of dropping a function every time it needs to be modified. However, note that the replacing function should have the same input and output data types as the function to be replaced, otherwise the latter must be dropped manually.

2. Parameter mode. When defining a function, the mode of a parameter can be marked as IN (the default), OUT, INOUT, or VARIADIC. An IN parameter serves as an input parameter, while an OUT parameter serves as an output parameter. Previous examples have shown the usage of these two kinds of parameters (the IN keyword can be omitted). There are two additional kinds of modes: 1. an INOUT parameter serves as both an input parameter and an output parameter; 2. a VARIADIC parameter serves an input parameter that stores an arbitrary number of arguments.

1. In this case, the argument can be qualified by the function name; furthermore, if it again conflicts with a qualified column name, a table alias can be used within the SQL command.

2. Named notations are not supported by earlier versions of PostgreSQL (e.g., version 8.* and below).

3. It is useful to know that, whenever a new table is created, a composite type with the same name are also created and can be used to define the data type of input or output parameters.

4. Note: multiple functions can be dropped in one command; just separate each function with a comma.