Defines a new function.
CREATE [OR REPLACE] FUNCTION <name>
( [ [<argmode>] [<argname>] <argtype> [ { DEFAULT | = } <default_expr> ] [, ...] ] )
[ RETURNS <rettype>
| RETURNS TABLE ( <column_name> <column_type> [, ...] ) ]
{ LANGUAGE <lang_name>
| TRANSFORM { FOR TYPE <type_name> } [, ... ]
| WINDOW
| { IMMUTABLE | STABLE | VOLATILE }
| [ NOT ] LEAKPROOF
| { CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT }
| { [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER }
| EXECUTE ON { ANY | COORDINATOR | ALL SEGMENTS | INITPLAN }
| PARALLEL { UNSAFE | RESTRICTED | SAFE }
| COST <execution_cost>
| ROWS <result_rows>
| SUPPORT <support_function>
| SET <configuration_parameter> { TO <value> | = <value> | FROM CURRENT }
| AS '<definition>'
| AS '<obj_file>', '<link_symbol>'
[ WITH ({ DESCRIBE = <describe_function> } [, ...] ) ]
} ...
CREATE FUNCTION
defines a new function. CREATE OR REPLACE FUNCTION
either creates a new function, or replaces an existing definition. To define a function, the user must have the USAGE
privilege on the language.
If a schema name is included, then the function is created in the specified schema. Otherwise it is created in the current schema. The name of the new function must not match any existing function with the same input argument types in the same schema. However, functions of different argument types may share a name (overloading).
To update the current definition of an existing function, use CREATE OR REPLACE FUNCTION
. It is not possible to change the name or argument types of a function this way (this would actually create a new, distinct function). Also, CREATE OR REPLACE FUNCTION
will not let you change the return type of an existing function. To do that, you must drop and recreate the function. (When using OUT
parameters, that means you cannot change the types of any OUT
parameters except by dropping the function.)
When CREATE OR REPLACE FUNCTION
is used to replace an existing function, the ownership and permissions of the function do not change. All other function properties are assigned the values specified or implied in the command. You must own the function to replace it (this includes being a member of the owning role).
If you drop and then recreate a function, the new function is not the same entity as the old; you will have to drop existing objects (rules, views, triggers, and so on) that refer to the old function. Use CREATE OR REPLACE FUNCTION
to change a function definition without breaking objects that refer to the function. Also, ALTER FUNCTION can be used to change most of the auxiliary properties of an existing function.
The user that creates the function becomes the owner of the function.
To be able to create a function, you must have USAGE
privilege on the argument types and the return type.
For more information about creating functions, refer to the User Defined Functions section of the PostgreSQL documentation.
Limited Use of VOLATILE and STABLE Functions
To prevent data from becoming out-of-sync across the segments in Greenplum Database, any function classified as STABLE
or VOLATILE
cannot be run at the segment level if it contains SQL or modifies the database in any way. For example, functions such as random()
or timeofday()
are not allowed to run on distributed data in Greenplum Database because they could potentially cause inconsistent data between the segment instances.
To ensure data consistency, VOLATILE
and STABLE
functions can safely be used in statements that are evaluated on and run from the coordinator. For example, the following statements are always run on the coordinator (statements without a FROM
clause):
SELECT setval('myseq', 201);
SELECT foo();
In cases where a statement has a FROM
clause containing a distributed table and the function used in the FROM
clause simply returns a set of rows, execution may be allowed on the segments:
SELECT * FROM foo();
One exception to this rule are functions that return a table reference (rangeFuncs
) or functions that use the refCursor
data type. Note that you cannot return a refcursor
from any kind of function in Greenplum Database.
Function Volatility and EXECUTE ON Attributes
Volatility attributes (IMMUTABLE
, STABLE
, VOLATILE
) and EXECUTE ON
attributes specify two different aspects of function execution. In general, volatility indicates when the function is run, and EXECUTE ON
indicates where it is run.
For example, a function defined with the IMMUTABLE
attribute can be run at query planning time, while a function with the VOLATILE
attribute must be run for every row in the query. A function with the EXECUTE ON COORDINATOR
attribute is run only on the coordinator segment and a function with the EXECUTE ON ALL SEGMENTS
attribute is run on all primary segment instances (not the coordinator).
See Using Functions and Operators in the Greenplum Database Administrator Guide.
Functions And Replicated Tables
A user-defined function that runs only SELECT
commands on replicated tables can run on segments. Replicated tables, created with the DISTRIBUTED REPLICATED
clause, store all of their rows on every segment. It is safe for a function to read them on the segments, but updates to replicated tables must run on the coordinator instance.
IN
,
OUT
,
INOUT
, or
VARIADIC
. If omitted, the default is
IN
. Only
OUT
arguments can follow an argument declared as
VARIADIC
. Also,
OUT
and
INOUT
arguments cannot be used together with the
RETURNS TABLE
notation.
The data type(s) of the function's arguments (optionally schema-qualified), if any. The argument types may be base, composite, or domain types, or may reference the type of a table column.
Depending on the implementation language it may also be allowed to specify pseudotypes such as cstring
. Pseudotypes indicate that the actual argument type is either incompletely specified, or outside the set of ordinary SQL data types.
table_name.column_name%TYPE
. Using this feature can sometimes help make a function independent of changes to the definition of a table.
INOUT
) parameters can have a default value. Each input parameter in the argument list that follows a parameter with a default value must have a default value as well.
The return data type (optionally schema-qualified). The return type can be a base, composite, or domain type, or may reference the type of a table column. Depending on the implementation language it may also be allowed to specify pseudotypes such as cstring
. If the function is not supposed to return a value, specify void
as the return type.
When there are OUT
or INOUT
parameters, the RETURNS
clause may be omitted. If present, it must agree with the result type implied by the output parameters: RECORD
if there are multiple output parameters, or the same type as the single output parameter.
The SETOF
modifier indicates that the function will return a set of items, rather than a single item.
table_name.column_name%TYPE
.
RETURNS TABLE
syntax. This is effectively another way of declaring a named
OUT
parameter, except that
RETURNS TABLE
also implies
RETURNS SETOF
.
RETURNS TABLE
syntax.
SQL
,
C
,
internal
, or the name of a user-defined procedural language, e.g.
plpgsql
. Enclosing the name in single quotes is deprecated and requires matching case.
WINDOW
indicates that the function is a window function rather than a plain function. This is currently only useful for functions written in C. The
WINDOW
attribute cannot be changed when replacing an existing function definition.
These attributes inform the query optimizer about the behavior of the function. At most one choice may be specified. If none of these appear, VOLATILE
is the default assumption. Since Greenplum Database currently has limited use of VOLATILE
functions, if a function is truly IMMUTABLE
, you must declare it as so to be able to use it without restrictions.
IMMUTABLE
indicates that the function cannot modify the database and always returns the same result when given the same argument values. It does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.
STABLE
indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups, parameter variables (such as the current time zone), and so on. Also note that the current_timestamp()
family of functions qualify as stable, since their values do not change within a transaction.
VOLATILE
indicates that the function value can change even within a single table scan, so no optimizations can be made. Relatively few database functions are volatile in this sense; some examples are
random()
,
timeofday()
. But note that any function that has side-effects must be classified volatile, even if its result is quite predictable, to prevent calls from being optimized away; an example is
setval()
.
LEAKPROOF
indicates that the function has no side effects. It reveals no information about its arguments other than by its return value. For example, a function that throws an error message for some argument values but not others, or that includes the argument values in any error message, is not leakproof. This affects how the system executes queries against views created with the
security_barrier
option or tables with row level security enabled. The system will enforce conditions from security policies and security barrier views before any user-supplied conditions from the query itself that contain non-leakproof functions, in order to prevent the inadvertent exposure of data. Functions and operators marked as leakproof are assumed to be trustworthy, and may be executed before conditions from security policies and security barrier views. In addition, functions which do not take arguments or which are not passed any arguments from the security barrier view or table do not have to be marked as leakproof to be executed before security conditions. See
CREATE VIEW. This option can only be set by the superuser.
CALLED ON NULL INPUT
(the default) indicates that the function will be called normally when some of its arguments are null. It is then the function author's responsibility to check for null values if necessary and respond appropriately.
RETURNS NULL ON NULL INPUT
or
STRICT
indicates that the function always returns null whenever any of its arguments are null. If this parameter is specified, the function is not run when there are null arguments; instead a null result is assumed automatically.
SECURITY INVOKER
(the default) indicates that the function is to be run with the privileges of the user that calls it.
SECURITY DEFINER
specifies that the function is to be run with the privileges of the user that created it.
EXTERNAL
is allowed for SQL conformance, but it is optional since, unlike in SQL, this feature applies to all functions not just external ones.
The EXECUTE ON
attributes specify where (coordinator or segment instance) a function runs when it is invoked during the query execution process.
EXECUTE ON ANY
(the default) indicates that the function can be run on the coordinator, or any segment instance, and it returns the same result regardless of where it is run. Greenplum Database determines where the function runs.
EXECUTE ON COORDINATOR
indicates that the function must run only on the coordinator instance.
EXECUTE ON ALL SEGMENTS
indicates that the function must run on all primary segment instances, but not the coordinator, for each invocation. The overall result of the function is the UNION ALL
of the results from all segment instances.
EXECUTE ON INITPLAN
indicates that the function contains an SQL command that dispatches queries to the segment instances and requires special processing on the coordinator instance by Greenplum Database when possible.
Note
EXECUTE ON INITPLAN
is only supported in functions that are used in theFROM
clause of aCREATE TABLE AS
orINSERT
command such as theget_data()
function in these commands.
CREATE TABLE t AS SELECT * FROM get_data();
INSERT INTO t1 SELECT * FROM get_data();
Greenplum Database does not support the EXECUTE ON INITPLAN
attribute in a function that is used in the WITH
clause of a query, a CTE (common table expression). For example, specifying EXECUTE ON INITPLAN
in function get_data()
in this CTE is not supported.
WITH tbl_a AS (SELECT * FROM get_data() )
SELECT * from tbl_a
UNION
SELECT * FROM tbl_b;
EXECUTE ON
attributes, see
Notes.
PARALLEL UNSAFE
indicates that the function can't be executed in parallel mode and the presence of such a function in an SQL statement forces a serial execution plan. This is the default.
PARALLEL RESTRICTED
indicates that the function can be executed in parallel mode, but the execution is restricted to parallel group leader.
PARALLEL SAFE
indicates that the function is safe to run in parallel mode without restriction.
setval()
). They should be labeled as parallel restricted if they access temporary tables, client connection state, cursors, prepared statements, or miscellaneous backend-local state which the system cannot synchronize in parallel mode (e.g.,
setseed()
cannot be executed other than by the group leader because a change made by another process would not be reflected in the leader). In general, if a function is labeled as being safe when it is restricted or unsafe, or if it is labeled as being restricted when it is in fact unsafe, it may throw errors or produce wrong answers when used in a parallel query. C-language functions could in theory exhibit totally undefined behavior if mislabeled, since there is no way for the system to protect itself against arbitrary C code, but in most likely cases the result will be no worse than for any other function. If in doubt, functions should be labeled as
UNSAFE
, which is the default.
SET
clause applies a value to a session configuration parameter when the function is entered. The configuration parameter is restored to its prior value when the function exits.
SET FROM CURRENT
saves the value of the parameter that is current when
CREATE FUNCTION
is run as the value to be applied when the function is entered.
SET
clause is attached to a function, then the effects of a
SET LOCAL
command executed inside the function for the same variable are restricted to the function: the configuration parameter's prior value is still restored at function exit. However, an ordinary
SET
command (without
LOCAL
) overrides the
SET
clause, much as it would do for a previous
SET LOCAL
command: the effects of such a command will persist after function exit, unless the current transaction is rolled back.
A string constant defining the function; the meaning depends on the language. It may be an internal function name, the path to an object file, an SQL command, or text in a procedural language.
AS
clause is used for dynamically loadable C language functions when the function name in the C language source code is not the same as the name of the SQL function. The string obj_file is the name of the file containing the dynamically loadable object, and is interpreted as for the
LOAD command. The string link_symbol is the name of the function in the C language source code. If the link symbol is omitted, it is assumed to be the same as the name of the SQL function being defined. The C names of all functions must be different, so you must give overloaded SQL functions different C names (for example, use the argument types as part of the C names).
CREATE FUNCTION
calls refer to the same object file, the file is only loaded once per session. To unload and reload the file (perhaps during development), start a new session.
$libdir
(which is located at
$GPHOME/lib
) or through the dynamic library path (set by the
dynamic_library_path
server configuration parameter) will simplify version upgrades if the new installation is at a different location.
Greenplum Database allows function overloading; that is, the same name can be used for several different functions so long as they have distinct input argument types. Whether or not you use it, this capability entails security precautions when calling functions in databases where some users mistrust other users; refer to Functions in the PostgreSQL documentation for more information.
Two functions are considered the same if they have the same names and input argument types, ignoring any OUT
parameters. Thus for example these declarations conflict:
CREATE FUNCTION foo(int) ...
CREATE FUNCTION foo(int, out text) ...
Functions that have different argument type lists will not be considered to conflict at creation time, but if defaults are provided they might conflict in use. For example, consider:
CREATE FUNCTION foo(int) ...
CREATE FUNCTION foo(int, int default 42) ...
A call foo(10)
will fail due to the ambiguity about which function should be called.
Any compiled code (shared library files) for custom functions must be placed in the same location on every host in your Greenplum Database cluster (coordinator and all segments). This location must also be in the LD_LIBRARY_PATH
so that the server can locate the files. It is recommended that you locate shared libraries either relative to $libdir
(which is located at $GPHOME/lib
) or through the dynamic library path (set by the dynamic_library_path
server configuration parameter) on all coordinator segment instances in the Greenplum cluster.
The full SQL type syntax is allowed for input arguments and return value. However, parenthesized type modifiers (e.g., the precision field for type numeric
) are discarded by CREATE FUNCTION
. Thus for example CREATE FUNCTION foo (varchar(10)) ...
is exactly the same as CREATE FUNCTION foo (varchar) ...
.
When replacing an existing function with CREATE OR REPLACE FUNCTION
, there are restrictions on changing parameter names. You cannot change the name already assigned to any input parameter (although you can add names to parameters that had none before). If there is more than one output parameter, you cannot change the names of the output parameters, because that would change the column names of the anonymous composite type that describes the function's result. These restrictions are made to ensure that existing calls of the function do not stop working when it is replaced.
If a function is declared STRICT
with a VARIADIC
argument, the strictness check tests that the variadic array as a whole is non-null. The function will still be called if the array has null elements.
Using Functions with Queries on Distributed Data
In some cases, Greenplum Database does not support using functions in a query where the data in a table specified in the FROM
clause is distributed over Greenplum Database segments. As an example, this SQL query contains the function func()
:
SELECT func(a) FROM table1;
The function is not supported for use in the query if all of the following conditions are met:
table1
is distributed over Greenplum Database segments.func()
reads or modifies data from distributed tables.func()
returns more than one row or takes an argument (a
) that comes from table1
.If any of the conditions are not met, the function is supported. Specifically, the function is supported if any of the following conditions apply:
func()
does not access data from distributed tables, or accesses data that is only on the Greenplum Database coordinator.table1
is a coordinator only table.func()
returns only one row and only takes input arguments that are constant values. The function is supported if it can be changed to require no input arguments.Using EXECUTE ON attributes
Most functions that run queries to access tables can only run on the coordinator. However, functions that run only SELECT
queries on replicated tables can run on segments. If the function accesses a hash-distributed table or a randomly distributed table, the function should be defined with the EXECUTE ON COORDINATOR
attribute. Otherwise, the function might return incorrect results when the function is used in a complicated query. Without the attribute, planner optimization might determine it would be beneficial to push the function invocation to segment instances.
These are limitations for functions defined with the EXECUTE ON COORDINATOR
or EXECUTE ON ALL SEGMENTS
attribute:
FROM
clause of a query.SELECT
list of a query with a FROM
clause.The attribute EXECUTE ON INITPLAN
indicates that the function contains an SQL command that dispatches queries to the segment instances and requires special processing on the coordinator instance by Greenplum Database. When possible, Greenplum Database handles the function on the coordinator instance in the following manner.
This simple example uses the function get_data()
in a CTAS command to create a table using data from the table country
. The function contains a SELECT
command that retrieves data from the table country
and uses the EXECUTE ON INITPLAN
attribute.
CREATE TABLE country(
c_id integer, c_name text, region int)
DISTRIBUTED RANDOMLY;
INSERT INTO country VALUES (11,'INDIA', 1 ), (22,'CANADA', 2), (33,'USA', 3);
CREATE OR REPLACE FUNCTION get_data()
RETURNS TABLE (
c_id integer, c_name text
)
AS $$
SELECT
c.c_id, c.c_name
FROM
country c;
$$
LANGUAGE SQL EXECUTE ON INITPLAN;
CREATE TABLE t AS SELECT * FROM get_data() DISTRIBUTED RANDOMLY;
If you view the query plan of the CTAS command with EXPLAIN ANALYZE VERBOSE
, the plan shows that the function is run as part of an InitPlan node, and one of the listed slices is labeled as entry db
. The query plan of a simple CTAS command without the function does not have an InitPlan node or an entry db
slice.
If the function did not contain the EXECUTE ON INITPLAN
attribute, the CTAS command returns the error function cannot execute on a QE slice
.
When a function uses the EXECUTE ON INITPLAN
attribute, a command that uses the function such as CREATE TABLE t AS SELECT * FROM get_data()
gathers the results of the function onto the coordinator segment and then redistributes the results to segment instances when inserting the data. If the function returns a large amount of data, the coordinator might become a bottleneck when gathering and redistributing data. Performance might improve if you rewrite the function to run the CTAS command in the user defined function and use the table name as an input parameter. In this example, the function runs a CTAS command and does not require the EXECUTE ON INITPLAN
attribute. Running the SELECT
command creates the table t1
using the function that runs the CTAS command.
CREATE OR REPLACE FUNCTION my_ctas(_tbl text) RETURNS VOID AS
$$
BEGIN
EXECUTE format('CREATE TABLE %s AS SELECT c.c_id, c.c_name FROM country c DISTRIBUTED RANDOMLY', _tbl);
END
$$
LANGUAGE plpgsql;
SELECT my_ctas('t1');
Add two integers using a SQL function:
CREATE FUNCTION add(integer, integer) RETURNS integer
AS 'select $1 + $2;'
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
Increment an integer, making use of an argument name, in PL/pgSQL:
CREATE OR REPLACE FUNCTION increment(i integer) RETURNS
integer AS $$
BEGIN
RETURN i + 1;
END;
$$ LANGUAGE plpgsql;
Return a record containing multiple output parameters:
CREATE FUNCTION dup(in int, out f1 int, out f2 text)
AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
LANGUAGE SQL;
SELECT * FROM dup(42);
You can do the same thing more verbosely with an explicitly named composite type:
CREATE TYPE dup_result AS (f1 int, f2 text);
CREATE FUNCTION dup(int) RETURNS dup_result
AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
LANGUAGE SQL;
SELECT * FROM dup(42);
Another way to return multiple columns is to use a TABLE
function:
CREATE FUNCTION dup(int) RETURNS TABLE(f1 int, f2 text)
AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
LANGUAGE SQL;
SELECT * FROM dup(42);
However, a TABLE
function is different from the preceding examples, because it actually returns a set of records, not just one record.
Increase the default segment host memory per query for a PL/pgSQL function:
CREATE OR REPLACE FUNCTION function_with_query() RETURNS
SETOF text AS $$
BEGIN
RETURN QUERY
EXPLAIN ANALYZE SELECT * FROM large_table;
END;
$$ LANGUAGE plpgsql
SET statement_mem='256MB';
Use polymorphic types to return an ENUM
array:
CREATE TYPE rainbow AS ENUM('red','orange','yellow','green','blue','indigo','violet');
CREATE FUNCTION return_enum_as_array( anyenum, anyelement, anyelement )
RETURNS TABLE (ae anyenum, aa anyarray) AS $$
SELECT $1, array[$2, $3]
$$ LANGUAGE SQL STABLE;
SELECT * FROM return_enum_as_array('red'::rainbow, 'green'::rainbow, 'blue'::rainbow);
This function is defined with the EXECUTE ON ALL SEGMENTS
to run on all primary segment instances. The SELECT
command runs the function that returns the time it was run on each segment instance.
CREATE FUNCTION run_on_segs (text) returns setof text as $$
begin
return next ($1 || ' - ' || now()::text );
end;
$$ language plpgsql VOLATILE EXECUTE ON ALL SEGMENTS;
SELECT run_on_segs('my test');
This function looks up a part name in the parts table. The parts table is replicated, so the function can run on the coordinator or on the primary segments.
CREATE OR REPLACE FUNCTION get_part_name(partno int) RETURNS text AS
$$
DECLARE
result text := ' ';
BEGIN
SELECT part_name INTO result FROM parts WHERE part_id = partno;
RETURN result;
END;
$$ LANGUAGE plpgsql;
If you run SELECT get_part_name(100);
at the coordinator the function runs on the coordinator. (The coordinator instance directs the query to a single primary segment.) If orders is a distributed table and you run the following query, the get_part_name()
function runs on the primary segments.
`SELECT order_id, get_part_name(orders.part_no) FROM orders;`
Because a SECURITY DEFINER
function is executed with the privileges of the user that created it, care is needed to ensure that the function cannot be misused. For security, search_path
should be set to exclude any schemas writable by untrusted users. This prevents malicious users from creating objects that mask objects used by the function. Particularly important in this regard is the temporary-table schema, which is searched first by default, and is normally writable by anyone. A secure arrangement can be had by forcing the temporary schema to be searched last. To do this, write pg_temp
as the last entry in search_path
. This function illustrates safe usage:
CREATE FUNCTION check_password(uname TEXT, pass TEXT)
RETURNS BOOLEAN AS $$
DECLARE passed BOOLEAN;
BEGIN
SELECT (pwd = $2) INTO passed
FROM pwds
WHERE username = $1;
RETURN passed;
END;
$$ LANGUAGE plpgsql
SECURITY DEFINER
-- Set a secure search_path: trusted schema(s), then 'pg_temp'.
SET search_path = admin, pg_temp;
The SET
option was not available in earlier versions of Greenplum Database, and so older functions may contain rather complicated logic to save, set, and restore search_path
. The SET
option is far easier to use for this purpose.
Another point to keep in mind is that by default, execute privilege is granted to PUBLIC
for newly created functions (see GRANT for more information). Frequently you will wish to restrict use of a security definer function to only some users. To do that, you must revoke the default PUBLIC
privileges and then grant EXECUTE
privilege selectively. To avoid having a window where the new function is accessible to all, create it and set the privileges within a single transaction. For example:
BEGIN;
CREATE FUNCTION check_password(uname TEXT, pass TEXT) ... SECURITY DEFINER;
REVOKE ALL ON FUNCTION check_password(uname TEXT, pass TEXT) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION check_password(uname TEXT, pass TEXT) TO admins;
COMMIT;
CREATE FUNCTION
is defined in SQL:1999 and later. The Greenplum Database version is similar but not fully compatible. The attributes are not portable, neither are the different available languages.
For compatibility with some other database systems, argmode can be written either before or after argname. But only the first way is standard-compliant.
For parameter defaults, the SQL standard specifies only the syntax with the DEFAULT
key word. The syntax with =
is used in T-SQL and Firebird.
ALTER FUNCTION, DROP FUNCTION, GRANT, LOAD, REVOKE, createlang
Parent topic: SQL Commands