A user-defined function (UDF) transformer is a function that perform specific formatting or processing on data before it is written to Greenplum Database. All GPSS data sources (file, Kafka, RabbitMQ, s3) support UDF transformers.
This topic describes UDF transformers and how to use them with Greenplum Streaming Server:
A UDF transformer is a function. You create and register the function with Greenpum Database. Users of the UDF specify the name of the function in the GPSS load configuration file.
The framework supports specifying properties that direct the processing performed by the UDF. GPSS passes any transformer properties specified in the load configuration file to the function. The UDF transformer-related load configuration properties are described further in Using a UDF Transformer in GPSS.
A UDF transformer function signature follows:
<function_name>(<s> anyelement, <properties> json)
The anyelement
input argument and the function return columns must have the same table structure as the output table.
To use a UDF transformer in a GPSS job, you must specify an OUTPUT:TRANSFORMER
(version 2) or targets:gpdb:tables:table:transformer
(version 3 (Beta)) block in the load configuration file. The properties in this block identify the UDF transform function name and transform-specific properties that GPSS passes to the function.
Version 2 format syntax:
[TRANSFORMER:
PATH: <path_to_plugin_transform_library>
ON_INIT: <plugin_transform_init_name>
TRANSFORM: <plugin_transform_name>
PROPERTIES:
<plugin_transform_property_name>: <property_value>
[ ... ] ]
Version 3 (Beta) format syntax:
transformer:
transform: <udf_transform_udf_name>
properties:
<udf_transform_property_name>: <property_value>
...
columns:
- <udf_transform_column_name>
...
NoteGPSS currently supports specifying only one of the
mapping
or (UDF)transformer
blocks in the load configuration file, not both.
The following UDF adds a prefix to a name and adds an increment to an age. The prefix is specified in a function property named name-prefix
, the age increment in a property named age-increment
:
CREATE FUNCTION simple_mapping(s anyelement, properties json)
RETURNS table(id bigint, name text, age int)
AS $$ SELECT ((s.key)->>'id')::bigint, (properties->>'name-prefix')::text||((s.value)->>'name')::text, (properties->>'ag-increment')::int+(s.value)->>'age'::int;
$$ LANGUAGE sql;
To use this UDF in a GPSS job, specify the following in a version 2 load configuration file:
TRANSFORMER:
TRANSFORM: simple_mapping
PROPERTIES:
name-prefix: 'Dear '
age-increment: 10
With this configuration, the prefix Dear
is prepended to the name and 10
is added to the age before the data is written to Greenplum Database
Internally, GPSS writes to the target table using the following SQL command:
INSERT INTO tbl_target(id,name,age) FROM
SELECT f.id, f.name, f.age FROM
tbl_source s, simple_mapping(s,'{age-increment":"10","name-prefix":"Dear "}'::json) f;