A user-defined function (UDF) transformer is a function that perform specific formatting or processing on data before it is written to VMware Greenplum. All GPSS data sources (file, Kafka, RabbitMQ, s3) support UDF transformers.

This topic describes UDF transformers and how to use them with Greenplum Streaming Server:

Developing a UDF Transformer for GPSS

A UDF transformer is a function. You create and register the function with Greenpum Database. Users of the UDF specify the name of the function in the GPSS load configuration file.

The framework supports specifying properties that direct the processing performed by the UDF. GPSS passes any transformer properties specified in the load configuration file to the function. The UDF transformer-related load configuration properties are described further in Using a UDF Transformer in GPSS.

A UDF transformer function signature follows:

<function_name>(<s> anyelement, <properties> json)

The anyelement input argument and the function return columns must have the same table structure as the output table.

Using a UDF Transformer in GPSS

To use a UDF transformer in a GPSS job, you must specify an OUTPUT:TRANSFORMER (version 2) or targets:gpdb:tables:table:transformer (version 3) block in the load configuration file. The properties in this block identify the UDF transform function name and transform-specific properties that GPSS passes to the function.

Version 2 format syntax:

[TRANSFORMER:
   PATH: <path_to_plugin_transform_library>
   ON_INIT: <plugin_transform_init_name>
   TRANSFORM: <plugin_transform_name>
   PROPERTIES:
      <plugin_transform_property_name>: <property_value>
      [ ... ] ]

Version 3 format syntax:

transformer:
  transform: <udf_transform_udf_name>
  properties:
    <udf_transform_property_name>: <property_value>
    ...
  columns:
    - <udf_transform_column_name>
    ...
Note

GPSS currently supports specifying only one of the mapping or (UDF) transformer blocks in the load configuration file, not both.

Example

The following UDF adds a prefix to a name and adds an increment to an age. The prefix is specified in a function property named name-prefix, the age increment in a property named age-increment:

CREATE FUNCTION simple_mapping(s anyelement, properties json)
  RETURNS table(id bigint, name text, age int)
  AS $$ SELECT ((s.key)->>'id')::bigint, (properties->>'name-prefix')::text||((s.value)->>'name')::text, (properties->>'ag-increment')::int+(s.value)->>'age'::int;
$$ LANGUAGE sql;

To use this UDF in a GPSS job, specify the following in a version 2 load configuration file:

TRANSFORMER:
  TRANSFORM: simple_mapping
  PROPERTIES:
    name-prefix: 'Dear '
    age-increment: 10

With this configuration, the prefix Dear is prepended to the name and 10 is added to the age before the data is written to VMware Greenplum

Internally, GPSS writes to the target table using the following SQL command:

INSERT INTO tbl_target(id,name,age) FROM 
  SELECT f.id, f.name, f.age FROM 
    tbl_source s, simple_mapping(s,'{age-increment":"10","name-prefix":"Dear "}'::json) f;
check-circle-line exclamation-circle-line close-line
Scroll to top icon