title: gpsscli.yaml

gpsscli configuration file.

Synopsis

DATABASE: <db_name>
USER: <user_name>
PASSWORD: <password>
HOST: <master_host>
PORT: <greenplum_port>
VERSION: <version_number>

<DATASOURCE>:
  <DATASOURCE_specific_properties>

[SCHEDULE:
   RETRY_INTERVAL: <retry_time>
   MAX_RETRIES: <num_retries>
   RUNNING_DURATION: <run_time>
   AUTO_STOP_RESTART_INTERVAL: <restart_time>
   MAX_RESTART_TIMES: <num_restarts>
   QUIT_AT_EOF_AFTER: <clock_time>]

You may specify any property value with a template variable, which GPSS substitutes at runtime, using the following syntax:

<PROPERTY>: {{<template_var>}}

Description

You specify the configuration parameters for a Greenplum Streaming Server (GPSS) job in a YAML-formatted configuration file that you provide to the gpsscli submit or gpsscli load command. This file contains Greenplum Database connection parameters, parameters specific to the data source from which you will load data into Greenplum, and optional job scheduling (SCHEDULE) parameters.

This reference page uses the name gpsscli.yaml to refer to this file; you may choose your own name for the file.

Note: GPSS currently supports loading data from Kafka and file data sources. Refer to Loading Kafka Data into Greenplum and Loading File Data into Greenplum for detailed information about using GPSS to load data into Greenplum Database.

The gpsscli utility processes the YAML configuration file in order, using indentation (spaces) to determine the document hierarchy and the relationships between the sections. The use of white space in the file is significant, and keywords are case-sensitive.

Keywords and Values

Greenplum Database Options

DATABASE: db_name
The name of the Greenplum database.
USER: user_name
The name of the Greenplum Database user/role. This role must have the permissions described in Configuring Greenplum Database Role Privileges.
PASSWORD: password
The password for the Greenplum Database user/role. By default, the GPSS client passes the password to the GPSS server in clear text. When the password has a SHADOW: prefix, it represents a shadowed password string, and GPSS uses the Shadow:Key specified in its gpss.json configuration file, or a default key, to decode the password.
HOST: master_host
The host name or IP address of the Greenplum Database master host.
PORT: greenplum_port
The port number of the Greenplum Database server on the master host.
VERSION: version_number
The version of the gpsscli configuration file. GPSS supports versions 1 and 2 of this format.
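The following sketch shows the Greenplum Database options together, using a shadowed password and version 2 of the file format. The database, role, host, and port values are hypothetical, and <shadowed_password_string> is a placeholder for a password string that you shadow separately; GPSS decodes it with the Shadow:Key from gpss.json, or with its default key.

```yaml
# Greenplum Database connection options (hypothetical values)
DATABASE: ops
USER: gpadmin
# The SHADOW: prefix tells GPSS to decode the value using the Shadow:Key
# in gpss.json, or its default key; the string after it is a placeholder
PASSWORD: SHADOW:<shadowed_password_string>
HOST: mdw-1
PORT: 5432
# Version of the gpsscli configuration file format (GPSS supports 1 and 2)
VERSION: 2
```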

DATASOURCE: Options

DATASOURCE:

The data source. GPSS currently supports KAFKA and FILE data sources; refer to gpkafka-v2.yaml and filesource.yaml for the configuration file format and parameters.

DATASOURCE_specific_properties
Properties specific to the data source, indented under the DATASOURCE keyword.
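To illustrate how the data source properties nest under the top-level DATASOURCE keyword, the following sketch shows a minimal KAFKA block in the version 2 format. The broker address, topic, column, and table names are hypothetical, and the property set is an abbreviated assumption based on the gpkafka-v2.yaml format; refer to gpkafka-v2.yaml for the authoritative list of required and optional properties.

```yaml
# Connection options (DATABASE, USER, PASSWORD, HOST, PORT) omitted
VERSION: 2
KAFKA:                                # the DATASOURCE keyword
   INPUT:
      SOURCE:
         BROKERS: kbrokerhost1:9092   # hypothetical broker host:port
         TOPIC: customer_expenses     # hypothetical topic
      VALUE:
         COLUMNS:
            - NAME: jdata             # hypothetical column name
              TYPE: json
         FORMAT: json
   OUTPUT:
      TABLE: expenses                 # hypothetical target table
```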

Job SCHEDULE: Options

SCHEDULE:

Controls how GPSS retries failed jobs, and how it automatically stops and restarts running jobs.

RETRY_INTERVAL: retry_time
The period of time that GPSS waits before retrying a failed job. You can specify the time interval in day (d), hour (h), minute (m), second (s), or millisecond (ms) integer units; do not mix units. The default retry interval is 5m (5 minutes).
MAX_RETRIES: num_retries
The maximum number of times that GPSS attempts to retry a failed job. The default is 0, which means that GPSS does not retry the job. If you specify a negative value, GPSS retries the job indefinitely.
RUNNING_DURATION: run_time
The amount of time after which GPSS automatically stops a job. GPSS does not automatically stop a job by default.
AUTO_STOP_RESTART_INTERVAL: restart_time
The amount of time after which GPSS restarts a job that it stopped because the job reached its RUNNING_DURATION.
MAX_RESTART_TIMES: num_restarts
The maximum number of times that GPSS restarts a job that it stopped because the job reached its RUNNING_DURATION. The default is 0, which means that GPSS does not restart the job.
QUIT_AT_EOF_AFTER: clock_time
The time of day after which GPSS stops a job when the job encounters an EOF; GPSS applies this check every day. By default, GPSS does not automatically stop a job that reaches EOF. GPSS never stops a job before clock_time, even when the job encounters an EOF.
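The following sketch shows a SCHEDULE block that retries a failed job up to 3 times at 1-minute intervals and, independently, stops a running job after 2 hours and restarts it 30 minutes later, at most 5 times. All values are hypothetical. The sketch assumes that RUNNING_DURATION and AUTO_STOP_RESTART_INTERVAL accept the same integer d/h/m/s/ms units as RETRY_INTERVAL; QUIT_AT_EOF_AFTER is omitted because the format of clock_time is not described on this page.

```yaml
SCHEDULE:
   RETRY_INTERVAL: 1m                # wait 1 minute before retrying a failed job
   MAX_RETRIES: 3                    # retry at most 3 times (a negative value retries indefinitely)
   RUNNING_DURATION: 2h              # assumed unit syntax: stop the job after 2 hours
   AUTO_STOP_RESTART_INTERVAL: 30m   # assumed unit syntax: restart 30 minutes after the stop
   MAX_RESTART_TIMES: 5              # restart a stopped job at most 5 times
```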

Template Variables

GPSS supports using template variables to specify property values in the load configuration file.

You specify a template variable value in the load configuration file as follows:

<PROPERTY>: {{<template_var>}}

For example:

MAX_RETRIES: {{numretries}}

GPSS substitutes the template variable with a value that you specify via the -p | --property template_var=value option to the gpsscli submit, gpsscli load, or gpkafka load command.

For example, if the command line specifies:

--property numretries=10

GPSS replaces occurrences of {{numretries}} in the load configuration file with the value 10 before submitting the job, and uses that value during job execution.
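A single configuration file can use several template variables. In the following sketch, the variable names dbname and numretries and all other values are hypothetical, and the command shown in the comment assumes that you can specify the -p | --property option once per template variable.

```yaml
# Hypothetical command that supplies both template variable values:
#   gpsscli submit --property dbname=ops --property numretries=10 loadit.yaml
DATABASE: {{dbname}}                 # becomes DATABASE: ops
USER: gpadmin
HOST: mdw-1
PORT: 5432
# <DATASOURCE_block> ... (omitted)
SCHEDULE:
   MAX_RETRIES: {{numretries}}       # becomes MAX_RETRIES: 10
```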

Examples

Submit a job to load data into Greenplum Database as defined in the load configuration file named loadit.yaml:

$ gpsscli submit loadit.yaml

Example Greenplum Database configuration parameters in loadit.yaml:

DATABASE: ops
USER: gpadmin
PASSWORD: changeme
HOST: mdw-1
PORT: 15432
<DATASOURCE_block> ...

See Also

gpsscli load, gpsscli submit, gpkafka load, filesource.yaml, gpkafka-v2.yaml
