gpsscli configuration file.
DATABASE: <db_name>
USER: <user_name>
PASSWORD: <password>
HOST: <coordinator_host>
PORT: <greenplum_port>
VERSION: <version_number>
<DATASOURCE>
  <DATASOURCE_specific_properties>
[SCHEDULE:
  RETRY_INTERVAL: <retry_time>
  MAX_RETRIES: <num_retries>
  RUNNING_DURATION: <run_time>
  AUTO_STOP_RESTART_INTERVAL: <restart_time>
  MAX_RESTART_TIMES: <num_restarts>
  QUIT_AT_EOF_AFTER: <clock_time>]
[ALERT:
  COMMAND: <command_to_run>
  WORKDIR: <directory>
  TIMEOUT: <alert_time>]
Where you may specify any property value with a template variable that GPSS substitutes at runtime, using the following syntax:
<PROPERTY>: {{<template_var>}}
You specify the configuration parameters for a Greenplum Streaming Server (GPSS) job in a YAML-formatted configuration file that you provide to the gpsscli submit command. The file contains two types of configuration parameters: Greenplum Database connection parameters, and parameters specific to the data source from which you load data into Greenplum.
This reference page uses the name gpsscli.yaml to refer to this file; you may choose your own name for the file.
Note: GPSS currently supports loading data from Kafka and file data sources. Refer to Loading Kafka Data into Greenplum and Loading File Data into Greenplum for detailed information about using GPSS to load data into Greenplum Database.
The gpsscli utility processes the YAML configuration file in order, using indentation (spaces) to determine the document hierarchy and the relationships between the sections. The use of white space in the file is significant, and keywords are case-sensitive.
Greenplum Database Options
PASSWORD: <password>
When the password value begins with the SHADOW: prefix, it represents a shadowed password string, and GPSS uses the Shadow:Key specified in its gpss.json configuration file, or a default key, to decode the password.
VERSION: <version_number>
The version of the gpsscli configuration file. GPSS supports versions 1 and 2 of this format.
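As an illustration of the connection parameters, the following sketch shows the top of a load configuration file that uses a shadowed password. The database, user, host, and port values are illustrative, the SHADOW: value is a placeholder rather than a real shadowed string, and VERSION: 2 assumes that the file uses version 2 of the format.

```yaml
# Greenplum Database connection parameters (illustrative values).
DATABASE: ops
USER: gpadmin
# Placeholder only: substitute the shadowed string generated for your password.
PASSWORD: SHADOW:<shadowed_password_string>
HOST: mdw-1
PORT: 15432
# Assumes version 2 of the configuration file format.
VERSION: 2
```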
DATASOURCE: Options
The data source. GPSS currently supports KAFKA and FILE data sources; refer to gpkafka-v2.yaml and filesource-v2.yaml for configuration file format and parameters.
Job SCHEDULE: Options
Controls the frequency and interval of restarting jobs.
RETRY_INTERVAL: <retry_time>
Specify the retry interval in day (d), hour (h), minute (m), second (s), or millisecond (ms) integer units; do not mix units. The default retry interval is 5m (5 minutes).
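AUTO_STOP_RESTART_INTERVAL: <restart_time>
The amount of time after which GPSS restarts a job that it stopped because the job reached RUNNING_DURATION.
MAX_RESTART_TIMES: <num_restarts>
The maximum number of times that GPSS restarts a job that it stopped because the job reached RUNNING_DURATION. The default is 0, do not restart the job.
QUIT_AT_EOF_AFTER: <clock_time>
GPSS does not stop the job before clock_time, even when GPSS encounters an EOF.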
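A minimal sketch of a SCHEDULE: block may help show how these properties fit together. All values are illustrative. It assumes that RUNNING_DURATION and AUTO_STOP_RESTART_INTERVAL accept the same unit suffixes described for RETRY_INTERVAL, and it omits QUIT_AT_EOF_AFTER because this page does not show the clock time format.

```yaml
SCHEDULE:
  # Wait 500 milliseconds between retries; one unit only, never mixed (not "1s500ms").
  RETRY_INTERVAL: 500ms
  # Retry a failed job at most twice.
  MAX_RETRIES: 2
  # Assumed duration syntax: stop the job after it has run for 2 hours.
  RUNNING_DURATION: 2h
  # Assumed duration syntax: restart the stopped job after 1 hour.
  AUTO_STOP_RESTART_INTERVAL: 1h
  # Restart at most 3 times; the default, 0, never restarts the job.
  MAX_RESTART_TIMES: 3
```

The nested properties are indented with spaces under SCHEDULE:, because gpsscli uses indentation to determine the hierarchy.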
Job ALERT: Options
Controls notification when a job is stopped for any reason (success, completion, error, user-initiated stop).
COMMAND: <command_to_run>
The command can use the environment variables $GPSSJOB_NAME, $GPSSJOB_STATUS, and $GPSSJOB_DETAIL.
TIMEOUT: <alert_time>
Specify the alert timeout in day (d), hour (h), minute (m), or second (s) integer units; do not mix units. The default alert timeout is -1s (no timeout).
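The following sketch shows one possible ALERT: block. The script path and working directory are hypothetical; the sketch assumes that the script reads the $GPSSJOB_NAME, $GPSSJOB_STATUS, and $GPSSJOB_DETAIL environment variables to build its notification.

```yaml
ALERT:
  # Hypothetical script that reports $GPSSJOB_NAME, $GPSSJOB_STATUS, and $GPSSJOB_DETAIL.
  COMMAND: /home/gpadmin/notify_job_stopped.sh
  # Hypothetical working directory for the command.
  WORKDIR: /tmp
  # Alert timeout of 30 seconds; the default, -1s, means no timeout.
  TIMEOUT: 30s
```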
GPSS supports using template variables to specify property values in the load configuration file.
You specify a template variable value in the load configuration file as follows:
<PROPERTY>: {{<template_var>}}
For example:
MAX_RETRIES: {{numretries}}
GPSS substitutes the template variable with a value that you specify via the -p | --property <template_var=value> option to the gpsscli dryrun, gpsscli submit, gpsscli load, or gpkafka load command.
For example, if the command line specifies:
--property numretries=10
GPSS substitutes occurrences of {{numretries}} in the load configuration file with the value 10 before submitting the job, and uses that value while the job is running.
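To show the template variable in context, here is a sketch of a SCHEDULE: block that takes its retry count from the command line. The RETRY_INTERVAL value is illustrative, and the rest of the load configuration file is omitted.

```yaml
SCHEDULE:
  RETRY_INTERVAL: 5m
  # GPSS replaces {{numretries}} with 10 when the command includes --property numretries=10.
  MAX_RETRIES: {{numretries}}
```

Because the value is supplied when the job is submitted, the same file can be reused with different retry limits.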
Submit a job to load data into Greenplum Database as defined in the load configuration file named loadit.yaml:
$ gpsscli submit loadit.yaml
Example Greenplum Database configuration parameters in loadit.yaml:
DATABASE: ops
USER: gpadmin
PASSWORD: changeme
HOST: mdw-1
PORT: 15432
<DATASOURCE_block> ...
See also: gpsscli load, gpsscli submit, gpkafka load, filesource-v2.yaml, gpkafka-v2.yaml