Load data with the Greenplum Streaming Server.
gpsscli load <jobconfig.yaml> [...]
[--name <job_name>]
[-f | --force] [--quit-at-eof] [--partition]
[{--force-reset-earliest | --force-reset-latest | --force-reset-timestamp <tstamp>}]
[-p | --property <template_var=value>]
[--config <gpsscliconfig.json>]
[--gpss-host <host>] [--gpss-port <port>]
[-U | --username <client_auth_user> -P | --password <client_auth_passwd>]
[--no-check-ca] [-l | --log-dir <directory>] [--verbose]
gpsscli load {-h | --help}
The gpsscli load command initiates a load job to a specific Greenplum Streaming Server (GPSS) instance. When you run gpsscli load, the command submits, starts, and displays the progress of a GPSS job.
You provide one or more YAML-formatted configuration files that define the job parameters when you run the command. When you specify a single load configuration file, you may choose a name to identify the job. If you do not provide a name, GPSS uses the base name of the load configuration file as the job identifier. For example, if you invoke this command with the load configuration file /dir/jobconfig.yaml and do not provide the --name option, GPSS assigns the job the identifier jobconfig.
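The naming rule above can be sketched in shell. This is only an illustration of how the default identifier follows from the file path, assuming "base name" means the file name with its directory and final extension removed, as the jobconfig example suggests. default_job_name is a hypothetical helper, not part of gpsscli.

```shell
# Illustration only: mimic the default job identifier described in the text.
# default_job_name is a hypothetical helper, not a gpsscli feature.
default_job_name() {
    config_path=$1
    file_name=$(basename "$config_path")   # drop the directory part
    printf '%s\n' "${file_name%.*}"        # drop the final extension, if any
}

default_job_name /dir/jobconfig.yaml
```

For /dir/jobconfig.yaml the helper prints jobconfig, matching the identifier that the text says GPSS assigns.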
By default, gpsscli load loads all available data and then waits indefinitely for new messages to load. In the case of user interrupt or exit, the GPSS job remains in the Running state. You must explicitly stop the job with gpsscli stop when running in this mode.
When you provide the --quit-at-eof option to the command, the utility exits after it reads all published data, writes the data to Greenplum Database, and stops the job. The GPSS job is in the Success or Error state when the command returns.
If gpsscli load detects an offset mismatch when loading from a Kafka data source, you can choose to resume the load operation from the earliest available data, to load only new data, or to load data emitted since a specific time.
If the GPSS instance to which you want to send the request is not running on the default host (127.0.0.1) or the default port number (5000), you can specify the GPSS host and/or port via command line options.
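As a minimal sketch of targeting a non-default GPSS instance, the wrapper below passes the --gpss-host and --gpss-port options from the synopsis ahead of the configuration file. load_remote is a hypothetical helper name, and the host and port in the usage comment are placeholders, not values from this reference.

```shell
# Hypothetical wrapper: send a load request to a GPSS instance that is not
# listening on the default 127.0.0.1:5000.
load_remote() {
    gpss_host=$1
    gpss_port=$2
    shift 2
    # Remaining arguments (configuration file, other options) pass through.
    gpsscli load --gpss-host "$gpss_host" --gpss-port "$gpss_port" "$@"
}

# Usage (placeholder host and port):
#   load_remote gpsshost.example.com 5019 loadcfg.yaml
```

Values given on the command line this way override any ListenAddress host and port set in gpsscliconfig.json.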
<jobconfig.yaml> [...]
One or more YAML-formatted configuration files that define the parameters of the job. If a filename provided is not an absolute path, Greenplum Database assumes the file system location is relative to the current working directory.
Note: GPSS uses the properties in a YAML configuration file to uniquely identify a load operation. Submit a configuration file only once. If you submit the same configuration file more than once, GPSS creates the job, but the job eventually fails with an error.
--name <job_name>
Use job_name to identify the job. If you do not provide a name, the default job identifier is the base name of the load configuration file. Job names must be unique.
Note: GPSS does not support specifying a job_name when you provide more than one jobconfig.yaml load configuration file to the command.
-f | --force
Force GPSS to reload the configuration of a running job. GPSS stops the job, updates the job with the configuration specified in jobconfig.yaml, and then restarts the job. If you previously named the job, you must provide --name job_name when you force a job configuration reload with this option.
Note: Do not attempt to update a configuration property that GPSS uses to uniquely identify a job. If you change any such configuration property, GPSS creates a new internal job and loads all available messages.
--quit-at-eof
When you specify this option, gpsscli load exits after it reads all of the source data. The default behaviour of gpsscli load is to wait indefinitely for, and then consume, new data from the source.
gpsscli load ignores job retry SCHEDULE configuration settings when it is invoked with the --quit-at-eof flag.
--force-reset-earliest
gpsscli load returns an error if its recorded offset does not match that of the data source. Re-run gpsscli load and specify the --force-reset-earliest option to resume the load operation from the earliest available data offset known to the data source.
Note: gpsscli load supports this option only when loading from a Kafka or RabbitMQ stream data source.
Note: --force-reset-earliest specified on the command line takes precedence over a FALLBACK_OFFSET/fallback_offset set in the jobconfig.yaml.
--force-reset-latest
gpsscli load returns an error if its recorded offset does not match that of the data source. Re-run gpsscli load and specify the --force-reset-latest option to load only new data emitted from the data source.
Note: gpsscli load supports this option only when loading from a Kafka or RabbitMQ stream data source.
Note: --force-reset-latest specified on the command line takes precedence over a FALLBACK_OFFSET/fallback_offset set in the jobconfig.yaml.
--force-reset-timestamp <tstamp>
Specify the --force-reset-timestamp option to load messages published since the specified time. tstamp must specify epoch time in milliseconds, and is bounded by the earliest message time and the current time.
Note: gpsscli load supports this option only when loading from a Kafka or RabbitMQ stream data source.
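Because tstamp is expressed in epoch milliseconds, a human-readable time has to be converted before it is passed to --force-reset-timestamp. The sketch below shows one way to do that, assuming GNU date is available for the -d option. to_epoch_ms is a hypothetical helper, and the date and loadcfg.yaml are placeholders. The final line only prints the resulting command rather than running it.

```shell
# Hypothetical helper: convert a date string to epoch milliseconds.
# Assumes GNU date; BSD/macOS date does not accept -d with this meaning.
to_epoch_ms() {
    seconds=$(date -u -d "$1" +%s) || return 1
    printf '%s\n' "$((seconds * 1000))"
}

# Placeholder time; the value must fall between the earliest message time
# and the current time for the load to accept it.
tstamp=$(to_epoch_ms '2024-01-15 08:30:00 UTC')

# Print the command that would be run, for review before executing it.
printf 'gpsscli load --force-reset-timestamp %s loadcfg.yaml\n' "$tstamp"
```

The conversion has second resolution, which is usually enough for choosing a restart point.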
--partition
By default, GPSS outputs the Kafka job progress by batch, and displays the start and end times, the message number and size, the number of inserted and rejected rows, and the transfer speed per batch. When you specify the --partition option, GPSS outputs the job progress by partition, and displays the partition identifier, the start and end times, the beginning and ending offsets, the message size, and the transfer speed per partition.
Note: gpsscli load supports this option only when loading from a Kafka data source.
--config <gpsscliconfig.json>
The GPSS configuration file. This file includes properties that identify the gpss instance that services the command. When SSL encryption is enabled between the GPSS client and server, you also use this file to identify the file system location of the client SSL certificates. Refer to gpss.json for detailed information about the format of this file and the configuration properties supported.
Note: gpsscli subcommands read the configuration specified in the ListenAddress block of the gpsscliconfig.json file, and ignore the gpfdist configuration specified in the Gpfdist block of the file.
--color
Enable the use of color when displaying front-end log messages. When specified, GPSS colors the log level in messages that it writes to stdout. Color is deactivated by default.
--color option if you also specify --csv-log.
stdout using spaces between fields for a more human-readable format.
--gpss-host <host>
The default host is 127.0.0.1. If specified, overrides a ListenAddress:Host value provided in gpsscliconfig.json.
--gpss-port <port>
The default port number is 5000. If specified, overrides a ListenAddress:Port value provided in gpsscliconfig.json.
gpsscli subcommand.
-l | --log-dir <directory>
The directory to which GPSS writes client command log files. GPSS must have write permission to the directory. GPSS creates the log directory if it does not exist. By default, GPSS writes gpsscli client log files to the $HOME/gpAdminLogs directory.
stdout. When you specify the --verbose option, GPSS also outputs debug-level messages about the operation.
Submit a GPSS load job from Kafka, named from_topic1, whose load parameters are defined by the configuration file named loadcfg.yaml:
$ gpsscli load --name from_topic1 loadcfg.yaml
gpss, gpsscli.yaml, gpsscli submit, gpsscli start, gpsscli progress, gpsscli stop