If you are using the Greenplum Streaming Server (GPSS) in your current Greenplum Database installation, you must perform the GPSS upgrade procedure when:

  • You upgrade to a newer version of Greenplum Database, or
  • You install a new standalone GPSS package on your ETL host or in your Greenplum Database installation.

The GPSS upgrade procedure describes how to upgrade GPSS in your Greenplum Database installation or on your ETL host. The procedure uses GPSS.from to refer to your currently installed GPSS, and GPSS.new to refer to the GPSS that is installed when you upgrade to the new version of Greenplum Database or install a new GPSS package.

The GPSS upgrade procedure has two parts. You perform one procedure before, and one procedure after, you upgrade to a new version of Greenplum Database or GPSS:

Step1: GPSS Pre-Upgrade Actions

Perform this procedure in your GPSS.from installation before you upgrade to a new version of Greenplum Database or GPSS:

  1. Log in to the Greenplum Database coordinator host or the ETL host and set up your environment. For example:

    $ ssh gpadmin@<gpcoord>
    gpadmin@gpcoord$ . /usr/local/greenplum-db/greenplum_path.sh
    

    Or:

    $ ssh etluser@<etlhost>
    etluser@etlhost$ . /usr/local/gpss/gpss_path.sh
    
  2. Identify and note the current version (GPSS.from) of GPSS. For example:

    $ gpss --version
    
  3. Stop all gpss jobs that are in the Running state.

  4. Stop all running gpss instances.

  5. Upgrade to the new version of Greenplum Database or install a new version of GPSS, and then continue your GPSS upgrade with Step2: Upgrading GPSS.
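
Step 3 above does not name specific commands. As a minimal sketch, assuming that `gpsscli list` prints a header line followed by one job per line, with the job identifier in the first column and a status containing RUNNING in the last column, the following helper turns that listing into `gpsscli stop` commands that you can review before running them. The function names `running_jobs` and `stop_commands` are hypothetical; verify the column layout against the output of your GPSS.from version first.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: the job listing read from stdin has a header line,
# then one job per line, with the job identifier in the first column and the
# job status in the last column.

# Print the identifier of every job whose status contains "RUNNING".
running_jobs() {
    awk 'NR > 1 && toupper($NF) ~ /RUNNING/ { print $1 }'
}

# Print one "gpsscli stop" command per running job instead of executing it,
# so that the list can be reviewed before any job is stopped.
stop_commands() {
    running_jobs | while read -r job; do
        printf 'gpsscli stop %s\n' "$job"
    done
}

# Example (not run here):
#   gpsscli list | stop_commands
```

Printing the commands rather than running them keeps the helper a dry run; pipe its output to `sh` only after you have confirmed that the listed jobs are the ones you intend to stop.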

Step2: Upgrading GPSS

After you upgrade to the new version of Greenplum Database or install the new version of GPSS in your Greenplum installation, perform the following procedure to upgrade the GPSS.new software:

  1. Log in to the Greenplum Database coordinator host or the ETL host and set up your environment. For example, on the coordinator:

    $ ssh gpadmin@<gpcoord>
    gpadmin@gpcoord$ . /usr/local/greenplum-db/greenplum_path.sh
    
  2. Identify and note the new version (GPSS.new) of GPSS. For example:

    gpadmin@gpcoord$ gpss --version
    
  3. If you are upgrading from GPSS version 1.3.0 or older:

    GPSS 1.3.0 introduced a regression that caused it to no longer recognize history tables (internal tables that GPSS creates for each job) that were created with GPSS 1.2.6. This regression could cause GPSS to load duplicate Kafka messages into Greenplum. This issue is resolved in GPSS 1.3.1.

    You are not required to perform any upgrade steps related to this issue; GPSS automatically performs the required actions when you resubmit and restart a load job that you initiated with GPSS 1.3.0. The actions that GPSS performs depend on the GPSS version(s) from which you are upgrading, and are described below:

    • If you are upgrading directly from GPSS 1.2.6 or older, GPSS performs no special upgrade actions.
    • If you are upgrading from GPSS 1.3.0 and you previously submitted load jobs with both GPSS 1.2.6 or older and 1.3.0, GPSS copies the internal history table for each submitted job to a table with the correct name format, and uses those tables. GPSS also retains and renames the internal history table for each GPSS 1.3.0 job, adding the prefix deprecated_.
    • If you have only ever used GPSS 1.3.0 and are upgrading from that version, GPSS renames the internal history table for each restarted job.
  4. If you are upgrading from GPSS version 1.3.1 or older:

    • GPSS 1.3.2 changes the gpss.json configuration file:
      • The new file format allows you to specify unique SSL Certificates for GPSS and gpfdist. If you are using SSL to encrypt communication between GPSS and Kafka, Greenplum, or the GPSS client, you must update the gpss.json server configuration file to configure the correct Certificate block.
      • The ListenAddress:SSL property is removed. Ensure that you remove this property from all GPSS server configuration files.
    • GPSS 1.3.2 renames gpkafka check to gpkafka history. If you have any scripts or programs that reference gpkafka check, you must replace these references with gpkafka history.
    • GPSS 1.3.2 removes the ENCRYPTION property from the gpkafka.yaml job configuration file. Ensure that you remove this property from all job configuration files, and that you provide Kafka SSL configuration properties via the PROPERTY block in the file.
    • GPSS 1.3.2 removes the LOCAL_HOSTNAME and LOCAL_PORT properties from the gpkafka.yaml job configuration file. You must remove these properties from all job configurations, and specify the gpfdist configuration for each job in one of the following ways:
      • If you are loading data with gpkafka load, provide the --config gpfdistconfig.json or --gpfdist-host hostaddr and --gpfdist-port portnum options when you run the command.
      • If you are loading data with the gpsscli job management commands, ensure that the gpss.json configuration file for the gpss server instance servicing the request specifies the desired Gpfdist:Host and Gpfdist:Port settings.
    • GPSS 1.3.2 removes the --no-reuse flag from the gpsscli load and gpsscli start commands. If you have any scripts or programs that reference this flag, you must remove the references.
  5. If you developed a client application with GPSS 1.3.5 or earlier and you want to use the MaxErrorRows or Abort session capabilities that GPSS 1.3.6 added to the Close service, you must:

    1. Edit the gpss.proto service definition and add the new CloseRequest field(s):

      message CloseRequest {
        Session session = 1;
        int32 MaxErrorRows = 2;
        bool Abort = 3;
      }
      
    2. Re-generate the GPSS client classes.

    3. Add code to utilize the new fields.

    4. Re-compile and re-distribute your GPSS client application. Refer to Developing a Batch Data Client for supporting information.

  6. If you are upgrading from GPSS version 1.4.x or older:

    • GPSS 1.4.0 removes the gpsscli history and gpkafka history commands. If you have any scripts or programs that reference these commands, you must remove the references.
    • GPSS 1.4.1 changes the client and server log file format to CSV. If you created any scripts that parsed the previous log file format, you must update that script logic.
    • GPSS 1.4.1 adds a new, separate log file to track Kafka job progress. If you created any scripts that relied on the existence of progress information in the client or server log files, you must update that script logic.
  7. If you are upgrading from GPSS version 1.6.x or older and you have registered the dataflow extension in any database, you must drop and re-create the extension:

    DROP EXTENSION dataflow;
    CREATE EXTENSION dataflow;
    
  8. If you are upgrading from GPSS version 1.7.x or older:

    • GPSS 1.8.0 renames the window property in the Kafka version 3 (Beta) load configuration file to task. If you have any Kafka load configuration files that specify window:, you must change the references to task:.
  9. If you are upgrading from GPSS version 1.9.x or older:

    • GPSS 1.10.0 changes the naming format of its server log files as described in the Version 1.10.0 release notes, and adds a job_id field to the content of the server log file. You must update any scripts that rely on the log file naming format or the log file content of previous releases.
  10. If you developed a client application with GPSS 1.9.x or earlier and you want to use the session timeout capability that GPSS 1.10.0 added to the Connect service, you must:

    1. Edit the gpss.proto service definition and add the new SessionTimeout field to the ConnectRequest message:

      message ConnectRequest {
        string Host = 1;
        ...
        bool UseSSL = 6;
        int32 SessionTimeout = 7;
      }
      
    2. Re-generate the GPSS client classes.

    3. Add code to utilize the new field.

    4. Re-compile and re-distribute your GPSS client application. Refer to Developing a Batch Data Client for supporting information.

  11. If you are upgrading from GPSS version 1.10.0:

    • GPSS 1.10.1 changes the naming format of its per-run server log files as described in the Version 1.10.1 release notes. You must update any scripts that rely on the per-run server log file naming format introduced in version 1.10.0.
  12. If you installed a new version of Greenplum Database, or you installed the GPSS gppkg or .tar.gz packages in your Greenplum installation, you must drop and re-create the GPSS extension in any Greenplum database in which you are using GPSS to load data. A database superuser or the database owner must run these SQL commands:

    DROP EXTENSION gpss;
    CREATE EXTENSION gpss;
    

    (If the extension does not already exist, GPSS automatically creates it in a database the first time a Greenplum superuser or the database owner submits a load job to any table that resides in that database.)

  13. Restart your gpss instances.

  14. Resubmit and restart your GPSS jobs.

    For any Kafka job that you resubmit and restart, GPSS will consume Kafka messages from the offset associated with the latest timestamp recorded in the history table for the job.
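
The version conditions in steps 3 through 11 can be summarized in a small helper. This is a sketch and not part of GPSS: the function names `ver_num` and `applicable_steps` are hypothetical, and the helper assumes that you pass a plain numeric MAJOR.MINOR.PATCH string, such as the GPSS.from version that you noted earlier. It checks the version condition only. Step 7 also depends on whether you registered the dataflow extension, steps 5 and 10 apply only if you maintain a custom client application, and steps 12 through 14 are not version-dependent, so they are not reported.

```shell
#!/bin/sh
# Sketch only: map a GPSS.from version to the version-conditional steps of
# "Step2: Upgrading GPSS". Function names are illustrative, not GPSS tooling.

# Convert "MAJOR.MINOR.PATCH" into one comparable integer.
# ASSUMPTION: each component is a plain decimal number below 1000.
ver_num() {
    old_ifs=$IFS
    IFS=.
    # Intentionally unquoted: split the version string on "." into $1 $2 $3.
    set -- $1
    IFS=$old_ifs
    echo $(( ${1:-0} * 1000000 + ${2:-0} * 1000 + ${3:-0} ))
}

# Print the step numbers whose version condition matches the given version.
applicable_steps() {
    from=$(ver_num "$1")
    steps=""
    [ "$from" -le "$(ver_num 1.3.0)" ]  && steps="$steps 3"   # 1.3.0 or older
    [ "$from" -le "$(ver_num 1.3.1)" ]  && steps="$steps 4"   # 1.3.1 or older
    [ "$from" -lt "$(ver_num 1.5.0)" ]  && steps="$steps 6"   # 1.4.x or older
    [ "$from" -lt "$(ver_num 1.7.0)" ]  && steps="$steps 7"   # 1.6.x or older
    [ "$from" -lt "$(ver_num 1.8.0)" ]  && steps="$steps 8"   # 1.7.x or older
    [ "$from" -lt "$(ver_num 1.10.0)" ] && steps="$steps 9"   # 1.9.x or older
    [ "$from" -eq "$(ver_num 1.10.0)" ] && steps="$steps 11"  # exactly 1.10.0
    echo "${steps# }"
}
```

For example, `applicable_steps 1.9.2` prints `9`, and `applicable_steps 1.3.0` prints `3 4 6 7 8 9`.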
