You can restore a failed Client node from full backup data.

Prerequisites

  • Verify that the backup was created from a healthy Client node. See Back-Up Client Node on vSphere.

  • Verify that you have captured the IP addresses of all the Client node VMs and have access to them. You can find the information in the VMware Blockchain Orchestrator descriptor file.

Procedure

  1. Stop all the applications that send connection requests to the Daml Ledger.
  2. Stop the Client node components.
    curl -X POST 127.0.0.1:8546/api/node/management?action=stop
    vmbc@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
    vmbc@localhost [ ~ ]# sudo docker ps -a
    CONTAINER ID   IMAGE                                                   COMMAND                  CREATED        STATUS        PORTS                      NAMES
    218a1bdaddd6   vmwaresaas.jfrog.io/vmwblockchain/operator:1.7.0.0.55   "/operator/operator_…"   18 hours ago   Up 18 hours                              operator
    cd476a6b3d6c   vmwaresaas.jfrog.io/vmwblockchain/agent:1.7.0.0.55      "java -jar node-agen…"   18 hours ago   Up 18 hours   127.0.0.1:8546->8546/tcp   agent
    vmbc@localhost [ ~ ]#
  3. Start all the Client node components.
    curl -X POST 127.0.0.1:8546/api/node/management?action=start 
  4. Mount the NFS to the target directory /mnt/client-backups.

    For each backed-up node, there is a <node-id> sub-directory under the /mnt/client-backups directory. Select the sub-directory from which to restore data to the target Client node.

    The mount point configuration involves updating /etc/fstab and running the mount command.

  5. Validate that the Daml index database container has read and write access to the backup directory.
    1. Get the Daml index database image ID.

      sudo docker images

      Sample image ID, 873681a0fd91.

    2. Launch a container with the Daml index database image.
      sudo docker run -it --rm --name=daml_index_db_mount_test -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/11/bin -e GOSU_VERSION=1.12 -e LANG=en_US.utf8 -e PG_MAJOR=11 -e PG_VERSION=11.12-1.pgdg90+1 -e PGDATA=/var/lib/postgresql/data -e POSTGRES_USER=indexdb -e POSTGRES_PASSWORD=indexdb -e POSTGRES_MULTIPLE_SCHEMAS=daml_ledger_api -e POSTGRES_CONFIG_FILE= -v /config/daml-index-db:/config/daml-index-db:rw -v /config/generic:/config/generic:rw -v /mnt/data/db:/var/lib/postgresql/data:rw  -v /config/pgbackrest:/etc/pgbackrest -v /mnt/client-backups:/mnt/client-backups --network=blockchain-fabric --expose=5432 --entrypoint /bin/bash <daml-index-db-image>

      Substitute <daml-index-db-image> with the image ID from the previous step.

      A Bash shell prompt appears.

    3. Switch to the postgres user.

      su - postgres

    4. Switch to the mounted directory.

      cd /mnt/client-backups

    5. As the postgres user, create test files and directories, including nested ones, and validate that you can read them back.

      If you receive a read or write permission error, your mount point is not configured correctly. Remount the NFS file server with the correct permissions. The remount might require some reconfiguration on the NFS server.
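The permission check in substep 5 can be scripted. The following is a minimal sketch, not part of the product: the function name check_rw and the probe file names are hypothetical. It creates a nested directory and a file under the given directory, reads the file back, and removes what it created. Run it as the postgres user inside the test container.

```shell
#!/bin/sh
# check_rw DIR: hypothetical helper that probes read and write access under DIR.
# It creates a nested directory and a file, reads the file back, then cleans up.
check_rw() {
    dir="$1"
    probe="$dir/.rw_probe_$$"   # unique name so existing backup data is untouched
    mkdir -p "$probe/nested" || { echo "write failed: cannot create directory"; return 1; }
    echo "probe" > "$probe/nested/file.txt" || { echo "write failed: cannot create file"; rm -rf "$probe"; return 1; }
    content=$(cat "$probe/nested/file.txt") || { echo "read failed"; rm -rf "$probe"; return 1; }
    rm -rf "$probe"
    [ "$content" = "probe" ] || { echo "read returned unexpected content"; return 1; }
    echo "read/write OK: $dir"
}
```

For example, check_rw /mnt/client-backups prints a confirmation line on success and returns a non-zero status if any create or read operation fails.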

  6. Delete the existing Client node data.

    sudo bash -c 'rm -rf /mnt/data/db/*'

    You can initiate a full restore or point-in-time restore.

  7. Perform a full Client node restore.
    1. Launch a container with the Daml index database image.
      sudo docker run -it --rm --name=daml_index_db_mount_test -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/11/bin -e GOSU_VERSION=1.12 -e LANG=en_US.utf8 -e PG_MAJOR=11 -e PG_VERSION=11.12-1.pgdg90+1 -e PGDATA=/var/lib/postgresql/data -e POSTGRES_USER=indexdb -e POSTGRES_PASSWORD=indexdb -e POSTGRES_MULTIPLE_SCHEMAS=daml_ledger_api -e POSTGRES_CONFIG_FILE= -v /config/daml-index-db:/config/daml-index-db:rw -v /config/generic:/config/generic:rw -v /mnt/data/db:/var/lib/postgresql/data:rw  -v /config/pgbackrest:/etc/pgbackrest -v /mnt/client-backups:/mnt/client-backups --network=blockchain-fabric --expose=5432 --entrypoint /bin/bash <daml-index-db-image>

      Substitute <daml-index-db-image> with the image ID that you obtained in step 5.

      A Bash shell prompt appears.

    2. Switch to the postgres user.

      su - postgres

    3. List all the available backups.

      pgbackrest info

      The output provides information on the last valid backup that can be used for a full restore.

    4. Restore the last valid backup.
      pgbackrest --stanza=daml-indexdb --log-level-console=info restore

      After the restore process completes, exit the postgres user shell and the container bash shell by running exit in each shell.

    5. Start all the Client node components.
      curl -X POST 127.0.0.1:8546/api/node/management?action=start
    6. If a backup was configured after the Client node deployment using the agent REST API, reissue the API.
      POST http://127.0.0.1:8546/api/backup
      {
          "retention_days": 33,
          "schedule_frequency": "DAILY"
      }
    7. Verify that the backup was configured properly.
      GET http://127.0.0.1:8546/api/backup/status

      The call returns the following API response from the Client node.

      {
          "backup_state": {
              "state": "ENABLED",
              "state_change_time": "2021-11-29T22:40:33.828981Z[UTC]"
          },
          "execution_error": "",
          "execution_message": "... some message ...",
          "execution_status_code": 0,
          "last_run_end_time": "2021-11-29T22:29:25.71632Z[UTC]",
          "last_run_start_time": "2021-11-29T22:29:09.876718Z[UTC]",
          "in_progress": false
      }
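The status response can also be checked from a script. The following is a minimal sketch, assuming the response contains the fields shown in the sample above and that an execution_status_code of 0 indicates success (the sample suggests this, but it is an assumption). The function name backup_enabled is hypothetical, and the pattern matching is deliberately simple; a JSON-aware tool would be more robust.

```shell
#!/bin/sh
# backup_enabled: hypothetical helper that reads a backup status response on
# stdin and succeeds only if the state is ENABLED and the last run's
# execution_status_code is 0 (assumed here to mean success).
backup_enabled() {
    body=$(cat)
    printf '%s\n' "$body" | grep -Eq '"state"[[:space:]]*:[[:space:]]*"ENABLED"' || return 1
    printf '%s\n' "$body" | grep -Eq '"execution_status_code"[[:space:]]*:[[:space:]]*0[[:space:]]*(,|$)' || return 1
}
```

One possible use is curl -s http://127.0.0.1:8546/api/backup/status | backup_enabled && echo "backup configured".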
  8. Validate the restored data.

    Because backups run on a schedule and the ledger changes between runs, capture the table row counts at backup time so that you can check the restored data against them and confirm that it is consistent.

    1. Access the Docker container.

      sudo docker exec -it daml_index_db bash

    2. Switch to the postgres user.

      su - postgres

    3. Start the psql client.

      psql -U indexdb -d daml_ledger_api

    4. Run a row count query for each of the following tables.
      select count(1) from configuration_entries;
      select count(1) from flyway_schema_history;
      select count(1) from package_entries;
      select count(1) from packages;
      select count(1) from parameters;
      select count(1) from participant_command_completions;
      select count(1) from participant_command_submissions;
      select count(1) from participant_events;
      select count(1) from participant_events_consuming_exercise;
      select count(1) from participant_events_create;
      select count(1) from participant_events_divulgence;
      select count(1) from participant_migration_history_v100;
      select count(1) from parties;
      select count(1) from party_entries;
    5. Compare these values with the values that were captured during the backup process.
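The counts in substeps 4 and 5 can be written to a file and compared mechanically. The following is a minimal sketch: the function names and any file paths are hypothetical, while the table list, user name, and database name come from the steps above. It assumes the functions run as the postgres user inside the daml_index_db container.

```shell
#!/bin/sh
# Tables listed in substep 4.
TABLES="configuration_entries flyway_schema_history package_entries packages parameters participant_command_completions participant_command_submissions participant_events participant_events_consuming_exercise participant_events_create participant_events_divulgence participant_migration_history_v100 parties party_entries"

# count_queries: print one query per table; each result row is "<table>|<count>"
# when psql runs in unaligned, tuples-only mode.
count_queries() {
    for t in $TABLES; do
        printf "select '%s', count(1) from %s;\n" "$t" "$t"
    done
}

# capture_counts FILE: hypothetical helper that stores the counts in FILE.
# -A (unaligned) and -t (tuples only) are standard psql options.
capture_counts() {
    count_queries | psql -U indexdb -d daml_ledger_api -At > "$1"
}

# compare_counts BEFORE AFTER: succeeds only if both captures are identical;
# otherwise diff prints the tables whose counts differ.
compare_counts() {
    diff "$1" "$2"
}
```

A possible workflow is to run capture_counts at backup time and again after the restore, writing to two different files, and then pass both files to compare_counts.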
  9. (Optional) Restart the Daml Ledger container.

    In some instances, the restore process might take a long time, and the postgres database might not be ready to accept connections until the restore completes. When this happens, the Daml Ledger container times out and becomes non-functional.

    1. Access Daml Ledger container logs.

      sudo docker logs daml_ledger_api

      If the logs show a database connection error, perform the remaining substeps.

    2. Stop the Client node components.
      curl -X POST 127.0.0.1:8546/api/node/management?action=stop
      vmbc@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
      vmbc@localhost [ ~ ]# sudo docker ps -a
      CONTAINER ID   IMAGE                                                   COMMAND                  CREATED        STATUS        PORTS                      NAMES
      218a1bdaddd6   vmwaresaas.jfrog.io/vmwblockchain/operator:1.7.0.0.55   "/operator/operator_…"   18 hours ago   Up 18 hours                              operator
      cd476a6b3d6c   vmwaresaas.jfrog.io/vmwblockchain/agent:1.7.0.0.55      "java -jar node-agen…"   18 hours ago   Up 18 hours   127.0.0.1:8546->8546/tcp   agent
      vmbc@localhost [ ~ ]#
    3. Start all the Client node components.
      curl -X POST 127.0.0.1:8546/api/node/management?action=start
    4. Check the Daml index database logs and the Daml Ledger API logs to verify that both containers are functional.
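
Because the database might need time before it accepts connections, a restart script can wait for it rather than restarting blindly. The following is a minimal sketch: wait_until is a hypothetical helper, and using pg_isready (a standard PostgreSQL client utility) inside the daml_index_db container as the readiness probe is an assumption, not a documented part of this procedure.

```shell
#!/bin/sh
# wait_until TRIES DELAY CMD...: run CMD up to TRIES times, sleeping DELAY
# seconds between attempts; return 0 as soon as CMD succeeds, 1 otherwise.
wait_until() {
    tries="$1"; delay="$2"; shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        "$@" && return 0
        [ "$i" -lt "$tries" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, wait_until 30 10 sudo docker exec daml_index_db pg_isready -U indexdb polls for up to about five minutes; once it succeeds, the stop and start requests in substeps 2 and 3 can be issued.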

What to do next

You can also restore the Client node from the point-in-time backup data. See Restore Client Node from Point-In-Time Backup on vSphere.

After the restore, you can manage your backup configuration. See Manage Client Node Backup on vSphere.