You can restore a failed Client node from full backup data.
As a best practice, secure the NFS file server data to avoid a data breach.
Prerequisites
Verify that you have captured the IP addresses of all the Client node VMs and have access to them. You can find the information in the VMware Blockchain Orchestrator descriptor file.
Procedure
- Stop all the applications that invoke connection requests from the Daml Ledger.
- Stop the Client node components.
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
root@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
root@localhost [ ~ ]# docker ps -a
CONTAINER ID   IMAGE                                                   COMMAND                  CREATED        STATUS        PORTS                      NAMES
218a1bdaddd6   vmwaresaas.jfrog.io/vmwblockchain/operator:1.4.0.0.91   "/operator/operator_…"   18 hours ago   Up 18 hours                              operator
cd476a6b3d6c   vmwaresaas.jfrog.io/vmwblockchain/agent:1.4.0.0.91      "java -jar node-agen…"   18 hours ago   Up 18 hours   127.0.0.1:8546->8546/tcp   agent
root@localhost [ ~ ]#
- Mount the NFS to the target directory /mnt/client-backups.
For each backed-up node, there is a <node-id> sub-directory under the /mnt/client-backups directory. Select the sub-directory that holds the backup data for the target Client node. The mount point configuration involves updating /etc/fstab and running the mount command.
- Validate that the Daml index database container has read and write access to the backup directory.
- Get the Daml index database image ID.
docker images
Sample image ID: 873681a0fd91.
- Launch a container with the Daml index database image.
docker run -it --rm --name=daml_index_db_mount_test -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/11/bin -e GOSU_VERSION=1.12 -e LANG=en_US.utf8 -e PG_MAJOR=11 -e PG_VERSION=11.12-1.pgdg90+1 -e PGDATA=/var/lib/postgresql/data -e POSTGRES_USER=indexdb -e POSTGRES_PASSWORD=indexdb -e POSTGRES_MULTIPLE_SCHEMAS=daml_ledger_api -e POSTGRES_CONFIG_FILE= -v /config/daml-index-db:/config/daml-index-db:rw -v /config/generic:/config/generic:rw -v /mnt/data/db:/var/lib/postgresql/data:rw -v /config/pgbackrest:/etc/pgbackrest -v /mnt/client-backups:/mnt/client-backups --network=blockchain-fabric --expose=5432 --entrypoint /bin/bash <daml-index-db-image>
Replace <daml-index-db-image> with the image ID from the previous step.
A Bash shell prompt appears.
- Switch to the postgres user.
su - postgres
- Switch to the mounted directory.
cd /mnt/client-backups
- With the postgres user permissions, create test files, directories, and nested files and directories, and validate that you have read and write permissions.
If you receive a read or write permission error, your mount point is not configured correctly. Remount the NFS file server with the correct permissions. The remount might require some reconfiguration on the NFS server.
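The manual check above can be sketched as a small shell function. This is a minimal sketch, not a product command: the function name check_backup_dir_rw and the probe file names are hypothetical, and it assumes you run it as the postgres user inside the test container.

```shell
# Hypothetical helper: confirm that the current user can create, read,
# and delete nested files and directories under a directory.
check_backup_dir_rw() {
    local target="$1"
    local probe="$target/.rw_probe_$$"
    local content

    # Write check: create a nested directory and a file inside it.
    mkdir -p "$probe/nested" || return 1
    echo "probe" > "$probe/nested/file.txt" || { rm -rf "$probe"; return 1; }

    # Read check: read the file back and confirm its content.
    content="$(cat "$probe/nested/file.txt")" || { rm -rf "$probe"; return 1; }
    [ "$content" = "probe" ] || { rm -rf "$probe"; return 1; }

    # Clean up so that no test data is left in the backup directory.
    rm -rf "$probe"
}
```

For example, check_backup_dir_rw /mnt/client-backups && echo "read and write access OK" prints the message only when every step succeeds.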
- Delete the contents in the existing Client node data folder.
rm -rf /mnt/data/db
You can initiate a full restore or point-in-time restore.
- Perform a full Client node restore.
- Launch a container with the Daml index database image.
docker run -it --rm --name=daml_index_db_mount_test -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/11/bin -e GOSU_VERSION=1.12 -e LANG=en_US.utf8 -e PG_MAJOR=11 -e PG_VERSION=11.12-1.pgdg90+1 -e PGDATA=/var/lib/postgresql/data -e POSTGRES_USER=indexdb -e POSTGRES_PASSWORD=indexdb -e POSTGRES_MULTIPLE_SCHEMAS=daml_ledger_api -e POSTGRES_CONFIG_FILE= -v /config/daml-index-db:/config/daml-index-db:rw -v /config/generic:/config/generic:rw -v /mnt/data/db:/var/lib/postgresql/data:rw -v /config/pgbackrest:/etc/pgbackrest -v /mnt/client-backups:/mnt/client-backups --network=blockchain-fabric --expose=5432 --entrypoint /bin/bash <daml-index-db-image>
Replace <daml-index-db-image> with the Daml index database image ID that you obtained earlier.
A Bash shell prompt appears.
- Switch to the postgres user.
su - postgres
- List all the available backups.
pgbackrest info
The output provides information on the last valid backup that can be used for a full restore.
- Restore the last valid backup.
pgbackrest --stanza=daml-indexdb --log-level-console=info restore
After the restore process completes, exit the psql shell, the postgres user shell, and the container bash shell by running exit consecutively.
- Start all the Client node components.
curl -X POST 127.0.0.1:8546/api/node/management?action=start
- If a backup was configured after the Client node deployment using the agent REST API, reissue the API.
POST http://127.0.0.1:8546/api/backup
{ "retention_days": 33, "schedule_frequency": "DAILY" }
- Verify that the backup was configured properly.
GET http://127.0.0.1:8546/api/backup/status
The call returns the following API response from the Client node.
{
  "backup_state": {
    "state": "ENABLED",
    "state_change_time": "2021-11-29T22:40:33.828981Z[UTC]"
  },
  "execution_error": "",
  "execution_message": "... some message ...",
  "execution_status_code": 0,
  "last_run_end_time": "2021-11-29T22:29:25.71632Z[UTC]",
  "last_run_start_time": "2021-11-29T22:29:09.876718Z[UTC]",
  "in_progress": false
}
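The status check can also be scripted. The following is a minimal sketch that reads the status response on standard input and tests two fields from the sample response. The helper name backup_status_ok is hypothetical, and treating "state" equal to ENABLED together with "execution_status_code" equal to 0 as the healthy case is an assumption based only on the sample above.

```shell
# Hypothetical helper: succeed only if the status JSON on stdin reports
# an ENABLED backup state and an execution status code of 0.
backup_status_ok() {
    local body
    body="$(cat)"

    # Assumption: an ENABLED state means the backup schedule is active.
    printf '%s\n' "$body" |
        grep -Eq '"state"[[:space:]]*:[[:space:]]*"ENABLED"' || return 1

    # Assumption: status code 0 means the last run finished without error.
    printf '%s\n' "$body" |
        grep -Eq '"execution_status_code"[[:space:]]*:[[:space:]]*0([^0-9]|$)' || return 1
}
```

One possible use is curl -s http://127.0.0.1:8546/api/backup/status | backup_status_ok && echo "backup configured". A pattern match keeps the sketch free of extra tools; a JSON parser is the more robust choice if one is available on the node.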
- Validate the restore data.
To make sure that the restored data is consistent, capture row counts from the restored database and compare them with the row counts that were captured when the backup ran.
- Access the Docker container.
docker exec -it daml_index_db bash
- Switch to the postgres user.
su - postgres
- Start the psql client.
psql -U indexdb -d daml_ledger_api
- Run the count query for each of the following tables.
select count(1) from configuration_entries;
select count(1) from flyway_schema_history;
select count(1) from package_entries;
select count(1) from packages;
select count(1) from parameters;
select count(1) from participant_command_completions;
select count(1) from participant_command_submissions;
select count(1) from participant_events;
select count(1) from participant_events_consuming_exercise;
select count(1) from participant_events_create;
select count(1) from participant_events_divulgence;
select count(1) from participant_migration_history_v100;
select count(1) from parties;
select count(1) from party_entries;
- Compare these values with the values that were captured during the backup process.
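The comparison can be automated once both sets of counts are saved to files. The sketch below assumes a file layout of one "<table> <count>" pair per line. Both the layout and the helper name compare_counts are hypothetical and are not part of the product.

```shell
# Hypothetical helper: compare row counts captured at backup time ($1)
# with row counts taken from the restored database ($2).
# Prints one line per difference and returns non-zero if any is found.
compare_counts() {
    awk '
        BEGIN { bad = 0 }
        # First file: remember the expected count for each table.
        FILENAME == ARGV[1] { expected[$1] = $2; next }
        # Second file: check each restored count against the expected one.
        {
            seen[$1] = 1
            if (!($1 in expected)) {
                printf "unexpected table %s\n", $1; bad = 1
            } else if ((expected[$1] "") != ($2 "")) {
                printf "%s: expected %s, got %s\n", $1, expected[$1], $2; bad = 1
            }
        }
        # Report tables that were counted at backup time but not after the restore.
        END {
            for (t in expected)
                if (!(t in seen)) { printf "missing table %s\n", t; bad = 1 }
            exit bad
        }
    ' "$1" "$2"
}
```

A call such as compare_counts counts_at_backup.txt counts_after_restore.txt prints nothing and returns 0 when every table matches. The counts themselves might be collected with the psql -A, -t, and -c options, which print a bare query result that is easy to write to a file.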
- (Optional) Restart the Daml Ledger container.
In some instances, the restore process might take a long time, and the postgres database might not be ready to accept connections until the restore completes. When this happens, the Daml Ledger container times out and becomes non-functional.
- Access Daml Ledger container logs.
docker logs daml_ledger_api
If the logs show a database connection error, perform the following steps.
- Stop the Client node components.
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
root@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
root@localhost [ ~ ]# docker ps -a
CONTAINER ID   IMAGE                                                   COMMAND                  CREATED        STATUS        PORTS                      NAMES
218a1bdaddd6   vmwaresaas.jfrog.io/vmwblockchain/operator:1.4.0.0.91   "/operator/operator_…"   18 hours ago   Up 18 hours                              operator
cd476a6b3d6c   vmwaresaas.jfrog.io/vmwblockchain/agent:1.4.0.0.91      "java -jar node-agen…"   18 hours ago   Up 18 hours   127.0.0.1:8546->8546/tcp   agent
root@localhost [ ~ ]#
- Start all the Client node components.
curl -X POST 127.0.0.1:8546/api/node/management?action=start
- Check the Daml index database logs and the Daml Ledger API logs to verify that both containers are functional.
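Because the database might not accept connections until the restore completes, it can help to wait for it before restarting the Client node components. The following is a minimal sketch of a generic retry loop. The helper name wait_until is hypothetical, and the usage example assumes that the pg_isready utility from the PostgreSQL client tools is available inside the daml_index_db container, which the source does not confirm.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
# Usage: wait_until <max-attempts> <delay-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, or 1 after the last attempt fails.
wait_until() {
    local attempts="$1"
    local delay="$2"
    local i=1
    shift 2

    while [ "$i" -le "$attempts" ]; do
        # Discard the command output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, wait_until 30 10 docker exec daml_index_db pg_isready -U indexdb tries up to 30 times, 10 seconds apart, and returns 0 once the database reports that it accepts connections. The attempt count and delay are illustrative values, not recommendations.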
What to do next
You can also restore the Client node from the point-in-time backup data. See Restore Client Node from Point-In-Time Backup on vSphere.
After the restore, you can manage your backup configuration. See Manage Client Node Backup on vSphere.