You can restore a failed Client node from point-in-time backup data.
As a best practice, secure the NFS file server data to avoid a data breach.
Prerequisites
Verify that you have captured the IP addresses of all the Client node VMs and have access to them. You can find the information in the VMware Blockchain Orchestrator descriptor file.
Verify that you have captured semantic data with timestamps for point-in-time restore.
Procedure
- Stop all the applications that send connection requests to the Daml Ledger.
- Stop the Client node components.
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# sudo docker ps -a
CONTAINER ID   IMAGE                                                    COMMAND                  CREATED        STATUS        PORTS                      NAMES
218a1bdaddd6   vmwaresaas.jfrog.io/vmwblockchain/operator:1.8.0.0.53   "/operator/operator_…"   18 hours ago   Up 18 hours                              operator
cd476a6b3d6c   vmwaresaas.jfrog.io/vmwblockchain/agent:1.8.0.0.53      "java -jar node-agen…"   18 hours ago   Up 18 hours   127.0.0.1:8546->8546/tcp   agent
vmbc@localhost [ ~ ]#
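The sample output above lists only the agent and operator containers after the stop request. As a minimal sketch, assuming those two are the only containers expected to remain (an inference from the sample, not a documented guarantee), the hypothetical helper `unexpected_containers` reads container names on standard input and reports any others.

```shell
#!/bin/sh
# Hypothetical helper: reads container names, one per line, on stdin.
# Prints any name other than "agent" or "operator" and returns 1 if one exists.
unexpected_containers() {
    # Assumption: only the agent and operator containers keep running after a stop.
    extra=$(grep -v -x -e agent -e operator || true)
    if [ -n "$extra" ]; then
        printf '%s\n' "$extra"
        return 1
    fi
    return 0
}

# Usage on the Client node VM (not run here):
#   sudo docker ps --format '{{.Names}}' | unexpected_containers
```

A non-zero return status suggests that some Client node components are still running and that the stop request might need more time or a retry.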
- Mount the NFS to the target directory /mnt/client-backups.
For each backed up node, there is a <node-id> sub-directory under the /mnt/client-backups directory. Select the appropriate sub-directory to restore data from the target Client node. The mount point configuration involves updating /etc/fstab and running the mount command.
- Validate that the Daml index database container has read and write access to the backup directory.
- Get the Daml index database image ID.
sudo docker images
Sample image ID: 873681a0fd91.
- Launch a container with the Daml index database image.
sudo docker run -it --rm --name=daml_index_db_mount_test -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/11/bin -e GOSU_VERSION=1.12 -e LANG=en_US.utf8 -e PG_MAJOR=11 -e PG_VERSION=11.12-1.pgdg90+1 -e PGDATA=/var/lib/postgresql/data -e POSTGRES_USER=indexdb -e POSTGRES_PASSWORD=indexdb -e POSTGRES_MULTIPLE_SCHEMAS=daml_ledger_api -e POSTGRES_CONFIG_FILE= -v /config/daml-index-db:/config/daml-index-db:rw -v /config/generic:/config/generic:rw -v /mnt/data/db:/var/lib/postgresql/data:rw -v /config/pgbackrest:/etc/pgbackrest -v /mnt/client-backups:/mnt/client-backups --network=blockchain-fabric --expose=5432 --entrypoint /bin/bash <daml-index-db-image>
Substitute <daml-index-db-image> with the image ID from the previous step.
A Bash shell prompt appears.
- Switch to the postgres user.
su - postgres
- Switch to the mounted directory.
cd /mnt/client-backups
- With the postgres user permissions, create test files and directories, including nested ones, and validate that you can read them back.
If you receive a read or write permission error, your mount point is not configured correctly. Remount the NFS file server with the correct permissions. The remount might require some reconfiguration on the NFS server.
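The read and write validation can be scripted. The following is a minimal sketch, not part of the product: the function name `check_rw_access` and the scratch directory name `rw_probe_<pid>` are hypothetical. It creates a file and a nested directory with a file, reads both back, and removes the scratch data.

```shell
#!/bin/sh
# Hypothetical helper: exercises read and write access under a directory.
# Returns 0 if every operation succeeds, 1 otherwise.
check_rw_access() {
    base=$1
    probe="$base/rw_probe_$$"    # hypothetical scratch directory name
    mkdir -p "$probe/nested/dir" || return 1
    # Write one file at the top level and one in the nested directory.
    printf 'probe\n' > "$probe/file.txt" || return 1
    printf 'probe\n' > "$probe/nested/dir/file.txt" || return 1
    # Read both files back and confirm the content.
    [ "$(cat "$probe/file.txt")" = "probe" ] || return 1
    [ "$(cat "$probe/nested/dir/file.txt")" = "probe" ] || return 1
    # Remove the scratch data so the backup directory is left unchanged.
    rm -rf "$probe" || return 1
    return 0
}

# Usage inside the container as the postgres user (not run here):
#   check_rw_access /mnt/client-backups && echo "read/write OK"
```

If the function fails partway, the scratch directory might remain under the backup directory and can be removed manually after the mount is corrected.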
- Delete the content in the existing Client node data folder.
sudo rm -rf /mnt/data/db
- Perform a point-in-time Client node restore.
- Identify the specific time to restore the data.
- Launch a container with the Daml index database image.
sudo docker run -it --rm --name=daml_index_db_mount_test -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/11/bin -e GOSU_VERSION=1.12 -e LANG=en_US.utf8 -e PG_MAJOR=11 -e PG_VERSION=11.12-1.pgdg90+1 -e PGDATA=/var/lib/postgresql/data -e POSTGRES_USER=indexdb -e POSTGRES_PASSWORD=indexdb -e POSTGRES_MULTIPLE_SCHEMAS=daml_ledger_api -e POSTGRES_CONFIG_FILE= -v /config/daml-index-db:/config/daml-index-db:rw -v /config/generic:/config/generic:rw -v /mnt/data/db:/var/lib/postgresql/data:rw -v /config/pgbackrest:/etc/pgbackrest -v /mnt/client-backups:/mnt/client-backups --network=blockchain-fabric --expose=5432 --entrypoint /bin/bash <daml-index-db-image>
Substitute <daml-index-db-image> with the Daml index database image ID that you obtained earlier by running sudo docker images.
A Bash shell prompt appears.
- Switch to the postgres user.
su - postgres
- List all the available backups.
pgbackrest info
The output provides information on all valid backups that can be used for point-in-time restore.
For example, you can view the output and restore the data to 2021-10-08 13:14:15.
- Restore data to a specific time.
pgbackrest --stanza=daml-indexdb --type=time --target="2021-10-08 13:14:15+00" --log-level-console=info restore
- Verify that the recovery.conf file appears with the correct timestamp.
postgres@40e87f8849b8:~/data$ sudo cat /var/lib/postgresql/data/recovery.conf
# Recovery settings generated by pgBackRest restore on 2021-10-08 19:32:11
restore_command = 'pgbackrest --stanza=daml-indexdb archive-get %f "%p"'
recovery_target_time = '2021-10-08 13:14:15+00'
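The timestamp check can also be done mechanically. The sketch below assumes that recovery.conf holds one setting per line with the value in single quotes; the function name `check_recovery_target` is hypothetical. It extracts the recovery_target_time value and compares it with the time passed to the restore command.

```shell
#!/bin/sh
# Hypothetical helper: checks that recovery.conf contains the expected
# recovery_target_time. Arguments: <path to recovery.conf> <expected time>.
check_recovery_target() {
    conf=$1
    expected=$2
    # Extract the value between single quotes on the recovery_target_time line.
    actual=$(sed -n "s/^recovery_target_time *= *'\(.*\)'.*/\1/p" "$conf")
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    printf 'expected %s but found %s\n' "$expected" "${actual:-nothing}" >&2
    return 1
}

# Usage inside the container (not run here):
#   check_recovery_target /var/lib/postgresql/data/recovery.conf "2021-10-08 13:14:15+00"
```

A mismatch means that the restore was generated for a different target time and should be rerun before the Client node components are started.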
- Exit the container Bash shell by running exit.
- Start all the Client node components.
curl -X POST 127.0.0.1:8546/api/node/management?action=start
- After the Daml index database container starts up, verify that the logs show recovery to the restore point.
sudo docker logs -f daml_index_db
- Validate the restore data for point-in-time restore.
When a backup process runs regularly and at intermediate times, you must capture data during the restore process to make sure that the restored data is consistent.
- Access the Docker container.
sudo docker exec -it daml_index_db bash
- Switch to the postgres user.
su - postgres
- Start the psql client.
psql -U indexdb -d daml_ledger_api
- Run a count query on each of the following tables.
select count(1) from configuration_entries;
select count(1) from flyway_schema_history;
select count(1) from package_entries;
select count(1) from packages;
select count(1) from parameters;
select count(1) from participant_command_completions;
select count(1) from participant_events_consuming_exercise;
select count(1) from participant_events_create;
select count(1) from participant_events_divulgence;
select count(1) from participant_migration_history_v100;
select count(1) from party_entries;
- Compare these values with the values that were captured during the backup process.
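The comparison can be automated if the counts are saved to files. The sketch below assumes a hypothetical capture format of one "<table> <count>" pair per line, and that the backup capture is not empty; the function name `compare_counts` and the file names are illustrative, not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: compares two files of "<table> <count>" lines.
# Prints each table whose count differs or is missing; returns 1 if any do.
compare_counts() {
    backup=$1
    restored=$2
    awk '
        # First file: counts captured at backup time.
        NR == FNR { want[$1] = $2; next }
        # Second file: counts captured after the restore.
        {
            seen[$1] = 1
            if (!($1 in want)) { print $1 ": not in backup capture"; bad = 1 }
            else if (want[$1] != $2) { print $1 ": backup=" want[$1] " restored=" $2; bad = 1 }
        }
        END {
            for (t in want) if (!(t in seen)) { print t ": not in restored capture"; bad = 1 }
            exit bad + 0
        }
    ' "$backup" "$restored"
}

# One possible way to produce a capture line (assumption, not run here):
#   psql -U indexdb -d daml_ledger_api -At -F ' ' -c "select 'packages', count(1) from packages;"
# Usage with hypothetical file names:
#   compare_counts counts_at_backup.txt counts_after_restore.txt
```

Because a point-in-time restore replays data only up to the target time, counts are expected to match only a capture taken at that same time.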
- (Optional) Restart the Daml Ledger container.
In some instances, the restore process might take a long time, and the postgres database might not be ready to accept connections until the restore completes. When this happens, the Daml Ledger container times out and becomes non-functional.
- Access Daml Ledger container logs.
sudo docker logs daml_ledger_api
If the logs show a database connection error, perform the following steps.
- Stop the Client node components.
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# sudo docker ps -a
CONTAINER ID   IMAGE                                                    COMMAND                  CREATED        STATUS        PORTS                      NAMES
218a1bdaddd6   vmwaresaas.jfrog.io/vmwblockchain/operator:1.8.0.0.53   "/operator/operator_…"   18 hours ago   Up 18 hours                              operator
cd476a6b3d6c   vmwaresaas.jfrog.io/vmwblockchain/agent:1.8.0.0.53      "java -jar node-agen…"   18 hours ago   Up 18 hours   127.0.0.1:8546->8546/tcp   agent
vmbc@localhost [ ~ ]#
- Start all the Client node components.
curl -X POST 127.0.0.1:8546/api/node/management?action=start
- Check the Daml index database logs and the Daml Ledger API logs to verify that both containers are functional.
What to do next
After the restore, you can manage your backup configuration. See Manage Client Node Backup on vSphere.