It is a best practice to set up scheduled backups or replication for vRealize Log Insight nodes and clusters.

Prerequisites

  • Verify that no configuration problems exist on source and target sites before performing the backup or replication operations.
  • Verify that cluster resource allocation is not at capacity.

    In configurations with reasonable ingestion and query loads, memory and swap usage can reach almost 100% of capacity during backup and replication operations. In a live environment, part of that spike comes from normal vRealize Log Insight cluster activity, and the scheduled backup and replication operations can add significantly to it.

    Worker nodes are sometimes disconnected for 1–3 minutes before rejoining the primary nodes, possibly because of high memory usage.

  • Reduce the memory throttling on vRealize Log Insight nodes by doing one or both of the following:
    • Allocate additional memory over the vRealize Log Insight recommended configurations.
    • Schedule the recurring backups during off-peak hours.
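
The capacity and off-peak checks above can be sketched as a small pre-flight script. This is only an illustration, not part of vRealize Log Insight: it assumes the node exposes Linux-style `/proc/meminfo` output, and the function names, the 80% memory and 50% swap thresholds, and the 01:00 to 05:00 off-peak window are all hypothetical values to adjust for your environment.

```python
from datetime import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def usage_fractions(meminfo):
    """Return (memory_used_fraction, swap_used_fraction)."""
    mem_total = meminfo["MemTotal"]
    mem_used = mem_total - meminfo["MemAvailable"]
    swap_total = meminfo.get("SwapTotal", 0)
    swap_used = swap_total - meminfo.get("SwapFree", 0)
    swap_fraction = swap_used / swap_total if swap_total else 0.0
    return mem_used / mem_total, swap_fraction


def in_off_peak_window(now, start=time(1, 0), end=time(5, 0)):
    """True if `now` is inside [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def ok_to_start_backup(meminfo_text, now, max_mem=0.80, max_swap=0.50):
    """Assumed policy: start only off-peak and with memory and swap headroom."""
    mem_fraction, swap_fraction = usage_fractions(parse_meminfo(meminfo_text))
    return (
        in_off_peak_window(now)
        and mem_fraction <= max_mem
        and swap_fraction <= max_swap
    )
```

In practice, the text passed to `ok_to_start_backup` would be read from `/proc/meminfo` on each node, and `now` would be the current local time on that node.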

Procedure

  1. Enable regular backup or replication of vRealize Log Insight forwarders by using the same procedures that you use for the vRealize Log Insight server.
  2. Verify that the backup frequency and backup types are appropriately selected based on the available resources and customer-specific requirements.
  3. If resources are not a constraint and the backup tool supports it, enable concurrent backups of the cluster nodes to speed up the backup process.
  4. Back up all the nodes at the same time.
    For information about how to back up the nodes, see the Backup and Restore, and Disaster Recovery section in the vRealize Suite documentation.
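
As a rough sketch of what backing up all nodes at the same time can look like when driven from a script, the following runs one backup call per node in parallel and records which nodes failed. The `backup_node` callable is a placeholder for whatever your backup tool provides; neither it nor `back_up_cluster` is a vRealize Log Insight or vRealize Suite API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def back_up_cluster(nodes, backup_node, concurrent=True):
    """Run backup_node(node) for every node.

    Returns a dict that maps each node to None on success or to the
    error message if the backup call raised an exception.
    """
    # One worker per node backs up all nodes at once; a single worker
    # falls back to sequential backups when resources are constrained.
    workers = max(1, len(nodes) if concurrent else 1)
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(backup_node, node): node for node in nodes}
        for future in as_completed(futures):
            node = futures[future]
            error = future.exception()
            results[node] = None if error is None else str(error)
    return results
```

Collecting the result for every node, rather than stopping at the first failure, makes it easier to see whether the backup set for the cluster is complete.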

What to do next

Monitoring: While the backup is in progress, check for environment or performance problems in the vRealize Log Insight setup. Most backup, restore, and disaster recovery tools provide monitoring capabilities.

During the backup process, check all the relevant logs in the production system because the user interface might not display all problems.
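
One way to check logs without relying on the user interface is to scan them for problem indicators while the backup runs. The sketch below is a minimal example under stated assumptions: the keyword list is a guess at what might be relevant (errors, out-of-memory conditions, node disconnects), and the log locations and formats in your production system will differ.

```python
import re

# Hypothetical indicators; extend this to match the logs in your environment.
PROBLEM_PATTERN = re.compile(
    r"\b(ERROR|FATAL|OutOfMemoryError|disconnected)\b", re.IGNORECASE
)


def find_problem_lines(log_lines):
    """Return (line_number, text) for each line that matches an indicator."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if PROBLEM_PATTERN.search(line)
    ]
```

Because an open file is iterable line by line, `find_problem_lines` can be called directly on a log file opened with `open(path)`.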