You can use the reports that vSphere Replication compiles to optimize your environment for replication, identify problems in your environment, and reveal their most probable cause.

Server and site connectivity, number of RPO violations, and other metrics give you, as an administrator, the information you need to diagnose replication issues.

The following sections contain examples of interpreting the data displayed under Reports on the vSphere Replication tab under Monitor.

RPO Violations

A large number of RPO violations can be caused by various problems in the environment, on both the source and the target site. With more details on historical replication jobs, you can make educated decisions on how to manage the replication environment.

Table 1. Analysing RPO Violations

Probable Cause:

  • The network bandwidth cannot accommodate all replications.

  • The replication traffic might have increased.

  • The initial full sync for a large virtual machine is taking longer than the configured RPO for the virtual machine.

Solution:

  • Disable the replication on some virtual machines with a high change rate to allow virtual machines with a lower change rate to meet their RPO objectives.

  • Increase the network bandwidth for the selected host.

  • Check if the replication traffic has increased. If the traffic has increased, investigate possible causes. For example, the usage of an application might have changed without you being informed.

  • Check the historical data for the average of transferred bytes for a notable and sustained increase. If an increase exists, contact application owners to identify recent events that could be related to this increase.

  • Adjust to a less aggressive RPO or look at other ways to increase bandwidth to accommodate the current RPO requirements.

Probable Cause:

  • A connectivity problem exists between the source and the target site.

  • An infrastructure change might have occurred on the target site.

Solution:

  • Check the site connectivity data to verify the connection between the source and target site.

  • Check if the infrastructure on the target site has changed or is experiencing problems that prevent vSphere Replication from writing on the target datastores. For example, storage bandwidth management changes made to target hosts might result in storage delays during the replication process.

  • Check the vSphere Replication Management Server appliance and the vSphere Replication Server appliance. Someone might have shut down an appliance, or it might have lost connection.
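One of the causes listed above, an initial full sync that takes longer than the configured RPO, can be checked with simple arithmetic before replication is configured. The following is a minimal sketch, not a vSphere Replication tool: the function names are hypothetical, and it assumes the whole link is available to one virtual machine, with no compression, protocol overhead, or concurrent replications, so real sync times are likely to be longer.

```python
def full_sync_minutes(disk_gb: float, bandwidth_mbps: float) -> float:
    """Rough time in minutes to copy disk_gb gigabytes at bandwidth_mbps.

    Assumption: decimal units (1 GB = 10**9 bytes, 1 Mbps = 10**6 bits/s)
    and a link used only by this one transfer.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    bits_to_send = disk_gb * 1e9 * 8
    seconds = bits_to_send / (bandwidth_mbps * 1e6)
    return seconds / 60


def full_sync_exceeds_rpo(disk_gb: float, bandwidth_mbps: float,
                          rpo_minutes: float) -> bool:
    """True if the estimated full sync takes longer than the RPO."""
    return full_sync_minutes(disk_gb, bandwidth_mbps) > rpo_minutes


if __name__ == "__main__":
    # Illustrative numbers only: a 500 GB disk over a 100 Mbps link.
    minutes = full_sync_minutes(500, 100)
    print(f"Estimated full sync: {minutes:.0f} minutes")
    print("Exceeds 240-minute RPO:", full_sync_exceeds_rpo(500, 100, 240))
```

If the estimate exceeds the RPO, RPO violations during the initial full sync are expected and do not necessarily indicate a fault.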

Transferred Bytes

Correlating the total number of transferred bytes with the number of RPO violations can help you decide how much bandwidth might be required to meet RPO objectives.

Table 2. Analysing the Rate of Transferred Bytes and RPO Violations

Graph Values:

  • High rate of transferred bytes and high number of RPO violations

  • Low rate of transferred bytes and high number of RPO violations

Probable Cause: The network bandwidth might be insufficient to accommodate all replications.

Solution:

  • Maximize the transferred bytes graph and use the drop-down menu to filter the data by virtual machine. Disable the replication on some virtual machines with a high change rate to allow virtual machines with a lower change rate to meet their RPO objectives.

  • Increase the network bandwidth for the selected host.

Graph Values:

  • High rate of transferred bytes and a few or no RPO violations

  • Low rate of transferred bytes and a few or no RPO violations

Probable Cause: The environment operates as expected.

Solution: N/A
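The decision logic of Table 2 can be written down as a small lookup. The sketch below is illustrative only: the names and the cutoff for "a few" violations are assumptions, because the documentation does not define a numeric threshold, and the inputs are values you read from the graphs yourself.

```python
from dataclasses import dataclass

FEW_VIOLATIONS_LIMIT = 3  # hypothetical cutoff for "a few" RPO violations


@dataclass
class Reading:
    graph_values: str
    probable_cause: str
    solution: str


def interpret_graphs(transfer_rate_high: bool, rpo_violations: int,
                     few_limit: int = FEW_VIOLATIONS_LIMIT) -> Reading:
    """Map a pair of graph observations to the matching row of Table 2."""
    rate = "High" if transfer_rate_high else "Low"
    if rpo_violations > few_limit:
        # Table 2 lists the same cause for high and low transfer rates.
        return Reading(
            f"{rate} rate of transferred bytes and high number of RPO violations",
            "The network bandwidth might be insufficient to accommodate all replications.",
            "Disable replication on some high change rate virtual machines "
            "or increase the network bandwidth for the selected host.",
        )
    return Reading(
        f"{rate} rate of transferred bytes and a few or no RPO violations",
        "The environment operates as expected.",
        "N/A",
    )


if __name__ == "__main__":
    print(interpret_graphs(transfer_rate_high=False, rpo_violations=12))
```

In this table the diagnosis depends on the number of RPO violations alone. The transfer rate is kept in the result only to show which row of the table was matched.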

Replicated Virtual Machines by Host

The number of replicated virtual machines by host helps you determine how the replication workload is distributed in your environment. For example, if the number of replicated virtual machines on a host is high, the host might be overloaded with replication jobs. You might want to verify that the host has enough resources to maintain all replication jobs. If needed, you can check for hosts with a low number of replicated virtual machines and optimize the allocation of resources in your environment.
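If you export or transcribe the per-host data, the same check can be scripted. The sketch below assumes a hypothetical list of (virtual machine, host) pairs and a limit that you choose yourself. It does not call any vSphere API, and the host and virtual machine names are made up.

```python
from collections import Counter


def replicated_vms_by_host(assignments):
    """Count replicated virtual machines per host from (vm, host) pairs."""
    return Counter(host for _vm, host in assignments)


def hosts_over_limit(counts, limit):
    """Hosts whose replicated VM count is above the chosen limit."""
    return sorted(host for host, n in counts.items() if n > limit)


def hosts_under_limit(counts, limit):
    """Hosts with spare capacity: replicated VM count below the limit."""
    return sorted(host for host, n in counts.items() if n < limit)


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = [
        ("vm-01", "esx-a"), ("vm-02", "esx-a"), ("vm-03", "esx-a"),
        ("vm-04", "esx-a"), ("vm-05", "esx-b"),
    ]
    counts = replicated_vms_by_host(sample)
    print("Possibly overloaded:", hosts_over_limit(counts, limit=3))
    print("Lightly loaded:", hosts_under_limit(counts, limit=3))
```

A high count is only a hint. Whether a host is actually overloaded depends on its resources and on the change rate of the virtual machines it replicates.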