Interpreting Replication Statistics for a Site

You can use the reports that vSphere Replication compiles to optimize your environment for replication, identify problems in your environment, and reveal their most probable cause.

As an administrator, you can diagnose various replication issues by using metrics such as server and site connectivity and the number of RPO violations.

The following sections contain examples of interpreting the data displayed under vSphere Replication reports on the Site Pair tab of vSphere Replication.

RPO Violations

A large number of RPO violations can be caused by various problems in the environment, on both the protected and the recovery site. With more details on historical replication jobs, you can make educated decisions on how to manage the replication environment.

Table 1. Analyzing RPO Violations
Probable Cause	Solution
The network bandwidth cannot accommodate all replications. The replication traffic might have increased. The initial full sync for a large virtual machine is taking longer than the configured RPO for the virtual machine.	To allow virtual machines with a lower change rate to meet their RPO objectives, deactivate the replication on some virtual machines with a high change rate. Increase the network bandwidth for the selected host. Check if the replication traffic has increased. If the traffic has increased, investigate possible causes. For example, the usage of an application might have changed without you being informed. Check the historical data for the average of transferred bytes for a notable and sustained increase. If an increase exists, contact application owners to identify recent events that might be related to this increase. Adjust to a less aggressive RPO or look at other ways to increase bandwidth to accommodate the current RPO requirements.
A connectivity problem exists between the protected and the recovery site. An infrastructure change might have occurred on the recovery site.	To verify the connection between the protected and recovery site, check the site connectivity data. Check if the infrastructure on the recovery site has changed or is experiencing problems that prevent vSphere Replication from writing to the recovery datastores. For example, storage bandwidth management changes made to recovery hosts might result in storage delays during the replication process. Check the vSphere Replication Management Server appliance and the vSphere Replication Server appliance. An appliance might have been shut down or might have lost its connection.
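
One cause in Table 1 is an initial full sync that takes longer than the configured RPO. The following Python sketch shows a rough way to check for this. It is an illustration only: the function names and sample values are hypothetical, nothing here calls a vSphere Replication API, and the estimate assumes the whole link is available and ignores compression, protocol overhead, and competing traffic.

```python
def full_sync_seconds(vm_size_bytes: int, bandwidth_bits_per_s: float) -> float:
    """Idealized time, in seconds, to copy vm_size_bytes over the link."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    # Convert bytes to bits, then divide by the link speed in bits per second.
    return vm_size_bytes * 8 / bandwidth_bits_per_s


def sync_exceeds_rpo(vm_size_bytes: int,
                     bandwidth_bits_per_s: float,
                     rpo_minutes: float) -> bool:
    """True if the idealized full sync takes longer than the RPO period."""
    return full_sync_seconds(vm_size_bytes, bandwidth_bits_per_s) > rpo_minutes * 60


# Hypothetical values: a 500 GB virtual machine, a 100 Mbit/s link, a 4-hour RPO.
vm_size = 500 * 10**9
link_speed = 100 * 10**6
hours = full_sync_seconds(vm_size, link_speed) / 3600
print(f"Estimated full sync: {hours:.1f} hours")
print("Exceeds RPO:", sync_exceeds_rpo(vm_size, link_speed, rpo_minutes=240))
```

If the estimate exceeds the RPO, violations reported during the initial sync are expected and do not by themselves indicate a fault.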

Transferred Bytes

Correlating the total number of transferred bytes with the number of RPO violations can help you decide how much bandwidth might be required to meet RPO objectives.

Table 2. Analyzing the Rate of Transferred Bytes and RPO Violations
Graph Values	Probable Cause	Solution
High rate of transferred bytes and high number of RPO violations; low rate of transferred bytes and high number of RPO violations	The network bandwidth might be insufficient to accommodate all replications.	Check the transferred bytes graph and use the drop-down menus to filter the data by virtual machine and time period. To let virtual machines with a lower change rate meet their RPO objectives, you can deactivate the replication on some virtual machines with a high change rate. Increase the network bandwidth for the selected host.
High rate of transferred bytes and few or no RPO violations; low rate of transferred bytes and few or no RPO violations	The environment operates as expected.	N/A
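
To turn transferred-byte history into a bandwidth figure, one simple approach is to take the largest amount of data transferred in any single RPO window and divide it by the length of that window. The Python sketch below assumes you have copied per-window byte totals from the transferred bytes graph by hand. The function name and sample numbers are hypothetical, and the result is a lower bound that ignores protocol overhead and other traffic on the link.

```python
def required_bandwidth_mbps(bytes_per_rpo_window: list[int],
                            rpo_minutes: float) -> float:
    """Minimum link speed (Mbit/s) that moves the busiest window's data
    within one RPO period."""
    if rpo_minutes <= 0:
        raise ValueError("rpo_minutes must be positive")
    if not bytes_per_rpo_window:
        raise ValueError("at least one sample is required")
    peak_bytes = max(bytes_per_rpo_window)
    bits_per_second = peak_bytes * 8 / (rpo_minutes * 60)
    return bits_per_second / 1e6


# Hypothetical samples: bytes transferred in three consecutive 15-minute windows.
samples = [2_000_000_000, 4_500_000_000, 3_000_000_000]
print(f"{required_bandwidth_mbps(samples, rpo_minutes=15):.1f} Mbit/s")
```

Comparing this figure with the bandwidth actually available to replication traffic gives a first indication of whether to increase bandwidth or adjust to a less aggressive RPO.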