Storage and network bandwidth requirements might increase when you use the guest OS trim/unmap commands with vSphere Replication. You might also observe RPO violations.
Incremental Sync After Using Guest OS Trim/Unmap Commands
Calling the trim/unmap commands might increase the storage consumption on the target site.
After you use the trim/unmap commands on the source site disk, the free space available on the disk is added to the data blocks that vSphere Replication transfers to the target site during the next RPO cycle. As a result, the less full the source site disk is, the larger the set of changed blocks that vSphere Replication transfers to the target site.
For example, if the source site disk is 10 TB and only 1 TB of it is allocated, calling the trim/unmap commands results in a transfer of at least 9 TB to the target site.
If the source site disk is 10 TB, 9 TB of which are allocated, and you delete 2 TB of data, calling the trim/unmap commands results in a transfer of at least 3 TB of data to the target site.
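The arithmetic behind these examples can be summarized in a minimal sketch. The function name and the TB figures below are illustrative only; they restate the examples above and are not part of vSphere Replication.

```python
def min_transfer_after_trim_tb(disk_size_tb: float, allocated_tb: float) -> float:
    """After trim/unmap, the free space on the source disk is reported as
    changed blocks, so the next sync transfers at least the unallocated
    portion of the disk."""
    return disk_size_tb - allocated_tb

# 10 TB disk with only 1 TB allocated: at least 9 TB is transferred.
print(min_transfer_after_trim_tb(10, 1))      # 9.0

# 10 TB disk with 9 TB allocated, then 2 TB deleted (7 TB remain
# allocated): at least 3 TB is transferred.
print(min_transfer_after_trim_tb(10, 9 - 2))  # 3.0
```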
Because of the incremental sync, and depending on the RAID configuration that the VM storage policy defines at the target site, the storage consumption of the replicated VM can be more than twice the consumption of the source VM.
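As a rough illustration of that multiplier, the following sketch assumes a vSAN RAID 1 (FTT=1) storage policy at the target site, which keeps two full copies of every block. The figures are placeholders that continue the example above, not measured values.

```python
raid1_copies = 2  # assumption: RAID 1 with FTT=1 stores two copies of each block

source_vm_consumption_tb = 7.0   # allocated on the source after the 2 TB delete
transferred_free_space_tb = 3.0  # changed blocks sent by the incremental sync

# The RAID overhead applies to everything written at the target site, so the
# incremental sync can push the replica above twice the source consumption.
replica_consumption_tb = (source_vm_consumption_tb + transferred_free_space_tb) * raid1_copies
print(replica_consumption_tb)  # 20.0 -> more than twice the 7.0 TB source VM
```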
You cannot see the storage consumption of the replicated VM at the target site; you can see only the overall consumption of the entire vSAN datastore. As a result, you cannot track the reclaimed storage space at the VM disk level, but you can track it by monitoring the overall free space that remains on the vSAN datastore.
Recovery Point Objective Violations After Using the Trim/Unmap Commands on the Source Virtual Machine
You can call the trim/unmap commands manually, or the guest OS can call them at certain time intervals. In both cases, the synchronization that follows the command might take a significant amount of time.
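For example, on a Linux guest you might trigger trim manually with the fstrim utility. The following sketch only wraps that call; it assumes that fstrim is installed and that the script runs with root privileges, and it is not part of vSphere Replication itself.

```python
import subprocess

# Trim all mounted file systems that support discard, with verbose output.
result = subprocess.run(
    ["fstrim", "--all", "--verbose"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # e.g. "/: 42.5 GiB (...) trimmed"
```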
Using the trim/unmap commands to reclaim unused space on the source virtual machine might generate a large number of changed disk blocks. Synchronizing these changes might take longer than the configured RPO, in which case vSphere Replication starts reporting RPO violations.
Because the replication is behind the RPO schedule, a new incremental sync begins as soon as the synchronization of the previous instance completes. These back-to-back incremental syncs continue until vSphere Replication creates a replica instance that satisfies the RPO schedule and no longer reports an RPO violation. The replication status then becomes OK.
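The following conceptual sketch shows why the status eventually returns to OK. It is not vSphere Replication code; the durations and the shrink factor are invented assumptions that model each follow-up sync carrying only the changes accumulated during the previous one.

```python
rpo_minutes = 15.0
sync_minutes = 120.0  # the first sync is long because of the trimmed blocks

cycle = 0
while sync_minutes > rpo_minutes:
    cycle += 1
    print(f"sync {cycle}: took {sync_minutes:.0f} min -> RPO violation")
    # Each follow-up sync carries only the changes accumulated while the
    # previous sync was running, so it is assumed to be much smaller.
    sync_minutes = max(sync_minutes * 0.2, 1.0)

print(f"sync {cycle + 1}: took {sync_minutes:.0f} min -> within RPO, status OK")
```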
Use the Unmap Handling Mode of the vSphere Replication Filter Driver
On ESXi 7.0 Update 3 or later, by default, the vSphere Replication filter driver fails the SCSI Unmap commands during a sync operation if these commands overlap with content that is being transferred to the target site. The guest OS retries the command later without impacting the applications that run in the virtual machine. Some guest operating systems do not handle this behavior of the filter driver well and might become unresponsive while the sync operation is in progress.
On ESXi 7.0 Update 2 or earlier, the hbr_filter uses a different Unmap handling mode, in which the Unmap commands are accommodated by preserving the content that is being transferred. Some guest operating systems behave better in this mode, even though the method has some disadvantages (the sketch after the following list contrasts the two modes):
- Additional read and write operations for preserving the overlapping regions, which on slow storage might result in unexpected delays. These delays can cause some guest operating systems to issue device resets during the sync operation.
- Temporarily increased storage space consumption by the preserved disk content.
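The following conceptual sketch contrasts the two handling modes. It is a plain model for illustration only; the function names are invented and do not exist in the hbr_filter.

```python
def unmap_fail_mode(overlaps_pending_transfer: bool) -> str:
    """Default on ESXi 7.0 Update 3 or later: fail the SCSI Unmap command
    so that the guest OS retries it after the sync operation completes."""
    if overlaps_pending_transfer:
        return "UNMAP failed; guest OS retries later"
    return "UNMAP applied immediately"

def unmap_preserve_mode(overlaps_pending_transfer: bool) -> str:
    """Mode on ESXi 7.0 Update 2 or earlier: copy the overlapping regions
    aside first, at the cost of extra I/O and temporary extra space."""
    if overlaps_pending_transfer:
        return "overlapping content preserved, then UNMAP applied"
    return "UNMAP applied immediately"

print(unmap_fail_mode(True))      # UNMAP failed; guest OS retries later
print(unmap_preserve_mode(True))  # overlapping content preserved, then UNMAP applied
```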
Prerequisites
- On ESXi 7.0 Update 3 or later, you can return to the previous behavior by using the ESXi advanced setting