A storage path does not fail over when the TUR command repeatedly returns retry requests.

Problem

Typically, when a storage path experiences problems, an ESXi host sends the Test Unit Ready (TUR) command to confirm that the path is down before initiating a path failover. However, if the TUR command is unsuccessful and repeatedly returns a retry operation request (VMK_STORAGE_RETRY_OPERATION), the host continues to retry the command without triggering the failover. Usually, the following errors cause the host to retry the TUR command:
  • SCSI_HOST_BUS_BUSY 0x02
  • SCSI_HOST_SOFT_ERROR 0x0b
  • SCSI_HOST_RETRY 0x0c
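
When triaging, it can help to check quickly whether a host status code is one of the three listed above. The following is a minimal sketch; the function name is_tur_retry_status is hypothetical, and it only encodes the list in this article, not the host's internal retry logic.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): prints the status name and returns 0
# if the given host status code is one of the three codes listed above that
# cause the host to retry the TUR command; returns 1 otherwise.
is_tur_retry_status() {
    case "$1" in
        0x02)      echo "SCSI_HOST_BUS_BUSY" ;;
        0x0b|0x0B) echo "SCSI_HOST_SOFT_ERROR" ;;
        0x0c|0x0C) echo "SCSI_HOST_RETRY" ;;
        *)         return 1 ;;
    esac
}
```

For example, is_tur_retry_status 0x0c prints SCSI_HOST_RETRY, while any code outside the list produces no output and a non-zero exit status.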

Solution

To resolve this issue, use the enable|disable_action_OnRetryErrors parameter. When the parameter is enabled, the ESXi host can mark the problematic path as dead. The host can then trigger the failover and use an alternative working path.

  1. Set the parameter by running an appropriate command:
    • To enable the ability to mark a problematic path as dead:
      # esxcli storage nmp satp generic deviceconfig set -c enable_action_OnRetryErrors -d naa.XXX
    • To disable the ability to mark a problematic path as dead:
      # esxcli storage nmp satp generic deviceconfig set -c disable_action_OnRetryErrors -d naa.XXX
  2. Check the status of the parameter by running the following command:
    # esxcli storage nmp device list
    The following example output indicates that the parameter has been enabled:
    naa.XXX
       Device Display Name: DGC Fibre Channel Disk (naa.XXX)
       Storage Array Type: VMW_SATP_CX
       Storage Array Type Device Config: {navireg ipfilter action_OnRetryErrors}

    The enable|disable_action_OnRetryErrors parameter is persistent across reboots.
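
The status check in step 2 can also be scripted. The following is a minimal sketch, assuming that the device ID starts an unindented line in the command output, that the fields under it are indented, and that action_OnRetryErrors appears as a separate word inside the braces of a Config line. The function name has_retry_action is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `esxcli storage nmp device list` output on stdin
# and succeeds only if the block for device $1 lists action_OnRetryErrors as a
# separate word on a Config line. The output layout is an assumption.
has_retry_action() {
    awk -v dev="$1" '
        /^[^ \t]/ { current = $1 }   # an unindented line starts a device block
        current == dev && /Config:/ && /[{ ]action_OnRetryErrors[ }]/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

A possible use on the host:

    # esxcli storage nmp device list | has_retry_action naa.XXX && echo "enabled"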

You can also set this parameter when configuring an SATP claim rule:

# esxcli storage nmp satp rule add -t device -d naa.XXX -s VMW_SATP_EXAMPLE -P VMW_PSP_FIXED -o enable_action_OnRetryErrors
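
If the same option is needed for several devices, the claim rule command above can be generated per device and reviewed before anything is run. The following is a minimal dry-run sketch; the function name print_retry_claim_rules is hypothetical, and the SATP and PSP names are simply carried over from the example above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints one claim rule command per device ID and
# executes nothing. $1 is the SATP name; the remaining arguments are device IDs.
print_retry_claim_rules() {
    satp="$1"
    shift
    for dev in "$@"; do
        echo "esxcli storage nmp satp rule add -t device -d $dev -s $satp -P VMW_PSP_FIXED -o enable_action_OnRetryErrors"
    done
}
```

For example, print_retry_claim_rules VMW_SATP_EXAMPLE naa.AAA naa.BBB prints two commands, one for each device ID, which can then be checked and run manually.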