An NSX Manager deployed on a KVM host returns an error when you run CLI commands such as get service and get interface.

Problem

The CLI command get service returns an error. For example,

nsx-manager-1> get service
% An error occurred while processing the service command

Other CLI commands might also return an error. The get support-bundle command indicates that the /tmp directory has become read-only. For example,

nsx-manager-1> get support-bundle file failed-to-get-service.tgz
% An error occurred while retrieving the support bundle: [Errno 30] Read-only file system: '/tmp/tmpHzXF1u'
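The bracketed "[Errno 30] Read-only file system: '...'" text follows the format Python uses when it prints an OSError, and on Linux errno 30 is EROFS. If you collect error text from several CLI commands, a small helper like the one below can pull out the error name and path so that a read-only file system is easy to tell apart from other failures. This is a minimal sketch; the function name classify_os_error is hypothetical and is not part of NSX.

```python
import errno
import re

# Matches the "[Errno N] description: 'path'" text that Python prints for an OSError.
ERRNO_TEXT = re.compile(r"\[Errno (\d+)\] ([^:]+): '([^']*)'")


def classify_os_error(message):
    """Return (errno_name, description, path) for the first OSError text in
    message, or None if the message contains no such text."""
    match = ERRNO_TEXT.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numbers to symbolic names, e.g. 30 -> "EROFS" on Linux.
    name = errno.errorcode.get(code, f"errno {code}")
    return name, match.group(2), match.group(3)


if __name__ == "__main__":
    cli_error = (
        "% An error occurred while retrieving the support bundle: "
        "[Errno 30] Read-only file system: '/tmp/tmpHzXF1u'"
    )
    result = classify_os_error(cli_error)
    print(result)
    if result is not None and result[0] == "EROFS":
        print(f"{result[2]} is on a read-only file system")
```

On Linux, running it prints ('EROFS', 'Read-only file system', '/tmp/tmpHzXF1u') followed by "/tmp/tmpHzXF1u is on a read-only file system".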

The /var/log/messages-<timestamp> log contains a message such as the following:

Nov 17 07:26:48 no kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [qemu-kvm:4386]
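To check whether a log contains the same kind of soft-lockup report, you can scan it for lines that name the qemu-kvm process. The sketch below is a hypothetical helper, not an NSX or RHEL tool; it assumes the report keeps the "soft lockup - CPU#N stuck for Ns! [process:pid]" wording shown above, and other kernel versions may phrase it differently.

```python
import re
import sys

# Matches kernel soft-lockup reports such as:
#   NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [qemu-kvm:4386]
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]"
)


def find_soft_lockups(lines, process="qemu-kvm"):
    """Yield (cpu, seconds, pid, line) for each soft-lockup report naming process."""
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match and match.group(3) == process:
            cpu, seconds, _, pid = match.groups()
            yield int(cpu), int(seconds), int(pid), line.rstrip("\n")


if __name__ == "__main__":
    # Pass one or more log files, for example /var/log/messages-<timestamp>.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for cpu, seconds, pid, line in find_soft_lockups(log):
                print(f"{path}: CPU#{cpu} stuck {seconds}s in pid {pid}: {line}")
```

Searching from "soft lockup" rather than from "NMI watchdog" keeps the pattern independent of the prefix the kernel prints before it.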

Cause

One or more file systems on the NSX Manager appliance were corrupted. Some possible causes are documented in https://access.redhat.com/solutions/22621.

To resolve the issue, either repair the corrupted file systems or restore NSX Manager from a backup.

Solution

  1. Option 1: Repair the corrupted file systems. The following steps apply only to NSX Manager running on a KVM host.
    1. Run the virsh destroy command to stop the NSX Manager VM. This forcibly powers off the VM; it does not delete the VM or its disk image.
    2. Run the virt-rescue command in read-write mode (--rw) on the NSX Manager qcow2 image. For example,
      virt-rescue --rw -a nsx-unified-appliance-2.0.0.0.0.6522097.phadniss-p0-DK-to-DGo-on-rhel-prod_nsx_manager_1.qcow2
    3. At the virt-rescue prompt, run the e2fsck command to repair the tmp file system. For example,
      <rescue> e2fsck /dev/nsx/tmp
    4. If e2fsck reports errors, run e2fsck /dev/nsx/tmp again until it reports no more errors.
    5. Exit virt-rescue, then restart NSX Manager with the virsh start command.
  2. Option 2: Restore NSX Manager from a backup.

    For instructions, see the NSX-T Administration Guide.
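Step 4 of Option 1 asks you to rerun e2fsck until it reports no more errors. e2fsck reports its result through its exit status, which is the sum of the bit values documented in the e2fsck(8) man page, so you can read it with echo $? at the rescue prompt right after each run. The minimal sketch below decodes that value. The helper names and the next_step policy (rerun after corrected or uncorrected errors, stop when e2fsck itself fails) are assumptions for illustration, not a VMware recommendation.

```python
import sys

# Exit-status bits documented in the e2fsck(8) man page; a status is their sum.
E2FSCK_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}

# Bits meaning e2fsck itself failed, so rerunning it is unlikely to help.
FATAL_BITS = 8 | 16 | 32 | 128


def decode_e2fsck_status(status):
    """Return the conditions encoded in an e2fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_BITS.items() if status & bit]


def next_step(status):
    """Suggest an action after one e2fsck pass (illustrative policy only)."""
    if status == 0:
        return "clean: exit virt-rescue and start the VM"
    if status & FATAL_BITS:
        return "stop: e2fsck did not complete; investigate before rerunning"
    # Only bits 1, 2 or 4 are set: errors were found, so run another pass.
    return "rerun: run e2fsck /dev/nsx/tmp again"


if __name__ == "__main__":
    # Pass the value printed by "echo $?" after each e2fsck run.
    for arg in sys.argv[1:]:
        code = int(arg)
        print(code, decode_e2fsck_status(code), "->", next_step(code))
```

For example, if echo $? prints 1 after a pass and the script is saved as e2fsck_status.py, running python3 e2fsck_status.py 1 prints: 1 ['file system errors corrected'] -> rerun: run e2fsck /dev/nsx/tmp again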