An NSX Manager that is deployed on a KVM host returns errors when you run CLI commands such as get service and get interface.
The CLI command get service returns an error. For example,
nsx-manager-1> get service
% An error occurred while processing the service command
Other CLI commands might also return an error. The get support-bundle command indicates that the /tmp directory has become read-only. For example,
nsx-manager-1> get support-bundle file failed-to-get-service.tgz
% An error occurred while retrieving the support bundle: [Errno 30] Read-only file system: '/tmp/tmpHzXF1u'
The /var/log/messages-<timestamp> log contains a message such as the following:
Nov 17 07:26:48 no kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [qemu-kvm:4386]
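To look for this message across several rotated log files at once, a minimal sketch follows. The function name find_soft_lockups is hypothetical, and the sketch assumes you run it on the machine where the /var/log/messages-<timestamp> files reside and that you have permission to read them.

```shell
# Hypothetical helper: print every "soft lockup" line found in the given log files.
# Returns a nonzero status when no file contains a match (standard grep behavior).
find_soft_lockups() {
    # -h suppresses the file name prefix so the output matches the raw log lines
    grep -h "soft lockup" "$@"
}

# Example invocation (the glob is an assumption about your log file names):
#   find_soft_lockups /var/log/messages*
```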
One or more file systems on the NSX Manager appliance were corrupted. Some possible causes are documented in https://access.redhat.com/solutions/22621.
To resolve the issue, you can repair the corrupt file systems or perform a restore from a backup.
- Option 1: Repair the corrupt file systems. The following steps are specifically for NSX Manager running on a KVM host.
  - Run the virsh destroy command to stop the NSX Manager VM.
  - Run the virt-rescue command in write mode on the qcow2 image. For example,
    virt-rescue --rw -a nsx-unified-appliance-22.214.171.124.0.6522097.phadniss-p0-DK-to-DGo-on-rhel-prod_nsx_manager_1.qcow2
  - At the virt-rescue prompt, run the e2fsck command to repair the tmp file system. For example,
    <rescue> e2fsck /dev/nsx/tmp
  - If necessary, run e2fsck /dev/nsx/tmp again until it reports no more errors.
  - Run the virsh start command to restart the NSX Manager VM.
- Option 2: Perform a restore from a backup.
For instructions, see the NSX-T Administration Guide.
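The Option 1 repair sequence can be sketched as a small script run on the KVM host. This is an illustration under stated assumptions, not a supported procedure: the names run, repair_nsx_manager, and DRY_RUN are hypothetical, and the libvirt domain name and image path are placeholders that you must supply. By default the sketch only prints the commands so that you can review them before running anything.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper that walks through the Option 1 steps.
#   $1: libvirt domain name of the NSX Manager VM (placeholder)
#   $2: path to the appliance qcow2 image (placeholder)
repair_nsx_manager() {
    domain=$1
    image=$2

    # Stop the NSX Manager VM.
    run virsh destroy "$domain" || return 1

    # Open the image in write mode. virt-rescue starts an interactive shell:
    # run "e2fsck /dev/nsx/tmp" there, repeat until it reports no errors,
    # then exit the rescue shell.
    run virt-rescue --rw -a "$image" || return 1

    # Restart the NSX Manager VM.
    run virsh start "$domain"
}

# Example invocation (dry run; both arguments are placeholders):
#   repair_nsx_manager <domain-name> <path-to-image>.qcow2
```

Setting DRY_RUN=0 runs the commands instead of printing them. Inside the rescue shell, e2fsck exits with status 0 when it finds no errors, so checking the exit status after each pass is one way to decide whether another pass is needed.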