VMware constantly monitors customer SDDC environments through automation and a team of Site Reliability Engineers (SREs). The following list describes the processes that VMware automates to ensure the health of SDDCs.
- Orphaned VM(s) Auto-Remediation
- If you use "No data redundancy/VMs w/ FTT=0" as a storage policy, you might experience data loss if there is a failure or if the VM becomes unresponsive. If a failure occurs and one or more VMs become orphaned, VMware performs a cleanup action. You will receive an email notification when this happens.
- vCenter Sessions (Connections) Maxed Out
- If many sessions are created and not cleared, vCenter Server might become inaccessible. Typically, this is caused by automation that creates a large number of sessions without closing them. When this happens, an automated alert is generated and VMware restarts vCenter Server. You will receive an email notification when this happens.
- vCenter Server Reboot
- A number of different issues might require a reboot of vCenter Server. Some issues require an immediate reboot for remediation, while others allow continued usage with a reboot required in the near future. In the latter case, you will receive an email notification alerting you that a reboot will occur within the next 24 hours. After a reboot, ongoing tasks and application connections might need to be restarted.
- Management Plane (NSX Manager) Restart
- A number of different issues might require a restart of NSX Manager. Some issues require an immediate restart for remediation, while others allow continued usage with a restart required in the near future. For the short time that NSX Manager is restarting, you cannot access the SDDC Networking and Security UI. You will not receive an email notification for NSX Manager restart events.
- NSX Edge Failover
- If our monitoring system detects that the active NSX Edge is close to becoming unhealthy, we schedule an NSX Edge failover during off-peak hours. This scheduled failover is a proactive measure to avoid possible disruption from a failover happening at peak hours. If there is a problem with the active NSX Edge before the scheduled failover, it fails over automatically. You will receive an email notification if we schedule an NSX Edge failover.
- Single Host SDDC Failure
- The Single Host SDDC starter configuration has no SLA and is appropriate for proof-of-concept or test and development use cases. VMware does not perform any remediation in the event of a Single Host SDDC failure. You will receive an email notification if a Single Host SDDC failure occurs.
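
The vCenter session exhaustion described above is usually avoidable on the automation side by making sure every session that a script opens is also closed, even when the script fails partway through. The following is a minimal sketch of that pattern in Python, not a VMware-provided implementation. The names `managed_session`, `FakeVCenter`, and `run_jobs` are hypothetical and exist only for illustration; `FakeVCenter` is a stand-in that counts open sessions so the example runs without a real vCenter Server.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(connect, disconnect):
    """Open a session with connect() and always close it with disconnect().

    Both callables are supplied by the caller; this helper is a hypothetical
    wrapper, not part of any VMware SDK.
    """
    session = connect()
    try:
        yield session
    finally:
        # Runs on normal exit and on exceptions, so no session is leaked.
        disconnect(session)


class FakeVCenter:
    """Hypothetical stand-in for a server that tracks open sessions."""

    def __init__(self):
        self.open_sessions = 0

    def login(self):
        self.open_sessions += 1
        return object()  # opaque session handle

    def logout(self, session):
        self.open_sessions -= 1


def run_jobs(server, job_count):
    """Run job_count jobs, each in its own session, and report leftovers."""
    for _ in range(job_count):
        with managed_session(server.login, server.logout):
            pass  # the automation's real work would go here
    return server.open_sessions
```

In a script that uses pyVmomi, the connect and disconnect callables would typically wrap `SmartConnect` and `Disconnect` from `pyVim.connect`; treat that mapping as an assumption to verify against the SDK version you use. Reusing one session for a batch of operations, rather than opening a new session per operation, also reduces the number of sessions that vCenter Server has to track.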