The auto-scaler service within VMware Cloud on AWS monitors the health of your infrastructure. This allows you to focus on your workloads, knowing that the service will handle any failures should they occur. With auto-scaler, you can build a resilient, highly available SDDC.

Although AWS infrastructure is reliable, failures are inevitable. They can be any of the failures that also occur in an on-premises data center: individual disk or host failures, network failures, and even more widespread failures that affect groups of hosts. The Reliability Pillar of the AWS Well-Architected Framework discusses design principles for reliability in the cloud. You cannot assume that the cloud infrastructure is infallible; you must plan for failure and for automatic recovery. VMC on AWS provides a huge benefit by abstracting the underlying infrastructure and letting your applications run in the cloud without modifying their architecture. It monitors the infrastructure, detects failures, and automatically remediates the infrastructure when a failure occurs.

Most of the auto-remediation process happens in the background and is carried out without affecting existing workloads. Because auto-remediation continuously monitors the health of the system, it detects errors as they occur. The service can also quickly provision hardware into an SDDC. By combining these two capabilities, the service reacts quickly to a hardware failure by adding a new host to your cluster when a fault is detected. In addition, VMs are protected by VMware vSAN, and vSphere HA automatically restarts any VMs that were running on a failed server.

Auto-remediation is part of the VMC Auto-scaler Service. One of the benefits of running your workloads in VMware Cloud on AWS is that VMware manages the platform, including all the infrastructure and management components. VMware also performs regular updates across the SDDC fleet to deliver new features, bug fixes, and software upgrades.

AWS monitors the health of other components, such as top-of-rack switches and power supplies. Failure of these components triggers host failures, which auto-remediation then handles. The auto-remediation monitor checks the following:
  • Whether the host is disconnected.
  • Whether the host is not responding.
  • Whether a vSAN disk failure has occurred.
  • The status of the HA FDM agent.
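
The checks listed above can be pictured as a simple evaluation over a snapshot of each host's state. The following Python sketch is purely illustrative and is not the VMware implementation or any VMware API: the HostHealth type, its field names, and the remediation_reasons function are all hypothetical names invented for this example, and it assumes that a host failing any one check becomes a candidate for remediation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostHealth:
    """Hypothetical snapshot of one host's state (not a VMware API type)."""
    name: str
    connected: bool = True          # False if the host is disconnected
    responding: bool = True         # False if the host is not responding
    vsan_disk_failed: bool = False  # True if a vSAN disk failure has occurred
    fdm_agent_ok: bool = True       # False if the HA FDM agent is unhealthy


def remediation_reasons(host: HostHealth) -> List[str]:
    """Return the failed checks for a host; an empty list means healthy.

    Assumption for illustration: any single failed check is enough
    to flag the host for remediation.
    """
    reasons = []
    if not host.connected:
        reasons.append("host disconnected")
    if not host.responding:
        reasons.append("host not responding")
    if host.vsan_disk_failed:
        reasons.append("vSAN disk failure")
    if not host.fdm_agent_ok:
        reasons.append("HA FDM agent unhealthy")
    return reasons


if __name__ == "__main__":
    # Made-up host names and states, used only to exercise the checks.
    hosts = [
        HostHealth("host-a"),
        HostHealth("host-b", responding=False, vsan_disk_failed=True),
    ]
    for host in hosts:
        reasons = remediation_reasons(host)
        status = "needs remediation: " + ", ".join(reasons) if reasons else "healthy"
        print(f"{host.name}: {status}")
```

In this sketch each check is independent, so a host can be flagged for more than one reason at once. How the real service weighs or sequences these signals is not described here.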