VMware Cloud offers many of the same resilience features found in on-premises deployments of vSphere and Cloud Foundation. These include snapshots, clones, and replication, as well as vSphere High Availability (HA), vMotion, and the Distributed Resource Scheduler (DRS).
Ideas to consider:
- Ensure that workload applications start automatically when the virtual machine boots. This helps immensely with regular and automated patching, and also with incident response. For example, if a cloud host fails, vSphere HA will restart its workloads on other cluster hosts. If the applications in those workloads start automatically, less off-hours administration is needed, and follow-up work can wait until normal working hours.
- Ensure that workloads spanning multiple virtual machines or containers are resilient to restarts of individual components, whether caused by patching or by an automated vSphere HA restart. Applications should retry failed connections periodically rather than give up on the first error. The NSX Advanced Load Balancer can help make internal application subcomponents more reliable, and it can also detect application health and present customized outage pages to customers.
- Use DRS affinity and anti-affinity rules to control placement, keeping clustered components on separate hosts so that a single host failure affects only one of them.
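
The advice above to retry connections periodically can be sketched in application code. The following is a minimal illustration in Python, not a VMware-provided API: the function name `connect_with_retry`, its parameters, and the default delays are all assumptions chosen for the example. It retries a TCP connection with exponential backoff and random jitter, so that a dependency restarted by patching or by vSphere HA has time to come back and is not hit by every client at the same instant.

```python
import random
import socket
import time


def connect_with_retry(host, port, attempts=5, base_delay=1.0, max_delay=30.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Open a TCP connection, backing off between failed attempts.

    Hypothetical helper for illustration. `connect` and `sleep` are
    injectable so the retry logic can be tested without a network.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect((host, port), timeout=5)
        except OSError as exc:
            last_error = exc
            if attempt == attempts - 1:
                break  # no point sleeping after the final attempt
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out reconnects from many clients.
            sleep(delay + random.uniform(0, delay / 2))
    raise ConnectionError(
        f"could not reach {host}:{port} after {attempts} attempts"
    ) from last_error
```

A caller would use it as `conn = connect_with_retry("db.internal", 5432)`, where the host name and port are placeholders. Suitable values for `attempts` and `max_delay` depend on how long the dependency usually takes to restart, so treat the defaults here as starting points rather than recommendations.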