VMware Tanzu Application Service for VMs (TAS for VMs) deployments make up several layers of high availability to keep your apps running during system failure. These layers include AZs, app health management, process monitoring, and VM resurrection.
Ops Manager supports deploying apps instances across multiple AZs. This level of high availability requires that you define AZs in your IaaS. Ops Manager balances the apps you deploy across the AZs you defined. If an AZ goes down, you still have app instances running in another.
You can configure your deployment so that Diego Cells are created across these AZs in the Assign AZs and Networks pane of the TAS for VMs tile.
If you lose app instances for any reason, such as a bug in the app or an AZ going down, Ops Manager restarts new instances to maintain capacity. Under Diego architecture, the nsync, BBS, and Cell Rep components track the number of instances of each app that are running across all of the Diego cells. When these components detect a discrepancy between the actual state of the app instances in the cloud and the desired state as known by the Cloud Controller, they advise the Cloud Controller of the difference and the Cloud Controller initiates the deployment of new app instances.
For more information about the nsync, BBS, and Cell Rep components, see the nsync, BBS, and Cell Rep section of the TAS for VMs Components topic.
Ops Manager uses a BOSH agent, monit, to monitor the processes on the component VMs that work together to keep your apps running, such as nsync, BBS, and Cell Rep. If monit detects a failure, it restarts the process and notifies the BOSH agent on the VM. The BOSH agent notifies the BOSH Health Monitor, which triggers responders through plugins such as email notifications or paging.
BOSH detects if a VM is present by listening for heartbeat messages that are sent from the BOSH agent every 60 seconds. The BOSH Health Monitor listens for those heartbeats. When the Health Monitor finds that a VM is not responding, it passes an alert to the Resurrector component. If the Resurrector is enabled, it sends the IaaS a request to create a new VM instance to replace the one that failed.
To enable the Resurrector, see the following pages for your particular IaaS: AWS, Azure, GCP, OpenStack, or vSphere.