VMware Tanzu Application Service for VMs (TAS for VMs) deployments provide several layers of high availability to keep your apps running during system failure. These layers are availability zones (AZs), app health management, process monitoring, and VM resurrection.
Operations Manager supports deploying app instances across multiple AZs. This level of high availability requires that you define AZs in your IaaS. Operations Manager balances the apps you deploy across the AZs you defined. If an AZ goes down, you still have app instances running in another AZ.
In the Assign AZs and Networks pane of the TAS for VMs tile, you can configure your deployment so that Diego Cells are created across these AZs.
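The effect of spreading instances across AZs can be illustrated with a minimal sketch. It assumes a simple round-robin placement, which is a simplification for illustration and not the placement logic TAS for VMs actually uses. The function names and the AZ names `az1`, `az2`, and `az3` are hypothetical.

```python
from collections import Counter
from itertools import cycle


def spread_instances(instance_count, azs):
    """Assign each instance index to an AZ in round-robin order.

    Round-robin is an assumption made for this sketch; the real
    scheduler may weigh other factors.
    """
    if not azs:
        raise ValueError("at least one AZ is required")
    az_cycle = cycle(azs)
    return {index: next(az_cycle) for index in range(instance_count)}


def surviving_instances(placement, failed_az):
    """Return the instance indexes that keep running when one AZ fails."""
    return sorted(i for i, az in placement.items() if az != failed_az)


# Hypothetical example: five instances of one app across three AZs.
placement = spread_instances(5, ["az1", "az2", "az3"])
per_az = Counter(placement.values())
survivors = surviving_instances(placement, "az1")
```

In this sketch, losing `az1` removes two of the five instances, and the remaining three keep serving traffic from `az2` and `az3`.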
If you lose app instances for any reason, such as a bug in the app or an AZ going down, Operations Manager starts new instances to maintain capacity. Under Diego architecture, the nsync, BBS, and Cell Rep components track the number of instances of each app that are running across all of the Diego Cells. When these components detect a discrepancy between the actual state of the app instances in the cloud and the desired state as known by the Cloud Controller, they advise the Cloud Controller of the difference and the Cloud Controller initiates the deployment of new app instances.
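The comparison of desired state against actual state described above can be sketched as follows. This is an illustrative model only, not the code used by nsync, BBS, or Cell Rep. The `reconcile` function and the app names are hypothetical.

```python
def reconcile(desired, actual):
    """Compare desired and actual instance counts per app.

    Returns a dict of deltas: a positive value means instances must be
    started, a negative value means surplus instances must be stopped.
    Apps whose counts already match are omitted.
    """
    deltas = {}
    for app in sorted(set(desired) | set(actual)):
        diff = desired.get(app, 0) - actual.get(app, 0)
        if diff != 0:
            deltas[app] = diff
    return deltas


# Hypothetical example: two "web" instances were lost, and one
# instance of an app that is no longer desired is still running.
desired_state = {"web": 4, "worker": 2}
actual_state = {"web": 2, "worker": 2, "stale": 1}
deltas = reconcile(desired_state, actual_state)
```

Here `deltas` reports that two `web` instances need to be started and one `stale` instance needs to be stopped, while `worker` requires no action.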
For more information about the nsync, BBS, and Cell Rep components, see the nsync, BBS, and Cell Rep section of the TAS for VMs Components topic.
Operations Manager uses monit, a process monitor that works with the BOSH agent on each VM, to monitor the processes on the component VMs that work together to keep your apps running, such as nsync, BBS, and Cell Rep. If monit detects a failure, it restarts the process and notifies the BOSH agent on the VM. The BOSH agent notifies the BOSH Health Monitor, which triggers responders through plug-ins, such as email notifications or paging.
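The restart-and-notify chain above can be modeled with a short sketch. It is a conceptual illustration under stated assumptions, not monit configuration or BOSH Health Monitor code. The function names, the alert text, and the process names used in the example are hypothetical.

```python
def check_processes(process_status, restart, notify):
    """Restart every process that is not running and raise an alert.

    process_status maps a process name to True (running) or False.
    restart and notify are callables supplied by the caller.
    """
    restarted = []
    for name, running in sorted(process_status.items()):
        if not running:
            restart(name)
            notify(f"process {name} failed and was restarted")
            restarted.append(name)
    return restarted


# Hypothetical responders standing in for notification plug-ins.
alerts = []
responders = [
    lambda alert: alerts.append(("email", alert)),
    lambda alert: alerts.append(("pager", alert)),
]


def health_monitor(alert):
    """Fan an alert out to every configured responder."""
    for responder in responders:
        responder(alert)


restart_log = []
restarted = check_processes(
    {"bbs": True, "rep": False}, restart_log.append, health_monitor
)
```

In this sketch, only the failed `rep` process is restarted, and the single alert reaches both the email and pager responders.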
BOSH detects whether a VM is present by listening for heartbeat messages that the BOSH agent sends every 60 seconds. The BOSH Health Monitor listens for those heartbeats. When the Health Monitor finds that a VM is not responding, it passes an alert to the Resurrector component. If the Resurrector is enabled, it sends the IaaS a request to create a new VM instance to replace the one that failed.
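Heartbeat-based failure detection of this kind can be sketched as follows. The 60-second interval comes from the text above. The number of missed heartbeats tolerated before a VM counts as unresponsive (`missed_allowed`) is an assumption made for this sketch, as are the function names and VM names.

```python
HEARTBEAT_INTERVAL_SECONDS = 60  # interval stated in the text above


def find_unresponsive(last_heartbeat, now, missed_allowed=2):
    """Return VMs whose last heartbeat is older than the timeout.

    missed_allowed is an assumed tolerance, not a documented BOSH value.
    """
    timeout = HEARTBEAT_INTERVAL_SECONDS * missed_allowed
    return sorted(vm for vm, seen in last_heartbeat.items() if now - seen > timeout)


def resurrect(unresponsive, resurrector_enabled, create_vm):
    """Request a replacement for each failed VM if resurrection is enabled."""
    if not resurrector_enabled:
        return []
    return [create_vm(vm) for vm in unresponsive]


# Hypothetical example: timestamps in seconds, current time is 1000.
last_seen = {"diego_cell/0": 1000, "diego_cell/1": 870, "router/0": 990}
unresponsive = find_unresponsive(last_seen, now=1000)
requests = resurrect(unresponsive, True, lambda vm: f"recreate {vm}")
skipped = resurrect(unresponsive, False, lambda vm: f"recreate {vm}")
```

With the Resurrector turned off, the failure is still detected but no replacement is requested, which mirrors the conditional behavior described above.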
To enable the Resurrector, see the following pages for your particular IaaS: AWS, Azure, GCP, OpenStack, or vSphere.