During a TAS for VMs upgrade, BOSH drains all Diego Cell VMs that host app instances. BOSH manages this process by upgrading the cells one batch at a time.
When BOSH triggers an upgrade, each upgrading Diego Cell enters evacuation mode. In evacuation mode, BOSH stops the Diego Cell and then schedules replacements for its app instances.
For more information, see Guidance for Diego Cells in Configuring TAS for VMs for Upgrades.
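The batching described above can be illustrated with a small simulation. This is a minimal sketch, not BOSH's implementation: the `rolling_batches` helper, the cell names, and the batch size of 2 are hypothetical values chosen for illustration. It shows which cells remain available to host replacement app instances while each batch upgrades.

```python
from typing import Iterator, List, Tuple


def rolling_batches(cells: List[str], batch_size: int) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (upgrading, available) cell lists for each upgrade batch.

    `batch_size` is a hypothetical parameter standing in for however many
    cells the deployment upgrades at the same time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(cells), batch_size):
        upgrading = cells[start:start + batch_size]
        # Every cell outside the current batch can host evacuated instances.
        available = cells[:start] + cells[start + batch_size:]
        yield upgrading, available


# Hypothetical deployment with five Diego Cells, upgraded two at a time.
cells = [f"diego_cell/{i}" for i in range(5)]
for upgrading, available in rolling_batches(cells, batch_size=2):
    print(f"upgrading={upgrading} available={available}")
```

With a single cell, the only batch leaves no cells available, which matches the app downtime that a one-cell deployment experiences during an upgrade.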
The cf push command is mostly available for the duration of a TAS for VMs upgrade. However, cf push can become unavailable when a single VM is in use or during BOSH Backup and Restore (BBR).
For more information, see cf push availability during TAS for VMs upgrades.
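Because cf push can be briefly unavailable during an upgrade, deployment automation may want to retry a failed push instead of failing immediately. The following is a minimal sketch under stated assumptions: the `push_with_retry` helper, the attempt count, and the delay are hypothetical and are not part of the cf CLI. It assumes the cf CLI is installed, already logged in, and targeting the correct org and space.

```python
import subprocess
import time


def push_with_retry(app_name, attempts=5, delay_seconds=30.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run `cf push APP_NAME`, retrying on a non-zero exit code.

    Returns the number of the attempt that succeeded. The retry policy
    (attempts, delay_seconds) is an illustrative assumption, not a
    recommendation from the TAS for VMs documentation.
    """
    for attempt in range(1, attempts + 1):
        result = run(["cf", "push", app_name])
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait before retrying; the platform may be mid-upgrade.
            sleep(delay_seconds)
    raise RuntimeError(f"cf push {app_name} failed after {attempts} attempts")
```

The `run` and `sleep` parameters exist only so that the function can be exercised without a real cf CLI. In normal use, call `push_with_retry("my-app")`, where `my-app` is a placeholder app name.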
This section describes the order in which VMware Tanzu Operations Manager upgrades components and runs tasks during a full platform upgrade. It also explains how the scale of different TAS for VMs components affects uptime during upgrades, and which components are scalable.
When performing an upgrade, Tanzu Operations Manager first upgrades individual components, and then runs one-time tasks.
Components describes how Tanzu Operations Manager upgrades TAS for VMs components and explains how individual component upgrades affect broader TAS for VMs capabilities.
One-Time Tasks lists the tasks that Tanzu Operations Manager runs after it upgrades the TAS for VMs components.
Tanzu Operations Manager upgrades TAS for VMs components in a fixed order that honors component dependencies and minimizes downtime and other system limitations during the upgrade process.
The type and duration of downtime and other limitations that you can expect during a TAS for VMs upgrade reflect:
Component instance scaling. For more information, see How single-component scaling affects upgrades.
Component upgrade order. For more information, see Component upgrade order and behavior.
In Tanzu Operations Manager, the Resource Config pane of the TAS for VMs tile shows the components that the BOSH Director installs:
Scalable component fields let you select the instance count from a range of settings or enter a custom value.
Unscalable component fields allow a maximum of one instance.
When a component is scaled to a single instance, it can experience downtime and other limitations while the single VM restarts. This behavior might be acceptable for a test environment. To avoid downtime in a production environment, you must scale all scalable components, such as Router and Diego Cells, to more than one instance.
For more information about how the scale of each component affects upgrade behavior, see Component upgrade order and behavior.
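Before an upgrade, it can help to list which scalable components still run as a single instance. The following is a minimal sketch: the `single_instance_risks` helper and the instance counts are hypothetical, and the input must be populated by hand from the Resource Config pane. The sketch does not query Tanzu Operations Manager.

```python
def single_instance_risks(instance_counts, scalable):
    """Return the scalable components that run exactly one instance.

    instance_counts: mapping of component name to configured instance count.
    scalable: set of component names that can be scaled above one instance.
    """
    return sorted(
        name
        for name, count in instance_counts.items()
        if name in scalable and count == 1
    )


# Hypothetical values copied by hand from the Resource Config pane.
instance_counts = {"Router": 1, "Diego Cell": 3, "Backup Restore node": 1}
scalable = {"Router", "Diego Cell"}

for name in single_instance_risks(instance_counts, scalable):
    print(f"{name} is scaled to a single instance; expect downtime during upgrade")
```

Unscalable components are skipped deliberately, because their single instance cannot be changed.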
A full Tanzu Operations Manager upgrade can take close to two hours. You have limited ability to deploy an app during this time.
The following table lists components in the order that Tanzu Operations Manager upgrades them. It also indicates which components are scalable and explains how component downtime affects TAS for VMs app and control availability. The table includes these columns:
Scalable: Indicates whether the component is scalable above a single instance.
For components marked with a checkmark in this column, VMware recommends that you change the preconfigured instance value of 1 to a value that best supports your production environment. For more information about scaling a deployment, see High Availability in TAS for VMs.
Extended Downtime: Indicates that if there is only one instance of the component, that component is unavailable for up to five minutes during an upgrade.
Downtime Affects…: Indicates the plane of the TAS for VMs platform, either Apps or Platform, that component downtime affects if the component is scaled to a single instance.
Other Limitations and Information: Provides additional notes about how the component behaves during an upgrade.
Component | Scalable | Extended downtime | Downtime affects apps | Downtime affects platform | Other limitations and information |
---|---|---|---|---|---|
NATS | ✓ | ✓ | |||
File storage | ✓ | ✓ | |||
MySQL Proxy | ✓ | ✓ | ✓ | The MySQL Proxy is responsible for managing failover of the MySQL Servers. If the Proxy becomes unavailable, then access to the MySQL Server might be broken. | |
MySQL Server | ✓ | ✓ | ✓ | The MySQL Server is responsible for persisting internal databases for the platform. If the MySQL Server becomes unavailable, then platform services that rely upon a database, such as Cloud Controller and UAA, also become unavailable. | |
Backup Restore node | |||||
Diego BBS | ✓ | ✓ | ✓ | ||
UAA | ✓ | ✓ | If you have an active authorization token before performing an upgrade, you can still log in using either a UI or the CLI. | ||
Cloud Controller | ✓ | ✓ | ✓ | ||
Gorouter | ✓ | ✓ | ✓ | The Gorouter is responsible for routing requests to their app containers. If the Gorouter is not available, then apps cannot receive requests. | |
MySQL Monitor | |||||
Clock Global | ✓ | ||||
Cloud Controller Worker | ✓ | ✓ | |||
Diego Brain | ✓ | ✓ | ✓ | ||
Diego Cell | ✓ | ✓ | ✓ | ✓ | If you only have one Diego Cell, upgrading causes downtime for all apps that run on it. These include apps pushed with cf push and apps automatically installed by TAS for VMs, such as Apps Manager and the App Usage Service. |
Loggregator Traffic controller | ✓ | Operators experience 2-5 minute gaps in logging. | |||
Doppler Server | ✓ | Operators experience 2-5 minute gaps in logging. | |||
TCP Router (if enabled) | ✓ | ||||
CredHub | ✓ | ✓ | ✓ | ✓ | App downtime for apps that use secure credentials. Platform downtime for cf CLI commands such as bind-service and unbind-service applied to services configured with CredHub. |
Istio Control | ✓ | ✓ | |||
Istio Router | ✓ | ✓ | ✓ | Downtime for this component only affects routes on mesh domains. | |
Route Syncer | ✓ | ✓ | ✓ | Downtime for this component only affects routes on mesh domains. |
After Tanzu Operations Manager upgrades components, it performs system checks and launches UI apps and other TAS for VMs components as Cloud Foundry apps. These tasks run in this order:
1 | Apps Manager Errand - Push Apps Manager |
2 | Smoke Test Errand - Run smoke tests |
3 | Usage Service Errand - Push Usage Service app |
4 | Notifications Errand - Push Notifications app |
5 | Notifications UI Errand - Push Notifications UI |
6 | App Autoscaler Errand - Push App Autoscaler |
7 | App Autoscaler Smoke Test Errand - Run smoke tests against App Autoscaler |
8 | Register Autoscaling Service Broker |
9 | Destroy Autoscaling Service Broker |
10 | Bootstrap Errand - Recover MySQL cluster |
11 | MySQL Rejoin Unsafe Errand |