vSphere Lifecycle Manager

VMware vSphere Lifecycle Manager enables centralized and simplified lifecycle management for ESXi hosts using images and baselines. Lifecycle management refers to the process of installing software, maintaining it through updates/upgrades, and decommissioning it. In the context of maintaining a vSphere environment, the clusters and hosts in particular, lifecycle management refers to tasks such as installing ESXi and firmware on new hosts and updating or upgrading the ESXi version and firmware when required.



  • vSphere Lifecycle Manager uses a desired-state model for all lifecycle operations (see the sketch after this list).

    • Monitors compliance drift.

    • Remediates back to the desired state.

  • It is built to manage hosts at the cluster level, covering the full stack:

    • Hypervisor

    • Drivers

    • Firmware

  • Modular framework supports vendor firmware plugins.
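
The desired-state workflow can be illustrated with a minimal, purely conceptual sketch. The image specification and host inventory below are hypothetical placeholders; vSphere Lifecycle Manager performs these checks itself through vCenter Server.

```python
# Conceptual sketch of a desired-state check: compare each host's reported
# software versions against the cluster's desired image and flag drift.
# The data structures are placeholders; vLCM performs this internally.

DESIRED_IMAGE = {
    "esxi": "8.0 U2",
    "vendor-addon": "DEL-A04",
    "nic-driver": "2.5.1",
}

HOSTS = {
    "esx01.lab.local": {"esxi": "8.0 U2", "vendor-addon": "DEL-A04", "nic-driver": "2.5.1"},
    "esx02.lab.local": {"esxi": "8.0 U1", "vendor-addon": "DEL-A03", "nic-driver": "2.5.1"},
}

def find_drift(desired: dict, hosts: dict) -> dict:
    """Return the components on each host that do not match the desired image."""
    drift = {}
    for host, installed in hosts.items():
        mismatches = {
            component: (installed.get(component), wanted)
            for component, wanted in desired.items()
            if installed.get(component) != wanted
        }
        if mismatches:
            drift[host] = mismatches  # candidate for remediation
    return drift

if __name__ == "__main__":
    for host, mismatches in find_drift(DESIRED_IMAGE, HOSTS).items():
        print(f"{host} is non-compliant: {mismatches}")
```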

vSphere Lifecycle Manager Overview:

vSphere Lifecycle Manager is a service that runs in vCenter Server and uses the embedded vCenter Server PostgreSQL database. No additional installation is required to start using the feature. Upon deploying the vCenter Server appliance, the vSphere Lifecycle Manager user interface becomes automatically enabled in the HTML-based vSphere Client.

Baselines and baseline groups or vSphere Lifecycle Manager images can be leveraged for host patching and host upgrade operations. VM hardware and VMware Tools versioning for VMs are also included.

vSphere Lifecycle Manager can work in an environment that has access to the Internet, directly or through a proxy server. It can also work in a secured network without access to the Internet. In such cases, the Update Manager Download Service (UMDS) can be used to download updates to the vSphere Lifecycle Manager depot, or updates can be imported manually.

vSphere Lifecycle Manager Operations:

The basic Lifecycle Manager operations are related to maintaining an environment that is up-to-date and ensuring smooth and successful updates and upgrades of the ESXi hosts.

  • Compliance Check: An operation that scans ESXi hosts to determine their level of compliance with a baseline attached to the cluster or with the image that the cluster uses. The compliance check does not alter the object (see the sketch after this list).

  • Remediation Pre-Check: An operation performed before remediation to verify the health of a cluster and help ensure that no issues occur during the remediation process.

  • Remediation: An operation that applies software updates to the ESXi hosts in a cluster. During remediation, software is installed on the hosts. Remediation makes a non-compliant host compliant with the baselines attached to the cluster or with the image for the cluster.

  • Staging: An operation that is available only for clusters managed with baselines or baseline groups. When patches or extensions are staged to an ESXi host, the VIBs are downloaded to the host without being applied immediately. Staging makes the patches and extensions available locally on the hosts.
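
As a rough illustration of how a compliance check could be requested programmatically, the sketch below uses the vSphere Automation REST API from Python. The session and cluster software endpoint paths are assumptions based on vSphere 7.0+ API documentation, and the vCenter address, credentials, and cluster ID are placeholders; verify the exact paths against your vCenter version before relying on them.

```python
# Hedged sketch: trigger and read a vLCM image compliance check over the
# vSphere Automation REST API. Endpoint paths are assumptions for vSphere 7+;
# the vCenter address, credentials, and cluster ID below are placeholders.
import requests

VCENTER = "vcenter.lab.local"   # placeholder
CLUSTER_ID = "domain-c8"        # placeholder managed object ID
VERIFY_TLS = False              # lab only; use proper CA verification in production

# Create an API session (basic auth returns a session token).
session = requests.post(
    f"https://{VCENTER}/api/session",
    auth=("administrator@vsphere.local", "password"),  # placeholder credentials
    verify=VERIFY_TLS,
)
headers = {"vmware-api-session-id": session.json()}

# Ask vLCM to run a compliance check against the cluster's image (assumed endpoint).
requests.post(
    f"https://{VCENTER}/api/esx/settings/clusters/{CLUSTER_ID}/software"
    "?action=check-compliance&vmw-task=true",
    headers=headers,
    verify=VERIFY_TLS,
)

# Read the most recent compliance result (assumed endpoint).
compliance = requests.get(
    f"https://{VCENTER}/api/esx/settings/clusters/{CLUSTER_ID}/software/compliance",
    headers=headers,
    verify=VERIFY_TLS,
)
print(compliance.json())
```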

vSphere Lifecycle Manager Depot:

Several components make up Lifecycle Manager and work together to deliver the functionality and coordinate the major lifecycle management operations that it provides. The depot is an important component in the architecture because it contains all software updates that are used to create baselines and images. Lifecycle Manager can only be used if the depot is populated with components, add-ons, base images, and legacy bulletins and patches.

Secure Hashing and Signature Verification in vSphere Lifecycle Manager:

vCenter Server performs an automatic hash check on all software that vSphere Lifecycle Manager downloads from online depots or from a UMDS-created depot. Similarly, it performs an automatic checksum verification on all software that is manually imported into the depot. The hash check verifies the SHA-256 checksum of the downloaded software to ensure its integrity. During remediation, before vSphere Lifecycle Manager installs any software on a host, the ESXi host checks the signature of the installable units to verify that they were not corrupted or altered during download.

When an ISO image is imported into the vSphere Lifecycle Manager depot, vCenter Server performs an MD5 hash check on the ISO image to validate its MD5 checksum. During remediation, before the ISO image is installed, the ESXi host verifies the signature inside the image. If an ESXi host is configured with UEFI Secure Boot, it also performs full signature verification of each package installed on the host every time the host boots.
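
Conceptually, the integrity check amounts to computing a digest of the downloaded file and comparing it against the published value. A minimal sketch follows; the file path and expected checksum are placeholders, and vCenter Server performs this verification automatically.

```python
# Minimal sketch of a SHA-256 integrity check, equivalent in spirit to what
# vCenter Server does automatically for depot downloads and manual imports.
# The file path and expected digest are placeholders.
import hashlib

def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file so large offline bundles do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "0" * 64  # replace with the published SHA-256 checksum (placeholder)
actual = sha256_of("VMware-ESXi-offline-bundle.zip")  # placeholder filename

if actual != expected:
    raise SystemExit("Checksum mismatch: do not import this bundle")
print("Checksum verified")
```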

vSphere Lifecycle Manager vSAN Integration:

vSAN 7 (U3 and greater) provides support for managing the lifecycle of the vSAN witness host appliance for vSAN stretched cluster and 2-node topologies. Once a stretched cluster or 2-node cluster that meets the criteria is managed by Lifecycle Manager, the witness host appliance is also managed. Hosts and witness appliances are updated in the recommended order to maintain availability.

General Tips for Upgrades/Patching vSphere

Beyond the act of using vSphere Lifecycle Manager, these are some best practices and tips for ensuring success during patching and upgrades. These include technical concepts, as well as people and process improvements.

  • Patching vCenter Server doesn't impact workloads, and vMotion can move workloads seamlessly (infrastructure permitting) so that ESXi can be patched.

  • Ensure the vCenter Server Appliance (VCSA) root and administrator@vsphere.local account passwords are stored correctly and are not locked out. By default, the VCSA root account password expires after 90 days. Prior to patching, verify that these accounts work correctly, recover the passwords if needed (which may require a restart of vCenter Server), and change them after patching/upgrading.

  • Ensure that time settings are correct on the appliance. Many issues on systems can be traced to incorrect time synchronization.

  • Ensure that vCenter Server’s file-based backup and restore is configured and generating scheduled output. This can be configured through the Virtual Appliance Management Interface (VAMI) on port 5480/TCP on the VCSA.

  • Take a snapshot of the VCSA prior to the update, preferably from the ESXi host client after the VCSA has been shut down gracefully and cleanly. Snapshots have performance impacts, so ensure the snapshot is deleted once the upgrade is verified (see the pyVmomi sketch after this list).

  • If it has been many months since a system was last restarted, restart it as-is before patching and let it return to good health first. Otherwise, pre-existing problems become more difficult to separate from issues caused by the update.

  • If vSphere HA has been configured with custom isolation addresses (for example, das.isolationaddress), ensure they are not set to the same address as vCenter Server, or an HA failover could be triggered.

  • Where possible, minimize the number of plugins installed in vCenter Server. Modern zero-trust security architecture practices discourage connecting systems in these ways, to make life harder for attackers. Fewer items installed means fewer compatibility checks are necessary.

  • Minimize additional installed VIBs on ESXi, and use "stock" VMware ESXi versions instead of OEM customized ones, whenever possible. This helps avoid issues with VIB version conflicts that can arise from vendor packages. vSphere Lifecycle Manager makes it easy to add OEM driver packages and additional software.

  • Use Distributed Resource Scheduler (DRS) groups and affinity rules to keep vCenter Server on a particular ESXi host. Then, when issues arise, the VCSA can be found easily using the ESXi host client. Ensure that a management workstation can reach the host client interface on that ESXi host.

  • Don't forget about Platform Services Controllers (PSCs). They are considered part of vCenter Server and all PSCs that replicate together should be updated before patching vCenter Server. Ensure that NTP, DNS, and all the other considerations above are checked and valid for the PSCs, too.

  • vCenter Server should always be updated before ESXi, so the overall order for vSphere is: PSCs, then vCenter Servers, then ESXi hosts.

  • After updates, clear browser cache to ensure that the latest vSphere Client components download properly.
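
For the snapshot tip above, the pre-patch snapshot of the VCSA can be scripted with pyVmomi. This is a minimal sketch, not official tooling: the ESXi host name, credentials, and VM name are placeholders, and it assumes the VCSA has already been shut down cleanly as recommended.

```python
# Hedged sketch: take a pre-patch snapshot of the VCSA with pyVmomi, connecting
# directly to the ESXi host that runs it. Host name, credentials, and VM name
# are placeholders. Shut the VCSA down cleanly first, as noted above.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()  # lab only; verify certificates in production
si = SmartConnect(host="esx01.lab.local", user="root", pwd="password", sslContext=context)

try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True
    )
    vcsa = next(vm for vm in view.view if vm.name == "vcsa01")  # placeholder VM name
    view.Destroy()

    # memory=False because the VCSA has been shut down; quiescing is not applicable.
    task = vcsa.CreateSnapshot_Task(
        name="pre-update",
        description="Snapshot before vCenter Server patching",
        memory=False,
        quiesce=False,
    )
    # In a real script, wait for task completion before proceeding with the update.
    print("Snapshot task submitted:", task.info.key)
finally:
    Disconnect(si)
```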

Introduction to VCF Integration

VCF provides a fully integrated cloud platform, built on software-defined services for compute, storage, networking, security, and cloud management. Products included within Edge Compute Stack packaging align with portions of VCF.

It also provides the following capabilities:

  • Automated deployment and configuration of the VCF components

  • Lifecycle management

  • Support for VMs and modern apps

  • A path to the hybrid cloud

VCF can be consumed in both private and public environments. Private clouds are the typical construct for power utilities on the operational technology side of their infrastructure. VCF makes operations fundamentally simpler by deploying a standardized and validated architecture with built-in lifecycle automation for the entire stack. It includes intrinsic security built into every level, from micro-segmentation at the networking layer to encryption at the storage layer.



Upgrades/Patching with VCF

VCF can manage the lifecycle of all the software it deploys, including vSphere Lifecycle Manager and Aria Suite Lifecycle components. This feature makes patching a full SDDC relatively easy. Reducing the complexity of patching operations and the resources required to perform them empowers businesses to update frequently. In turn, the environment becomes more stable and more secure through the application of patches.

VCF allows upgrading from a current version to a target version, with the assurance that dependency mapping has been certified by our engineering teams and the upgrade path has been validated and is released in the form of a bundle. A bundle is a release mechanism used for VCF that includes the update package, descriptor file, and a checksum file. The descriptor file is where dependency mapping happens to ensure that the validated upgrade path is ordered properly.
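
To illustrate what dependency-aware ordering means in practice, the sketch below topologically sorts components by their declared dependencies. The descriptor structure shown is purely hypothetical and does not reflect SDDC Manager's actual bundle descriptor format.

```python
# Hypothetical illustration of dependency-aware upgrade ordering. The
# "descriptor" mapping below is invented for this example and is not the
# real SDDC Manager bundle descriptor format.
from graphlib import TopologicalSorter  # Python 3.9+

# component -> components that must be upgraded before it
descriptor = {
    "sddc-manager": [],
    "nsx": ["sddc-manager"],
    "vcenter": ["sddc-manager", "nsx"],
    "esxi": ["vcenter"],
}

order = list(TopologicalSorter(descriptor).static_order())
print("Upgrade order:", " -> ".join(order))
# Upgrade order: sddc-manager -> nsx -> vcenter -> esxi
```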

VMware Update Bundles are one of the most common types of bundles that are downloaded in a VCF environment. These are the bundles that provide an upgrade from version x to version y, or they may be a security patch (more on that soon). These bundles are released on a regular basis and apply to VCF (SDDC Manager, VCF services, and drift remediation), vSphere (vCenter Server and ESXi), NSX, and Aria Suite.

VCF Release versions are categorized as Major, Minor, Maintenance, and Patch. A major release is something like 4.0, a minor release is like 4.1, a maintenance release is like 4.0.1, and a patch release is something like 4.0.0.1.
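
Based on that convention, a version string can be classified mechanically. The helper below is written only to illustrate the format described above.

```python
# Classify a VCF version string using the Major/Minor/Maintenance/Patch
# convention described above (illustrative helper, not a VMware tool).
def classify_release(version: str) -> str:
    parts = [int(p) for p in version.split(".")]
    if len(parts) == 2:
        return "major" if parts[1] == 0 else "minor"  # 4.0 / 4.1
    if len(parts) == 3:
        return "maintenance"                          # 4.0.1
    if len(parts) == 4:
        return "patch"                                # 4.0.0.1
    raise ValueError(f"Unexpected version format: {version}")

for v in ("4.0", "4.1", "4.0.1", "4.0.0.1"):
    print(v, "->", classify_release(v))
```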

Third-party updates are the least common and are usually incorporated into the SDDC Manager upgrade itself.

Security updates are also a subset of the Update Bundles; however, they are not differentiated into a separate category within SDDC Manager. The Engineering, Architecture, and Product Management teams keep a pulse on any security updates affecting the core set of software delivered in VCF. When a critical patch is released for a component of VCF, it provides assurance that cross-functional checks were performed to confirm full-stack compatibility.



VCF security patches are then released with the following priority rankings:

  • CVSS score greater than 9: within 1 week of when the patch is released

  • CVSS score greater than 7: within 6 weeks of when the patch is released

  • CVSS score less than 7: within 3 months of when the patch is released, ingested into the next major/minor VCF release

CVSS scores ‘Greater than 9’ receive a commitment of patch delivery as quickly as possible.
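
These targets can be expressed as a simple lookup. A minimal sketch of the policy above (the thresholds and windows come straight from the table):

```python
# Map a CVSS score to the VCF security-patch delivery target described above.
def patch_target(cvss: float) -> str:
    if cvss > 9:
        return "within 1 week of patch release"
    if cvss > 7:
        return "within 6 weeks of patch release"
    return "within 3 months of patch release (rolled into the next major/minor VCF release)"

for score in (9.8, 8.1, 5.5):
    print(score, "->", patch_target(score))
```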

Upgrading Modern Applications

Kubernetes updates are released on a regular cadence and at a relatively rapid pace compared to vSphere releases. The versioning follows a Major.Minor.Patch format, with Minor releases occurring every three months. Minor versions cannot be skipped when upgrading, and downgrades are not possible.
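
Because Minor versions cannot be skipped, an upgrade path must step through each intermediate Minor release. A small sketch of that rule (the version strings are illustrative):

```python
# Check whether a Kubernetes upgrade path violates the "no skipped minor
# versions, no downgrades" rule. Version strings are illustrative.
def minor(version: str) -> tuple[int, int]:
    major, minor_, *_ = (int(p) for p in version.lstrip("v").split("."))
    return major, minor_

def valid_upgrade(current: str, target: str) -> bool:
    cur_major, cur_minor = minor(current)
    tgt_major, tgt_minor = minor(target)
    if cur_major != tgt_major:
        return False  # cross-major jumps are out of scope here
    return 0 <= (tgt_minor - cur_minor) <= 1  # no downgrade, no skipped minor

print(valid_upgrade("v1.25.7", "v1.26.3"))  # True: one minor step
print(valid_upgrade("v1.24.9", "v1.26.3"))  # False: skips 1.25
print(valid_upgrade("v1.26.3", "v1.25.7"))  # False: downgrade
```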

There are a pair of common methods for upgrading Kubernetes clusters:

  1. In-place method using existing installations and machines

    • Difficult to mitigate problems, should they arise

  2. Forklift or rolling upgrade method to create new clusters to migrate workloads to

    • Promotes multi-cluster deployments, and validation can occur prior to placing production workloads on the new clusters/software version.

The upgrade order remains the same: control plane nodes/clusters are upgraded first. If performing this within vSphere using the rolling upgrade method, a new VM is created from the subscribed content library. Once this VM has been configured and added to the Kubernetes cluster, one of the control plane nodes running the older version is marked with the SchedulingDisabled status and then removed from the Kubernetes cluster. After it is removed from Kubernetes, the VM is deleted from vSphere to complete the lifecycle. This process repeats until all control plane nodes have been upgraded.

After the control plane has been upgraded, the worker nodes are next. As before, a new VM is created in vSphere and added to the Kubernetes cluster. Before a worker node is removed from the Kubernetes cluster in a rolling upgrade, it is tainted so that no new workloads can be scheduled on it. After the taint has been added, the worker node is flushed: any running containers are stopped and evicted, and Kubernetes is responsible for redeploying those containers to other nodes in the cluster to achieve the desired state for the application deployment. Once the node has been flushed, it is removed from the Kubernetes cluster and the VM is deleted from vSphere. This process again repeats until all worker nodes have been upgraded.

Outside of vSphere, Kubernetes has built-in features called cordon and drain, which can be used in a process similar to the one described above to upgrade worker nodes/clusters. The ‘cordon’ feature is analogous to vSphere’s maintenance mode, stopping any new workloads from being scheduled on the node. The node can then be ‘drained’, instructing its containers to terminate so the node can be removed. Care should be taken that replicas are in place prior to workload removal to prevent any undesired outages.
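
A rolling replacement of a worker node typically boils down to the kubectl commands below, wrapped here in a short Python sketch. The node name is a placeholder, and the drain flags should be reviewed against your own workloads (for example, where pods use emptyDir volumes).

```python
# Sketch of the cordon/drain/remove sequence for a worker node using kubectl.
# The node name is a placeholder; review drain flags for your own workloads.
import subprocess

NODE = "worker-node-01"  # placeholder

def kubectl(*args: str) -> None:
    subprocess.run(["kubectl", *args], check=True)

# Stop new workloads from being scheduled on the node (marks it SchedulingDisabled).
kubectl("cordon", NODE)

# Evict running pods; their controllers reschedule them onto the remaining nodes.
kubectl("drain", NODE, "--ignore-daemonsets", "--delete-emptydir-data")

# Remove the node from the cluster; the underlying VM can then be deleted.
kubectl("delete", "node", NODE)
```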

Overall, for Tanzu implementations in vSphere, vCenter needs updating at least once every 9 to 12 months to remain on a supported Kubernetes version. This only requires an update to vCenter, though (not the ESXi hosts separately): vSphere with Tanzu is packaged with vCenter, and the bits required to update the hosts are packaged within it as well. The cluster updates do not require a reboot of the ESXi hosts afterward.