
VMware Cloud Foundation 3.8 | 18 JULY 2019 | Build 14172583

VMware Cloud Foundation is a unified SDDC platform that brings together VMware vSphere, vSAN, NSX, and optionally vRealize Suite components into a natively integrated stack to deliver enterprise-ready cloud infrastructure for the private and public cloud. The Cloud Foundation 3.8 release continues to expand on SDDC automation, the VMware SDDC stack, and the partner ecosystem.

NOTE: VMware Cloud Foundation 3.8 must be installed as a new deployment or upgraded from VMware Cloud Foundation 3.7.2. For more information, see Installation and Upgrade Information below.

What's in the Release Notes

The release notes cover the following topics:

  • What's New
  • Cloud Foundation Bill of Materials (BOM)
  • VMware Software Edition License Information
  • Supported Hardware
  • Documentation
  • Browser Compatibility and Screen Resolutions
  • Installation and Upgrade Information
  • Resolved Issues
  • Known Issues

What's New

The VMware Cloud Foundation 3.8 release includes the following:

  • Support for Automated Upgrades of vRealize Log Insight, vRealize Operations Manager, and vRealize Automation through vRealize Lifecycle Manager: Enables automated upgrade support for vRealize Log Insight, vRealize Operations Manager, and vRealize Automation components managed by VMware Cloud Foundation.
  • Automated vRealize Operations Manager Cluster Expansion in SDDC Manager: Provides the ability to scale out the vRealize Operations Manager analytics cluster to handle growing resource needs.
  • Support for Automated Upgrade for NSX-T: Adds support in SDDC Manager for the automated patching and upgrading of the NSX-T components deployed by VMware Cloud Foundation.
  • Public APIs for VMware Cloud Foundation: Adds public APIs for creating and managing workload domains. Users can now invoke the VMware Cloud Foundation public APIs to create, delete, and retrieve properties of workload domains, clusters, network pools, hosts, license keys, and tasks (see the example after this list).
  • SSO Management Domain Convergence: Provides the ability to link the SSO domains (PSCs) of two or more VMware Cloud Foundation instances so that the management and VI workload domains are visible in each instance.
  • BOM Updates for the 3.8 Release: Provides BOM updates for various VMware Cloud Foundation components, including ESXi, vCenter Server, vSAN, NSX for vSphere, NSX-T, and the vRealize Suite.
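
As an illustration of the new public APIs, the following is a minimal sketch that queries the SDDC Manager API for workload domains and network pools. The endpoint paths (/v1/domains, /v1/network-pools), the host name, and the use of basic authentication are assumptions; adjust them to your environment and to the API reference for this release.

  # Hedged example: query the VMware Cloud Foundation public API on SDDC Manager.
  # The host name and credentials are placeholders; the endpoint paths are assumed.
  SDDC_MANAGER=sddc-manager.rainpole.local

  # List workload domains and their properties (assumed endpoint).
  curl -sk -u admin:'<password>' https://$SDDC_MANAGER/v1/domains

  # List network pools (assumed endpoint).
  curl -sk -u admin:'<password>' https://$SDDC_MANAGER/v1/network-pools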

Cloud Foundation Bill of Materials (BOM)

The Cloud Foundation software product comprises the following software bill of materials (BOM). The components in the BOM are interoperable and compatible.

Software Component | Version | Date | Build Number
Cloud Builder VM | 2.1.0.0 | 18 JUL 2019 | 14172583
SDDC Manager | 3.8.0 | 18 JUL 2019 | 14172583
VMware vCenter Server Appliance | vCenter Server 6.7 Update 2c | 16 JUL 2019 | 14070457
VMware vSphere (ESXi) | ESXi670-201906002 | 20 JUN 2019 | 13981272
VMware vSAN | 6.7 Express Patch 10 | 20 JUN 2019 | 13805960
VMware NSX Data Center for vSphere | 6.4.5 | 18 APRIL 2019 | 13282012
VMware NSX-T Data Center | 2.4.1 | 21 MAY 2019 | 13716575
VMware vRealize Suite Lifecycle Manager | 2.1 Patch 1 | 06 JUN 2019 | 13685821
VMware vRealize Log Insight | 4.8 | 11 APR 2019 | 13036238
vRealize Log Insight Content Pack for NSX for vSphere | 3.8 | n/a | n/a
vRealize Log Insight Content Pack for Linux | 1.0 | n/a | n/a
vRealize Log Insight Content Pack for vRealize Automation 7.3+ | 2.1 | n/a | n/a
vRealize Log Insight Content Pack for vRealize Orchestrator 7.0.1+ | 2.0 | n/a | n/a
vRealize Log Insight Content Pack for NSX-T | 3.2 | n/a | n/a
vSAN Content Pack for Log Insight | 2.0 | n/a | n/a
vRealize Operations Manager | 7.5 | 11 APR 2019 | 13165949
vRealize Automation | 7.6 | 11 APR 2019 | 13027280
Horizon 7 | 7.7.0 | 13 DEC 2018 | 11038474

Note: 

  • vRealize Log Insight Content Pack for vRealize Automation is installed during the deployment of vRealize Automation.
  • vRealize Log Insight Content Pack for vRealize Orchestrator is installed during the deployment of vRealize Automation.
  • vRealize Log Insight Content Pack for NSX-T is installed alongside the deployment of the first NSX-T workload domain.
  • VMware Solution Exchange and the vRealize Log Insight in-product marketplace store only the latest versions of the content packs for vRealize Log Insight. The software components table contains the latest versions of the packs that were available at the time VMware Cloud Foundation was released. When you deploy the VMware Cloud Foundation components, it is possible that the version of a content pack within the in-product marketplace for vRealize Log Insight is newer than the one used for this release.

VMware Software Edition License Information

The SDDC Manager software is licensed under the Cloud Foundation license. As part of this product, the SDDC Manager software deploys specific VMware software products.

The following VMware software components deployed by SDDC Manager are licensed under the Cloud Foundation license:

  • VMware ESXi
  • VMware vSAN
  • VMware NSX Data Center for vSphere

The following VMware software components deployed by SDDC Manager are licensed separately:

  • VMware vCenter Server
    NOTE Only one vCenter Server license is required for all vCenter Servers deployed in a Cloud Foundation system.
  • VMware vRealize Automation
  • VMware vRealize Operations
  • VMware vRealize Log Insight and content packs
    NOTE Cloud Foundation permits limited use of vRealize Log Insight for the management domain without purchasing full vRealize Log Insight licenses.

For details about the specific VMware software editions that are licensed under the licenses you have purchased, see the Cloud Foundation Bill of Materials (BOM) section above.

For more general information, see VMware Cloud Foundation.

Supported Hardware

For details on vSAN Ready Nodes in Cloud Foundation, see VMware Compatibility Guide (VCG) for vSAN and the Hardware Requirements section in the VMware Cloud Foundation Planning and Preparation Guide.

Documentation

To access the Cloud Foundation 3.8 documentation, go to the VMware Cloud Foundation product documentation.

To access the documentation for the VMware software products that SDDC Manager can deploy, see the product documentation and use the drop-down menus on the page to choose the appropriate version.

Browser Compatibility and Screen Resolutions

The Cloud Foundation web-based interface supports the following web browsers:

  • Google Chrome: Version 75.x or 74.x
  • Internet Explorer: Version 11
  • Mozilla Firefox: Version 67.x or 66.x

For the Web-based user interfaces, the supported standard resolution is 1024 by 768 pixels. For best results, use a screen resolution within these tested resolutions:

  • 1024 by 768 pixels (standard)
  • 1366 by 768 pixels
  • 1280 by 1024 pixels
  • 1680 by 1050 pixels

Resolutions below 1024 by 768, such as 640 by 960 or 480 by 800, are not supported.

Installation and Upgrade Information

You can install Cloud Foundation 3.8 as a new release or upgrade from VMware Cloud Foundation 3.7.2.

In addition to the release notes, see the VMware Cloud Foundation Upgrade Guide for information about the upgrade process.

Installing as a New Release

The new installation process has three phases:

Phase One: Prepare the Environment

The VMware Cloud Foundation Planning and Preparation Guide provides detailed information about the software, tools, and external services that are required to implement a Software-Defined Data Center (SDDC) with VMware Cloud Foundation, using a standard architecture model.

Phase Two: Image all servers with ESXi

Image all servers with ESXi 6.7 U2 and update to ESXi 6.7 EP 10 (Build 13644319). See Knowledge Base article 58715, "Virtual Machines running on VMware vSAN 6.6 and later report guest data consistency concerns following a disk extend operation," for details. See also the VMware Cloud Foundation Architecture and Deployment Guide for information on installing ESXi.

Phase Three: Install Cloud Foundation 3.8

Refer to the VMware Cloud Foundation Architecture and Deployment Guide for information on deploying Cloud Foundation.

Upgrade to Cloud Foundation 3.8

You can upgrade to Cloud Foundation 3.8 only from 3.7.2. If you are at a version earlier than 3.7.2, refer to the 3.7.2 Release Notes for information on how to upgrade from the prior releases.

For information on upgrading to 3.8, refer to the VMware Cloud Foundation Upgrade Guide.


Resolved Issues

  • SSL Certificate Replacement for vCenter breaks vRealize Operations data collection

    After replacing the certificate for the vCenter Server component, both the vCenter and vSAN components in vRealize Operations Manager report a "Collection failed" error message. Testing the connection and attempting to accept the new certificate returns additional error messages: Unable to establish a valid connection to the target system. Adapter instance has been configured to trust multiple certificates, when only one is allowed. Please remove any old, unneeded certificates and try again.

    Workaround: If you encounter this issue, use the following procedure to resolve the situation.

    1. Delete the current vCenter and vSAN adapters.
    2. Re-create them using the same configuration and credentials originally set by vRealize Operations.
    3. Test the connection, accept the new certificates, and save the configuration.
  • The bring-up and the Virtual Infrastructure workload domain workflows fail at VM deployments if any hosts are in maintenance mode

    No pre-check verifies whether hosts are in maintenance mode. As a result, NSX Controller deployments fail, which is expected because the default vSAN policy requires a minimum of three ESXi nodes to be available for deployment.

    Workaround: If you encounter this error, do the following:

    1. Through either VMware vCenter or the esxcli utility, take the affected hosts out of maintenance mode:
      esxcli system maintenanceMode set -e 0
    2. Restart the failed workflow.
  • Unable to cancel JSON Spec Validations.

    This is observed during the bring-up process when there is an error in the JSON file. You cannot cancel the JSON validation; if there is no connectivity to the ESXi hosts, the validation might take up to five minutes to fail.

    Workaround: There is no workaround to enable the desired cancellation. However, if this occurs, after the validation fails, review the JSON for syntax or other errors. Correct these errors and try again.

  • The Platform Services Controller (PSC) and vCenter Server update fails if ping to the NTP servers is disabled.

    The vCenter Server upgrade cannot verify the NTP time with the server.

    Workaround: Use an NTP server that does not require authentication and ensure that it responds correctly on the standard NTP port.

  • You cannot log in to the vRealize Operations Manager UI after the API password rotation.

    After an API password rotation for vRealize Operations Manager, vRealize Log Insight continues to connect with the old credentials, and repeated failed attempts lock the account.

    Workaround:

    1. Whenever you update or rotate a password for vRealize Operations Manager, manually log in to vRealize Log Insight and navigate to Administration > vRealize Operations > Update Password.
    2. If step 1 is skipped, vRealize Log Insight tries to connect with the old credentials, and after several failed attempts the account is locked. To unlock the vRealize Operations Manager UI login, update the password in vRealize Log Insight.
  • The workflow fails during the vCenter Server deployment if you provide an IP address that is already in use.

    Duplicate IP addresses are detected only in later stages of the workflow.

    Workaround: Ensure that the IP addresses you provide are unused. If a duplicate IP address was supplied, modify the workflow input so that the task can run on a retry.

  • During VMware Cloud Foundation bring-up, the transport zone replication mode is set to Hybrid but no multicast address is set.

    NSX deployment requires the transport zone replication mode to be set to Hybrid with a proper multicast address range configured. In this case, no multicast address is set.

    Workaround:

    1. From the vSphere client, launch Networking and Security.
    2. Click Installation and Upgrade and navigate to Logical Network Settings.
    3. Click Edit for the Segment IDs.
    4. Turn on Multicast addressing.
    5. For Multicast addresses, enter the range 239.1.0.0-239.1.255.255.
    6. Ensure that IGMP snooping is enabled on the ToR physical switch and that an IGMP querier is available.
  • The Operations Manager upgrade fails with a timeout

    This issue is seen in the following scenario:

    1. The Operations Manager is restarted.
    2. At startup, the Resource Aggregator inside the Operations Manager makes API calls to NSX Manager to build its cache.
    3. NSX Manager does not respond to the API calls because it is in a bad state or its services are down.
    4. The Operations Manager restart hangs.

    Workaround: Restart NSX Manager.

  • NSX-T workload domain deployment fails if the host names have upper case characters

    Deploying an NSX-T workload domain using hosts whose host names contain uppercase characters fails with the following error message:

    Message: Unable to get input for VDS to N-VDS migration

    Remediation Message: Reference Token: 602SME Cause: Type: java.lang.IllegalArgumentException Message: Unable to find transport node with name <host name>

     

    Workaround:

    1. To continue from the failed task -
           - Log in to NSX-T Manager and edit the transport node name to lower case.
           - Update the DNS record and the host name of the hosts.
           - Retry the failed task.
    2. To deploy a new domain, use lower case characters for the host names.

  • Stretch cluster fails when the vSAN policy is reapplied

    This issue is observed on setups where the vSAN policy fails to apply to the NSX Edge VMs.

    Workaround: Disable the reconfigure method for such VMs and restart the workflow. Once the workflow completes, re-enable the reconfigure method.

  • Unable to configure the fabric compute managers during the NSX-T workload creation

    A POST to https://10.0.0.50/api/v1/fabric/compute-managers fails with a socket timeout when called through the NSX-T SDK client.

    Workaround: Retry the workflow.

  • The "Failed to deploy NSX-T Controller" error message displays when NSX-T workload domain is created

    In one of the tasks of the NSX-T deployment, the task name says "Deploy additional NSX-T Managers" but the progress message says "Failed to deploy NSX-T Controller". There are no controllers in the current release of NSX-T.

    Workaround: None

  • The stretch cluster specification validation fails

    This issue occurs when the hosts provided for the stretch cluster expansion are from different network pools.

    Workaround: During the expansion of stretch cluster, provide hosts that are a part of the same network pool.

  • The certificate rotate operation fails during the installation phase

    The certificate replacement fails intermittently. The Apply certificate API of NSX-T intermittently returns an HTTP 500 error code during the certificate replacement.

    Workaround: Retry the certificate replacement operation using the same or a different CA.

  • The NSX-T Install Bundle generated as part of vcf-lcm-packages should consume the vRealize Log Insight Content Pack from the artifactory

    In the vRealize Log Insight UI, under Content Packs for NSX-T, you may see that there is an update available.

    Workaround: None. You can go ahead and update the content pack.

  • The host entries are still present in the NSX Manager transport nodes list even after the successful removal of the host from the cluster.

    In NSX Manager, under System -> Fabric -> Transport Nodes, the deleted ESXi hosts still appear.

    Workaround: Clean up the entries manually through the UI.
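
    If you prefer to script the cleanup instead of using the UI, a minimal sketch against the NSX-T Manager API is shown below. It assumes the /api/v1/transport-nodes endpoints and admin credentials; verify the node IDs carefully before deleting anything.

      # Assumed sketch: list transport nodes and note the "id" of the deleted ESXi host.
      curl -sk -u admin:'<password>' https://<nsx-manager-ip>/api/v1/transport-nodes

      # Delete the stale transport node entry after double-checking the ID.
      curl -sk -u admin:'<password>' -X DELETE https://<nsx-manager-ip>/api/v1/transport-nodes/<transport-node-id>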

  • The add host process fails while adding a host to the NSX-T workload domain if you have created a port group with a name of your choice

    When you create a port group on the DVS that does not have any uplink for traffic communication, the add host workflow cannot determine the corresponding segment and fails.

    Workaround: Delete the user-created port groups from DVS and rerun the failed workflow.

  • The SoS log collection fails for vRealize Suite Lifecycle Manager with the "Fail to create vRealize Suite Lifecycle Manager support bundle" error

    The SoS log collection fails for vRealize Suite Lifecycle Manager after the password is rotated or updated for the vRealize products (vRealize Automation/vRealize Operations).

    Workaround: When you rotate the vRealize Operations/vRealize Automation password (SSH or API), you have to log in to the vRealize Lifecycle Manager UI, delete the existing vRealize Operations environment, and re-create it.

  • The NSX-T deployment fails at the task of deploying the second NSX-Manager

    The NSX-T deployment fails with the "The specified static ip address '<ip_address>' is already in use by another machine" error.

    Workaround: Manually delete the second NSX manager and retry the task.

  • The VMware Cloud Foundation Cloud Builder fails during bring-up when no vCenter License key is supplied in the Excel file

    If you do not supply a vCenter Server license key in the Excel file used by Cloud Builder during the bring-up process, the deployment fails while renaming the applied license key in the "Deploy and configure vCenter Server" task.

    Workaround: None

  • VMware Cloud Foundation workflows fail due to missing images

    After a restore of the SDDC Manager VM, the VM contains no bundles or images. As a result, VMware Cloud Foundation workflows that consume those images fail.

    Workaround: Download the NSX-T bundle again from SDDC Manager.

  • After the certificate rotation, the Horizon workload domain creation or expansion fails at the create local user operation in Platform Services Controller

    After the certificate rotation, the Horizon workload domain creation or expansion fails at the create local user operation in the Platform Services Controller task.

    Workaround: Log in to SDDC Manager and restart the solution manager service.

  • A warning is displayed on the validation page in the UI in case a host has an IP address ending with 0.

    The Configuration File Validation page displays a warning under the "Host and IP DNS Records" category if an ESXi host has an IP address ending in 0. The warning is valid when a /24 subnet is used, but with a /22 subnet the address can be valid and no warning should be raised.

    Workaround: None

  • The NFS NSX-T domain creation fails at "Unable to create transport node collection"

    When you try to create a VI workload domain using NSX-T and NFS, the task fails with the message "Unable to create transport node collection".

Known Issues

The known issues are grouped as follows.

Bringup Known Issues
  • Clicking the help icon in the bring-up wizard opens the help for an older release.

    The help icon in the bring-up wizard links to the product help for an older version.

    Workaround: To open the help topic on deploying Cloud Foundation, perform the following steps:

    1. In a browser window, navigate to docs.vmware.com.

    2. Click Browse All Products and then click VMware Cloud Foundation.

    3. Click VMware Cloud Foundation Architecture and Deployment Guide.

    4. In the left navigation pane, click Deploying Cloud Foundation.

  • Cloud Foundation Builder fails to initiate with the "[Admin/Root] password does not meet standards" message

    When configuring the Cloud Foundation Builder admin and root passwords, the format restrictions are not validated, so a user may create a password that does not comply with the restrictions. Cloud Foundation Builder then fails upon initiation.

    Workaround: When configuring the Cloud Foundation Builder, ensure that the password meets the following restrictions:

    • Minimum eight characters long
    • Must include both uppercase and lowercase letters
    • Must include digits and special characters
    • Must not include common dictionary words
  • The bring-up process fails at task disable TLS 1.0 on the vRealize Log Insight nodes

    The bring-up fails at the task to disable TLS 1.0 on the vRealize Log Insight nodes with the following error: Connect to 10.0.0.17:9543 [/10.0.0.17] failed: Connection refused (Connection refused). This issue has been observed in slow environments after restarting a vRealize Log Insight node. The node does not start correctly and its API is not reachable.

    Workaround: Use the following procedure to work around this issue.

    1. Restart the failed bring-up execution in the Cloud Foundation Builder VM and open the bring-up logs.
      This retries the failed bring-up task, which might still fail on the initial attempt. The log shows an unsuccessful connection to the vRealize Log Insight node.
    2. While bring-up is still running, use SSH to log in to the vRealize Log Insight node that is shown as failed in the bring-up log.
    3. Run the following command to determine the connection issue.
      loginsight-node-2:~ # service loginsight status
      It should confirm that the daemon is not running.
    4. Execute the following command:
      loginsight-node-2:~ # mv /storage/core/loginsight/cidata/cassandra/data/system ~/cassandra_keyspace_files
    5. Reboot the vRealize Log Insight node.
    6. Confirm that it is running.
      loginsight-node-2:~ # uptime
      18:25pm up 0:02, 1 user, load average: 3.16, 1.07, 0.39
      loginsight-node-2:~ # service loginsight status
      Log Insight is running.

    In a few minutes, the bring-up process should successfully establish a connection to the vRealize Log Insight node and proceed.

  • The Cloud Foundation Builder VM remains locked after more than 15 minutes.

    The VMware Imaging Appliance (VIA) locks out the user after three unsuccessful login attempts. Normally, the lockout is reset after fifteen minutes but the underlying Cloud Foundation Builder VM does not automatically reset.

    Workaround: Using SSH, log in as admin to the Cloud Foundation Builder VM, then switch to the root user. Unlock the account by resetting the failed login counter for the admin user with the following command.
    pam_tally2 --user=<user> --reset

  • Validation fails on SDDC Manager license.

    During the bringup process, the validation for the SDDC Manager license fails.

    Workaround: Enter a blank license and proceed. You can enter the correct license value later in the process.

  • During bring-up, the component sheet lists the wrong NSX version.

    During the bring-up process, detailed information about the components to be deployed is displayed. The display shows NSX version 6.4.3, but this is incorrect; the actual version being deployed is NSX 6.4.4.

    Workaround: None

  • Cloud Foundation Builder: Restart of imaging service fails.

    During the host imaging operation, the imaging service (imaging.service) fails when restarted.

    Workaround: If you encounter this issue, perform the following procedure:

    1. Stop the imaging service in the SDDC Manager VM.
      systemctl stop imaging.service
    2. Stop all processes related to imaging.
      ps -ef | grep imag
      kill <process_number>
    3. Wait five seconds.
      sleep 5
    4. Start the imaging service.
      systemctl start imaging.service
      It should restart correctly.
  • NSX-V Workload domain creation may fail at NSX-V Controller deployment

    The workload domain creation workflow may fail when NSX-V Manager deploys the controllers to the new domain. The domain remains unusable until the workaround is applied.

    Workaround: The user must reboot each of the ESXi hosts in the cluster of the new domain and restart the workflow.

Upgrade Known Issues
  • The vCenter upgrade operation fails on the management domain and workload domain

    The vCenter Server upgrade fails because the lcm-bundle-repo NFS mount on the host is inaccessible.

    Workaround: Remove and remount the SDDC Manager NFS datastore on the affected ESXi hosts, as sketched below. Use the showmount command to check whether all hosts are displayed in the SDDC Manager mount list.
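
    The following is a minimal sketch of the remount. The datastore name lcm-bundle-repo comes from the error above, while the SDDC Manager IP and exported share path are placeholders that you must confirm in your environment before removing anything.

      # On the SDDC Manager VM: check which hosts currently mount the repository.
      showmount -a

      # On each affected ESXi host: list, remove, and re-add the NFS datastore.
      esxcli storage nfs list
      esxcli storage nfs remove -v lcm-bundle-repo
      esxcli storage nfs add -H <sddc-manager-ip> -s <exported-share-path> -v lcm-bundle-repo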

  • The Lifecycle Manager page displays that the update is available even after the upgrade is done

    After a successful Lifecycle Manager upgrade of vRealize Automation, vRealize Operations Manager, vRealize Log Insight, NSX, vCenter Server, or any other VMware Cloud Foundation component other than SDDC Manager, the Lifecycle Manager UI continues to show the just-finished upgrade as available for a few minutes, even after you refresh the browser session.

    Workaround: After a successful Lifecycle Manager upgrade of a VMware Cloud Foundation component, wait 2-5 minutes until the upgrade button disappears. Do not click the button during this time.

  • Even after applying the VMware Cloud Foundation update, the bundle status shows as future (pending list)

    This issue occurs if the user downloads and uploads the bundles by using the marker file.

    Workaround: Ignore the bundle that shows as future. Run /opt/vmware/sddc-support/sos --get-vcf-summary|grep "SDDC Version" to verify that SDDC Manager is updated to 3.8.0.0.

  • The download of the Lifecycle Manager bundles from depot.vmware.com using the bundle transfer utility fails with the 'No space left on device' error

    This issue occurs because the bundles are first downloaded to the tmp directory and then moved to the provided directory path. If the tmp directory does not have sufficient space, the download fails with the 'No space left on device' error.

    Workaround: Increase the size of the tmp directory.

  • The NSX-T upgrade fails with the UPGRADE_TIMEDOUT status while upgrading the NSX-T host clusters.

    While upgrading host clusters that are part of the NSX-T fabric, the NSX-T upgrade can time out if the hosts in the cluster are overloaded or the cluster is large.

    Workaround: Add the nsxt.upgrade.hostcluster.timeout property to the Lifecycle Manager properties and set it to an appropriate value in milliseconds.

    For example:
    nsxt.upgrade.hostcluster.timeout=72000000 (sets it to 20 hours).
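
    A minimal sketch of applying the property on the SDDC Manager VM follows. The properties file path shown is an assumption for this release; confirm it before editing, and restart the LCM service so the change takes effect.

      # Assumed location of the Lifecycle Manager application properties file.
      LCM_PROPS=/opt/vmware/vcf/lcm/lcm-app/conf/application-prod.properties

      # Append the timeout property (example: 20 hours expressed in milliseconds).
      echo "nsxt.upgrade.hostcluster.timeout=72000000" >> "$LCM_PROPS"

      # Restart Lifecycle Manager so the new property is picked up.
      systemctl restart lcm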

  • The NSX-T upgrade fails with the COMPLETED_WITH_FAILURE status while retrying a failed upgrade

    When multiple setup issues cause the upgrade attempts to fail, even if the root cause is resolved, the upgrade can still fail at the NSX_T_UPGRADE_STAGE_SET_UPGRADE_PAYLOAD stage. This is because NSX-T changes the upgrade coordinator location in the background.

    Workaround: Restart Lifecycle Manager on the SDDC Manager VM as root (systemctl restart lcm).

  • The NSX-T upgrade fails at NSX_T_PERFORM_BACKUP during the VMware Cloud Foundation 3.7.2 to 3.8 migration upgrade through SDDC Manager Lifecycle Manager

    Before the SDDC Manager upgrade, the NSX-T backup is configured with SDDC Manager 3.7.2 and the SDDC Manager fingerprint is registered with NSX-T. After the SDDC Manager upgrade, the new SDDC Manager 3.8 fingerprint is different than that of the previously registered SDDC Manager 3.7.2.

    Workaround:

    1. Log in to the NSX-T manager UI.
    2. Navigate to System -> Backup & Restore.
    3. Click Edit.
    4. Enter password for the backup user.
    5. Remove the SSH fingerprint entry.
    6. Click Save. The fingerprint is applied automatically.
    7. Login to SDDC Manager.
    8. Retry the NSX-T upgrade through SDDC Manager Lifecycle Manager.

  • The NSX-T Host cluster upgrade may fail with the COMPLETED_WITH_FAILURE status and any retry will fail at the same stage

    The upgrade fails with the COMPLETED_WITH_FAILURE status and you may see 'Unable to migrate VM, generic vm fault' on vCenter. NSX-T could put a host in a maintenance mode which needs to be exited before an upgrade attempt can be made.

    Workaround: Follow the procedure at https://docs.vmware.com/en/VMware-NSX-T-Data-Center/2.4/upgrade/GUID-F77D031D-871B-467D-AC72-42ABF0122443.html to exit the host from transport node maintenance mode and retry the upgrade.

  • During the upgrade, all the VMware Cloud Foundation NSX-T workflows including the vRealize Log Insight enablement for NSX-T workload domains are blocked

    This issue is seen when the VMware Cloud Foundation systems are upgraded to the 3.8 version while NSX-T is still not upgraded to the 2.4.1 version. The NSX-T install bundle is also not downloaded.

    Workaround: Once you upgrade VMware Cloud Foundation to the 3.8 version, you must also upgrade NSX-T to the 2.4.1 version. Download the NSX-T Install bundle for the  2.4.1 version and retry the operation.

  • The vRealize Automation upgrade reports the "Precheck Execution Failure : Make sure the latest version of VMware Tools is installed" message

    This is a vRealize Lifecycle Manager pre-upgrade check that happens when the upgrade through vRealize Lifecycle Manager is triggered.

    Workaround: Upgrade VMware Tools on the vRealize Automation IaaS nodes.

  • The Lifecycle Manager initiated vRealize Automation upgrade fails

    Once the vRealize Automation upgrade from VMware Cloud Foundation fails with the "vRA IaaS Upgrade Failed" error, the user does not have an option to complete the failed upgrade from VMware Cloud Foundation Lifecycle Manager.

    Workaround:

    1) On vRealize Lifecycle Manager, find the failing request.

    2) Retry the failing request. 
        The vRealize Lifecycle Manager upgrade workflow continues from the point of failure and completes successfully.
        The vRealize Automation environment details shows that the vRealize Automation version is upgraded to 7.6.0.

    3) Retry the failed vRealize Automation upgrade in SDDC Manager. This validates the current vRealize Automation version and the health of the vRealize Automation environment, and marks the upgrade flow as completed successfully in Lifecycle Manager.
     

  • Error upgrading vRealize Automation

    Under certain circumstances, upgrading vRealize Automation may fail with a message similar to:

    An automated upgrade has failed. Manual intervention is required.
    vRealize Suite Lifecycle Manager Pre-upgrade checks for vRealize Automation have failed:
    vRealize Automation Validations : iaasms1.rainpole.local : RebootPending : Check if reboot is pending : Reboot the machine.
    vRealize Automation Validations : iaasms2.rainpole.local : RebootPending : Check if reboot is pending : Reboot the machine.
    Please retry the upgrade once the upgrade is available again. 

    Workaround:

    1. Log in to the first VM listed in the error message using RDP or the VMware Remote Console.
    2. Reboot the VM.
    3. Wait 5 minutes after the login screen of the VM appears.
    4. Repeat steps 1-3 for the next VM listed in the error message.
    5. Once you have restarted all the VMs listed in the error message, retry the vRealize Automation upgrade.

  • When there is no associated workload domain to vRealize Automation, the VRA VM NODES CONSISTENCY CHECK upgrade precheck fails

    This upgrade precheck compares the content in the logical inventory on the SDDC Manager and the content in the vRealize Lifecycle Manager environment. When there is no associated workload domain, the vRealize Lifecycle Manager environment does not contain information about the iaasagent1.rainpole.local iaasagent2.rainpole.local nodes. Therefore the check fails.

    Workaround: None. You can safely ignore a failed VRA VM NODES CONSISTENCY CHECK during the upgrade precheck. The upgrade will succeed even with this error.

  • When the user tries to deploy vRealize Automation after upgrading VMware Cloud Foundation from 3.7.2 to 3.8 without upgrading vRealize Lifecycle Manager, the deployment fails

    There might be cases where an older version of vRealize Suite Lifecycle Manager is already present in the system. This can happen if it was deployed in a previous release and has not been upgraded as part of upgrading to the new VMware Cloud Foundation release. In such cases, the older version of vRealize Lifecycle Manager may not support deploying the newer version of vRealize Automation or vRealize Operations Manager in the new release, resulting in a failed deployment.

    Workaround:

      - Roll back the failed deployment.
      - Upgrade vRealize Lifecycle Manager.
      - Deploy vRealize Automation.

  • The vRealize Operations upgrade fails at the vRealize upgrade prepare backup step

    When the vRealize Operations cluster is intensively used, the process of taking it offline in order to prepare snapshots as a backup could take a long time and therefore, may hit a timeout in SDDC Manager.

    Workaround: Prepare the backup manually following the steps:

    1. Take vRealize Operations cluster offline
      1. Log in to the master node vRealize Operations Manager administrator interface of your cluster.
      2. On the main page, click System Status. Click Take Offline under Cluster Status.
      3. Wait until all nodes in the analytics cluster are offline.
    2. Take a snapshot of each node so that you can roll the update back if a failure occurs.
      1. Log in to the Management vCenter Server.
      2. Take a snapshot of each node in the cluster. Right-click the virtual machine and select Snapshot > Take Snapshot. Use vROPS_LCM_UPGRADE_MANUAL_BACKUP as a prefix in the snapshot name for each virtual machine.
      3. Once this is done, retry the vRealize Operations upgrade through the SDDC Manager UI.
  • During the expansion of the vRealize Operations cluster, the API calls targeted to the new node hang if the vRealize Operations cluster is upgraded from version 7.0 to version 7.5, as part of the VMware Cloud Foundation upgrade to version 3.8

    The expansion of vRealize Operations fails unless the Apache service on the new node VA is restarted during the expansion process.

    Workaround:
    1) Trigger the expansion and monitor the deployment of the new node VA
    2) Once the new node VA is deployed, log in to the vRealize Operations UI and wait for the new node to appear for configuration under the cluster management tab. Then log in to the new node (with the password provided during the expansion input, or the default master node password if a new password was not provided) and execute the command:

      tail -f /var/log/casa_logs/casa.log

    3) During the expansion, the new node VA reboots itself. The SSH session will disconnect from the VA upon that event.
    4) Log in again after the successful reboot of the VA (the reboot process can be monitored in the vSphere UI), wait for about 5 minutes and execute:

         service apache2 restart

  • The operations manager component fails to come up after RPM upgrade.

    After manually upgrading the operations manager RPM to the latest version, the operationsmanager service fails to come up. The system returns the INFO-level message: Waiting for changelog lock... This is likely caused by overlapping restarts of the service preventing any of them from succeeding. This can happen to any service that uses Liquibase, such as commonsvcs.

    Workaround: Clean the databasechangeloglock table from the database.

    1. Log in to the SDDC Manager VM as admin user "vcf".
    2. Enter su to switch to root user.
    3. Run the following commands:
      1. Connect to the password_manager database with psql and delete the database change log lock:
        # psql -h /home/postgresql/ -U postgres -d password_manager -c "delete from databasechangeloglock"
      2. Restart the operationsmanager component:
        # systemctl restart operationsmanager
      3. Verify the operationsmanager is running:
        # curl http://localhost/operationsmanager/about
        It should return something like:
        {"id":"2cac9b7c-545f-4e6d-a66-6f81eef27601","name":"OPERATIONS_MANAGER",
        "version":"3.1.0-SNAPSHOT-9580592","status":"ACTIVE","serviceUrl":
        "http://127.0.0.1/operationsmanager","description":"Operations Manager"}
  • You cannot download all the bundles at once when you use the lcm-bundle-transfer utility

    To download the 3.8 bundles, follow these steps:

    1. Use the bundle transfer utility from 3.7.2 and download the VMware Cloud Foundation bundle with Lifecycle Management.

    2. Transfer this bundle to SDDC Manager.

    3. Apply the bundle downloaded in Step 1.

    4. Copy the bundle transfer utility from SDDC Manager again and download the remaining bundles.

    5. Transfer the remaining bundles and upload them to SDDC Manager.

    Workaround: None

  • The name of the SDDC Manager VM changes from "sddc manager" to "sddc-manager"

    During the upgrade from VMware Cloud Foundation 3.7.2 to the 3.8 bundle 2 version, the SDDC Manager VM name changes from the existing name to "sddc-manager". This does not affect the functionality of VMware Cloud Foundation.

    Workaround: None. If you want to keep the existing VM name, contact VMware support.

  • The config drift upgrade bundle fails constantly

    The config drift upgrade bundle fails if there are any non-responding hosts in the free pool.

    Workaround: Decommission the non-responding hosts and retry the available upgrade.

vRealize Integration Known Issues
  • vRealize Operations in vRealize Log Insight configuration fails when vRealize Operations appliances are in a different subdomain

    During vRealize Suite deployment in Cloud Foundation, the user provides FQDN values for vRealize load balancers. If these FQDNs are in a different domain than the one used during initial bringup, the deployment may fail.

    Workaround: To resolve this failure, you must add the vRealize Operations domain to the configuration in the vRealize Log Insight VMs.

    1. Log in to the first vRealize Log Insight VM.
    2. Open the /etc/resolv.conf file in a text editor, and locate the following lines:
      nameserver 10.0.0.250
      nameserver 10.0.0.250
      domain vrack.vsphere.local
      search vrack.vsphere.local vsphere.local 
    3. Add the domain used for vRealize Operations to the last line above.
    4. Repeat on each vRealize Log Insight VM.
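
    For example, if vRealize Operations uses the hypothetical domain vrops.rainpole.local, the edited search line would look like the following sketch; substitute the domain from your deployment.

      search vrack.vsphere.local vsphere.local vrops.rainpole.local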
  • SoS network cleanup utility fails on the hosts in two VDS switch configuration.

    This utility is not supported in a two VDS switch configuration in the current release.

  • Certificate replacement for the vRealize Automation component fails with 401 error

    Certificate replacement for the vRealize Automation component fails due to a 401 unauthorized error with the message "Importing certificate failed for VRA Cafe nodes." This issue is caused by a password lockout in the vRealize Automation product interface. For example, independently of Cloud Foundation, a user tried to log in to vRealize Automation with the wrong credentials too many times, causing the lockout.

    Workaround: The lockout period lasts for thirty minutes, after which the certificate replacement process can succeed.

  • vRealize Automation integration fails with NSX-T workload domain.

    NSX-T does not yet support vRealize Automation integration.

    Workaround: None.

  • Upgrade to vRealize Suite Lifecycle Manager 2.0 removes Environments Cards and Request History

    The upgrade process from vRealize Suite Lifecycle Manager 1.2 to 2.0 replaces the deployed appliance and restores a baseline configuration. This action subsequently removes any previous environment cards from the user interface and the request history. This may create auditing concerns; however, the originating requests for vRealize Suite product actions (such as deployment, workload domain connections, and so on) are maintained in the SDDC Manager logs.

    Workaround: Verify that vRealize Log Insight is manually configured for log retention and archiving. This will help to ensure that the SDDC Manager logs with the historical and originating vRealize Suite product action requests are preserved. For example, see Configure Log Retention and Archiving for vRealize Log Insight in Region A in the VMware Validated Design 4.3 documentation.

  • The vRealize port group is configured with the incorrect teaming policy.

    The vRealize network is configured with Route based on originating virtual port load balancing policy. This could cause uneven traffic distribution between the physical network interfaces.

    Workaround: Manually update the NIC teaming policy in the vCenter UI:

    1. Log into vCenter.
    2. Navigate to Networking.
    3. Locate the Distributed switch in management vCenter.
    4. Right click on vRack-DPortGroup-vRealize portgroup -> Edit Settings.
    5. Select Teaming and failover.
    6. Change Load balancing from Route based on originating virtual port to Route based on physical NIC load.
    7. Click OK and verify that the operation has completed successfully.
  • The password update for vRealize Automation and vRealize Operations Manager may run indefinitely or fail after some time when the password provided by the user contains the special character "%".

    Password management uses the vRealize Lifecycle Manager API to update the passwords of vRealize Automation and vRealize Operations Manager. When the special character "%" is present in the SSH, API, or Administrator credential types of the vRealize Automation and vRealize Operations Manager users, the vRealize Lifecycle Manager API hangs and does not respond to password management. After a five-minute timeout, password management marks the operation as failed.

    Workaround: Retry the password update operation with a password that does not contain the "%" character, and ensure that the passwords of all other vRealize Automation accounts do not contain the "%" character.

  • The password rotation or update operation fails to update new passwords of the vRealize Automation account in the vRealize Automation Adapter in vRealize Operations Manager

    This symptom occurs when the user configures the vRealize Automation adapter with externally managed accounts. The vRealize Operations Manager Suite API expects the sysadmin and superuser credentials to always be passed in its update adapter API. Because the external account details are not known to VMware Cloud Foundation, password management cannot pass them in the update call. It is recommended that the vRealize Automation adapter be configured with the VMware Cloud Foundation managed accounts that are configured by default after deployment.

    Workaround: If the vRealize Automation adapter in vRealize Operations Manager is configured with external accounts, then after a vRealize Automation account password update or rotation, manually log in to vRealize Operations Manager and update the credentials in the vRealize Automation adapter.

  • vRealize Operations expand will fail after upgrade if vRealize Operations Install bundle is not downloaded

    If the vRealize Operations install bundle (for version 7.5) is not downloaded, the vRealize Operations expand task fails after the vRealize Operations upgrade to version 7.5. The vRealize Operations analytics cluster expansion through vRealize Suite Lifecycle Manager fails.

    Workaround:
    1. Perform rollback on the failed expand operation.
    2. Download vRealize Operations install bundle for VMware Cloud Foundation 3.8.
    3. Start the vRealize Operations expand task afresh.

Networking Known Issues
  • Platform audit for network connectivity validation fails

    The vSwitch MTU is set to the same MTU as the VXLAN VTEP MTU. However, if the vSAN and vMotion MTU are set to 9000, then vmkping fails.

    Workaround: Modify the nsxSpecs settings in the bring-up JSON by setting VXLANMtu to a jumbo MTU, because the vSwitch is configured with the VXLAN MTU value. This prevents the error seen in the platform audit.

  • NSX Manager is not visible in the vSphere Web Client.

    In addition to NSX Manager not being visible in the vSphere Web Client, the following error message displays in the NSX Home screen: "No NSX Managers available. Verify current user has role assigned on NSX Manager." This issue occurs when the vCenter Server permission is not correctly configured for the account that is logged in.

    Workaround: To resolve this issue, follow the procedure detailed in Knowledge Base article 2080740 "No NSX Managers available" error in the vSphere Web Client.

SDDC Manager Known Issues
  • Cancelling "Network Connectivity" validations is not cleaning up the temporary vSwitch (adt_vSwitch_01) that is created by Platform Audit

    After cancelling the "Network Connectivity" validations, when the user re-runs the validation, the "ESXi Host Readiness" validation fails with the error "Physical NIC vmnic1 is connected to adt_vSwitch_01 on esxi host esxi-1 (10.0.0.100): should be disconnected".

    This is because the temporary vSwitch (adt_vSwitch_01) that was created during the previous "Network Connectivity" validation was not cleaned up after cancellation.

    Note: In the above error message, the ESXi host name and the IP address vary depending on the customer's environment.

    Workaround: Run the platform audit validation again after the "ESXi Host Readiness" validation failure.
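
    If the stale temporary switch must be removed manually before re-running the validation, the following is a hedged sketch run on the affected ESXi host; detach the uplink reported in the error first so that the physical NIC is released.

      # Assumed manual cleanup on the affected ESXi host (for example, esxi-1).
      esxcli network vswitch standard uplink remove -u vmnic1 -v adt_vSwitch_01
      esxcli network vswitch standard remove -v adt_vSwitch_01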

  • The SDDC Manager users are not able to log in through SSH and DCUI after the migration

    After migration, if the root and vcf users have passwords with the following special characters, they will not be able to log in through SSH and DCUI.

    & * { } [ ] ( ) / \ ' " ` ~ , ; : . < >

    Workaround: 

    1. After migration, perform the following steps:
      1. Reboot the SDDC Manager VM and enter single user mode.
      2. Reset the root user password to the same value as before.
      3. Log in as the root user in the SDDC Manager direct console. The login now succeeds.
    2. Alternatively, before the migration, change the passwords for the root and vcf users so that they contain only accepted special characters.
  • The "CPU, Memory and Storage" dashboard widget shows the incorrect unit for memory

    The "CPU, Memory and Storage" dashboard widget currently shows GB instead of TB for memory.

    Workaround: None

  • For any workload domain with NSX-T Managers, the corresponding cluster page in the SDDC Manager UI shows the error sign for VLAN ID

    As part of adding a cluster to a workload domain, the user provides a VLAN ID that NSX-T uses for the overlay network.
    After the cluster is deployed successfully, the cluster summary page has a field that shows the VLAN ID provided by the user.
    Due to this issue, the VLAN ID is not displayed on the summary page.

    Workaround:
    1. Get any host name under that cluster by navigating to the Hosts tab on the same page.
    2. Invoke http://SDDC_MANAGER_IP/inventory/nsxt, get clusterIpAddress from the response, and use it as NSX_MANAGER_IP in step 3.
    3. Invoke the NSX-T GET API (https://NSX_MANAGER_IP/api/v1/transport-nodes) with basic authentication credentials.
    4. Look for the transport node whose display_name matches the host name obtained in step 1.
    5. Under that transport node, get the value of host_switches -> host_switch_profile_ids where the key is UplinkHostSwitchProfile. Use this as host_switch_profile_id in the next step.
    6. Invoke the NSX-T GET API (https://NSX_MANAGER_IP/api/v1/host-switch-profiles/{host_switch_profile_id}) with basic authentication credentials.
    7. From the response, use the transport_vlan value, which is the VLAN ID of the cluster.
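
    The same lookup can be scripted. The sketch below chains the two NSX-T API calls from the steps above and assumes that basic authentication is enabled on NSX-T Manager.

      # Steps 3-4: find the transport node whose display_name matches the host name.
      curl -sk -u admin:'<password>' https://<NSX_MANAGER_IP>/api/v1/transport-nodes

      # Steps 6-7: read the uplink host switch profile and locate the transport_vlan value.
      curl -sk -u admin:'<password>' https://<NSX_MANAGER_IP>/api/v1/host-switch-profiles/<host_switch_profile_id>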

  • The migration of SDDC Manager from the 3.7.2 version to the 3.8 version may fail if the certificate server is not reachable

    If certificate server details have been entered into SDDC Manager and SDDC Manager is not able to connect to the certificate server, the SDDC Manager upgrade may fail.

    Workaround: Configure the certificate server and make sure that SDDC Manager can successfully connect to it.

Workload Domain Known Issues
  • The vSAN HCL database does not update as part of workload domain creation

    When you create a workload domain, the vSAN HCL database should be updated as part of the process, but it is not. As a result, the database moves into a CRITICAL state, as observed from vCenter.

    Workaround: Manually update the vSAN HCL database as described in Knowledge Base article 2145116.

  • Adding host fails when host is in a different VLAN

    Adding a host to a workload domain cluster should succeed even when the new host is on a different VLAN than the other hosts in the cluster, but the operation fails.

    Workaround:

    1. Before attempting to add a host, add a new portgroup to the VDS for the cluster.
    2. Tag the new portgroup with the VLAN ID of the host to be added.
    3. Run the Add Host workflow in the SDDC Manager Dashboard.
      This will fail at the "Migrate host vmknics to dvs" operation.
    4. Locate the failed host in vCenter, and migrate the vmk0 of the host to the new portgroup you created in step 1.
      For more information, see Migrate VMkernel Adapters to a vSphere Distributed Switch in the vSphere product documentation.
    5. Retry the Add Host operation.
      It should succeed.

    NOTE: If you remove the host in the future, remember to manually remove the portgroup, too, if it is not used by any other hosts.

  • In some cases, VI workload domain NSX Manager does not appear in vCenter.

    Observed in NFS-based workload domains. Although the VI workload domain creation was successful, the NSX Manager VM is not registered with vCenter and, as a result, does not appear in vCenter.

    Workaround: To resolve this issue, use the following procedure:

    1. Log in to NSX Manager (http://<nsxmanager IP>).
    2. Navigate to Manage > NSX Management Service.
    3. Un-register the lookup service and vCenter, then re-register.
    4. Close the browser and log in to vCenter.
  • Unable to delete VI workload domain enabled for vRealize Operations Manager from SDDC Manager.

    Attempts to delete the vCenter adapter also fail, and return an SSL error.

    Workaround: Use the following procedure to resolve this issue.

    1. Create a vCenter adapter instance in vRealize Operations Manager, as described in Configure a vCenter Adapter Instance in vRealize Operations Manager.
      This step is required because the existing adapter was deleted by the failed workload domain deletion.
    2. Follow the procedure described in Knowledge Base article 56946.
    3. Restart the failed VI workload domain deletion workflow from the SDDC Manager interface.
  • Unable to create the transport node collection through the NSX-T Manager

    The NSX-T workload domain deletion process fails with the UNABLE_TO_RE_CREATE_TRANSPORT_NODE_COLLECTION error.

    Workaround:

    1. Restart the proton service. Log in to all the NSX-T Manager instances as root and execute the /etc/init.d/proton restart command.

    2. After the proton service is up, log in to NSX-T Manager UI, ensure that the transport nodes state is success for the cluster. If not, select Configure NSX from NSX-T Manager for the compute collection by providing the respective transport node profile.
    3. Detach the transport node profile for the compute collection from the NSX-T Manager.
    4. Retry the workflow.

  • Adding cluster to a NSX-T workload domain fails with the error message "Invalid parameter: {0}".

    If you try to add a cluster to an NSX-T workload domain and it fails with the error message "Invalid parameter: {0}", the subtask that creates logical switches has failed. This is likely due to an issue where artifacts from previously removed workload domains and clusters conflict with the new switch creation process.

    Workaround: Delete the hosts from NSX-T Manager if they exist, and then delete the stale logical switches. The logical switches are created in the following pattern:

     ls-<UUID>-management, ls-<UUID>-vsan, ls-<UUID>-vmotion

    For all three logical switches, the UUID is the same and the logical ports are shown as "0" (zero). Delete these logical switches.
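
    A hedged sketch of locating and deleting the stale logical switches through the NSX-T API is shown below; confirm that each switch matches the ls-<UUID>- pattern and reports zero logical ports before deleting it.

      # List logical switches and identify the stale ls-<UUID>-management/vsan/vmotion entries.
      curl -sk -u admin:'<password>' https://<NSX_MANAGER_IP>/api/v1/logical-switches

      # Delete each stale logical switch by ID after verifying that it has zero logical ports.
      curl -sk -u admin:'<password>' -X DELETE https://<NSX_MANAGER_IP>/api/v1/logical-switches/<logical-switch-id>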

  • Unable to delete cluster from any NSX workload domain.

    The likely cause of this error is the presence of a dead or deactivated host within the cluster. To resolve this, you edit the workflow entry to remove reference to the offending host.

    Workaround: To resolve this issue, perform the following procedure.

    1. In SDDC Manager (Inventory > Workload Domains > [workload domain] > [cluster]), try to delete the cluster in order to identify the problematic host.
    2. Using SSH, log in to the SDDC Manager VM to obtain the workflow ID.
      Run curl -s http://localhost:7200/domainmanager/workflows to return a JSON listing all workflows.
    3. Using the workflow ID, get the workflow input.
      curl -s http://localhost:7200/domainmanager/internal/vault/<workflow-id> \
      -XGET > remove_cluster_input.json
    4. Edit the remove_cluster_input.json file by removing the entry referencing the problematic host.
      1. Find the reference to the host under the heading:
        "RemoveClusterEngine____2__RemoveClusterFromNsx____0__NsxtRemoveCluster____0__removeNsxtModel"
      2. Delete the entire entry for the problematic host.
      3. Save the edited file as remove_cluster_input_deadhost_removed.json.
    5. Update the workflow with the new file.
      curl -s http://localhost:7200/domainmanager/internal/vault/<workflow-id> \
      -XPUT -H "Content-type: text/plain" -d @remove_cluster_input_deadhost_removed.json
    6. Return to SDDC Manager and retry the workflow.
      It will fail at the logical switches deletion task.
    7. In NSX Manager, clear the host as follows:
      1. Go to NSX > Nodes > Transport Nodes and delete the same host as a transport node.
      2. Go to NSX > Nodes > Hosts and delete the same host.
    8. Return to SDDC Manager and restart the workflow.
      It should succeed.
  • Removal of a dead host from the NSX-T workload domain fails and subsequently, the removal of the workload domain fails

    In some cases where the creation of an NSX-T workload domain fails, the subsequent attempt to delete the dead hosts fails as well. The host is disconnected from vCenter, so manual intervention is required to connect it back. This issue is also seen when the domain is created successfully but one of the hosts then goes dead.

    Workaround:

    If a host removal operation fails because the host is dead (that is, the host is disconnected from vCenter), perform the following steps:

    1. Log in to NSX-T Manager.
    2. Identify the dead host under fabric->nodes->transport nodes.
    3. Edit the transport node and select the N-VDS tab.
    4. Under Physical NICs, assign vmnic0 to the uplink.
    5. From SDDC Manager, retry the failed domain deletion task.

  • If there is a dead host in the cluster, the subsequent task of adding a cluster or a host fails.

    If one of the hosts of the workload domain goes dead and the user tries to remove the host, the task fails. The host is then set to the deactivating state without providing an option to forcefully remove it.

    Workaround: Bring the dead host back to normal state, after which the add-cluster and add-host tasks succeed.

  • NSX-V Workload domain creation may fail at the NSX-V Controller deployment

    The workload domain creation workflow may fail at NSX-V Manager deploying controllers to the new domain. This issue occurs because the NSX Controllers do not get an IP address because the VM NIC is disconnected.

    Workaround: The user must reboot each of the ESXi hosts in the cluster of the new domain and restart the workflow.

  • NSX-T workload domain creation fails at the 'Join vSphere hosts to NSX-T Fabric' task

    When the NSX-T domain creation workflow fails at the "Join vSphere hosts to NSX-T Fabric" task, it is generally due to the failure of the NSX-T installation on one of the hosts. When viewed in the NSX-T Manager web portal, the installation failure is clearly indicated in the host view.

    This happens intermittently, and the failure is not detected until later in the workflow. Eventually, the task fails after a long wait.

    Workaround: The user has to log in to the NSX-T Manager, check the host that has the NSX-T installation failure, and delete it from the fabric.

  • The stretch cluster workflow does not report failure when the VMs are not compliant or out of date

    The stretch cluster workflow does not check the VM compliance status after the policy is reapplied because it takes time for the VMs to become compliant.

    Workaround: Reapply the policy manually using the vSphere Web Client.

  • Deletion of an additional domain does not delete the NSX-T VIBs

    This issue is intermittent. You cannot re-purpose the same host for workload domain creation because it still has the older VIBs.

    Workaround: Clean up the NSX-T VIBs manually. Keep retrying the VIB removal until it succeeds.
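
    A minimal sketch of the manual VIB cleanup, run from an SSH session on each affected ESXi host. The VIB names vary by NSX-T version, so list them first and remove only the NSX-related entries actually reported on the host; some VIBs depend on others, so a failed removal may succeed after its dependents are removed, which matches the retry guidance above.

      # List the NSX-related VIBs still installed on the host
      esxcli software vib list | grep -i nsx

      # Remove a leftover VIB by the name reported above (repeat per VIB)
      esxcli software vib remove -n <nsx-vib-name>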

  • The SoS cleanup fails to clean up the NSX-T consumed hosts

    The issue occurs while removing the NSX-T related VIBs from the ESXi hosts. As a result, the SoS cleanup fails at the network cleanup stage.

    Workaround: Refer to step 4 in the Remove a Host from NSX-T or Uninstall NSX-T Completely section of the VMware NSX-T Installation Guide.

  • The VI workload domain restart task fails

    The VI domain creation fails with the "Image with product type vCenter with version 6.7.0-13010631 and image type INSTALL is not found" error. When you upload the install bundle and restart the task, the restart also fails with the same error.

    Workaround: Start a new VI domain creation task.

  • The workload domain creation and/or cluster addition may fail at the NSX host preparation phase

    If the ESXi hosts that are used to create a workload domain or cluster have not been fully cleaned, or are reporting LiveInstallationError for any reason, the workload domain creation and/or cluster addition may fail at the NSX host preparation phase because vSphere ESX Agent Manager (EAM) is not able to install the NSX VIBs on the ESXi hosts.

    Workaround: Reboot the hosts, ensure that EAM no longer reports LiveInstallationError, and restart the workflow from the UI.

  • Creation of a second NSX-T workload domain fails at "Creating NSXT Overlay Segment" or "Creating NSXT VLAN Segment"

    You may hit this issue while deploying a second NSX-T workload domain.

    Workaround:

    1. Reboot all the NSX-T Manager appliances.
    2. Wait for all the NSX-T Manager appliances to come up.
    3. Restart the failed task.

    If the failure occurs again, check the domain manager log (see the sketch below), search for "Segment already exists", delete that segment from NSX-T, and restart the failed task.
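
    A minimal sketch for locating the conflicting segment from the SDDC Manager VM; the path shown is the usual domain manager log location and may differ in your environment.

      grep "Segment already exists" /var/log/vmware/vcf/domainmanager/domainmanager.log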

  • If the user alters or modifies elements created by VMware Cloud Foundation (port groups, segments), subsequent workflows in the system may be impacted.

    The user should not modify or alter any elements (port groups, logical segments) that are created by VMware Cloud Foundation.

    Workaround: The user has to identify the modified elements and restore them exactly as they were created by VMware Cloud Foundation. This requires manual intervention.

  • A vCenter Server that has gone through certificate rotation cannot be accessed from any of the VDI infrastructure instances

    VMware Cloud Foundation does not support the rotation of certificates on the VDI-associated workload domains.

    Workaround: You can find the workaround for this issue at https://kb.vmware.com/s/article/70956.

  • When the user tries to deploy partner services on a VMware Cloud Foundation deployed NSX-T workload domain, the “Configure NSX at cluster level to deploy Service VM” error comes up

    On a VMware Cloud Foundation deployed NSX-T workload domain, the user will not be able to deploy partner services like McAfee, Trend, and so on.

    Workaround: Attach the Transport node profile back to the cluster and try deploying the partner service. After the service is deployed, detach the Transport node profile from the cluster.

  • The delete compute collection task during the workload domain deletion fails

    The deletion of a workload domain fails when the user tries to delete an NSX-T based workload domain that has VMs running on it.

    Workaround:

    a. Delete all the configuration deployed on top of the workload domain (Edge VMs, segments, routers, and so on).
    b. Update the transport node profile with the correct VM kernel adapters and their port group names:
    vmk0 - <management port group name>
    vmk1 - <vMotion port group name>
    vmk2 - <vSAN port group name>
    c. Log in to the NSX-T UI and configure NSX with the above profile and current compute collection.
    d. Retry the failed workflow.

  • The cluster deletion fails in NSX-T workload domain

    Due to inconsistent behavior in NSX-T, the VMkernel interfaces are left on the N-VDS while the workflow proceeds with the rest of its actions. As a result, the segment deletion fails.

    Workaround:
    From NSX-T Manager, manually remove the uninstall mapping from the host transport node and remove NSX from NSX-T Manager. Restart the Delete Cluster task from SDDC Manager to proceed.
    If an ESXi host becomes unreachable after removing NSX from NSX-T Manager, run the following commands from the ESXi host's console to recover it:

    esxcfg-vswitch -a vswitch0                                  # create a standard vSwitch
    esxcfg-vswitch -A mgmt vswitch0                             # add a "mgmt" port group to it
    esxcfg-vswitch -v <VLAN-ID> -p mgmt vswitch0                # tag the mgmt port group with the management VLAN
    esxcfg-vswitch -L vmnic0 vswitch0                           # link vmnic0 as the vSwitch uplink
    esxcli network ip interface remove --interface-name=vmk0    # remove the vmk0 left on the N-VDS
    esxcfg-vmknic -a -i <mgmt-ip> -n <netmask> mgmt             # recreate vmk0 on the mgmt port group

    The hosts are now reachable from vCenter. Migrate the VMkernel adapters from the NSX-T logical segments to vSphere DVS port groups in vCenter and retry the cluster deletion task from SDDC Manager.

  • The NSX-T workload domain creation fails

    The NSX-T workload domain creation task fails with the "Configure Backup Schedule Task" error.

    Workaround: Wait for about five minutes and then restart the failed task.

  • While trying to add a cluster to an NSX-T based domain, the NSX-T Add Cluster operation fails at the Create Transport Node Collection Action task

    When a cluster is added to an NSX-T domain, the NSX-T VIBs are installed on every new host that will be part of the new cluster. Intermittently, the HTTPS service on one of the NSX-T Manager nodes goes down, which causes the NSX installation to fail on one of the hosts.

    Workaround:

    1. Reboot the NSX-T Manager whose HTTPS service went down.
    2. Wait until the NSX-T cluster becomes stable (a status-check sketch follows these steps).
    3. From NSX-T UI, resolve the NSX-T Install failed host.
    4. Retry the failed workflow from SDDC Manager UI.
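
    To confirm that the NSX-T management cluster is stable before retrying (step 2 above), its status can also be read through the REST API. A minimal sketch, assuming NSX-T 2.4 admin credentials and the /api/v1/cluster/status endpoint:

      curl -k -u admin:<nsxt-admin-password> https://<nsxt-manager-fqdn>/api/v1/cluster/status | json_pp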

  • The unstretch workflow fails with VSAN error during the host maintenance task

    Currently, the unstretch operation updates the storage policies, but you must explicitly reapply the policy and run the vSAN compliance check. This is needed to avoid the vSAN error during the host maintenance task.

    Workaround:
    1. Re-enable the stretch cluster.
    2. Reapply the updated policy and run the vSAN compliance check and health check. After running this test, the VMs become compliant.
    3. Disable the stretch cluster and remove the fault domains.
    4. Restart the failed task of the unstretch operation. The task now successfully puts the host into maintenance mode.

  • The remove cluster operation fails for the partially created cluster

    This issue occurs when the NSX-T add cluster operation inserts the inventory details of the new cluster into the database but fails to create the cluster in the other areas of the VMware Cloud Foundation system (for example, vCenter and NSX-T) because validation of the host network connectivity fails. Triggering the remove cluster operation for the partially created cluster then fails at the "Gather input for NVDS to VDS migration from vCenter and NSX-T manager" task.

    Workaround: 

    1. Run the following command:

       curl -k -X GET http://localhost/inventory/clusters

    2. Find the <cluster-id> of the cluster that needs to be deleted from the response above (a pretty-printed variant is sketched after these steps).

    3. Delete cluster from inventory:

        curl -k -X DELETE http://localhost/inventory/extensions/vi/clusters/<cluster-id>

    4. Decommission the hosts that were part of the cluster.

    5. Run SoS cleanup for the hosts that were part of the cluster.

    6. Commission the hosts that were part of the cluster.
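
    For step 2, the response can be pretty-printed with json_pp (used elsewhere in these notes) so that the id of the partially created cluster is easier to spot by its name. This is a convenience sketch only:

       curl -k -s -X GET http://localhost/inventory/clusters | json_pp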

  • Currently, the migration from NSX-V to NSX-T is not supported

    NSX-V to NSX-T migration is not currently supported in VMware Cloud Foundation environments.

    Workaround: Deploy a new NSX-T workload domain.

Security Operations Known Issues
  • A password update that fails the password policy results in an UPDATE message when it should be FAILED

    If the password update fails the password policy, the system shows an UPDATE status and the transaction history shows the message "Operation failed in 'appliance update', for credential update." In actuality, the operation has FAILED because the new password does not meet the policy requirements. A more appropriate message would read "Password update has failed due to unmet policy requirements" and recommend reviewing the policy.

    Workaround: Review the password policy for the component in question, modify the password as necessary, and try the update again.

  • Unable to perform password management operations from the SDDC Manager Dashboard

    If the Cloud Foundation Operations Manager component is restarted while a password management operation is in progress, the password management operation ends up in an INCONSISTENT state. You will not be able to perform any password management operations until you cancel the INCONSISTENT operation.

    Workaround:

    1. SSH into the SDDC Manager VM as the vcf user.
    2. Type su to switch to the root account.
    3. Run curl http://localhost/security/password/vault/transactions | json_pp
      This returns information about the INCONSISTENT operation. For example:
      [
      {
         "transactionStatus" : "INCONSISTENT",
         "workflowId" : "f6548e76-f3e9-4033-801d-36ccae893672",
         "transactions" : [
            {
               "oldPassword" : "AX1are276!",
               "username" : "administrator@vsphere.local",
               "entityName" : "psc-1.vrack.vsphere.local",
               "id" : 123,
               "newPassword" : "x%5N6H1A^p%wJ4N",
               "entityType" : "PSC",
               "credentialType" : "SSO",
               "workflowId" : "0daaac30-c88d-4407-8bf3-c791541ebbae",
               "transactionStatus" : "INCONSISTENT",
               "timestamp" : "2019-04-09T09:00:29.628+0000",
               "transactions" : [
                 
               ]
            }
         ],
         "type" : "ROTATE",
         "id" : 1
      }
      ]
    4. Using the ID of the INCONSISTENT transaction ("id" : 1 in the example above), run the following:
      curl -X DELETE http://localhost/security/password/vault/transactions/<ID> | json_pp
      This returns something like the following:
      {
          "transactionId":1,
          "workflowId":"f6548e76-f3e9-4033-801d-36ccae893672",
          "status":"USER_CANCELLED"
      }
Known Issues Affecting Service Providers
  • Domain manager workflows fail when using SDDC Manager to manage an API-created cluster or domain.

    If you have a cluster or domain that was created through the API, and you try to manage it through the SDDC Manager dashboard, the workflow will fail. This affects the following domain manager workflows: Add/Remove Host, Add/Remove VI Workload Domain, and Add/Remove Cluster.

    Workaround: None. Clean up the failed workflow and try again using the API.
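
    The public APIs introduced in this release expose the workload domain and cluster resources used by these operations. As a non-authoritative sketch of listing them before retrying through the API (the SDDC Manager FQDN and credentials are placeholders, and the exact authentication required is described in the VMware Cloud Foundation API documentation):

      curl -k -u <api-username>:<password> https://<sddc-manager-fqdn>/v1/domains | json_pp
      curl -k -u <api-username>:<password> https://<sddc-manager-fqdn>/v1/clusters | json_pp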