
VMware Cloud Foundation 3.9 | 24 OCT 2019 | Build 14866160

VMware Cloud Foundation is a unified SDDC platform that brings together VMware ESXi, VMware vSAN, VMware NSX, and optionally, vRealize Suite components, VMware NSX-T, VMware Enterprise PKS, and VMware Horizon 7 into a natively integrated stack to deliver enterprise-ready cloud infrastructure for the private and public cloud. The Cloud Foundation 3.9 release continues to expand on the SDDC automation, VMware SDDC stack, and the partner ecosystem.

NOTE: VMware Cloud Foundation 3.9 must be installed as a new deployment or upgraded from VMware Cloud Foundation 3.8.1. For more information, see Installation and Upgrade Information below.

What's in the Release Notes

The release notes cover the following topics:

  • What's New
  • Cloud Foundation Bill of Materials (BOM)
  • VMware Software Edition License Information
  • Supported Hardware
  • Documentation
  • Browser Compatibility and Screen Resolutions
  • Installation and Upgrade Information
  • Resolved Issues
  • Known Issues

What's New

The VMware Cloud Foundation 3.9 release includes the following:

  • Cluster-Level Upgrade Support: Provides an option to select individual clusters within a workload domain for ESXi upgrades.
  • Multi-Instance Management: Allows you to monitor multiple Cloud Foundation instances from a single console.
  • Fibre Channel Storage as Principal Storage: Virtual Infrastructure (VI) workload domains now support Fibre Channel as a principal storage option in addition to VMware vSAN and NFS.
  • Support for Additional Composable Hardware: Server composability has been extended to include the ability to compose and decompose Dell MX servers, enabling Dell MX customers to compose servers as per workload needs.
  • Improved NSX Data Protection: SDDC Manager can configure NSX Managers to back up to an SFTP server in a separate fault zone. It is recommended that you register an SFTP server with SDDC Manager after upgrade or bring-up.
  • Cloud Foundation APIs: API support has been extended. For more information, see VMware Cloud Foundation API Reference Guide.
  • L3 Aware IP Addressing (API only): NSX-T based VI workload domains now support the ability to use hosts from different L2 domains to create or expand clusters.
  • Developer Center (Beta feature): Enables you to access Cloud Foundation APIs and code samples from SDDC Manager Dashboard.
  • BOM Updates for the 3.9 Release: Updated Bill of Materials with new product versions.

Cloud Foundation Bill of Materials (BOM)

The Cloud Foundation software product comprises the following software Bill of Materials (BOM). The components in the BOM are interoperable and compatible.

Software Component | Version | Date | Build Number
Cloud Builder VM | 2.2.0.0 | 24 OCT 2019 | 14866160
SDDC Manager | 3.9 | 24 OCT 2019 | 14866160
VMware vCenter Server Appliance | 6.7 Update 3 | 20 AUG 2019 | 14367737
VMware ESXi | 6.7 Update 3 | 20 AUG 2019 | 14320388
VMware vSAN | 6.7 Update 3 | 20 AUG 2019 | 14263135
VMware NSX Data Center for vSphere | 6.4.5 | 18 APR 2019 | 13282012
VMware NSX-T Data Center | 2.5 | 19 SEP 2019 | 14663974
VMware Enterprise PKS | 1.5 | 20 AUG 2019 | 14878150
VMware vRealize Suite Lifecycle Manager | 2.1 Patch 2 | 02 JUL 2019 | 14062628
VMware vRealize Log Insight | 4.8 | 11 APR 2019 | 13036238
vRealize Log Insight Content Pack for NSX for vSphere | 3.9 | n/a | n/a
vRealize Log Insight Content Pack for Linux | 1.0 | n/a | n/a
vRealize Log Insight Content Pack for vRealize Automation 7.3+ | 2.2 | n/a | n/a
vRealize Log Insight Content Pack for vRealize Orchestrator 7.0.1+ | 2.1 | n/a | n/a
vRealize Log Insight Content Pack for NSX-T | 3.8 | n/a | n/a
vSAN Content Pack for Log Insight | 2.1 | n/a | n/a
vRealize Operations Manager | 7.5 | 11 APR 2019 | 13165949
vRealize Automation | 7.6 | 11 APR 2019 | 13027280
VMware Horizon 7 | 7.9.0 | 25 JUN 2019 | 13956742

Note: 

  • vRealize Log Insight Content Packs are deployed during the workload domain creation.
  • VMware Solution Exchange and the vRealize Log Insight in-product marketplace store only the latest versions of the content packs for vRealize Log Insight. The Bill of Materials table contains the latest versions of the packs that were available at the time VMware Cloud Foundation 3.9 was released. When you deploy the Cloud Foundation components, the version of a content pack in the in-product marketplace for vRealize Log Insight might be newer than the one used for this release.

VMware Software Edition License Information

The SDDC Manager software is licensed under the Cloud Foundation license. As part of this product, the SDDC Manager software deploys specific VMware software products.

The following VMware software components deployed by SDDC Manager are licensed under the Cloud Foundation license:

  • VMware ESXi
  • VMware vSAN
  • VMware NSX Data Center for vSphere

The following VMware software components deployed by SDDC Manager are licensed separately:

  • VMware vCenter Server
    NOTE Only one vCenter Server license is required for all vCenter Servers deployed in a Cloud Foundation system.
  • VMware NSX-T
  • VMware Enterprise PKS
  • VMware Horizon 7
  • VMware vRealize Automation
  • VMware vRealize Operations
  • VMware vRealize Log Insight and content packs
    NOTE Cloud Foundation permits limited use of vRealize Log Insight for the management domain without the purchase of a vRealize Log Insight license.

For details about the specific VMware software editions that are licensed under the licenses you have purchased, see the Cloud Foundation Bill of Materials (BOM) section above.

For general information about the product, see VMware Cloud Foundation.

Supported Hardware

For details on vSAN Ready Nodes in Cloud Foundation, see VMware Compatibility Guide (VCG) for vSAN and the Hardware Requirements section in the VMware Cloud Foundation Planning and Preparation Guide.

Documentation

To access the Cloud Foundation 3.9 documentation, go to the VMware Cloud Foundation product documentation.

To access the documentation for VMware software products that SDDC Manager can deploy, see the product documentation and use the drop-down menus on the page to choose the appropriate version.

Browser Compatibility and Screen Resolutions

The Cloud Foundation web-based interface supports the two most recent versions of the following web browsers, except Internet Explorer:

  • Google Chrome
  • Mozilla Firefox
  • Microsoft Edge
  • Internet Explorer: Version 11

For the Web-based user interfaces, the supported standard resolution is 1024 by 768 pixels. For best results, use a screen resolution within these tested resolutions:

  • 1024 by 768 pixels (standard)
  • 1366 by 768 pixels
  • 1280 by 1024 pixels
  • 1680 by 1050 pixels

Resolutions below 1024 by 768, such as 640 by 960 or 480 by 800, are not supported.

Installation and Upgrade Information

You can install Cloud Foundation 3.9 as a new release or upgrade from VMware Cloud Foundation 3.8.1.

In addition to the release notes, see the VMware Cloud Foundation Upgrade Guide for information about the upgrade process.

Installing as a New Release

The new installation process has three phases:

Phase One: Prepare the Environment

The VMware Cloud Foundation Planning and Preparation Guide provides detailed information about the software, tools, and external services that are required to implement a Software-Defined Data Center (SDDC) with VMware Cloud Foundation, using a standard architecture model.

Phase Two: Image all servers with ESXi

Image all servers with the ESXi version mentioned in the Cloud Foundation Bill of Materials (BOM) section. See the VMware Cloud Foundation Architecture and Deployment Guide for information on installing ESXi.

Phase Three: Install Cloud Foundation 3.9

Refer to the VMware Cloud Foundation Architecture and Deployment Guide for information on deploying Cloud Foundation.

Upgrade to Cloud Foundation 3.9

You can upgrade to Cloud Foundation 3.9 only from 3.8.1. If you are at a version earlier than 3.8.1, refer to the 3.8.1 Release Notes for information on how to upgrade from the prior releases.

For information on upgrading to 3.9, refer to the VMware Cloud Foundation Upgrade Guide.

Resolved Issues

The following issues have been resolved:
    • Updating a password results in UPDATE message when it should be FAILED
    • NSX-T workload domain creation fails with the "Configure Backup Schedule Task" error
    • The Lifecycle Manager page displays that an update is available even after the upgrade is done
    • The unstretch workflow fails with vSAN error during the host maintenance task
    • Downloading Lifecycle Manager bundles from depot.vmware.com using the bundle transfer utility fails with the "No space left on device" error
    • NSX-T upgrade fails with the UPGRADE_TIMEDOUT status while upgrading the NSX-T host clusters
    • NSX-T upgrade fails with the COMPLETED_WITH_FAILURE status while retrying a failed upgrade
    • NSX-T Host cluster upgrade may fail with the COMPLETED_WITH_FAILURE status and any retry will fail at the same stage
    • Even after applying the VMware Cloud Foundation update, the bundle status shows the pending list
    • Even after a host has been removed from the cluster, the vCenter Server still displays it in the inventory
    • PKS deployment fails with the "Unable to create pks user ubuntu" error
    • The get operation for the certification API throws a 500 Error when Microsoft CA is not configured on the SDDC Manager

Known Issues

The known issues are grouped as follows.

Bring-Up Known Issues
  • Clicking the help icon in the Cloud Builder VM opens the help for an older release

    The help icon in the Cloud Builder links to the product help for VMware Cloud Foundation 3.8.

    Workaround:

    In the browser window displaying the 3.8 help, select VMware Cloud Foundation 3.9 from the version selection drop-down menu.

  • Cloud Foundation Builder VM deployment fails with the "[Admin/Root] password does not meet standards" message

    When configuring the Cloud Foundation Builder admin and root passwords, the format restrictions are not validated. As a result, you can create a password that does not meet the requirements and the Cloud Foundation Builder VM deployment will fail. 

    Workaround: When configuring the Cloud Foundation Builder, ensure that the password meets the following restrictions:

    • Minimum eight characters long
    • Must include at least one uppercase letter
    • Must include at least one lowercase letter
    • Must include at least one digit 
    • Must include at least one special character
  • The bring-up process fails at the task that disables TLS 1.0 on the vRealize Log Insight nodes

    The bring-up fails at the task that disables TLS 1.0 on the vRealize Log Insight nodes with the following error: Connect to 10.0.0.17:9543 [/10.0.0.17] failed: Connection refused (Connection refused). This issue has been observed in slow environments after a vRealize Log Insight node is restarted. The node does not start correctly and its API is not reachable.

    Workaround: Use the following procedure to work around this issue.

    1. Restart the failed bring-up execution in the Cloud Foundation Builder VM and open the bring-up logs.
      This will retry the failed bring-up task, which might still fail on the initial attempt. The log shows an unsuccessful connection to the vRealize Log Insight node.
    2. While bring-up is still running, use SSH to log in to the vRealize Log Insight node that is shown as failed in the bring-up log.
    3. Run the following command to determine the connection issue.
      loginsight-node-2:~ # service loginsight status
      It should confirm that the daemon is not running.
    4. Execute the following command:
      loginsight-node-2:~ # mv /storage/core/loginsight/cidata/cassandra/data/system ~/cassandra_keyspace_files
    5. Reboot the vRealize Log Insight node.
    6. Confirm that it is running.
      loginsight-node-2:~ # uptime
      18:25pm up 0:02, 1 user, load average: 3.16, 1.07, 0.39
      loginsight-node-2:~ # service loginsight status
      Log Insight is running.

    In a few minutes, the bring-up process should successfully establish a connection to the vRealize Log Insight node and proceed.

  • The Cloud Foundation Builder VM remains locked after more than 15 minutes.

    The VMware Imaging Appliance (VIA) locks out a user after three unsuccessful login attempts. Normally, the lockout is reset after fifteen minutes, but the underlying Cloud Foundation Builder VM does not reset it automatically.

    Workaround: Using SSH, log in to the Cloud Foundation Builder VM as admin, then switch to the root user. Unlock the account by resetting the failed login counter for the locked-out user with the following command.
    pam_tally2 --user=<user> --reset
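
    For example, assuming admin is the account that was locked out, the command would be:
    pam_tally2 --user=admin --reset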

  • Validation fails on SDDC Manager license.

    During the bring-up process, the validation for the SDDC Manager license entered in the deployment parameter sheet fails.

    Workaround: Leave the SDDC Manager license field blank and try again. You can enter the correct license value later in the process.

  • After updating the via.properties file on the Cloud Foundation Builder VM, restarting the imaging service fails

    The imaging service (imaging.service) fails when restarted.

    Workaround: If you encounter this issue, perform the following procedure:

    1. Stop the imaging service on the Cloud Foundation Builder VM.
      systemctl stop imaging.service
    2. Stop all processes related to imaging.
      ps -ef | grep imag
      kill <process_number>
    3. Wait for five seconds.
      sleep 5
    4. Start the imaging service.
      systemctl start imaging.service
      It should restart correctly.
Upgrade Known Issues
  • vCenter upgrade operation fails on the management domain and workload domain

    vCenter Server fails to be upgraded because the lcm-bundle-repo NFS mount on the host is inaccessible.

    Workaround: Remove and remount the SDDC Manager NFS datastore on the affected ESXi hosts. Use the showmount command to check that all hosts are displayed in the SDDC Manager mount list.
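
    As a sketch only, the following commands can be run on the SDDC Manager VM to confirm the NFS export and which ESXi hosts currently mount it; the use of localhost here assumes the commands are run locally on the SDDC Manager VM.

    # List the exported NFS shares and the hosts that currently mount them
    showmount -e localhost
    showmount -a localhost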

  • The vRealize Automation upgrade reports the "Precheck Execution Failure : Make sure the latest version of VMware Tools is installed" message

    The vRealize Automation IaaS VMs must have the same version of VMware Tools as the ESXi hosts on which the VMs reside.

    Workaround: Upgrade VMware Tools on the vRealize Automation IaaS VMs.

  • vRealize Automation upgrade fails

    If the vRealize Automation upgrade from the SDDC Manager Dashboard fails with the "vRA IaaS Upgrade Failed" error, you cannot complete the failed upgrade using vRealize Lifecycle Manager.

    Workaround:

    1. Use vRealize Lifecycle Manager to find the failing request.
    2. Retry the failing request.
    3. Retry the failed vRealize Automation upgrade in SDDC Manager. This validates the vRealize Automation version and the health of the vRealize Automation environment and indicates that the upgrade flow has completed successfully.

  • Error upgrading vRealize Automation

    Under certain circumstances, upgrading vRealize Automation may fail with a message similar to:

    An automated upgrade has failed. Manual intervention is required.
    vRealize Suite Lifecycle Manager Pre-upgrade checks for vRealize Automation have failed:
    vRealize Automation Validations : iaasms1.rainpole.local : RebootPending : Check if reboot is pending : Reboot the machine.
    vRealize Automation Validations : iaasms2.rainpole.local : RebootPending : Check if reboot is pending : Reboot the machine.
    Please retry the upgrade once the upgrade is available again. 

    Workaround:

    1. Log in to the first VM listed in the error message using RDP or the VMware Remote Console.
    2. Reboot the VM.
    3. Wait 5 minutes after the login screen of the VM appears.
    4. Repeat steps 1-3 for the next VM listed in the error message.
    5. Once you have restarted all the VMs listed in the error message, retry the vRealize Automation upgrade.

  • The vRealize Log Insight pre-check may fail for the consistency checks for the vRealize Log Insight - vRealize Lifecycle Manager Environment Master and Environment Nodes

    This is a known issue with the discrepancy between the host names in SDDC Manager and vRealize Lifecycle Manager inventory.

    Workaround:

    1. Log in to vRealize Lifecycle Manager.
    2. Click View Details for vRLI_environment on the Getting Started page.
    3. Click View Details.
    4. Expand nodes one by one and check the hostname field.
    5. If the field contains only the host name (for example, loginsight-node-1) and not FQDN (for example, loginsight-node-1.vrack.vsphere.local), ignore this error in the pre-check validation.

  • When there is no workload domain associated with vRealize Automation, the VRA VM NODES CONSISTENCY CHECK upgrade precheck fails

    This upgrade precheck compares the content in the logical inventory on the SDDC Manager and the content in the vRealize Lifecycle Manager environment. When there is no associated workload domain, the vRealize Lifecycle Manager environment does not contain information about the iaasagent1.rainpole.local and iaasagent2.rainpole.local nodes. Therefore the check fails.

    Workaround: None. You can safely ignore a failed VRA VM NODES CONSISTENCY CHECK during the upgrade precheck. The upgrade will succeed even with this error.

  • Cluster level upgrade is not available if the workload domain has a faulty cluster

    This issue occurs if any host or cluster in the workload domain is in an error state. 

    Workaround: Remove the faulty host or cluster from the workload domain. The cluster level upgrade option is then available for the workload domain.

  • The vRealize Operations upgrade fails at the vRealize upgrade prepare backup step

    When the vRealize Operations cluster is used intensively, the process of taking it offline in order to prepare snapshots as a backup can take a long time and may therefore hit a timeout in SDDC Manager.

    Workaround: Prepare the backup manually following the steps:

    1. Take vRealize Operations cluster offline
      1. Log in to the master node vRealize Operations Manager administrator interface of your cluster.
      2. On the main page, click System Status. Click Take Offline under Cluster Status.
      3. Wait until all nodes in the analytics cluster are offline.
    2. Take a snapshot of each node so that you can roll the update back if a failure occurs.
      1. Log in to the Management vCenter Server.
      2. Take a snapshot of each node in the cluster. Right-click the virtual machine and select Snapshot > Take Snapshot. Use vROPS_LCM_UPGRADE_MANUAL_BACKUP as a prefix in the snapshot name for each virtual machine.
      3. Once this is done, retry the vRealize Operations upgrade through the SDDC Manager UI.
  • During the upgrade of VMware Cloud Foundation bundle1, the upgrade UI screen does not update or auto refresh, even though the upgrade is successful

    The upgrade screen does not refresh and remains stuck at the SDDC-MANAGER-UI service upgrade until you refresh the upgrade UI screen.

    Workaround: When the upgrade is in progress and remains stuck at the SDDC-MANAGER-UI service upgrade for a long time, check the status of the upgrade with the following API. If the status is Success, refresh the upgrade UI screen. After the UI refresh, you should see all the services upgraded successfully.

    vcf@sddc-manager [ ~ ]$ curl localhost/lcm/upgrades/completed | json_pp

    The expected output contains "upgradeStatus" : "COMPLETED_WITH_SUCCESS".

  • If an NSX Edge node is removed after the NSX-T upgrade is initiated through Lifecycle Manager, the upgrade may hang

    Do not modify the NSX-T nodes (host, edge) after starting an upgrade. If you do, then the upgrade could hang.

    Workaround: Log in to NSX Manager as an admin and click System > Upgrade. Refresh the page. This resets the NSX-T upgrade coordinator with the latest inventory.

  • During an upgrade, the SDDC Manager UI service upgrade fails due to upgrade timeout

    The SDDC Manager UI service upgrade fails during the initial attempt and all retry attempts.

    Workaround: Restart the following services related to the SDDC Manager UI, wait for some time, and then check the status of the services.

    Run the following commands to restart the UI services:

    root@sddc-manager [ ~ ]# systemctl restart sddc-manager-ui-app

    root@sddc-manager [ ~ ]# systemctl restart sddc-manager-ui-db

    Run the following commands to check the status of the UI services:
    root@sddc-manager [ ~ ]# systemctl status sddc-manager-ui-app

    root@sddc-manager [ ~ ]# systemctl status sddc-manager-ui-db

    After the services are active, refresh the SDDC Manager URL and retry the failed upgrade from the Upgrade/Patches tab in the management domain.

vRealize Integration Known Issues
  • vRealize Operations deployment fails when vRealize Operations appliances are in a different subdomain

    When you deploy vRealize Operations, you provide FQDN values for the vRealize load balancer and nodes. If these FQDNs are in a different domain than the one used during the initial bring-up, the deployment may fail.

    Workaround: To resolve this failure, add the vRealize Operations domain to the configuration in the vRealize Log Insight VMs.

    1. Log in to the first vRealize Log Insight VM.
    2. Open the /etc/resolv.conf file in a text editor, and locate the following lines:
      nameserver 10.0.0.250
      nameserver 10.0.0.250
      domain vrack.vsphere.local
      search vrack.vsphere.local vsphere.local 
    3. Add the domain used for vRealize Operations to the search line (the last line above). See the example after this list.
    4. Repeat on each vRealize Log Insight VM.
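
    For example, if the vRealize Operations FQDNs use the domain rainpole.local (a placeholder domain used elsewhere in these notes), the edited line would look similar to:

      search vrack.vsphere.local vsphere.local rainpole.local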
  • The vRealize port group is configured with the incorrect teaming policy.

    The vRealize network is configured with the Route based on originating virtual port load balancing policy. This can cause uneven traffic distribution between the physical network interfaces.

    Workaround: Manually update the NIC teaming policy in the vCenter Server UI:

    1. Log in to vCenter Server.
    2. Navigate to Networking.
    3. Locate the distributed switch in the management vCenter Server.
    4. Right-click the vRack-DPortGroup-vRealize port group and select Edit Settings.
    5. Select Teaming and failover.
    6. Change Load balancing from Route based on originating virtual port to Route based on physical NIC load.
    7. Click OK and verify that the operation completes successfully.
  • The password update for vRealize Automation and vRealize Operations Manager may run indefinitely or may fail when the password contains the special character "%"

    Password management uses the vRealize Lifecycle Manager API to update the passwords of vRealize Automation and vRealize Operations Manager. When the special character "%" is present in the SSH, API, or Administrator credentials of the vRealize Automation and vRealize Operations Manager users, the vRealize Lifecycle Manager API hangs and does not respond to password management. After a five-minute timeout, password management marks the operation as failed.

    Workaround: Retry the password update operation without the special character "%". Ensure that the passwords for all other vRealize Automation and vRealize Operations Manager accounts do not contain the "%" special character.

  • Adding a node to the vRealize Operations analytics cluster fails

    Expanding a vRealize Operations analytics cluster can fail if the vRealize Operations install bundle is not downloaded. 

    Workaround:
    1. Perform rollback on the failed expand operation.
    2. Download the vRealize Operations install bundle.
    3. Retry the task.

Networking Known Issues
  • Platform audit for network connectivity validation fails

    The vSwitch MTU is set to the same MTU as the VXLAN VTEP MTU. However, if the vSAN and vMotion MTUs are set to 9000, vmkping fails.

    Workaround: Modify the nsxSpecs settings in the bring-up JSON by setting VXLANMtu to a jumbo MTU, because the vSwitch is configured with the VXLAN MTU value. This prevents the error seen in the platform audit.
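
    As an illustration only, the relevant entry in the bring-up JSON would look similar to the following; the surrounding structure of the nsxSpecs block and the value format are assumptions that depend on your deployment parameter file:

    "nsxSpecs": {
      "VXLANMtu": 9000
    }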

  • NSX Manager is not visible in the vSphere Web Client.

    In addition to NSX Manager not being visible in the vSphere Web Client, the following error message displays in the NSX Home screen: "No NSX Managers available. Verify current user has role assigned on NSX Manager." This issue occurs when vCenter Server is not correctly configured for the account that is logged in.

    Workaround: To resolve this issue, follow the procedure detailed in Knowledge Base article 2080740 "No NSX Managers available" error in the vSphere Web Client.

  • East-west traffic between workloads behind different Tier-1 routers is impacted when the communication happens over their private (non-NAT) IP addresses

    Starting with NSX-T 2.4.2, the SNAT rule between the Tier-0 and Tier-1 routers takes effect, causing the traffic to be SNATed twice: once when the traffic egresses toward the destination, and again when the traffic returns from the destination. This leads to the workload dropping the traffic. The issue is seen in any NSX-T traffic matching this pattern.

    Workaround: Follow the steps in https://kb.vmware.com/s/article/71363.

SDDC Manager Known Issues
  • Unable to delete VI workload domain enabled for vRealize Operations Manager from SDDC Manager.

    Attempts to delete the vCenter adapter also fail, and return an SSL error.

    Workaround: Use the following procedure to resolve this issue.

    1. Create a vCenter adapter instance in vRealize Operations Manager, as described in Configure a vCenter Adapter Instance in vRealize Operations Manager.
      This step is required because the existing adapter was deleted by the failed workload domain deletion.
    2. Follow the procedure described in Knowledge Base article 56946.
    3. Restart the failed VI workload domain deletion workflow from the SDDC Manager interface.
  • Some APIs display "404 Not Found" error in the Developer Center UI

    In the SDDC Manager Developer Center, some APIs return the "404 Not Found" error.

    Workaround: Retrieve the API details from https://code.vmware.com/apis/723 and use curl commands to run the APIs.
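
    As a sketch only, an API can be invoked with curl as shown below; the endpoint path (/v1/hosts) and the authentication shown are illustrative assumptions, so check the API reference above for the exact paths, parameters, and authentication scheme.

    # Query the hosts API and pretty-print the JSON response
    curl -k -u '<username>:<password>' -H 'Accept: application/json' https://<sddc-manager-fqdn>/v1/hosts | json_pp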

  • The APIs for managing a host are missing the input specifications box in the Developer Center UI

    The specifications file that is generated using Swagger is not completely transformed by the developer center library.  

    Workaround: Use the documentation, https://code.vmware.com/apis/723 to determine the input parameters.

Workload Domain Known Issues
  • Adding host fails when host is on a different VLAN

    A host add operation can sometimes fail if the host is on a different VLAN.

    Workaround:

    1. Before adding the host, add a new portgroup to the VDS for that cluster.
    2. Tag the new portgroup with the VLAN ID of the host to be added.
    3. Add the Host. This workflow fails at the "Migrate host vmknics to dvs" operation.
    4. Locate the failed host in vCenter, and migrate the vmk0 of the host to the new portgroup you created in step 1.
      For more information, see Migrate VMkernel Adapters to a vSphere Distributed Switch in the vSphere product documentation.
    5. Retry the Add Host operation.
       

    NOTE: If you later remove this host, you must also manually remove the port group if it is not being used by any other host.

  • NSX Manager for VI workload domain is not displayed in vCenter

    Although NFS-based VI workload domains are created successfully, the NSX Manager VM is not registered in vCenter Server and is not displayed in vCenter.

    Workaround: To resolve this issue, use the following procedure:

    1. Log in to NSX Manager (http://<nsxmanager IP>).
    2. Navigate to Manage > NSX Management Service.
    3. Un-register the lookup service and vCenter Server, then re-register them.
    4. Close the browser and log in to vCenter.
  • The Add Cluster operation fails with an "Insufficient Hosts" error for VMware vMotion and VMware vSAN

    Adding a cluster to a workload domain fails if the number of hosts in the workload domain exceeds the number of IP addresses assigned to the network pool for the hosts.

    Workaround:

    Add more IP addresses to the network pool and then retry the add cluster operation. Alternatively, decommission the hosts that exceeded the IP address allocation in the network pool, commission them back with a new network pool, and retry the add cluster operation.

  • Adding a cluster to an NSX-T workload domain fails with the error message "Invalid parameter: {0}"

    If you try to add a cluster to an NSX-T workload domain and the operation fails with the error message "Invalid parameter: {0}", the subtask that creates logical switches has failed. This is likely because artifacts from previously removed workload domains and clusters conflict with the new switch creation process.

    Workaround: Delete the hosts from NSX-T Manager and then delete the stale logical switches. The logical switches are created in the following pattern:

    ls-<uuid>-management, ls-<uuid>-vsan, ls-<uuid>-vmotion

    The UUID remains the same for all three logical switches and the logical ports are shown as "0" (zero). You must delete these logical switches.

  • Unable to delete cluster from NSX-V workload domain

    If a cluster contains an unresponsive host, it cannot be deleted.

    Workaround: To resolve this issue, perform the following steps:

    1. On the SDDC Manager, navigate to Inventory > Workload Domains > [workload domain] > [cluster] and try deleting the cluster in order to identify the unresponsive host.
    2. Using SSH, log in to the SDDC Manager VM to obtain the workflow ID.
      Run curl -s http://localhost:7200/domainmanager/workflows to return a JSON listing all workflows.
    3. Using the workflow ID, retrieve the workflow input.
      curl -s http://localhost:7200/domainmanager/internal/vault/<workflow-id> \
      -XGET > remove_cluster_input.json
    4. Edit the remove_cluster_input.json file by removing the entry referencing the unresponsive host.
      1. Find the reference to the host under the heading:
        "RemoveClusterEngine____2__RemoveClusterFromNsx____0__NsxtRemoveCluster____0__removeNsxtModel"
      2. Delete the entire entry for the problematic host.
      3. Save the edited file as remove_cluster_input_deadhost_removed.json.
    5. Update the workflow with the new file.
      curl -s http://localhost:7200/domainmanager/internal/vault/<workflow-id> \
      -XPUT -H "Content-type: text/plain" -d @remove_cluster_input_deadhost_removed.json
    6. Return to SDDC Manager and repeat step 1.
      It will fail at the logical switches deletion task.
    7. In NSX Manager, clear the host as follows:
      1. Navigate to NSX > Nodes > Transport Nodes and delete the same host as a transport node.
      2. Navigate to NSX > Nodes > Hosts and delete the same host.
    8. Return to SDDC Manager and repeat step 1. The cluster is deleted.
       
  • Removal of a dead host from an NSX-T workload domain fails. Subsequently, the removal of the workload domain fails

    This issue is seen when the workload domain is created successfully but one of the hosts is dead. In some cases where the creation of an NSX-T workload domain fails, the subsequent attempt to delete the dead host fails as well. The host is disconnected from vCenter Server, so you need to reconnect it manually.

    Workaround:

    If you are unable to decommission a host, perform the following steps:

    1. Log in to NSX-T Manager.
    2. Identify the dead host under Fabric->Nodes->Transport Nodes.
    3. Edit the transport node and select the N-VDS tab.
    4. Under Physical NICs, assign vmnic0 to the uplink.
    5. To delete the workload domain, retry the failed domain deletion task from SDDC Manager.

  • You cannot add a cluster or a host to an NSX-T workload domain that has a dead host

    If one of the hosts in the workload domain goes dead and you try to remove it, the task fails, and that host is set to an inactive state without an option to forcefully remove it. In this condition, if you try to add a new cluster or a host to the workload domain, the task runs for a long time and eventually fails.

    Workaround: Bring the dead host back to a normal state, after which you can add a cluster or a host.

  • NSX-T workload domain creation fails at the 'Join vSphere hosts to NSX-T Fabric' task

    If an NSX-T workload domain creation workflow fails at the "Join vSphere hosts to NSX-T Fabric" task, it is because NSX-T could not be installed on one of the hosts. The installation failure can be seen in the NSX-T Manager UI.

    This happens intermittently. When it happens, the failure is not detected until a later point in the workflow, and the task eventually fails after a long wait.

    Workaround: Log in to NSX-T Manager, check which host has the NSX-T installation failure, and delete it from the fabric. Then create the workload domain again.

  • The deletion of the domain does not clean up the NSX-T VIBs

    This issue occurs when the SoS cleanup fails to remove the NSX-T VIBs.

    Workaround:

    To remove the NSX-T VIBs manually:

    1. Detect the NSX-T VIBs present on the ESXi host with the following command:

       esxcli software vib list | grep nsx

    2. Remove each NSX VIB by using the following command:

       esxcli software vib remove -n <vib-name>

    Retry step 2 if you see a dependency error.
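
    If several NSX VIBs are present, a small loop such as the following can avoid repeating step 2 by hand. This is a sketch only; it assumes the first column of esxcli software vib list is the VIB name and that dependency errors are resolved by simply re-running the loop.

    # Remove every VIB whose name contains "nsx"; re-run the loop if removals fail due to dependencies
    for vib in $(esxcli software vib list | grep -i nsx | awk '{print $1}'); do
      esxcli software vib remove -n "$vib"
    done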

  • The SoS clean-up does not clean hosts used in an NSX-T workload domain

    While removing the NSX-T VIBs from the ESXi hosts, the SoS cleanup fails at the network cleanup stage.

    Workaround: Follow step 4 in the Remove a Host From NSX-T or Uninstall NSX-T Completely section in the VMware NSX-T Installation Guide.

  • VI workload domain creation fails and restart does not work

    The VI domain creation fails with the "Image with product type vCenter with version 6.7.0-13010631 and image type INSTALL is not found" error. When you upload the install bundle and restart the task, the restart also fails with the same error.

    Workaround: Create a new workload domain.

  • If you alter or modify elements created by VMware Cloud Foundation (such as port groups or segments), workload domains may not be created correctly

    Port groups, logical switches, and any other elements created by VMware Cloud Foundation should not be modified or deleted.

    Workaround: Contact VMware Global Support.

  • A vCenter Server on which certificates have been rotated is not accessible from a Horizon workload domain

    Cloud Foundation does not support the certificate rotation on the Horizon workload domains.

    Workaround: Refer to https://kb.vmware.com/s/article/70956.

  • Deploying partner services on an NSX-T workload domain displays an error

    Deploying partner services on an NSX-T workload domain such as McAfee or Trend displays the “Configure NSX at cluster level to deploy Service VM” error.

    Workaround: Attach the Transport node profile to the cluster and try deploying the partner service. After the service is deployed, detach the transport node profile from the cluster.

  • Add cluster task fails

    Adding a cluster fails while validating the network connectivity of the hosts. This issue occurs when you add a cluster to an NSX-T workload domain: the inventory details of the new cluster are inserted into the database, but the cluster is not created in the other parts of the Cloud Foundation system (for example, vCenter Server and NSX-T), and the task fails while validating the host network connectivity. If you then try to delete the partially created cluster, that task fails at the "Gather input for NVDS to VDS migration from vCenter and NSX-T manager" task.

    Workaround:

    1. Run the following command:

       curl -k -X GET http://localhost/inventory/clusters

    2. From the response, find the <cluster-id> of the cluster that needs to be deleted in the backend. See the example after this list.

    3. Delete the cluster from the inventory:

       curl -k -X DELETE http://localhost/inventory/extensions/vi/clusters/<cluster-id>

    4. Decommission the hosts that are part of the cluster.

    5. Perform an SoS cleanup for the hosts that are part of the cluster.

    6. Commission the hosts that are part of the cluster.
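
    For step 2, the response can be pretty-printed and filtered so that each cluster's name and ID are easier to match up; json_pp is already used elsewhere on the SDDC Manager VM, and the field names in the grep pattern are assumptions about the response format:

    # Show only the name and id fields from the cluster inventory response
    curl -s -k -X GET http://localhost/inventory/clusters | json_pp | grep -E '"(id|name)"'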

  • If the witness ESXi version does not match the host ESXi version in the cluster, a vSAN cluster partition may occur

    The vSAN stretched cluster workflow does not check the ESXi version of the witness host. If the witness ESXi version does not match the host version in the cluster, a vSAN cluster partition may occur.

    Workaround:

    1. Upgrade the witness host manually to the matching ESXi version by using the vCenter Server VUM functionality.
    2. Alternatively, replace or deploy a witness appliance that matches the ESXi version.

  • vSAN partition and critical alerts are generated when the witness MTU is not set to 9000

    If the MTU of the witness switch in the witness appliance is not set to 9000, a vSAN stretched cluster partition may occur.

    Workaround: Set the MTU of the witness switch in the witness appliance to 9000.
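
    As an illustration only, if the witness switch is a standard vSwitch, its MTU can be set from the ESXi shell of the witness appliance as follows; the switch name witnessSwitch is a placeholder for the actual witness switch name in your appliance:

    # Set the witness vSwitch MTU to 9000 and verify the change
    esxcli network vswitch standard set --vswitch-name=witnessSwitch --mtu=9000
    esxcli network vswitch standard list --vswitch-name=witnessSwitch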

  • The “Got bad CSRF token; invalid CSRF token” error message appears

    The options of the csurf middleware are not configured properly. When you try to log in again after being logged out for almost two hours, the CSRF token error appears. This issue occurs because the old CSRF cookies conflict with the newly created session cookies.

    Workaround: To reset the cookies for the CSRF tokens, log out of the SDDC Manager Dashboard and log back in.

  • The removal of an NSX-T workload domain host fails during the transport node deletion phase

    While deleting a transport node, NSX-T may display a Timeout Exception and the host is not deleted.

    Workaround:

    1. Log in to the NSX-T UI.
    2. Select the transport nodes in the failed state.
    3. Click Remove NSX.
    4. Select Force Delete.
    5. Click OK.
    6. Restart the failed workflow on the SDDC Manager Dashboard.
  • When you select NSXT_CONTROLLER in the supported entityTypes drop-down, an empty list is returned

    NSX-T controllers are not supported by the NSX-T version included in the current release of Cloud Foundation. If this is selected on the UI, it returns no results.

    Workaround: None. This is a cosmetic issue. It will not impact any workflow or functionality.

  • Adding a cluster to an NSX-T workload domain fails

    After you trigger the Add Cluster workflow for an NSX-T domain, if the workflow fails before creating the cluster in vCenter Server, the Remove Cluster workflow for that cluster also fails. This issue occurs only when the Add Cluster task fails before it adds the cluster to vCenter Server.

    Workaround:

    1. Delete the cluster from the inventory:

       curl -k -X DELETE http://localhost/inventory/extensions/vi/clusters/<cluster-id>

    2. Trigger a new workflow to add the hosts to form a new cluster.

  • When you create two NSX-T workload domains, the transport nodes from the second workload domain stay in the not-configured state

    This issue occurs when multiple NSX-T workload domains are created with the same cluster name. The creation of the workload domain may fail during cluster configuration.

    Workaround: Assign unique names to each cluster in the domain.

  • The certificate rotate operation on the second NSX-T domain fails

    Certificate rotation works on the first NSX-T workload domain in your environment, but fails on all subsequent NSX-T workload domains.

    Workaround: None

  • Add cluster operation fails

    Adding a cluster to a workload domain with 50 or more VMware ESXi nodes may fail.

    Workaround: Contact VMware Support for help.

Security Operations Known Issues
  • Unable to perform password management operations from the SDDC Manager Dashboard

    If the Cloud Foundation Operations Manager component is restarted while a password management operation is in progress, the password management operation ends up in an inconsistent state. You will not be able to perform any password management operations until you cancel the inconsistent operation.

    Workaround:

    1. SSH in to the SDDC Manager VM as the vcf user.
    2. Type su to switch to the root account.
    3. Run the following command:
      curl "http://localhost/security/password/vault/transactions" -H "privileged-username: <privileged username>" -H "privileged-password: <privileged password>" | json_pp
      This returns information about the inconsistent operation. For example:
      [
      {
         "transactionStatus" : "INCONSISTENT",
         "workflowId" : "f6548e76-f3e9-4033-801d-36ccae893672",
         "transactions" : [
            {
               "oldPassword" : "AX1are276!",
               "username" : "administrator@vsphere.local",
               "entityName" : "psc-1.vrack.vsphere.local",
               "id" : 123,
               "newPassword" : "x%5N6H1A^p%wJ4N",
               "entityType" : "PSC",
               "credentialType" : "SSO",
               "workflowId" : "0daaac30-c88d-4407-8bf3-c791541ebbae",
               "transactionStatus" : "INCONSISTENT",
               "timestamp" : "2019-04-09T09:00:29.628+0000",
               "transactions" : [
                 
               ]
            }
         ],
         "type" : "ROTATE",
         "id" : 1
      }
      ]
    4. Using the ID of the inconsistent transaction ("id" : 1 in the example above), run the following:
      curl -X DELETE http://localhost/security/password/vault/transactions/<ID>|json_pp
      This returns something like the following:
      {
          "transactionId":1,
          "workflowId":"f6548e76-f3e9-4033-801d-36ccae893672",
          "status":"USER_CANCELLED"
      }
  • Addition of members from PKS UAA to Harbor library fails when the certificate verification is enabled

     This issue occurs when Harbor does not honor the certificate chain under System Settings > Registry Root Certificate.

    Workaround:

    1. SSH in to the SDDC Manager VM as the vcf user.

    2. Run the following command. Make sure to update the password of the admin user and the Harbor URL:

       curl -k -H 'Content-type: application/json' -u admin:"<admin_password>" -XPUT https://harbor.vrack.vsphere.local/api/configurations -d '{"uaa_verify_cert":"false"}'

    Harbor is in the UAA authentication mode and uses members from PKS UAA.

    To create a user in UAA:

    1. Connect through SSH to the Ops Manager appliance.
    2. Run the following commands:

       uaac target https://pks.vrack.vsphere.local:8443 --skip-ssl-validation
       uaac token client get admin
       uaac user add <user-name> --emails <email>

Known Issues Affecting Service Providers
  • Domain manager workflows fail when using SDDC Manager to manage an API-created cluster or domain.

    You cannot manage clusters or workload domains created with an API through the SDDC Manager Dashboard. This includes adding and removing hosts, workload domains, and clusters.

    Workaround: None. Clean up the failed workflow and try again using an API.

Multi-Instance Management Known Issues
  • Federation creation information not displayed if you leave the Multi-Instance Management Dashboard

    Federation creation progress is displayed on the Multi-Instance Management Dashboard. If you navigate to another screen and then return to the Multi-Instance Management Dashboard, progress messages are not displayed. Instead, an empty map with no Cloud Foundation instances is displayed until the federation is created.

    Workaround: Stay on the Multi-Instance Management Dashboard until the task is complete. If you have navigated away, wait for around 20 minutes and then return to the dashboard, by which time the operation should have completed.

  • When a new controller is added to a federation, the status turns red for all the members

    This issue may occur when a controller leaves the federation and a new controller joins it. The new controller processes old records that were on the message bus before it joined. Some records may be processed incorrectly, and the controller member status turns red.

    Workaround: The controller member must leave the federation and join again.

  • The federation creation progress is not displayed

    While federation creation is in progress, the SDDC Manager UI displays the progress on the multi-site page. If you navigate to another screen and come back to the multi-site page, the progress messages are not displayed. An empty map with no VMware Cloud Foundation instances is displayed until the federation creation process completes.

    Workaround: None