VMware Integrated OpenStack 7.2.1 Release Notes

VMware Integrated OpenStack 7.2.1 | 07 JUL 2022 | Build OVA 20003386, Patch 20003387

Check for additions and updates to these release notes.

About VMware Integrated OpenStack

VMware Integrated OpenStack greatly simplifies deploying an OpenStack cloud infrastructure by streamlining the integration process. VMware Integrated OpenStack delivers out-of-the-box OpenStack functionality and an easy configuration workflow through a deployment manager that runs as a virtual appliance in vCenter Server.

What's New

New Features and Enhancements:

Enhanced Nova performance.

Previous releases had several issues in the nova-compute service that significantly degraded its performance. With these issues addressed, a nova-compute service can support as many compute hosts as a vCenter compute cluster allows. Service start and restart times and instance operations (such as create and delete) are also significantly faster, especially when the associated compute cluster already contains a large number of Nova instances or networks.
Extended attached volume.

This feature is enabled by default. Users can extend the size of an attached volume from the UI or with the OpenStack command 'cinder extend ${volume_id} ${new_volume_size}'.
Enhanced Health Check Function in the viocli command.
- Added a certificate expiration date check for VIO, LDAP, vCenter, and NSX.
- Added a Glance image location format check.
- Added a VIO service desired status check.
- Added a nova-compute pod status check.
- Added a vCenter host resource pressure check.
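
As an illustration of what a certificate expiration check involves, the following minimal Python sketch computes the number of days left before a certificate's notAfter date. It is not the viocli implementation; the function names and the 30-day warning threshold are assumptions chosen for the example.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's notAfter date.

    `not_after` uses the "Jan  5 09:34:43 2018 GMT" form that Python's ssl
    module reports for certificates.
    """
    # cert_time_to_seconds converts the certificate time to seconds since the epoch (UTC).
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now.timestamp()) / 86400.0


def expiring_soon(not_after: str, now: datetime, warn_days: float = 30.0) -> bool:
    """Flag certificates that expire within `warn_days` (hypothetical threshold)."""
    return days_until_expiry(not_after, now) < warn_days


if __name__ == "__main__":
    today = datetime.now(timezone.utc)
    print(expiring_soon("Jan  5 09:34:43 2018 GMT", today))
```

A negative result from days_until_expiry means the certificate has already expired.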
Trunk Subport Dual Stack support.

Every trunk has a parent port and can have any number of subports. Each subport can support dual-stack (v4 and v6) IP addresses.
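
To check whether a subport is dual-stack, one option is to inspect the IP versions of its fixed IPs. The sketch below assumes the port is available as a dictionary list in the shape Neutron uses for fixed_ips (entries with an ip_address key); the helper names are hypothetical.

```python
import ipaddress


def ip_versions(fixed_ips):
    """Return the set of IP versions (4 and/or 6) found in a port's fixed_ips."""
    return {ipaddress.ip_address(entry["ip_address"]).version for entry in fixed_ips}


def is_dual_stack(fixed_ips):
    """A port is treated as dual-stack when it has both an IPv4 and an IPv6 address."""
    return ip_versions(fixed_ips) == {4, 6}


if __name__ == "__main__":
    subport_ips = [
        {"subnet_id": "subnet-v4", "ip_address": "192.0.2.10"},
        {"subnet_id": "subnet-v6", "ip_address": "2001:db8::10"},
    ]
    print(is_dual_stack(subport_ips))
```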

Upgrade to Version 7.2.1

To upgrade from a previous 7.x version of VIO, use the viocli patch command. For more information, see the product installation guide.
VIO 7.2.1 does not support direct upgrade from 6.x. Upgrade to 7.2 or 7.2.0.1 first.

Compatibility

Refer to VMware Product Interoperability Matrices for details about the compatibility of VMware Integrated OpenStack with other VMware products.

Deprecation Notices

The following networking features have been deprecated and will be removed in the next VIO release:
- The NSX Data Center for vSphere driver for Neutron.
Neutron FWaaSv2 will be deprecated in a future version.

Resolved Issues

Tier1 gateways could not roll back completely during large-scale MP2P migration.

Some tier1 gateways could not roll back completely, and their deletion status remained in progress during large-scale MP2P migration. The unsuccessful rollback might have been caused by an error during migration.
Live migration fails in a specific scenario with the error "Out of memory".

This issue occurs when the same image is used to boot instances with different flavors on two different compute clusters. For example, instance1 is booted from imageA with flavor1 (5 GB root disk) on compute01, instance2 is booted from imageA with flavor2 (1 GB root disk) on compute02, and the root disk usage in instance1 is larger than 1 GB. An attempt to live migrate instance1 from compute01 to compute02 fails.

If the destination uses a VMFS datastore, the following message appears in hostd.log:

reason = "Failed waiting for data. Error bad0006. Limit exceeded."

If the destination uses a vSAN datastore, the following message appears in hostd.log:

reason = "Failed waiting for data. Error bad0014. Out of memory."
Deactivate TLSv1 and TLSv1.1 for nova vspc service.

Only TLS v1.2 is enabled for nova vspc service.
Unable to start the VIO deployment.

A customer stopped or deactivated a VIO service and forgot to bring it back up. In this case, "mode: stop" was added to that service in osdeployment. When the customer tried to start the services with the command "viocli start services", startup stalled at the "nova-cell-setup-xxxxx" pod and the deployment status remained at STARTING.
Fail to increase the size of a volume which is attached to a server.

Customers tried to extend an attached volume with the OpenStack command 'cinder extend ${volume_id} ${new_volume_size}', but the command failed with the error message "ERROR: Policy doesn't allow volume:extend_attached_volume to be performed. (HTTP 403)". Extending an attached volume was not supported in previous releases.
Make "keystone-shib-key" and "keystone-idp-saml2-metadata" consistent across VIO restore.

Previous versions of VIO did not back up certificate and key data such as "keystone-shib-key" and "keystone-idp-saml2-metadata"; the documentation only provided instructions for backing them up manually. VIO 7.2.1 provides consistent certificates and keys, including "keystone-shib-key" and "keystone-idp-saml2-metadata", after a restoration.
Failed to create volume from a volume snapshot.

In a specific case, users fail to create a volume from a volume snapshot. Suppose a user has a volume named “volume-0” whose volume type is not “__DEFAULT__”, and creates a volume snapshot named “volume-snapshot” from “volume-0”. An attempt to create a volume from “volume-snapshot” fails.
Data loss due to restarting of all mariadb pods.

In specific scenarios, the database cluster could become partitioned. This could lead to data loss if the database node with stale data becomes the leader node after a restart. VIO 7.2.1 adds a fix that prevents database partitioning.
The Kubernetes ingress controller used by the VMware Integrated OpenStack Manager Web UI uses an outdated Nginx 1.15.10 version.

Nginx is now updated to 1.20.1 for both fresh installations and patched setups. For 7.2 deployments upgraded from previous 7.x releases, follow kb.vmware.com/s/article/88012.
Fail to enable Ceilometer when there are 10k Neutron tenant networks.

When there is a large number of resources, such as networks created in vSphere, VIO generates many custom resources (CRs) for those objects. If the number of CRs is too large, the VIO Manager Web UI fails on the backend API because the response data is too large for HTTP requests.

Known Issues

NSX N/S cutover migration fails with the following error: NSX-V edge edge-XX is used more that one times in input mapping file

This issue occurs due to a problem with router bindings management in the Neutron NSX-V plugin. In some cases, the same edge can be mapped to multiple neutron routers. When this occurs, only one of the neutron routers mapped to a given appliance will be functional; the other routers will not be able to forward any traffic on the NSX-V backend. This occurs only with distributed routers.
Workaround:

This problem can be avoided in VIO 7.2.1 with the following instructions:
1. Identify the neutron routers that use the same edge appliance via the NSX admin appliance. There will be more than one row for a given edge appliance.
2. Check the edge appliance on the NSX-V backend: the neutron router id associated with that appliance will be encoded in the edge name.
3. Delete and recreate the other neutron NSX-V routers that are using the same appliance. These routers will be non-functional in any case as no edge appliance is currently serving them.
4. Retry the migration.
Alternatively:
1. Identify the neutron routers that use the same edge appliance via the NSX admin appliance. There will be more than one row for a given edge appliance.
2. Go to NSX V2T migration UI, download the edge mapping file uploaded by the VIO migration tool. This will be a JSON file.
3. For the specific edge appliance, remove any mapping for neutron routers different from the ones specified in the edge appliance.
4. Upload the modified JSON file and retry the migration. The routers removed from the list will still be migrated. The fact that they are removed from the mapping file means that temporary connectivity will not be established for them during the migration, but since these routers are already non-functional this will not cause any additional downtime.
There is a large number of unused NSX services.

Every time there is an event which triggers regeneration of firewall rules, like updating a rule's firewall policy, the routine that manages NSX Services will generate a random identifier for the rule's service, thus leading to the creation of an additional service.

Workaround: Remove stale service entries.

The easiest approach is to look for services owned by Neutron and then delete all those that are not used by firewall rules. Use the search API to retrieve them:

/policy/api/v1/search?resource_type:Service%20AND%20tags.scope:os-router-firewall

The above query returns services created by Neutron. Be aware that API responses are paginated. Iterate over the results and delete all services. Deletion fails for services that are in use by a firewall rule and succeeds for unused ones.
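
The iterate-and-delete loop could be sketched as follows in Python. This is a minimal outline, not a supported tool: fetch_page and delete_service are hypothetical callables that wrap the authenticated HTTP calls to NSX, and the "results" and "cursor" response fields are assumptions about the shape of the paginated search response.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple


def iter_results(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield every item from a cursor-paginated search.

    `fetch_page(cursor)` is a hypothetical callable that runs the search query
    and returns one page as a dict. The field names "results" and "cursor" are
    assumptions about the response format.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("results", [])
        cursor = page.get("cursor")
        if not cursor:
            break


def delete_unused_services(
    fetch_page: Callable[[Optional[str]], Dict],
    delete_service: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Try to delete every returned service; report which ones were deleted.

    `delete_service(service_id)` is a hypothetical callable that returns True
    when the deletion succeeds and False when it is rejected because the
    service is still referenced by a firewall rule.
    """
    # Collect all pages first so that deletions cannot disturb the paging.
    services = list(iter_results(fetch_page))
    deleted, in_use = [], []
    for service in services:
        target = deleted if delete_service(service["id"]) else in_use
        target.append(service["id"])
    return deleted, in_use
```

Collecting all pages before deleting is a deliberate choice in this sketch: it avoids changing the result set while the cursor is still being followed.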
Temporary loss of E/W and N/S connectivity for VMs attached to exclusive routers during migration.

During the “edge migration” phase, N/S connectivity is lost for networks uplinked to “exclusive” routers. E/W connectivity across neutron exclusive routers also stops working. Connectivity is restored at the beginning of the “host migration” phase. As a result, VMs might experience longer than expected downtime. This happens because NSX-T edges do not send a GARP to advertise the NSX-T edge MAC.

Workaround: None
After a subport is detached from its parent trunk, the corresponding network interface is still able to send/receive traffic.

When a subport is detached from its parent trunk port, even if the configuration is correctly updated in NSX, the interface attached to the subport is still able to send and receive traffic.

Workaround: Delete and re-create the subport.
Updating Metadata of an Image Reflects Error on UI.

Initially, multiple Glance datastores were configured, each pointing to a different vCenter, and images were created in VIO. If a Glance datastore configuration is then deleted, updating the metadata of existing images from Horizon results in an error on the UI.

Workaround:

Do not remove a Glance datastore configuration if images already exist in the VIO environment. If this issue occurs, add the Glance datastore configuration back.
Subports with dual stack, once removed from the trunk, cannot be reused as regular ports.

Workaround: None.
OpenStack network deletion fails and Neutron reports the following error: "An unexpected error occurred in the NSX Plugin: Backend segment deletion for neutron network 98d5c37d-74ea-4fd9-8d47-09b5a864f940 failed. The object was however removed from the Neutron database: Cannot delete segment as it still has VMs or VIFs attached".

In some circumstances, conflicts can occur in NSX when deleting a segment: NSX might report that a segment port is still attached even though it has been removed. Instead of retrying immediately, NSX waits for the next realization cycle, by default after 5 minutes. In the meantime, Neutron returns the error received from NSX, reports the deletion failure, and removes the record for the network from the Neutron DB.

Workaround: Normally no workaround is needed, since the segment will be removed from NSX within the next 5 minutes. Users can search for the segment on NSX by Neutron network ID and verify that it has actually been deleted after 5 minutes.
Limitations on extending attached volume
- If the attached volume type is "eagerZeroedThick", extending action will not zero out the extended disk.
- Extending attached volume does not support FCD backend.
- Multi-attached volume is not supported.
Workaround: None
Public API rate limiting is not available.

In VMware Integrated OpenStack 7.2, it is not possible to enforce rate limiting on public APIs.

Workaround: None. This feature will be offered in a later version.
OpenStack port security cannot be enforced on direct ports in the NSX-V Neutron plugin.

Enabling port security for ports with vnic-type direct can be ineffective. Security features are not available for direct ports.

Workaround: None.
Cannot log in to VIO if vCenter and NSX password contain $$.

If the VIO account configured for the underlying vCenter and NSX uses a password that contains "$$", VIO cannot complete authentication with vCenter and NSX. The OpenStack pods can run into CrashLoopBackOff.

Workaround: Use other passwords that do not contain "$$".
Users could not download the Glance image from the OpenStack CLI client.

When downloading an image from the OpenStack CLI, the following error occurs: "[Errno 32] Corrupt image download." This is because VIO stores the image as a VM template in the vSphere datastore by default. The md5sum value is not preserved between the VMDK and the VM template.
Workaround: The Glance image can be downloaded with the following configurations:
- The option vmware_create_template is false in the Glance configuration.
- The user creates the Glance image using the OpenStack CLI with the property "vmware_create_template=false".
Duplicate entry error in Keystone Federation.

After deleting the OIDC in Keystone Federation, if the same user tries to log in with OIDC, authentication fails with a 409 message.

Workaround: Delete the user either through Horizon or OpenStack CLI.

For example:

1. In Horizon, log in with an admin account.

2. Set the domain context with the federated domain.

3. On the user page, delete the user whose User Name column is None.

In the OpenStack CLI:

openstack user list --domain <federated domain name>

openstack user delete <user id> --domain <federated domain name>
The certificate needs to be CA signed and re-applied after restoration.

The certs secret, which contains the VIO private key and certificate, is currently not backed up. After a not-in-place restoration, the previously imported certificate is not present in the new deployment.

Workaround:

1. Save the certs secrets from the original deployment.

osctl get secret certs -oyaml > certs.yaml

2. After restoration, replace the "private_key" and "vio_certificate" values in certs secret with the data from step 1.

3. Stop and start the services.
Cannot create instances on a specific Nova-Compute node and the Nova-Compute log is stuck.

When you create an instance, it remains in the BUILD state and never succeeds. The nova-compute log contains only a few entries with no further information.

Workaround: Restart the nova-compute pod manually.
The FWaaS v2 rule is enforced on all downlink ports of a Neutron router, regardless of FWaaS bindings.

This behavior is specific to NSX-V distributed routers. For these routers, the NSX-V implementation is split between the PLR and the DLR. FWaaS rules run on the PLR, but downlink interfaces are on the DLR. Therefore, the firewall rules apply to all traffic going in and out of the downlink.

Workaround: For distributed routers, explicitly include source and target subnets to match the downlink subnet CIDR. Either make sure the firewall group applies to each port on the router or use a centralized router instead of a distributed router.
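
When writing such rules, it can help to verify that the source and destination CIDRs of a rule actually fall inside the downlink subnet. The following Python sketch uses only the standard ipaddress module; the function name and the example addresses are illustrative assumptions and are not part of VIO or Neutron.

```python
import ipaddress


def within_downlink(rule_cidr: str, downlink_cidr: str) -> bool:
    """Return True if `rule_cidr` is contained in (or equal to) `downlink_cidr`."""
    rule = ipaddress.ip_network(rule_cidr)
    downlink = ipaddress.ip_network(downlink_cidr)
    # subnet_of raises TypeError for mixed IPv4/IPv6 arguments, so compare versions first.
    return rule.version == downlink.version and rule.subnet_of(downlink)


if __name__ == "__main__":
    # A rule limited to the downlink subnet passes; a catch-all rule does not.
    print(within_downlink("10.0.1.0/24", "10.0.1.0/24"))
    print(within_downlink("0.0.0.0/0", "10.0.1.0/24"))
```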
Deactivating DRS on the edge cluster can trigger deletion of the resource pool used by VIO.

Deactivating DRS on the edge cluster can trigger deletion of the VIO resource pool, after which VIO can no longer manage edge appliances.

Workaround: Do not perform the following steps on the edge cluster:

1. Right-click on the edge cluster.

2. Deactivate DRS on the edge cluster.
After migrating from NSX-V to NSX-T, VMs are not able to access the Nova metadata service. New VMs created after the migration can access the Nova metadata service.

The VMs migrated from NSX-V have a static route for redirecting metadata traffic via the DHCP server. This configuration does not work on NSX-T, as NSX-T injects an on-link route for metadata access.

Workaround: Reboot the VM, or if you have access to the VM, remove the static route via the DHCP server and renew the DHCP lease so that the new lease will be provided by the NSX-T DHCP server with the appropriate route for accessing the Nova metadata service.
When using the "viocli update " command to update a CR, an error may occur if you enter a large integer as a value. For example, profile_fb_size_kb: 2097152.

Large integers will get converted to scientific notation in some cases by VIO helm charts.

Workaround: Add quotes around the large integer. For example, profile_fb_size_kb: "2097152".
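
The effect can be illustrated with a small Python sketch. It assumes that an unquoted number passes through a floating-point type and is then printed in exponent form, while a quoted value is carried through as a string. The render_value function is hypothetical and only mimics that behavior; it is not the VIO helm chart logic.

```python
def render_value(value):
    """Mimic a renderer that treats every unquoted number as a float.

    Strings (quoted values) are emitted unchanged; numbers are converted to
    float and printed in exponent notation, which is the assumed failure mode.
    """
    if isinstance(value, str):
        return value
    return format(float(value), "e")


if __name__ == "__main__":
    print(render_value(2097152))    # unquoted integer ends up in scientific notation
    print(render_value("2097152"))  # quoted value is preserved as written
```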
Snapshots on a controller node prevent some VIO operations.

The persistent volume on a controller node cannot be moved if a snapshot of the controller node exists. Therefore, taking a snapshot of a controller is not supported by VIO.

Workaround: Delete all snapshots on controller nodes.
Volumes created from images are always bootable by default.

If you include the --non-bootable parameter when creating a volume from an image, the parameter does not take effect.

Workaround: After the volume has been created, update it to be non-bootable.
With VIO NSX-V integration, OpenStack instances are unable to send and receive traffic when multiple compute clusters are defined.

The Neutron default security group, which allows DHCP traffic, is created when the neutron-server is first started. If compute clusters are added to the VIO deployment, they are not automatically added to the default security group.

Workaround: Use NSX admin utilities to rebuild the default firewall section as follows: nsxadmin -r firewall-sections -o nsx-update
V2T migrator job fails in API while migrating Neutron networks to NSX-T. An error like the following is reported: 2021-05-18 17:16:00,612 ERROR Failed to create a network:: Invalid input for operation: Segmentation ID cannot be set with transparent VLAN.

The NSX-V Neutron plugin allows VLAN transparency to be set on provider VLAN networks. This setting does not make much sense, as the provider network only uses a specific VLAN. The NSX-T Neutron plugin does not allow such a configuration.

Workaround: Unset VLAN transparency for the network and retry.
In some cases, API replay fails with an internal error while configuring router external gateways. The Neutron migrator job logs display an error like ERROR Failed to add router gateway with port : Request Failed: internal server error while processing your request.

This happens because the temporary Neutron server running inside the migrator job's pod checks for Tier-1 realization on NSX-T. In some rare cases, this realization might be extremely slow and time out. When this issue manifests, the temporary Neutron server logs (/var/log/migration/neutron-server-tmp.log) report an error like the following: 2021-05-07 10:36:31.909 472 ERROR neutron.api.v2.resource vmware_nsxlib.v3.exceptions.RealizationTimeoutError: LogicalRouter ID /infra/tier-1s/ was not realized after 50 attempts with 1 seconds sleep

Workaround: There is no workaround for this issue. In most cases retrying will be enough. If the issue occurs persistently, it will be necessary to troubleshoot NSX-T to find the root cause for slow realization.
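
The "50 attempts with 1 seconds sleep" in the log message describes a bounded polling loop. The Python sketch below shows the general pattern so the timeout is easier to reason about; it is not the vmware_nsxlib code, and the names wait_until_realized, is_realized, and RealizationTimeout are hypothetical.

```python
import time
from typing import Callable


class RealizationTimeout(Exception):
    """Raised when the resource is still not realized after the last attempt."""


def wait_until_realized(
    is_realized: Callable[[], bool],
    attempts: int = 50,
    sleep_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `is_realized` up to `attempts` times, sleeping between tries.

    Returns the attempt number on which the check succeeded. `is_realized` is
    a hypothetical callable that queries the backend for realization status.
    """
    for attempt in range(1, attempts + 1):
        if is_realized():
            return attempt
        if attempt < attempts:
            sleep(sleep_seconds)  # no sleep after the final failed attempt
    raise RealizationTimeout(
        f"not realized after {attempts} attempts with {sleep_seconds} seconds sleep"
    )
```

With the defaults shown, the loop gives up after roughly 50 seconds, which is why a slow backend surfaces as a timeout and why a simple retry often succeeds.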
In some cases, an OpenStack load balancer will go into ERROR state after adding a member to one of its pools.

This happens because the selected member is configured on the downlink of a router that is already attached to an NSX load balancer. In this case, the OpenStack driver is not able to reuse the existing LBS.

Workaround: Re-create the OpenStack load balancer using a VIP on a downlink network. Then associate a floating IP with the VIP port.
After the migration, there can be Neutron logical ports for non-existing load balancers.

After V2T migration, Neutron ports for load balancers in an error state are migrated, even if the corresponding load balancers are not migrated. The V2T migration process skips Octavia load balancers in an ERROR state. This is because load balancers in such a state might not be correctly implemented on NSX-T; in addition, the ERROR state is immutable, so you must delete these load balancers. The corresponding logical ports are however migrated.

Workaround: These ports can be safely deleted once the migration is completed. The issue can be completely avoided by deleting load balancers in the ERROR state before starting the migration.
Failure while removing a Neutron router gateway.

The error reported is "Cannot delete a router_gateway as it still has lb service attachment." However, there is no Octavia load balancer attached to any of the router's subnets; likewise, no Octavia load balancer is attached to the external network and has members among router downlinks.

Workaround: Prior to VIO 7.2, the only workaround is to detach the load balancer from the Tier-1 router on the backend. With VIO 7.2, an admin utility is provided to remove stale load balancers from NSX-T.

List orphans: nsxadmin -r lb-services -o list-orphaned

Delete orphans: nsxadmin -r lb-services -o clean-orphaned
The console and log of recovered Nova instances cannot be shown on Horizon.

After the disaster recovery procedure, log in to Horizon at the target site, navigate to Project->Compute->Instances, and view the log and console of each recovered instance. Both are empty.

Workaround: Create a new Nova instance at the target site and then check each recovered Nova instance again. The log and console should now be displayed.
V2T migration fails during the N/S cutover at the "edge" stage. The message returned by the UI is "Transport zone not found". On the NSX-T manager instance where the migration-coordinator service is running, /var/log/migration-coordinator/v2t/cm.log shows that the failure occurs while creating the "migration transit LS" for a distributed edge in the Neutron backup pool (edge name starts with "backup-").

The V2T migrator tries to create a transit switch for the N/S cutover for each NSX-V distributed router. Therefore, it scans all distributed edges and performs this operation for each one. To succeed, the edge must have at least one downlink interface, so that the migrator can find the relevant transport zone. However, Neutron might keep some distributed edges in the "backup pool". These are unconfigured, ready-to-use edge appliances, which cause the V2T migrator to fail.

Workaround: From NSX-V, delete distributed edges in the Neutron backup pool. This has no effect on Neutron/NSX-T operations.
After setting a firewall group administratively DOWN (state=DOWN), the firewall group operational status is always DOWN, even after the firewall group admin state is brought back UP.

The neutron-fwaas service will ignore changing operational status on transitions that do not involve adding a port or removing a port from the firewall group.

Workaround: Add or remove a port, or you can add and remove a port that is already bound to the firewall group.