This page lists breaking changes when upgrading VMware Tanzu Application Service for VMs to v6.0.

Default stacks no longer include cflinuxfs3

New installations of TAS for VMs v6.0 no longer include cflinuxfs3 in the default list of stacks. Operators can still configure the Cloud Controller to install cflinuxfs3, if desired.

Upgrading an existing foundation to TAS for VMs v6.0 does not remove cflinuxfs3. Apps running on cflinuxfs3 continue to run as normal, and developers can continue to push applications using the cflinuxfs3 stack. However, if operators remove cflinuxfs3 from an existing foundation, then cflinuxfs3 is no longer installed when deploying TAS for VMs, unless the default stack list is configured to include cflinuxfs3 (see above).

Support for the cflinuxfs3 stack is deprecated and will be removed in a future release of TAS for VMs. If you have not already, migrate all applications off of cflinuxfs3 and remove the stack from your foundations.

Absolute CPU Entitlement metrics no longer available

The absolute_entitlement and absolute_usage metrics are no longer emitted for each container. They are replaced by the cpu_entitlement metric. If you have any dashboards that reference the absolute_entitlement and absolute_usage metrics, update the dashboards to use the new metric.

Due to the removal of these metrics, the experimental CPU Entitlement Plug-in no longer functions. If you use this plug-in to view CPU entitlement usage, you can instead view the cpu_entitlement metric, for example using the Log Cache cf CLI plug-in.
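For example, a minimal check using the Log Cache cf CLI plug-in might look like the following sketch; the app name my-app is a placeholder:

# Resolve the app GUID, then tail gauge envelopes and filter for the new metric.
APP_GUID="$(cf app my-app --guid)"
cf tail "$APP_GUID" --envelope-type gauge | grep cpu_entitlement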

Breaking changes if you are starting from TAS 5.0

If you are upgrading from TAS 5.0 to TAS 6.0, review the following breaking changes to ensure a smooth upgrade.

App Autoscaler API requires JRE 17

Versions Introduced

TAS: 2.11.37, 2.13.19, 3.0.9, 4.0.1

Why Should I Be Concerned About This Change?

The App Autoscaler API may fail during deployment because it now requires JRE 17.

How Can I Tell If I’m Impacted?

You are impacted if you have customized the java offline buildpack to use a JRE other than OpenJDK, and the default JRE version in the buildpack (or the version defined by an environment variable group) is not JRE 17.

You are likely impacted if the following line appears in the logs for the autoscale-api application:

ERR java.lang.UnsupportedClassVersionError: org/springframework/boot/loader/JarLauncher has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 55.0

What Should I Do About It?

Ensure that JRE 17 is available in your java offline buildpack.

If the java runtime you are using is the Oracle JRE, then upgrade to a version of TAS that ships with cf-autoscaling version 249.2.6. As of that version of cf-autoscaling, the autoscale-api application configures JBP_CONFIG_ORACLE_JRE to self-select Oracle JRE 17.

Alternatively, customers can temporarily override their buildpack defaults in order to run the deploy-autoscaler errand (a combined sketch follows the list):

  1. Set an environment variable group that changes the default version of Java to 17 across all applications in a foundation, e.g.

    $ cf set-staging-environment-variable-group '{"JBP_DEFAULT_OPEN_JDK_JRE":"{jre: {version: 17.+ }}"}'
    
  2. Generate the correct parameters for your java runtime by viewing the options available in the java buildpack. Also be careful to merge your new parameters with any parameters already set in the environment variable group.

  3. Trigger the deploy-autoscaler errand and confirm that it succeeds.

  4. Remove the parameters that you added to the environment variable group and re-set the remaining parameters to undo your changes.
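Taken together, the steps above might look like the following sketch. The deployment name cf-DEPLOYMENT is a placeholder, and the sketch assumes no other staging environment variables were previously set:

# Step 1: temporarily default staged apps to Java 17.
cf set-staging-environment-variable-group '{"JBP_DEFAULT_OPEN_JDK_JRE":"{jre: {version: 17.+ }}"}'
# Step 3: run the errand against the TAS deployment.
bosh -d cf-DEPLOYMENT run-errand deploy-autoscaler
# Step 4: undo the override (re-set any previously configured variables instead, if you had them).
cf set-staging-environment-variable-group '{}'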

Background

The App Autoscaler API was bumped to JRE 17. Affected versions set the required JRE version with an OpenJDK JRE specific environment variable.

Spring Boot 3 & Java Buildpack AutoReconfiguration

Versions Introduced

TAS: 2.11, 2.13, 3.0, 4.0, 5.0

Why Should I Be Concerned About This Change?

Both the Spring AutoReconfiguration library supplied by the buildpack and the Spring Cloud Connectors project are deprecated. Use the java-cfenv library instead for accessing bound services in Spring Boot apps. To make migration to this library easier, the Java Buildpack now installs the Java CfEnv library for Spring Boot 3 apps only, and no longer installs the Spring AutoReconfiguration library for these apps.

How Can I Tell If I’m Impacted?

Apps may be affected if they have already been migrated to Spring Boot 3 and rely on Spring AutoReconfiguration, that is:

  1. You have not set the variable JBP_CONFIG_SPRING_AUTO_RECONFIGURATION to '{enabled: false}'

  2. The app is bound to a service of one of these types:

    • Cassandra
    • Relational Database
    • RabbitMQ
    • Mongo DB
    • Redis
    • SMTP
  3. The startup logs for the app show a log entry such as 'dataSource' bean of type 'javax.sql.DataSource' reconfigured with 'mysql' bean

Apps using Spring Boot 2.x will continue to receive the Spring AutoReconfiguration library and should not be affected by this change.
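As a quick check of the first condition, you can inspect an app's environment for the override; my-app is a placeholder name:

# Prints nothing if auto-reconfiguration has not been explicitly disabled.
cf env my-app | grep JBP_CONFIG_SPRING_AUTO_RECONFIGURATION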

What Should I Do About It?

In most cases, the Java CfEnv library should replace the functionality of the Spring AutoReconfiguration library for Spring Boot 3 apps. The Java CfEnv library examines bound services of the above types (except SMTP) and sets well-known Spring Boot properties so that Spring Boot's Autoconfiguration can kick in.

Background

The Spring AutoReconfiguration library uses the Spring Cloud Connectors project which has been deprecated since 2019. Java CfEnv is the recommended library for accessing bound services.

UAA’s SAML IdP functionality removed

Why Should I Be Concerned About This Change?

UAA’s ability to act as a SAML identity provider has been removed in preparation for replacing its dependency on Spring SAML Extension with Spring Security SAML 2 support. You can no longer use UAA as your SAML IdP.

Note that UAA’s ability to integrate with an upstream SAML identity provider as a SAML service provider is unaffected by this change.

How Can I Tell If I’m Impacted?

If you use UAA as a SAML IdP, you are impacted. If you are unsure, get the list of registered SAML service providers from UAA’s “/saml/service-providers” endpoint. (See https://docs.cloudfoundry.org/api/uaa/version/76.31.0/#list.) If you get a non-empty list in response, then you are using UAA as a SAML IdP.
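A minimal sketch of that check, assuming you have a suitable UAA token in $TOKEN and UAA is reachable at uaa.SYSTEM-DOMAIN:

# A non-empty JSON list in the response means UAA is acting as a SAML IdP.
curl -s -H "Authorization: Bearer $TOKEN" "https://uaa.SYSTEM-DOMAIN/saml/service-providers"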

Note that the “/saml/service-providers” endpoint has also been removed from the latest UAA version as part of the SAML IdP functionality removal.

What Should I Do About It?

UAA now supports acting as an identity provider over OIDC. If the system that acts as a SAML service provider can also integrate with OIDC identity providers, you should switch it to use that protocol instead.

Background

Spring SAML Extension has reached the end of support. UAA is replacing it with Spring Security SAML 2 support to keep the SAML feature compatible with the latest Spring versions. Because Spring Security does not provide identity provider support, UAA is dropping its SAML IdP functionality.

Local UAA password policy configuration changed

Why Should I Be Concerned About This Change?

If you have customized the password policy settings for local UAA users, these settings will be restored to the default policy unless you take action.

How Can I Tell If I’m Impacted?

If you have configured the local UAA user password policy by setting the fields under the “Internal user store” option in the Authentication and Enterprise SSO pane, then your existing settings will be affected.

What Should I Do About It?

You can instead configure the password policies for local UAA users in a new section on the UAA pane of the TAS tile. This configuration section is available in TAS 4.0 and 5.0 and will be preserved in the upgrade to 6.0.

Background

In previous TAS versions, the local UAA password policies could be configured only if you selected the Internal user store option in the Authentication and Enterprise SSO pane. The local UAA password policy configuration has moved to the UAA pane, where you can customize the settings regardless of which Authentication and Enterprise SSO option you chose.

Gorouter uses port 7070 by default for internal route services instead of a random available port

Why Should I Be Concerned About This Change?

This change is only relevant to foundations with third-party services listening on port 7070.

How Can I Tell If I’m Impacted?

SSH into a router VM and use lsof -Pi to determine if port 7070 is being used. Be sure to check any routers deployed by IST as well.
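For example, assuming BOSH CLI access and a TAS deployment named cf-DEPLOYMENT:

# No output means nothing is listening on, or connected to, port 7070.
bosh -d cf-DEPLOYMENT ssh router -c 'sudo lsof -Pi :7070'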

What Should I Do About It?

If your foundation is impacted, use Tanzu Operations Manager to configure use of a different port.

For TAS, go to the Networking tab and scroll down to the Route Services section. If route services are enabled, you will see a text box labeled “The port used for internal route services requests.” Set the value to a known available port, or set the value to 0 to allow the operating system to choose an available port at deploy time.

For IST, go to the Networking tab and scroll down to the text box labeled “The port used for internal route services requests.” Follow the same instructions as you did for TAS.

Breaking changes if you are starting from TAS 4.0

MySQL 5.7 external system database support removed

Why Should I Be Concerned About This Change?

Your TAS deployment will fail or become insecure if you are using MySQL 5.7 as TAS’s external system database, as MySQL 5.7 will soon reach End-Of-Life.

How Can I Tell If I’m Impacted?

You are impacted if you have configured a MySQL 5.7 database instance as the external system database for your TAS deployment. You can check your current system database settings in the “Databases” pane of the TAS for VMs tile. If the “External database server” option is selected under the “System databases location” section and the configured database server runs MySQL 5.7, you are impacted.
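You can also confirm the server version directly; the host and user are placeholders:

# Prints the MySQL server version, for example 5.7.x or 8.0.x.
mysql -h DB-HOSTNAME -u DB-USER -p -e 'SELECT VERSION();'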

What Should I Do About It?

Upgrade your external system database to a supported MySQL version (such as MySQL 8.0).

Background

This change was triggered by MySQL 5.7 reaching its official End-Of-Life date (31 Oct 2023): many TAS components, as well as the database client libraries that they depend on, will follow suit and remove support and testing for MySQL 5.7.

NATS intentionally fails deployment in post-start in v1

Why Should I Be Concerned About This Change?

Deployment of the NATS servers, which propagate routes from services and apps to Gorouter, will now fail intentionally if the nats-release instances have not successfully migrated to NATS v2. Operators can ensure a successful deployment by confirming the migration before upgrading.

How Can I Tell If I’m Impacted?

If your TAS environment is already on v2.11.26 or greater, make sure that your NATS instances have successfully migrated by checking that NATS 2.0 is running (see KB article in link).

What Should I Do About It?

If your deployment fails, check nats instance logs. Migration details, including any possible error messages, can be found under /var/vcap/sys/log/nats-tls/nats-tls-wrapper.stdout.log
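For example, assuming a TAS deployment named cf-DEPLOYMENT:

# Inspect the migration log on a nats instance.
bosh -d cf-DEPLOYMENT ssh nats -c 'sudo tail -n 50 /var/vcap/sys/log/nats-tls/nats-tls-wrapper.stdout.log'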

Background

In TAS v2.11.26, nats-release upgraded its underlying software package from NATS 1.0 (package name gnatsd) to NATS 2.0 (package name nats-server). The nats-release contains NATS 1.0, which will start as a fallback in case the migration to NATS 2.0 fails. In preparation for the future removal of NATS 1.0, nats-release will fail in post-start if it detects that NATS 1.0 is running instead of NATS 2.0.


App Autoscaler API no longer accepts trailing slashes

Why Should I Be Concerned About This Change?

Existing clients of the App Autoscaler API may break if they make requests to resources including trailing slashes.

How Can I Tell If I’m Impacted?

If you have written code that makes requests to the App Autoscaler API with trailing slashes, or have documentation that describes making requests with trailing slashes, you are impacted.

Requests that specify trailing slashes will now receive a 404 response.

This does not affect users who are using the autoscaler cf CLI plugin.

What Should I Do About It?

Modify your client or documentation to no longer make requests with trailing slashes.
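A quick way to confirm the new behavior from the command line; the hostname and path below are illustrative placeholders rather than documented endpoints:

# Expect a 404 status when the trailing slash is present.
curl -s -o /dev/null -w '%{http_code}\n' -H "Authorization: $(cf oauth-token)" 'https://autoscale.SYSTEM-DOMAIN/api/v2/apps/APP-GUID/'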

Background

This is a Spring Framework default change to improve security posture:

https://github.com/spring-projects/spring-framework/issues/28552

Logging timestamp format changed for some TASW jobs

Why Should I Be Concerned About This Change?

Existing automation that parses logs from TASW components may break if it still expects timestamps formatted as epoch time. Specifically, logs from groot and garden_windows have been converted to RFC 3339 timestamps.

How Can I Tell If I’m Impacted?

You are impacted if you have log parsing for TASW's groot and garden_windows jobs that relies on timestamps being in epoch format.

What Should I Do About It?

If it was necessary to adjust log parsing for TAS/TASW 4.0 to account for the change in timestamp format, update log parsing in the same manner, but for the TASW groot and garden_windows jobs.

Background

Non-standardized and non-human readable timestamps in logs make debugging TAS more difficult. Starting in TAS/TASW 4.0, timestamps have been logged using RFC 3339 format in all cases but groot and garden_windows.

Logging timestamp format property removed

Why Should I Be Concerned About This Change?

Existing automation that configures a log timestamp format will break if product config is not updated.

How Can I Tell If I’m Impacted?

You are impacted if you set the logging_timestamp_format product config property.

What Should I Do About It?

This property has had no effect since TAS 4.0. Update your configuration to no longer specify the property.

Background

Non-standardized and non-human readable timestamps in logs make debugging TAS more difficult.

Syslog Aggregate Drains property changes

Why Should I Be Concerned About This Change?

Existing automation that configures aggregate syslog drains will break if product config is not updated.

How Can I Tell If I’m Impacted?

You are impacted if you have configured aggregate syslog drains with the syslog_agent_aggregate_drains property.

What Should I Do About It?

References to the old property should be updated to refer to the new mtls_syslog_agent_aggregate_drains property.

This property is used to define aggregate syslog drains with keys and certificates for Mutual TLS, as well as to define aggregate drains that are not using Mutual TLS.

Comma-separated strings are no longer accepted; instead, pass an array of aggregate drains:

.properties.mtls_syslog_agent_aggregate_drains:
  value:
    - url: syslog-tls://HOSTNAME:PORT
    - url: syslog-tls://ANOTHER-HOSTNAME:PORT

Additionally, consider if your syslog aggregate drains should be updated to use Mutual TLS.

Background

From TAS 5.0, operators can configure aggregate drains that support Mutual TLS for improved security. This necessitated changing the structure of the syslog aggregate drains property so that these new fields can be specified.

Metrics Agent deprecated and defaults to off

Why Should I Be Concerned About This Change?

The behavior of any custom integration that uses Prometheus to scrape the Metrics Agent will change. The Metrics Agent is protected by mutual TLS and requires a certificate issued by TAS to access. We do not know of any products or integrations that use this agent.

How Can I Tell If I’m Impacted?

You are impacted if you have built a custom integration that scrapes all VMs via Prometheus on port 14726.
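One way to check for such an integration on a given VM, assuming shell access:

# Lists any connections involving the Metrics Agent port.
sudo ss -tnp | grep 14726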

What Should I Do About It?

If you are using the Metrics Agent, you can re-enable it in TAS 5.0 by unchecking the checkbox in the System Logging configuration page. We encourage you instead to switch to the Prometheus Exporter in the OpenTelemetry Collector. If you are unable to migrate to the Prometheus Exporter, please let VMware know, as the Metrics Agent is slated for removal in TAS 6.0.

Background

The Metrics Agent was designed as a way for Healthwatch and other components to retrieve metrics without using the Loggregator Firehose. No components ended up adopting it, and with the introduction of the OpenTelemetry Collector there are much more flexible options for egressing metrics, so the Metrics Agent is slated for removal instead of driving adoption toward it.

Breaking changes if you are starting from TAS for VMs v3.0

If you are upgrading from TAS v3.0 to TAS v6.0, review the following breaking changes to ensure a smooth upgrade.

Component logs always use Human-Readable RFC 3339 Timestamp Format

Why Should I Be Concerned About This Change?

From TAS 4.0.0, component logs are always output with RFC 3339 timestamps.

How Can I Tell If I’m Impacted?

Check the “System Logging” pane in the TAS configuration. If “Timestamp format for component logs” is set to “Maintain previous format” then you are impacted.

The corresponding product configuration is .properties.logging_timestamp_format set to deprecated.
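You can check the staged value with the om CLI; cf is the usual TAS product identifier in Ops Manager:

om staged-config --product-name cf | grep logging_timestamp_format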

What Should I Do About It?

If you have systems that rely on the old timestamp format for component logs when parsing log lines, these will need to be updated to handle the RFC 3339 timestamp format.

Background

Non-standardized and non-human readable timestamps in logs make debugging TAS more difficult.

Option to Enable (Beta) Log Rate Limit Removed

Why Should I Be Concerned About This Change?

The option to set a global app log rate limit in TAS/IST/TASW was removed.

If you were using this feature to limit your overall app log throughput, then you may see an increase in log load after upgrading.

How Can I Tell If I’m Impacted?

If you have enabled the App log rate limit option in the TAS/IST/TASW “App Containers” tab then you are impacted.

What Should I Do About It?

You should replace use of this feature with org-, space-, and app-level byte-based log rate limits. For details, see App Log Rate Limits.

If you are impacted and concerned about log load in your system during and after upgrading, consider either:

  • Scaling up your logging system before upgrading to TAS v4.0 to compensate for the increased log load.

  • Upgrading to TAS v3.0 first to set app-level log rate limits while retaining the global app log rate limit.

Background

With the introduction of org, space, and app log rate limits in TAS v3.0, this feature moved from beta to deprecated.

Links

  • https://docs.vmware.com/en/VMware-Tanzu-Application-Service/4.0/tas-for-vms/runtime-rn.html#breaking-changes

  • https://docs.vmware.com/en/VMware-Tanzu-Application-Service/4.0/tas-for-vms/app-log-rate-limits.html

Recent logs endpoint removed from traffic controller

Why Should I Be Concerned About This Change?

This will cause versions of the cf CLI before v6.52.0 to no longer be able to retrieve logs when called with cf logs --recent.

How Can I Tell If I’m Impacted?

If you have users or automation that are using cf CLI versions that are older than v6.52.0, they are impacted.

What Should I Do About It?

Upgrade clients to a newer version of the cf CLI.

Background

The cf CLI has been using Log Cache as the source for cf logs --recent since v6.52.0. This change removes a deprecated path for log retrieval. Because cf CLI v6 is not compatible with TAS 2.13 or higher, this is a special notice of removal.


Add validation for CAs trusted by Gorouter

Why Should I Be Concerned About This Change?

If you have invalid CA certificates listed in your Ops Manager and TAS configurations, you will need to update these in order to deploy TAS 4.0.

How Can I Tell If I’m Impacted?

Any entries that are not valid CA certificates cause an error in Ops Manager. You must remove or replace invalid entries.

Check any CA certs specified in your Director or TAS tiles with openssl x509 -text -in <cert-file>. If any errors are returned, fix or regenerate the certificate.
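For example, for a certificate saved to a local file:

# A parse error (non-zero exit) indicates a malformed certificate entry.
openssl x509 -noout -text -in ca-cert.pem || echo 'invalid certificate'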

What Should I Do About It?

Replace the invalid CA certificates with up-to-date and valid certificates. Alternatively, if there is no replacement, simply remove the invalid CA certificate.

Background

Previously, Gorouter and Ops Manager would ignore invalid certificates. This change helps make operators aware of any copy/paste problems when applying CA certificates.


Max request header size now defaults to 48 KB

Why Should I Be Concerned About This Change?

If end-users are sending requests with large request headers to your environment, they may now experience 431 status code errors.

How Can I Tell If I’m Impacted?

You will see 431 status codes in your gorouter access logs, with the X-Cf-RouterError: max-request-size-exceeded response header set.
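A sketch of that check on a router VM; the path below is the conventional gorouter access log location:

grep ' 431 ' /var/vcap/sys/log/gorouter/access.log | grep max-request-size-exceeded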

What Should I Do About It?

If you want to accommodate large request headers, you can increase the default size in the Networking tab of Ops Manager under the field Maximum request header size. We do not recommend this accommodation as a long-term option.

Background

NOTE: This limit is specifically for the Method, Request URI, and protocol line of an HTTP request, as well as any HTTP Headers in the request.

Max request header size now defaults to 48 KB. Upgrading to 4.0 sets the Max request header size to 48 KB, unless the existing configuration was already lower. Lowering this value establishes a better security posture; large request headers could consume router resources or leak memory.

Ruby Buildpack 1.9.0 defaults to Ruby 3.1

Why Should I Be Concerned About This Change?

If your apps haven’t been upgraded to be compatible with Ruby 3.1, you may experience errors.

How Can I Tell If I’m Impacted?

You will see app errors in the output of:

cf logs [APP NAME] --recent

What Should I Do About It?

Update your apps to be compatible with Ruby 3.1. If you need a temporary workaround while upgrading apps, you can upload both Ruby Buildpack 1.9.0 and a previous buildpack to your environment until apps have been upgraded and the previous buildpack can be removed.


Viewing Custom iptables Rules Requires /sbin/iptables Binaries

Why Should I Be Concerned About This Change?

If you create custom iptables rules on your Diego cells, they may not be visible when running iptables -L because the default version of iptables has been upgraded. However, these custom rules still apply.

How Can I Tell If I’m Impacted?

You can see any custom rules you have by running iptables-legacy, which invokes iptables 1.6.x.

What Should I Do About It?

Update your custom iptables logic to use the /sbin/iptables* binaries, which run iptables 1.8.x (backed by nftables).

If you need to interact with your custom rules before upgrading, you can use the command iptables-legacy as a short-term workaround.
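For example:

# Rules created with the 1.6.x tooling are visible only to the legacy binary.
iptables-legacy -L
# Rules managed going forward should use the 1.8.x (nftables-backed) binary.
/sbin/iptables -L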

Background

On the Jammy Jellyfish stemcell, the iptables command uses the nftables framework (1.8.x) instead of the iptables firewall (1.6.x). Garden now defaults to using the system iptables binary. However, both versions are available to keep custom iptables rules accessible in the short term.


Breaking changes if you are starting from TAS for VMs v2.13

If you are upgrading from TAS v2.13 to TAS v6.0, review the following breaking changes to ensure a smooth upgrade.

App logs that exceed rate limit are dropped immediately

Versions Introduced

TAS: 2.11.26, 2.12.18, 3.0.0
Isolation Segment: 2.11.20, 2.12.13, 3.0.0
TAS Windows: 2.11.20, 2.12.12, 3.0.0

Why Should I Be Concerned About This Change?

The behavior of line-based application log rate limiting has changed. Previously, application logs would be buffered to some extent and then released at the configured rate. Now application logs that exceed the rate limit are dropped immediately.

How Can I Tell If I’m Impacted?

You are impacted if you have configured the deprecated line-based application log rate limiting and have applications that emit logs in excess of the configured limit.

What Should I Do About It?

This is expected behavior and helps ensure that logs that are output are timely. VMware recommends use of quota-based log rate limits for fine-grained control over application log rates.

Background

This behavior was modified as part of changes to Diego to support granular log rate limits on orgs, spaces and individual apps in TAS 3.0.


Syslog Drains reject SHA-1 certificates

Versions Introduced

TAS: 2.11.19, 2.12.12, 2.13.5, 3.0.0, 4.0.0
Isolation Segment: 2.11.13, 2.12.7, 2.13.2, 3.0.0, 4.0.0
TAS Windows: 2.11.13, 2.12.7, 2.13.2, 3.0.0, 4.0.0

Why Should I Be Concerned About This Change?

System logging will be impacted if certificates signed with the older SHA-1 hash function are used.

How Can I Tell If I’m Impacted?

Review your configured application syslog drains. If you have a certificate for a syslog drain configured with a SHA-1 hash, you are impacted.
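For example, for a drain certificate saved to a local file:

# You are impacted if this prints a SHA-1 algorithm, such as sha1WithRSAEncryption.
openssl x509 -noout -text -in drain-cert.pem | grep 'Signature Algorithm'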

What Should I Do About It?

Regenerate impacted certificates so that they don’t use the SHA-1 hash function.

Background

Logging components provided by loggregator-agent-release have been upgraded to Go 1.18. From Go 1.18, the treatment of certificates is stricter, and certificates signed with the SHA-1 hash function are rejected. Go stopped accepting certificates signed with the SHA-1 hash function because of security concerns.


Application log rate limit default

Versions Introduced

TAS: 3.0.0, 4.0.0

Why Should I Be Concerned About This Change?

Applications pushed to TAS 3.0+ will inherit a default operator-configurable log rate limit if one is not specified.

How Can I Tell If I’m Impacted?

You are impacted if you are upgrading to TAS 3.0+ and are pushing applications to the platform. Applications that already exist on the platform prior to upgrade will default to not being log rate limited.

What Should I Do About It?

Determine if the default log rate for new applications (16 KB/s) is appropriate for your needs. If you have applications with verbose logs exceeding the default limit and you do not wish their logs to be dropped, then consider:

  • pushing your applications with a specified app log rate limit

  • modifying the platform default log rate limit

The default log rate limit for new applications is configurable in the “App Developer Controls” tab in the TAS product configuration. The corresponding property is .properties.cloud_controller_default_log_rate_limit_app.
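For example, assuming cf CLI v8, you can set a per-app limit without re-pushing; the app name and limit are illustrative:

# Allow this app to emit up to 32 kilobytes of logs per second.
cf scale my-app -l 32K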

Background

The ability to set the log rate limit in a granular way is intended to allow operators to protect both the logging system within TAS and external integrations that receive logs. Setting a platform default limit for new applications makes it more likely that applications will have a log rate limit set, allowing quotas to be imposed.

Autoscaler service bindings duplicate index

Versions Introduced

TAS: 2.11.20, 2.12.13, 2.13.5

Why Should I Be Concerned About This Change?

Upgrading to a version of TAS that introduces a new database index for App Autoscaler on the service bindings table may error if you have previously manually created the same index.

How Can I Tell If I’m Impacted?

You are impacted if you have previously manually added a database index following the instructions in the knowledge base article Autoscale application fails with MySQL Deadlock errors and are upgrading to one of the following TAS versions:

  • 2.13.5

  • 2.13.6

What Should I Do About It?

Upgrade to a newer patch release of TAS that does not have this limitation.

Background

The database migration erroneously attempted to add the database index even when it already existed in the database.


BOSH System Metrics Forwarder Removed

Versions Introduced

TAS: 3.0.0, 4.0.0

Why Should I Be Concerned About This Change?

The BOSH System Metrics Forwarder is removed from TAS for VMs.

How Can I Tell If I’m Impacted?

You are impacted if you have been using the deprecated BOSH System Metrics Server and Forwarder. You are using the BOSH System Metrics Server if you have selected the “Enable BOSH System Metrics Server (deprecated)” option in the Ops Manager Director product configuration.

What Should I Do About It?

To continue receiving system metrics, you must select the “Enable System Metrics” checkbox in the Director Config pane of the BOSH Director tile.

To avoid deployment, platform automation, or data collection failures, you must update any queries that reference bosh-system-metrics-forwarder as the source_id for metrics to reference system_metrics_agent instead.

Additionally, metric names now include underscores instead of periods. For example, the metric named system.cpu.sys in previous versions of TAS for VMs is named system_cpu_sys in TAS for VMs v3.0.

Background

BOSH System Metrics Server and Forwarder are deprecated in favor of the System Metrics server and scraper.

Links

  • https://docs.vmware.com/en/VMware-Tanzu-Operations-Manager/3.0/vmware-tanzu-ops-manager/release-notes.html#bosh-system-metrics-server-is-removed-19

HAProxy removed

Why Should I Be Concerned About This Change?

If your environment was running an HAProxy instance group, you must reconfigure your networking configuration, or else you will get a deployment error when upgrading to TAS 3.0.

How Can I Tell If I’m Impacted?

To check if you are running HAProxy, open the Networking tab in Ops Manager. If “HAProxy” is selected under the “Routing TLS Termination” field, you are running HAProxy.

What Should I Do About It?

Change your networking and resource configurations to point directly to Gorouter. Full directions are in the “HAProxy Removed” documentation.

Background

HAProxy was previously supported as an instance group in front of gorouter in order to support features not offered by gorouter. Now that gorouter supports TLS termination, HAProxy is no longer needed.


Log Cache nozzle ingress removed

Versions Introduced

TAS: 3.0.0, 4.0.0

Why Should I Be Concerned About This Change?

From TAS 3.0.0, Log Cache uses syslog ingress, and the option to use nozzle ingress has been removed.

How Can I Tell If I’m Impacted?

You are impacted if you have not checked the “Enable Log Cache syslog ingestion” option, or have set the corresponding .properties.enable_log_cache_syslog_ingestion product property to false. From TAS 3.0.0, this is no longer a supported configuration.

In addition, Diego Cells with high logging volume might experience higher CPU usage than they did prior to this change.

What Should I Do About It?

Confirm that your instances, including those within isolation segments, are permitted to establish connections to Log Cache nodes on port 6067. You may need to update firewall rules to allow logs to flow directly from your instances to the Log Cache syslog server.
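A minimal connectivity check from a Diego Cell, assuming netcat is available on the VM:

# Succeeds if the cell can reach Log Cache syslog ingestion.
nc -zv log-cache.service.cf.internal 6067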

Consider scaling your Diego Cells if you have applications with high logging volume due to the increased load from the syslog agent on the Diego Cell.

Background

Log Cache ingestion via the Reverse Log Proxy (RLP) has been removed.

ActiveLocks removed from Key Performance Indicators

Why Should I Be Concerned About This Change?

If you were monitoring ActiveLocks in your environment, upgrading to TAS 3.0.0 will increase the value of this metric by 1.

How Can I Tell If I’m Impacted?

You are impacted if your environment's Key Performance Indicators include Locket ActiveLocks.

What Should I Do About It?

Remove this metric from any KPI monitoring you maintain.

Background

ActiveLocks is a metric emitted by Locket, a distributed locking service used by multiple services in Cloud Foundry. With the introduction of a new, opt-in service that uses Locket, the expected number of active locks increased by 1. Our community noted that a feature-dependent value is a sign of a poor performance indicator; while the information is useful in debugging, it is not useful as a performance indicator, and we no longer recommend monitoring it as an environment health metric.


cflinuxfs4 as default stack may cause failures to push apps

Why Should I Be Concerned About This Change?

cflinuxfs3 (backed by Ubuntu Bionic) has been deprecated in favor of cflinuxfs4 (backed by Ubuntu Jammy). If apps are configured to use the latest stack, upgrading TAS will upgrade app stacks and may cause a push failure.

How Can I Tell If I’m Impacted?

You can see what stack your apps are configured to use by running:

cf audit-stack

If an app is listed with cflinuxfs3, it is running an outdated stack.

What Should I Do About It?

You should re-stage your app to use cflinuxfs4.

cf push <APP_NAME> -s cflinuxfs4

Resolve any push errors prior to upgrading TAS. This may entail updating the buildpack that your app uses. If fixing app staging is not possible before the TAS upgrade you can configure apps to run on cflinuxfs3 as a temporary workaround.

Background

“Stacks” are the pre-built root file systems that Cloud Foundry uses to create app containers. Keeping these up to date with operating system updates is key to addressing security vulnerabilities and other issues.


Diego and Routing Components More Strict With TLS Protocols

Versions Introduced

TAS: 2.11.27, 2.12.28, 2.13.13

Why Should I Be Concerned About This Change?

Diego and Routing components have been updated to be more strict with TLS protocols. External services and databases, end users, and services making requests to gorouter should be using TLS 1.2 and have certs signed with a newer hash than SHA-1, or else they will experience TLS errors.

How Can I Tell If I’m Impacted?

For services talking to Diego components, if your environment is impacted, they will not be able to successfully make a connection to Diego. For external users or services talking to gorouter, you should see TLS errors in your access logs.

If you are using an external database, Diego will throw errors trying to start processes like Locket and BBS.

You can check the Signature Algorithm of your external database and external services with the following command:

echo '' | openssl s_client -connect <HOSTNAME>:<PORT> -servername <HOSTNAME> 2>/dev/null | openssl x509 -noout -text | grep 'Signature Algorithm'

You can check that TLS 1.2 is supported with the following command:

echo '' | openssl s_client -connect <HOSTNAME>:<PORT> -servername <HOSTNAME> -tls1_2 > /dev/null 2>&1 || echo "TLS 1.2 unsupported"

If you see no output, TLS 1.2 is supported. If you see the TLS 1.2 unsupported message, it is unsupported and will need to be updated.

What Should I Do About It?

Make sure that services talking to your Diego components and services talking to gorouter are not using TLS 1.0 or 1.1, or using SHA-1 certificates.

Background

Diego and Routing components, including gorouter, use an up-to-date version of Golang. As of Golang 1.18, TLS requirements have been made more strict in two major ways. First, TLS 1.0 and 1.1 are disabled in favor of TLS 1.2. Second, crypto/x509 now rejects certificates signed with the SHA-1 hash function. See the Golang release notes for details.


Breaking changes if you are starting from TAS for VMs v2.12

Logging stricter IP address parsing

Versions Introduced

TAS: 2.11.14, 2.12.7, 2.13.0, 3.0.0, 4.0.0
Isolation Segment: 2.11.10, 2.12.4, 2.13.0, 3.0.0, 4.0.0
TAS Windows: 2.11.10, 2.12.5, 2.13.0, 3.0.0, 4.0.0

Why Should I Be Concerned About This Change?

Logging components provided by loggregator-agent-release and syslog-release have been upgraded to Go 1.17. From Go 1.17, the treatment of IP addresses is stricter, and any IP address where an octet starts with a leading zero is now invalid.

How Can I Tell If I’m Impacted?

Review your configured application syslog drains and the address of any configured syslog destination in the TAS “System Logging” tab. If your configuration contains an IP address with an octet that has a leading zero, then you are impacted.

What Should I Do About It?

Modify your configuration to express the IP address without a leading zero.

Background

The Go developers chose to disallow IP addresses with octets with leading zeros because they present a security concern.


System metrics scrape frequency increased

Versions Introduced

TAS: 2.11.8, 2.12.1, 2.13.0, 3.0.0, 4.0.0

Why Should I Be Concerned About This Change?

The frequency with which system metrics are scraped increased from a fixed frequency of every 1 minute to a configurable default of every 15 seconds. This may cause increased load on your logging and metrics infrastructure.

How Can I Tell If I’m Impacted?

If you have enabled system metrics in the Ops Manager director config and you are upgrading to a version of TAS after the versions that introduced this change, then you are impacted.

You are not impacted if you are using the deprecated BOSH system metrics server instead of the recommended system metrics scraper.

What Should I Do About It?

The default 15 second scrape frequency is recommended. You may not need to take action if your logging and metrics infrastructure is already scaled sufficiently to handle the increased number of metrics. Aside from scaling, another option is to reduce the scrape frequency.

The scrape interval can be changed by modifying the .properties.system_metrics_scraper_scrape_interval TAS property. This property is configurable within Ops Manager under the System Logging tab as “System metrics scrape interval”.
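A sketch of setting the property with the om CLI; the product name, file layout, and interval value format are assumptions to adapt:

# Write a minimal config fragment, then apply it to the staged TAS product.
cat > scrape-interval.yml <<'EOF'
product-properties:
  .properties.system_metrics_scraper_scrape_interval:
    value: 1m
EOF
om configure-product --product-name cf --config scrape-interval.yml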

Background

Prior to the versions in which this change was introduced, the system metrics scrape interval was hard-coded to every 1 minute.

Jobs Error interface changes in Rails 6

Why Should I Be Concerned About This Change?

Changes internal to the Cloud Controller worker may cause cf push downtime during upgrade.

How Can I Tell If I’m Impacted?

You may be impacted if you have a significant number (more than 100) of cloud_controller_worker VMs.

What Should I Do About It?

While there is an existing mitigation in place, it may not be sufficient for large foundations, which might still experience downtime when running cf push. See https://github.com/cloudfoundry/cloud_controller_ng/issues/2748 for more details.

Background

Rails 6 introduced a new interface for the Error class, which means Rails 5 servers cannot de-serialize the new Error class. The CAPI release in TAS 2.13 upgrades its Ruby on Rails dependency from Rails 5 to Rails 6.1.

While upgrading Cloud Controller VMs to this release, new jobs created by API servers containing the Rails 6 upgrade may serialize the new Errors interface in the database as part of jobs for the Worker VMs to pick up. Worker VMs which have not upgraded, and are still running Cloud Controller with Rails 5, will fail to de-serialize the Error class’s new interface.

Links

  • https://github.com/cloudfoundry/cloud_controller_ng/issues/2748

cf CLI v6 not supported in 2.13

Why Should I Be Concerned About This Change?

TAS for VMs v2.13 does not support cf CLI v6; this means that cf CLI v6 and TAS for VMs v2.13 and later are not tested together, and if you or your users encounter issues using the v6 CLI, you will need to upgrade to v7 to get effective support.

The change from cf CLI v6 to v7 (and v7 to v8) is itself a major version bump, with its own breaking changes.

Two examples:

  1. the flags supported by cf push have changed. See the articles on Upgrading for more details.

  2. the way quota is counted for staging apps is more conservative, and this can cause push failures. See the linked KB for details.

How Can I Tell If I’m Impacted?

You are impacted if any automation or clients targeting the platform (especially app developer pipelines or custom service brokers) have not already been explicitly updated to cf CLI v7 or greater.

What Should I Do About It?

Your users must upgrade to cf CLI v7 or cf CLI v8. To upgrade to a supported cf CLI version, see Upgrading to cf CLI v7 or Upgrading to cf CLI v8.

Background

The newer versions of the cf CLI target the new v3 version of the Cloud Controller API. cf CLI v6 relies on the deprecated v2 Cloud Controller API, which may be entirely removed in a future TAS version line.


Log Cache Uses Its Own Instance Group

Why Should I Be Concerned About This Change?

As of TAS for VMs v2.13, the Log Cache component runs on its own Log Cache instance group, and is no longer deployed on Doppler instances. Operators should ensure they scale Log Cache appropriately.

How Can I Tell If I’m Impacted?

You are impacted if you are upgrading to TAS 2.13.0 or higher.

What Should I Do About It?

Scale up your Log Cache instance count. VMware recommends matching the number of VMs and amount of memory of your Doppler instances before the upgrade. Starting larger and then adjusting down based on actual use is safer than risking a deployment failure.

You can also consider reducing the memory allocation for Doppler instances now that Log Cache is no longer deployed there.

Background

Separating Log Cache into its own instance group allows it to be scaled independently of Dopplers and Traffic Controllers, for example to provide more memory for storing logs and metrics.

Links

  • https://docs.vmware.com/en/VMware-Tanzu-Application-Service/2.13/tas-for-vms/runtime-rn.html#log-metric-topology

  • https://docs.vmware.com/en/VMware-Tanzu-Application-Service/2.13/tas-for-vms/runtime-rn.html#separate-log-cache

Log Cache uses syslog ingress by default

Why Should I Be Concerned About This Change?

The logging and metrics topology has changed. From TAS 2.13.0, Log Cache uses syslog ingress by default.

How Can I Tell If I’m Impacted?

You may be impacted if you are not seeing logs and metrics from Diego Cells deployed by TAS or the Isolation Segment or Tanzu Application Service for VMs [Windows] products.

In addition, Diego Cells with high logging volume might experience higher CPU usage than they did prior to this change.

What Should I Do About It?

Confirm that your instances, including those within isolation segments, are permitted to establish connections to Log Cache nodes on port 6067. You may need to update firewall rules to allow logs to flow directly from your instances to the Log Cache syslog server.

Consider scaling your Diego Cells if you have applications with high logging volume due to the increased load from the syslog agent on the Diego Cell.

During the upgrade Diego Cells may receive the new Log Cache BOSH DNS name log-cache.service.cf.internal and attempt to send logs and metrics over syslog. VMware recommends upgrading to at least the following patch versions prior to the upgrade and additionally re-deploying the Isolation Segment and TAS for VMs [Windows] products so that the new Log Cache BOSH DNS name is resolvable.

  • TAS for VMs v2.11.16

  • TAS for VMs v2.12.9

Background

As of TAS for VMs v2.13, the Log Cache component runs on its own Log Cache instance group, and is no longer deployed on Doppler instances. In addition, syslog ingestion was set as the default for Log Cache to use industry standard protocols.

Links

  • https://docs.vmware.com/en/VMware-Tanzu-Application-Service/2.13/tas-for-vms/runtime-rn.html#log-metric-topology

Service instance metrics might not be retrievable using the Log Cache cf CLI plugin

Why Should I Be Concerned About This Change?

Service instance metrics might not be retrievable using the Log Cache cf CLI plugin.

How Can I Tell If I’m Impacted?

If you use the Log Cache cf CLI plugin to retrieve service instance metrics and your service tiles use Log Cache syslog ingestion, you will be impacted.

What Should I Do About It?

If you need to retrieve metrics from service tiles that do not support this feature:

  • upgrade to a version of the service tile that allows for syslog ingestion OR

  • for TAS 2.13 only, deactivate the Enable Log Cache syslog ingestion checkbox in the System Logging pane of the TAS for VMs tile. The associated product property is: .properties.enable_log_cache_syslog_ingestion. Note that this is a temporary solution, as this setting is no longer available in TAS 3.0.0.

Background

As of TAS for VMs v2.13, the Log Cache component runs on its own Log Cache instance group, and is no longer deployed on Doppler instances. In addition, syslog ingestion was set as the default for Log Cache.


Traffic Controller unavailable during upgrade

Why Should I Be Concerned About This Change?

V1 firehose nozzles such as the Splunk Nozzle for VMware Tanzu may fail to connect to the firehose during an upgrade.

How Can I Tell If I’m Impacted?

You will be impacted during the upgrade if both of the following are true:

  • You have a V1 firehose nozzle deployed

  • You are upgrading to TAS v2.13.0 - v2.13.12 or TAS 3.0.0 - 3.0.2

What Should I Do About It?

VMware recommends that you upgrade to a more recent version of TAS that does not have this issue.

Background

The Traffic Controller component, which V1 nozzles connect to, previously blocked on startup until Log Cache was available. In TAS 2.13, Log Cache was separated out into its own instance group, meaning that Traffic Controller blocks and is unavailable until the new Log Cache instances become available.


Golang 1.17 Rejects IPv4 Addresses With Leading Zeros

Why Should I Be Concerned About This Change?

If you have IPv4 addresses that contain decimal components with leading zeros (for example, 192.168.020.100), you will receive a deploy error. You will need to reformat your IP (192.168.20.100) and deploy again.

How Can I Tell If I’m Impacted?

This affects properties that feed into all releases that use Golang v1.17. If impacted, you will receive BOSH templating errors during deploy.

What Should I Do About It?

Operators can remove the leading zeros and try deploying again.

Background

From Golang release notes: The ParseIP and ParseCIDR functions in Golang’s net library now reject IPv4 addresses which contain decimal components with leading zeros. These components were always interpreted as decimal, but some operating systems treat them as octal. This mismatch could hypothetically lead to security issues if a Go application was used to validate IP addresses which were then used in their original form with non-Go applications which interpreted components as octal.


Gorouter Certificates Require a SAN Extension

Why Should I Be Concerned About This Change?

Routing-release keeps up to date with Golang, and Golang 1.17 requires certs to include a Subject Alternative Name (SAN). (This enforces a deprecation that was introduced in Golang 1.15.) If any certificates for services that terminate TLS connections in Gorouter lack a SAN, clients cannot connect to servers and deployment fails. External systems that the Gorouter connects to must also have certificates with a valid SAN, or else requests experience a failed TLS handshake.

How Can I Tell If I’m Impacted?

For all certs in Ops Manager, copy the cert text to a file and decode it with the following command to see if it contains a SAN:

openssl x509 -noout -text -in [FILE]
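To look specifically for the SAN extension:

# No output means the certificate lacks a SAN and must be regenerated.
openssl x509 -noout -text -in [FILE] | grep -A1 'Subject Alternative Name'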

Follow this same process for certs on external services that gorouter connects to. See the Knowledge Base article above for detailed instructions.

What Should I Do About It?

If any certs do not contain a SAN, you must rotate certs with a newly-generated cert that contains a SAN. See the Ops Manager documentation above.

Background

Golang’s crypto/x509 library uses certs to verify the server or client hostname. In the past, operators could use the Common Name field to input hostname; as of Golang 1.15, the Common Name field has been deprecated for hostname verification and the Subject Alternative Name must be provided to verify hostname.


Gorouter sends all responses with transfer-encoded chunks

Versions Introduced

TAS: 2.11.3, 2.13.0

Why Should I Be Concerned About This Change?

If your clients or proxies that access apps cannot handle a chunked response, or expect a Content-Length header, they will break.

How Can I Tell If I’m Impacted?

One common symptom is that applications now return duplicate Transfer-Encoding headers, and gorouter logs the error “too many transfer encodings”.
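You can also inspect the response headers an app returns through gorouter; the route is a placeholder:

# Look for Transfer-Encoding: chunked instead of a Content-Length header.
curl -s -o /dev/null -D - https://my-app.APPS-DOMAIN/ | grep -iE 'transfer-encoding|content-length'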

What Should I Do About It?

Fix your clients to be able to handle a chunked response.

Background

Previous versions of TAS could, for short responses, silently remove the Transfer-Encoding header and replace it with a Content-Length header. This convenience was dependent on Golang 1.15 and led to a false sense of mitigation.

Golang 1.16 now prioritizes flushing partial responses to the client and no longer changes the response to not be chunked.


Breaking changes if you are starting from TAS for VMs v2.11

Log Cache aggregate drain metadata missing

Why Should I Be Concerned About This Change?

TAS 2.11.16 moved aggregate drain configuration to the syslog binding cache to improve deploy speed. Depending on your configuration, this could cause the Smoke Test errand to fail.

How Can I Tell If I’m Impacted?

You are impacted if “Enable Log Cache syslog ingestion” is checked, “Default loggregator drain metadata” is unchecked, and you attempt to upgrade to a version of TAS from 2.11.16 to 2.11.21.

The corresponding product properties are

  • .properties.enable_log_cache_syslog_ingestion

  • .properties.default_loggregator_drain_metadata

What Should I Do About It?

Upgrade to TAS 2.11.22 or greater.

Background

Operators may configure aggregate drains to send all application logs to a syslog destination. The same mechanism may be used within TAS to populate Log Cache. If metadata is not enabled for the Log Cache aggregate drain then Log Cache will not have the metadata expected to function correctly.


Minimize user downtime when upgrading from 2.11 to 2.12

Why Should I Be Concerned About This Change?

To minimize downtime for developers pushing apps, upgrade from TAS for VMs v2.11.9 or later. Upgrading from earlier patch versions can result in an Unknown Error when pushing apps.

How Can I Tell If I’m Impacted?

Your current version of TAS is 2.11.0 through 2.11.8.

What Should I Do About It?

You should upgrade to TAS 2.11.9 or higher before upgrading to 2.12 or higher.

Background

In affected versions, there is a chance that the Cloud Controller determines an app's lifecycle while the app's lifecycle_type is nil.

Envoy advertises HTTP2 support over ALPN

Why Should I Be Concerned About This Change?

If applications do not support HTTP/2 when using Container to Container (C2C) communication through the Envoy proxy (ports 61001 or 61443), requests between these apps will fail.

How Can I Tell If I’m Impacted?

If you are using container to container networking, and have network policies allowing apps to talk to other apps on ports 61001 or 61443, you may be affected.

Use the cf network-policies command to list policies, and look for those ports.
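For example:

# Flags policies that target the Envoy proxy ports.
cf network-policies | grep -E '61001|61443'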

What Should I Do About It?

Investigate the destination applications for these policies, to determine if they support HTTP/2 requests. If not, the client applications will need to be updated to negotiate down to HTTP/1.1 when making their requests.

Background

  • ALPN is the Application-Layer Protocol Negotiation, which allows HTTP connections to negotiate what protocols are supported.

  • Envoy is a proxy sitting alongside each application container to facilitate TLS termination.

Zipkin Trace-ID Now 16 Bytes

Versions Introduced

TAS: 2.11.10, 2.12.6

Why Should I Be Concerned About This Change?

There is a chance that your apps are not capable of processing the longer 16-byte Trace-ID header. If so, you may receive 400-level errors after gorouter forwards requests to your app.

How Can I Tell If I’m Impacted?

Foundations that do not enable Zipkin are not affected. To see if Zipkin is enabled, look at the Networking tab of the TAS tile in Ops Manager, and check whether Enable Zipkin is selected.

If Zipkin is enabled, applications incompatible with this change will throw errors in their application or access logs, related to Zipkin header length.

What Should I Do About It?

The header size was increased from 8 bytes to 16 bytes in accordance with the W3C standard; you must update your app to be able to handle 16-byte Trace-ID request headers.

Background

Zipkin is a library that allows users to trace a request through multiple components, with the help of a Trace-ID that is the same for the lifecycle of one request. Having a longer Trace-ID is in compliance with W3C standards and also decreases the chance of generating duplicate IDs.


After these changes are handled, continue reading the next sections.
