This topic gives you troubleshooting information for Tanzu Cloud Service Broker for AWS.
Troubleshoot Errors
Start here if you have a specific error or error messages.
Common Services Errors
The following errors can occur in multiple services:
Error |
Broker trying to recreate the instance when updating it |
Operation |
update |
Symptom |
The instance status is update failed and the message is similar to update failed: Error: Instance cannot be destroyed on main.tf **** has lifecycle.prevent_destroy set, but the plan calls for this resource to be destroyed. |
Cause |
The update request for a field is failing because one of the following is true:
- The field cannot be updated
- The new value for a property, or combination of properties, would cause an instance recreation
The failing update request might be an indication of an out-of-band update performed on the instance. |
Examples |
An out-of-band upgrade of the Redis version to a newer major version causes the broker to try to downgrade to the previous version, which causes instance recreation. |
Solution |
- If the property can be updated, pass the parameter in the update request to match the IaaS configuration.
- If the property can be updated but is specified in the instance plan, possible solutions include:
- Rolling back the change in the IaaS
- Changing the value in the instance plan
|
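As a sketch of the first solution, assuming the out-of-band change was a Redis version upgrade as in the example above (the instance name and version value are illustrative), the matching parameter can be passed in an update request:

```shell
# Hypothetical example: align the broker's desired state with the out-of-band
# change by passing the same value the IaaS now has.
# The instance name and version value are illustrative.
cf update-service my-redis-instance -c '{"redis_version": "7.0"}'
```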
Error |
Broker trying to recreate the instance when changing plan |
Operation |
update plan |
Symptom |
The instance status is update failed and the message is similar to update failed: Error: Instance cannot be destroyed on main.tf **** has lifecycle.prevent_destroy set, but the plan calls for this resource to be destroyed. |
Cause |
The update request for the plan is failing because the new plan contains incompatible property values. |
Examples |
Running the plan update operation for Redis with a version of Redis earlier than the one used by the existing instance. The version downgrade is not allowed because it requires recreating the instance. |
Solution |
Update the instance to a plan with compatible values. |
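For example (the instance and plan names are illustrative), moving the instance to a plan whose property values are compatible with the existing instance:

```shell
# Move the service instance to a plan whose property values do not force
# instance recreation. "large-redis" is a hypothetical plan name defined
# by the operator.
cf update-service my-redis-instance -p large-redis
```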
Amazon ElastiCache for Redis Errors
The following errors can occur in Amazon ElastiCache for Redis:
Error |
Invalid parameter group |
Operation |
create or update |
Symptom |
Errors containing:
InvalidParameterCombination
InvalidParameterValue
|
Cause |
The value of parameter_group_name points to a parameter group that is not compatible with the version of Redis specified in redis_version . |
Solution |
- Set parameter_group_name to "" so that the default is used.
- Set parameter_group_name to a parameter group whose family matches the Redis version.
|
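A minimal sketch of the first option (the instance name is illustrative): clearing parameter_group_name so the broker falls back to the default parameter group for the configured Redis version.

```shell
# Reset parameter_group_name to the empty string so the default parameter
# group matching the configured Redis version is used.
# The instance name is illustrative.
cf update-service my-redis-instance -c '{"parameter_group_name": ""}'
```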
Error |
Snapshotting state while adding or removing nodes |
Operation |
update |
Symptom |
Errors containing: unexpected state 'snapshotting', wanted target 'available'. |
Cause |
An AWS snapshot was started during the operation. |
Solution |
Retry the operation. |
Error |
Unable to create an instance without specifying a minor version (Redis 7 only) |
Operation |
create or update |
Symptom |
Errors containing:
InvalidParameterCombination: Cannot find version 7.x for redis
engine_version: Redis versions must match major.minor when using version 6 or higher.
|
Cause |
There is an [underlying error in AWS API](https://github.com/hashicorp/terraform-provider-aws/issues/27918) preventing this scenario. |
Solution |
- There is no fix on the AWS side; as a workaround, specify an explicit major.minor version when using Redis version 7 and set auto_minor_version_upgrade to false.
|
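The requirement can be checked before provisioning. The following is a minimal shell sketch, not part of the broker, that verifies a version string is in major.minor form as Redis 7 requires:

```shell
# Check that the requested Redis version includes a minor component
# (for example "7.0"), since a bare major version such as "7" fails.
version="7.0"   # illustrative value
case "$version" in
  *.*) valid=yes ;;   # contains a dot: major.minor form
  *)   valid=no  ;;   # bare major version only
esac
echo "valid=$valid"
```

With the version validated, it can be passed along with auto_minor_version_upgrade disabled, for example: cf create-service csb-aws-redis PLAN_NAME my-redis -c '{"redis_version": "7.0", "auto_minor_version_upgrade": false}' (the plan and instance names are illustrative).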
Amazon General RDS Errors
The following errors can occur in any Amazon RDS instance:
Error |
Reaching AWS subnets quota in a subnet group for RDS |
Operation |
create |
Symptom |
Errors containing:
DBSubnetQuotaExceededFault
|
Cause |
AWS has a resource quota called Subnets per database subnet group that limits the number of subnets in a database subnet group to 20 in each supported region. When operators or developers do not supply an existing subnet group in the plan or at provision time, the CSB creates a subnet group and adds all the subnets present in the specified VPC to it. For example, suppose the operator:
- Specifies a VPC with 25 subnets through the tile.
- Does not specify a database subnet group in the plan.
- Does not specify a database subnet group at provisioning time.
Then the CSB creates a database subnet group and adds all subnets, 25 in this example, to it. This operation therefore breaches the AWS resource quota. |
Solution |
Create a custom database subnet group through the AWS console and add the desired subnets for RDS instances to use. Then use the database subnet group name as a plan or provision parameter. |
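For example, assuming a subnet group named rds-subnet-group was created in the AWS console (all names are illustrative), it can be passed at provision time:

```shell
# Provision using a pre-created DB subnet group instead of letting the CSB
# build one from every subnet in the VPC. All names are illustrative.
cf create-service csb-aws-postgresql PLAN_NAME my-db -c '{"rds_subnet_group": "rds-subnet-group"}'
```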
Error |
Major engine version should be specified when auto_minor_version_upgrade is enabled |
Operation |
create or update |
Symptom |
Errors containing:
- Resource postcondition failed: A Major engine version should be specified when auto_minor_version_upgrade is enabled. Expected engine version: x.x - got: x.x.x
|
Cause |
A business rule prevents you from creating or updating an RDS instance with a configuration that enables auto_minor_version_upgrade and does not select a major engine version. AWS automatically upgrades the minor versions, but you must pick a major version. |
Solution |
Create or update your RDS instance either with auto minor version upgrade deactivated, or with auto minor version upgrade enabled and a major engine version selected. To find the major version, you can run the following command: aws rds describe-db-engine-versions --engine aurora-mysql --engine-version 5.7.mysql_aurora.2.02.3 --include-all --region us-west-2 | jq -r '.DBEngineVersions[] | { engine_version: .EngineVersion, major_version: .MajorEngineVersion }' Substitute the engine, aurora-mysql, and the engine version, 5.7.mysql_aurora.2.02.3, with the values that you want. |
Error |
Engine version not found when using a major version |
Operation |
create or update |
Symptom |
Errors containing:
-
InvalidParameterCombination: Cannot find version (minor engine version x.x.x) for (specific engine)
- Example:
InvalidParameterCombination: Cannot find version 8.0.mysql_aurora.3.04.0 for aurora-mysql
|
Cause |
The AWS API sometimes cannot find a minor version within its catalog. Various causes can induce this error, such as:
- A limited version pool just before the release of a new minor version.
- Eventual inconsistency between the read API and the write API.
|
Solution |
Create or update your RDS instance with auto minor version upgrade deactivated and select a specific minor engine version. |
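As a sketch (the instance name and engine version value are illustrative), deactivating auto minor version upgrade and pinning a specific minor version in one update:

```shell
# Pin an exact minor engine version and turn off automatic minor upgrades,
# so the broker never has to resolve a missing minor version.
# The instance name and version value are illustrative.
cf update-service my-db -c '{"auto_minor_version_upgrade": false, "engine_version": "8.0.32"}'
```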
Error |
incompatible-network state |
Operation |
create |
Symptom |
Errors containing:
|
Cause |
An incompatible-network state indicates one or more of the following is true of the Amazon RDS DB instance:
- There are no available IP addresses in the subnet that the Amazon RDS DB instance was launched into.
- The subnet used in the Amazon RDS DB subnet group no longer exists in the Amazon Virtual Private Cloud (Amazon VPC).
|
Solution |
AWS does not make any guarantees as to which subnet from the subnet group an RDS instance is launched in. Although you might assume that it balances new instance creation among all the subnets in the group, in reality this doesn't happen. This means one subnet in the group can run out of IP addresses while the others remain largely unused. To work around this issue, create a custom DB subnet group through the AWS console and choose the subnets that still have available IP addresses from the navigation pane. Then use the DB subnet group name as a plan or provision parameter. |
Error |
Unreachable publicly accessible DB |
Operation |
create or update |
Symptom |
All of the following conditions must be true:
- The service instance is configured with the property publicly_accessible: true.
- The database is not reachable from outside your Tanzu Application Service foundation.
- Apps within your Tanzu Application Service foundation can connect without issues by using a service binding.
|
Cause |
Several factors can contribute to this error:
- The service instance was associated with an unexpected VPC.
Pitfall: when aws_vpc_id is left blank, the service instance is created in whatever VPC is specified in the tile's configuration, or in the AWS default VPC when none is specified.
- The service instance was associated with unexpected subnets.
Pitfall: when rds_subnet_group is left blank, the service instance is associated with whatever subnet group is specified in the tile's Service Offering configuration, if present. When none is specified, a new subnet group containing all subnets present in the VPC is created and assigned to the service instance.
- The service instance was associated with an unexpected security group.
Pitfall: when rds_vpc_security_group_ids is left blank, the service instance is associated with whatever security group is specified in the tile's Service Offering configuration, if present. When none is specified, a new security group that allows all ingress traffic but no egress traffic is created and assigned to the service instance.
- The subnet group associated with the service instance contains some private subnets.
Pitfall: according to the official AWS documentation, for a database instance to be publicly accessible, all of the subnets in its database subnet group must be public.
- The security groups associated with the service instance are missing some rules to allow routing your external traffic, or some rules conflict with one another.
|
Recommendations |
For operators:
- Explicitly specify aws_vpc_id, rds_subnet_group, and rds_vpc_security_group_ids in the plans, or specify a default value in the Service Offering configuration when this option is present. There is no way to specify at the plan level that a property is mandatory and can't be left empty when creating an instance, so if your use case doesn't allow you to set these fields in the plan, keep in mind the pitfalls listed in the preceding Cause section.
- Set publicly_accessible: false in the plans if your VPC, subnets, and security groups are not designed with public databases in mind, or if you want to disallow them.
|
Solution |
- Check whether explicitly specifying aws_vpc_id, rds_subnet_group, and rds_vpc_security_group_ids solves the issue.
- If any of these fields are enforced by the plan, ask the maintainers of the plan if they support public databases.
- Check whether you have correctly configured your security group rules.
- Check whether you have correctly configured your database subnet group.
|
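A minimal provisioning sketch for the first check, with all three properties set explicitly (the IDs and names are placeholders, not real values):

```shell
# Pin the VPC, subnet group, and security group explicitly instead of relying
# on the defaults described in the Cause section. All values are placeholders.
cf create-service csb-aws-postgresql PLAN_NAME my-public-db -c '{
  "publicly_accessible": true,
  "aws_vpc_id": "vpc-xxxxxxxx",
  "rds_subnet_group": "my-public-subnet-group",
  "rds_vpc_security_group_ids": "sg-xxxxxxxx"
}'
```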
Error |
You can't modify storage type |
Operation |
Update |
Symptom |
Errors containing:
|
Cause |
AWS does not allow some modifications to storage_type, including but not limited to:
- Changing from io1 to standard (magnetic) or vice versa.
|
Solution |
For non-production test instances where the data is irrelevant, the most straightforward solution is to delete the service instance with the wrong storage_type and create a new one. |
For production instances accidentally created with storage_type: standard, the only solution is to back up the existing instance and restore it in a new instance with the right storage_type. |
For production instances created with a different storage_type that you want to migrate to storage_type: standard, the same approach applies: back up the existing instance and restore it in a new instance with storage_type: standard.
|
Error |
You can't currently modify the storage of this DB instance because the previous storage change is being optimized |
Operation |
Update |
Symptom |
Errors containing:
|
Cause |
Scaling storage usually doesn't cause any outage or performance degradation of the database instance. After you modify the storage size for a database instance, the status of the database instance is storage-optimization . Storage optimization can take several hours. You can't make further storage modifications for either six (6) hours or until storage optimization has completed on the instance, whichever is longer. |
Solution |
Update any other properties not related to disk immediately and postpone the modification of disk-related properties. If you need to upscale your storage capacity frequently, enabling storage autoscaling might be a better option. See Amazon RDS for MSSQL configuration Parameters - max_allocated_storage. |
Error |
InvalidParameterCombination: You can't specify IOPS or storage throughput for engine postgres and a storage size less than 400 |
Operation |
Create |
Symptom |
PostgreSQL/MySQL instances fail to create with this error if storage_type is set to gp3 and storage_gb is less than 400 GB, even when iops is not specified. |
Cause |
The broker has a default value for IOPS of 3000 that is used if no value is specified and IOPS configuration is possible for the `storage_type` requested. However, this value cannot be set if the `storage_gb` value is below a certain threshold. For more information, see the [AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#gp3-storage). |
Solution |
Specifying a value of 0 for iops prevents the broker from setting iops in the instance. Baseline storage performance is still maintained by AWS as [documented](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#gp3-storage). You have two alternatives:
- Specify "iops":0 in the plans. This value is configured for all instances of the plan and can't be overridden on an instance-per-instance basis.
- Specify "iops":0 as a provision parameter. This can be configured in each service instance individually. cf create-service csb-aws-postgresql PLAN_NAME SERVICE_INSTANCE_NAME -c '{"iops":0}'
|
Error |
InvalidParameterCombination: You can't specify IOPS or storage throughput for engine postgres and a storage size less than 400 |
Operation |
Update and Upgrade |
Symptom |
PostgreSQL/MySQL instances fail to update or upgrade with this error if storage_type is set to gp3 and storage_gb is less than 400 GB, even when iops is not specified. |
Cause |
While creating instances with "storage_type": "gp3" and "storage_gb" < 400GB can be achieved by setting `iops: 0` as instructed in [gp3-iops--issue](#gp3-iops-create-issue), this doesn't work for updates. The AWS API sets a default iops value and setting it to 0 is interpreted as trying to change that default value. |
Solution |
Specifying a value of 3000 for iops preserves the AWS-set default. Instances with the specified conditions must be created with "iops": 0 and then updated or moved to a plan that specifies "iops": 3000 for further update/upgrade operations to work.
- Specify "iops":3000 in the plans. This value is configured for all instances of the plan and can't be overridden on an instance-per-instance basis.
- Specify "iops":3000 as an update parameter. This can be configured in each service instance individually. cf update-service SERVICE_INSTANCE_NAME -c '{"iops":3000}'
|
Amazon PostgreSQL Errors
The following errors can occur in any Amazon PostgreSQL instance:
Error |
User does not have permission for tables created by other user in the PUBLIC schema |
Operation |
Bindings modifying/reading tables created by other bindings in the PUBLIC schema |
Symptom |
Errors mentioning lack of permissions/ownership:
must be owner of table
permission denied for table
|
Cause |
The Cloud Foundry binding model implies that multiple bindings can query or edit the same tables. This is particularly useful for rotating credentials where unbind and bind operations are needed. Additionally, the broker does not support creating bindings with different levels of access to the objects created. This means that all bindings need the same access to all objects and can query and edit them regardless of what binding created them in the first place. This conflicts with the PostgreSQL permission model, where the user that created an object is the owner and is the only one who can edit tables and query them, unless permission is explicitly granted to other roles. This applies for bindings and service keys. Specifically, the following issues can happen:
- Binding A not having access to a table that binding B created, when binding B created the table after binding A was created.
- Binding A cannot read tables created by binding B until a new binding C is created.
- Binding A cannot change tables created by binding B until binding B is deleted (unbound from its app).
|
Solution |
All the database users created when binding and creating service keys with Tanzu Cloud Service Broker for AWS are assigned the role binding_user_group . This implies they all have access to tables created by the binding_user_group role. Creating any objects with the binding_user_group role instead of the binding user resolves any of the issues mentioned here. You can achieve this by running SET ROLE binding_user_group before any other instruction in the SQL script that creates your object or framework performing database migrations. If you have issues with tables already created, you must either:
- Unbind the application that has created the objects and bind again (or delete the service key that has created the objects). This is because when unbinding, Tanzu Cloud Service Broker for AWS automatically transfers ownership of existing objects to the
binding_user_group role.
- Manually transfer ownership to the
binding_user_group with the following statement "ALTER TABLE tab_name OWNER TO binding_user_group;". You must run this statement after logging in to the database with the credentials from the binding/service key used to create the objects.
- If you only need other bindings to perform data operations, you can create new bindings for interacting with these objects. This is because Tanzu Cloud Service Broker for AWS assigns permissions to all existing tables whenever a new binding is created. However, this new binding does not have permissions to perform DDL operations until one of the first two solutions is applied.
|
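The SET ROLE approach can be sketched as follows, run with the credentials of a binding or service key; the connection string and table name are illustrative:

```shell
# Create objects as the shared binding_user_group role so every binding
# keeps full access to them. BINDING_URI and the table name are illustrative.
psql "$BINDING_URI" <<'SQL'
SET ROLE binding_user_group;
CREATE TABLE orders (id serial PRIMARY KEY, total numeric);
-- For a table already owned by a single binding user, transfer it manually:
-- ALTER TABLE orders OWNER TO binding_user_group;
SQL
```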
Amazon MySQL Errors
The following errors can occur in any Amazon MySQL instance:
Error |
InvalidParameterCombination: You can't specify IOPS or storage throughput for engine mysql and a storage size less than 400 |
Operation |
Create |
Symptom |
Instances fail to create with the preceding error if `storage_type` is set to `gp3` and `storage_gb` is less than 400 GB, even when `iops` is not specified. |
Cause |
The broker has a default value for `iops` of 3000 that is used if no value is specified and IOPS configuration is possible for the `storage_type` requested. However, this value cannot be set if the `storage_gb` value is below a certain threshold. For more information, see the [AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#gp3-storage). |
Solution |
Specifying a value of `0` for `iops` prevents the broker from setting `iops` in the instance. Baseline storage performance is still maintained by AWS as [documented](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#gp3-storage). You have two alternatives:
- Specify `"iops":0` in the plans: This value is configured for all instances of the plan and can't be overridden on an instance-per-instance basis.
- Specify `"iops":0` as a provision parameter: This can be configured in each service instance individually.
cf create-service csb-aws-mysql PLAN_NAME SERVICE_INSTANCE_NAME -c '{"iops":0}'
|
Amazon MSSQL Errors
The following errors can occur in any Amazon MSSQL instance:
Error |
InvalidParameterValue: Backup retention cannot be set to zero for DB Instance xxxxx since it has Multi-AZ enabled on it. |
Operation |
Update |
Symptom |
The instance fails to update with this error if `backup_retention_period` is set to `0` and `multi_az` is set to `false` in the same update operation. |
Cause |
The broker handles asynchronous parallel operations in the update operation and has no ability to set the order of execution of the two updates. |
Solution |
The updates need to be sequential:
- Disable `multi_az` and wait until the operation finishes:
cf update-service SERVICE_INSTANCE_NAME -c '{"multi_az": false}'
- Then set `backup_retention_period` to `0`:
cf update-service SERVICE_INSTANCE_NAME -c '{"backup_retention_period":0}'
|
Amazon Aurora Errors
The following errors can occur in any Aurora instance:
Error |
The following error message is displayed in the pg_upgrade_server.log logs: FATAL: shared memory segment sizes are configured too large |
Operation |
Update |
Symptom |
Major upgrade fails and error in pg_upgrade_server.log (viewable from the AWS console) shows the preceding error. |
Cause |
Major version upgrades in Amazon Aurora for PostgreSQL with small instance types are not straightforward. In particular, upgrading small instances, or serverless instances with a low max capacity (for example, 2 ACUs), causes an error even through the AWS console. |
Solution |
Perform the following changes sequentially:
- Temporarily update to a bigger instance_class type, or, if using the serverless instance type, increase the max capacity serverless_max_capacity to at least 4 ACUs:
cf update-service SERVICE_INSTANCE_NAME -c '{"serverless_max_capacity": 4}'
- Once that's done, retry the major version upgrade. For example:
cf update-service SERVICE_INSTANCE_NAME -c '{"engine_version": "14"}'
- Finally, scale down instance_class or serverless_max_capacity to its previous value, since the extra capacity was only needed during the upgrade:
cf update-service SERVICE_INSTANCE_NAME -c '{"serverless_max_capacity": 2}'
|