This topic gives you troubleshooting information for Tanzu Cloud Service Broker for AWS.
Troubleshoot Errors
Start here if you have a specific error or error messages.
Common Services Errors
The following errors can occur in multiple services:
Error |
Broker trying to recreate the instance when updating it |
Operation |
update |
Symptom |
The instance status is update failed and the message is similar to update failed: Error: Instance cannot be destroyed on main.tf **** has lifecycle.prevent_destroy set, but the plan calls for this resource to be destroyed. |
Cause |
The update request for a field is failing because one of the following is true:
- The field cannot be updated
- The new value for a property, or combination of properties, would cause an instance recreation
The failing update request might be an indication of an out-of-band update performed on the instance. |
Examples |
An out-of-band upgrade of the Redis version to a newer major version causes the broker to try to downgrade to the previous version, which causes instance recreation. |
Solution |
- If the property can be updated, pass the parameter in the update request to match the IaaS configuration.
- If the property can be updated but is specified in the instance plan, possible solutions include:
- Rolling back the change in the IaaS
- Changing the value in the instance plan
|
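As a sketch of the first solution, assuming the out-of-band change was a Redis version upgrade as in the example above (the instance name and version value are illustrative), the matching parameter can be passed in an update request:

```shell
# Hypothetical example: align the broker's desired state with the out-of-band
# change by passing the same value the IaaS now has.
# The instance name and version value are illustrative.
cf update-service my-redis-instance -c '{"redis_version": "7.0"}'
```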
Error |
Broker trying to recreate the instance when changing plan |
Operation |
update plan |
Symptom |
The instance status is update failed and the message is similar to update failed: Error: Instance cannot be destroyed on main.tf **** has lifecycle.prevent_destroy set, but the plan calls for this resource to be destroyed. |
Cause |
The update request for the plan is failing because the new plan contains incompatible property values. |
Examples |
Running the plan update operation for Redis with a version of Redis earlier than the one used by the existing instance. The version downgrade is not allowed because it requires recreating the instance. |
Solution |
Update the instance to a plan with compatible values. |
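For example (the instance and plan names are illustrative), moving the instance to a plan whose property values are compatible with the existing instance:

```shell
# Move the service instance to a plan whose property values do not force
# instance recreation. "large-redis" is a hypothetical plan name defined
# by the operator.
cf update-service my-redis-instance -p large-redis
```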
Amazon ElastiCache for Redis Errors
The following errors can occur in Amazon ElastiCache for Redis:
Error |
Invalid parameter group |
Operation |
create or update |
Symptom |
Errors containing:
InvalidParameterCombination
InvalidParameterValue
|
Cause |
The value of parameter_group_name points to a parameter group that is not compatible with the version of Redis specified in redis_version . |
Solution |
- Set parameter_group_name to "" so that the default is used.
- Set parameter_group_name to a parameter group whose family matches the Redis version.
|
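A minimal sketch of the first option (the instance name is illustrative): clearing parameter_group_name so the broker falls back to the default parameter group for the configured Redis version.

```shell
# Reset parameter_group_name to the empty string so the default parameter
# group matching the configured Redis version is used.
# The instance name is illustrative.
cf update-service my-redis-instance -c '{"parameter_group_name": ""}'
```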
Error |
Snapshotting state while adding or removing nodes |
Operation |
update |
Symptom |
Errors containing: unexpected state 'snapshotting', wanted target 'available'. |
Cause |
An AWS snapshot was started during the operation. |
Solution |
Retry the operation. |
Error |
Unable to create an instance without specifying a minor version (Redis 7 only) |
Operation |
create or update |
Symptom |
Errors containing:
InvalidParameterCombination: Cannot find version 7.x for redis
engine_version: Redis versions must match major.minor when using version 6 or higher.
|
Cause |
There is an [underlying error in AWS API](https://github.com/hashicorp/terraform-provider-aws/issues/27918) preventing this scenario. |
Solution |
- There is no fix on the AWS side; as a workaround, specify an explicit major.minor version when using Redis version 7 and set auto_minor_version_upgrade to false.
|
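The requirement can be checked before provisioning. The following is a minimal shell sketch, not part of the broker, that verifies a version string is in major.minor form as Redis 7 requires:

```shell
# Check that the requested Redis version includes a minor component
# (for example "7.0"), since a bare major version such as "7" fails.
version="7.0"   # illustrative value
case "$version" in
  *.*) valid=yes ;;   # contains a dot: major.minor form
  *)   valid=no  ;;   # bare major version only
esac
echo "valid=$valid"
```

With the version validated, it can be passed along with auto_minor_version_upgrade disabled, for example: cf create-service csb-aws-redis PLAN_NAME my-redis -c '{"redis_version": "7.0", "auto_minor_version_upgrade": false}' (the plan and instance names are illustrative).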
Amazon General RDS Errors
The following errors can occur in any Amazon RDS instance:
Error |
Reaching AWS subnets quota in a subnet group for RDS |
Operation |
create |
Symptom |
Errors containing:
DBSubnetQuotaExceededFault
|
Cause |
AWS has a resource quota called Subnets per database subnet group that limits the number of subnets in a database subnet group to 20 in each supported region. When operators or developers do not supply an existing subnet group in the plan or at provision time, the CSB creates a subnet group and adds all the subnets present in the specified VPC to it. For example, suppose the operator:
- Specifies a VPC with 25 subnets through the tile.
- Does not specify a database subnet group in the plan.
- Does not specify a database subnet group at provisioning time.
Then the CSB creates a database subnet group and adds all subnets, 25 in this example, to it. This operation therefore breaches the AWS resource quota. |
Solution |
Create a custom database subnet group through the AWS console and add the desired subnets for RDS instances to use. Then use the database subnet group name as a plan or provision parameter. |
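For example, assuming a subnet group named rds-subnet-group was created in the AWS console (all names are illustrative), it can be passed at provision time:

```shell
# Provision using a pre-created DB subnet group instead of letting the CSB
# build one from every subnet in the VPC. All names are illustrative.
cf create-service csb-aws-postgresql PLAN_NAME my-db -c '{"rds_subnet_group": "rds-subnet-group"}'
```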
Error |
Major engine version should be specified when auto_minor_version_upgrade is enabled |
Operation |
create or update |
Symptom |
Errors containing:
- Resource postcondition failed: A Major engine version should be specified when auto_minor_version_upgrade is enabled. Expected engine version: x.x - got: x.x.x
|
Cause |
A business rule prevents you from creating or updating an RDS instance with a configuration that enables auto_minor_version_upgrade and does not select a major engine version. AWS automatically upgrades the minor versions, but you must pick a major version. |
Solution |
Create or update your RDS instance either with auto minor version upgrade deactivated, or with auto minor version upgrade enabled and a major engine version selected. To find the major version, you can run the following command: aws rds describe-db-engine-versions --engine aurora-mysql --engine-version 5.7.mysql_aurora.2.02.3 --include-all --region us-west-2 | jq -r '.DBEngineVersions[] | { engine_version: .EngineVersion, major_version: .MajorEngineVersion }' Substitute the engine, aurora-mysql, and the engine version, 5.7.mysql_aurora.2.02.3, with the values that you want. |
Error |
Engine version not found when using a major version |
Operation |
create or update |
Symptom |
Errors containing:
-
InvalidParameterCombination: Cannot find version (minor engine version x.x.x) for (specific engine)
- Example:
InvalidParameterCombination: Cannot find version 8.0.mysql_aurora.3.04.0 for aurora-mysql
|
Cause |
The AWS API sometimes cannot find a minor version within its catalog. Various causes can induce this error, such as:
- A limited version pool just before the release of a new minor version.
- Eventual inconsistency between the read API and the write API.
|
Solution |
Create or update your RDS instance with auto minor version upgrade deactivated and select a specific minor engine version. |
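As a sketch (the instance name and engine version value are illustrative), deactivating auto minor version upgrade and pinning a specific minor version in one update:

```shell
# Pin an exact minor engine version and turn off automatic minor upgrades,
# so the broker never has to resolve a missing minor version.
# The instance name and version value are illustrative.
cf update-service my-db -c '{"auto_minor_version_upgrade": false, "engine_version": "8.0.32"}'
```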
Error |
incompatible-network state |
Operation |
create |
Symptom |
Errors containing:
|
Cause |
An incompatible-network state indicates one or more of the following is true of the Amazon RDS DB instance:
- There are no available IP addresses in the subnet that the Amazon RDS DB instance was launched into.
- The subnet used in the Amazon RDS DB subnet group no longer exists in the Amazon Virtual Private Cloud (Amazon VPC).
|
Solution |
AWS does not make any guarantees as to which subnet from the subnet group an RDS instance is launched in. Although you might assume that it balances new instance creation among all the subnets in the group, in reality this doesn't happen. This means one subnet in the group can run out of IP addresses while the others remain largely unused. To work around this issue, create a custom DB subnet group through the AWS console and choose the subnets that still have available IP addresses from the navigation pane. Then use the DB subnet group name as a plan or provision parameter. |
Error |
Unreachable publicly accessible DB |
Operation |
create or update |
Symptom |
All of the following conditions must be true:
- The service instance is configured with the property publicly_accessible: true.
- The database is not reachable from outside your Tanzu Application Service foundation.
- Apps within your Tanzu Application Service foundation can connect without issues by using a service binding.
|
Cause |
Several factors can contribute to this error:
- The service instance was associated with an unexpected VPC.
Pitfall: when aws_vpc_id is left blank, the service instance is created in whatever VPC is specified in the tile's configuration, or in the AWS default VPC when none is specified.
- The service instance was associated with unexpected subnets.
Pitfall: when rds_subnet_group is left blank, the service instance is associated with whatever subnet group is specified in the tile's Service Offering configuration, if present. When none is specified, a new subnet group containing all subnets present in the VPC is created and assigned to the service instance.
- The service instance was associated with an unexpected security group.
Pitfall: when rds_vpc_security_group_ids is left blank, the service instance is associated with whatever security group is specified in the tile's Service Offering configuration, if present. When none is specified, a new security group that allows all ingress traffic but no egress traffic is created and assigned to the service instance.
- The subnet group associated with the service instance contains some private subnets.
Pitfall: according to the official AWS documentation, for a database instance to be publicly accessible, all of the subnets in its database subnet group must be public.
- The security groups associated with the service instance are missing some rules to allow routing your external traffic, or some rules conflict with one another.
|
Recommendations |
For operators:
- Explicitly specify aws_vpc_id, rds_subnet_group, and rds_vpc_security_group_ids in the plans, or specify a default value in the Service Offering configuration when this option is present. There is no way to specify at the plan level that a property is mandatory and can't be left empty when creating an instance, so if your use case doesn't allow you to set these fields in the plan, keep in mind the pitfalls listed in the preceding Cause section.
- Set publicly_accessible: false in the plans if your VPC, subnets, and security groups are not designed with public databases in mind, or if you want to disallow them.
|
Solution |
- Check whether explicitly specifying aws_vpc_id, rds_subnet_group, and rds_vpc_security_group_ids solves the issue.
- If any of these fields are enforced by the plan, ask the maintainers of the plan if they support public databases.
- Check whether you have correctly configured your security group rules.
- Check whether you have correctly configured your database subnet group.
|
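A minimal provisioning sketch for the first check, with all three properties set explicitly (the IDs and names are placeholders, not real values):

```shell
# Pin the VPC, subnet group, and security group explicitly instead of relying
# on the defaults described in the Cause section. All values are placeholders.
cf create-service csb-aws-postgresql PLAN_NAME my-public-db -c '{
  "publicly_accessible": true,
  "aws_vpc_id": "vpc-xxxxxxxx",
  "rds_subnet_group": "my-public-subnet-group",
  "rds_vpc_security_group_ids": "sg-xxxxxxxx"
}'
```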
Error |
You can't modify storage type |
Operation |
Update |
Symptom |
Errors containing:
|
Cause |
AWS does not allow some modifications to storage_type, including but not limited to:
- Changing from io1 to standard (magnetic) or vice versa.
|
Solution |
For non-production test instances where the data is irrelevant, the most straightforward solution is to delete the service instance with the wrong storage_type and create a new one. |
For production instances accidentally created with storage_type: standard, the only solution is to back up the existing instance and restore it in a new instance with the right storage_type. |
For production instances created with a different storage_type that you want to migrate to storage_type: standard, the same approach applies: back up the existing instance and restore it in a new instance with storage_type: standard.
|
Error |
You can't currently modify the storage of this DB instance because the previous storage change is being optimized |
Operation |
Update |
Symptom |
Errors containing:
|
Cause |
Scaling storage usually doesn't cause any outage or performance degradation of the database instance. After you modify the storage size for a database instance, the status of the database instance is storage-optimization . Storage optimization can take several hours. You can't make further storage modifications for either six (6) hours or until storage optimization has completed on the instance, whichever is longer. |
Solution |
Update any other properties not related to disk immediately and postpone the modification of disk-related properties. If you need to upscale your storage capacity frequently, enabling storage autoscaling might be a better option. See Amazon RDS for MSSQL configuration Parameters - max_allocated_storage. |
Error |
InvalidParameterCombination: You can't specify IOPS or storage throughput for engine postgres and a storage size less than 400 |
Operation |
Create |
Symptom |
PostgreSQL/MySQL instances fail to create with this error if storage_type is set to gp3 and storage_gb is less than 400 GB, even when iops is not specified. |
Cause |
The broker has a default value for IOPS of 3000 that is used if no value is specified and IOPS configuration is possible for the `storage_type` requested. However, this value cannot be set if the `storage_gb` value is below a certain threshold. For more information, see the [AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#gp3-storage). |
Solution |
Specifying a value of 0 for iops prevents the broker from setting iops in the instance. Baseline storage performance is still maintained by AWS as [documented](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#gp3-storage). You have two alternatives:
- Specify "iops":0 in the plans. This value is configured for all instances of the plan and can't be overridden on an instance-per-instance basis.
- Specify "iops":0 as a provision parameter. This can be configured in each service instance individually. cf create-service csb-aws-postgresql PLAN_NAME SERVICE_INSTANCE_NAME -c '{"iops":0}'
|
Error |
InvalidParameterCombination: You can't specify IOPS or storage throughput for engine postgres and a storage size less than 400 |
Operation |
Update and Upgrade |
Symptom |
PostgreSQL/MySQL instances fail to update or upgrade with this error if storage_type is set to gp3 and storage_gb is less than 400 GB, even when iops is not specified. |
Cause |
While creating instances with "storage_type": "gp3" and "storage_gb" < 400GB can be achieved by setting `iops: 0` as instructed in [gp3-iops--issue](#gp3-iops-create-issue), this doesn't work for updates. The AWS API sets a default iops value and setting it to 0 is interpreted as trying to change that default value. |
Solution |
Specifying a value of 3000 for iops preserves the AWS-set default. Instances with the specified conditions must be created with "iops": 0 and then updated or moved to a plan that specifies "iops": 3000 for further update/upgrade operations to work.
- Specify "iops":3000 in the plans. This value is configured for all instances of the plan and can't be overridden on an instance-per-instance basis.
- Specify "iops":3000 as an update parameter. This can be configured in each service instance individually. cf update-service SERVICE_INSTANCE_NAME -c '{"iops":3000}'
|
Amazon PostgreSQL Errors
The following errors can occur in any Amazon PostgreSQL instance:
Error |
User does not have permission for tables created by other user in the PUBLIC schema |
Operation |
Bindings modifying/reading tables created by other bindings in the PUBLIC schema |
Symptom |
Errors mentioning lack of permissions/ownership:
must be owner of table
permission denied for table
|
Cause |
The Cloud Foundry binding model implies that multiple bindings can query or edit the same tables. This is particularly useful for rotating credentials where unbind and bind operations are needed. Additionally, the broker does not support creating bindings with different levels of access to the objects created. This means that all bindings need the same access to all objects and can query and edit them regardless of what binding created them in the first place. This conflicts with the PostgreSQL permission model, where the user that created an object is the owner and is the only one who can edit tables and query them, unless permission is explicitly granted to other roles. This applies for bindings and service keys. Specifically, the following issues can happen:
- Binding A not having access to a table that binding B created, when binding B created the table after binding A was created.
- Binding A cannot read tables created by binding B until a new binding C is created.
- Binding A cannot change tables created by binding B until binding B is deleted (unbound from its app).
|
Solution |
All the database users created when binding and creating service keys with Tanzu Cloud Service Broker for AWS are assigned the role binding_user_group . This implies they all have access to tables created by the binding_user_group role. Creating any objects with the binding_user_group role instead of the binding user resolves any of the issues mentioned here. You can achieve this by running SET ROLE binding_user_group before any other instruction in the SQL script that creates your object or framework performing database migrations. If you have issues with tables already created, you must either:
- Unbind the application that has created the objects and bind again (or delete the service key that has created the objects). This is because when unbinding, Tanzu Cloud Service Broker for AWS automatically transfers ownership of existing objects to the
binding_user_group role.
- Manually transfer ownership to the
binding_user_group with the following statement "ALTER TABLE tab_name OWNER TO binding_user_group;". You must run this statement after logging in to the database with the credentials from the binding/service key used to create the objects.
- If you only need other bindings to perform data operations, you can create new bindings for interacting with these objects. This is because Tanzu Cloud Service Broker for AWS assigns permissions to all existing tables whenever a new binding is created. However, this new binding does not have permissions to perform DDL operations until one of the first two solutions is applied.
|
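The SET ROLE approach can be sketched as follows, run with the credentials of a binding or service key; the connection string and table name are illustrative:

```shell
# Create objects as the shared binding_user_group role so every binding
# keeps full access to them. BINDING_URI and the table name are illustrative.
psql "$BINDING_URI" <<'SQL'
SET ROLE binding_user_group;
CREATE TABLE orders (id serial PRIMARY KEY, total numeric);
-- For a table already owned by a single binding user, transfer it manually:
-- ALTER TABLE orders OWNER TO binding_user_group;
SQL
```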
Amazon MySQL Errors
The following errors can occur in any Amazon MySQL instance:
Error |
InvalidParameterCombination: You can't specify IOPS or storage throughput for engine mysql and a storage size less than 400 |
Operation |
Create |
Symptom |
Instances fail to create with the preceding error if `storage_type` is set to `gp3` and `storage_gb` is less than 400 GB, even when `iops` is not specified. |
Cause |
The broker has a default value for `iops` of 3000 that is used if no value is specified and IOPS configuration is possible for the `storage_type` requested. However, this value cannot be set if the `storage_gb` value is below a certain threshold. For more information, see the [AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#gp3-storage). |
Solution |
Specifying a value of `0` for `iops` prevents the broker from setting `iops` in the instance. Baseline storage performance is still maintained by AWS as [documented](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#gp3-storage). You have two alternatives:
- Specify `"iops":0` in the plans: This value is configured for all instances of the plan and can't be overridden on an instance-per-instance basis.
- Specify `"iops":0` as a provision parameter: This can be configured in each service instance individually.
cf create-service csb-aws-mysql PLAN_NAME SERVICE_INSTANCE_NAME -c '{"iops":0}'
|
Amazon MSSQL Errors
The following errors can occur in any Amazon MSSQL instance:
Error |
InvalidParameterValue: Backup retention cannot be set to zero for DB Instance xxxxx since it has Multi-AZ enabled on it. |
Operation |
Update |
Symptom |
The instance fails to update with this error if `backup_retention_period` is set to `0` and `multi_az` is set to `false` in the same update operation. |
Cause |
The broker handles asynchronous parallel operations in the update operation and has no ability to set the order of execution of the two updates. |
Solution |
The updates need to be sequential:
- Disable `multi_az` and wait until the operation finishes:
cf update-service SERVICE_INSTANCE_NAME -c '{"multi_az": false}'
- Then set `backup_retention_period` to `0`:
cf update-service SERVICE_INSTANCE_NAME -c '{"backup_retention_period":0}'
|
Amazon Aurora Errors
The following errors can occur in any Aurora instance:
Error |
The following error message is displayed in the pg_upgrade_server.log logs: FATAL: shared memory segment sizes are configured too large |
Operation |
Update |
Symptom |
Major upgrade fails and error in pg_upgrade_server.log (viewable from the AWS console) shows the preceding error. |
Cause |
Major version upgrades in Amazon Aurora for PostgreSQL with small instance types are not straightforward. In particular, upgrading small instances, or serverless instances with a low max capacity (for example, 2 ACUs), causes an error even through the AWS console. |
Solution |
Perform the following changes sequentially:
- Temporarily update to a bigger instance_class type, or, if using the serverless instance type, increase the max capacity serverless_max_capacity to at least 4 ACUs:
cf update-service SERVICE_INSTANCE_NAME -c '{"serverless_max_capacity": 4}'
- Once that's done, retry the major version upgrade. For example:
cf update-service SERVICE_INSTANCE_NAME -c '{"engine_version": "14"}'
- Finally, scale down instance_class or serverless_max_capacity to its previous value, since the extra capacity was only needed during the upgrade:
cf update-service SERVICE_INSTANCE_NAME -c '{"serverless_max_capacity": 2}'
|