You can trigger a failover and switchover when using a Multi‑Site Replication plan.
You can trigger a failover or switchover to redirect traffic to a secondary foundation.
For information about when to trigger a failover or switchover, see About Failover and Switchover.
Before you trigger a failover or switchover, you must verify that the follower service instance is healthy. See Verify Follower Health.
The procedures in this topic assume that you created the leader service instance in the primary foundation and the follower service instance in the secondary foundation.
Before you trigger a failover or switchover, you must verify that the follower service instance is healthy. If your follower service instance is unhealthy, contact Support.
To verify the service instance:
Log in to the deployment for your secondary foundation by running:
cf login SECONDARY-API-URL
Where SECONDARY-API-URL
is the API endpoint for your secondary foundation.
Record the GUID of the follower service instance by running:
cf service SERVICE-INSTANCE-NAME --guid
Where SERVICE-INSTANCE-NAME
is the name of the follower service instance.
For example:
$ cf service my-follower --guid 12345678-90ab-cdef-1234-567890abcdef
SSH into the Tanzu Operations Manager VM by following the procedure in Log in to the Tanzu Operations Manager VM with SSH.
From the Tanzu Operations Manager VM, log in to your BOSH Director by following the procedure in SSH Into the BOSH Director VM.
View the health of the service instance by running:
bosh -d service-instance_GUID instance
For example:
$ bosh -d service-instance_12345678-90ab-cdef-1234-567890abcdef instance
Using environment 'https://10.0.0.6:25555' as client 'admin'
Task 21409. Done
Deployment 'service-instance_12345678-90ab-cdef-1234-567890abcdef'
Instance Process State AZ IPs
mysql/1373022d-4eab-46d3-8fd1-a12067edf597 running z2 10.0.17.14
1 instances
Succeeded
Ensure that the service instance is running
. If the service instance is failing
, contact Support.
You can only trigger failover if you do not need to recover the leader service instance.
To trigger a failover:
If you try to promote a leader-follower, highly available cluster, or single node service instance to leader or make it read-only, you get an error message similar to:
Updating service instance haDB as admin... FAILED Server error, status code: 502, error code: 10001, message: Service broker error: 1 error occurred: * the configuration parameter 'initiate-failover' is not a valid option
To promote the follower service instance to leader:
Log in to the deployment for your secondary foundation by running:
cf login SECONDARY-API-URL
Where SECONDARY-API-URL
is the API endpoint for your secondary foundation.
Promote the follower service instance to leader by running:
cf update-service SECONDARY-INSTANCE \
-c '{"initiate-failover":"promote-follower-to-leader"}'
For example:
$ cf update-service secondary-node \ -c '{"initiate-failover":"promote-follower-to-leader"}'
Updating service instance secondary-node as admin... OK
If this command fails, do one of the following:
Updating service instance secondary-node as admin... FAILED Server error, status code: 502, error code: 10001, message: Service broker error: Promotion of follower failed - has 1 transactions still unapplied
Updating service instance secondary-node as admin... FAILED Server error, status code: 502, error code: 10001, message: Service broker error: Promotion of follower failed - the leader is still writable
Watch the progress of the service instance update by running:
watch cf services
Wait for the last operation
for your instance to show as update succeeded
.
For example:
$ watch cf services
Getting services in org my-org / space my-space as admin... OK name service plan bound apps last operation secondary-node p.mysql db-pxc-single-node-small update succeeded
Reconfigure your global DNS load balancer to direct all traffic to apps in your secondary foundation. See Configure Your GLB.
When you do a failover, you cannot manually recover the leader service instance. After you promote the follower service instance to leader, you remove the former leader service instance. Otherwise, the service instance can recover in read-write mode.
The way you remove the service instance, depends on whether the VM has been lost or not.
To remove the former leader service instance:
Log in to the deployment for your primary foundation by running:
cf login PRIMARY-API-URL
Where PRIMARY-API-URL
is the API endpoint for the primary foundation.
Do one of the following:
If the foundation is lost, you purge the service instance after following the steps to recover the foundation's Cloud Controller database in Restoring Deployments from Backup with BBR.
If the VM is not lost:
Remove all app bindings by following the procedure Unbind an App from a Service Instance in Using VMware SQL with MySQL for Tanzu Application Service.
Delete the service keys from the former leader service instance.
Delete the service instance by following the procedure Delete a Service Instance in Using VMware SQL with MySQL for Tanzu Application Service.
To reconfigure Multi‑Site Replication between two instances, a new follower without any data must be created in the primary foundation.
To create a follower:
Log in to the deployment for your primary foundation by running:
cf login PRIMARY-API-URL
Where PRIMARY-API-URL
is the API endpoint for the primary foundation.
Create a service instance using the same Multi‑Site Replication plan that was used to create the original instance:
follower
because, if in the future you trigger a failover or switchover, this instance can no longer be the follower.The follower in the primary foundation needs to catch up to the newly promoted leader in the secondary foundation.
Reconfigure Multi‑Site Replication so that the follower in the primary foundation receives the data from the leader in the secondary foundation.
To reconfigure:
To trigger a switchover:
If you try to promote a leader-follower, highly available cluster, or single node service instance to leader or make it read-only, you get an error message similar to the following:
Updating service instance haDB as admin... FAILED Server error, status code: 502, error code: 10001, message: Service broker error: 1 error occurred: * the configuration parameter 'initiate-failover' is not a valid option
Before you promote the follower service instance, you must make the leader service instance, which is in the primary foundation, a read-only format.
To make the leader read-only and promote the follower to leader in the secondary foundation:
Log in to the deployment for your primary foundation by running:
cf login PRIMARY-API-URL
Where PRIMARY-API-URL
is the API endpoint for the primary foundation.
Set the service instance that is currently the leader to read-only by running:
cf update-service PRIMARY-INSTANCE \
-c '{"initiate-failover":"make-leader-read-only"}'
For example:
$ cf update-service primary-node \ -c '{"initiate-failover":"make-leader-read-only"}'
Updating service instance primary-node as admin... OK
Watch the progress of the service instance update by running:
watch cf services
Wait for the last operation
for your instance to show as update succeeded
.
Log in to the deployment for your secondary foundation by running:
cf login SECONDARY-API-URL
Where SECONDARY-API-URL
is the API endpoint for your secondary foundation.
Promote the service instance in the secondary foundation to leader by running:
cf update-service SECONDARY-INSTANCE \
-c '{"initiate-failover":"promote-follower-to-leader"}'
If this command fails, do one of the following:
Updating service instance secondary-node as admin... FAILED Server error, status code: 502, error code: 10001, message: Service broker error: Promotion of follower failed - has 1 transactions still unapplied
Updating service instance secondary-node as admin... FAILED Server error, status code: 502, error code: 10001, message: Service broker error: Promotion of follower failed - the leader is still writable
Watch the progress of the service instance update by running:
watch cf services
Wait for the last operation
for your instance to show as update succeeded
.
To establish a connection between the service instances in the primary and secondary foundations, you must reconfigure replication. Reconfiguring replication is similar to the procedure in Configure multi‑site replication except the service instance in the primary foundation is the follower and the service instance in the secondary foundation is the leader.
To successfully trigger a switchover, the follower dataset must be a subset of the leader dataset. This means that VMware Tanzu for MySQL has not written new data to the follower. The follower must also be no more than 3 days behind the leader.
If your follower instance does not satisfy these requirements, you must create a new Multi‑Site Replication service instance and reconfigure replication using this new, empty instance as the follower.
The following diagram describes the workflow for reconfiguring multi-site replication:
The steps shown in the diagram are:
To reconfigure replication for the service instances:
Log in to the deployment for your primary foundation by running:
cf login PRIMARY-API-URL
Create a host-info service key for the service instance in your primary foundation:
cf create-service-key PRIMARY-INSTANCE SERVICE-KEY \
-c '{"replication-request": "host-info"}'
Where:
PRIMARY-INSTANCE
is the name of the follower service instance in the primary foundation.SERVICE-KEY
is a name you choose for the host-info service key.For example:
$ cf create-service-key primary-node host-info \ -c '{"replication-request": "host-info" }'
Creating service key host-info for service instance primary-node as admin... OK
View the replication-credentials
for your host-info service key by running:
cf service-key PRIMARY-INSTANCE SERVICE-KEY
Where:
PRIMARY-INSTANCE
is the name of the follower service instance in the primary foundation.SERVICE-KEY
is the name of the host-info service key you created in previous step.For example:
$ cf service-key primary-node host-info-key
Getting key host-info-key for service instance primary-node as admin... { "credentials": { "replication": { "peer-info": { "hostname": "primary.bosh", "ip": "10.0.19.12", "system_domain": "sys.primary-domain.com", "uuid": "ab12cd34-5678-91e2-345f-67891h234567" }, "role": "leader" } } }
CautionIn cf CLI v8, the response includes a top-level
credentials
key. Earlier versions of the cf CLI do not include a top-levelcredentials
key. This procedure assumes that you are using cf CLI v8.
Record the output of the previous command, and remove the top-level credentials
key.
Log in to the deployment for your secondary foundation by running:
cf login SECONDARY-API-URL
Update your leader service instance in the secondary foundation with the host-info service key by running:
cf update-service SECONDARY-INSTANCE -c HOST-INFO
Where:
SECONDARY-INSTANCE
is the name of the primary service instance you created in step 1 of Creating Multi‑Site Replication service instances.HOST-INFO
is the output you recorded in step 3.For example:
$ cf update-service primary-node -c '{ "replication":{ \ "peer-info":{ \ "hostname": "primary.bosh", \ "ip": "10.0.18.12", \ "system_domain": "sys.primary-domain.com", \ "uuid": "ab12cd34-5678-91e2-345f-67891h234567" \ },\ "role": "leader" \ } \ }'
Updating service instance primary-node as admin... OK
Monitor the progress of the service instance update by running:
watch cf services
Wait for the last operation
for your instance to show as update succeeded
.
Create a credentials service key for the service instance in your secondary foundation by running:
cf create-service-key SECONDARY-INSTANCE SERVICE-KEY-NAME \
-c '{"replication-request": "credentials"}'
Where:
SECONDARY-INSTANCE
is the name of the service instance in the secondary foundation.SERVICE-KEY-NAME
is a name you choose for the credentials service key.For example:
$ cf create-service-key secondary-node cred-key \ -c '{"replication-request": "credentials" }'
Creating service key cred-key for service instance secondary-node as user@example.com... OK
The -c
flag is different than the one in step 2.
View the replication-credentials
for your credentials service key by running:
cf service-key SECONDARY-INSTANCE SERVICE-KEY-NAME
Where:
SECONDARY-INSTANCE
is the name of the leader service instance in the secondary foundation.SERVICE-KEY-NAME
is the name of the credentials service key you created in step 7.For example:
$ cf service-key secondary-node cred-key
Getting key cred-key for service instance secondary as admin... { "credentials": { "replication": { "credentials": { "password": "a22aaa2a2a2aaaaa", "username": "6bf07ae455a14064a9073cec8696366c" }, "peer-info": { "hostname": "secondary.bosh", "ip": "10.0.17.12", "system_domain": "sys.secondary-domain.com", "uuid": "zy98xw76-5432-19v8-765u-43219t876543", "ports": { "mysql": 3306, "agent": 8443, "backup": 8081 }, }, "role": "follower" } } }
CautionIn cf CLI v8, the response includes a top-level
credentials
key. Earlier versions of the cf CLI do not include a top-levelcredentials
key. This procedure assumes that you are using cf CLI v8.
Record the output of the previous command, and remove the top-level credentials
key.
Log in to the deployment for your primary foundation by running:
cf login PRIMARY-API-URL
Update the follower service instance in the primary foundation with the credentials service key by running:
cf update-service PRIMARY-INSTANCE -c CREDENTIALS
Where:
PRIMARY-INSTANCE
is name of the follower service instance in the primary foundation.CREDENTIALS
is the output you recorded in the previous step.For example:
$ cf update-service primary-node -c '{"replication": { \ "credentials": { \ "password": "a22aaa2a2a2aaaaa", \ "username": "6bf07ae455a14064a9073cec8696366c" \ }, \ "peer-info": { \ "hostname": "secondary.bosh", \ "ip": "10.0.17.12", \ "system_domain": "sys.secondary-domain.com", \ "uuid": "zy98xw76-5432-19v8-765u-43219t876543", \ "ports": { \ "mysql": 3306, \ "agent": 8443, \ "backup": 8081 \ }, \ }, \ "role": "follower" \ } \ }'
Updating service instance primary-node as admin... OK
Watch the progress of the service instance update by running:
watch cf services
Wait for the last operation
for your instance to show as update succeeded
.
You now have a leader-follower service instance successfully configured, where the leader is in your secondary foundation and your follower is in the primary foundation.
If this command fails and you get one of the following errors, you must create a new Multi‑Site Replication service instance and reconfigure replication using this new, empty instance as the follower.
$ cf update-service primary-node -c /tmp/credentials-key.json Updating service instance primary-node as admin... FAILED Server error, status code: 502, error code: 10001, message: Service broker error: Establishing Replication Failed - follower is too far behind Leader to start replication Leader GTIDs offering: "487e6056-6e93-11ea-8c96-42010a010806:5-9" Follower GTIDs missed: "487e6056-6e93-11ea-8c96-42010a010806:1-9" Try again with an empty instance or contact your operator to troubleshoot
$ cf update-service primary-node -c /tmp/credentials-key.json Updating service instance primary-node as admin... FAILED Server error, status code: 502, error code: 10001, message: Service broker error: Establishing Replication Failed - the follower has divergent data Leader GTIDs applied: "bd2ff185-6947-11ea-80d8-42010a000808:1-20" Follower GTIDs applied: "c1abd2a4-6947-11ea-8099-42010a010807:1-15" Try again with an empty instance or contact your operator to troubleshoot
In either case, you must create a new multi-site service instance and reconfigure replication using this new, empty instance as the follower.