Triggering a leader-follower failover

You can trigger a failover of apps from the leader to the follower.

You can trigger a failover in the following scenarios:

You want to take the leader VM down to do planned maintenance.
The performance of the leader VM degrades.
The leader VM fails unexpectedly.
The AZ where the leader VM is located goes offline unexpectedly.

You can use the following metrics to determine if you need to trigger a failover:

/p.mysql/available: This metric monitors whether the MySQL server is currently available. For more information, see Server availability.
/p.mysql/follower/seconds_behind_master: This metric monitors how far behind the follower is in applying writes from the leader. For more information, see Leader-Follower metrics.
/p.mysql/follower/seconds_since_leader_heartbeat: This metric monitors the number of seconds that elapse between the leader heartbeat and the replication of the heartbeat in the follower. For more information, see Leader-Follower metrics.

For information about errands used to trigger failover, see configure-leader-follower, make-leader, and make-read-only.

To trigger a failover:

Retrieve information.
Promote the follower.
Clean up former leader VM (Optional).
Configure the new follower.
Unbind and rebind the app.

Retrieve information

To retrieve the information necessary for stopping the leader and promoting the follower:

Log in to your deployment by running:
```
cf login API-URL
```
When prompted, enter your credentials.
Target the org and space where the leader-follower service instance is located by running:
```
cf target -o DESTINATION-ORG -s DESTINATION-SPACE
```
Record the GUID of the service instance by running:
```
cf service SERVICE-INSTANCE-NAME --guid
```
Where SERVICE-INSTANCE-NAME is the name of the leader-follower service instance.

For example:
```
$ cf service my-lf-instance --guid
82ddc607-710a-404e-b1b8-a7e3ea7ec063
```
If you do not know the name of the service instance, you can list service instances in the space with cf services.
SSH into the Tanzu Operations Manager VM by following the procedures in Gather credential and IP Address information and SSH into Tanzu Operations Manager.
From the Tanzu Operations Manager VM, log in to your BOSH Director with the BOSH CLI. For more information on logging in with the BOSH CLI, see Log in to the BOSH Director.

Use the BOSH CLI to run the inspect errand by running:

bosh -d service-instance_GUID run-errand inspect

Where GUID is the GUID of the leader-follower service instance you recorded.

For example:

$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    run-errand inspect

Examine the output for the leader-follower MySQL VMs and identify the instance marked Role: leader.

Example output:

Instance   mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
Exit Code  0
Stdout     2018/04/03 18:08:46 Started executing command: inspect
        2018/04/03 18:08:46
        IP Address: 10.0.8.11
        Role: leader
        Read Only: false
        Replication Configured: false
        Replication Mode: async
        Has Data: true
        GTID Executed: 82ddc607-710a-404e-b1b8-a7e3ea7ec063:1-18
        2018/04/03 18:08:46 Successfully executed command: inspect
Stderr     -


Instance   mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
Exit Code  0
Stdout     2018/04/03 18:08:46 Started executing command: inspect
        2018/04/03 18:08:46
        IP Address: 10.0.8.10
        Role: follower
        Read Only: true
        Replication Configured: true
        Replication Mode: async
        Has Data: true
        GTID Executed: 82ddc607-710a-404e-b1b8-a7e3ea7ec063:1-18
        2018/04/03 18:08:46 Successfully executed command: inspect

Record the index of the instance marked Role: leader. In this example output, the index of the leader VM is ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0.
Record the index of the other instance, which is the follower VM. In this example output, the index of the follower VM is 37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8.

If you still have access to the AZ where the leader VM is located, determine if the leader VM is in the AZ you want to take offline by running:

bosh -d service-instance_GUID instances

For example:

$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    instances
Deployment 'service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063'


Instance                                    Process State  AZ             IPs
mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0  failing        us-central1-f  10.0.8.11
mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8  running        us-central1-a  10.0.8.10
2 instances

The leader VM might not display its status as failing if you are performing planned maintenance.

Examine the output to determine if the leader VM is in the AZ you want to take offline.
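
The two lookups in this section can be scripted rather than read by eye. The following is a minimal sketch, assuming the output layouts shown in the examples above. The function names `inspect_roles` and `instance_az` are hypothetical helpers, not part of the BOSH CLI, and `instance_az` assumes the process state is a single word such as `running` or `failing`.

```shell
# Hypothetical helper: print "ROLE INSTANCE-INDEX" for each VM found in
# `run-errand inspect` output read from stdin. Assumes each VM block starts
# with an "Instance   mysql/INDEX" line followed later by a "Role: ..." line.
inspect_roles() {
  awk '
    $1 == "Instance" { split($2, parts, "/"); id = parts[2] }
    $1 == "Role:"    { print $2, id }
  '
}

# Hypothetical helper: print the AZ of one instance from `bosh instances`
# output read from stdin. Assumes the column order Instance, Process State,
# AZ, IPs, and a one-word process state.
instance_az() {
  awk -v id="mysql/$1" '$1 == id { print $3 }'
}

# Possible usage (GUID and INDEX are placeholders):
#   bosh -d service-instance_GUID run-errand inspect | inspect_roles
#   bosh -d service-instance_GUID instances | instance_az INDEX
```

Verify the printed values against the raw output before acting on them, because the parsing depends on the exact output layout of your BOSH CLI version.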

Promote the follower

To stop the leader VM and promote the follower VM to the new leader:

Stop any data from being written to the leader VM by setting it to read-only by running:

bosh -d service-instance_GUID \
  run-errand make-read-only \
  --instance=mysql/INDEX

Where:

GUID: This is the GUID of the leader-follower service instance retrieved above.
INDEX: This is the index of the leader VM retrieved above.

For example:

$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    run-errand make-read-only \
    --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0

If you still have access to the AZ where the leader VM is located, stop the leader VM by running:

bosh -d service-instance_GUID stop mysql/INDEX

Use the index of the leader VM retrieved above.

For example:

$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    stop mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0

Set the follower VM as writable by running:

bosh -d service-instance_GUID run-errand make-leader --instance=mysql/INDEX

Use the index of the follower VM retrieved above.

For example:

$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  run-errand make-leader \
  --instance=mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8

If this command returns an error, re-run it until the follower VM has completed applying the transactions.
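
The advice to re-run the command can be wrapped in a bounded retry loop so that it does not repeat forever. This is a minimal sketch. The `retry` function is a hypothetical helper, and the attempt limit and delay in the usage comment are illustrative assumptions, not recommended values.

```shell
# Hypothetical helper: re-run a command until it succeeds or the attempt
# limit is reached.
# Usage: retry MAX-ATTEMPTS DELAY-SECONDS COMMAND [ARGS...]
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Possible usage (limit of 10 attempts and 30-second delay are assumptions):
#   retry 10 30 bosh -d service-instance_GUID \
#     run-errand make-leader --instance=mysql/INDEX
```

If the loop gives up, inspect the errand output before retrying, because the error might not be related to pending transactions.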

At this point, a single instance is working but leader-follower replication has not yet been restored. To fail your app over to a single instance instead of restoring leader-follower, skip to Unbind and rebind the app.

If you are triggering a failover in response to the AZ of the leader VM going offline, you can fail your app over to a single instance by following the procedure in Unbind and rebind the app below. However, to restore leader-follower, you must regain access to the AZ where your leader VM is located before following the procedures in Clean up former leader VM (Optional) and Configure the new follower below.

Clean up former leader VM (Optional)

If you are triggering a failover in response to a failing leader VM, to clean up the former leader VM:

Deactivate resurrection by running:
```
bosh update-resurrection off
```

Retrieve the CID of the failing former leader VM by running:

bosh -d service-instance_GUID instances \
  --details \
  --failing \
  --column="VM CID" \
  --json

For example:

$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
    --details \
    --failing \
    --column="VM CID" \
    --json

Retrieve the disk CID of the failing former leader VM by running:

bosh -d service-instance_GUID instances \
  --details \
  --failing \
  --column="Disk CIDs" \
  --json

For example:

$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
  --details \
  --failing \
  --column="Disk CIDs" \
  --json

Delete the failing former leader VM by running:
```
bosh -d service-instance_GUID delete-vm VM-CID
```
Where:
- GUID: This is the GUID of the leader-follower service instance retrieved above.
- VM-CID: This is the CID of the failing former leader VM retrieved above.
For example:
```
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    delete-vm i-1db9ede6
```
Orphan the disk of the failing former leader VM by running:
```
bosh -d service-instance_GUID orphan-disk DISK-CID
```
Where:
- GUID: This is the GUID of the leader-follower service instance retrieved above.
- DISK-CID: This is the disk CID of the failing former leader VM retrieved above.
For example:
```
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  orphan-disk b-1db9ede6
```
Orphaning a disk rather than deleting it preserves the disk for possible recovery. After performing recovery operations, you can reattach the disk to a VM. BOSH deletes orphaned disks after five days by default.
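
Because both commands act on a specific VM and disk, it can help to run them through a small wrapper that refuses to run with an empty argument and stops if the VM deletion fails. The following is a minimal sketch. The function name `cleanup_former_leader` is hypothetical, and it takes the GUID, VM CID, and disk CID you recorded in the previous steps.

```shell
# Hypothetical helper: delete the former leader VM, then orphan its disk.
# Usage: cleanup_former_leader GUID VM-CID DISK-CID
cleanup_former_leader() {
  # Refuse to run if any of the three values is missing.
  if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: cleanup_former_leader GUID VM-CID DISK-CID" >&2
    return 2
  fi
  deployment="service-instance_$1"
  # The disk is orphaned only if delete-vm succeeds.
  bosh -d "$deployment" delete-vm "$2" &&
    bosh -d "$deployment" orphan-disk "$3"
}

# Possible usage, with the example values from above:
#   cleanup_former_leader 82ddc607-710a-404e-b1b8-a7e3ea7ec063 i-1db9ede6 b-1db9ede6
```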

Configure the new follower

To start the former leader VM again and configure it as the new follower:

Create the former leader VM again by running:
```
bosh -d service-instance_GUID \
  recreate \
  mysql/INDEX
```
Where:
- GUID: This is the GUID of the leader-follower service instance retrieved above.
- INDEX: This is the index of the former leader VM that you are re-creating.
For example:
```
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  recreate \
  mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
```

Set the former leader VM as a follower, using the same values as previously shown, by running:

bosh -d service-instance_GUID \
  run-errand configure-leader-follower \
  --instance=mysql/INDEX

For example:

$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  run-errand configure-leader-follower \
  --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0

Use the BOSH CLI to run the inspect errand, using the same value as previously shown, by running:
```
bosh -d service-instance_GUID \
    run-errand inspect
```
For example:
```
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
   run-errand inspect
```
If the output displays one instance marked Role: leader and another instance marked Role: follower, leader-follower replication and high availability are restored, and the deployment is back in its original working state. If you deactivated resurrection, you can turn it back on.
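
The check on the inspect output can also be scripted. This is a minimal sketch, assuming the `Role:` lines appear as in the example output earlier in this topic. The function name `replication_restored` is hypothetical.

```shell
# Hypothetical helper: succeed only if the inspect output on stdin reports
# exactly one "Role: leader" line and exactly one "Role: follower" line.
replication_restored() {
  awk '
    $1 == "Role:" && $2 == "leader"   { leaders++ }
    $1 == "Role:" && $2 == "follower" { followers++ }
    END { exit !(leaders == 1 && followers == 1) }
  '
}

# Possible usage (GUID is a placeholder):
#   bosh -d service-instance_GUID run-errand inspect | replication_restored \
#     && echo "leader-follower restored"
```

The sketch checks only the roles. It does not confirm that the follower has caught up, so review the Replication Configured and GTID Executed fields as well.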

Unbind and rebind the app

To fail their apps over to the new leader VM, your developers must unbind and rebind their apps to the leader-follower service instance.

If you have BOSH DNS enabled in Tanzu Operations Manager, you do not need to unbind and rebind your app to a leader-follower service instance to fail over the app. The operator activates BOSH DNS in BOSH Director > BOSH DNS Config.

Important: If a developer rebinds an app to the Tanzu SQL for VMs service after unbinding, they must also rebind any existing custom schemas to the app. When you rebind an app, stored code, programs, and triggers break. For more information about binding custom schemas, see Use custom schemas.

To unbind and rebind your app:

Unbind the app from the leader-follower service instance by running:
```
cf unbind-service APP-NAME SERVICE-INSTANCE-NAME
```
Where:
- APP-NAME: This is the name of the app bound to the leader-follower service instance.
- SERVICE-INSTANCE-NAME: This is the name of the leader-follower service instance.
Rebind the app to the leader-follower service instance by running:
```
cf bind-service APP-NAME SERVICE-INSTANCE-NAME
```
Restage the app by running:
```
cf restage APP-NAME
```
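
If developers need to repeat these three commands for several apps, the sequence can be wrapped in a function that stops at the first failure. This is a minimal sketch. The function name `rebind_app` is hypothetical, and it assumes you are already logged in and targeting the correct org and space.

```shell
# Hypothetical helper: unbind, rebind, and restage one app.
# Each step runs only if the previous one succeeded.
# Usage: rebind_app APP-NAME SERVICE-INSTANCE-NAME
rebind_app() {
  app=$1
  service=$2
  cf unbind-service "$app" "$service" &&
    cf bind-service "$app" "$service" &&
    cf restage "$app"
}

# Possible usage:
#   rebind_app my-app my-lf-instance
```

The sketch does not rebind custom schemas. If the app uses them, rebind them separately as described in the Important note above.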