About the replication canary

This topic describes the replication canary for an highly available (HA) cluster. The replication canary runs on the jumpbox VM and monitors an HA cluster to ensure that replication is working.

The replication canary writes to a private dataset in the cluster and attempts to read the written data from each node. It pauses between read and write mode to ensure that the write-sets have been replicated. The private dataset does not use a significant amount of disk capacity.

When replication fails, the canary cannot read the data from all the nodes and does the following:

Sends a message that replication has failed to a configured email address. For more on notification emails, see Sample notification email.
Deactivates client access to the cluster. For more information on client access to the client, see Determine if the cluster is accessible.

When replication fails, data can be lost. Contact Support immediately in the case of replication failure.

Sample notification email

If the canary detects replication failure, it immediately sends an email through the VMware Tanzu Application Service for VMs (TAS for VMs) notification service. You must have configured email notifications in TAS for VMs for the replication canary to send emails.

For more information about configuring email notifications, see the Configure Email Notifications procedure for your IaaS in the Ops Manager documentation.

The notification service sends emails similar to the following:

Subject: CF Notification: p-mysql Replication Canary, alert 417

This message was sent directly to your email address.

{alert-code 417}
This is an email to notify you that the MySQL service's replication canary
has detected an unsafe cluster condition in which replication is not
performing as expected across all nodes.

Determine if the cluster is accessible

When the canary detects replication failure, it deactivates connections to the database cluster through the proxies. When the replication issue is resolved, the canary automatically restores client access to the cluster. You can determine if the canary deactivated the cluster access by observing the cluster using the Switchboard API.

To determine if cluster access is deactivated, do the following:

Do the prerequisite procedure in Monitor node health.

To view cluster access, run the following command:

curl https://USERNAME:PASSWORD@N-HOSTNAME/v0/cluster

Where:

USERNAME is the username you recorded in step 1.
PASSWORD is the password you recorded in step 1.
N is 0, 1, or 2 depending on the proxy you want to connect to.
HOSTNAME is the hostname you recorded in step 1.

For example:

$ curl https://abcdefghijklmno:012345678912345@0-proxy.123abc45-67d8-912e-34f5-g34612c10dba.org.dedicated-mysql.cf-app.com/v0/cluster
[
  {
    "currentBackendIndex":0,
    "trafficEnabled":false,
    "message":"Disabling cluster traffic",
    "lastUpdated":"2016-07-27T05:16:29.197754077Z"
  }
]

When cluster access is deactivated, trafficEnabled is set to false.

If you must restore client access to the cluster while replication is failing, contact Support.

For more information about the Switchboard API, see Monitoring node health.