Database Failover - Newer Appliances

This topic describes database failover functionality for appliances running version 9.0.0 or later. For these appliances, failover is fully automated, with one exception: when a service provider appliance shuts down unexpectedly, failover must still be performed manually.

Note: This topic applies only to appliances running version 9.0.0 or later. For information about older appliances, see Database Failover - Legacy Appliances.

Automated Failover

Automated failover functionality includes the following.

Single datasource connection - All appliances now use a single datasource connection which is proxied through the new pgbouncer service. Pgbouncer is a connection pooler which maintains connections to the underlying postgres databases while providing a single connection point for the appliance services to use. This service also handles all failover and primary migrations; it is updated automatically to point to the new primary database, enabling a seamless transition between primaries.
Controlled switchover of primary database on appliance shutdown or restart - During any guest-initiated shutdown or restart of the current primary database, the primary role is migrated to the other appliance in the HA pair in a controlled manner. This causes only minor disruption to the functioning of the appliances, primarily impacting desktop connections via the Blast (web) portal. Desktop connections made via the Horizon Client are unaffected. There is also a brief period of horizonadmin unavailability, and users requesting a desktop connection during the switchover might get an error that no desktops are available (but retrying shortly afterwards should allow them to connect to a desktop).
Auto failover of primary database for unexpected failure of tenant appliance - Primary failures are now detected automatically, after which the service provider appliances are asked to verify that the appliance is actually down. The failover process is then initiated after a three-minute delay. This new failover process is less disruptive than past failover processes and leaves the slony cluster in a normal replicating state afterwards, so no slony reinit is required after a failover occurs.

These features are supported by the services described below.

Service	Description	Log location	Notes
Pgbouncer	Connection pooler that brokers database connections between the platform services and postgres.	/var/log/pgbouncer/	By default, connections to postgres made with the psql command now go through pgbouncer, which connects you to the primary database regardless of which appliance you are using. To connect to a specific database instance directly, use the -h and -p flags with psql, specifying port 6432 (for example, psql -U admin -h <Appliance_IP> -p 6432 fdb). To verify that pgbouncer is pointing to the correct appliances, check the configuration in /etc/pgbouncer/pgbouncer.ini: the connect strings at the top of the file should point to the current slony primary. If they do not, restarting the appliance fixes the problem in most cases.
Dbmonitor	Monitoring service that (1) detects primary failures and initiates the new failover process, and (2) detects slony primary node changes caused by controlled switchover or resubscribe and updates pgbouncer with the new primary address.	/var/log/dbmonitor/
Switchover	Script that runs on shutdown to perform the switchover action.	/var/log/desktone/slony-services
Resubscribe	Script that runs on startup to perform the resubscribe action.	/var/log/desktone/slony-services
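
The pgbouncer.ini check described in the Pgbouncer notes can be partly scripted. The following is a minimal sketch, not a supplied tool: the function name pgbouncer_hosts is hypothetical, and it assumes the connect strings use the standard pgbouncer host= key. It prints each distinct host the file refers to, so you can compare the result against the current slony primary.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the appliance): print the distinct
# host= values found in a pgbouncer configuration file.
# Assumes connect strings of the standard pgbouncer form, for example:
#   fdb = host=192.0.2.10 dbname=fdb
pgbouncer_hosts() {
    ini="${1:-/etc/pgbouncer/pgbouncer.ini}"
    # Drop comment lines (starting with ; or #), then extract every
    # host=<value> token and reduce the list to unique values.
    grep -v '^[[:space:]]*[;#]' "$ini" \
        | grep -o 'host=[^[:space:]]*' \
        | cut -d= -f2 \
        | sort -u
}
```

Running pgbouncer_hosts with no argument reads /etc/pgbouncer/pgbouncer.ini. More than one address in the output suggests that the connect strings do not all point to the same appliance.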

Manual Failover of Service Provider Appliance

If a service provider appliance shuts down unexpectedly, you can perform a manual failover by completing the steps below on the other service provider appliance.

Run the failover-slony-master script (located in /usr/local/desktone/scripts/) as root:
```
failover-slony-master <database type> '<database password>'
```
where <database type> is fdb, edb, or avdb.
Confirm that the pgbouncer.ini file is pointing to the front-end IP address of the current appliance:
```
/usr/local/desktone/scripts# grep '<IP address>' /etc/pgbouncer/pgbouncer.ini
```
Reload the pgbouncer service:
```
service pgbouncer reload
```
Confirm primary status and replication by running slony-status:
```
/usr/local/desktone/scripts# slony-status <org #>
```
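
The four steps above can be chained in a small wrapper. This is a sketch under stated assumptions rather than a supported script: the function names, the argument order, and the DRY_RUN variable are hypothetical, and it assumes the scripts can be invoked by full path instead of from within /usr/local/desktone/scripts. By default it only prints each step; setting DRY_RUN=0 executes them, stopping at the first failure.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual failover steps. The commands are
# the ones from the procedure; everything else here is illustrative.
SCRIPTS_DIR=/usr/local/desktone/scripts

run_step() {
    # $1 is a printable label; the remaining arguments are the command.
    label=$1
    shift
    echo "+ $label"
    # Execute only when DRY_RUN is explicitly set to 0.
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

manual_failover() {
    db_type=$1
    db_password=$2
    front_end_ip=$3
    org=$4
    case "$db_type" in
        fdb|edb|avdb) ;;
        *) echo "database type must be fdb, edb, or avdb" >&2; return 2 ;;
    esac
    # The label hides the password so it is not echoed to the terminal.
    run_step "failover-slony-master $db_type '<password hidden>'" \
        "$SCRIPTS_DIR/failover-slony-master" "$db_type" "$db_password" || return 1
    # -F matches the IP address literally instead of as a regular expression.
    run_step "grep -F $front_end_ip /etc/pgbouncer/pgbouncer.ini" \
        grep -F "$front_end_ip" /etc/pgbouncer/pgbouncer.ini || return 1
    run_step "service pgbouncer reload" service pgbouncer reload || return 1
    run_step "slony-status $org" "$SCRIPTS_DIR/slony-status" "$org"
}
```

For example, `manual_failover fdb '<database password>' <IP address> <org #>` prints the four steps without running them. Run as root with DRY_RUN=0 only after reviewing that output.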