VMware Data Services Manager enables you to manage disaster recovery operations of the Provider VMs, Agent VMs, and Database VMs through Site Recovery Manager (SRM). The disaster recovery feature enables you to maintain business continuity from a Recovery site even if the Primary site goes down because of a power outage or another calamity.
If the Primary site goes down (because of a power outage, network issues, storage issues, and so on), the Recovery site maintains business continuity until the Primary site is up again. When the Primary site is back up, business continues as usual with all functionalities running from the Primary site.
SRM uses the vSphere Replication appliance to provide disaster recovery for VMs in VMware Data Services Manager. The following list provides the prerequisites for vSphere Replication and for the smooth functioning of disaster recovery:
After you have configured VMware Data Services Manager and created databases, as required, in the Primary site, the major steps to configure SRM for disaster recovery are as follows. For more details, see the SRM documentation.
Step 1: Install SRM and vSphere Replication appliances in the Primary site and the Recovery site:
Step 2: Configure prerequisites (mentioned in the Prerequisites section).
Step 3: (Optional) Test the disaster recovery setup from the Primary site to the Recovery site to ensure that it runs smoothly, and fix errors, if any.
Step 4: After the Primary site goes down, to ensure business continuity from the Recovery site, run the Recovery Plans, starting with the Provider recovery, followed by the Agent recovery, and finally the database recovery.
Step 5: (Recommended) After the Primary site is up, perform recovery from the Recovery site back to the Primary site. Recover the Provider VMs first, followed by the Agent VMs, and finally the database VMs. Then resume all VMware Data Services Manager operations from the Primary site, and be prepared to run the Recovery Plans if the Primary site goes down again.
After running the Recovery Plans to recover from the Primary site to the Recovery site, you need to perform a few additional steps for MySQL database clusters. When the Primary site is up again and you recover back to the Primary site from the Recovery site, the same steps need to be performed.
If a MySQL database cluster has three nodes, for example, the new Application Network IP addresses assigned to each node are as follows:
| Network | IP Address |
|---|---|
| Cluster | 192.168.20.100/24 |
| Primary Node Application Network | 192.168.20.101/24 |
| Secondary Node 1 Application Network | 192.168.20.102/24 |
| Secondary Node 2 Application Network | 192.168.20.103/24 |
Modify the /etc/keepalived/keepalived.conf file on each node of a MySQL cluster as follows:
Sample keepalived.conf file for the Primary node:
```
# keep rest of the content here as-is
vrrp_instance VRRP1 {
    # keep rest of the content here as-is
    # change start
    unicast_src_ip 192.168.20.101    # Application Network IP address of the current node
    unicast_peer {
        192.168.20.102               # Application Network IP address of Secondary Node 1
        192.168.20.103               # Application Network IP address of Secondary Node 2
    }
    virtual_ipaddress {
        192.168.20.100/24            # Virtual/Cluster IP
    }
    # change end
    # keep rest of the content here as-is
}
```
Sample keepalived.conf file for Secondary Node 1:
```
# keep rest of the content here as-is
vrrp_instance VRRP1 {
    # keep rest of the content here as-is
    # change start
    unicast_src_ip 192.168.20.102    # Application Network IP address of the current node
    unicast_peer {
        192.168.20.101               # Application Network IP address of the Primary Node
        192.168.20.103               # Application Network IP address of Secondary Node 2
    }
    virtual_ipaddress {
        192.168.20.100/24            # Virtual/Cluster IP
    }
    # change end
    # keep rest of the content here as-is
}
```
Sample keepalived.conf file for Secondary Node 2:
```
# keep rest of the content here as-is
vrrp_instance VRRP1 {
    # keep rest of the content here as-is
    # change start
    unicast_src_ip 192.168.20.103    # Application Network IP address of the current node
    unicast_peer {
        192.168.20.101               # Application Network IP address of the Primary Node
        192.168.20.102               # Application Network IP address of Secondary Node 1
    }
    virtual_ipaddress {
        192.168.20.100/24            # Virtual/Cluster IP
    }
    # change end
    # keep rest of the content here as-is
}
```
Restart keepalived.service on all nodes of the MySQL database cluster.
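For example, a minimal sketch of restarting the service on one node, assuming the database VMs manage keepalived.service with systemd:

```
# Restart keepalived on this node and verify that it is active
sudo systemctl restart keepalived.service
sudo systemctl status keepalived.service --no-pager
```

Repeat the restart on each remaining node of the cluster.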
Run the following MySQL Shell commands to bring the cluster back from a complete outage:

| Command | Purpose |
|---|---|
| `jq . /var/dbaas/configure/input.json` | To get the root user password for any database VM in the given cluster. |
| `mysqlsh root@<cluster-ip> -- dba reboot-cluster-from-complete-outage Cluster1` | To reboot the cluster. Run this command from a VM that can resolve the FQDNs of all members of the MySQL database cluster. |
| `mysqlsh root@<cluster-ip> -- cluster status` | To validate the status of the cluster. |
After running the `mysqlsh root@<cluster-ip> -- cluster status` MySQL Shell command, validate that the roles of the cluster members are correctly assigned and that the status of the cluster is ONLINE.
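For reference, the following is a minimal sketch of this sequence, assuming the example Cluster IP (192.168.20.100) and cluster name (Cluster1) shown above, run from a VM that can resolve the FQDNs of all cluster members:

```
# Print the cluster configuration input, which contains the root user password
jq . /var/dbaas/configure/input.json

# Reboot the cluster from a complete outage (enter the root password when prompted)
mysqlsh root@192.168.20.100 -- dba reboot-cluster-from-complete-outage Cluster1

# Confirm that member roles are assigned correctly and the cluster status is ONLINE
mysqlsh root@192.168.20.100 -- cluster status
```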
To ensure business continuity when the Primary site is down, you can perform the following tasks on the Recovery site:
You cannot perform the following tasks on the Recovery site: