Guidance for Running High Availability Setups

If the Sensor Gateway goes down, the sensors cannot communicate with the Carbon Black Cloud. For High Availability (HA), Carbon Black recommends setting up at least two Sensor Gateway servers in Active/Passive mode, so that if one Sensor Gateway goes down, the other is available to serve requests.

Although you can view the health of your Sensor Gateway servers in the Cloud, Carbon Black recommends also monitoring the health of the Sensor Gateway in-house using any of the available tools. This helps keep the Sensor Gateway servers healthy and avoids a situation where both the Active and Passive Sensor Gateway servers are down.
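
As one possible form of in-house monitoring, the following is a minimal sketch that probes the health endpoint of each Sensor Gateway server and counts how many fail. It assumes that the /sgw/health_check endpoint is reachable from the monitoring host (this guide only calls it on localhost), that a healthy response contains no ErrorCode field, and that curl and jq are installed. The function names probe and count_unhealthy are hypothetical.

```shell
#!/bin/sh
# Hypothetical monitoring helpers. Remote access to the health endpoint
# is an assumption; adjust hosts and timeouts to your environment.

# probe HOST: succeed when HOST returns health JSON without an ErrorCode.
probe() {
    curl -s -k --max-time 5 "https://$1/sgw/health_check" |
        jq -e '.ErrorCode == null' >/dev/null 2>&1
}

# count_unhealthy HOST...: print how many of the given hosts fail the probe.
count_unhealthy() {
    failed=0
    for host in "$@"; do
        probe "$host" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

For example, `count_unhealthy 192.168.10.111 192.168.10.112` prints 2 when both servers fail the probe, which is the situation this monitoring is meant to catch early.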

To set up the Sensor Gateway in Active/Passive mode, follow the steps below.

Note: Consider the following when configuring the Keepalived settings.

The priority value must be higher on the primary server. The configured state does not matter: if a server's state is MASTER but its priority is lower than that of a server configured as BACKUP, the server loses its MASTER state.
The virtual_router_id must be the same on Sensor Gateway Server 1 and Sensor Gateway Server 2.
By default, a single vrrp_instance supports up to 20 virtual_ipaddress entries. To add more addresses, you must add more instances.

Prerequisites

Ensure you have a network setup. The IP addresses in the network scenario below are for illustration purposes only. The Virtual IP is the most important value, because it is the address you use when configuring the sensors. You can also map the Virtual IP to a DNS name, for example sensor-gateway.somecompany.com, and use the FQDN when configuring the sensors.

Table 1. Network scenario
Sensor Gateway Server 1	192.168.10.111 (eth0)
Sensor Gateway Server 2	192.168.10.112 (eth0)
Virtual IP	192.168.10.121

Procedure

Install the required packages for configuring Keepalived on each of the Sensor Gateway servers by running the commands:
```
sudo apt-get update
sudo apt-get install linux-headers-$(uname -r)
```
Install Keepalived on both Sensor Gateway servers by running the command:
```
sudo apt-get install keepalived
```
Keepalived packages are available under the default apt repositories.

Set up Keepalived on the Sensor Gateway Server 1.

Create the Keepalived configuration file /etc/keepalived/keepalived.conf, or open it in an editor if it already exists:
```
sudo vim /etc/keepalived/keepalived.conf
```

Add the following settings. Update all values with your network and system configuration.

! Configuration File for keepalived

global_defs {
   notification_email {
     [email protected]
     [email protected]
   }
   notification_email_from [email protected]
   smtp_server localhost
   smtp_connect_timeout 30
}
vrrp_script chk_sgw_service_status {
    script "/bin/sh /usr/local/bin/sgw-service-check.sh"
    interval 30
    fall 3
    rise 3
    timeout 2
    weight 5
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.10.121
    }
    track_script {
        chk_sgw_service_status
    }
}

Set up Keepalived on the Sensor Gateway Server 2.

Create the Keepalived configuration file /etc/keepalived/keepalived.conf, or open it in an editor if it already exists:
```
sudo vim /etc/keepalived/keepalived.conf
```

Add the following settings. Update all values with your network and system configuration.

Note: Set the priority value lower than the one on Sensor Gateway Server 1. For example, the configuration below uses a priority of 100, whereas Sensor Gateway Server 1 uses a priority of 101.

! Configuration File for keepalived

global_defs {
   notification_email {
     [email protected]
     [email protected]
   }
   notification_email_from [email protected]
   smtp_server localhost
   smtp_connect_timeout 30
}
vrrp_script chk_sgw_service_status {
    script "/bin/sh /usr/local/bin/sgw-service-check.sh"
    interval 30
    fall 3
    rise 3
    timeout 2
    weight 5
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.10.121
    }
    track_script {
        chk_sgw_service_status
    }

}
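
To see how priority and the script weight interact in the two configurations, the following sketch works through the arithmetic. It relies on Keepalived's documented behavior that a positive weight is added to the priority while the tracked script succeeds and a negative weight is applied while it fails. The function name effective_priority is hypothetical and only illustrates the calculation.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT CHECK_OK
# CHECK_OK is "yes" when the tracked script succeeds, "no" when it fails.
effective_priority() {
    base=$1
    weight=$2
    ok=$3
    if [ "$weight" -gt 0 ] && [ "$ok" = "yes" ]; then
        # Positive weight: bonus while the check passes.
        echo $((base + weight))
    elif [ "$weight" -lt 0 ] && [ "$ok" = "no" ]; then
        # Negative weight: penalty while the check fails.
        echo $((base + weight))
    else
        echo "$base"
    fi
}

effective_priority 101 5 yes   # Server 1, check passing: 106
effective_priority 101 5 no    # Server 1, check failing: 101
effective_priority 100 5 yes   # Server 2, check passing: 105
```

With these values, a failing health check on Sensor Gateway Server 1 drops it to 101, below the 105 of a healthy Sensor Gateway Server 2, so the virtual IP moves even though Server 1 is still running. Because fall is 3 and interval is 30, roughly three consecutive failed checks (about 90 seconds) should pass before that happens.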

Create /usr/local/bin/sgw-service-check.sh on both Sensor Gateway servers with the following content. The script requires curl and jq to be installed.

#!/bin/sh
curl -s -k https://localhost/sgw/health_check >  /tmp/health_response.json
RESPONSE=$(jq '.ErrorCode' /tmp/health_response.json)
if [ "$RESPONSE" = "null" ]
then
  exit 0
else
  exit 1
fi
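
Before Keepalived relies on the check script, it can help to run it by hand and confirm the exit status. The following is a small sketch that runs a script through /bin/sh, the same way the vrrp_script block invokes it, and prints how the result would be interpreted. The function name report_check is hypothetical.

```shell
#!/bin/sh
# report_check SCRIPT: run SCRIPT through /bin/sh and describe its exit status.
# Exit status 0 means the check passed; any other status means it failed.
report_check() {
    if /bin/sh "$1" >/dev/null 2>&1; then
        echo "exit 0: check passed"
    else
        echo "exit $?: check failed"
    fi
}
```

For example, `report_check /usr/local/bin/sgw-service-check.sh` should report a passing check on a server where the Sensor Gateway service is healthy.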

Start the Keepalived service and configure it to auto-start on system boot.
```
sudo service keepalived start
sudo systemctl enable keepalived
```

Check for assigned Virtual IP on the interface.

ip addr show eth0

By default, the virtual IP is assigned to the primary server. If the primary server goes down, Keepalived automatically assigns the virtual IP to the secondary server.

Sample output:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:b9:b0:de brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.111/24 brd 192.168.10.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.10.121/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::11ab:eb3b:dbce:a119/64 scope link
       valid_lft forever preferred_lft forever
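
To check for the virtual IP without reading the output by eye, a sketch like the following can be used. It reads the output of ip addr show on standard input and succeeds only if the given address appears as an inet entry. The function name holds_vip is hypothetical.

```shell
#!/bin/sh
# holds_vip VIP: read `ip addr show` output on stdin and succeed if VIP is
# listed as an IPv4 address. -F treats the dots in the address literally.
holds_vip() {
    grep -qF "inet $1/"
}
```

For example, `ip addr show eth0 | holds_vip 192.168.10.121 && echo "this server holds the virtual IP"` prints the message only on the server that currently owns the address.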

Verify IP Failover.

Shut down the primary server (Sensor Gateway Server 1), then run the following command on Sensor Gateway Server 2 to check whether the virtual IP is automatically assigned to the secondary server.
```
ip addr show eth0
```
Start Sensor Gateway Server 1 and stop Sensor Gateway Server 2, then run the command on Sensor Gateway Server 1.
```
ip addr show eth0
```
The virtual IP is automatically assigned back to the primary server.

Check the log files to confirm that the IP failover is working.

tail -f /var/log/syslog

Sample output:

Feb  7 17:24:51 tecadmin Keepalived_healthcheckers[23177]: Registering Kernel netlink reflector
Feb  7 17:24:51 tecadmin Keepalived_healthcheckers[23177]: Registering Kernel netlink command channel
Feb  7 17:24:51 tecadmin Keepalived_healthcheckers[23177]: Opening file '/etc/keepalived/keepalived.conf'.
Feb  7 17:24:51 tecadmin Keepalived_healthcheckers[23177]: Configuration is using : 11104 Bytes
Feb  7 17:24:51 tecadmin Keepalived_healthcheckers[23177]: Using LinkWatch kernel netlink reflector...
Feb  7 17:24:52 tecadmin Keepalived_vrrp[23178]: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb  7 17:24:53 tecadmin Keepalived_vrrp[23178]: VRRP_Instance(VI_1) Entering MASTER STATE
Feb  7 17:24:53 tecadmin avahi-daemon[562]: Registering new address record for 192.168.10.121 on eth0.IPv4.
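
Because syslog contains entries from many services, it can help to filter for the VRRP state changes only. The following sketch keeps lines that come from Keepalived_vrrp and mention a state, based on the message format in the sample output above; other Keepalived versions may word these messages differently. The function name vrrp_transitions is hypothetical.

```shell
#!/bin/sh
# vrrp_transitions: read syslog lines on stdin and print only the
# Keepalived VRRP lines that mention a state change.
vrrp_transitions() {
    grep 'Keepalived_vrrp' | grep 'STATE'
}
```

For example, `vrrp_transitions < /var/log/syslog` lists the MASTER and BACKUP transitions recorded on that server.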
