External Health Monitor

This section covers the configuration specific to the external health monitor type.

The external monitor type allows you to write scripts that provide highly customized and granular health checks. Scripts can be written in Linux shell, Python, or Perl, and can call utilities such as wget, netcat, curl, snmpget, mysql-client, or dig. External monitors have constrained access to resources, such as CPU and memory, to ensure the normal functioning of NSX Advanced Load Balancer Service Engines. As with any custom scripting, thoroughly validate the long-term stability of the implemented script before pointing it at production servers.

To view the errors generated by the script, navigate to Operations > Events log.

NSX Advanced Load Balancer includes three sample scripts through the System-Xternal Perl, Python, and Shell monitors.

Note:

NSX Advanced Load Balancer supports IPv6 external health monitors. External health monitor for RADIUS does not support IPv6.

As a best practice, clean up any temporary files created by scripts.

While building an external monitor, manually test that its commands execute successfully. To test a command from an SE, you might need to switch to the proper namespace or tenant. In production, the external monitor automatically uses the proper tenant.

Creating or Editing External Health Monitor

To edit one of the following external health monitors, select it and then click the edit icon:

System-Xternal-Perl
System-Xternal-Python
System-Xternal-Shell

To create a new external health monitor, click the Create button and select External as the Type of health monitor. The following screen is displayed:

You can specify the following details related to External settings:


Field	Description
Upload or Paste Script Code	To input the script, do either one of the following: Click IMPORT FILE and select the required file. The script from the file is automatically pasted to the text box. Copy the script and paste it in the text box available.
Script Parameters	Specify the optional arguments to feed into the script. These strings are passed in as arguments to the script, such as $1 = username, $2 = password.
Health Monitor Port	Enter the port to override the port defined in the server pool. To use the server port, enter 0.
Script Variables	Specify the custom environment variables to be fed into the script to simplify re-usability. For instance, a script that authenticates to the server may have a variable set to USER=test. The variables that are included by default are `$IP`, `$PORT`, `$HM_NAME`, and `$POOL`.
Script Variables	HOSTNAME of GSLB service member: When a GSLB service member is configured as FQDN and GSLB service is using external health monitors, member FQDN will be passed as an environmental variable named HOSTNAME to the external health monitor script. This variable is available by default and can be used to fine-tune the health monitoring for each member of GSLB Service.
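As a rough illustration of how a Python script might read these values, the sketch below collects the default variables from the environment and keeps any positional arguments as a list. The function name read_monitor_context and the sample values are hypothetical. How user-defined script parameters are ordered relative to any arguments the system itself passes is not specified here, so verify the argument positions on your own SE.

```python
def read_monitor_context(environ, argv):
    """Collect the default variables and positional arguments for a monitor script.

    Hypothetical helper for illustration; not part of NSX Advanced Load Balancer.
    """
    return {
        # Variables the table above lists as included by default.
        "ip": environ.get("IP", ""),
        "port": environ.get("PORT", ""),
        "hm_name": environ.get("HM_NAME", ""),
        "pool": environ.get("POOL", ""),
        # Custom script variable from the table's example (USER=test).
        "user": environ.get("USER", ""),
        # Everything after the script name; positions are an assumption to verify.
        "params": list(argv[1:]),
    }


# Sample values only; in a real monitor pass os.environ and sys.argv instead.
example = read_monitor_context(
    {"IP": "10.0.0.5", "PORT": "8080", "HM_NAME": "hm-demo",
     "POOL": "pool-1", "USER": "test"},
    ["monitor.py", "alice"],
)
```

Passing the environment and argument list in as parameters keeps the helper easy to exercise off the SE with sample values.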

Sample Scripts

In the SharePoint monitor example below, the script includes a grep "200 OK". If this is found, this data is returned and the monitor exits as success. If the grep does not find this string, no data is returned and the monitor marks the server down.

MySQL Example Script

#!/bin/bash
# Any output returned by the query marks the server up.
mysql --host=$IP --user=root --password=s3cret! -e "select 1"

SharePoint Example Script

#!/bin/bash
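# Debug variant: write the response headers to a file instead of matching them.
#curl -I -L --ntlm -u "$USER:$PASS" "http://$IP:$PORT/Shared%20Documents/10m.dat" > /run/hmuser/$HM_NAME.out 2>/dev/null
# Output is produced only when the response headers contain "200 OK".
curl -I -L --ntlm -u "$USER:$PASS" "http://$IP:$PORT/Shared%20Documents/10m.dat" | grep "200 OK"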
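The rule described above (returned data means success, no data means the server is marked down) can be approximated locally while testing a script by hand. The following is a minimal sketch under that assumption only. The helper name produces_output and the default timeout are hypothetical, and the actual evaluation on the SE may consider more than this.

```python
import subprocess
import sys


def produces_output(command, env=None, timeout=10):
    """Return True if the command writes anything non-blank to stdout.

    Hypothetical local test helper; it only mimics the "output means success"
    rule and is not how the Service Engine is implemented.
    """
    try:
        result = subprocess.run(
            command,
            env=env,                      # None inherits the current environment
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing command is treated like "no data returned".
        return False
    return bool(result.stdout.strip())


# Stand-ins for a passing and a failing monitor script.
passing = produces_output([sys.executable, "-c", "print('200 OK')"])
failing = produces_output([sys.executable, "-c", "pass"])
```

To try a real script, pass its interpreter and path as the command together with a copy of the current environment extended with IP, PORT, and any custom variables the script expects.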

PostgreSQL Example Script

Example 1:

In this example, the script makes the NSX Advanced Load Balancer SE query the database. On a successful response, the SE marks the server UP; otherwise, it marks the server DOWN.

#!/bin/bash
#exporting username's password
export PGPASSWORD='password123'
psql -U aviuser -h $IP -p $PORT -d aviuser -c "SELECT * FROM employees"

Example 2:

In this example, the script makes the NSX Advanced Load Balancer SE query the database, parse the response for the cell at the given row and column, and compare that cell to the provided string. If the cell matches, the server is marked UP; otherwise, it is marked DOWN.

#!/bin/bash
#example script for
#string match to cell present at row,column of query response
row=2
column=2
match_string="bob"
#exporting username's password
export PGPASSWORD='password123'
response="$(psql --field-separator=' ' -t --no-align -U aviuser -h $IP -p $PORT -d aviuser -c "SELECT * FROM employees")"
str="$(awk -v r="$row" -v c="$column" 'FNR == r {print $c}' <<< "$response")"
if [ "$str" = "$match_string" ]; then
    echo "Matched"
fi
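The row and column matching in Example 2 can also be expressed in Python. The sketch below assumes the same input shape that the awk command handles: one row per line and whitespace-separated cells. The function name cell_matches and the sample data are hypothetical.

```python
def cell_matches(response, row, column, expected):
    """Return True if the 1-based (row, column) cell equals the expected string.

    Cells are split on whitespace, as awk does by default in Example 2.
    """
    lines = response.splitlines()
    if not 1 <= row <= len(lines):
        return False
    fields = lines[row - 1].split()
    if not 1 <= column <= len(fields):
        return False
    return fields[column - 1] == expected


# Sample query output; a real monitor would use the psql response instead.
sample = "1 alice\n2 bob\n3 carol"
if cell_matches(sample, 2, 2, "bob"):
    print("Matched")
```

As with the shell version, a cell value that itself contains spaces would be split into several fields, so a different field separator may be needed for such data.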

Oracle Example Script:

#!/usr/bin/python3
import os
import cx_Oracle

# The IP of the pool member is provided as an environment variable.
IP = os.environ['IP']
conn_str = 'HR_user/HR_pw@' + IP + '/hr_db'
connection = cx_Oracle.connect(conn_str)
cursor = connection.cursor()
cursor.execute('select * from JOBS')
# Any printed row counts as output, which marks the server up.
for row in cursor:
    print(row)
connection.close()

Oracle Script Variables:

LD_LIBRARY_PATH=/usr/lib/oracle/12.2/client64/lib

TNS_ADMIN=/run/hmuser

RADIUS Example Script

The following example performs an Access-Request using PAP authentication against the RADIUS pool member and checks for an Access-Accept response.

#!/usr/bin/python3 
import os 
import radius 
try: 
    r = radius.Radius(os.environ['RAD_SECRET'], 
                      os.environ['IP'], 
                      port=int(os.environ['PORT']), 
                      timeout=int(os.environ['RAD_TIMEOUT'])) 
    if r.authenticate(os.environ['RAD_USERNAME'], os.environ['RAD_PASSWORD']): 
        print('Access Accepted') 
except: 
    pass

RAD_SECRET, RAD_TIMEOUT, RAD_USERNAME and RAD_PASSWORD can be passed in the health monitor script variables, for example:

RAD_SECRET=foo123 RAD_USERNAME=avihealth RAD_PASSWORD=bar123 RAD_TIMEOUT=1

Applications such as curl can use different syntax for IPv4 and IPv6 addresses. External health monitor scripts must account for both forms. The examples are as follows:

Using Domain Names

Starting with NSX Advanced Load Balancer 21.1.3, DNS Resolution on Service Engine must be configured to resolve domain names.

EXT_HM=exthm.example.com
curl http://$EXT_HM:8123/path/to/resource | grep "200 OK"

Shell Script Example for IPv6 Support

#!/bin/bash
#curl -v $IP:$PORT >/run/hmuser/$HM_NAME.$IP.$PORT.out
if [[ $IP =~ : ]];
then curl -v [$IP]:$PORT;
else curl -v $IP:$PORT;
fi

Perl Script Example for IPv6 Support

#!/usr/bin/perl -w
my $ip= $ARGV[0];
my $port = $ARGV[1];
my $curl_out;
if ($ip =~ /:/) {
$curl_out = `curl -v "[$ip]":"$port" 2>&1`;
} else {
$curl_out = `curl -v "$ip":"$port" 2>&1`;
}
if (index($curl_out, "200 OK") != -1) {
    print "Server is up";
}
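The same bracket handling can be sketched in Python. Instead of testing for a colon, this version uses the standard ipaddress module to determine the address family. The function name host_port is hypothetical.

```python
import ipaddress


def host_port(ip, port):
    """Return 'ip:port', wrapping IPv6 literals in square brackets.

    Raises ValueError if ip is not a valid IPv4 or IPv6 address.
    """
    address = ipaddress.ip_address(ip)
    if address.version == 6:
        return "[{}]:{}".format(ip, port)
    return "{}:{}".format(ip, port)


# Sample addresses; a real monitor would pass the IP and PORT values it receives.
v6_target = host_port("2001:db8::1", 80)
v4_target = host_port("10.0.0.5", 8080)
```

The returned string can then be used wherever a tool such as curl expects the host and port portion of a URL.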

Note:

External health monitoring (LDAP) does not support SNAT IP because external HM runs from the Kernel namespace.

Handling Errors

The external health monitor logs error messages that explicitly mention the cause for failure. For example,

Unexpected response code, received: [int] expected: [int]
Unexpected redirect URL: [str]
Application server down since Springboard application is unavailable

NSX Advanced Load Balancer introduces the tag ext_hm_usr_err_msg to display a specific custom error message, as required. The external health monitor script returns the response output, and if this output contains the ext_hm_usr_err_msg tag, the server is marked down with the reason External HM failed with error.

Consider this example to understand how the error is handled in NSX Advanced Load Balancer. The following external health monitor, written in Python, sets up an HTTP connection.

#!/usr/bin/python3
import sys
import http.client

try:
    conn = http.client.HTTPConnection(sys.argv[1]+':'+sys.argv[2])
    conn.request("HEAD", "/index.html")
except Exception as e: 
    print("ext_hm_usr_err_msg: Http get request Failed with " + str(e))
    exit()

r1 = conn.getresponse()
if r1.status == 200:
    print(r1.status, r1.reason)
else:
    print("ext_hm_usr_err_msg:"+str(r1.status)+","+r1.reason)

There are two possible outcomes.

If the HTTP connection is not established, the server is marked down with the reason External HM failed with error, and the response string includes the cause, for example, connection refused.
If the connection is established and NSX Advanced Load Balancer receives a response, but the response status is not 200, the error is still generated.

The custom script can be modified, as required.
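For instance, one possible modification keeps the tag formatting in small helpers so that every failure path prints the ext_hm_usr_err_msg tag in the same way. This is only a sketch. The helper names monitor_output and connection_error_output are hypothetical, and the message layout after the tag is a choice made here for illustration.

```python
ERR_TAG = "ext_hm_usr_err_msg"


def monitor_output(status, reason):
    """Build the line the script prints for an HTTP response status."""
    if status == 200:
        return "{} {}".format(status, reason)
    # Any other status is reported through the custom error tag.
    return "{}: {}, {}".format(ERR_TAG, status, reason)


def connection_error_output(error):
    """Build the tagged line for a failed connection attempt."""
    return "{}: Http get request Failed with {}".format(ERR_TAG, error)


# Sample values; a real monitor would use the response and exception it received.
ok_line = monitor_output(200, "OK")
not_found_line = monitor_output(404, "Not Found")
refused_line = connection_error_output(ConnectionRefusedError(111, "Connection refused"))
```

The script would print the returned line, for example print(monitor_output(r1.status, r1.reason)), in place of the inline string concatenation.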

In the first case, the ext_hm_usr_err_msg tag is displayed with the error. Here, the error is HTTP get request failed with (Errno 111) Connection refused.

In the second case, for example when the server returns 404, the server is marked down with the reason 404, Not found.

List of SE Packages

The following are the scripting languages:

Bash (shell script)
Perl
Python

The following are the Linux packages:

curl
snmp
dnsutils
libpython2.7
python-dev
mysql-client
nmap
freetds-dev
freetds-bin
ldapsearch
postgresql-client

The following are the Python packages:

pymssql
cx_Oracle (and related libraries for Oracle Database 12c)
py-radius

NTP Health Monitor Example using netcat program

nc -zuv pool.ntp.org 123 2>&1 | grep "(ntp) open"

A sample that uses a native Perl script is as follows:

#!/usr/bin/perl 
# ntpdate.pl

# this code will query a ntp server for the local time and display
# it.  it is intended to show how to use a NTP server as a time
# source for a simple network connected device.

# 
# For better clock management see the official NTP info at:
# http://www.eecis.udel.edu/~ntp/
#

# written by Tim Hogard ([email protected])
# Thu Sep 26 13:35:41 EAST 2002
# this code is in the public domain.
# it can be found here http://www.abnormal.com/~thogard/ntp/

$HOSTNAME=shift;
$HOSTNAME="192.168.1.254" unless $HOSTNAME ;	# our NTP server
$PORTNO=123;			# NTP is port 123
$MAXLEN=1024;			# check our buffers

use Socket;

#we use the system call to open a UDP socket
socket(SOCKET, PF_INET, SOCK_DGRAM, getprotobyname("udp")) or die "socket: $!";

#convert hostname to ipaddress if needed
$ipaddr   = inet_aton($HOSTNAME);
$portaddr = sockaddr_in($PORTNO, $ipaddr);

# build a message.  Our message is all zeros except for a one in the protocol version field
# $msg in binary is 00 001 000 00000000 ....  or in C msg[]={010,0,0,0,0,0,0,0,0,...}
#it should be a total of 48 bytes long
$MSG="\01

Note:

The ntpdate and ntpq programs are not packaged in the Service Engine and therefore cannot currently be used.
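Because ntpdate and ntpq are unavailable, a script has to speak NTP itself. The following Python sketch shows the general technique of sending a 48-byte client request over UDP and reading the reply. It is not a port of the Perl script above: it assumes an IPv4 server and uses a version 3 client-mode header byte (0x1b) rather than that script's header. All function names are hypothetical.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    """Return a 48-byte client request: LI=0, version 3, mode 3 (client)."""
    return b"\x1b" + 47 * b"\0"


def transmit_time(reply):
    """Extract the server transmit time, in Unix seconds, from an NTP reply."""
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    # The integer part of the transmit timestamp sits at bytes 40 to 43.
    (seconds,) = struct.unpack("!I", reply[40:44])
    return seconds - NTP_EPOCH_OFFSET


def ntp_responds(host, port=123, timeout=2.0):
    """Return True if the server answers a client request over UDP (IPv4 only)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_request(), (host, port))
            reply, _ = sock.recvfrom(1024)
    except OSError:
        # Covers timeouts, name resolution failures, and unreachable hosts.
        return False
    return len(reply) >= 48
```

A monitor built on this sketch would print a line such as NTP server is up only when ntp_responds returns True for the pool member's IP, and print nothing otherwise.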

Upgrade to Python 3.0

Starting with the NSX Advanced Load Balancer release 20.1.1, the NSX Advanced Load Balancer Controller and Service Engines use Python 3.0.

The external Python health monitors must be converted to Python 3.0 syntax as part of the upgrade procedure.

Before initiating the upgrade to NSX Advanced Load Balancer release 20.1.1, execute the following steps:

Identify the external Health Monitors using Python.
Remove the health monitors, or replace them with a non-Python health monitor.
Ensure that the health monitor script is modified to Python 3.0 syntax.
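As a quick aid for the last step, a script's source can be compiled under a Python 3 interpreter to see whether it still contains Python 2 only syntax such as a print statement. This is a minimal sketch with a hypothetical function name. It detects syntax errors only, so behavioral differences between Python 2 and Python 3 still need to be tested separately.

```python
def is_python3_syntax(source, name="<health-monitor>"):
    """Return True if the source compiles under the running Python 3 interpreter."""
    try:
        compile(source, name, "exec")
    except SyntaxError:
        return False
    return True


# Sample snippets: a Python 2 print statement and its Python 3 equivalent.
old_style = is_python3_syntax("for row in rows:\n    print row\n")
new_style = is_python3_syntax("for row in rows:\n    print(row)\n")
```

The source is only compiled, never executed, so the check can be run on a workstation without access to the monitored servers.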

Steps Post Upgrade

After upgrading to NSX Advanced Load Balancer release 20.1.1, execute the following steps:

Replace the existing (Python 2.7) health monitor script with the Python 3 script.
Re-apply the health monitor to the required pools, and remove the temporary non-Python health monitor (if configured).