The vc_process_monitor service monitors the memory of the main Gateway process (gwd) and ensures that it never consumes more than 75% of available memory. Total system memory is therefore monitored with a warning threshold of 80% and a critical threshold of 90%.

You can monitor a Gateway against thresholds that raise warning or critical states, which indicate potential issues before they impact services. The following table lists the threshold values and recommended actions.

Threshold State | Threshold Value | Recommended Corrective Action

Warning | 80%

If memory usage crosses the warning threshold:

  • Collect the Gateway diagnostic bundle.
  • Check per-process memory usage.

Continue to monitor actively and check for increasing utilization.

Critical | 90%

If memory usage crosses the critical threshold:

  • Monitor for critical packet drops, which can indicate that the Gateway is over capacity.

If the issue is observed again:

  • If over capacity is observed over a 5-minute interval, add Gateway capacity and rebalance to avoid capacity-related service impact.

Note: Before rebalancing the Gateway, confirm that the capacity metrics are within the recommended limits. For more information on capacity metrics, see Capacity of Gateway Components.
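The "check per-process memory usage" step in the warning row can be approximated with a short script. The following is a minimal sketch, assuming a Linux system where each process exposes its resident set size in the VmRSS field of /proc/<pid>/status. The function names parse_vmrss_kb and top_processes are illustrative and are not part of the Gateway software.

```python
import os

def parse_vmrss_kb(status_text):
    """Return (name, rss_kb) parsed from the text of /proc/<pid>/status."""
    name, rss_kb = None, 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(None, 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # Format is "VmRSS:   204800 kB"; kernel threads have no VmRSS line.
            rss_kb = int(line.split()[1])
    return name, rss_kb

def top_processes(limit=10, proc_root="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest RSS first."""
    if not os.path.isdir(proc_root):
        return []
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, rss_kb = parse_vmrss_kb(f.read())
        except (IOError, OSError):
            continue  # the process exited while we were scanning
        usage.append((rss_kb, int(entry), name))
    usage.sort(reverse=True)
    return usage[:limit]

if __name__ == "__main__":
    for rss_kb, pid, name in top_processes():
        print("%8d MB  pid=%-7d %s" % (rss_kb // 1024, pid, name))
```

Running the sketch at intervals and comparing the output shows whether a single process, such as gwd, accounts for the growth in utilization.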
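Note: You can also use Telegraf to monitor memory usage. For more information, see Monitor Gateways using Telegraf.
The following is an example Python script for monitoring memory usage. The script checks free memory (MemFree + Buffers + Cached), so the 80% and 90% usage thresholds in the table correspond to -w 20% -c 10%: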
#!/usr/bin/env python
# Free-memory check. Runs under Python 2 and Python 3.
# Exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

from optparse import OptionParser
import sys

# Parse command-line options:
parser = OptionParser(usage="%prog -w <warning threshold>% -c <critical threshold>% [ -h ]")
parser.add_option("-w", "--warning",
    action="store", type="string", dest="warn_threshold", help="Warning threshold for free memory, absolute (MB) or percentage")
parser.add_option("-c", "--critical",
    action="store", type="string", dest="crit_threshold", help="Critical threshold for free memory, absolute (MB) or percentage")
(options, args) = parser.parse_args()

def read_meminfo():
    """Return /proc/meminfo as a dict of field name -> integer value."""
    meminfo = {}
    with open('/proc/meminfo') as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            meminfo[fields[0].rstrip(':')] = int(fields[1])
    return meminfo

def parse_threshold(text):
    """Return (value, is_percentage) for a threshold such as '20%' or '512'."""
    is_pct = text.endswith('%')
    return int(text[:-1] if is_pct else text), is_pct

if __name__ == '__main__':
    if not options.crit_threshold:
        print("UNKNOWN: Missing critical threshold value.")
        sys.exit(3)
    if not options.warn_threshold:
        print("UNKNOWN: Missing warning threshold value.")
        sys.exit(3)

    warn_threshold, is_warn_pct = parse_threshold(options.warn_threshold)
    crit_threshold, is_crit_pct = parse_threshold(options.crit_threshold)

    # Thresholds apply to free memory, so critical must be lower than warning.
    # The comparison is only meaningful when both thresholds use the same unit.
    if is_warn_pct == is_crit_pct and crit_threshold >= warn_threshold:
        print("UNKNOWN: Critical threshold must be lower than warning threshold (both apply to free memory).")
        sys.exit(3)

    meminfo = read_meminfo()
    memTotal = meminfo["MemTotal"]  # values in /proc/meminfo are in kB
    memFree = meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]
    memFreePct = 100.0 * memFree / memTotal
    memFreeMB = memFree // 1024
    memTotalMB = memTotal // 1024
    summary = "Free memory is at %2.0f %% (%d MB free out of %d MB total)" % (memFreePct, memFreeMB, memTotalMB)

    if (is_crit_pct and memFreePct <= crit_threshold) or (not is_crit_pct and memFreeMB <= crit_threshold):
        print("CRITICAL: " + summary)
        sys.exit(2)
    if (is_warn_pct and memFreePct <= warn_threshold) or (not is_warn_pct and memFreeMB <= warn_threshold):
        print("WARNING: " + summary)
        sys.exit(1)
    print("OK: " + summary)
    sys.exit(0)
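The table states its thresholds as used memory (80% and 90%), while the script above compares free memory, so the values must be inverted before they are passed with -w and -c. The following is a minimal sketch of that conversion. The helper names and the file name check_mem.py are assumptions made for illustration and do not come from the product.

```python
WARN_USED_PCT = 80  # warning threshold from the table (percent of memory used)
CRIT_USED_PCT = 90  # critical threshold from the table (percent of memory used)

def free_threshold(used_pct):
    """Convert a used-memory percentage into the matching free-memory percentage."""
    return 100 - used_pct

def classify(used_pct, warn=WARN_USED_PCT, crit=CRIT_USED_PCT):
    """Map a used-memory percentage to the threshold state from the table."""
    if used_pct >= crit:
        return "CRITICAL"
    if used_pct >= warn:
        return "WARNING"
    return "OK"

def script_args(warn=WARN_USED_PCT, crit=CRIT_USED_PCT):
    """Build the -w/-c arguments for the free-memory script."""
    return ["-w", "%d%%" % free_threshold(warn),
            "-c", "%d%%" % free_threshold(crit)]

if __name__ == "__main__":
    # check_mem.py is a hypothetical file name for the script above.
    print("python check_mem.py " + " ".join(script_args()))
    for pct in (50, 85, 95):
        print("%d%% used -> %s" % (pct, classify(pct)))
```

With the table's values, the resulting arguments are -w 20% -c 10%: the script reports WARNING when 20% or less of memory is free and CRITICAL when 10% or less is free.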