To quickly view any notification alerts, use the Notifications label on the header bar of the Command Center dashboard.
When you open the Notifications window, unread notifications are highlighted in light blue. The alerts are ordered by time and are kept for 7 days. When you close the Notifications window, all alerts are considered read.
To customize the alert notifications, go to the Alerts tab of the Admin view, where an administrator can set up alert rules. The rules help detect and respond to events that occur in the VMware Greenplum system and in currently executing database queries. When a rule is matched, Command Center logs a record.
You can set up email alerts by configuring an SMTP server in VMware Greenplum or in Command Center. Additionally, you can create a send_alert.sh shell script to forward alerts to other destinations, such as an SMS gateway or a Slack channel. If the script is present, Command Center runs it whenever an alert is raised.
Command Center creates the gpmetrics schema in the gpperfmon database to store both rules and log records. See gpmetrics Schema Reference for information about the gpcc_alert_rule and gpcc_alert_log tables in the gpmetrics schema.
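One quick way to inspect recent alert records is to query the log table with psql. The following is a minimal sketch, assuming psql can connect to the gpperfmon database as the current user and that id values increase over time. The recent_alerts_sql helper name and the default row limit are illustrative, and only the id column mentioned later in this topic is relied on.

```shell
#!/bin/sh
# Hypothetical helper: builds a query for the newest rows in the alert log.
# Only the id column is referenced; see the gpmetrics Schema Reference for the rest.
recent_alerts_sql() {
  limit=${1:-10}
  # Accept digits only so the value is safe to place in the SQL text.
  case $limit in
    ''|*[!0-9]*) echo "limit must be a whole number" >&2; return 1 ;;
  esac
  printf 'SELECT * FROM gpmetrics.gpcc_alert_log ORDER BY id DESC LIMIT %s;\n' "$limit"
}

# Usage (run as a user that can connect to gpperfmon):
#   psql -d gpperfmon -c "$(recent_alerts_sql 20)"
```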
Click EDIT to manage alert event rules. To enable an alert rule, enter any data required in the fields and check the box. Uncheck the box to disable the rule. Click SAVE when you have finished making changes to the alert configuration.
Database connectivity failure
An alert is raised when either of these conditions is detected:
Number of connections exceeds [N]
An alert is raised when the total number of database connections exceeds the number specified. The number of connections is checked every 30 seconds. After an alert is raised, the metrics collector checks the number of connections every 30 minutes until the number of connections drops below the threshold, and then it resumes checking every 30 seconds.
Coordinator Panic
An alert is raised when messages with PANIC severity are detected at the coordinator host. A PANIC error causes all database sessions to abort.
Coordinator Fatal
An alert is raised when messages with FATAL severity are detected at the coordinator host. A FATAL error causes the current session to abort. To reduce noise, fatal messages logged due to password authentication errors do not raise alerts.
Coordinator CPU exceeds [%] for [N] min
An alert is raised when the CPU usage on the coordinator host exceeds the specified percentage for the specified number of minutes. Command Center samples CPU usage on the coordinator host every 15 seconds and calculates the mean of the samples.
Coordinator Memory exceeds [%] for [N] min
An alert is raised when the percent of memory used on the coordinator host exceeds the specified percentage for the specified number of minutes. Command Center samples memory usage on the coordinator host every 15 seconds and calculates the mean of the samples. Only memory in use is considered; memory for buffers and cache is not included.
Segment failure
An alert is raised when one or more failed segments are detected. After the first alert is raised, Command Center repeats the alert every 30 minutes until the segments are recovered.
Total disk space exceeds [%] on a segment host
An alert is raised when the total of disk space in use for all segment hosts exceeds the specified percentage. Command Center gathers the available disk space and total disk space from each segment host in the VMware Greenplum cluster. The percent of total disk space in use is calculated by the following formula:
100 - sum(<available disk space>) / sum(<total disk space>) * 100
A disk space alert is raised no more than once every 24 hours.
Segment Hosts Avg CPU exceeds [%] for [N] min
An alert is raised when the average percent of CPU used for all segment hosts exceeds the specified percentage for the specified number of minutes. Command Center samples all segment hosts every 15 seconds and calculates the mean of the samples.
Segment Hosts Avg Memory exceeds [%] for [N] min
An alert is raised when the average memory for all segment hosts exceeds the specified percentage for the specified number of minutes. Command Center samples all segment hosts every 15 seconds and calculates the mean of the samples. Only memory in use is considered; memory for buffers and cache is not included.
Out of memory errors
An alert is raised when an executing query fails with an out of memory (OOM) error. Note that no alert is raised if there is insufficient memory to start the query.
Spill files for a query exceeds [N] GB
An alert is raised when the total disk space consumed by a running query’s spill files exceeds the specified number of gigabytes. An alert is raised only once per query.
Query runtime exceeds [N] min
An alert is raised when a query runtime exceeds the number of minutes specified. This alert is raised just once for a query.
Query is blocked for [N] min
An alert is raised if a query remains in a blocked state for longer than the specified number of minutes. If an alert is raised, and then the query unblocks, runs, and blocks again for the specified time, an additional alert is raised. Blocked time excludes the time a query is queued before it runs. It is possible for a “Query runtime exceeds [N] min” rule to also trigger while a query is blocked.
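To make the threshold arithmetic in the rules above concrete, here is a small sketch of the disk space formula from the "Total disk space exceeds [%] on a segment host" rule and of the sampled mean used by the CPU and memory rules. It is an illustration only, not Command Center's implementation. The function names and the input layout (one "available total" pair per host, one sample per line) are assumptions.

```shell
#!/bin/sh
# disk_used_pct: reads "available total" pairs (one host per line, same unit)
# and prints 100 - sum(available) / sum(total) * 100 with one decimal place.
disk_used_pct() {
  awk '{ avail += $1; total += $2 }
       END { if (total > 0) printf "%.1f\n", 100 - avail / total * 100 }'
}

# mean_exceeds THRESHOLD: reads one usage sample per line (percent) and
# succeeds when the mean of the samples is above THRESHOLD.
mean_exceeds() {
  awk -v limit="$1" '{ sum += $1; n++ }
       END { exit !(n > 0 && sum / n > limit) }'
}

# Example: three hosts with 200, 150, and 50 GB available out of 1000 GB each.
printf '200 1000\n150 1000\n50 1000\n' | disk_used_pct   # prints 86.7

# Example: a 1-minute window holds four 15-second samples; their mean is 89.
if printf '82\n91\n88\n95\n' | mean_exceeds 85; then
  echo "mean above threshold"
fi
```

Note that the disk formula sums across hosts before dividing, so one nearly full host can be masked by several mostly empty ones.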
Command Center requires an SMTP server to send alert emails.
Command Center first attempts an encrypted TLS connection and then falls back to an unencrypted connection if TLS is not supported. The SMTP server must support one of the following authentication methods: NONE, PLAIN, LOGIN, or CRAM-MD5. Command Center will use the most secure of these methods the SMTP server allows.
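Before entering the SMTP settings, it can help to confirm what the server advertises. The sketch below scans the text of an SMTP EHLO response for STARTTLS and for the authentication methods named above. It is a hypothetical helper, not part of Command Center, and how you capture the EHLO response (for example, from a manual session with your mail server) is left open.

```shell
#!/bin/sh
# smtp_capabilities: reads an SMTP EHLO response on stdin and reports whether
# STARTTLS is advertised and which of PLAIN, LOGIN, and CRAM-MD5 are offered.
smtp_capabilities() {
  awk '
    { sub(/\r$/, "") }                       # tolerate CRLF line endings
    toupper($0) ~ /^250[- ]STARTTLS/ { tls = "yes" }
    toupper($0) ~ /^250[- ]AUTH / {
      for (i = 2; i <= NF; i++) {
        m = toupper($i)
        if (m == "PLAIN" || m == "LOGIN" || m == "CRAM-MD5") auth = auth " " m
      }
    }
    END {
      print "starttls:", (tls == "" ? "no" : tls)
      print "auth:" (auth == "" ? " none advertised" : auth)
    }'
}

# Usage: paste or pipe a captured EHLO response into the function, e.g.
#   smtp_capabilities < ehlo_response.txt
```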
Click EDIT in the Manage email configuration panel.
The alert email configuration is set with the following VMware Greenplum server configuration parameters:
SMTP Server address
The name or IP address of the SMTP server and the SMTP port number. The port number is typically 587 for connections with TLS encryption or 465 without encryption. Example: smtp.example.com:465
Username
The username of the account to authenticate with the SMTP server. This is an optional field, only required if the SMTP server requires a username for authentication. Example: [email protected]
Password
The password for the SMTP username. For security, the password is masked. This field is optional, only needed if the SMTP server requires a username and password for authentication.
Send emails to
To add an address to the list, enter the address and press Enter. To remove an email address, click the X on the address.
From
The email address to use for the From: address in the alert email. Example: [email protected].
If you leave this field blank, Command Center uses the default value, [email protected].
When you click SAVE, Command Center sends a test email to the addresses in the Send emails to field. The email contains a list of the currently configured alert rules. If there is an error in the SMTP server or username/password configuration and the email cannot be sent, Command Center displays an error message.
The send alert script is a shell script that you can use to send Command Center alerts to destinations such as SMS gateways, pagers, team collaboration tools like Slack, chat servers, archive files, alternative email servers, and so on. You can use the send alert script in addition to sending email from Command Center, or as an alternative to sending alert emails from Command Center.
Command Center looks for the script $HOME/gpmetrics/send_alert.sh on the host where Command Center is running, either the coordinator host or the standby host. If the file exists and is executable by the gpadmin user, Command Center executes the script. The following variables are set on the command line when the script runs.
Variable | Description |
---|---|
LINK | URL of the Greenplum Command Center web server. |
QUERYID | ID of the query, if the alert was triggered by a query. |
SERVERNAME | Name of the Greenplum Command Center server. |
QUERYTEXT | The text of the query, if the alert was triggered by a query. |
ACTIVERULENAME | Current text of the rule, with user-specified values included. |
LOGID | Value of this alert’s id column in the gpmetrics.gpcc_alert_log table. |
RULEDESCRIPTION | Text of the rule, including user-specified values, at the time the alert was raised. |
ALERTDATE | Date the alert was raised. |
ALERTTIME | Time the alert was raised. |
SUBJECT | Subject line for email. |
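As an illustration of how these variables might be used, the sketch below formats an alert as a short plain-text record that could be appended to an archive file. It assumes the variables in the table are available to the script as shell variables of the same names. The format_alert function and the ALERT_ARCHIVE path are hypothetical names introduced for this example.

```shell
#!/bin/sh
# format_alert: prints a short record built from the alert variables.
# The query line is only included when the alert came from a query.
format_alert() {
  printf '[%s %s] %s: %s\n' "${ALERTDATE:-}" "${ALERTTIME:-}" \
    "${SERVERNAME:-}" "${RULEDESCRIPTION:-}"
  if [ -n "${QUERYID:-}" ]; then
    printf '  query %s: %s\n' "$QUERYID" "${QUERYTEXT:-}"
  fi
  printf '  details: %s (log id %s)\n' "${LINK:-}" "${LOGID:-}"
}

# Hypothetical archive location; adjust to a path the gpadmin user can write to.
ALERT_ARCHIVE=${ALERT_ARCHIVE:-$HOME/gpmetrics/alert_archive.log}

# Inside send_alert.sh, the record could be appended like this:
#   format_alert >> "$ALERT_ARCHIVE"
```

Keeping the formatting in one function makes it easy to reuse the same text for other destinations, such as a chat message or an SMS gateway.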
An example script that you can customize is provided at $GPCC_HOME/alert-email/send_alert.sh.sample. The example formats the alert as HTML email text and pipes it through the Linux mail command.
To set up a send alert script:
Copy the $GPCC_HOME/alert-email/send_alert.sh.sample file to $HOME/gpmetrics/send_alert.sh.
Customize the script with code to format and deliver the alert to your desired destination.
Run gpcc start to restart Command Center and enable the script.
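Because Command Center runs the script only when it exists and is executable by gpadmin, a quick check before restarting can save a round trip. The following sketch, built around a hypothetical check_alert_script helper, reports which condition is not met. Run it as the gpadmin user.

```shell
#!/bin/sh
# check_alert_script [PATH]: verifies the send alert script is in place.
# Defaults to the location Command Center looks in.
check_alert_script() {
  script=${1:-$HOME/gpmetrics/send_alert.sh}
  if [ ! -f "$script" ]; then
    echo "missing: $script"
    return 1
  fi
  if [ ! -x "$script" ]; then
    echo "not executable: $script (try: chmod +x \"$script\")"
    return 1
  fi
  echo "ok: $script"
}

# Usage:
#   check_alert_script && gpcc start
```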
SNMP support is discontinued in VMware Greenplum 6, but you can use a Command Center alert script to send alerts to an SNMP manager.
This example alert script sends Command Center alerts as SNMP notifications in a format similar to VMware Greenplum 5 SNMP notifications, using the SNMP OIDs for VMware Greenplum as arguments.
This is the OID hierarchy structure for VMware Greenplum:
iso(1)
  identified-organization(3)
    dod(6)
      internet(1)
        private(4)
          enterprises(1)
            gpdbMIB(31327)
              gpdbObjects(1)
                gpdbAlertMsg(1)
The following table lists the VMware Greenplum OIDs:
Name | Address | Type: Description |
---|---|---|
gpdbAlertMsg | 1.3.6.1.4.1.31327.1.1 | string: alert message text |
gpdbAlertSeverity | 1.3.6.1.4.1.31327.1.2 | integer: severity level |
gpdbAlertSqlState | 1.3.6.1.4.1.31327.1.3 | string: SQL standard error codes |
gpdbAlertDetail | 1.3.6.1.4.1.31327.1.4 | string: detailed alert message text |
gpdbAlertSqlStmt | 1.3.6.1.4.1.31327.1.5 | string: SQL statement generating this alert, if applicable |
gpdbAlertSystemName | 1.3.6.1.4.1.31327.1.6 | string: hostname |
Command Center sets environment variables that can be mapped to the gpdbAlertMsg, gpdbAlertSqlStmt, and gpdbAlertSystemName OIDs. For the other OIDs, the script sends an empty string or 0, depending on data type.
The SNMP service must be configured and enabled on the host where Command Center is running. If the snmpd file is not present in the /usr/sbin directory, then SNMP is not installed on the system.
To install SNMP on Red Hat and CentOS, use yum to install the net-snmp, net-snmp-libs, and net-snmp-utils packages. Configure SNMP following the operating system documentation. This example script specifies the SNMP community public, but you can set a different community string with the rocommunity parameter in the /etc/snmp/snmpd.conf file.
Enable and start the snmpd daemon with the systemctl command:
$ sudo systemctl enable snmpd.service
$ sudo systemctl start snmpd.service
Verify that the system snmpd daemon is running with this command:
$ sudo snmpwalk -v 1 -c public localhost .1.3.6.1.2.1.1.1.0
If the daemon is running, the snmpwalk command output is similar to the following:
SNMPv2-MIB::sysDescr.0 = STRING: Linux gp-mdw 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue May 14 21:24:32 UTC 2019 x86_64
As the gpadmin user, edit the $HOME/gpmetrics/send_alert.sh file.
Locate the comment that reads:
“Following is an example for sending email with above input arguments Can be replaced with any operation, such as SMS, slack, wechat, etc…”
Replace the content between that comment and the last “}” character with the following code snippet:
# SNMP uptime is expressed in hundredths of a second, hence the "* 100".
RUNNING_TIME=`psql -c 'select extract(epoch from (current_timestamp - pg_postmaster_start_time())) * 100;'|sed -n '3p'|awk '{print $1}'`
HOST=`hostname`
# Send the alert as an SNMPv2c trap; OIDs with no matching variable get 0 or an empty string.
snmptrap -v 2c -c public \
{##trapserver##} \
"${RUNNING_TIME}" \
"enterprises.31327.5.0.1" \
SNMPv2-SMI::enterprises.31327.1.1 s "${RULEDESCRIPTION}" \
SNMPv2-SMI::enterprises.31327.1.2 i 0 \
SNMPv2-SMI::enterprises.31327.1.3 s "" \
SNMPv2-SMI::enterprises.31327.1.4 s "" \
SNMPv2-SMI::enterprises.31327.1.5 s "${QUERYTEXT}" \
SNMPv2-SMI::enterprises.31327.1.6 s "${HOST}"
Also, replace {##trapserver##} with the name of the host running snmpd.
Save and exit.
Make the script executable.
$ chmod +x $HOME/gpmetrics/send_alert.sh
Restart Command Center to enable the script.
$ gpcc start
To test the script, you can use the snmptrapd trap receiver.
As root, edit the /etc/snmp/snmptrapd.conf file and add the following line to allow the daemon to receive unauthenticated notifications:
disableAuthorization yes
Start the snmptrapd daemon with this command:
# /usr/sbin/snmptrapd -m ALL -Lf ~/snmptest.log
Check the ~/snmptest.log file (typically /root/snmptest.log when the daemon is started as root) for notifications.