VMware Aria Operations for Logs provides two sets of notifications about system health: general notifications, which apply to all product configurations, and cluster notifications, which apply to cluster-based deployments.

To view system notifications, expand the main menu and navigate to Alerts > System Alerts. With appropriate permissions, you can activate or deactivate the notifications. For more information, see View and Manage Alerts in Using VMware Aria Operations for Logs.

Note: In this topic, an admin user refers to a user associated with the Super Admin role, or a role that has the relevant permissions, as described in Create and Modify Roles.

The following tables list and describe system notifications for VMware Aria Operations for Logs.

General System Notifications

VMware Aria Operations for Logs issues notifications about conditions that might require administrative intervention, including archival failure or alert scheduling delays.

Notification Name Description
Oldest data will be unsearchable soon

VMware Aria Operations for Logs is expected to start decommissioning old data from the virtual appliance storage based on the expected size of searchable data, storage space, and the current ingestion rate. Data that has been rotated out is archived if you have configured archiving, or deleted if you have not.

To address this, add storage or adjust the retention notification threshold. For more information, see Configure VMware Aria Operations for Logs to Send Health Notifications.

The notification is sent after each restart of the VMware Aria Operations for Logs service.

Repository retention time

A retention period is the length of time data is retained on the local disk of your VMware Aria Operations for Logs instance. A retention period is determined by the amount of data the system can hold and the current ingestion rate. For example, if you are receiving 10 GB/day of data (after indexing) and you have 300 GB of space, then your retention period is 30 days.

When your storage limit is reached, old data is removed to make way for newly ingested data. This notification tells you when the amount of searchable data that VMware Aria Operations for Logs can store at the current ingestion rates exceeds the storage space that is available on the virtual appliance.

You might run out of storage before the time period set with the Retention Notification Threshold. Add storage or adjust the retention notification threshold.
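The retention arithmetic described above can be sketched as follows. This is a minimal illustration, not product code: the function names are hypothetical, and the 45-day threshold is an assumed value chosen only to show how much storage would have to be added.

```python
def retention_days(storage_gb: float, ingest_gb_per_day: float) -> float:
    """Days of data that fit in storage_gb at a steady ingestion rate."""
    if ingest_gb_per_day <= 0:
        raise ValueError("ingestion rate must be positive")
    return storage_gb / ingest_gb_per_day


def extra_storage_needed_gb(storage_gb: float, ingest_gb_per_day: float,
                            threshold_days: float) -> float:
    """Additional storage needed to retain threshold_days of data (0.0 if none)."""
    required_gb = threshold_days * ingest_gb_per_day
    return max(0.0, float(required_gb - storage_gb))


# Figures from the example above: 300 GB of space, 10 GB/day after indexing.
print(retention_days(300, 10))               # 30.0
# Hypothetical 45-day threshold, used only for illustration.
print(extra_storage_needed_gb(300, 10, 45))  # 150.0
```

If the estimated retention period is shorter than the configured threshold, either the storage figure has to grow by roughly the second result or the threshold has to come down.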

Dropped events

VMware Aria Operations for Logs failed to ingest all incoming log messages.

  • If a TCP Message drops, as tracked by VMware Aria Operations for Logs server, a system notification is sent as follows:
    • Once a day
    • Each time the VMware Aria Operations for Logs service is restarted, manually or automatically
  • The email contains the number of messages dropped since the last notification email was sent and the total number of messages dropped since the last restart of VMware Aria Operations for Logs.
Note: The time in the sent line is controlled by the email client, and is in the local time zone, while the email body displays the UTC time.
Corrupt index buckets

Part of the on-disk index is corrupt. A corrupt index usually indicates serious issues with the underlying storage system. The corrupt part of the index is excluded from serving queries. A corrupt index affects the ingestion of new data. VMware Aria Operations for Logs checks the integrity of the index upon service start-up. If corruption is detected, VMware Aria Operations for Logs sends a system notification as follows:

  • Once a day
  • Each time the VMware Aria Operations for Logs service is restarted, manually or automatically
Out of disk

VMware Aria Operations for Logs is running out of allocated disk space. VMware Aria Operations for Logs has most probably run into a storage-related issue.

Archive space will be full

The disk space on the NFS server used for archiving VMware Aria Operations for Logs data will be used up soon. If, at the current ingestion rate, the NFS server has room for less than seven days of archived data, a system notification is sent. For example, if you are archiving with a disk consumption rate of 708.9 MB per day and you have 2000 MB of space, you have about three days of capacity, which is less than the threshold. In this case, you receive a notification that you are below this capacity.

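The archive capacity check in this description can be sketched as below. The seven-day threshold and the sample figures come from the text; the function names are hypothetical and the real check inside the product may differ.

```python
ARCHIVE_THRESHOLD_DAYS = 7  # threshold stated in the description above


def archive_days_left(free_space_mb: float, archive_mb_per_day: float) -> float:
    """Days until the archive location fills at a steady consumption rate."""
    if archive_mb_per_day <= 0:
        raise ValueError("consumption rate must be positive")
    return free_space_mb / archive_mb_per_day


def archive_space_warning(free_space_mb: float, archive_mb_per_day: float) -> bool:
    """True when remaining capacity is below the seven-day threshold."""
    return archive_days_left(free_space_mb, archive_mb_per_day) < ARCHIVE_THRESHOLD_DAYS


# Figures from the example above: 2000 MB of space, 708.9 MB archived per day.
print(round(archive_days_left(2000, 708.9), 2))  # 2.82
print(archive_space_warning(2000, 708.9))        # True
```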
Total disk space change

The total size of the partition for the VMware Aria Operations for Logs data storage has decreased. This notification usually signals a serious issue in the underlying storage system. When VMware Aria Operations for Logs detects the condition, it sends this notification as follows:

  • Immediately
  • Once a day
Pending archivings

VMware Aria Operations for Logs cannot archive data as expected. The notification usually indicates problems with the NFS storage that you configured for data archiving.

Allocated log record storage volume reached 75 percent of the maximum log record storage capacity

VMware Aria Operations for Logs is configured to ensure STIG compliance, and the allocated log record storage volume reaches 75 percent of the maximum log record storage capacity of the repository.

Note: This notification is sent per node.

License is about to expire

The license for VMware Aria Operations for Logs is about to expire.

License is expired

The license for VMware Aria Operations for Logs has expired.

SSL certificate is about to expire

The SSL certificate for the VMware Aria Operations for Logs cluster will expire in 30 days.

Unable to connect to AD server

VMware Aria Operations for Logs is unable to connect to the configured Active Directory server.

Cannot take over High Availability IP address [IP Address] as it is already held by another machine

The VMware Aria Operations for Logs cluster was unable to take over the configured IP Address for the Integrated Load Balancer (ILB). The most common reason for this notification is that another host within the same network holds the IP address, and therefore the IP address is not available to be taken over by the cluster.

You can resolve this conflict by either releasing the IP address from the host that currently holds it, or configuring VMware Aria Operations for Logs Integrated Load Balancer with a Static IP address that is available in the network. When changing the ILB IP address, you must reconfigure all clients to send logs to the new IP address, or to a FQDN/URL that resolves to this IP address. You must also unconfigure and reconfigure every vCenter Server integrated with VMware Aria Operations for Logs from the vSphere integration page.

High Availability IP address [IP Address] is unavailable due to too many node failures

The IP Address configured for the Integrated Load Balancer (ILB) is unavailable. Clients trying to send logs to a VMware Aria Operations for Logs cluster through the ILB IP address or a FQDN/URL that resolves to this IP address will see it as unavailable. The most common reason for this notification is that most of the nodes in the VMware Aria Operations for Logs cluster are unhealthy, unavailable, or unreachable from the primary node. Another common reason is that NTP time synchronization has not been activated, or the configured NTP servers have a significant time drift between each other. You can confirm that the problem is still ongoing by trying to ping (if allowed) the IP address to verify that it is not reachable.

You can resolve this problem by ensuring that most of your cluster nodes are healthy and reachable, and enabling NTP time synchronization to accurate NTP servers.

Too many migrations of High Availability IP address [your IP Address] between VMware Aria Operations for Logs nodes

The IP address configured for the Integrated Load Balancer (ILB) has migrated too many times within the last 10 minutes.

Under normal operation, the IP address rarely moves between VMware Aria Operations for Logs cluster nodes. However, the IP address might move if the current owner node is restarted or placed in maintenance mode. Another possible cause is a lack of time synchronization between VMware Aria Operations for Logs cluster nodes, which is essential for proper cluster functioning. In that case, you can fix the problem by enabling NTP time synchronization to accurate NTP servers.

SSL certificate error

A syslog source has initiated a connection to VMware Aria Operations for Logs over SSL but ended the connection abruptly. This notification might indicate that the syslog source was unable to confirm the validity of the SSL certificate. In order for VMware Aria Operations for Logs to accept syslog messages over SSL, a certificate that is validated by the client is required and the clocks of the systems must be synchronized. There might be a problem with the SSL Certificate or with the Network Time Service.

You can validate that the SSL Certificate is trusted by your syslog source, reconfigure the source not to use SSL, or reinstall the SSL Certificate. See Configure the VMware Aria Operations for Logs Agent SSL Parameters and Install a Custom SSL Certificate.

vCenter collection failed

VMware Aria Operations for Logs is unable to collect VMware vCenter events, tasks, and alarms. To look for the exact error that caused the collection failure and to see if collection is working currently, look in the /var/log/vmware/loginsight/plugins/vsphere/li-vsphere.log file.

vCenter Kubernetes Service event collection failed

VMware Aria Operations for Logs is unable to collect VMware vCenter Kubernetes System events, tasks, and alarms. To look for the exact error that caused the collection failure and to see if collection is working currently, look in the /var/log/vmware/loginsight/plugins/vsphere/li-vsphere.log file.
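Both vCenter collection notifications point to the same log file. A minimal sketch for pulling the most recent error lines out of it is shown below. The keyword `ERROR` and the helper names are assumptions, since the log format is not described here; adjust the keyword to whatever the file actually contains.

```python
from collections import deque
from typing import Iterable, List

# Path taken from the notification descriptions above.
LOG_PATH = "/var/log/vmware/loginsight/plugins/vsphere/li-vsphere.log"


def last_matching_lines(lines: Iterable[str], keyword: str = "ERROR",
                        limit: int = 20) -> List[str]:
    """Return the last `limit` lines that contain `keyword` (case-insensitive)."""
    needle = keyword.lower()
    matches = deque(maxlen=limit)  # keeps only the newest `limit` matches
    for line in lines:
        if needle in line.lower():
            matches.append(line.rstrip("\n"))
    return list(matches)


def scan_collection_log(path: str = LOG_PATH) -> List[str]:
    """Read the log at `path` and return its most recent matching lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return last_matching_lines(handle)
```

Run `scan_collection_log()` on the appliance itself; if the newest matching lines are older than the last collection interval, collection has probably recovered.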

Event forwarder events dropped

A forwarder drops events because of connection or overload problems.

Example:

Operations for Logs Admin Alert: Event Forwarder Events Dropped 
This alert is about your Operations for Logs installation on https://<your_url>

Event Forwarder Events Dropped triggered at 2016-08-02T18:41:06.972Z

Operations for Logs just dropped 670 events for forwarder target 'Test',
reason: Pending queue is full.
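
If you route these emails into a ticketing or monitoring tool, the fields in the example above can be extracted with a small parser. This is a sketch that assumes the email body always follows the sample layout shown here; the function and field names are hypothetical. The timestamp in the body is UTC, so it is parsed as such.

```python
import re
from datetime import datetime, timezone

SAMPLE = """\
Operations for Logs Admin Alert: Event Forwarder Events Dropped
This alert is about your Operations for Logs installation on https://<your_url>

Event Forwarder Events Dropped triggered at 2016-08-02T18:41:06.972Z

Operations for Logs just dropped 670 events for forwarder target 'Test',
reason: Pending queue is full.
"""

TRIGGERED = re.compile(r"triggered at (\S+)")
DROPPED = re.compile(
    r"dropped (\d+) events for forwarder target '([^']*)',\s*reason: ([^\n]+)"
)


def parse_forwarder_alert(body: str) -> dict:
    """Extract timestamp, drop count, target, and reason from an alert body."""
    triggered = TRIGGERED.search(body)
    dropped = DROPPED.search(body)
    if not (triggered and dropped):
        raise ValueError("body does not look like a forwarder-drop alert")
    when = datetime.strptime(triggered.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "triggered_at": when.replace(tzinfo=timezone.utc),  # body time is UTC
        "dropped": int(dropped.group(1)),
        "target": dropped.group(2),
        "reason": dropped.group(3).strip().rstrip("."),
    }


alert = parse_forwarder_alert(SAMPLE)
print(alert["dropped"], alert["target"], alert["reason"])
# 670 Test Pending queue is full
```

Calling `alert["triggered_at"].astimezone()` converts the UTC timestamp to the local time zone of the machine running the script.
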
Alert queries behind schedule

VMware Aria Operations for Logs was unable to run a user-defined alert at its configured time. The delay might be caused by one or more inefficient user-defined alerts, or by a system that is not properly sized for the ingestion and query load.

Auto deactivated alert

If a user-defined alert has run at least 10 times and its average run time is more than one hour, the alert is considered inefficient and is deactivated to prevent impacting other user-defined alerts.

Inefficient alert query

If a user-defined alert takes more than one hour to finish, then the alert is deemed to be inefficient.
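
The two rules above (automatic deactivation and the inefficient alert query) can be expressed as simple predicates. The thresholds come from the descriptions; the function names and the use of seconds as the unit are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence

MIN_RUNS = 10               # an alert must have run at least this many times
ONE_HOUR_SECONDS = 60 * 60  # run-time limit used by both rules


def is_inefficient(run_seconds: float) -> bool:
    """A single run is inefficient if it takes more than one hour."""
    return run_seconds > ONE_HOUR_SECONDS


def should_auto_deactivate(run_times_seconds: Sequence[float]) -> bool:
    """Deactivate after at least 10 runs whose average exceeds one hour."""
    return (len(run_times_seconds) >= MIN_RUNS
            and mean(run_times_seconds) > ONE_HOUR_SECONDS)


print(should_auto_deactivate([4000] * 10))  # True
print(should_auto_deactivate([4000] * 9))   # False, fewer than 10 runs
print(is_inefficient(3600))                 # False, not more than one hour
```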

New user created or user logged in for the first time

VMware Aria Operations for Logs is configured to ensure STIG compliance, and a new user is created or an Active Directory or VMware Workspace ONE Access user logs in for the first time.

System Notifications for Clusters

VMware Aria Operations for Logs sends notifications about cluster topology changes, including the addition of new cluster members or transient node communication problems.

Sent by Notification Name Description
Primary node Approval needed for new worker node

A worker node is sending a request to join a cluster. An Admin user must approve or reject the request.

Primary node New worker node approved

An Admin user approved a membership request from a worker node to join a VMware Aria Operations for Logs cluster.

Primary node New worker node denied

An Admin user rejected a membership request from a worker node to join a VMware Aria Operations for Logs cluster. If the request was denied by mistake, an Admin user can place the request again from the worker and then approve it at the primary node.

Primary node Maximum supported nodes exceeded due to worker node

The number of worker nodes in the VMware Aria Operations for Logs cluster has exceeded the maximum supported count due to a new worker node.

Primary node Allowed nodes exceeded, new worker node denied

A user attempted to add more nodes to the cluster than the maximum allowed node count, and the node has been denied.

Primary node Worker node disconnected

A previously connected worker node disconnected from the VMware Aria Operations for Logs cluster.

Primary node Worker node reconnected

A worker node reconnected to the VMware Aria Operations for Logs cluster.

Primary node Worker node revoked by

An Admin user revoked a worker node membership and the node is no longer a part of the VMware Aria Operations for Logs cluster.

Primary node Unknown worker node rejected

The VMware Aria Operations for Logs primary node rejected a request by a worker node because the worker node is unknown to the primary. If the worker is a valid node and it should be added to the cluster, log in to the worker node, remove its token file and user configuration at /storage/core/loginsight/config/, and restart the loginsight service on the worker node.

Primary node Worker node has entered into maintenance mode

A worker node entered maintenance mode. An Admin user must take the worker node out of maintenance mode before it can receive configuration changes and serve queries.

Primary node Worker node has returned to service

A worker node exited maintenance mode and returned to service.

Worker node Primary failed or disconnected from worker node

The worker node that sends the notification is unable to contact the VMware Aria Operations for Logs primary node. This notification might indicate that the primary node failed, and might need to be restarted. If the primary node failed, the cluster cannot be configured and queries cannot be submitted until it is back online. Worker nodes continue to ingest messages.

Note: You might receive many such notifications because many workers might detect the primary node failure independently and raise notifications.
Worker node Primary connected to worker node

The worker node that sends the notification is reconnected to the VMware Aria Operations for Logs primary node.