vRealize Log Insight provides two sets of notifications about system health: general notifications, which apply to all product configurations, and cluster notifications, which apply to cluster-based deployments.

To view system notifications, on the Alerts tab, click System Alerts. With appropriate permissions, you can activate or deactivate the notifications. For more information, see View and Manage Alerts in Using vRealize Log Insight.

Note: In this topic, an Admin user refers to a user associated with the Super Admin role, or a role that has the relevant permissions, as described in Create and Modify Roles.

The following tables list and describe system notifications for vRealize Log Insight.

General System Notifications

vRealize Log Insight issues notifications about conditions that might require administrative intervention, including archival failure or alert scheduling delays.

Notification Name Description
Oldest data will be unsearchable soon

vRealize Log Insight is expected to start decommissioning old data from the virtual appliance storage based on the expected size of searchable data, storage space, and the current ingestion rate. Data that has been rotated out is archived if you have configured archiving, or deleted if you have not.

To address this, add storage or adjust the retention notification threshold. For more information, see Configure vRealize Log Insight to Send Health Notifications.

The notification is sent after each restart of the vRealize Log Insight service.

Repository retention time

A retention period is the length of time data is retained on the local disk of your vRealize Log Insight instance. The retention period is determined by the amount of data the system can hold and the current ingestion rate. For example, if you are receiving 10 GB/day of data (after indexing) and you have 300 GB of space, then your retention period is 30 days.

When your storage limit is reached, old data is removed to make way for newly ingested data. This notification tells you when the amount of searchable data that vRealize Log Insight can store at the current ingestion rates exceeds the storage space that is available on the virtual appliance.

You might run out of storage before the time period set with the Retention Notification Threshold. Add storage or adjust the retention notification threshold.
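The retention arithmetic above can be sketched as follows. This is a minimal illustration, not the product's internal calculation: the function names and the threshold comparison are assumptions, and a steady ingestion rate is assumed.

```python
def retention_days(storage_gb: float, ingestion_gb_per_day: float) -> float:
    """Estimate how many days of data fit on disk at a steady ingestion rate."""
    if ingestion_gb_per_day <= 0:
        raise ValueError("ingestion rate must be positive")
    return storage_gb / ingestion_gb_per_day


def below_retention_threshold(storage_gb: float,
                              ingestion_gb_per_day: float,
                              threshold_days: float) -> bool:
    """Hypothetical check: True when the estimated retention period is
    shorter than the configured retention notification threshold."""
    return retention_days(storage_gb, ingestion_gb_per_day) < threshold_days


# The example from the text: 300 GB of space at 10 GB/day gives 30 days.
print(retention_days(300, 10))
```

With a threshold of 45 days, the 300 GB example would fall below the threshold, which is the situation in which adding storage or lowering the threshold is suggested.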

Dropped events

vRealize Log Insight failed to ingest all incoming log messages.

  • If TCP messages are dropped, as tracked by the vRealize Log Insight server, a system notification is sent as follows:
    • Once a day
    • Each time the vRealize Log Insight service is restarted, manually or automatically
  • The email contains the number of messages dropped since the last notification email was sent and the total number of messages dropped since the last restart of vRealize Log Insight.
Note: The time in the sent line is controlled by the email client, and is in the local time zone, while the email body displays the UTC time.
Corrupt index buckets

Part of the on-disk index is corrupt. A corrupt index usually indicates serious issues with the underlying storage system. The corrupt part of the index is excluded from serving queries. A corrupt index affects the ingestion of new data. vRealize Log Insight checks the integrity of the index upon service start-up. If corruption is detected, vRealize Log Insight sends a system notification as follows:

  • Once a day
  • Each time the vRealize Log Insight service is restarted, manually or automatically
Out of disk

vRealize Log Insight is running out of allocated disk space. vRealize Log Insight has most probably run into a storage-related issue.

Archive space will be full

The disk space on the NFS server used for archiving vRealize Log Insight data will be used up soon. If the NFS server can hold less than seven days of archived data at the current ingestion rate, a system notification is sent. For example, if you are archiving with a disk consumption rate of 708.9 MB per day and you have 2000 MB of space, you have about three days of capacity, which is less than the seven-day threshold. In this case, you receive a notification that you are below the threshold.
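The capacity estimate in the example above can be reproduced with a short sketch. The function names are illustrative assumptions; only the seven-day threshold and the sample figures come from the text.

```python
ARCHIVE_WARNING_DAYS = 7  # threshold stated in the description above


def archive_days_remaining(free_mb: float, consumption_mb_per_day: float) -> float:
    """Estimate days until the archive location fills at a steady rate."""
    if consumption_mb_per_day <= 0:
        raise ValueError("consumption rate must be positive")
    return free_mb / consumption_mb_per_day


def archive_space_warning(free_mb: float, consumption_mb_per_day: float) -> bool:
    """Hypothetical check: True when less than seven days of capacity remain."""
    return archive_days_remaining(free_mb, consumption_mb_per_day) < ARCHIVE_WARNING_DAYS


# Figures from the text: 2000 MB of space at 708.9 MB per day.
days = archive_days_remaining(2000, 708.9)
print(f"{days:.2f} days remaining, warning: {archive_space_warning(2000, 708.9)}")
```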
Total disk space change

The total size of the partition for the vRealize Log Insight data storage has decreased. This notification usually signals a serious issue in the underlying storage system. When vRealize Log Insight detects the condition, it sends this notification as follows:

  • Immediately
  • Once a day
Pending archivings

vRealize Log Insight cannot archive data as expected. The notification usually indicates problems with the NFS storage that you configured for data archiving.
Allocated log record storage volume reached 75 percent of the maximum log record storage capacity

vRealize Log Insight is configured to ensure STIG compliance, and the allocated log record storage volume reaches 75 percent of the maximum log record storage capacity of the repository.
Note: This notification is sent per node.
License is about to expire

The license for vRealize Log Insight is about to expire.

License is expired

The license for vRealize Log Insight has expired.
SSL certificate is about to expire

The SSL certificate for the vRealize Log Insight cluster will expire in 30 days.
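If you want to check a certificate's remaining lifetime yourself, the following sketch computes the days until expiry from a certificate's notAfter string. It uses the standard library function ssl.cert_time_to_seconds; the helper names are hypothetical, and treating "30 days or fewer" as the warning window is an assumption based on the description above.

```python
import ssl
from datetime import datetime, timezone

WARNING_WINDOW_DAYS = 30  # assumption: mirrors the 30 days mentioned above


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate notAfter value such as
    'Jan 15 09:34:43 2018 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now.timestamp()) / 86400


def expires_soon(not_after: str, now: datetime) -> bool:
    """Hypothetical check: True when the certificate expires within the window."""
    return days_until_expiry(not_after, now) <= WARNING_WINDOW_DAYS


check_time = datetime(2018, 1, 1, 9, 34, 43, tzinfo=timezone.utc)
print(expires_soon("Jan 15 09:34:43 2018 GMT", check_time))
```

The notAfter string can be copied from any tool that displays certificate details; how you obtain it is outside the scope of this sketch.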
Unable to connect to AD server

vRealize Log Insight is unable to connect to the configured Active Directory server.
Cannot take over High Availability IP address [IP Address] as it is already held by another machine

The vRealize Log Insight cluster was unable to take over the configured IP address for the Integrated Load Balancer (ILB). The most common reason for this notification is that another host within the same network holds the IP address, and therefore the IP address is not available to be taken over by the cluster.

You can resolve this conflict by either releasing the IP address from the host that currently holds it, or configuring the vRealize Log Insight Integrated Load Balancer with a static IP address that is available in the network. When changing the ILB IP address, you must reconfigure all clients to send logs to the new IP address, or to an FQDN/URL that resolves to this IP address. You must also unconfigure and reconfigure every vCenter Server integrated with vRealize Log Insight from the vSphere integration page.

High Availability IP address [IP Address] is unavailable due to too many node failures

The IP address configured for the Integrated Load Balancer (ILB) is unavailable. Clients trying to send logs to a vRealize Log Insight cluster through the ILB IP address or an FQDN/URL that resolves to this IP address will see it as unavailable. The most common reason for this notification is that most of the nodes in the vRealize Log Insight cluster are unhealthy, unavailable, or unreachable from the primary node. Another common reason is that NTP time synchronization has not been activated, or the configured NTP servers have a significant time drift between each other. You can confirm that the problem is still ongoing by trying to ping (if allowed) the IP address to verify that it is not reachable.

You can resolve this problem by ensuring that most of your cluster nodes are healthy and reachable, and enabling NTP time synchronization to accurate NTP servers.

Too many migrations of High Availability IP address [your IP Address] between vRealize Log Insight nodes

The IP address configured for the Integrated Load Balancer (ILB) has migrated too many times within the last 10 minutes.

Under normal operation, the IP address rarely moves between vRealize Log Insight cluster nodes. However, the IP address might move if the current owner node is restarted or put in maintenance mode. Another possible cause is a lack of time synchronization between vRealize Log Insight cluster nodes, which is essential for proper cluster functioning. In that case, you can fix the problem by enabling NTP time synchronization to accurate NTP servers.

SSL certificate error

A syslog source has initiated a connection to vRealize Log Insight over SSL but ended the connection abruptly. This notification might indicate that the syslog source was unable to confirm the validity of the SSL certificate. In order for vRealize Log Insight to accept syslog messages over SSL, a certificate that is validated by the client is required and the clocks of the systems must be synchronized. There might be a problem with the SSL Certificate or with the Network Time Service.

You can validate that the SSL Certificate is trusted by your syslog source, reconfigure the source not to use SSL, or reinstall the SSL Certificate. See Configure the vRealize Log Insight Agent SSL Parameters and Install a Custom SSL Certificate.

vCenter collection failed

vRealize Log Insight is unable to collect vCenter events, tasks, and alarms. To look for the exact error that caused the collection failure and to see if collection is working currently, look in the /var/log/vmware/loginsight/plugins/vsphere/li-vsphere.log file.

vCenter Kubernetes Service event collection failed

vRealize Log Insight is unable to collect vCenter Kubernetes System events, tasks, and alarms. To look for the exact error that caused the collection failure and to see if collection is working currently, look in the /var/log/vmware/loginsight/plugins/vsphere/li-vsphere.log file.
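Both collection notifications above point to the same log file. A minimal sketch for pulling the most recent error lines out of it is shown below. The log line format is not described in this topic, so the "ERROR" marker is an assumption you might need to change, and find_error_lines is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, List

# Path taken from the descriptions above.
LOG_PATH = Path("/var/log/vmware/loginsight/plugins/vsphere/li-vsphere.log")


def find_error_lines(lines: Iterable[str],
                     marker: str = "ERROR",
                     limit: int = 20) -> List[str]:
    """Return the last `limit` lines that contain `marker`.

    The marker is an assumption about the log format, not a documented field.
    """
    matches = [line.rstrip("\n") for line in lines if marker in line]
    return matches[-limit:]


if __name__ == "__main__" and LOG_PATH.exists():
    # Run on the appliance itself; tolerate undecodable bytes in the log.
    with LOG_PATH.open(errors="replace") as handle:
        for line in find_error_lines(handle):
            print(line)
```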

Event forwarder events dropped

A forwarder drops events because of connection or overload problems.

Example:

Log Insight Admin Alert: Event Forwarder Events Dropped 
This alert is about your Log Insight installation on https://<your_url>

Event Forwarder Events Dropped triggered at 2016-08-02T18:41:06.972Z

Log Insight just dropped 670 events for forwarder target 'Test',
reason: Pending queue is full.
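If you process these emails automatically, the dropped-event count, forwarder target, and reason can be extracted from the body. The sketch below assumes the body follows the wording of the example above exactly; the pattern and the ForwarderDrop type are illustrative, not a documented message format.

```python
import re
from typing import NamedTuple, Optional


class ForwarderDrop(NamedTuple):
    dropped: int
    target: str
    reason: str


# Pattern derived from the example email above (an assumption about the format).
_DROP_PATTERN = re.compile(
    r"dropped (\d+) events for forwarder target '([^']*)',\s*reason:\s*(.+)"
)


def parse_forwarder_drop(body: str) -> Optional[ForwarderDrop]:
    """Extract count, target, and reason, or None if the body does not match."""
    match = _DROP_PATTERN.search(body)
    if match is None:
        return None
    count, target, reason = match.groups()
    return ForwarderDrop(int(count), target, reason.strip())


sample = (
    "Log Insight just dropped 670 events for forwarder target 'Test',\n"
    "reason: Pending queue is full."
)
print(parse_forwarder_drop(sample))
```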
Alert queries behind schedule

vRealize Log Insight was unable to run a user-defined alert at its configured time. The delay might be caused by one or more inefficient user-defined alerts, or by a system that is not properly sized for the ingestion and query load.

Auto deactivated alert

If a user-defined alert has run at least 10 times and its average run time is more than one hour, the alert is considered inefficient and is deactivated to prevent impacting other user-defined alerts.

Inefficient alert query

If a user-defined alert takes more than one hour to finish, then the alert is deemed to be inefficient.
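The two rules above (a single run longer than one hour marks an alert as inefficient; at least 10 runs with an average run time above one hour lead to deactivation) can be expressed as a small sketch. The function names and the representation of run history as a list of durations in seconds are assumptions for illustration only.

```python
from statistics import mean
from typing import Sequence

ONE_HOUR_SECONDS = 3600
MIN_RUNS_BEFORE_DEACTIVATION = 10


def is_inefficient(run_seconds: float) -> bool:
    """A single run that takes more than one hour is deemed inefficient."""
    return run_seconds > ONE_HOUR_SECONDS


def should_auto_deactivate(run_history: Sequence[float]) -> bool:
    """True when the alert has run at least 10 times and its average
    run time is more than one hour."""
    if len(run_history) < MIN_RUNS_BEFORE_DEACTIVATION:
        return False
    return mean(run_history) > ONE_HOUR_SECONDS


history = [4000.0] * 10  # ten runs of 4000 seconds each
print(should_auto_deactivate(history))
```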

New user created or user logged in for the first time

vRealize Log Insight is configured to ensure STIG compliance, and a new user is created or an Active Directory or VMware Identity Manager user logs in for the first time.

System Notifications for Clusters

vRealize Log Insight sends notifications about cluster topology changes, including the addition of new cluster members or transient node communication problems.

Sent by Notification Name Description
Primary node Approval needed for new worker node

A worker node is sending a request to join a cluster. An Admin user must approve or reject the request.

Primary node New worker node approved

An Admin user approved a membership request from a worker node to join a vRealize Log Insight cluster.

Primary node New worker node denied

An Admin user rejected a membership request from a worker node to join a vRealize Log Insight cluster. If the request was denied by mistake, an Admin user can place the request again from the worker and then approve it at the primary node.

Primary node Maximum supported nodes exceeded due to worker node

The number of worker nodes in the Log Insight cluster has exceeded the maximum supported count due to a new worker node.

Primary node Allowed nodes exceeded, new worker node denied

A user attempted to add more nodes to the cluster than the maximum allowed node count, and the node has been denied.

Primary node Worker node disconnected

A previously connected worker node disconnected from the vRealize Log Insight cluster.

Primary node Worker node reconnected

A worker node reconnected to the vRealize Log Insight cluster.

Primary node Worker node revoked by

An Admin user revoked a worker node membership and the node is no longer a part of the vRealize Log Insight cluster.

Primary node Unknown worker node rejected

The vRealize Log Insight primary node rejected a request by a worker node because the worker node is unknown to the primary node. If the worker is a valid node and it should be added to the cluster, log in to the worker node, remove its token file and user configuration at /storage/core/loginsight/config/, and restart the loginsight service on the worker node.

Primary node Worker node has entered into maintenance mode

A worker node entered maintenance mode. An Admin user must remove the worker node from maintenance mode before it can receive configuration changes and serve queries.

Primary node Worker node has returned to service

A worker node exited maintenance mode and returned to service.

Worker node Primary failed or disconnected from worker node

The worker node that sends the notification is unable to contact the vRealize Log Insight primary node. This notification might indicate that the primary node failed, and might need to be restarted. If the primary node failed, the cluster cannot be configured and queries cannot be submitted until it is back online. Worker nodes continue to ingest messages.

Note: You might receive many such notifications because many workers might detect the primary node failure independently and raise notifications.
Worker node Primary connected to worker node

The worker node that sends the notification is reconnected to the vRealize Log Insight primary node.