vRealize Log Insight provides two sets of notifications about system health: general notifications, which apply to all product configurations, and cluster notifications, which apply to cluster-based deployments.

The following tables list and describe system notifications for vRealize Log Insight.

General System Notifications

vRealize Log Insight issues notifications about conditions that might require administrative intervention, including archival failure or alert scheduling delays.

Notification Name

Description

Oldest Data Will Be Unsearchable Soon

vRealize Log Insight is expected to start decommissioning old data from the virtual appliance storage based on the expected size of searchable data, storage space, and the current ingestion rate. Data that has been rotated out is archived if you have configured archiving, or deleted if you have not.

To address this, add storage or adjust the retention notification threshold. For more information, see Configure vRealize Log Insight to Send Health Notifications.

The notification is sent after each restart of the vRealize Log Insight service.

Repository Retention Time

A retention period is the length of time data is retained on the local disk of your vRealize Log Insight instance. A retention period is determined by the amount of data the system can hold and the current ingestion rate. For example, if you are receiving 10 GB/day of data (after indexing) and you have 300 GB of space, then your retention period is 30 days.

When your storage limit is reached, old data is removed to make way for newly ingested data. This notification tells you when the amount of searchable data that vRealize Log Insight can store at the current ingestion rates exceeds the storage space that is available on the virtual appliance.

You could run out of storage before the time period set with the Retention Notification Threshold. Add storage or adjust the retention notification threshold.
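The retention arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration, not how the product computes the value: the function names and the `threshold_days` parameter are hypothetical, and the sketch assumes a constant ingestion rate measured after indexing.

```python
def estimate_retention_days(storage_gb: float, ingest_gb_per_day: float) -> float:
    """Estimate how many days of data fit on disk at a steady ingestion rate."""
    if ingest_gb_per_day <= 0:
        raise ValueError("ingestion rate must be positive")
    return storage_gb / ingest_gb_per_day


def below_threshold(storage_gb: float, ingest_gb_per_day: float,
                    threshold_days: float) -> bool:
    """Return True when the estimated retention is shorter than the threshold.

    Assumption: the notification condition is a simple comparison of the
    estimate against the configured threshold.
    """
    return estimate_retention_days(storage_gb, ingest_gb_per_day) < threshold_days


# Figures from the example above: 300 GB of space, 10 GB/day after indexing.
print(estimate_retention_days(300, 10))   # 30.0
print(below_threshold(300, 10, 45))       # True: 30 days is less than 45 days
```

If the estimate falls below the threshold, the two remedies named above apply: add storage (raising `storage_gb`) or lower the threshold.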

Dropped Events

vRealize Log Insight failed to ingest all incoming log messages.

  • When TCP messages are dropped, as tracked by the vRealize Log Insight server, a system notification is sent as follows:

    • Once a day

    • Each time the vRealize Log Insight service is restarted, manually or automatically

  • The email contains the number of messages dropped since the last notification email was sent and the total number of messages dropped since the last restart of vRealize Log Insight.

Note that the time in the sent line is controlled by the email client, and is in the local time zone, while the email body displays UTC time.
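To compare the UTC time in the email body with the local time shown by the email client, you can convert the timestamp yourself. The sketch below assumes the body uses timestamps of the form `2016-08-02T18:41:06.972Z`, as in the sample Event Forwarder alert in this topic; the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_utc_timestamp(text: str) -> datetime:
    """Parse a timestamp such as 2016-08-02T18:41:06.972Z as timezone-aware UTC."""
    naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


utc_time = parse_utc_timestamp("2016-08-02T18:41:06.972Z")
# astimezone() with no argument converts to the local time zone of this machine.
local_time = utc_time.astimezone()
print(utc_time.isoformat(), "->", local_time.isoformat())
```

The printed local value depends on the time zone of the machine that runs the script, so it should match what the email client on the same machine shows in the sent line.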

Corrupt Index Buckets

Part of the on-disk index is corrupt. A corrupt index usually indicates serious issues with the underlying storage system. The corrupt part of the index is excluded from serving queries. A corrupt index affects the ingestion of new data. vRealize Log Insight checks the integrity of the index upon service start-up. In case of detected corruption, vRealize Log Insight sends a system notification as follows:

  • Once a day

  • Each time the vRealize Log Insight service is restarted, manually or automatically

Out of Disk

vRealize Log Insight is running out of allocated disk space. vRealize Log Insight has most probably run into a storage-related issue.

Archive Space Will Be Full

The disk space on the NFS server used for archiving vRealize Log Insight data will be used up soon.

Total Disk Space Change

The total size of the partition for vRealize Log Insight data storage has decreased. This usually signals a serious issue in the underlying storage system. When vRealize Log Insight detects the condition, it sends this notification as follows:

  • Immediately

  • Once a day

Pending Archivings

vRealize Log Insight cannot archive data as expected. The notification usually indicates problems with the NFS storage that you configured for data archiving.

License is about to be expired

The license for vRealize Log Insight is about to expire.

License is expired

The license for vRealize Log Insight has expired.

Unable to connect to AD server

vRealize Log Insight is unable to connect to the configured Active Directory server.

Cannot take over High Availability IP address [IP Address] as it is already held by another machine

The vRealize Log Insight cluster was unable to take over the configured IP Address for the Integrated Load Balancer (ILB). The most common reason for this notification is that another host within the same network holds the IP address, and therefore the IP address is not available to be taken over by the cluster.

You can resolve this conflict by either releasing the IP address from the host that currently holds it, or configuring the vRealize Log Insight Integrated Load Balancer with a static IP address that is available in the network. When changing the ILB IP address, remember to reconfigure all clients to send logs to the new IP address, or to an FQDN/URL that resolves to this IP address. You must also unconfigure and reconfigure every vCenter Server integrated with vRealize Log Insight from the vSphere integration page.

High Availability IP address [IP Address] is unavailable due to too many node failures

The IP address configured for the Integrated Load Balancer (ILB) is unavailable. Clients that try to send logs to a vRealize Log Insight cluster through the ILB IP address, or through an FQDN/URL that resolves to this IP address, see it as unavailable. The most common reason for this notification is that a majority of the nodes in the vRealize Log Insight cluster are unhealthy, unavailable, or unreachable from the master node. Another common reason is that NTP time synchronization has not been enabled, or the configured NTP servers have significant time drift relative to each other. You can confirm that the problem is still ongoing by trying to ping (if allowed) the IP address to verify that it is not reachable.

You can resolve this problem by ensuring a majority of your cluster nodes are healthy and reachable, and enabling NTP time synchronization to accurate NTP servers.
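The "majority of nodes" condition can be expressed as a simple check. This is a minimal sketch under an assumption: it treats "majority" as strictly more than half of all cluster nodes, which this topic does not spell out, and the function name is hypothetical.

```python
def has_majority(healthy_nodes: int, total_nodes: int) -> bool:
    """Return True when strictly more than half of the nodes are healthy."""
    if total_nodes <= 0 or not 0 <= healthy_nodes <= total_nodes:
        raise ValueError("expected 0 <= healthy_nodes <= total_nodes and total_nodes > 0")
    # Integer comparison avoids floating-point rounding for odd cluster sizes.
    return healthy_nodes * 2 > total_nodes


print(has_majority(2, 3))  # True: two of three nodes are healthy
print(has_majority(1, 3))  # False: only one of three nodes is healthy
```

Under this reading, a three-node cluster tolerates one failed node, while a cluster with exactly half of its nodes down does not have a majority.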

Too many migrations of High Availability IP address [your IP Address] between vRealize Log Insight nodes

The IP address configured for the Integrated Load Balancer (ILB) has migrated too many times within the last 10 minutes.

Under normal operation, the IP address rarely moves between vRealize Log Insight cluster nodes. The IP address might move if the current owner node is restarted or put in maintenance mode. Another possible cause is a lack of time synchronization between Log Insight cluster nodes, which is essential for proper cluster functioning. In the latter case, you can fix the problem by enabling NTP time synchronization to accurate NTP servers.

SSL Certificate Error

A syslog source has initiated a connection to vRealize Log Insight over SSL but ended the connection abruptly. This may indicate that the syslog source was unable to confirm the validity of the SSL certificate. In order for vRealize Log Insight to accept syslog messages over SSL, a certificate that is validated by the client is required and the clocks of the systems must be synchronized. There may be an issue with the SSL Certificate or with the Network Time Service.

You can validate that the SSL Certificate is trusted by your syslog source, reconfigure the source not to use SSL, or reinstall the SSL Certificate. See Configure the vRealize Log Insight Agent SSL Parameters and Install a Custom SSL Certificate.

vCenter collection failed

vRealize Log Insight is unable to collect vCenter events, tasks, and alarms. To find the exact error that caused the collection failure and to check whether collection is currently working, see the /storage/var/loginsight/plugins/vsphere/li-vsphere.log file.
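One way to scan that log file for likely error lines is sketched below. The format of li-vsphere.log is not described in this topic, so the keywords `ERROR` and `Exception` are assumptions, as are the function names and the sample lines, which are invented for illustration.

```python
from typing import Iterable, List, Sequence

# Path taken from the description above.
LOG_PATH = "/storage/var/loginsight/plugins/vsphere/li-vsphere.log"


def find_error_lines(lines: Iterable[str],
                     keywords: Sequence[str] = ("ERROR", "Exception")) -> List[str]:
    """Return the lines that contain any of the given keywords (assumed markers)."""
    return [line.rstrip("\n") for line in lines
            if any(keyword in line for keyword in keywords)]


def recent_errors(path: str = LOG_PATH, limit: int = 20) -> List[str]:
    """Read the log file and return at most `limit` of the latest matching lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle)[-limit:]


# Invented sample lines; real log entries may look different.
sample = ["10:00:01 INFO collection started\n", "10:00:02 ERROR login failed\n"]
print(find_error_lines(sample))
```

On the appliance itself, calling `recent_errors()` would read the real file. If the last matching lines are old and later entries show no errors, collection may have recovered.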

Event Forwarder Events Dropped

A forwarder drops events because of connection or overload issues.

Example:

Log Insight Admin Alert: Event Forwarder Events Dropped 
This alert is about your Log Insight installation on https://<your_url>

Event Forwarder Events Dropped triggered at 2016-08-02T18:41:06.972Z

Log Insight just dropped 670 events for forwarder target 'Test',
reason: Pending queue is full.
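If you process these emails automatically, the dropped-event count, forwarder target, and reason can be extracted from the body. The sketch below is based only on the sample text above; the pattern and function name are hypothetical, and other product versions might word the message differently.

```python
import re
from typing import Dict, Optional, Union

# Pattern derived from the sample alert body; treat it as an assumption.
DROP_PATTERN = re.compile(
    r"dropped (?P<count>\d+) events for forwarder target '(?P<target>[^']*)',\s*"
    r"reason: (?P<reason>[^\n]+)"
)


def parse_forwarder_drop(body: str) -> Optional[Dict[str, Union[int, str]]]:
    """Extract count, target, and reason from an alert body, or None if no match."""
    match = DROP_PATTERN.search(body)
    if match is None:
        return None
    return {
        "count": int(match["count"]),
        "target": match["target"],
        "reason": match["reason"].strip().rstrip("."),
    }


body = (
    "Log Insight just dropped 670 events for forwarder target 'Test',\n"
    "reason: Pending queue is full.\n"
)
print(parse_forwarder_drop(body))
# {'count': 670, 'target': 'Test', 'reason': 'Pending queue is full'}
```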

Alert Queries Behind Schedule

vRealize Log Insight was unable to run a user-defined alert at its configured time. The delay might be caused by one or more inefficient user-defined alerts, or by a system that is not properly sized for the ingestion and query load.

Auto Disabled Alert

If a user-defined alert has run at least ten times and its average run time is more than one hour, then the alert is deemed to be inefficient and is disabled to prevent impacting other user-defined alerts.

Inefficient Alert Query

If a user-defined alert takes more than one hour to complete, then the alert is deemed to be inefficient.
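The two rules above (Auto Disabled Alert and Inefficient Alert Query) can be written as a pair of checks. This is a sketch of the stated thresholds only: the function and constant names are hypothetical, run times are assumed to be recorded in seconds, and the product's actual implementation is not described in this topic.

```python
from statistics import mean
from typing import Sequence

ONE_HOUR_SECONDS = 3600
MIN_RUNS_FOR_AUTO_DISABLE = 10


def is_inefficient(run_seconds: float) -> bool:
    """A single run that takes more than one hour marks the alert as inefficient."""
    return run_seconds > ONE_HOUR_SECONDS


def should_auto_disable(run_history_seconds: Sequence[float]) -> bool:
    """Disable after at least ten runs whose average run time exceeds one hour."""
    return (len(run_history_seconds) >= MIN_RUNS_FOR_AUTO_DISABLE
            and mean(run_history_seconds) > ONE_HOUR_SECONDS)


print(is_inefficient(4000))               # True: 4000 s is more than one hour
print(should_auto_disable([4000] * 9))    # False: fewer than ten runs so far
print(should_auto_disable([4000] * 10))   # True: ten runs, average above one hour
```

Because the auto-disable rule uses an average, a single slow run triggers only the Inefficient Alert Query notification; repeated slow runs are needed before the alert is disabled.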

System Notifications for Clusters

vRealize Log Insight sends notifications about cluster topology changes, including the addition of new cluster members or transient node communication problems.

Sent by

Notification Name

Description

Master node

Approval needed for new worker node

A worker node is sending a request to join a cluster. An Admin user needs to approve or reject the request.

Master node

New worker node approved

An Admin user approved a membership request from a worker node to join a vRealize Log Insight cluster.

Master node

New worker node denied

An Admin user rejected a membership request from a worker node to join a vRealize Log Insight cluster. If the request was denied by mistake, an Admin user can place the request again from the worker and then approve it at the master node.

Master node

Maximum supported nodes exceeded due to worker node

The number of worker nodes in the Log Insight cluster has exceeded the maximum supported count due to a new worker node.

Master node

Allowed nodes exceeded, new worker node denied

An Admin user attempted to add more nodes to the cluster than the maximum allowed node count and the node has been denied.

Master node

Worker node disconnected

A previously connected worker node disconnected from the vRealize Log Insight cluster.

Master node

Worker node reconnected

A worker node reconnected to the vRealize Log Insight cluster.

Master node

Worker node revoked by admin

An Admin user revoked a worker node membership and the node is no longer a part of the vRealize Log Insight cluster.

Master node

Unknown worker node rejected

The vRealize Log Insight master node rejected a request by a worker node because the worker node is unknown to the master. If the worker is a valid node and it should be added to the cluster, log in to the worker node, remove its token file and user configuration at /storage/core/loginsight/config/, and restart the loginsight service on the worker node.

Master node

Worker node has entered into maintenance mode

A worker node entered maintenance mode. An Admin user must remove the worker node from maintenance mode before it can receive configuration changes and serve queries.

Master node

Worker node has returned to service

A worker node exited maintenance mode and returned to service.

Worker node

Master failed or disconnected from worker node

The worker node that sends the notification is unable to contact the vRealize Log Insight master node. This might indicate that the master node failed, and might need to be restarted. If the master node failed, the cluster cannot be configured and queries cannot be submitted until it is back online. Worker nodes continue to ingest messages.

Note:

You might receive many such notifications because many workers might detect the master node failure independently and raise notifications.

Worker node

Master connected to worker node

The worker node that sends the notification is reconnected to the vRealize Log Insight master node.