How do I work with logs and log bundles in VMware Aria Automation

Various services generate logs automatically. You can generate log bundles in VMware Aria Automation. You can also configure your environment to send logs to VMware Aria Operations for Logs.

For information about the vracli command line utility, add the --help argument to a vracli command (for example, vracli log-bundle --help).

For related information about using VMware Aria Operations for Logs, see How do I configure log forwarding to VMware Aria Operations for Logs in VMware Aria Automation.

Log bundle commands

A log bundle contains all the logs that are generated by the services that you run. You can use a log bundle for troubleshooting.

In a clustered environment (high availability mode), run the vracli log-bundle command on only one node. Logs are pulled from all nodes in the environment. However, if there is a networking or other cluster issue, logs are pulled from as many nodes as can be reached. For example, if one node is disconnected in a cluster of three nodes, logs are collected only from the two healthy nodes. Output from the vracli log-bundle command contains information about any issues found and the steps to work around them.

To create a log bundle, SSH to any node and run the following vracli command:
vracli log-bundle
To change the timeout value for collecting logs from each node, run the following vracli command:
vracli log-bundle --collector-timeout $CUSTOM_TIMEOUT_IN_SECONDS
For example, if your environment contains large log files, slow networking, or high CPU usage, you can set the timeout to a value greater than the 1000-second default.
To determine the disk space being consumed by a specific service log such as ebs or vro, run the following vracli command and examine the command output:
vracli disk-mgr
To configure other options, such as assembly timeout and buffer location, use the following vracli help command:
vracli log-bundle --help

Log bundle structure

The log bundle is a timestamped tar file. The name of the bundle matches the pattern log-bundle-<date>T<time>.tar, for example log-bundle-20200629T131312.tar. Typically, the log bundle contains logs from all nodes in the environment. If an error occurs, it contains as many logs as possible. At a minimum, it contains logs from the local node.
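
The naming pattern makes it possible to recover the creation time of a bundle from its file name, for example when sorting several bundles. The following Python sketch illustrates one way to do that. It assumes that <date> is formatted as YYYYMMDD and <time> as HHMMSS, which is inferred from the example name above. The helper name bundle_timestamp is hypothetical and is not part of vracli.

```python
import re
from datetime import datetime

# Pattern based on the documented name format log-bundle-<date>T<time>.tar.
# Assumption: <date> is YYYYMMDD and <time> is HHMMSS, as in the example name.
BUNDLE_NAME = re.compile(r"^log-bundle-(\d{8}T\d{6})\.tar$")


def bundle_timestamp(file_name):
    """Return the timestamp encoded in a bundle file name, or None."""
    match = BUNDLE_NAME.match(file_name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    except ValueError:
        # Digits are present but do not form a valid date or time.
        return None


print(bundle_timestamp("log-bundle-20200629T131312.tar"))
# 2020-06-29 13:13:12
```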

The log bundle consists of the following content:

Environment file
The environment file contains the output of various Kubernetes maintenance commands. It supplies information about current resource usage per node and per pod. It also contains cluster information and a description of all available Kubernetes entities.
Host logs and configuration
The configuration of each host (for example, its /etc directory) and the host-specific logs (for example, journald) are collected in one directory for each cluster node or host. The directory name matches the host name of the node. The internal contents of the directory match the file system of the host. The number of directories matches the number of cluster nodes.
Services logs
Logs for Kubernetes services are located in the following folder structure:
- <hostname>/services-logs/<namespace>/<app-name>/file-logs/<container-name>.log
- <hostname>/services-logs/<namespace>/<app-name>/console-logs/<container-name>.log
An example file name is my-host-01/services-logs/prelude/vco-app/file-logs/vco-server-app.log.
- hostname is the host name of the node on which the application container is or was running. Typically, each service has one instance on each node. For example, a cluster of three nodes has three instances of each service.
- namespace is the Kubernetes namespace in which the application is deployed. For user-facing services, this value is prelude.
- app-name is the name of the Kubernetes application that produced the logs (for example, provisioning-service-app).
- container-name is the name of the container that produced the logs. Some apps consist of multiple containers. For example, the vco-app app includes the vco-server-app and vco-controlcenter-app containers.
(Legacy) Pod logs
While you can continue to generate pod logs in the bundle by using the vracli log-bundle --include-legacy-pod-logs command, doing so is not advised because all log information already resides in each service's logs. Including pod logs can unnecessarily increase the time and space required to generate the log bundle.
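
When you search an extracted bundle, it can help to split a service log path into the components described under Services logs. The following Python sketch is an illustration only. The ServiceLogPath type and the parse_service_log_path function are hypothetical names, and the sketch assumes that paths are relative to the root of the extracted bundle.

```python
from typing import NamedTuple, Optional


class ServiceLogPath(NamedTuple):
    hostname: str
    namespace: str
    app_name: str
    log_kind: str  # "file-logs" or "console-logs"
    container_name: str


def parse_service_log_path(path: str) -> Optional[ServiceLogPath]:
    """Split <hostname>/services-logs/<namespace>/<app-name>/<kind>/<container-name>.log."""
    parts = path.strip("/").split("/")
    if len(parts) != 6 or parts[1] != "services-logs":
        return None
    hostname, _, namespace, app_name, log_kind, file_name = parts
    if log_kind not in ("file-logs", "console-logs"):
        return None
    if not file_name.endswith(".log"):
        return None
    container_name = file_name[: -len(".log")]
    return ServiceLogPath(hostname, namespace, app_name, log_kind, container_name)


info = parse_service_log_path(
    "my-host-01/services-logs/prelude/vco-app/file-logs/vco-server-app.log"
)
print(info.app_name, info.container_name)
# vco-app vco-server-app
```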

Reducing the size of the log bundle

To generate a smaller log bundle, use any of the following vracli log-bundle commands:

vracli log-bundle --since-days n
Use this command to collect only the log files that were generated over the past n days. If you omit this option, logs from the past 2 days are collected. For example:
vracli log-bundle --since-days 1
vracli log-bundle --services service_A,service_B,service_C
Use this command to collect only the logs for the named services. For example:
vracli log-bundle --services ebs-app,vco-app
vracli log-bundle --skip-heap-dumps
Use this command to exclude all heap dumps from the generated log bundle.
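
If you script log collection, a small helper can assemble the command line from the options described in this topic. The following Python sketch only builds the argument list and does not run anything. The function name log_bundle_command is hypothetical. The sketch also assumes that the options can be combined in one invocation, which this topic does not state explicitly, so confirm with vracli log-bundle --help before relying on it.

```python
import shlex


def log_bundle_command(since_days=None, services=None,
                       skip_heap_dumps=False, collector_timeout=None):
    """Build the argument list for a vracli log-bundle call.

    Uses only options named in this topic. Combining them in one call
    is an assumption; check vracli log-bundle --help.
    """
    args = ["vracli", "log-bundle"]
    if since_days is not None:
        args += ["--since-days", str(since_days)]
    if services:
        # Service names are passed as one comma-separated value.
        args += ["--services", ",".join(services)]
    if skip_heap_dumps:
        args.append("--skip-heap-dumps")
    if collector_timeout is not None:
        args += ["--collector-timeout", str(collector_timeout)]
    return args


cmd = log_bundle_command(since_days=1, services=["ebs-app", "vco-app"],
                         skip_heap_dumps=True)
print(shlex.join(cmd))
# vracli log-bundle --since-days 1 --services ebs-app,vco-app --skip-heap-dumps
```

On an appliance node, the resulting list could be passed to subprocess.run(cmd, check=True) to run the command.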

Displaying logs

You can output the logs of a service pod or app by using the vracli logs <pod_name> command.

The following command options are available:

--service
Displays a merged log for all nodes of the app instead of the log of a single pod.
Example: vracli logs --service abx-service-app
--tail n
Displays the last n lines of the log. The default n value is 10.
Example: vracli logs --tail 20 abx-service-app-8598fcd4b4-tjwhk
--file
Displays only the specified file. If a file name is not provided, all files are shown.
Example: vracli logs --file abx-service-app.log abx-service-app-8598fcd4b4-tjwhk

Understanding log rotation

Keep the following considerations in mind about service log rotation:

All services produce logs. Service logs are stored in a dedicated /var/log/services-logs disk.
All logs are rotated regularly. Rotation occurs either hourly or when a certain size limit is reached.
All old log rotations are eventually compressed.
There is no per-service quota for log rotations.
The system retains as many logs as possible. Automation regularly checks the used disk space for logs. When the space becomes 70% full, older logs are purged until the disk space for logs reaches 60% full.
You can resize your logs disk if you need more space. See Increase VMware Aria Automation appliance disk space.

To check the logs disk space, run the following vracli command. The free space of /dev/sdc (/var/log) should be near 30% or more for each node.

# vracli cluster exec -- bash -c 'current_node; vracli disk-mgr; exit 0'
sc1-10-182-1-103.eng.vmware.com
/dev/sda4(/):
	Total size: 47.80GiB
	Free: 34.46GiB(72.1%)
	Available(for non-superusers): 32.00GiB(66.9%)
	SCSI ID: (0:0)
/dev/sdb(/data):
	Total size: 140.68GiB
	Free: 116.68GiB(82.9%)
	Available(for non-superusers): 109.47GiB(77.8%)
	SCSI ID: (0:1)
/dev/sdc(/var/log):
	Total size: 21.48GiB
	Free: 20.76GiB(96.6%)
	Available(for non-superusers): 19.64GiB(91.4%)
	SCSI ID: (0:2)
/dev/sdd(/home):
	Total size: 29.36GiB
	Free: 29.01GiB(98.8%)
	Available(for non-superusers): 27.49GiB(93.7%)
	SCSI ID: (0:3)
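
To automate this check, you can extract the free-space percentage for each mount point from the command output. The following Python sketch parses text in the format of the sample output above. It assumes that the format is stable and that the input is the output for a single node. The function name free_percent_by_mount and the SAMPLE constant are hypothetical.

```python
import re

# A header line looks like "/dev/sdc(/var/log):".
MOUNT_LINE = re.compile(r"^(\S+)\((\S+)\):\s*$")
# A free-space line looks like "Free: 20.76GiB(96.6%)".
FREE_LINE = re.compile(r"^\s*Free:\s*\S+\(([\d.]+)%\)\s*$")

SAMPLE = """\
/dev/sda4(/):
\tTotal size: 47.80GiB
\tFree: 34.46GiB(72.1%)
\tAvailable(for non-superusers): 32.00GiB(66.9%)
\tSCSI ID: (0:0)
/dev/sdc(/var/log):
\tTotal size: 21.48GiB
\tFree: 20.76GiB(96.6%)
\tAvailable(for non-superusers): 19.64GiB(91.4%)
\tSCSI ID: (0:2)
"""


def free_percent_by_mount(disk_mgr_output):
    """Map each mount point to its free-space percentage."""
    result = {}
    mount = None
    for line in disk_mgr_output.splitlines():
        header = MOUNT_LINE.match(line)
        if header:
            mount = header.group(2)
            continue
        free = FREE_LINE.match(line)
        if free and mount is not None:
            result[mount] = float(free.group(1))
    return result


free = free_percent_by_mount(SAMPLE)
print(free["/var/log"] >= 30)
# True
```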