Learn how to troubleshoot your VMware Tanzu Application Service for VMs [Windows] (TAS for VMs [Windows]) issues.
This section describes the issues that might occur during the installation process.
Symptom
You run the winfs-injector and see the following error about certificates:
Get https://auth.docker.io/token?service=registry.docker.io&
scope=repository:cloudfoundry/windows2016fs:pull: x509:
failed to load system roots and no roots provided
Explanation
Local certificates are needed to communicate with Docker Hub.
Solution
Install the necessary certificates on your local machine. On Ubuntu, you can install certificates with the ca-certificates package.
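Before rerunning winfs-injector, it can help to confirm that a CA bundle is actually present on the machine. The following is a minimal sketch, assuming a Debian or Ubuntu workstation where the ca-certificates package writes its bundle to /etc/ssl/certs/ca-certificates.crt. The function name has_ca_bundle is hypothetical, and the bundle path may differ on other distributions.

```shell
#!/bin/sh
# has_ca_bundle: hypothetical helper. Succeeds when the given CA bundle
# file exists and is non-empty. The default path is an assumption that
# holds for the ca-certificates package on Debian and Ubuntu.
has_ca_bundle() {
  bundle="${1:-/etc/ssl/certs/ca-certificates.crt}"
  [ -s "$bundle" ]
}

if has_ca_bundle; then
  echo "CA bundle found"
else
  # Suggest the package named in the Solution above.
  echo "CA bundle missing; try: sudo apt-get update && sudo apt-get install -y ca-certificates"
fi
```

If the bundle is reported missing, installing the package and then rerunning winfs-injector is the likely next step.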
Symptom
You run the winfs-injector and see the following error about a missing file or directory:
open ...windows2016fs-release/VERSION: no such file or directory
Explanation
You are using an outdated version of the winfs-injector.
Solution
From the VMware Tanzu Application Service for VMs [Windows] page on VMware Tanzu Network, download the version of the File System Injector tool recommended for the tile.
Symptom
You click the + icon in Ops Manager to add the TAS for VMs [Windows] tile to the Installation Dashboard and see the following error:
Explanation
The product file that you are trying to upload does not contain the Windows Server container base image.
Solution
Delete the product file listing from Ops Manager by clicking its trash can icon under Import a Product.
Follow the TAS for VMs [Windows] installation instructions to run the winfs-injector tool locally on the product file. This step adds the Windows Server container base image to the product file, requires internet access, and can take up to 20 minutes. For more information, see Install the Tile in Installing and Configuring TAS for VMs [Windows].
Click Import a Product to upload the injected product file.
Click the + icon next to the product listing to add the TAS for VMs [Windows] tile to the Installation Dashboard.
This section describes issues that may occur during the upgrade process.
Symptom
The pre-start script for the windowsfs job fails, and the upgrade fails with the following output:
Task 308031 | 13:47:04 | Preparing deployment: Preparing deployment (00:00:03)
Task 308031 | 13:47:11 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 308031 | 13:47:21 | Updating instance windows_diego_cell: windows_diego_cell/44c5841f-7580-4e9c-9856-89fcbe08ab0d (2) (canary) (00:00:35)
L Error: Action Failed get_task: Task 59ba76d1-14c5-4d7b-681c-08b9ec4bd64d result: 1 of 10 pre-start scripts failed. Failed Jobs: windows1803fs. Successful Jobs: set_kms_host, groot, loggregator_agent_windows, bosh-dns-windows, rep_windows, winc-network-1803, set_password, enable_ssh, enable_rdp.
Task 308031 | 13:47:56 | Error: Action Failed get_task: Task 59ba76d1-14c5-4d7b-681c-08b9ec4bd64d result: 1 of 10 pre-start scripts failed. Failed Jobs: windows1803fs. Successful Jobs: set_kms_host, groot, loggregator_agent_windows, bosh-dns-windows, rep_windows, winc-network-1803, set_password, enable_ssh, enable_rdp.
Alternatively, the post-start script for the rep_windows job fails, and the upgrade fails with the following output:
Task 8192 | 21:12:30 | Updating instance windows2019-cell: windows2019-cell/bd6d70b9-ed1f-412f-9d49-8045627f4ab3 (0) (canary) (00:17:24)
L Error: Action Failed get_task: Task a9555020-1a3b-40c7-677c-d6fc392ce135 result: 1 of 3 post-start scripts failed. Failed Jobs: rep_windows. Successful Jobs: route_emitter_windows, bosh-dns-windows.
Task 8192 | 21:29:55 | Error: Action Failed get_task: Task a9555020-1a3b-40c7-677c-d6fc392ce135 result: 1 of 3 post-start scripts failed. Failed Jobs: rep_windows. Successful Jobs: route_emitter_windows, bosh-dns-windows.
Explanation
When upgrading between versions of Windows rootfs that have a shared Microsoft base layer, TAS for VMs [Windows] may fail to create containers.
Solution
For available workarounds, see Failure to create containers when upgrading with shared Microsoft base image.
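When a task log is long, the failing job names can be pulled out of the "Failed Jobs:" field shown in the output above. The following is a minimal sketch, assuming the task output has been saved or piped as plain text in the same format as the examples in this section. The function name failed_jobs is hypothetical, and the pattern assumes that job names do not contain periods.

```shell
#!/bin/sh
# failed_jobs: hypothetical helper. Reads BOSH task output on stdin and
# prints each unique job listed after "Failed Jobs:", one per line.
failed_jobs() {
  # Capture the text between "Failed Jobs: " and the next period,
  # split comma-separated names onto separate lines, trim leading
  # spaces, and remove duplicates (the error is usually printed twice).
  sed -n 's/.*Failed Jobs: \([^.]*\)\..*/\1/p' \
    | tr ',' '\n' \
    | sed 's/^ *//' \
    | sort -u
}

# Example usage with saved output (task-output.txt is a placeholder name):
#   failed_jobs < task-output.txt
```

For the first example output in this section, the helper would print windows1803fs. For the second, it would print rep_windows.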
You can use Windows Diego Cell logs to troubleshoot Windows Diego Cells. Windows Diego Cells generate the following types of logs:
BOSH job logs, such as rep_windows and consul_agent_windows. These logs stream to the syslog server configured in the System Logging pane of the TAS for VMs [Windows] tile, along with other Ops Manager component logs. The names of these BOSH job logs correspond to the names of the logs emitted by Linux Diego Cells.
Windows event logs. These logs stream to the syslog server configured in the System Logging pane of the TAS for VMs [Windows] tile.
To forward Windows logs to an external syslog server:
Go to the Ops Manager Installation Dashboard.
Click the TAS for VMs [Windows] tile.
Select System Logging.
Under Enable syslog for VM logs?, select Enable.
Under Address, enter the hostname or IP address of your syslog server.
Under Port, enter the port of your syslog server. The default port is 514.
Note: The host must be reachable from the TAS for VMs network. Ensure that your syslog server listens on external interfaces.
Under Protocol, select the transport protocol to use when forwarding logs.
Under tls_enabled, select enabled if you are using TCP and want to encrypt the connection with TLS.
Under ca_cert, if you are using TLS, enter the CA certificate used to validate connections to the external syslog server.
Select the Enable system metrics check box. For a list of the VM metrics that the System Metrics Agent emits, see System Metrics Agent in the System Metrics repository on GitHub.
Click Save.
To download Windows Diego Cell logs:
Go to the Ops Manager Installation Dashboard.
Click the TAS for VMs [Windows] tile.
Click the Status tab.
Under the Logs column, click the download icon for the Windows Diego Cell for which you want to retrieve logs.
Click the Logs tab.
When the logs are ready, click the filename to download them.
Unzip the file to examine the contents. Each component on the Diego Cell has its own logs directory:
/consul_agent_windows/
/garden-windows/
/metron_agent_windows/
/rep_windows/
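Once the bundle is extracted, a quick per-component error count can show which directory to read first. The following is a minimal sketch, assuming the archive has been fully extracted into plain-text log files under one directory per component, as listed above. The function name count_errors_by_component is hypothetical, and matching on the word "error" is only a rough heuristic.

```shell
#!/bin/sh
# count_errors_by_component: hypothetical helper. For each component
# directory under the extracted bundle, prints "<component> <count>",
# where count is the number of log lines containing "error"
# (case-insensitive). Binary files are skipped.
count_errors_by_component() {
  root="$1"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    count="$(grep -rIih 'error' "$dir" | wc -l | tr -d ' ')"
    echo "$(basename "$dir") $count"
  done
}

# Example usage (the path is a placeholder for your extracted bundle):
#   count_errors_by_component ./windows_diego_cell_logs
```

A component with a high count, such as rep_windows or garden-windows, is a reasonable place to start reading in detail.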
BOSH automatically deletes a compilation VM after the compilation VM fails. In a vSphere environment, use one of the following procedures to troubleshoot compilation VM issues with Windows stemcell v2019.7 and later:
The easiest method to troubleshoot a Windows compilation VM is to SSH into the VM before BOSH deletes it.
To troubleshoot a compilation VM from an ssh session:
Open the vSphere UI.
Open two different BOSH CLI terminal sessions.
From the first BOSH CLI terminal, monitor the BOSH task:
watch -n 5 "bosh -d TAS-WINDOWS-DEPLOYMENT is --details | grep compilation"
Where TAS-WINDOWS-DEPLOYMENT is the name of your TAS for VMs [Windows] deployment.
Wait until the output shows the compilation VM and its CID.
From the second BOSH CLI terminal, SSH to the Windows compilation VM:
bosh -d TAS-WINDOWS-DEPLOYMENT ssh COMPILATION-NAME
Where:
TAS-WINDOWS-DEPLOYMENT is the name of your TAS for VMs [Windows] deployment.
COMPILATION-NAME is the name of your Windows compilation VM.
To prevent BOSH from deleting the compilation VM after the compilation VM fails, search for the compilation VM CID in the vSphere UI and rename it. You can now troubleshoot within this session.
After troubleshooting, delete the VM manually.
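Because the compilation VM can be short-lived, the monitoring step in this procedure can also be scripted so that the terminal returns as soon as the VM appears. The following is a minimal sketch of a polling loop. The function name wait_for_match and its arguments are hypothetical, and the only BOSH command it relies on is the one already shown in this procedure.

```shell
#!/bin/sh
# wait_for_match: hypothetical helper. Runs a command repeatedly until
# its output contains a line matching PATTERN, then prints the matching
# lines and returns 0. Returns 1 if nothing matches after TRIES attempts.
# Arguments: PATTERN TRIES DELAY-SECONDS COMMAND [ARGS...]
wait_for_match() {
  pattern="$1"
  tries="$2"
  delay="$3"
  shift 3
  while [ "$tries" -gt 0 ]; do
    # The assignment takes the exit status of grep, so && only fires on a match.
    out="$("$@" 2>/dev/null | grep -- "$pattern")" && { echo "$out"; return 0; }
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Example usage: poll every 5 seconds, up to 120 times.
#   wait_for_match compilation 120 5 bosh -d TAS-WINDOWS-DEPLOYMENT is --details
```

When the function returns, the printed line should contain the compilation VM name to use with bosh ssh and the CID to search for in the vSphere UI.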
In some situations, the Windows compilation VM might be deleted very quickly, making it impossible to SSH into the VM before BOSH deletes it.
To troubleshoot a quickly-deleted compilation VM:
Download an Ubuntu desktop image from Ubuntu Releases Xenial.
Upload the Ubuntu desktop image into your vSphere datastore.
Open the vSphere UI.
Open a BOSH CLI terminal session.
Click Apply Changes in Ops Manager.
From the BOSH CLI terminal, monitor the BOSH task:
watch -n 5 "bosh -d TAS-WINDOWS-DEPLOYMENT is --details | grep compilation"
Where TAS-WINDOWS-DEPLOYMENT is the name of your TAS for VMs [Windows] deployment.
Wait until the output shows the compilation VM and its CID.
From the vSphere UI:
10000 milliseconds.
On the BIOS setup screen, boot with the CD-ROM Drive.
After the Ubuntu desktop starts, select Try Ubuntu and launch a terminal.
In the terminal, run:
sudo fdisk -l
sudo mkdir /mnt/windows
sudo mount /dev/sda1 /mnt/windows
You can now troubleshoot inside this session by exploring the contents of the Windows VM's file system in /mnt/windows.
After troubleshooting, delete the VM manually.
Note: For the basics, start with the [Microsoft troubleshooting docs](https://learn.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/gmsa-troubleshooting#non-domain-joined-hosts-make-sure-the-host-is-configured-to-retrieve-the-gmsa-account).
The Windows Authentication feature uses the Windows event log system. To access a log from the Windows Diego Cell, ssh onto the cell, then enter powershell. To get the events in an event log, use the Get-WinEvent command:
> bosh ssh -d TAS-WINDOWS-DEPLOYMENT windows_diego_cell/INDEX
> powershell
PS> Get-WinEvent -LogName LOG-NAME
Where TAS-WINDOWS-DEPLOYMENT is the name of your TAS for VMs [Windows] deployment, INDEX is the index of the VM you want to access, and LOG-NAME is the name of the log for which you want to view events.
TAS gMSA plugin logs are in the event log Cloudfoundry-CCG-Plugin. They show whether the plugin was invoked successfully and whether the input to the plugin was correct. Successful logs look similar to the following example:
PS> Get-WinEvent -LogName Cloudfoundry-CCG-Plugin
ProviderName: CfCcgPlugin
TimeCreated Id LevelDisplayName Message
----------- -- ---------------- -------
10/28/2022 6:06:58 PM 0 Information Successfully got password credentials
10/28/2022 6:06:58 PM 0 Information Plugin invoked
10/28/2022 6:06:58 PM 0 Information Plugin instantiated
If Windows containers aren’t starting, you can check the logs for the Host Compute Service, which is the Windows component that administers Windows containers. Successful logs may look like:
PS> Get-WinEvent -LogName Microsoft-Windows-Hyper-V-Compute-Admin
ProviderName: Microsoft-Windows-Hyper-V-Compute
TimeCreated Id LevelDisplayName Message
----------- -- ---------------- -------
10/28/2022 5:57:30 PM 1001 Information The Host Compute Service started successfully.
If Windows containers aren’t starting, or Windows Authentication isn’t working through an app, you can check the logs for the CCG.exe process, which invokes the TAS plugin and establishes connectivity to Active Directory. Successful logs look similar to the following:
PS> Get-WinEvent -LogName Microsoft-Windows-Containers-CCG/Admin
ProviderName: Microsoft-Windows-Containers-CCG
TimeCreated Id LevelDisplayName Message
----------- -- ---------------- -------
10/28/2022 6:35:34 AM 2 Information Container Credential Guard fetched gmsa credentials for GMSA$ using plugin: {8019A64C-3F4E-4DE3-AD2B-9A544290E2C3}.
Where GMSA$ is the name of your gMSA service account.
For a list of possible Microsoft-Windows-Containers-CCG events, see the Microsoft Troubleshooting docs.
Note: You might need to follow additional steps to have events show up in the Microsoft-Windows-Containers-CCG log. If you have additional questions, contact Tanzu Support.
Both the Windows Diego Cell and the app container must have network connectivity to the Active Directory instance. Ensure that firewall rules are set up appropriately. If you have connectivity from the Windows Diego Cell but not from the app container, make sure that there are no App Security Groups preventing access from app containers.
From inside an app container, the following commands should complete successfully if the network connectivity is set up properly:
nslookup AD.DOMAIN # Make sure DNS is properly set
ping AD.DOMAIN # Make sure basic connectivity to the Domain Controllers is working
nltest /sc_query:AD.DOMAIN # Make sure the secure channel to the Domain Controllers is working
Where AD.DOMAIN is the fully-qualified domain name of the Active Directory instance.
If the environment variable COMPUTERNAME is not set to GMSA$ or USERDNSDOMAIN is not set to AD.DOMAIN, check your tile setup.