This topic broadly outlines techniques for troubleshooting your Concourse for VMware Tanzu installation.
fly is the Concourse CLI. To get help on all fly commands, run:
fly --help
In particular, the fly commands and summaries listed below provide useful information to help you troubleshoot Concourse environments and pipelines.
You can use the following fly commands to troubleshoot possible environment problems.
fly Command | Short Description |
---|---|
containers | Lists active containers, their type, and which worker they are running on. |
workers | Lists registered workers. This helps you verify that the number of containers does not exceed the maximum number allowed. |
prune-worker | Removes a non-running worker. Stops Concourse from tracking an out-of-commission worker. |
volumes | Lists active volumes and the worker on which they are located. |
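As a minimal sketch of how the environment commands in the table fit together, the functions below wrap them for a single target. The target name ci and the function names are hypothetical; substitute a target you have already saved with fly login.

```shell
# TARGET is a hypothetical fly target name; override it in the environment.
TARGET="${TARGET:-ci}"

# Print workers (with details), containers, and volumes for one target.
concourse_env_report() {
  fly -t "$TARGET" workers --details
  fly -t "$TARGET" containers
  fly -t "$TARGET" volumes
}

# Stop tracking a worker that is no longer running.
# $1 is the worker name as shown by `fly workers`.
concourse_prune_worker() {
  fly -t "$TARGET" prune-worker --worker "$1"
}
```

Run concourse_env_report first to confirm which worker is out of commission, then pass that worker's name to concourse_prune_worker.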
You can use the following fly commands to troubleshoot possible pipeline problems.
fly Command | Description |
---|---|
pipelines | Lists configured pipelines. |
builds | Shows build history. This is useful for listing build IDs of one-off tasks previously run using execute. |
validate-pipeline | Validates a pipeline's configuration without calling set-pipeline. |
check-resource | Checks for new versions. This is useful when developing a new resource. |
watch | Shows logs of in-progress builds. |
intercept | Displays build steps for a running or recent build and optionally connects to one of the active containers. |
execute | Submits local tasks. This is useful for spinning up a task quickly to test it before putting it in a job. |
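The pipeline commands in the table can be sketched as small wrappers like the ones below. The target name ci, the function names, and the argument values in the usage note are hypothetical placeholders, not names from your installation.

```shell
# TARGET is a hypothetical fly target name; override it in the environment.
TARGET="${TARGET:-ci}"

# Validate a pipeline file locally. $1 is the path to the pipeline YAML.
check_pipeline_file() {
  fly validate-pipeline --config "$1"
}

# Trigger a check for new resource versions. $1 is PIPELINE/RESOURCE.
check_resource() {
  fly -t "$TARGET" check-resource --resource "$1"
}

# Stream the logs of a job's latest build. $1 is PIPELINE/JOB.
watch_job() {
  fly -t "$TARGET" watch --job "$1"
}

# Connect to the container of one build step.
# $1 is PIPELINE/JOB, $2 is the step name.
intercept_step() {
  fly -t "$TARGET" intercept --job "$1" --step "$2"
}

# Submit a local task file as a one-off build. $1 is the task YAML path.
run_one_off_task() {
  fly -t "$TARGET" execute --config "$1"
}
```

For example, check_pipeline_file pipeline.yml followed by watch_job my-pipeline/unit-tests validates a local file and then follows the job's latest build.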
The following shows some common problems and solutions.
Problem: An error reports that a storage volume cannot be created, and might also report that permission is denied.
Solution: Increase persistent disk size for the worker or increase the number of worker VMs.
Problem: Cannot create container: limit of 250 containers reached. This error state is unlikely to appear.
Solution: Increase the number of worker VMs. Change the container placement strategy. Decrease gc_interval if it is set to a custom value. A large interval could mean that too many expired containers remain on the workers.
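To see how close each worker is to the container limit, one option is to count the rows of fly containers per worker. The sketch below assumes the worker name is the second whitespace-separated column of that output and skips a header row if one is present; the function name is hypothetical, and the 250 limit is taken from the error message above.

```shell
# Container limit quoted in the error message.
LIMIT=250

# Read `fly containers` rows on stdin and print "worker count" per worker,
# marking any worker at or above LIMIT.
# Assumption: column 2 holds the worker name.
count_containers_per_worker() {
  awk -v limit="$LIMIT" '
    $2 == "worker" { next }          # skip a header row, if printed
    NF >= 2 { n[$2]++ }              # tally containers per worker
    END {
      for (w in n) {
        flag = (n[w] >= limit) ? " AT LIMIT" : ""
        printf "%s %d%s\n", w, n[w], flag
      }
    }' | sort
}
```

A typical invocation would be fly -t ci containers | count_containers_per_worker, where ci is a placeholder target name.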
Problem: A build gets stuck in the Pending state.
Solution: Restart the ATC job: Log in as a root user on the Concourse web VMs where the ATC job is located. Alternatively, run the monit restart atc command.
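One way to run the restart without an interactive session is through bosh ssh, sketched below. The deployment name concourse and the instance web/0 are assumptions; check yours with bosh deployments and bosh -d DEPLOYMENT instances. The monit path is the usual location on BOSH-managed VMs.

```shell
# Hypothetical deployment and instance names; override in the environment.
DEPLOYMENT="${DEPLOYMENT:-concourse}"
INSTANCE="${INSTANCE:-web/0}"

# Restart the ATC job on one web VM by running monit as root over bosh ssh.
restart_atc() {
  bosh -d "$DEPLOYMENT" ssh "$INSTANCE" \
    --command "sudo /var/vcap/bosh/bin/monit restart atc"
}
```

If the deployment has more than one web VM, repeat the call with INSTANCE set to each instance in turn.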
Problem: A build fails after BOSH deploys a Concourse update from a job running on that same Concourse instance. Typically, the job fails with a "worker for container not found" error.
Solution: This is expected behavior; the BOSH Director re-creates the worker VM. Run the job again.
Problem: BOSH is not able to restart the worker job to finalize the upgrade until all work is completed.
Solution: If you have a long-running task, wait for the task to be completed. If you need to upgrade quickly, cancel running tasks and jobs.
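If you decide to cancel a running build rather than wait, fly abort-build can do it from the command line. This is a minimal sketch; the target name ci, the function name, and the example arguments are hypothetical.

```shell
# TARGET is a hypothetical fly target name; override it in the environment.
TARGET="${TARGET:-ci}"

# Abort one build of a job so the worker can drain.
# $1 is PIPELINE/JOB, $2 is the build name (number) shown by `fly builds`.
abort_job_build() {
  fly -t "$TARGET" abort-build --job "$1" --build "$2"
}
```

For example, abort_job_build my-pipeline/deploy 42 aborts build 42 of that job. Use fly builds first to find which builds are still in progress.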
You might need to contact VMware Global Support Services for help identifying a problem. In that case, support might ask you to send job log files.
For general information about accessing log files, see Location and use of logs in the BOSH documentation and Advanced Troubleshooting with the BOSH CLI.
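As a rough sketch of collecting job logs for a support request, the BOSH CLI's logs command downloads a log archive for an instance group. The deployment name concourse, the function name, and the example arguments are assumptions, not values from your environment.

```shell
# Hypothetical deployment name; override it in the environment.
DEPLOYMENT="${DEPLOYMENT:-concourse}"

# Download job logs for an instance group or instance.
# $1 is the instance group or instance (for example, web or worker/0).
# $2 is an existing local directory to save the archive into.
fetch_concourse_logs() {
  bosh -d "$DEPLOYMENT" logs "$1" --dir "$2"
}
```

For example, fetch_concourse_logs worker /tmp saves the worker log archive under /tmp.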
The following links provide other troubleshooting resources.