Container technology provides organizations with several benefits:
However, as organizations use container technology, two significant governance challenges emerge.
Tanzu CloudHealth provides long-term, trended visibility into container resource utilization by service and team. The module helps you discover which services are consuming the most resources and identify opportunities for optimization.
Using the module, you can:
Deploy a lightweight container, called the Collector, in each cluster in your environment so that it can gather metadata from your container environment.
If you are using Amazon ECS as your orchestration solution, see Getting Started with Amazon ECS in Tanzu CloudHealth.
There are two ways to get started with Kubernetes in Tanzu CloudHealth:
1. Set up the Helm Chart to automatically deploy the Tanzu CloudHealth Collector in each cluster in your environment.
2. Deploy the Tanzu CloudHealth Collector to each cluster using the deployment file.
Use the Helm Chart to deploy a lightweight container, called the Tanzu CloudHealth Collector agent, into each Kubernetes cluster in your environment. The Collector gathers metadata from your environment to generate reports.
Tanzu CloudHealth gathers two categories of data through the Collector:
$ export CHT_API_TOKEN=
Add the Tanzu CloudHealth Helm repository, then install the chart as a release named cloudhealth-collector:
$ helm repo add cloudhealth https://cloudhealth.github.io/helm/
$ helm install cloudhealth-collector --set apiToken=<CloudHealth API Token>,clusterName=<Cluster Name> cloudhealth/cloudhealth-collector
These commands deploy the Tanzu CloudHealth Collector on the Kubernetes cluster in the default configuration. To view the parameters that can be configured during installation, visit the Helm Chart GitHub page.
Results: The Helm Chart is installed and deploys the Tanzu CloudHealth Collector to the new cluster configured in your environment.
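If you manage several clusters, the install can be repeated once per cluster. The following is a minimal bash sketch, not an official script. It assumes the cloudhealth repository has already been added, that CHT_API_TOKEN holds your API token, and that each kube context name doubles as the Tanzu CloudHealth cluster name (a hypothetical convention). The helpers run, install_collector, and install_all are illustrative names.

```bash
#!/usr/bin/env bash
# Sketch only: install the Tanzu CloudHealth Collector into several clusters,
# one Helm release per kube context.
# Assumption: the kube context name is reused as the cluster name.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Install the collector into the cluster behind one kube context.
install_collector() {
  local context="$1" cluster_name="$2"
  if [[ -z "${CHT_API_TOKEN:-}" ]]; then
    echo "CHT_API_TOKEN is not set" >&2
    return 1
  fi
  run helm install cloudhealth-collector \
    --kube-context "$context" \
    --set "apiToken=${CHT_API_TOKEN},clusterName=${cluster_name}" \
    cloudhealth/cloudhealth-collector
}

# Install into every context passed as an argument; stop at the first failure.
install_all() {
  local ctx
  for ctx in "$@"; do
    install_collector "$ctx" "$ctx" || return 1
  done
}
```

For example, DRY_RUN=1 install_all $(kubectl config get-contexts -o name) prints the install command for every context in your kubeconfig without running it. The dry-run mode lets you review the cluster names before anything is deployed.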
Configure a collector for each Kubernetes cluster. Next, deploy the collectors so that Tanzu CloudHealth can start gathering metrics on your container environment.
There should be no equal sign between the variables and the value. The variables should also be referenced as $VARIABLENAME instead of %VARIABLENAME%.
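If you are adapting commands written for Windows cmd, where variables are referenced as %VARIABLENAME%, the references can be rewritten mechanically. A minimal sed sketch, assuming variable names contain only letters, digits, and underscores; the function name to_posix_refs is hypothetical.

```bash
# Rewrite Windows-style %NAME% references on stdin as POSIX-style $NAME.
# Assumes names made of letters, digits, and underscores only; a lone %
# (for example "100% done") is left untouched.
to_posix_refs() {
  sed -E 's/%([A-Za-z_][A-Za-z0-9_]*)%/$\1/g'
}
```

For example, echo 'apiToken=%CHT_API_TOKEN%' | to_posix_refs prints apiToken=$CHT_API_TOKEN.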
You can return to the Collector deploy instructions at any time. The cluster you created is displayed on the Setup > Containers > Clusters page.
Results: The Tanzu CloudHealth Collector is deployed to the cluster.
The collector starts collecting metrics from the cluster as soon as it is deployed, but it does not backfill historical information. The status of the cluster changes to Healthy after Tanzu CloudHealth starts receiving data from the collector.
It can take up to 24 hours for meaningful visualizations to appear in the Tanzu CloudHealth platform after the collector has been deployed.
On the Setup > Containers > Clusters page, clusters can have one of three statuses:
You can confirm that the Collector is collecting metrics by checking the Metrics column:
Use the Tanzu CloudHealth platform to organize your container assets using Perspectives. The goal of this organization is to map specific container tasks to the container assets where those tasks are run.
See Configure Container Infrastructure for Cost Analysis for more details about grouping and distributing cluster costs.
kubectl get --namespace cloudhealth pods
kubectl logs --namespace cloudhealth <pod-name>
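When the namespace holds several pods, the listing can be filtered so that only pods that are not running stand out. A minimal sketch, assuming the default kubectl table output for a single namespace, where NAME and STATUS are the first and third columns (with --all-namespaces the columns shift by one); unhealthy_pods is a hypothetical helper.

```bash
# Read `kubectl get pods` table output on stdin and print the name of every
# pod whose STATUS is not "Running" (for example Pending or CrashLoopBackOff).
# Returns 1 if at least one such pod was found, 0 otherwise.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1; bad = 1 } END { exit bad ? 1 : 0 }'
}
```

Usage: kubectl get --namespace cloudhealth pods | unhealthy_pods, then pass any printed name to kubectl logs --namespace cloudhealth <pod-name>.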
Ensure that the Collector is on the latest version and is able to collect metrics. Confirm that the Metrics column shows Healthy on the Setup > Containers > Clusters page.
If the Metrics column displays an Unhealthy status, update the Collector manually or by using Helm:
$ helm upgrade cloudhealth-collector cloudhealth/cloudhealth-collector
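Helm resolves cloudhealth/cloudhealth-collector from its locally cached repository index, so refreshing the index first helps ensure that the upgrade picks up the newest chart version. A minimal sketch that wraps both steps; --reuse-values is a standard helm upgrade flag that keeps the apiToken and clusterName supplied at install time, while the run and upgrade_collector helpers are hypothetical.

```bash
# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

upgrade_collector() {
  # Refresh the cached chart index so the newest chart version is visible.
  run helm repo update || return 1
  # Upgrade the release in place, keeping the values supplied at install time.
  run helm upgrade cloudhealth-collector cloudhealth/cloudhealth-collector --reuse-values
}
```

Run DRY_RUN=1 upgrade_collector to review the commands, then upgrade_collector to apply them.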
kubectl get --namespace cloudhealth pods
kubectl logs --namespace cloudhealth <pod-name>
Example Output:
CHT Containers Collector Environment
CHT_API_TOKEN: ****
CHT_CLUSTER_NAME: testCluster
JAVA_OPTS:
-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
-XX:MaxRAMFraction=1 -XX:+ExitOnOutOfMemoryError -Xms10M -Xmx891M
CHT_INTERVAL: 900
=========================================================================
CHT Containers Collector : version DIRTY starting
I, [2021-02-01T23:19:39.211985 #11] INFO -- : loaded K8S config from with master @ https://kubernetes.default.svc/ with ca certificate /var/run/secrets/kubernetes.io/serviceaccount/ca.crt with client_cert_file with client key file with trust_certs false with trust store file with proxy username
D, [2021-02-01T23:19:39.732649 #11] DEBUG -- : Ensuring cache directory is present: /tmp/cache
D, [2021-02-01T23:19:39.793997 #11] DEBUG -- : Fetching state...
D, [2021-02-01T23:19:39.798196 #11] DEBUG -- : Connecting to URL: https://kubernetes.default.svc/api/v1/nodes
D, [2021-02-01T23:19:40.588497 #11] DEBUG -- : Connecting to URL: https://kubernetes.default.svc/api/v1/pods
D, [2021-02-01T23:19:40.610918 #11] DEBUG -- : Connecting to URL: https://kubernetes.default.svc/api/v1/services
D, [2021-02-01T23:19:40.618938 #11] DEBUG -- : Posting state...
D, [2021-02-01T23:19:40.622422 #11] DEBUG -- : Posting state from 2021-02-01 23:19:39 +0000: /tmp/cache/kubernetes_nodes_1612221579 (size: 1685)
E, [2021-02-01T23:19:40.703980 #11] ERROR -- : Not Found [404]: Failed to post cluster state to http://10.108.1.248:9292/v1/containers/kubernetes/state?auth_token=API_TOKEN_REDACTED&cluster_id=testCluster&sample_time=1612221579.0. Error: Could not find: http://127.0.0.1:8500/v1/kv/customer_container/blobs/
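Two warning signs in output like the example above, an ERROR line or a posted state whose size is 0, can be detected automatically. A minimal awk sketch, assuming the log line format shown in the example output; check_collector_log is a hypothetical helper, not part of the Collector.

```bash
# Scan Collector log output on stdin for ERROR lines and for
# "Posting state" lines that report (size: 0).
# Prints each problem line with a label; returns 1 if any were found.
check_collector_log() {
  awk '
    / ERROR -- / { print "error: " $0; bad = 1 }
    /Posting state from .*\(size: 0\)/ { print "empty state: " $0; bad = 1 }
    END { exit bad ? 1 : 0 }
  '
}
```

Usage: kubectl logs --namespace cloudhealth <pod-name> | check_collector_log.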
From the example logs above, you can derive the following:
The Collector caches cluster state in the /tmp/cache directory.
The Collector posts the cached node state from /tmp/cache/kubernetes_nodes_1612221579, whose size is shown as (size: 1685). If this value is 0, no data is coming from Kubernetes and you need to investigate issues with the cluster.
To validate the Kubernetes cluster, run the following commands:
kubectl get pods --all-namespaces -o wide | grep cloudhealth
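Also confirm that the Kubernetes API server is reachable at https://kubernetes.default.svc/api/v1/nodes, the first URL the Collector connects to in the example logs.
To validate collector agent connectivity to the Tanzu CloudHealth collection endpoint, run the following commands: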
Use nmap or netcat to confirm that port 443 is reachable:
nmap -p 443 api.cloudhealthtech.com
nc -zv api.cloudhealthtech.com 443
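On hosts where neither nmap nor netcat is installed, bash itself can attempt the TCP connection. A minimal sketch, assuming bash and the coreutils timeout command are available; check_port is a hypothetical helper. Like nc -zv, it only confirms TCP connectivity, not TLS or API health.

```bash
# Test whether HOST:PORT accepts TCP connections using bash's built-in
# /dev/tcp redirection. Arguments: host, port, optional timeout in seconds.
check_port() {
  local host="$1" port="$2" secs="${3:-5}"
  # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection;
  # timeout stops the attempt if the host never answers.
  if timeout "$secs" bash -c ': >"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port unreachable" >&2
    return 1
  fi
}
```

Usage: check_port api.cloudhealthtech.com 443.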
Run curl commands against the collection endpoint manually:
curl -v -X GET https://containers-api.edge.cloudhealthtech.com/api/v1/health
This requests the collection health endpoint. A healthy endpoint returns a response similar to:
{"status":"healthy","time":"Fri, 29 Jan 2021 22:48:10 GMT"}
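To use the health check in a script, the response can be tested for a healthy status. A minimal grep sketch that avoids a jq dependency; it matches the "status":"healthy" pair from the sample response and does not fully parse JSON. The helper name is_healthy is hypothetical.

```bash
# Return 0 if the JSON response on stdin contains "status":"healthy"
# (whitespace around the colon allowed), 1 otherwise.
is_healthy() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}
```

Usage: curl -s https://containers-api.edge.cloudhealthtech.com/api/v1/health | is_healthy && echo "endpoint healthy".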
curl --header "Content-Type: application/json" --request POST "https://containers-api.edge.cloudhealthtech.com/v1/containers/kubernetes/state?cluster_id=INSERT_CLUSTER_ID_HERE&auth_token=INSERT_AUTHENTICATION_TOKEN_HERE"
This mocks the request made by the collector agent, except without any Kubernetes data cache payload. Replace the auth_token and cluster_id values as necessary. Because the request has no body, the expected response is:
{"messages": "Required request body is missing"}
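Because the state URL contains &, it must be quoted in the shell; unquoted, the shell treats & as a background operator and the auth_token parameter never reaches curl. Values containing spaces or & also need percent-encoding. A minimal sketch, assuming the cluster ID is the cluster name (as in the example logs, where cluster_id=testCluster matches CHT_CLUSTER_NAME); urlencode and state_url are hypothetical helpers.

```bash
# Percent-encode a string for use as a URL query value (byte-wise, C locale).
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;                 # unreserved: keep as-is
      *) out+=$(printf '%%%02X' "'$c") ;;           # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Build the collection state URL with both query values encoded.
state_url() {
  local cluster_id auth_token
  cluster_id=$(urlencode "$1")
  auth_token=$(urlencode "$2")
  printf 'https://containers-api.edge.cloudhealthtech.com/v1/containers/kubernetes/state?cluster_id=%s&auth_token=%s\n' \
    "$cluster_id" "$auth_token"
}
```

Usage: curl --header "Content-Type: application/json" --request POST "$(state_url testCluster "$CHT_API_TOKEN")".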
Ensure that the cluster's original address:
Ensure that the Collector: