How Tanzu Service Mesh SLO Addresses Challenges in Microservices

One of the challenges of ensuring a high quality of service is being able to measure the factors that reflect the quality of user experience, such as latencies and error rates.

Tanzu Service Mesh provides those metrics out of the box, without needing additional plugins or code changes. Metric levels are displayed in real-time graphs in the Tanzu Service Mesh Console user interface.

Tanzu Service Mesh provides an interface where you can configure SLO targets and select the SLIs that determine service health. Tanzu Service Mesh SLO helps you make informed decisions about which parts of the application may need work and where feature development can be accelerated. It also helps you configure the Tanzu Service Mesh Service Autoscaler, which further helps ensure that the SLO is met. For more information about Tanzu Service Mesh Service Autoscaler, see the Service Autoscaling with Tanzu Service Mesh User's Guide.

With Tanzu Service Mesh SLO, you can monitor the health of services inside a global namespace or of services directly in their cluster namespaces, both in the UI and through the API.

Summary of Tanzu Service Mesh SLOs Configuration Approaches

Tanzu Service Mesh offers two kinds of SLOs: monitored SLOs and actionable SLOs. With monitored SLOs, you can monitor the behavior of one or more services and inspect the associated performance graphs. With actionable SLOs, you can monitor a service and also influence autoscaler decisions when SLIs are violated.

Note:

If you are not sure which actionable SLOs to set, one approach is to start with monitored SLOs and run them for a period to learn more about the behavior of the services. Once you are comfortable with the behavior of a service, you can set up actionable SLOs. Some customers start with both kinds of SLO. Neither order is required; starting with monitored SLOs is a best practice, and the right choice can vary from service to service.

Monitored SLOs. Monitored SLOs provide an indicator of the performance of the services and of whether these services meet the target Service Level Objective condition based on the SLIs specified for the service. Monitored SLO policies can be configured either for services inside a global namespace (GNS-scoped SLOs) or for services directly in the cluster (org-scoped SLOs). When configured for a global namespace service, the SLO budget is depleted when any of the service instances inside the global namespace violates the SLIs. For more information about monitored SLOs, see Use Case 1A: GNS-Scoped Monitored SLO.
Actionable SLOs. Like monitored SLOs, actionable SLOs provide an indicator of the health of the services and track how well the services meet the defined SLO target. Unlike monitored SLOs, each actionable SLO can target only a single GNS-scoped service, and it can help influence service resiliency actions such as autoscaling. Actionable SLO policies can be configured for services inside a global namespace. When configured for a global namespace service, the SLO budget is depleted when any of the service instances inside the global namespace violates the SLIs. For more information about actionable SLOs, see Use Case 1B: GNS-Scoped Actionable SLO.
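
The budget behavior described for GNS-scoped SLOs can be illustrated with generic error-budget arithmetic. The following Python sketch is not the Tanzu Service Mesh implementation or API. It assumes a time-based budget (the share of a window that is allowed to be out of compliance) and a fixed evaluation interval, and it counts an interval against the budget as soon as any one instance of the service violates an SLI threshold. The Sample shape, the function names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Sample:
    """One evaluation interval for one service instance (hypothetical shape)."""
    instance: str
    interval: int           # index of the evaluation interval
    p99_latency_ms: float   # observed p99 latency in that interval
    error_rate: float       # fraction of failed requests, 0.0 to 1.0


def budget_seconds(target: float, window_seconds: int) -> float:
    """Time the service may spend out of compliance for a given SLO target."""
    return (1.0 - target) * window_seconds


def violating_intervals(samples: Iterable[Sample],
                        max_p99_ms: float,
                        max_error_rate: float) -> Set[int]:
    """Intervals in which at least one instance breached an SLI threshold."""
    bad = set()
    for s in samples:
        if s.p99_latency_ms > max_p99_ms or s.error_rate > max_error_rate:
            # A set ensures an interval is counted once, even if several
            # instances violate the SLIs in the same interval.
            bad.add(s.interval)
    return bad


def remaining_budget(samples: Iterable[Sample],
                     target: float,
                     window_seconds: int,
                     interval_seconds: int,
                     max_p99_ms: float,
                     max_error_rate: float) -> float:
    """Budget left after subtracting every violating interval (assumed model)."""
    bad = violating_intervals(samples, max_p99_ms, max_error_rate)
    return budget_seconds(target, window_seconds) - len(bad) * interval_seconds


if __name__ == "__main__":
    window = 30 * 24 * 3600  # 30-day window, in seconds
    samples = [
        Sample("cluster-a/checkout", 0, 120.0, 0.00),
        Sample("cluster-b/checkout", 0, 140.0, 0.00),
        Sample("cluster-a/checkout", 1, 600.0, 0.00),  # latency violation
        Sample("cluster-b/checkout", 1, 130.0, 0.05),  # error-rate violation
        Sample("cluster-a/checkout", 2, 110.0, 0.00),
        Sample("cluster-b/checkout", 2, 900.0, 0.00),  # latency violation
    ]
    left = remaining_budget(samples, 0.999, window, 60, 500.0, 0.01)
    print(f"remaining budget: {left:.0f} s")
```

Under these assumptions, a 99.9% target over a 30-day window allows about 2,592 seconds (43.2 minutes) out of compliance. In the sample data, intervals 1 and 2 each contain at least one violating instance, so two 60-second intervals are deducted, even though two instances violate in interval 1.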

Important:

Each actionable SLO can have only one service, and a service can only have one actionable SLO.
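
The one-to-one rule above can be checked before policies are created. The following Python sketch is only an illustration: the mapping of SLO names to service names is a hypothetical in-memory structure, not a Tanzu Service Mesh API object, and the SLO and service names are invented for the example.

```python
from typing import Dict, List


def find_conflicts(actionable_slos: Dict[str, List[str]]) -> List[str]:
    """Report violations of the one-SLO-per-service, one-service-per-SLO rule.

    actionable_slos maps a (hypothetical) SLO name to the services it targets.
    """
    problems: List[str] = []
    owner: Dict[str, str] = {}  # service name -> SLO that already targets it
    for slo, services in actionable_slos.items():
        # Rule 1: each actionable SLO targets exactly one service.
        if len(services) != 1:
            problems.append(
                f"{slo}: targets {len(services)} services, expected exactly 1"
            )
            continue
        service = services[0]
        # Rule 2: a service is targeted by at most one actionable SLO.
        if service in owner:
            problems.append(
                f"{service}: targeted by both {owner[service]} and {slo}"
            )
        else:
            owner[service] = slo
    return problems


if __name__ == "__main__":
    planned = {
        "checkout-availability": ["checkout"],
        "cart-latency": ["cart", "catalog"],   # breaks rule 1
        "checkout-latency": ["checkout"],      # breaks rule 2
    }
    for problem in find_conflicts(planned):
        print(problem)
```

In this example, the second planned SLO is rejected because it lists two services, and the third is rejected because the checkout service already has an actionable SLO. To track several services together, a monitored SLO is the kind that supports multiple services.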

The table below provides a summary of the differences between monitored and actionable SLOs based on the scope and behavior of the policies.

Table 1. Monitored SLO vs Actionable SLO
Monitored SLO	Actionable SLO
A monitored SLO can be configured as org-scoped or GNS-scoped.	An actionable SLO can be configured only as GNS-scoped.
A monitored SLO can monitor multiple services.	An actionable SLO can monitor and act on a single global namespace service.
A monitored SLO monitors and observes the health of the services based on the SLI metrics set in the SLO policy.	An actionable SLO monitors and observes a service based on the SLI metrics set in the SLO policy. In addition, if autoscaling policies are configured in the system for the associated service versions, SLI violations of the actionable SLO act as a signal for the Tanzu Service Mesh Service Autoscaler to make scaling decisions.