Multi-tenancy Fairness Control

Introduction

This section describes how the system administrator can configure fairness among tenants with respect to access to Workflow Hub.

Workflow Hub Resources & Fairness

Tradeoff Between Fairness & Utilisation

Workflow Hub supports at most 128 concurrent workflow runs. If Workflow Hub allowed the first tenant to trigger all 128 runs, no other tenant could trigger any new runs until those runs completed, which makes access to Workflow Hub unfair.

On the other hand, if Workflow Hub enforces perfect sharing (for example, with 8 tenants, each tenant is allowed exactly 16 concurrent workflow runs), the system is under-utilised whenever not all tenants are active. In that example, if only one tenant is active, it can still use only 16 executions, so utilisation drops to 12.5% (16 of 128).

There is therefore a fundamental tradeoff between utilisation and fairness. The right balance depends on the user and the context in which the system operates, so Workflow Hub offers multiple algorithms to choose from depending on the situation.

Fairness Control Algorithms

The fairness control algorithm is configured from the Workflow Hub Administration tab. Because this configuration affects all tenants, access to this tab should be granted to System Administrators ONLY. When creating any role within a tenant, ensure that the following privileges are NOT assigned to the role, since these privileges grant access to the Workflow Hub Administration tab:

WORKFLOW HUB CONFIGURATION READ
WORKFLOW HUB CONFIGURATION WRITE
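If tenant roles are provisioned by a script, a guard like the one below could reject a role before it is created. This is a minimal sketch, assuming a role is represented as a set of privilege strings; the functions `forbidden_privileges` and `validate_tenant_role` and the privilege "WORKFLOW RUN EXECUTE" are hypothetical and not part of any documented Workflow Hub API. Only the two privilege names come from the list above.

```python
# Privileges that grant access to the Workflow Hub Administration tab,
# copied from the list above.
ADMIN_TAB_PRIVILEGES = frozenset({
    "WORKFLOW HUB CONFIGURATION READ",
    "WORKFLOW HUB CONFIGURATION WRITE",
})


def forbidden_privileges(role_privileges):
    """Return, sorted, the Administration-tab privileges found in a role."""
    return sorted(ADMIN_TAB_PRIVILEGES.intersection(role_privileges))


def validate_tenant_role(role_name, role_privileges):
    """Raise ValueError if a tenant role would grant Administration-tab access."""
    found = forbidden_privileges(role_privileges)
    if found:
        raise ValueError(
            f"tenant role {role_name!r} must not include: {', '.join(found)}"
        )


if __name__ == "__main__":
    # "WORKFLOW RUN EXECUTE" is a made-up privilege name for illustration.
    validate_tenant_role("operator", {"WORKFLOW RUN EXECUTE"})  # accepted
    try:
        validate_tenant_role("team-lead", {"WORKFLOW HUB CONFIGURATION READ"})
    except ValueError as err:
        print(err)
```

Checking the intersection rather than each privilege separately keeps the guard correct if either privilege alone is enough to expose the tab.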

The algorithms are described below.

Configuration


The Administration tab lists three parameters: two are configurable and one is read-only.

Max Workflow Runs: The system limit on the number of concurrent workflow runs. This value is read-only.
Algorithm Type: A drop-down for selecting one of three algorithms: No Limit, Static, or Dynamic.
Tenant Fairness Parameter (TFM): A multiplier used by the Static and Dynamic algorithms to compute the per-tenant limit. It is ignored by No Limit.
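The three parameters can be modelled as a small immutable settings object, which makes the read-only nature of Max Workflow Runs explicit. This is a sketch under assumptions: `AlgorithmType`, `FairnessConfig` and `with_user_settings` are hypothetical names, not Workflow Hub's actual configuration schema.

```python
from dataclasses import dataclass, replace
from enum import Enum


class AlgorithmType(Enum):
    """Options offered by the Algorithm Type drop-down."""
    NO_LIMIT = "No Limit"
    STATIC = "Static"
    DYNAMIC = "Dynamic"


@dataclass(frozen=True)
class FairnessConfig:
    """The three Administration-tab parameters."""
    max_workflow_runs: int     # read-only system limit, not user input
    algorithm: AlgorithmType   # user-configurable
    tfm: float                 # user-configurable; ignored by NO_LIMIT

    def with_user_settings(self, algorithm: AlgorithmType, tfm: float) -> "FairnessConfig":
        """Return a copy with only the two user-editable fields changed.

        max_workflow_runs is carried over untouched, mirroring the fact
        that the user cannot edit it from the Administration tab.
        """
        return replace(self, algorithm=algorithm, tfm=tfm)


if __name__ == "__main__":
    current = FairnessConfig(max_workflow_runs=128,
                             algorithm=AlgorithmType.NO_LIMIT, tfm=1.0)
    # Drop-down labels map back to enum members by value.
    updated = current.with_user_settings(AlgorithmType("Dynamic"), 3.0)
    print(updated)
```

Because the dataclass is frozen, a settings change produces a new object instead of mutating the active one, so the read-only field cannot be altered by accident.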

No Limit

When No Limit is selected, Workflow Hub imposes no per-tenant limit: any tenant can trigger up to all 128 concurrent executions. This algorithm is recommended when no more than one or two tenants use Workflow Hub. The TFM has no effect when No Limit is selected.

Static

When Static is selected, each tenant can run workflows concurrently up to a fixed limit determined by the TFM. Once a tenant reaches its limit, further execution requests are rejected with an HTTP 429 (Too Many Requests) error, and the UI displays a message stating that no more runs can be executed because the limit has been reached.

The limit is calculated as follows:

Limit = MaxWfRuns*TFM/(Number of tenants)

Example: with a maximum of 128 concurrent workflow runs, 8 tenants and a TFM of 1, each tenant can execute 16 concurrent runs (128 × 1 / 8). This also means that system utilisation is very low (12.5%) if only one tenant is active. Raising the TFM to 3 or 4 increases the per-tenant limit to 48 or 64 runs, so a single tenant can use up to 50% of the system. However, with a limit of 64, just two active tenants can occupy the entire system, so fairness suffers once more than two or three tenants are active simultaneously.
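The Static limit and its effect on single-tenant utilisation can be checked with a few lines of code. This is a minimal sketch, not Workflow Hub's implementation: the function names are hypothetical, rounding a non-integer limit down is an assumption, and so is returning 429 when the whole system is full (the text only specifies 429 for the per-tenant limit).

```python
import math

HTTP_OK = 200
HTTP_TOO_MANY_REQUESTS = 429


def static_limit(max_wf_runs: int, tfm: float, num_tenants: int) -> float:
    """Per-tenant limit: MaxWfRuns * TFM / (Number of tenants)."""
    if num_tenants <= 0:
        raise ValueError("num_tenants must be positive")
    return max_wf_runs * tfm / num_tenants


def admit_static(tenant_running: int, total_running: int,
                 max_wf_runs: int, tfm: float, num_tenants: int) -> int:
    """Return the HTTP status for one new run request under Static."""
    # Assumption: the system-wide limit is always enforced; the text does
    # not say which status a request gets when the whole system is full.
    if total_running >= max_wf_runs:
        return HTTP_TOO_MANY_REQUESTS
    # Admit only if the new run keeps the tenant within its limit. For a
    # non-integer limit this effectively rounds down (an assumption).
    if tenant_running + 1 > static_limit(max_wf_runs, tfm, num_tenants):
        return HTTP_TOO_MANY_REQUESTS
    return HTTP_OK


def single_tenant_utilisation(max_wf_runs: int, tfm: float, num_tenants: int) -> float:
    """Fraction of the system a lone active tenant can occupy."""
    usable_runs = min(math.floor(static_limit(max_wf_runs, tfm, num_tenants)),
                      max_wf_runs)
    return usable_runs / max_wf_runs


if __name__ == "__main__":
    # 128 runs, 8 tenants: compare TFM values from the example above.
    for tfm in (1.0, 3.0, 4.0):
        print(tfm, static_limit(128, tfm, 8), single_tenant_utilisation(128, tfm, 8))
```

The loop reproduces the example: a TFM of 1 gives a limit of 16 runs (12.5% utilisation for a lone tenant), and a TFM of 4 gives 64 runs (50%).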

The Static algorithm is recommended when there are many tenants but only a few of them are active at any given time (i.e., few tenants need to execute workflows simultaneously). In that case, tune the TFM so that the few active tenants can together achieve high system utilisation.

Dynamic

With the Dynamic algorithm, the limit is determined at runtime from the overall usage of Workflow Hub. The limit is a function of the spare capacity:

Limit = TFM*(MaxWfRuns - TotalRunningWFs)

A tenant whose number of ongoing runs is below the limit can submit a new run; otherwise the request is rejected. When usage is high, the spare capacity is small, and so is the limit. Light users with few running workflows can still submit new runs, while heavy users cannot trigger new runs until capacity frees up. This approach trades short-term utilisation for longer-term fairness. The Dynamic algorithm is recommended when the number of active tenants is harder to predict than in the scenario for which Static is recommended.

Example:

Suppose there are 32 ongoing executions in total, so the spare capacity is 96. With a TFM of 3.0, the limit is 288. This exceeds MaxWfRuns, but that does not matter because the limit is recomputed at every evaluation. A tenant with 1 ongoing execution and a tenant with 29 ongoing executions are therefore treated the same: both requests are allowed.
Suppose there are 120 ongoing executions in total, so the spare capacity is 8. With a TFM of 3.0, the limit is 24. Now the tenant with 1 ongoing execution is still allowed to submit an execution request, but the tenant with 29 ongoing executions is rejected.
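The Dynamic rule can be sketched in a few lines and used to replay the two scenarios with TFM = 3.0. This is a sketch, not Workflow Hub's implementation: `dynamic_limit` and `dynamic_allows` are hypothetical names, the defaults of 128 runs and TFM 3.0 are taken from this section, and admitting a run only if it keeps the tenant within the limit is an assumption consistent with the examples.

```python
def dynamic_limit(max_wf_runs: int, total_running: int, tfm: float) -> float:
    """Per-tenant limit: TFM * (MaxWfRuns - TotalRunningWFs)."""
    spare_capacity = max(max_wf_runs - total_running, 0)
    return tfm * spare_capacity


def dynamic_allows(tenant_running: int, total_running: int,
                   max_wf_runs: int = 128, tfm: float = 3.0) -> bool:
    """True if the tenant may start one more run under Dynamic."""
    # Assumption: a run is admitted only if it keeps the tenant within the
    # limit. When the system is full the spare capacity, and therefore the
    # limit, is 0, so every request is rejected without a separate check.
    return tenant_running + 1 <= dynamic_limit(max_wf_runs, total_running, tfm)


if __name__ == "__main__":
    # Replays the two scenarios: 32 and 120 runs in progress overall,
    # for a light tenant (1 run) and a heavy tenant (29 runs).
    for total in (32, 120):
        limit = dynamic_limit(128, total, 3.0)
        print(total, limit, dynamic_allows(1, total), dynamic_allows(29, total))
```

With 32 runs in progress both tenants are admitted; with 120 runs the limit falls to 24, so only the light tenant is admitted. Because the limit shrinks to 0 as the system fills, the system-wide cap is enforced by the same formula.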

Recommended TFM for Dynamic: 3.0