This section explains the following additional options available for Flexible Upgrades:
Rollback - Error Recovery
Stop Cleanup
SE Group Resume Option
Rollback – Error Recovery
If the upgrade process hits an error, then an automated rollback error recovery is initiated to bring the NSX Advanced Load Balancer Controller(s) into a known good state.
If an SE group encounters an error, the SE group suspends the upgrade if the suspend_on_failure flag is used. Otherwise, it continues to upgrade the rest of the Service Engines in the SE group. If the error happens in the context of a patch, the patch is rolled back.
If an error is encountered while the rollback mechanism is executing, the upgrade is treated as cancelled.
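The error-handling rules above can be summarized as a small decision function. This is a minimal sketch of the described behavior only, not the Controller's implementation. The function name, the Outcome enum, the node type strings, and the order in which the rules are checked are all assumptions made for illustration.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for the outcomes described in the text."""
    ROLLBACK = "rollback"
    SUSPEND_SE_GROUP = "suspend_se_group"
    CONTINUE_REMAINING_SES = "continue_remaining_ses"
    UPGRADE_CANCELLED = "upgrade_cancelled"


def on_upgrade_error(node_type, suspend_on_failure=False, is_patch=False,
                     during_rollback=False):
    """Return the outcome for an upgrade error.

    node_type is "controller" or "se_group" (hypothetical labels).
    The precedence of the checks below is an assumption.
    """
    if node_type not in ("controller", "se_group"):
        raise ValueError(f"unknown node type: {node_type!r}")
    # An error while rolling back is treated as a cancelled upgrade.
    if during_rollback:
        return Outcome.UPGRADE_CANCELLED
    # An error in the context of a patch rolls the patch back.
    if is_patch:
        return Outcome.ROLLBACK
    # A Controller error triggers the automated rollback.
    if node_type == "controller":
        return Outcome.ROLLBACK
    # An SE group suspends only when suspend_on_failure is used.
    if suspend_on_failure:
        return Outcome.SUSPEND_SE_GROUP
    return Outcome.CONTINUE_REMAINING_SES


print(on_upgrade_error("se_group", suspend_on_failure=True))
```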
Stop Cleanup
If a rollback operation is triggered and fails, the NSX Advanced Load Balancer Controller or SE group moves to the abort state. In Flexible Upgrades, these states can be cleaned up, and the NSX Advanced Load Balancer Controller and SE group are transitioned to a known stable state.
Using SE Group Resume Option
The SE group resume option is supported starting with NSX Advanced Load Balancer release 18.2.8.
Whenever an SE group is upgraded with suspend_on_failure enabled and an issue is encountered, the upgrade process for that SE group is suspended. After the issue is resolved through manual intervention, use the following options to resume the upgrade:
Se-group-uuids: Specifies the SE group to be resumed.
Ignore_failure: Overrides the earlier suspend on failure setting. The upgrade proceeds unconditionally, even if a failure occurs in a subsequent upgrade iteration. The default value is false.
Skip-suspended: Skips the SEs that were suspended in the previous upgrade iteration and proceeds with the remaining SEs in the group. The default value is false.
[admin:controller]: >resume segroup se_group_refs <se-group-name>
[admin:controller]: >resume segroup se_group_refs seg-a
The following commands resume an SE group upgrade with additional options:
[admin:controller]: >resume segroup skip_suspended se_group_refs Default-Group action_on_error continue_upgrade_ops_on_error
[admin:controller]: >resume segroup skip_suspended se_group_refs Default-Group action_on_error suspend_upgrade_ops_on_error
Use the following API POST method to resume SE group upgrade.
API: POST /api/segroup/resume
JSON data:
{
  "se_group_uuids": [
    "serviceenginegroup-ec9c8141-844d-467d-bdc0-d7855e9d8419"
  ],
  "skip_warnings": true
}
When "skip_warnings": true is used, the upgrade proceeds without displaying warning messages and upgrade previews.
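The POST call above could be prepared from a script as follows. This is a minimal sketch using only the Python standard library: it builds the request but does not send it. The Controller address and the function name are hypothetical, and authentication, which the text above does not cover, is left out.

```python
import json
import urllib.request


def build_resume_request(controller_url, se_group_uuids, skip_warnings=True):
    """Build (but do not send) a POST request for /api/segroup/resume."""
    body = {
        "se_group_uuids": list(se_group_uuids),
        "skip_warnings": skip_warnings,
    }
    return urllib.request.Request(
        url=controller_url.rstrip("/") + "/api/segroup/resume",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_resume_request(
    "https://controller.example.com",  # hypothetical Controller address
    ["serviceenginegroup-ec9c8141-844d-467d-bdc0-d7855e9d8419"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request (for example with urllib.request.urlopen) would additionally require whatever session or credential headers the Controller expects.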
Resume with other options:
Option: Skip suspended SEs and continue with the upgrade. Update the se_group action_on_error with CONTINUE_UPGRADE_OPS_ON_ERROR.
{
  "se_group_options": {
    "action_on_error": "CONTINUE_UPGRADE_OPS_ON_ERROR",
    "skip_suspended": true
  },
  "se_group_resume_options": {
    "action_on_error": "CONTINUE_UPGRADE_OPS_ON_ERROR",
    "skip_suspended": true
  },
  "se_group_uuids": [
    "serviceenginegroup-ec9c8141-844d-467d-bdc0-d7855e9d8419"
  ],
  "skip_warnings": true
}
Option: Skip suspended SEs and continue with the upgrade, with the se_group action_on_error set to SUSPEND_UPGRADE_OPS_ON_ERROR.
{
  "se_group_options": {
    "action_on_error": "SUSPEND_UPGRADE_OPS_ON_ERROR",
    "skip_suspended": true
  },
  "se_group_resume_options": {
    "action_on_error": "SUSPEND_UPGRADE_OPS_ON_ERROR",
    "skip_suspended": true
  },
  "se_group_uuids": [
    "serviceenginegroup-ec9c8141-844d-467d-bdc0-d7855e9d8419"
  ],
  "skip_warnings": true
}
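The two option payloads above differ only in the action_on_error value, so they can be generated by one helper. The following is a minimal sketch: the helper name is hypothetical, and it mirrors the examples by writing the same values into both se_group_options and se_group_resume_options. Whether both blocks are required is not stated in this section.

```python
import json

# The two action_on_error values that appear in the examples above.
VALID_ACTIONS = {
    "CONTINUE_UPGRADE_OPS_ON_ERROR",
    "SUSPEND_UPGRADE_OPS_ON_ERROR",
}


def build_resume_payload(se_group_uuids, action_on_error,
                         skip_suspended=True, skip_warnings=True):
    """Return the JSON body for resuming an SE group upgrade with options."""
    if action_on_error not in VALID_ACTIONS:
        raise ValueError(f"unsupported action_on_error: {action_on_error!r}")
    options = {
        "action_on_error": action_on_error,
        "skip_suspended": skip_suspended,
    }
    return {
        "se_group_options": dict(options),
        "se_group_resume_options": dict(options),
        "se_group_uuids": list(se_group_uuids),
        "skip_warnings": skip_warnings,
    }


payload = build_resume_payload(
    ["serviceenginegroup-ec9c8141-844d-467d-bdc0-d7855e9d8419"],
    "CONTINUE_UPGRADE_OPS_ON_ERROR",
)
print(json.dumps(payload, indent=2))
```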
Creating SE Group using SE Group Template
During cloud initiation and SE group creation, the system checks whether the cloud has a Service Engine group template. If a template is available, the base image and patch image are copied from the SE group template. Otherwise, the image information is picked from the NSX Advanced Load Balancer Controller.
Each cloud has a se_group_template_uuid. It ensures that a newly created SE group follows the SE group template that it references.
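The selection rule described above (use the template's images if the cloud has a template, otherwise fall back to the Controller's) can be sketched as follows. The class, the function, and the image strings are hypothetical placeholders for illustration, not product identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Images:
    """Base and patch image references (hypothetical container)."""
    base: str
    patch: Optional[str] = None


def images_for_new_se_group(template_images: Optional[Images],
                            controller_images: Images) -> Images:
    """Pick the images for a newly created SE group.

    template_images is None when the cloud has no SE group template.
    """
    if template_images is not None:
        return template_images
    return controller_images


controller = Images(base="controller-base-image")
template = Images(base="template-base-image", patch="template-patch-image")

print(images_for_new_se_group(template, controller))
print(images_for_new_se_group(None, controller))
```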
Any SE group can be designated as the template. If an SE group (Seg1) is assigned as the default SE group template, then the newly created SE group (Seg2) picks the base and patch image from Seg1 as shown below.
[admin:controller]: > show upgrade status filter serviceenginegroup Seg2
+------+--------+---------------+------------------+-----------+-----------------------------+---------------------------------+
| Name | Tenant | Cloud         | State            | Operation | Image                       | Patch                           |
+------+--------+---------------+------------------+-----------+-----------------------------+---------------------------------+
| Seg3 | admin  | Default-Cloud | UPGRADE_FSM_INIT | None      | 18.2.9-9000-20200509.052234 | 18.2.9-9000-2p1-20200430.133146 |
+------+--------+---------------+------------------+-----------+-----------------------------+---------------------------------+
The patch rollback option should be enabled as shown below.
[admin:controller]: > show upgrade status detail filter serviceenginegroup Seg2
+-----------------------+-------------------------------------------------------------------------+
| Field                 | Value                                                                   |
+-----------------------+-------------------------------------------------------------------------+
| uuid                  | serviceenginegroup-d564305e-9db5-4ae6-941c-485a26af062a                 |
| name                  | Seg2                                                                    |
| node_type             | NODE_SE_GROUP                                                           |
| version               | 18.2.9-9000-20200509.052234                                             |
| image_ref             | 18.2.9-9000-20200509.052234                                             |
| patch_version         | 2p1                                                                     |
| patch_image_ref       | 18.2.9-9000-2p1-20200430.133146                                         |
| state                 |                                                                         |
| state                 | UPGRADE_FSM_INIT                                                        |
| last_changed_time     | Sat May 9 06:15:49 2020 ms(41) UTC                                      |
| seg_status            |                                                                         |
| notes[1]              | [2020-05-09 06:15:49] Init segroup(seg3 defaults to seg-template(seg1). |
| start_time            | 2020-05-09 06:15:49.406502                                              |
| enable_rollback       | False                                                                   |
| enable_patch_rollback | True                                                                    |
| progress              | 0 percent                                                               |
| tenant_ref            | admin                                                                   |
| obj_cloud_ref         | Default-Cloud                                                           |
+-----------------------+-------------------------------------------------------------------------+
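To check the enable_patch_rollback flag from a script, the Field/Value table printed by the CLI can be parsed as text. The following is a minimal sketch that assumes the output has one "| field | value |" row per line. The function names are hypothetical, and the sample is a shortened excerpt of the output above.

```python
def parse_field_value_table(text):
    """Parse '| field | value |' rows into a dict.

    Border rows and the header row are skipped. When a field name
    repeats (as 'state' does), a non-empty value wins over an empty one.
    """
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # border rows start with '+', prompts with '['
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if len(cells) != 2 or cells == ["Field", "Value"]:
            continue
        key, value = cells
        if value or key not in fields:
            fields[key] = value
    return fields


def patch_rollback_enabled(fields):
    """Return True when enable_patch_rollback is reported as True."""
    return fields.get("enable_patch_rollback") == "True"


sample = """
+-----------------------+------------------+
| Field                 | Value            |
+-----------------------+------------------+
| name                  | Seg2             |
| state                 |                  |
| state                 | UPGRADE_FSM_INIT |
| enable_rollback       | False            |
| enable_patch_rollback | True             |
+-----------------------+------------------+
"""

fields = parse_field_value_table(sample)
print(fields["state"], patch_rollback_enabled(fields))
```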