VMware Telco Cloud Service Assurance allows you to back up data from an older version of VMware Telco Cloud Service Assurance and restore it to a newer version.

You must perform the following procedure to migrate the data from an older version of VMware Telco Cloud Service Assurance to a newer version.

Procedure

  1. Take a backup on the older setup (VMware Telco Cloud Service Assurance 2.3.0 or 2.3.1) using an S3 bucket or an NFS server.
  2. Use the same S3 bucket or NFS server details on the target VMware Telco Cloud Service Assurance 2.4.0 cluster by providing them in the values-user-overrides.yaml file in the tcx-deployer/product-helm-charts/tcsa bundle of VMware Telco Cloud Service Assurance 2.4.0, and then redeploy.
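    For illustration only, an S3-style override in values-user-overrides.yaml could look like the following sketch. The key names here are hypothetical placeholders rather than the authoritative schema; copy the actual structure from the overrides example shipped in the tcx-deployer bundle, and reuse only the endpoint, bucket, and credential values from the older setup.
      # Hypothetical sketch -- key names are placeholders, not the product schema
      backup:
        storage:
          s3:
            endpoint: <S3 or NFS server endpoint used on the older setup>
            bucket: vmware-tcsa-backup      # same bucket name as on the older setup
            accessKey: <access key>
            secretKey: <secret key>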
  3. The syncbackup.yaml file of VMware Telco Cloud Service Assurance 2.4.0 contains components for both the tcsa-system and tps-system namespaces. In the syncbackup.yaml file, provide the bucket and backup name, using the same values for both namespaces.
    Note: An example can be found in tcx-deployer/examples/backup-and-restore/synbackup.yaml.example.
  4. In the syncbackup.yaml file, under spec, uncomment the following two lines only if the backup was taken on VMware Telco Cloud Service Assurance 2.3.1 or 2.4.0. Set name to tcsa2.3.1 if the backup was taken on 2.3.1, or to tcsa2.4.0 if the backup was taken on 2.4.0.
    apiVersion: tcx.vmware.com/v1
    kind: SyncBackup
    metadata:
      name: sync-backup-tps
      namespace: tps-system
    spec:
      overrideExisting: false       # set to true if a backup with the same name already exists in the cluster; setting it to true does not delete the data
      filter:
        componentList:
          - postgres
        backupList:
          - group-backup
      pauseIntegrityCheck: true
      overrideNamespace:
        targetNamespace: tps-system
    
      # Uncomment the below two lines ONLY if the backup was taken on either 2.3.1 / 2.4.0. Set "name" to tcsa2.3.1 if the backup was taken on 2.3.1, or tcsa2.4.0 if taken on 2.4.0
      #cluster:
      #  name: tcsa2.4.0
    
      storage:
        minio:
          bucket: vmware-tcsa-backup
          endpoint: minio.tcsa-system.svc.cluster.local:9000
          secretRef:
            name: minio-secrets
            namespace: tcsa-system
            accessKey:
              key: root-user
            secretKey:
              key: root-password
    
    ---
    apiVersion: tcx.vmware.com/v1
    kind: SyncBackup
    metadata:
      name: sync-backup-tcsa
      namespace: tcsa-system
    spec:
      overrideExisting: false      # set to true if a backup with the same name already exists in the cluster; setting it to true does not delete the data
      filter:
        componentList:
          - elasticsearch
          - collectors
          - zookeeper
          - kubernetesResources
        backupList:
          - group-backup
      pauseIntegrityCheck: true
      overrideNamespace:
        targetNamespace: tcsa-system
      # Uncomment the below two lines ONLY if the backup was taken on either 2.3.1 / 2.4.0. Set "name" to tcsa2.3.1 if the backup was taken on 2.3.1, or tcsa2.4.0 if taken on 2.4.0
      #cluster:
      #  name: tcsa2.4.0
      storage:
        minio:
          bucket: vmware-tcsa-backup
          endpoint: minio.tcsa-system.svc.cluster.local:9000
          secretRef:
            name: minio-secrets
            namespace: tcsa-system
            accessKey:
              key: root-user
            secretKey:
              key: root-password
  5. The example YAML file uses the default bucket vmware-tcsa-backup. To override the bucket name, update the YAML file with the NFS File Server bucket or S3 bucket that was used on the older setup, as shown in the sketch below.
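    For example, if the older setup used a bucket named my-old-backups (a placeholder name), update the storage section of both SyncBackup resources accordingly:
      storage:
        minio:
          bucket: my-old-backups        # replaces the default vmware-tcsa-backup
          endpoint: minio.tcsa-system.svc.cluster.local:9000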
  6. Run the following command to sync the backup in the target cluster.
    kubectl apply -f synbackup.yaml.example

    After the sync operation is complete, you can see the status as SUCCESSFUL.

    The backups stored in NFS File Server are accessible in the new cluster.
    Note: You must perform the sync backup before initiating the restore operation.
    [root@wdc-10-214-150-193 backup-and-restore]# kubectl get syncbackups -A
    NAMESPACE   NAME               STATUS       CURRENT STATE   READY   AGE     MESSAGE
    default     sync-backup-tcsa   SUCCESSFUL   syncBackup      True    4h51m   synced: 1, skipped: 0, failed: 0
    default     sync-backup-tps    SUCCESSFUL   syncBackup      True    4h51m   synced: 1, skipped: 0, failed: 0

    In case of any failure, the MESSAGE field is populated with the error message.
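    To block until the sync completes instead of polling, you can wait on the Ready condition that backs the READY column (a sketch, assuming the condition name matches the column and using the namespaces from the manifests above):
      kubectl wait syncbackup/sync-backup-tps -n tps-system --for=condition=Ready --timeout=30m
      kubectl wait syncbackup/sync-backup-tcsa -n tcsa-system --for=condition=Ready --timeout=30m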

  7. After the sync backup is successful, check the status of the backup that you took on the previous, older VMware Telco Cloud Service Assurance setup.
    kubectl get backups -A
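    To inspect an individual backup in more detail before restoring, you can also describe it (substitute your backup name and namespace):
      kubectl describe backup <backup name> -n tcsa-system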
  8. If you want to restore VMware Telco Cloud Service Assurance 2.3.0 or 2.3.1, use the following example.
    Provide the VMware Telco Cloud Service Assurance 2.3.0 or 2.3.1 backup name in the restoration file for both the namespaces, tcsa-system and tps-system.
    • The following example file is for restoring 2.3.0 backup.
      tcx-deployer/examples/backup-and-restore/restore-230-version.yaml.example
    • The following example file is for restoring 2.3.1 backup.
      tcx-deployer/examples/backup-and-restore/restore-231-version.yaml.example

      The following is the example content of the restore-231-version.yaml.example file.

      apiVersion: tcx.vmware.com/v1
      kind: Restore
      metadata:
        name: group-restore-tps
        namespace: tps-system
      spec:
        backupName: <backup name of tcsa2.3.1>
        restore:
          postgres:
            timeout: 10m
            config:
              adminSecret:
                name: postgres-db-secret
                namespace: tps-system
              endpoint:
                host: postgres-cluster.tps-system.svc.cluster.local
                port: 5432
            dbs:
            - analyticsservice
            - alarmservice
            - collector
            - grafana
            - keycloak
            #- "remediation"
            #- "airflow"
            postAction:
              name: pgpostaction
              serviceAccount: cluster-admin-sa
              timeout: 30m
              resource:
                cpu: 200m
                memory: 256Mi
              bash:
                command:
                - /bin/bash
                - -c
                - |
                  set -ex; psql -a -U pgadmin -d grafana -c "ALTER TABLE alert_configuration_history DROP COLUMN IF EXISTS last_applied;";
              env:
               - name: PGPORT
                 value: "5432"
               - name: PGHOST
                 value: postgres-cluster.tps-system.svc.cluster.local
               - name: PGUSER
                 valueFrom:
                   secretKeyRef:
                     key: username
                     name: postgres-db-secret
               - name: PGPASSWORD
                 valueFrom:
                   secretKeyRef:
                     key: password
                     name: postgres-db-secret
      
      ---
      apiVersion: tcx.vmware.com/v1
      kind: Restore
      metadata:
        name: group-restore-tcsa
        namespace: tcsa-system
      spec:
        backupName: <backup name of tcsa2.3.1>
        postAction:
          name: postaction
          serviceAccount: cluster-admin-sa
          timeout: 30m
          resource:
            memory: 250Mi
            cpu: 100m
          bash:
            command:
              - /bin/bash
              - -c
              - |
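                # delete the apiservice and Grafana pods so they restart with the restored data,
                # then trigger the Smarts domain migration through the br-operator pod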
                set -ex;kubectl delete pods -n tcsa-system --selector run=apiservice;
                sleep 200;
                set -ex;kubectl delete pod -n tcsa-system --selector=app.kubernetes.io/name=grafana;
                sleep 10;
                set -ex;kubectl exec -it deploy/br-operator -n tcsa-system -- curl -k -s --show-error --stderr - -H 'Content-Type: application/json' -X POST --data '{ "isCleanUpgrade": true }' http://apiservice:8080/smartsrestcontroller/vsa/smarts/domain/migrate;
        restore:
          collectors:
            config:
              authenticationSecret:
                name: collectors-secrets
                namespace: tcsa-system
                usernameKey:
                  key: COLLECTORS_USERNAME
                passwordKey:
                  key: COLLECTORS_PASSWORD
              endpoint:
                basePath: /dcc/v1/
                host: collector-manager.tcsa-system.svc.cluster.local
                port: 12375
                scheme: http
            timeout: 10m
          elastic:
            authentication:
              name: elasticsearch-secret-credentials
              namespace: tcsa-system
              passwordKey:
                key: ES_PASSWORD
              usernameKey:
                key: ES_USER_NAME
            cleanUpIndices: false
            config:
              endpoint:
                host: elasticsearch.tcsa-system.svc.cluster.local
                port: 9200
                scheme: https
              region: ap-south-1
            indexList:
            - vsa_chaining_history-*
            - vsa_events_history-*
            - vsa_audit-*
            - gateway-mappings
            - vsarole,policy,userpreference,mapping-metadata,mnr-metadata
            # Uncomment vsametrics to restore metrics and set cleanUpIndices as true
            #- vsametrics*
            # Uncomment vsa_catalog to restore TCSA 2.4 backup
            #- vsa_catalog
            # Set 'removeAndAddRepository: true' and trigger Backup/Restore to clean up the repository.
            removeAndAddRepository: true
            timeout: 30m
            tls:
              caCrt:
                key: ca.crt
              insecureSkipVerify: true
              namespace: tcsa-system
              secretName: elasticsearch-cert
              tlsCrt:
                key: tls.crt
        # Uncomment KubernetesResources to restore configmaps/secrets.
        # kubernetesResources:
        #   timeout: 10m
        #   resources:
        #     - groupVersionResource:
        #         group: ""
        #         version: "v1"
        #         resource: "secrets"
        #       nameList:
        #         - name: "spe-pguser"
        #           namespace: "tcsa-system"
        #     - groupVersionResource:
        #         group: ""
        #         version: "v1"
        #         resource: "configmaps"
        #       nameList:
        #         - name: "product-info"
        #           namespace: "tcsa-system"
          zookeeper:
            endpoint:
              host: zookeeper.tcsa-system.svc.cluster.local
              port: 2181
            paths:
            - path: /vmware/vsa/gateway
            - path: /vmware/vsa/smarts
            # Uncomment the zookeeper path for NCM backup
            #- path: /vmware/vsa/ncm
            timeout: 10m
      
  9. If you want to restore VMware Telco Cloud Service Assurance 2.4.0, use the following example.
    This example file can also be found in tcx-deployer/examples/backup-and-restore/restore.yaml.example.
    apiVersion: tcx.vmware.com/v1
    kind: Restore
    metadata:
      name: group-restore-tps
      namespace: tps-system
    spec:
      backupName: <backup name of tcsa2.4.0>
      restore:
        postgres:
          timeout: 10m
          config:
            endpoint:
              host: postgres-cluster.tps-system.svc.cluster.local
              port: 5432
            adminSecret:
              name: postgres-db-secret
              namespace: tps-system
          dbs:
            - "analyticsservice"
            - "alarmservice"
            - "collector"
            - "grafana"
            - "keycloak"
           #- "airflow"
           #- "remediation"
           #- "dm_upgrade"
    ---
    apiVersion: tcx.vmware.com/v1
    kind: Restore
    metadata:
      name: group-restore-tcsa
      namespace: tcsa-system
    spec:
      backupName: <backup name of tcsa2.4.0>
      postAction:
        name: postaction
        serviceAccount: cluster-admin-sa
        timeout: 30m
        resource:
          memory: 250Mi
          cpu: 100m
        bash:
          command:
          - /bin/bash
          - -c
          - |
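            # delete the apiservice and Grafana pods so they restart with the restored data,
            # then trigger the Smarts domain migration through the br-operator pod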
            set -ex;kubectl delete pods -n tcsa-system --selector run=apiservice;
            sleep 200;
            set -ex;kubectl delete pod -n tcsa-system --selector=app.kubernetes.io/name=grafana;
            sleep 10;
            set -ex;kubectl exec -it deploy/br-operator -n tcsa-system -- curl -k -s --show-error --stderr - -H 'Content-Type: application/json' -X POST --data '{ "isCleanUpgrade": true }' http://apiservice:8080/smartsrestcontroller/vsa/smarts/domain/migrate;
      restore:
        collectors:
          config:
            authenticationSecret:
              name: collectors-secrets
              namespace: tcsa-system
              passwordKey:
                key: COLLECTORS_PASSWORD
              usernameKey:
                key: COLLECTORS_USERNAME
            endpoint:
              basePath: /dcc/v1/
              host: collector-manager.tcsa-system.svc.cluster.local
              port: 12375
              scheme: http
          timeout: 10m
        elastic:
          authentication:
            name: elasticsearch-secret-credentials
            namespace: tcsa-system
            passwordKey:
              key: ES_PASSWORD
            usernameKey:
              key: ES_USER_NAME
          cleanUpIndices: true
          config:
            endpoint:
              host: elasticsearch.tcsa-system.svc.cluster.local
              port: 9200
              scheme: https
            region: ap-south-1
          indexList:
          - vsa_chaining_history-*
          - vsa_events_history-*
          - vsa_audit-*
          - vsarole,policy,userpreference,mapping-metadata,mnr-metadata
          - gateway-mappings
          # Uncomment vsametrics to restore metrics and set cleanUpIndices as true
          #- vsametrics*
          # Uncomment vsa_catalog to restore TCSA 2.4 backup
          #- vsa_catalog
          # Set 'removeAndAddRepository: true' and trigger Backup/Restore to clean up the repository.
          removeAndAddRepository: true
          timeout: 30m
          tls:
            caCrt:
              key: ca.crt
            insecureSkipVerify: true
            namespace: tcsa-system
            secretName: elasticsearch-cert
            tlsCrt:
              key: tls.crt
      # Uncomment KubernetesResources to restore configmaps/secrets.
      # kubernetesResources:
      #   timeout: 10m
      #   resources:
      #     - groupVersionResource:
      #         group: ""
      #         version: "v1"
      #         resource: "secrets"
      #       nameList:
      #         - name: "spe-pguser"
      #           namespace: "tcsa-system"
      #     - groupVersionResource:
      #         group: ""
      #         version: "v1"
      #         resource: "configmaps"
      #       nameList:
      #         - name: "product-info"
      #           namespace: "tcsa-system"
        zookeeper:
          endpoint:
            host: zookeeper.tcsa-system.svc.cluster.local
            port: 2181
          paths:
          - path: /vmware/vsa/gateway
          - path: /vmware/vsa/smarts
          # Uncomment the zookeeper path for NCM backup
          #- path: /vmware/vsa/ncm
          timeout: 10m
    
    Note:
    • Provide the same backup name in the restore.yaml file; the file contains components for both the tcsa-system and tps-system namespaces to restore.
    • Add "/vmware/vsa/ncm" to restore backup of NCM reports.
  10. Before starting the restore process, remove or comment out any datastores that are not required. If you did not back up some components in a datastore, comment out the corresponding components in the Restore CRs, as in the sketch below.
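    For example, if the collectors component was not backed up on the older setup, comment out its block in the tcsa-system Restore CR before applying it (shortened here for illustration; the elided fields are unchanged):
      restore:
        #collectors:
        #  config:
        #    ...
        #  timeout: 10m
        elastic:
          ...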
  11. To restore the backup, run the following command:
    kubectl apply -f <restoration YAML file>
    You can check the restoration status by running the following command. After the restoration is successful, you can also launch the VMware Telco Cloud Service Assurance UI and verify the restored data.
    [root]# kubectl get restore -A
    NAMESPACE     NAME                                   STATUS       CURRENT STATE   READY   AGE     MESSAGE
    tcsa-system   scheduled-group-restore-tcsa231-tcsa   SUCCESSFUL   restore         True    2d16h
    tcsa-system   scheduled-group-restore-tcsa231-tps    SUCCESSFUL   restore         True    2d16h
    Note:
    • Once a restore is triggered, it cannot be undone; a failed restore might leave the system partially restored. If the restore fails, a failure message is displayed in the MESSAGE field.
    • During restore, if data ingestion occurs and an index is in use, the restore can fail.
    • After upgrading or migrating VMware Telco Cloud Service Assurance from an older version to VMware Telco Cloud Service Assurance 2.4, the custom catalog metrics that you created in the older version are not available in VMware Telco Cloud Service Assurance 2.4 due to the vsa_catalog schema changes. Re-create the customized catalog metrics in the VMware Telco Cloud Service Assurance 2.4 Catalog UI.
    • When restoring a backup from an older version, note that ConfigMaps and Secrets are not backward compatible. If you need to apply an older version of a ConfigMap or Secret, do so manually by using the kubectl command.
    • To view the NCM reports, enter the NCM database IP address and password in the Grafana NCM-Postgres datasource.
    • Use the Grafana export and import options to export any customized Grafana dashboards from VMware Telco Cloud Service Assurance 2.3.0 and import them into VMware Telco Cloud Service Assurance 2.4.0.