Deploy a STIG-Hardened Management Cluster to an Airgapped AWS VPC

This topic explains how to deploy a STIG-hardened Tanzu Kubernetes Grid (TKG) management cluster to an airgapped Virtual Private Cloud (VPC) on AWS. The management cluster can then create and manage STIG-hardened Tanzu Kubernetes (workload) clusters.

For STIG compliance scan results and NSA/CISA Kubernetes Hardening Guidance for Tanzu Kubernetes Grid, see STIG and NSA/CISA Hardening.

Options

You have the following options when deploying hardened TKG to the airgapped VPC:

  • You can enable FIPS compliance for the clusters, based on Canonical FIPS for the node OS and BoringCrypto for Kubernetes.

  • The TKG deployment can use your own, existing image registry. Otherwise, by default, the 1-Click script installs a new instance of Harbor on an Amazon Linux 2 AMI.

Prerequisites

To deploy a STIG-hardened management cluster to an airgapped environment on AWS, you need:

  • An AWS user account with permissions to create, list, and delete:

    • VPC endpoints
    • security groups
    • EC2 instances
    • bucket policies
    • AMIs
    • (Optional) IAM profiles
      • If the account cannot create IAM profiles, you need another way to create them.

    Some of these permissions are required to fulfill prerequisites below.

  • An airgapped AWS Virtual Private Cloud (VPC) with:

    • No NAT gateway or internet gateway
    • The following VPC endpoints of interface type enabled:
      • sts
      • ssm
      • ec2
      • ec2messages
      • elasticloadbalancing
      • secretsmanager
      • ssmmessages
      • cloudformation
    • The following VPC endpoint of gateway type enabled:
      • s3
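    If you need to create these endpoints, the following AWS CLI sketch shows the general shape of the calls; the VPC, region, subnet, security group, and route table values are placeholders that you must replace with your own:

      # Interface endpoints: repeat for sts, ssm, ec2, ec2messages,
      # elasticloadbalancing, secretsmanager, ssmmessages, and cloudformation
      aws ec2 create-vpc-endpoint \
        --vpc-id MY-VPC-ID \
        --vpc-endpoint-type Interface \
        --service-name com.amazonaws.MY-REGION.sts \
        --subnet-ids MY-SUBNET-ID \
        --security-group-ids MY-SECURITY-GROUP-ID \
        --private-dns-enabled

      # Gateway endpoint for S3
      aws ec2 create-vpc-endpoint \
        --vpc-id MY-VPC-ID \
        --vpc-endpoint-type Gateway \
        --service-name com.amazonaws.MY-REGION.s3 \
        --route-table-ids MY-ROUTE-TABLE-ID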
  • An S3 bucket in the region where you will run TKG and accessible from within the airgapped VPC

    • Configure access to the bucket via a bucket policy that looks like this:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Access-to-specific-VPCE-only",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::<MY BUCKET NAME>/*",
                "Condition": {
                    "StringEquals": {
                        "aws:sourceVpce": "<MY VPC ENDPOINT ID>"
                    }
                }
            }
        ]
    }
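    For example, if the policy above is saved locally as bucket-policy.json (a hypothetical file name), you can attach it to the bucket with the AWS CLI:

      aws s3api put-bucket-policy \
        --bucket MY-BUCKET-NAME \
        --policy file://bucket-policy.json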
    
  • An ssh key pair for the AWS region where you will run TKG, created via the Amazon EC2 console.

  • A removable storage device, such as a USB thumb drive, with at least 20GB of free space.

  • A Linux bastion host (jumpbox) that:

    • Can access the VPC
    • Can copy data from the storage device
    • You have root access to
    • Has the following installed:
      • Docker
      • aws CLI
      • jq
      • make, with the build-essential package

    This is your bootstrap machine.

  • An online machine that can download content from a shared S3 bucket and code repository onto the removable storage device outside of the airgapped environment.

Procedure

To deploy the management cluster, you do the following as described in the sections below:

  1. Transfer content to your bootstrap machine and (optionally) to an existing registry
  2. Set environment variables to configure the 1-Click script
  3. Edit files used by the 1-Click script (for some options)
  4. Run the 1-Click script

Step 1: Transfer the 1-Click Script and TKG Dependencies

  1. From the online machine, load the removable storage device with the following content:

    • The Tanzu Compliance TKG 1-Click repository and its submodules.

      • Contact your VMware Representative to get the contents of this repo.
      • To retrieve the submodules, run git clone REPO --recursive.
    • The TKG dependencies. These images are stored in a shared S3 bucket called tkg-1click-dependencies:

      1. Retrieve read-only access credentials for the S3 bucket:

        • Contact your VMware representative for access to this file.
        • The credentials are rotated frequently.
      2. Copy the airgapped dependencies to the storage device:

        export TKG_VERSION=TKG-VERSION
        export TKR_VERSION=TKR-VERSION
        make download-deps
        

        Where TKG-VERSION and TKR-VERSION are the respective versions of TKG and TKR that you wish to use.

        Note: Currently, only TKG_VERSION 1.4.0 and TKR_VERSION 1.21.2 are supported, which pull TKG v1.4.0-fips.1 and TKR v1.21.2_vmware.1-fips.1-tkg.1.
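        For example, with the currently supported versions:

        export TKG_VERSION=1.4.0
        export TKR_VERSION=1.21.2
        make download-deps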

  2. Plug the storage device into the bootstrap machine and copy its content to your own S3 bucket:

    export BUCKET_NAME=MY-BUCKET
    export DEPS_DIR=MY-DEPENDENCY-DIRECTORY
    make upload-deps
    

    Where MY-BUCKET is your own S3 bucket, accessible from within the airgapped VPC, and MY-DEPENDENCY-DIRECTORY is the path to the directory where your airgapped dependencies are located.
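    After the upload completes, you can optionally confirm that the dependencies are in the bucket by listing its contents (this assumes the aws CLI on the bootstrap machine is configured with credentials that can read the bucket):

    aws s3 ls s3://MY-BUCKET --recursive --human-readable --summarize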

  3. (Optional) To use your own, existing image registry instead of having a registry automatically created, set up your registry as follows:

    1. Create a publicly-readable project in the registry called tkg, for TKG to access at REGISTRY-NAME/tkg.

    2. Export the following environment variables:

        export REGISTRY=MY-REGISTRY
        export REGISTRY_CA_PATH=/PATH/TO/REGISTRY/CA
        export BUCKET_NAME=MY-BUCKET
        export TKG_VERSION=TKG-VERSION
        export TKR_VERSION=TKR-VERSION
        export IMGPKG_USERNAME=REGISTRY-USERNAME
        export IMGPKG_PASSWORD=REGISTRY-PASSWORD
      

      Where MY-REGISTRY, /PATH/TO/REGISTRY/CA, MY-BUCKET, TKG-VERSION, TKR-VERSION, REGISTRY-USERNAME, and REGISTRY-PASSWORD are your registry’s DNS name, the full path to a local file containing your registry’s CA certificate, your own S3 bucket, the TKG version (for example, v1.4.0), the TKR version (for example, v1.21.2), and the username and password used to access your registry.
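      For example, with a hypothetical registry named registry.example.internal, the exports might look like the following (all values are illustrative only, not working values):

        export REGISTRY=registry.example.internal
        export REGISTRY_CA_PATH=/home/ubuntu/certs/registry-ca.crt
        export BUCKET_NAME=my-airgapped-tkg-bucket
        export TKG_VERSION=1.4.0
        export TKR_VERSION=1.21.2
        export IMGPKG_USERNAME=registry-admin
        export IMGPKG_PASSWORD=registry-password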

    3. Upload the TKG and image-builder images into your registry’s tkg project by running the following sequence of commands:

       make upload-images
      

Step 2: Set Environment Variables

Set the environment variables that the 1-Click script uses to deploy the management cluster:

  1. Set the required variables, based on your AWS account and the values above:

      export BUCKET_NAME=MY-BUCKET
      export VPC_ID=MY-VPC
      export SUBNET_ID=MY-SUBNET-ID
      export SSH_KEY_NAME=AWS-RSA-SSH-KEY
      export AWS_AZ_ZONE=MY-AZ
      export AWS_ACCESS_KEY_ID=KEYPAIR-ACCESS-KEY-ID
      export AWS_SECRET_ACCESS_KEY=KEYPAIR-SECRET-ACCESS-KEY
      export AWS_DEFAULT_REGION=MY-AWS-REGION
      export TKR_VERSION=TKR-VERSION
      export TKG_VERSION=TKG-VERSION
    

    Note: Currently, only TKG_VERSION 1.4.0 and TKR_VERSION 1.21.2 are supported, which pull TKG v1.4.0-fips.1 and TKR v1.21.2_vmware.1-fips.1-tkg.1.
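    For example (all values are illustrative placeholders, not working values):

      export BUCKET_NAME=my-airgapped-tkg-bucket
      export VPC_ID=vpc-0123456789abcdef0
      export SUBNET_ID=subnet-0123456789abcdef0
      export SSH_KEY_NAME=tkg-airgapped-rsa-key
      export AWS_AZ_ZONE=us-east-1a
      export AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEYID
      export AWS_SECRET_ACCESS_KEY=EXAMPLESECRETACCESSKEY
      export AWS_DEFAULT_REGION=us-east-1
      export TKR_VERSION=1.21.2
      export TKG_VERSION=1.4.0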

  2. Image Registry: Set variables to configure the image registry, depending on whether TKG will use an existing registry:

    • If you are using your own image registry, set the following:

      export REGISTRY=MY-REGISTRY
      export USE_EXISTING_REGISTRY=true
      export REGISTRY_CA_FILENAME=/PATH/TO/REGISTRY/CA
      

      Where /PATH/TO/REGISTRY/CA is a full path that points to a local file containing your registry’s CA. The file should have the extension .crt.

    • To customize the Harbor registry that the 1-Click script creates, optionally set the following:

      • By default, the 1-click script sets the admin password for Harbor to the value that Terraform writes into the HARBOR_ADMIN_PWD field in the file air-gapped/airgapped.env. To use a different password:

        export TF_VAR_harbor_pwd=CUSTOM-HARBOR-PASSWORD
        
      • To configure registry access via either certs that you specify, or generated certs:

        export TF_VAR_create_certs=true    # Default is true; set to false to provide your own certificates
        
        • If you set TF_VAR_create_certs to true or let it default, set the following:

          export TF_VAR_cert_l=CERT-LOCALITY       # Default: Minneapolis. Sets L (Locality) in the certificate subject
          export TF_VAR_cert_st=CERT-STATE         # Default: Minnesota. Sets ST (State) in the certificate subject
          export TF_VAR_cert_o=CERT-ORG            # Default: VMware. Sets O (Organization) in the certificate subject
          export TF_VAR_cert_ou=CERT-ORG-UNIT      # Default: VMware R&D. Sets OU (Organizational Unit) in the certificate subject
          
        • If you set TF_VAR_create_certs to false, set the following:

          export TF_VAR_cert_path=CERT-PATH        # Path to the certificate on the Harbor AMI
          export TF_VAR_cert_key_path=KEY-PATH     # Path to the private key on the Harbor AMI
          export TF_VAR_cert_ca_path=CA-CERT-PATH  # Path to the CA certificate on the Harbor AMI
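        For example, to let the script generate certificates with a custom subject (all values are illustrative):

          export TF_VAR_create_certs=true
          export TF_VAR_cert_l="Palo Alto"
          export TF_VAR_cert_st="California"
          export TF_VAR_cert_o="VMware"
          export TF_VAR_cert_ou="VMware R&D"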
          

Step 3: Edit Script Files

In addition to environment variables, the 1-Click script uses configuration options that you set within your local copy of the script repository itself:

  • To disable FIPS, set install_fips to no in this repo submodule file:

    • ami/stig/roles/canonical-ubuntu-18.04-lts-stig-hardening/vars/main.yml
  • To add CA certificates to the AMI, copy them in PEM format into this submodule folder:

    • ami/stig/roles/canonical-ubuntu-18.04-lts-stig-hardening/files/ca
  • Additional options:

    • The terraform/startup.sh file contains the following configurable options that you can set within the file, listed here with their defaults:
     Name | Default | Description
     AMI_ID | tkg_ami_id value from Terraform | The AMI ID to deploy
     REGISTRY_CA_FILENAME | ca.crt | The name of the CA file for the private registry
     AWS_NODE_AZ | az_zone value from Terraform | The first AWS Availability Zone to deploy to
     AWS_SSH_KEY_NAME | Set in tfvars | The ssh key to use for the TKG cluster; must be RSA for STIG deployments
     AWS_REGION | Set in tfvars | The AWS region to deploy TKG in
     AWS_VPC_ID | VPC ID of the bootstrap machine | The VPC ID to deploy TKG into
     OFFLINE_REGISTRY | Registry DNS name of the Harbor instance | The DNS name of the Docker registry; only modify if using a user-provided registry
     REGISTRY_IP | IP of the Harbor instance | The IP address of the Docker registry; only modify if using a user-provided registry
     TKG_CUSTOM_IMAGE_REPOSITORY | $OFFLINE_REGISTRY/tkg | The full Docker registry project path to use for TKG images
     CLUSTER_NAME | airgapped-mgmnt | The name of the TKG management cluster to deploy
     TKG_CUSTOM_COMPATIBILITY_PATH | fips/tkg-compatibility | The compatibility path to use; set to "" for a non-FIPS deployment
     COMPLIANCE | stig | The compliance standard to follow; set to stig, cis, or none
     ENABLE_AUDIT_LOGGING | true | Whether or not auditing is enabled on Kubernetes
     ENABLE_SERVING_CERTS | false | Whether or not to enable serving certificates on Kubernetes
     PROTECT_KERNEL_DEFAULTS | true | Whether or not to set --protect-kernel-defaults on kubelet; only set to true with AMIs that allow it
     CLUSTER_PLAN | dev | The cluster plan for TKG; set to dev or prod
     AWS_PRIVATE_SUBNET_ID | Subnet ID of the bootstrap machine | Used for dev plan clusters; set to the private subnet ID to deploy TKG into
     AWS_NODE_AZ_1 | None | Required for prod plan clusters; set to node Availability Zone 1
     AWS_NODE_AZ_2 | None | Required for prod plan clusters; set to node Availability Zone 2
     AWS_PRIVATE_SUBNET_ID_1 | None | Required for prod plan clusters; set to private subnet 1
     AWS_PRIVATE_SUBNET_ID_2 | None | Required for prod plan clusters; set to private subnet 2
     CONTROL_PLANE_MACHINE_TYPE | None | Required for prod plan clusters; the AWS machine type to use for control plane nodes
     NODE_MACHINE_TYPE | None | Required for prod plan clusters; the AWS machine type to use for worker nodes
     SERVICE_CIDR | None | Required for prod plan clusters; set to the Kubernetes services CIDR
     CLUSTER_CIDR | None | Required for prod plan clusters; set to the cluster CIDR
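    For example, a dev plan deployment that follows the STIG standard with audit logging enabled might use values like the following inside terraform/startup.sh (the values shown are illustrative; the exact layout of the file may differ in your copy of the repository):

     CLUSTER_NAME="airgapped-mgmnt"
     CLUSTER_PLAN="dev"
     COMPLIANCE="stig"
     ENABLE_AUDIT_LOGGING="true"
     PROTECT_KERNEL_DEFAULTS="true"
     TKG_CUSTOM_COMPATIBILITY_PATH="fips/tkg-compatibility"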

Step 4: Deploy the Management Cluster

To deploy the management cluster:

  1. Log in as root to the bootstrap machine.

  2. Depending on your IAM privileges in AWS, do one of the following:

    • If you can create IAM policies and roles, run:
    make all
    
    • If you cannot create IAM policies and roles:

      1. Have someone with IAM privileges run the CloudFormation template 1clickiamtemplate in the TKG 1-Click repository. This creates the roles, policies, and instance profiles needed to deploy a TKG management cluster on AWS.

      2. Run:

      make all-no-iam
      

To view a list of supported commands, run:

make

Monitor the Deployment

You can track the deployment progress by accessing the newly-created image registry and TKG cluster as follows:

  • Harbor

    1. Once Terraform finishes, set up VPC peering between your airgapped VPC and another, non-airgapped VPC.

    2. Within the non-airgapped VPC, modify the security group on an EC2 instance to allow it to ssh over to the airgapped bootstrap machine.

    3. ssh into the bootstrap machine and then again into the Harbor instance.
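      For example, from an instance in the peered VPC you can hop through the bootstrap machine to the Harbor instance in one command using an ssh ProxyJump; the IP addresses and login user names are placeholders and depend on your environment and AMIs:

      ssh -i ~/.ssh/AWS-RSA-SSH-KEY.pem -J LOGIN-USER@BOOTSTRAP-IP LOGIN-USER@HARBOR-IP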

    4. Run the following to track the progress of your Harbor installation and subsequent loading of TKG images:

      sudo tail -f /var/log/cloud-init-output.log
      
  • TKG

    1. Set up VPC peering and ssh access to your bootstrap machine as described for Harbor above.

    2. ssh into the bootstrap machine and run the following to track the progress of your management cluster deployment:

      sudo tail -f /var/log/cloud-init-output.log
      

      Once you see a message about the security group of your bootstrap being modified, the script has finished.

    3. After the script has finished, you can:

      • Run kubectl get pods -A to see all the pods running on your management cluster.
      • Run kubectl get nodes to retrieve an IP address of one of the cluster nodes, then ssh into it using the ssh_key you provided to Terraform.
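      For example, the node login might look like the following; NODE-IP is a placeholder, and the login user name depends on the node AMI (for example, ubuntu on Ubuntu-based node images):

      kubectl get nodes -o wide
      ssh -i ~/.ssh/AWS-RSA-SSH-KEY.pem ubuntu@NODE-IP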

Update the Harbor admin Password

To update the admin password to the Harbor instance created by the 1-Click script, run the following from your bootstrap machine:

curl -XPUT -H 'Content-Type: application/json' -u admin:$HARBOR_ADMIN_PWD "https://$REGISTRY/api/v2.0/users/1/password" --cacert /etc/docker/certs.d/$REGISTRY/ca.crt -d '{
  "new_password": "NEW-PASSWORD",
  "old_password": "OLD-PASSWORD"
}'

Where REGISTRY is the registry DNS name, and OLD-PASSWORD and NEW-PASSWORD are the current and new admin passwords.

Delete the Management Cluster

To delete the TKG management cluster deployed by the 1-Click script, run the following from the bootstrap machine:

sudo su
cd air-gapped
./delete-airgapped.sh

Delete the TKG Bootstrap Server

To delete the bootstrap server:

  1. Save the TKG management cluster’s kubeconfig, or delete the management cluster via Delete the Management Cluster above.

  2. From the TKG 1-Click repository, run:

    make destroy
    

Delete the Harbor Server

To delete the Harbor server:

  1. Make sure no TKG clusters are using the images hosted on the Harbor server.

  2. From the TKG 1-Click repository, run:

    make destroy-harbor
    

Troubleshooting

If the management cluster deployment does not succeed, try the following from the bootstrap machine:

  1. Export KUBECONFIG to the one used by the temporary local bootstrap kind cluster when TKG starts deploying:

    export KUBECONFIG=~/.kube-tkg/tmp/config_UID
    

    Where UID is the UID of the kind cluster created by tanzu management-cluster create.
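    If you are not sure of the UID, list the temporary kubeconfig directory to find it:

    ls ~/.kube-tkg/tmp/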

  2. Try the following commands to examine the deployed Kubernetes objects:

    kubectl get events -A --sort-by='.metadata.creationTimestamp'
    kubectl get clusters -n tkg-system -o yaml
    kubectl get machinedeployments -n tkg-system -o yaml
    kubectl get awsclusters -n tkg-system -o yaml
    kubectl get kcp -n tkg-system -o yaml
    kubectl get machines -n tkg-system -o yaml
    
  3. If you are sure about a change that you need to make to one of the object specifications above, run kubectl edit API-OBJECT OBJECT-NAME -n tkg-system to edit the object.

    • If you edit an object, check its OwnerReferences section to ensure that it does not have a controller that will revert your changes.
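      For example, to inspect the owner references on a MachineDeployment before editing it (OBJECT-NAME is a placeholder):

      kubectl get machinedeployment OBJECT-NAME -n tkg-system -o jsonpath='{.metadata.ownerReferences}'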

How It Works

The following explanations, diagrams, and CLI output examples describe the deployment process, following the default option of creating a new image registry rather than using an existing one:

  1. In Step 1: Transfer the 1-Click Script and TKG Dependencies, you:

    1. Populate the portable storage device with the TKG dependencies and the 1-Click installer repo.

      (Diagram: debs-to-usb)

    2. Copy the contents of the portable device to the bastion VM.

      (Diagram: debs-to-bastion)

    3. Copy the dependencies to the AWS S3 bucket.

      (Diagram: bastion-to-s3)

  2. In Step 2: Set Environment Variables, you export variables used by the 1-Click script.

  3. In Step 3: Edit Script Files you configure more options within the 1-Click script repository and its submodules.

  4. In Step 4: Deploy the Management Cluster, you run the 1-Click script, 1click.sh.

  5. Using the TKG dependencies, the 1-Click script creates an Amazon Linux 2 AMI inside the VPC hosting an instance of Harbor registry, and populates the registry with TKG images.

    • After the Harbor instance is created, you can monitor the Harbor logs as described in Monitor the Deployment, above.

      (Diagram: install-harbor)

  6. Using the TKG dependencies, the 1-Click script creates an AMI for the bootstrap VM in the VPC.

    • When the bootstrap AMI is ready, you see output similar to:

      ==> Builds finished. The artifacts of successful builds are:
      --> aws-tkg-bootstrap-builder: AMIs were created:
      us-east-1: ami-05cf054a1ecf64784
      

      (Diagram: create-bootstrap-ami)

  7. Using the TKG dependencies, the 1-Click script creates a STIG AMI with FIPS enabled, for creating cluster nodes.

    • When the STIG AMI is ready, you see output similar to:

      ==> Builds finished. The artifacts of successful builds are:
      --> ubuntu-18.04: AMIs were created:
      us-east-1: ami-00b79c841eeef51c2
      

      (Diagram: create-stig-ami)

  8. Using the bootstrap AMI, the 1-Click script creates the bootstrap VM in the VPC.

    • After the bootstrap VM is created, you should be able to monitor the status of your management cluster deploy as described in Monitor the Deployment, above.

      (Diagram: create-bootstrap-ec2)

  9. The 1-Click script runs tanzu management-cluster create on the bootstrap VM to deploy the STIG-compliant management cluster in the airgapped VPC.

    • From the management cluster, you can create and manage STIG-compliant workload clusters.

      (Diagram: create-mgmnt-cluster)

What the Script Installs Using an Existing Registry

If you configure the 1-Click script to use an existing registry rather than create a new one, the deployment end state looks like this:

(Diagram: existing-registry)
