CloudHealth Secure State improves cloud security by automating remediation across your AWS and Azure cloud environments. Customers can monitor their cloud for misconfigurations in real time and also programmatically remediate findings from the dashboard. The remediation service is designed around a cloud permissions control policy that lets you manage and remediate misconfigurations while granting the service only read-only (least-privilege) access to your cloud accounts. This document outlines how to deploy and configure the remediation service on your cloud accounts.
You can view the latest remediation job release versions on the GitHub repository.
Note: At this time, remediation is supported only for AWS and Azure environments.
There are a few key concepts that you may want to be familiar with before getting started. A brief architectural overview is provided below for your reference.
The architecture diagram presents the components in the remediation workflow that interact with your cloud resources to improve your security posture.
There are two parts to the remediation framework:
The CloudHealth Secure State platform that acts as the control plane for any actions.
The worker group that is deployed in the customer’s cloud environment and managed by the customer.
CloudHealth Secure State requires read-only permissions to the customer’s account, while the worker group requires limited write permissions to a scoped set of accounts that customers enable.
A remediation is an action configured for the desired criteria on findings. It defines the remediation job to run for a set of findings. Remediation criteria, which include a provider, rule, cloud accounts, tags, and regions, work as a filter that selects the findings to act upon. Any findings that match a set of criteria can be remediated either manually or automatically.
A remediation worker, a container hosting remediation scripts, must be deployed in the customer's environment in order to apply remediating configurations. This container is completely owned by the customer. The worker automatically registers with CloudHealth Secure State on activation and sends back health notifications and logs. The container image includes several out-of-the-box actions for common misconfigurations. Customers can modify any of these actions and author new ones.
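As an illustration of criteria acting as a filter, here is a minimal Python sketch. The `Finding` and `Criteria` classes, the `matches` function, and the sample rule name are hypothetical and are not part of the CloudHealth Secure State API. The sketch also assumes that an empty account, region, or tag setting means "do not filter on this field" and that every listed tag must match; the product's actual matching rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """Hypothetical, simplified finding record (not the product's schema)."""
    provider: str
    rule: str
    account: str
    region: str
    tags: dict = field(default_factory=dict)


@dataclass
class Criteria:
    """Hypothetical criteria; empty accounts/regions/tags mean 'no filter' (assumption)."""
    provider: str
    rule: str
    accounts: frozenset = frozenset()
    regions: frozenset = frozenset()
    tags: dict = field(default_factory=dict)


def matches(finding: Finding, criteria: Criteria) -> bool:
    """Return True when the finding satisfies every field set in the criteria."""
    if finding.provider != criteria.provider or finding.rule != criteria.rule:
        return False
    if criteria.accounts and finding.account not in criteria.accounts:
        return False
    if criteria.regions and finding.region not in criteria.regions:
        return False
    # Assumption: every tag listed in the criteria must be present with the same value.
    return all(finding.tags.get(key) == value for key, value in criteria.tags.items())


# Example: keep only the findings that a remediation with these criteria would act on.
criteria = Criteria(
    provider="aws",
    rule="example-rule",  # hypothetical rule name
    regions=frozenset({"us-east-1"}),
    tags={"env": "dev"},
)
findings = [
    Finding("aws", "example-rule", "111111111111", "us-east-1", {"env": "dev"}),
    Finding("aws", "example-rule", "111111111111", "eu-west-1", {"env": "dev"}),
    Finding("azure", "example-rule", "sub-1", "us-east-1", {"env": "dev"}),
]
selected = [f for f in findings if matches(f, criteria)]
```

In this sketch only the first finding is selected, because the second is in a different region and the third belongs to a different provider.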
Remediation Worker Group
A worker group is a set of remediation workers that all act upon the same logical group of resources. There can be multiple worker groups for the same cloud provider, and the same worker group can act on resources in multiple clouds. A typical example is to have a worker group per provider (AWS, Azure) and software environment (development, staging, production).
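The grouping described above can be pictured with a short Python sketch. The `WorkerGroup` class, the `groups_for` function, and the group names are hypothetical and serve only to illustrate the idea; they do not reflect how CloudHealth Secure State stores or selects worker groups.

```python
from dataclasses import dataclass


@dataclass
class WorkerGroup:
    """Hypothetical record; the product's actual worker group model may differ."""
    name: str
    providers: frozenset  # cloud providers whose resources this group acts on
    environment: str      # e.g. "development", "staging", "production"


def groups_for(provider: str, environment: str, groups: list) -> list:
    """Return the names of the worker groups that cover a provider and environment."""
    return [
        group.name
        for group in groups
        if provider in group.providers and group.environment == environment
    ]


# One group per provider and environment, plus one group spanning both clouds.
GROUPS = [
    WorkerGroup("aws-dev", frozenset({"aws"}), "development"),
    WorkerGroup("aws-prod", frozenset({"aws"}), "production"),
    WorkerGroup("multi-cloud-staging", frozenset({"aws", "azure"}), "staging"),
]

staging_azure = groups_for("azure", "staging", GROUPS)
```

Here `staging_azure` contains only `multi-cloud-staging`, the single group that covers Azure resources in the staging environment.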
A remediation job is a script that contains the code to fix a misconfiguration. This script is hosted on the worker and automatically loads when the worker activates.
A remediation action is the change (or changes) that a remediation worker is configured to make to findings that match the selected criteria.
Remediation Action Criteria
Metadata (tags, regions, and so on) used to define the scope of a remediation action.
Remediation (Action) Run
In this context, a run is a single instance of a remediation action performed on a finding. Remediation run status can be Success, In Progress, or Pending.
A metric that tracks the status of individual remediation action runs.
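To make the relationship between runs and their status concrete, the following Python sketch tallies run statuses. The three status values come from the definition above; the `RunStatus` enum and the `summarize_runs` function are hypothetical names and are not part of the product.

```python
from collections import Counter
from enum import Enum


class RunStatus(Enum):
    """The run statuses named in this document."""
    SUCCESS = "Success"
    IN_PROGRESS = "In Progress"
    PENDING = "Pending"


def summarize_runs(statuses):
    """Count runs per status, reporting zero for statuses that did not occur."""
    counts = Counter(statuses)
    return {status.value: counts.get(status, 0) for status in RunStatus}


summary = summarize_runs(
    [RunStatus.SUCCESS, RunStatus.PENDING, RunStatus.SUCCESS]
)
# summary == {"Success": 2, "In Progress": 0, "Pending": 1}
```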
Before you can run remediations on findings, you must set up the framework needed to run them in your cloud environment. This requires several steps in both CloudHealth Secure State and your cloud environment.
The specific instructions for each of these steps differ based on your cloud provider. This section describes the end-to-end process for setting up a remediation worker in AWS and Azure.
This process needs to be performed only once for each environment in which you want to run remediations. Some customers may need only a single configuration for their entire organization, while others may need several to provide logical separation between different cloud providers or software environments (development, staging, production, and so on).