Ransomware Recovery and Disaster Recovery

Ransomware recovery has similarities with disaster recovery, but there are key differences that require unique automation workflows to guarantee the security of production workloads, shorten recovery times, and reduce data loss.

For instance, during ransomware recovery you cannot be certain that snapshots are free of infection until you validate them. Unlike in disaster recovery, the most recent snapshots are likely to be compromised in a ransomware attack. Ransomware recovery must always be performed under the assumption that malware is embedded in the snapshot data.

Either snapshots must be validated as free of infection, or malware must be removed during the ransomware recovery process, to avoid reintroducing ransomware into a production environment. Because snapshots are potentially infected, they should not be recovered directly to the production environment. Instead, snapshots must first be restored into the Isolated Recovery Environment (IRE) for security analysis.

The ransomware recovery workflow offers several preconfigured VM network isolation levels that can be changed with the push of a button. To prevent lateral movement of malware in the IRE, each selected VM snapshot is powered on in a quarantine isolation level by default. Ransomware recovery often involves multiple recovery iterations, either to identify a malware-free backup or to clean a chosen snapshot by removing the malware.
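The iterative search for a malware-free snapshot can be sketched as a simple loop. This is a minimal illustration, not the product's API: the `Snapshot` type, the `find_clean_snapshot` function, and the `is_clean_in_ire` callback are hypothetical names, and the callback stands in for whatever validation is performed on a snapshot powered on in the IRE.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a VM snapshot; not a product API type."""
    vm_name: str
    taken_at: datetime


def find_clean_snapshot(
    snapshots: Iterable[Snapshot],
    is_clean_in_ire: Callable[[Snapshot], bool],
) -> Optional[Snapshot]:
    """Return the newest snapshot that passes validation, or None.

    Candidates are tried newest first, so the first one that validates
    is the one with the least data loss. `is_clean_in_ire` is an assumed
    callback representing validation of the snapshot inside the IRE.
    """
    for snap in sorted(snapshots, key=lambda s: s.taken_at, reverse=True):
        if is_clean_in_ire(snap):
            return snap
    return None
```

For example, given weekly snapshots taken on May 1, 8, 15, and 22 and a validator that rejects everything taken on or after May 10, the function returns the May 8 snapshot. If no snapshot validates, it returns `None`, which corresponds to the case where a chosen snapshot has to be cleaned instead.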

To avoid reinfection, restore snapshots to the production environment only after they have been validated in the recovery workflow. Alternatively, if the infected site is unavailable because of the ransomware attack (attacks often render a site inoperable), you can recover production VMs to the recovery SDDC on a network separate from the one used for the IRE. (For more information, see Creating Cloud Gateways for the Recovery SDDC.) These production VMs can be failed back to the protected site once it becomes available.

You can use a recovery plan for both disaster and ransomware recovery, but enabling ransomware recovery requires explicit plan configuration.

The following table provides a comparison of the main differences between ransomware recovery and disaster recovery:


	Disaster Recovery	Ransomware Recovery
Objective	Recovery of business operations with minimal downtime and data loss.	Recovery of compromised workloads with minimal loss, while providing data integrity and security assurance.
Recovery site	The recovery SDDC in VMware Cloud on AWS. Workloads can be made externally available immediately upon recovery.	VMs are validated and cleaned (malware or other malicious software removed) in the IRE in the recovery SDDC prior to restoring them to the protected site to avoid reinfection of production workloads. Workloads are made externally available only following security validation.
Snapshot retention	Limited snapshot retention time to accommodate the disaster recovery use case, with the latest snapshot being the primary recovery candidate.	Longer snapshot retention. Ransomware 'dwell time' (duration between infection and manifestation) can range from weeks to months. VMware recommends at least 3 months retention for ransomware recovery.
Recovery Point Objective (RPO)	Disaster recovery RPO is configured through snapshot schedules based on business needs.	Recent snapshots are likely encrypted and/or infected and might not be suitable for recovery, which can result in a higher RPO. Early detection improves RPO.
Recovery Time Objective (RTO)	Instant power-on in the recovery SDDC.	Higher RTO due to security validation in the IRE prior to recovery to the protected site.
Orchestration	Full automation with recovery plans that allow for unattended recovery. A recovery plan recovers all protected VMs to a recovery site.	Iterative recovery requires control over ransomware recovery workflow state transitions. Partial recovery of subsets of protected VMs is possible.
Tools	VMware Live Cyber Recovery Orchestrator.	In addition to using the VMware Live Cyber Recovery Orchestrator, requires backup validation and remediation with integrated or third-party security tools. Preconfigured network isolation levels available for use during validation.
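The relationship between snapshot retention, dwell time, and the achievable recovery point in the table above comes down to simple date arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the time of infection is known, which in practice is established through validation in the IRE.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def retention_covers_dwell(retention: timedelta, dwell: timedelta) -> bool:
    """True if snapshots are kept at least as long as the dwell time.

    If retention is shorter than the dwell time, every retained snapshot
    may postdate the infection.
    """
    return retention >= dwell


def data_loss_window(
    snapshot_times: Iterable[datetime],
    infected_at: datetime,
    detected_at: datetime,
) -> Optional[timedelta]:
    """Time between the newest pre-infection snapshot and detection.

    Returns None when no retained snapshot predates the infection.
    `infected_at` is an assumed input for illustration.
    """
    clean = [t for t in snapshot_times if t < infected_at]
    if not clean:
        return None
    return detected_at - max(clean)
```

With weekly snapshots on May 1, 8, 15, and 22, an infection on May 10, and detection on May 24, the newest clean snapshot is from May 8 and the data loss window is 16 days. Detecting the attack earlier shrinks this window, while a retention period shorter than the dwell time can leave no clean snapshot at all.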