Data Loss Prevention

Overview

The Data Loss Prevention (DLP) feature prevents the unintentional or intentional leakage of sensitive data to the Internet to ensure compliance with HIPAA, PCI, GDPR, and other data privacy laws. The DLP feature inspects file uploads and text entered into Web pages for sensitive data by checking the content against DLP dictionaries. When a DLP inspection discovers sensitive data, the Cloud Web Security administrator can set the action to skip, log, or block, and can optionally have an email alert sent to an Auditor.

While the DLP requirements for every organization are different, the workflow for creating a DLP policy is the same regardless.

The first half of this document describes the two key components of the DLP feature: Dictionaries (predefined and custom) and Auditors. The second half covers the process of creating and applying a DLP rule.

Note: For answers to frequently asked questions about DLP, see Data Loss Prevention Frequently Asked Questions.

Prerequisites

Users need the following for access to the Data Loss Prevention (DLP) feature with Cloud Web Security:

A customer enterprise on a production VMware SASE Orchestrator with Cloud Web Security activated. Both the Edges and Orchestrator must use VMware Release 4.5.0 or later.
A customer must have a Cloud Web Security Advanced package to access the DLP feature.
Important: A customer with a Cloud Web Security Standard package cannot access DLP, and a lock icon appears next to all DLP options on the Orchestrator UI.

Overview of DLP Dictionaries

A DLP Dictionary uses matching expressions to identify sensitive data. For example, credit card numbers and social security numbers follow a specific format, and dictionaries can match against those patterns to determine whether sensitive data is present in a file upload or text input.
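To make the idea of format-based matching concrete, here is a minimal Python sketch. It is not the Cloud Web Security implementation, whose mechanisms are proprietary; the pattern and the function name `contains_ssn_like_text` are hypothetical and only illustrate how a fixed format can be recognized in text.

```python
import re

# Hypothetical pattern for illustration only: a US Social Security number
# written as three digits, two digits, and four digits separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_ssn_like_text(text: str) -> bool:
    """Return True if the text contains at least one SSN-formatted string."""
    return SSN_PATTERN.search(text) is not None


print(contains_ssn_like_text("Employee record: 078-05-1120"))  # True
print(contains_ssn_like_text("Ticket 12345 was closed today"))  # False
```

A format check like this produces false positives on its own, which is one reason a real DLP engine would combine it with other signals.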

Predefined Dictionaries

Cloud Web Security predefined data dictionaries use a combination of pattern matching, checksums, context scoring, and fuzzy logic to identify sensitive data. Cloud Web Security has more than 340 predefined data dictionaries covering the following major data categories:

Document Classification
Financial Data
Health Care
HIPAA
Item Identifiers
PCI DSS
PII

Additionally, the predefined data dictionaries are region-specific to ensure correct pattern matching is applied across the globe. Data dictionaries can be set to 29 different countries or regions. Of those 29, two are reserved for Global and Other. These two options allow categorizing multinational data or data that does not neatly fit into a country or region category.

Users can explore dictionaries on the VMware SASE Orchestrator by going to the Cloud Web Security section and navigating to Configure > Policy Settings > DLP > Dictionaries.

This page shows all the dictionaries available for use in DLP policy. The dictionaries are organized in a table with Name, Description, Type, Category, and Region fields.

Name is used to identify the dictionary for use in policy.
Description provides a high-level overview of what the dictionary matches.
Type distinguishes two different dictionary types:
- Predefined
- Custom
Category includes:
- Canadian Health Service
- Document Classification
- Financial Data, HIPAA
- HIPAA/Health Care
- Health Care
- Item Identifiers
- Other
- PCI DSS
- Personally Identifiable Information
- UK National Health Service
Region represents the geography the dictionary applies to, which includes:
- Australia
- Belgium
- Brazil
- Canada
- Denmark
- Finland
- France
- Germany
- Global
- Hong Kong
- India
- Indonesia
- Ireland
- Italy
- Japan
- Malaysia
- Netherlands
- New York
- New Zealand
- Norway
- Other
- Poland
- Singapore
- South Africa
- Spain
- Sweden
- United Kingdom (UK)
- United States of America (USA)

New Dictionary configuration screen with Match Criteria

The Search bar applies to all fields on the Dictionaries page and can be used to quickly display specific dictionaries users are interested in viewing.
Each row contains a dictionary that can be clicked to explore further.
The Dictionaries per Page can display up to 100 entries on a single page.
Page Navigation buttons are provided for going back or skipping ahead.

To continue this examination, find the Postal addresses [Global] dictionary and click on the blue text to bring up the Edit Dictionary screen.

While the fields on this page have already been discussed, there is one that warrants further explanation. The Description provides the details needed to know whether this dictionary is appropriate for your policy. The exact mechanisms used to identify data are proprietary and confidential. However, users can be assured that pattern matching uses advanced techniques to ensure accuracy across the numerous categories and regions dictionaries support.

Note: The method used with predefined dictionaries of triggering a DLP violation based on a sensitivity level and DLP engine heuristics is in contrast to the method used for a Custom Dictionary, which uses a specific repeat count. There are more details on that method in the Custom Dictionary section.

Clicking on the Next button in the modal brings users to the Threshold settings. It is not recommended to adjust the Threshold Details from their default values unless necessary.

The screenshot above shows that the weighted average number of violations for both File Uploads and User Inputs is set to 10. For predefined dictionaries, do not think of this as a simple occurrence count but rather as a computed score over all information discovered in a document. This scoring mechanism helps reduce the number of false positives observed when using this data dictionary. When finished viewing this modal, click Cancel. Users who changed any editable values must click Update instead to preserve those changes.

Custom Dictionaries

Cloud Web Security DLP Custom Dictionaries give users the flexibility to create data dictionaries pertinent to their organization. As with predefined dictionaries, custom dictionaries start by having users add four fields:

Name
Description
Category
Country/Region

These are the same four fields shown for predefined dictionaries, but with the ability to set each value to what is relevant for the dictionary users are creating.

For identifying data, custom dictionaries employ two methods:

String is used to match an exact combination of alphanumeric and special characters. It can be set to match or ignore casing.
Expression uses Perl regular expressions (regex) to find data patterns that are otherwise difficult to find with a simple string.

There are numerous resources on the Internet for learning more about regular expressions. One such resource is https://perldoc.perl.org/perlre. Here users can find multiple examples of different pattern matching syntax for regexes.
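The difference between the two methods can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's matching engine: the function names, the `ignore_case` parameter, and the sample values ("Project Falcon", `PRJ-\d{4}`) are hypothetical, and Python's `re` syntax is similar to but not identical with Perl regular expressions.

```python
import re


def string_match(needle: str, text: str, ignore_case: bool = False) -> bool:
    """Exact substring match, optionally ignoring letter case."""
    if ignore_case:
        return needle.casefold() in text.casefold()
    return needle in text


def expression_match(pattern: str, text: str) -> bool:
    """Regular-expression match anywhere in the text."""
    return re.search(pattern, text) is not None


# A String entry finds one fixed phrase.
print(string_match("Project Falcon", "notes on project falcon", ignore_case=True))  # True
print(string_match("Project Falcon", "notes on project falcon"))  # False

# An Expression entry finds a whole family of values, here "PRJ-" plus four digits.
print(expression_match(r"PRJ-\d{4}", "see PRJ-0042 for details"))  # True
```

A String entry suits a single known phrase, while an Expression entry suits data that varies but follows a pattern.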

To create a Custom Dictionary, click the New Dictionary button from Configure > Policy Settings > DLP > Dictionaries page.

The Dictionary Details screen prompts users to enter values for Name, Description, Category, and Country/Region.

New Dictionary configuration screen with Dictionary Details.

The screenshot above indicates that this dictionary is meant to identify Sensitive IP Addresses and is For Internal Use Only. The selection of Other for both category and country/region indicates that the data matched by this dictionary either does not fit into one of the preexisting categories, or the additional metadata is not necessary.

For the Match Data screen, the example configuration is based on the IP address ranges 192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24 (the documentation ranges from RFC 5737), which stand in for the sensitive data the company needs to protect. The regex used to look for any IP addresses in those ranges is: (192\.0\.2\..*|198\.51\.100\..*|203\.0\.113\..*)

The regex is read as, "Match a string if it contains 192.0.2. OR 198.51.100. OR 203.0.113." And the Repeated value is set to 1, indicating that discovering this pattern one or more times will trigger the dictionary.
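The behavior of this expression and the Repeated value can be tried out locally with a short Python sketch. The function `dictionary_triggers` and its `repeated` parameter are hypothetical names, and counting matches with `finditer` is an assumption; how the DLP engine itself counts occurrences is not documented here.

```python
import re

# The expression from the example, copied verbatim.
PATTERN = re.compile(r"(192\.0\.2\..*|198\.51\.100\..*|203\.0\.113\..*)")


def dictionary_triggers(text: str, repeated: int = 1) -> bool:
    """Return True if the pattern is found at least `repeated` times.

    Because each alternative ends in a greedy `.*`, one match runs to the
    end of its line, so several addresses on one line count as one match.
    """
    matches = sum(1 for _ in PATTERN.finditer(text))
    return matches >= repeated


print(dictionary_triggers("server at 192.0.2.10 is down"))  # True
print(dictionary_triggers("server at 10.0.0.1 is down"))    # False
```

With Repeated set to 1, a single address from any of the three ranges is enough to trigger the dictionary in this sketch.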

Note: While a Custom Dictionary uses a specific repeat count to trigger a DLP violation, for a Predefined Dictionary the threshold to trigger the DLP violation is based on a sensitivity level and DLP engine heuristics.

The regex is not broken up over several lines using the Plus Icon to add another row because the dictionary logic across multiple rows is a logical AND. Had the Match Criteria been defined in this manner, the dictionary would trigger only when all three IP address ranges were present in a document.
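The difference between one row with alternation and three separate rows can be sketched in Python. The names `single_row`, `three_rows`, and `rows_match` are hypothetical, and the sketch only models the AND behavior described in the text, not the product's engine.

```python
import re
from typing import Sequence

# One row using alternation: any one range is enough (logical OR inside the regex).
single_row = [r"(192\.0\.2\..*|198\.51\.100\..*|203\.0\.113\..*)"]

# Three rows: per the text, every row must match (logical AND across rows).
three_rows = [r"192\.0\.2\..*", r"198\.51\.100\..*", r"203\.0\.113\..*"]


def rows_match(rows: Sequence[str], text: str) -> bool:
    """Return True only if every row's expression is found in the text."""
    return all(re.search(row, text) for row in rows)


doc = "Gateway: 192.0.2.1"
print(rows_match(single_row, doc))  # True
print(rows_match(three_rows, doc))  # False
```

A document that mentions only one of the ranges triggers the single-row form but not the three-row form, which is why the example keeps everything in one expression.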

New Dictionary Match Criteria adding additional Match Rules

After configuring the Custom Dictionary settings, click Finish to make the dictionary available for use in Cloud Web Security.

Auditors

An Auditor is someone in the organization designated to follow up on any incidents that pertain to the attempted exfiltration of data, whether intentional or accidental. This individual can be notified via email from the Orchestrator that a DLP rule has been violated. The email sent to the Auditor contains the name of the DLP rule, the user input or file name that contained sensitive data, the destination to which the user was trying to send the data, and the username of the person who tried to expose the data. Optionally, the user input or file can be sent to the Auditor in its original format, as a ZIP file, or as an encrypted ZIP file.

Users can add, edit, delete, and view auditors by logging into Cloud Web Security and navigating to:

Configure > Policy Settings > DLP > Auditors

DLP Settings, Auditors configuration screen.

In the Auditors screen, users can see that there are currently no auditors in the system. To add the first Auditor, select + NEW AUDITOR PROFILE. A pop-up will prompt users to provide the following information:

Name (mandatory) is the name of the Auditor.
Email Address (mandatory) is a valid email address for the individual.
Description (optional) is any relevant information users would want to provide about the Auditor. For example, "PCI Auditor" if the Auditor's primary function is to monitor for PCI violations.

The next page will ask users for File Details. This page is completely optional, but it provides users with the option to send the offending file to the DLP Auditor for their review. Configuration options include:

Send File to the Auditors, with the default behavior being to not send the file to the Auditor(s).
File Format becomes available when users select Send the file to the Auditor(s). Users have the option of selecting the Original File, Zip, or Encrypted Zip. Since this file will contain sensitive information, it is recommended to use the Encrypted Zip option.
- Maximum File Size is the maximum size of the attachment included with the email that is sent by the system. The limit can be set as high as 1 GB, but it is recommended to match the organization's email file size restrictions.
  Important: If a file size exceeds the Maximum File Size value, then that file is bypassed. In other words, the file is not attached to the DLP violation alert, and the alert is sent without the file.
- Encrypted Zip Password is autogenerated by the system and can be regenerated if compromised. Users can also configure their own password if desired.
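The attachment rules described above (no file by default, and no file when it exceeds the Maximum File Size) can be summarized in a small Python sketch. The function `attachment_for_alert` and its parameters are hypothetical and only model the decision; they do not represent an Orchestrator API, and the sketch omits the ZIP and encryption steps.

```python
from typing import Optional


def attachment_for_alert(
    file_bytes: bytes, send_file: bool, max_size_bytes: int
) -> Optional[bytes]:
    """Return the bytes to attach to the alert, or None to send the alert alone."""
    if not send_file:
        # Default behavior: the file is not sent to the Auditor.
        return None
    if len(file_bytes) > max_size_bytes:
        # The file exceeds the limit, so the alert goes out without it.
        return None
    return file_bytes


print(attachment_for_alert(b"abc", send_file=True, max_size_bytes=10))       # b'abc'
print(attachment_for_alert(b"x" * 11, send_file=True, max_size_bytes=10))    # None
```

In either `None` case the alert email is still sent; only the attachment is left out.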

Click the Finish button to save the New DLP Auditor Profile configuration. The Auditor entry appears in the DLP Settings Auditor page. Optionally, users can view, edit, or delete the Auditor entry.

DLP Configuration Workflow

Having covered the two key components that comprise the Data Loss Prevention (DLP) feature, this section will cover the overall DLP workflow.

Create, Configure, and Apply a Security Policy

A DLP rule is part of a Security Policy and thus prior to configuring a DLP rule, there must first be a Security Policy. For details on Creating, Configuring, or Applying a Security Policy for the Cloud Web Security service, consult the relevant documentation in the Cloud Web Security Configuration Guide.

Create and Apply a DLP Rule

To create and apply a DLP Rule, see Configure Data Loss Prevention Rules.

Verify a DLP Rule is Working

There are three criteria which together confirm that a DLP rule is configured properly and working as expected:

Cloud Web Security blocks the exfiltration of sensitive data that matches a DLP Rule.
Cloud Web Security detects and logs the attempt to exfiltrate sensitive data.
Cloud Web Security sends an email alert to a DLP Auditor when the rule is triggered.

To verify the effectiveness of a DLP rule, do the following:

From an endpoint device (Windows, macOS, iOS, or Android) that sits behind an SD-WAN Edge, log in to a file hosting service (for example, Apple iCloud, Dropbox, Google Drive, Microsoft OneDrive, or similar).
If the rule includes a Custom Dictionary, upload a Text Input, Text File, or PDF which matches the criteria set in the DLP Rule.
Note: Text Input is like a form post or text message. A Text File is an actual .txt attached to an upload.
Alternatively, use any of the Predefined Dictionaries and their respective thresholds for PII data, Social Security numbers, Bank Account numbers, or something similar.
Note: With Predefined Dictionaries, the threshold to trigger the DLP violation is based on the combination of a sensitivity level and DLP engine heuristics. This contrasts with the Custom Dictionary which uses a specific repeat count.
Confirm that the text input or file upload is blocked.
Verify in the DLP logs that the block action has been logged.
1. The following is a sample log for a Text Input block in DLP Test which matches a Custom Dictionary the DLP Rule uses.
2. The following is a sample log for a PDF file blocked in Dropbox for a Social Security number match from a Predefined Dictionary.
Verify that a DLP Auditor has received an alert email based on the DLP Rule and the action configured for this rule.
1. The following is a sample email for a Text Input block in DLP Test for a Custom Dictionary the DLP Rule uses.
2. The following is a sample email for a PDF file blocked in Dropbox for a Social Security number match from a Predefined Dictionary.
  Note: Non-text files may appear with an "Unknown" file name. As a result, the attached file in the Auditor email would also show as "Unknown".
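For the Custom Dictionary case, a test upload can be prepared with a short Python sketch such as the one below. It assumes the rule uses the Sensitive IP Addresses example dictionary from earlier in this document; the function `write_dlp_test_file` and the file name `dlp_test_upload.txt` are hypothetical, and the addresses come from the RFC 5737 documentation range, so no real data is involved.

```python
import tempfile
from pathlib import Path


def write_dlp_test_file(directory: str, repeats: int = 1) -> Path:
    """Write a small .txt file with one documentation-range address per line."""
    lines = [f"Internal host {i}: 192.0.2.{i + 1}" for i in range(repeats)]
    path = Path(directory) / "dlp_test_upload.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        test_file = write_dlp_test_file(tmp, repeats=3)
        print(test_file.read_text(encoding="utf-8"))
```

Set `repeats` to at least the Repeated value configured in the dictionary so that the uploaded file is expected to trigger the rule.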