What Is DLP?

Data loss prevention (DLP) is a set of technologies and processes that monitor and inspect data on a corporate network to prevent exfiltration of critical data as a result of cyberattacks, such as phishing, or malicious insider threats.

Our digital age produces massive volumes of sensitive data, such as personally identifiable information (PII) about customers and employees, protected health information (PHI), financial data including credit card numbers, and intellectual property. This data is an organization’s lifeblood, so it's critical to implement strong data security.

In another era, this sensitive information was printed on paper and kept in a locked file cabinet. Now, these zeros and ones race from a data center to a cloud storage provider to a user's endpoint device, more vulnerable than ever. To protect it, organizations need to implement comprehensive data loss prevention (DLP) strategies.

A DLP tool should always be part of an organization-wide DLP strategy that brings business and IT leaders together to identify what constitutes “sensitive data” for the organization, agree on how this data should be used, and delineate what a violation looks like. These information security guidelines, including data classification, data privacy and compliance information, and remediation procedures, can then be translated into DLP policy.

While many organizations have an incentive to deploy DLP to comply with regulations (e.g., GDPR, HIPAA, PCI DSS) to avoid fines or restrictions to their business operations, data breaches can also expose end users' personal data, putting a breached organization at risk of losing customers, incurring brand damage, or even facing legal consequences. With a well-defined DLP policy bolstered by well-managed supporting technology, organizations can significantly reduce these risks.

 

How does DLP work?

In the simplest terms, DLP technology works by identifying sensitive data in need of protection, and then protecting it. Data exists in one of three states at any given time, broadly speaking—in use, in motion, or at rest—and a DLP solution may be designed to identify data in all or only some of these states. To flag data as sensitive, DLP agent programs may use many different techniques, such as:

  • Rule-based matching or "regular expressions": This common technique identifies sensitive data based on prewritten rules (e.g., 16-digit numbers are often credit card numbers). Because of a high false positive rate, rule-based matching is often only a first pass before deeper inspection.
  • Exact data matching (database fingerprinting): This technique identifies data that exactly matches other sensitive data the agent has fingerprinted, usually from a provided database.
  • Exact file matching: This technique works essentially like exact data matching, except it identifies matching file hashes without analyzing the file's contents.
  • Partial document matching: This technique pinpoints sensitive data by matching it to established patterns or templates (e.g., the format of a form filled out by every patient in an urgent care facility).
  • Machine learning, statistical analysis, etc.: This family of techniques relies on feeding a learning model a large volume of data in order to "train" it to recognize when a given data string is likely to be sensitive. This is particularly useful for identifying unstructured data.
  • Custom rules: Many organizations have unique types of data to identify and protect, and most modern DLP solutions allow them to build their own rules to run alongside the others.
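To make the first and third techniques more concrete, here is a minimal Python sketch. It pairs a deliberately broad regular expression for 16-digit numbers with a Luhn checksum as one example of a cheap second check, and it matches whole files by SHA-256 hash. The pattern, the use of Luhn and SHA-256, and the `PROTECTED_FILE_HASHES` set are illustrative assumptions, not how any particular DLP product works.

```python
import hashlib
import re

# Broad first-pass rule: 16 digits, optionally grouped in fours by spaces or hyphens.
# On its own this also flags order numbers, tracking IDs, etc., hence the false positives.
CARD_CANDIDATE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Regex as a first pass, then a Luhn check to discard candidates that can't be cards."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def file_fingerprint(data: bytes) -> str:
    """Exact file matching: hash the raw bytes without parsing the file's contents."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical fingerprints of files the organization has marked as sensitive.
PROTECTED_FILE_HASHES = {file_fingerprint(b"Q3 board deck, confidential draft")}


def is_protected_file(data: bytes) -> bool:
    """Flag a file only if its bytes are identical to a fingerprinted file."""
    return file_fingerprint(data) in PROTECTED_FILE_HASHES
```

For example, `find_card_numbers("Order ref 1234 5678 9012 3456; card 4111-1111-1111-1111")` returns `["4111111111111111"]`: both strings match the regex, but only the second (a well-known test card number) passes the checksum. Because a hash covers the exact bytes, changing a single character in a protected file yields a different fingerprint, which is one reason exact file matching is used alongside other techniques.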

Once the sensitive data is identified, it's up to your organization's DLP policy to determine how the data is protected. In turn, how you want to protect it has a lot to do with why you want to protect it.

 

Main use cases for DLP

The core use case for DLP is self-evident in "data loss prevention," although there are distinct kinds of "data loss" to consider: accidental or deliberate (i.e., malicious). Then, there are various types of data, such as intellectual property (IP) and regulated/personal data (which includes employees' and clients' personal details, health information, credit card and Social Security numbers, etc.). As we've already covered, securing this data protects your organization against other forms of loss—of customers, of revenue, of reputation—and helps you stay on the right side of industry and legal compliance regulations. Finally, protecting this data naturally requires being able to identify what and where it is, which constitutes another key use case: visibility.

So, in short, the main use cases for a DLP solution are:

  • Protect IP and sensitive/regulated data
  • Stay compliant with regulations
  • Get visibility into your data

 

Integrated DLP and enterprise DLP

Today's DLP solutions have reached a high level of maturity. However, because the market has seen very little differentiation between enterprise DLP solutions, analyst firm Gartner has retired its Magic Quadrant for Enterprise DLP. Instead, Gartner now focuses on a market guide that highlights the importance of holistic data protection strategies and educates readers on the use of integrated DLP solutions. In 2017, the firm predicted that 90% of organizations would be using some kind of integrated DLP by 2021. 

Traditional enterprise DLP solutions have typically provided separate products and functions for every channel where data is stored or through which it passes (e.g., endpoints, storage, exchanges), since data leakage can occur in any of them. Each channel requires a different set of tools or techniques to prevent data leaks.

Digital transformation, however, has shifted user behavior and traffic patterns, making it more important to secure the data that flows between endpoints, cloud apps, and data storage with a data-in-motion/network DLP solution. When this protection is natively provided by technologies such as secure web gateways, content management, or cloud access security brokers (CASB), it's referred to as integrated DLP.

Enterprise DLP solutions are notorious for being complex and expensive. Organizations that purchase enterprise DLP often use only a subset of its functionality and address only basic use cases that integrated DLP could address more quickly and cost-effectively.

DLP can’t prevent data loss if it is blind to traffic

As organizations have migrated to the cloud, three challenges have left network DLP solutions unable to see the traffic they are supposed to inspect:

  • Remote users: With network DLP, the levels of visibility and protection depend on where users are. They can easily bypass inspection when off-network, connecting directly to cloud apps. To be effective, DLP and security policies need to follow users wherever they connect, and on whatever mobile devices they may be using.
  • Encryption: The incredible growth of TLS/SSL-encrypted traffic has created a significant blind spot for network-based DLP incapable of decrypting it for inspection.
  • Performance limitations: Traditional network DLP appliances have finite resources and can’t scale to inspect the constantly growing amount of internet traffic inline.

 

DLP in a cloud- and mobile-first world requires a new mindset and modern technology

To address the data protection challenges that accompany digital transformation and overcome the weaknesses of traditional enterprise DLP, it is not enough to reconfigure a traditional hardware stack for the cloud—that's inefficient and lacks the protection and services of a cloud-built solution. Any cloud-based DLP solution should provide three elements:

  • Identical protection for all users on- or off-network, ensuring comprehensive data protection to all users, wherever they are—at HQ, a branch, an airport, or a home office.
  • Native inspection of TLS/SSL-encrypted traffic, giving the organization crucial visibility into more than 80% of today's internet traffic, which could otherwise hide threats.
  • Elastic scalability for inline inspection, preventing data loss by inspecting all traffic as it arrives and quarantining suspicious or unknown files, rather than relying on damage control after a compromise.

 

In its 2021 Cost of a Data Breach Report, the Ponemon Institute found that over the preceding year, a data breach cost an average of US$9.05 million in the US and US$4.24 million worldwide, with lost business accounting for 38% of that cost.

The study also found that organizations with a mature zero trust approach saved an average of $1.76 million per breach compared to those without one.

Exact Data Match for DLP 

Data loss prevention solutions have long used pattern-matching to identify credit card numbers, Social Security numbers, and more for protection. This approach is imprecise, however. Safe traffic is often blocked because it includes a pattern that has been selected for protection, and security teams can get bombarded with false positives.

Exact data match (EDM) is a powerful innovation in DLP technology that increases detection accuracy and nearly eliminates false positives. Instead of matching patterns, EDM “fingerprints” sensitive data, and then watches for attempts to move the fingerprinted data in order to stop it from being shared or transferred inappropriately.
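As a rough illustration of the fingerprinting idea, the Python sketch below assumes the protected dataset is a list of records and that the values worth protecting are single tokens such as ID or account numbers. It hashes normalized values with HMAC-SHA-256 (an assumed choice; products differ in how they build fingerprints), keeps only the hashes, and flags outbound text only when a token exactly matches a fingerprinted value. `FINGERPRINT_KEY`, `build_index`, and `find_exact_matches` are hypothetical names.

```python
import hashlib
import hmac
import re

# Hypothetical secret: keying the hash means that, without the key, the stored
# fingerprints can't be reversed by hashing guessed values (e.g., every 9-digit number).
FINGERPRINT_KEY = b"example-key-replace-and-protect"

# A token starts with a letter or digit and may contain dots, @ signs, and hyphens.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9.@-]*")


def normalize(value: str) -> str:
    """Drop case and punctuation so '123-45-6789' and '123456789' fingerprint the same."""
    return re.sub(r"[^0-9a-z]", "", value.lower())


def fingerprint(value: str) -> str:
    return hmac.new(FINGERPRINT_KEY, normalize(value).encode(), hashlib.sha256).hexdigest()


def build_index(records: list[dict[str, str]], fields: list[str]) -> set[str]:
    """Fingerprint selected columns of the sensitive dataset; raw values are not stored."""
    return {fingerprint(row[f]) for row in records for f in fields if row.get(f)}


def find_exact_matches(text: str, index: set[str]) -> list[str]:
    """Return tokens from outbound text whose fingerprint appears in the index."""
    return [tok for tok in TOKEN.findall(text) if fingerprint(tok) in index]
```

With an index built from a record such as `{"ssn": "123-45-6789", "account": "AC-99812"}`, the text "Please update 123-45-6789 today." is flagged, while "Ticket 987-65-4321 has the same format" is not: the second string would satisfy an SSN-style pattern, but it is not in the fingerprinted data. That difference is where the reduction in false positives comes from.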
 

DLP best practices

The ideal way to fine-tune DLP depends on your organization's unique needs, but certain best practices apply to every situation. Completely covering this subject is a job for another article, but here are a few of the most important best practices:

  • When you first deploy, start in monitor-only mode so you can see how data flows across your organization and use that to inform your policies.
  • Use user notifications to keep employees informed, so that policies aren't enforced without their knowledge, which can disrupt workflows and frustrate users.
  • Use a solution that allows users to submit feedback on notifications (to justify their actions or flag broken policies), which you can use to refine your policies.
  • Leverage advanced classification measures like exact data matching (EDM) to reduce false positives.
  • Only use a solution that can decrypt TLS/SSL-encrypted traffic, since the vast majority of web traffic is now encrypted.
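The first three practices can be sketched as a small policy model. The Python below is a hypothetical illustration, not any vendor's configuration format: a `Policy` defaults to monitor-only mode, where violations are logged but allowed; in enforce mode it blocks and, if `notify_user` is set, tells the user why and invites a justification, which a `FeedbackLog` collects for later policy tuning. All class, field, and function names are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mode(Enum):
    MONITOR = "monitor"  # log violations but let traffic through
    ENFORCE = "enforce"  # block violations


@dataclass
class Policy:
    name: str
    mode: Mode = Mode.MONITOR  # new policies start in monitor-only mode
    notify_user: bool = True


@dataclass
class Decision:
    allowed: bool
    logged: bool
    user_message: Optional[str] = None


@dataclass
class FeedbackLog:
    """Collects user justifications and reports of broken policies for later review."""
    entries: list[tuple[str, str, str]] = field(default_factory=list)

    def submit(self, user: str, policy_name: str, comment: str) -> None:
        self.entries.append((user, policy_name, comment))


def evaluate(policy: Policy, violation: bool) -> Decision:
    """Decide what happens to a transfer, given whether it matched the policy."""
    if not violation:
        return Decision(allowed=True, logged=False)
    if policy.mode is Mode.MONITOR:
        # Observe how data actually flows before deciding what to block.
        return Decision(allowed=True, logged=True)
    message = None
    if policy.notify_user:
        message = (
            f"Blocked by policy '{policy.name}'. "
            "If this is part of your job, submit a justification."
        )
    return Decision(allowed=False, logged=True, user_message=message)
```

Defaulting `mode` to `MONITOR` means a newly created policy cannot block anything until someone deliberately switches it to enforce, which mirrors the advice to observe before enforcing.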
     

Where your enterprise should start when it comes to data loss prevention

With increasing risks and expanding regulations for data protection, your organization needs to close security gaps created by the cloud and mobility. This isn't breaking news: a 2019 study by Cybersecurity Insiders found that preventing data loss is the second most important priority for IT executives.

In the past, that would have meant adding more appliances to already complex stacks. Today, there's Cloud DLP. With a solution like Zscaler Cloud Data Loss Prevention (DLP) as part of a broader secure access service edge (SASE) platform, you can close your data protection gaps, no matter where your users connect or where your applications are hosted, and reduce IT cost and complexity at the same time.

 
