The public cloud represents the most diverse IT environment, with a multitude of services spread across many different cloud providers and operating environments. The increased risk of rapid exposure from vulnerabilities and misconfigurations, due to extensive use of automation and reuse of software components, leaves organizations very little time to respond. Responding to incidents quickly and effectively continues to be the utopia that every security team is chasing. Unfortunately, the statistics tell a different story. While the mean time to respond to on-prem or endpoint-related incidents has somewhat improved (thanks to tools like EDR and better security analytics), we are far from achieving parity when it comes to cloud environments. In fact, the number of cloud-infrastructure-related breaches is on the rise.
Over the years, I’ve had the pleasure of working with several Security Operations (SecOps) teams of various shapes, sizes, and maturity levels, and the topic of incident response (IR) in the cloud almost always elicits mixed reactions. One thing that always remains consistent in these discussions though, is that the cloud represents a “black box” for SecOps and they keep trying to throw security tools around this black box to gain a sense of what might be happening. The crux of the matter though can be summarized across these key problem statements:
1. Lack of comprehensive insight into cloud assets and related operational activities. The security team works in a silo, separate from the cloud operations teams who create these assets, and has no way of ensuring those teams are partnering with security effectively. In addition, cloud assets are ephemeral: how do you do IR for something that no longer exists?
2. The incident response process has been predominantly reliant on logs and agents, but the shared responsibility model breaks this operating model: services like PaaS introduce abstraction and a loss of control over what you can log and how you gain context around those logs. In fact, most organizations struggle with what they log in the cloud and what they do with those logs! So a SIEM cannot solve your cloud IR nightmares unless you spend thousands of man-hours building detection rules for every possible cloud service. Threat actors are increasingly aware of this and are exploiting PaaS and containerized services, resulting in misuse of cloud resources.
3. Lack of expertise and understanding of the cloud services within the SOC.
4. Native built-in cloud security tools that are cumbersome & fragmented across many point capabilities; when it comes to incident response in the cloud, they do not operate in an intuitive manner or speak a security language a security analyst might be comfortable with.
So the question is what do we really need to overcome some of these challenges? The answer really lies in a cloud security architecture that can converge dynamic asset visibility, exposure, and vulnerability monitoring across a wide array of cloud environments and services and use threat intelligence to enrich and deliver context around detections.
To put this theory to the test, let’s consider a simple breach scenario:
Figure 1: Example Incident - Sensitive Data Theft
In the above example, an attacker manages to steal sensitive data stored in a private blob container, which is part of an Azure Storage account, by exploiting a vulnerability in the workload. Then, they use weaknesses in cloud IAM configuration to escalate privileges and ultimately gain access to the storage account and exfiltrate data via the available cloud service provider APIs.
As a SOC analyst, how do you manage the lifecycle of such an incident when all you get is an alarm from a firewall/WAF for a potential exploit attempt? Let's ponder this for a second: a SOC probably gets thousands of such alarms in a day, especially if it’s an internet-facing asset. Is it even possible for an organization to investigate each alarm? Unfortunately, this is where the bad guys are winning. Going back to our example incident, the initial CVE exploit attempt is not enough on its own. For the attack to succeed, the threat actor must move beyond the initially breached workload and access other cloud assets or services. But wait, as a SOC analyst I only have an alarm from a WAF, maybe an indicator on the asset that it may have some vulnerabilities, and maybe some obscure post on Twitter by the threat actor claiming they have breached the target organization.
In reality, the threat actor uses several living-off-the-land (LOTL) techniques, relying on legitimate cloud APIs to perform activities such as generating tokens and calling APIs. All of these go under the radar, and it's simply impossible for a SOC analyst to piece them together because these activities are also part of a cloud environment’s normal operations. Any guesses why? Analysts lack tools that can establish relationships between cloud assets, the configuration of the cloud environment, and alarms coming from individual sensors in the cloud, such as vulnerability management tools, data protection tools, and cloud posture management tools. In essence, they have very little visibility into the potential attack paths or toxic combinations that might be crucial in understanding how the attack played out in their environment, and without building an effective hypothesis it is impossible to mount a response to a cloud incident.
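To make the idea of an attack path concrete, here is a minimal sketch that models the example incident as a directed graph and searches it for a route from the internet to the private blob container. Everything in it is an assumption for illustration: the asset names, the `EDGES` map, and the `find_attack_path` helper are hypothetical and do not represent how any particular product stores or queries asset relationships.

```python
from collections import deque

# Hypothetical asset graph for the example incident. An edge A -> B means
# "A can reach or act on B" (network exposure, attached identity, role grant).
EDGES = {
    "internet": ["vm-web-01"],
    "vm-web-01": ["managed-identity-web"],
    "managed-identity-web": ["role-storage-contributor"],
    "role-storage-contributor": ["storage-account-prod"],
    "storage-account-prod": ["blob-container-private"],
    "vm-batch-02": ["storage-account-logs"],
}


def find_attack_path(edges, source, target):
    """Breadth-first search: return the shortest path as a list, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in edges.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    path = find_attack_path(EDGES, "internet", "blob-container-private")
    print(" -> ".join(path) if path else "no path found")
```

The point of the sketch is that no single node looks alarming on its own. It is the chain from exposure to identity to data that turns a routine WAF alarm into an incident worth investigating.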
This is exactly where a context-aware multi-vector CNAPP solution such as Zscaler Posture Control (ZPC) provides true value. Posture Control is a single unified platform that combines cloud asset & asset configuration analysis, identity and entitlement management, vulnerability management, data protection, threat intelligence, and cloud threat detection capabilities.
Figure 2: Zscaler Posture Control (ZPC) Capabilities
So, in the context of the above breach scenario, how can a SOC analyst leverage Posture Control? Let's map our example incident response scenario to the key NIST 800-61 IR steps:
Figure 3: NIST 800-61 Key Incident Response Steps
1. Preparation – Being prepared to respond and putting measures in place before disaster strikes is extremely crucial for a successful incident response outcome. The other part of being prepared is also knowing what you are up against i.e., “know thy enemy.” As such, you need to be threat informed and have a mechanism to understand how a vulnerability or misconfiguration is exploited, what impact it has on assets, and a way to categorize and prioritize incidents.
Any kind of adverse situation also requires an effective communication strategy, and IR in the cloud is no exception.
So how can a CNAPP solution help organizations be better prepared for cloud incidents? It provides the following capabilities:
- A consolidated view of all cloud assets & continuous assessment of those assets for risk.
- A centralized place for multi-cloud security policy & guardrail management across both cloud build and runtime.
- A consolidated view of all cloud vulnerabilities and potential exposures to vulnerabilities.
- Multi-cloud alert management capabilities can redirect alerts to the correct asset owners by leveraging additional information such as asset tags, cloud type, or business units. This drastically improves alerting workflows and communication across teams.
- Prebuilt & custom correlation rules across asset, identity, data, network, vulnerability, and activity telemetry. Building these in-house would otherwise take extensive research and detection engineering effort that could run into hundreds of man-hours. This has a direct effect on alert prioritization as well as on the resources left to support incident response.
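As a rough illustration of the alert-routing capability in the list above, the sketch below sends each alert to an owner based on asset tags, cloud type, or business unit. The `Alert` class, the `ROUTES` table, and the team names are all hypothetical, invented for this example rather than taken from any product's configuration model.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    title: str
    cloud: str                                 # e.g. "azure", "aws"
    tags: dict = field(default_factory=dict)   # asset tags copied onto the alert


# Hypothetical routing table, checked in order; the first matching rule wins.
ROUTES = [
    ({"business_unit": "payments"}, "payments-oncall"),
    ({"cloud": "azure", "env": "prod"}, "azure-platform-team"),
    ({"cloud": "aws"}, "aws-platform-team"),
]
DEFAULT_OWNER = "soc-triage"


def route_alert(alert, routes=ROUTES, default=DEFAULT_OWNER):
    """Return the owner for an alert based on its tags and cloud type."""
    # Merge the cloud type into the tag set so rules can match on either.
    attributes = {**alert.tags, "cloud": alert.cloud}
    for conditions, owner in routes:
        if all(attributes.get(key) == value for key, value in conditions.items()):
            return owner
    return default


if __name__ == "__main__":
    alert = Alert("Exploit attempt on web workload", "azure", {"env": "prod"})
    print(route_alert(alert))
```

Ordering the rules from most to least specific keeps the table easy to reason about, and the default owner ensures no alert is silently dropped.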
2. Detection & Analysis – This is the heart of an incident response process and in theory should work effortlessly, provided the preparation phase of the IR cycle was executed with the required rigor. So let’s put this to the test against our example breach use case:
- Alert Prioritization – Correlation capabilities automatically look at the combination of the vulnerable workload, internet exposure, and potential attack path via powerful IAM identities. They then map it to the external exposure asset category and generate converged detections across multiple threat vectors.
A converged CNAPP solution uses information available from threat intelligence, network flow data, and the DLP to prioritize and raise the severity of the incident.
So, in the context of our breach scenario, Posture Control detects the presence of an internet-facing workload with a high-severity vulnerability and access to the Azure Storage account and blob container. The solution automatically increases the severity of the incident because it has detected multiple potential attack paths, which it merges into a single consolidated finding.
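The prioritization logic described here can be approximated with a simple rule: group findings by asset and raise the severity when several independent signals converge on the same workload. The sketch below is one possible reading of that idea, not the scoring model used by Posture Control. The `Finding` class, the signal names, and the "one level per additional signal" rule are assumptions made for illustration.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Finding:
    asset: str
    kind: str       # e.g. "vulnerability", "internet_exposure", "iam_path_to_data"
    severity: str   # one of SEVERITY_ORDER


def correlate(findings):
    """Merge findings per asset into one incident with an escalated severity.

    Assumed rule: start from the highest individual severity and raise it by
    one level for each additional distinct kind of signal, capped at critical.
    """
    by_asset = {}
    for finding in findings:
        by_asset.setdefault(finding.asset, []).append(finding)

    incidents = {}
    for asset, items in by_asset.items():
        base = max(SEVERITY_ORDER.index(f.severity) for f in items)
        kinds = {f.kind for f in items}
        level = min(base + len(kinds) - 1, len(SEVERITY_ORDER) - 1)
        incidents[asset] = {
            "severity": SEVERITY_ORDER[level],
            "signals": sorted(kinds),
        }
    return incidents


if __name__ == "__main__":
    findings = [
        Finding("vm-web-01", "vulnerability", "high"),
        Finding("vm-web-01", "internet_exposure", "medium"),
        Finding("vm-web-01", "iam_path_to_data", "medium"),
        Finding("vm-batch-02", "vulnerability", "medium"),
    ]
    for asset, incident in correlate(findings).items():
        print(asset, incident)
```

In this toy data set the web workload ends up as a single critical incident, while the isolated vulnerability on the batch workload stays at medium.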
- Incident Triage and Scoping – The next step in the process is understanding the blast radius of the attack and any likely related assets that might have also been a casualty of the breach.
ZPC’s data clustering capabilities allow the SOC analyst to easily visualize the relationships between assets, identities, and vulnerabilities, as well as any malicious activity. This helps in understanding the scope of an incident and which assets could potentially have been impacted. The solution also provides the ability to search for a wider presence of the potential misconfiguration based on attributes of the impacted assets.
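Scoping can be thought of as two questions: what can the compromised asset reach, and which other assets share the same risky attributes? The following sketch answers both over a small, made-up inventory. The `ACCESS` and `ASSETS` data, the attribute names, and both helper functions are hypothetical and only meant to show the shape of the analysis.

```python
# Hypothetical access relationships: an edge A -> B means "A can access B".
ACCESS = {
    "vm-web-01": ["managed-identity-web"],
    "managed-identity-web": ["storage-account-prod", "key-vault-prod"],
    "storage-account-prod": ["blob-container-private"],
    "vm-batch-02": ["storage-account-logs"],
}

# Hypothetical asset attributes (values are illustrative only).
ASSETS = {
    "vm-web-01": {"public_ip": True, "identity_role": "Storage Blob Data Contributor"},
    "vm-batch-02": {"public_ip": False, "identity_role": "Reader"},
    "vm-api-03": {"public_ip": True, "identity_role": "Storage Blob Data Contributor"},
}


def blast_radius(edges, compromised):
    """Return every asset reachable from the compromised asset, excluding itself."""
    seen = set()
    stack = [compromised]
    while stack:
        node = stack.pop()
        for neighbor in edges.get(node, ()):
            if neighbor not in seen and neighbor != compromised:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def similar_assets(assets, reference):
    """Return other assets whose attributes exactly match the reference asset."""
    wanted = assets[reference]
    return sorted(
        name for name, attrs in assets.items()
        if name != reference and attrs == wanted
    )


if __name__ == "__main__":
    print(sorted(blast_radius(ACCESS, "vm-web-01")))
    print(similar_assets(ASSETS, "vm-web-01"))
```

The first call scopes the current incident. The second points at workloads that have not been attacked yet but carry the same combination of exposure and entitlement.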
3. Containment, Eradication & Post-Incident Activity – The key goal of these steps is to restore the operating environment to a “known good” state and learn from mistakes made. In the context of cloud, this could mean removing offending API keys, IAM permissions, tokens, or access to encryption keys, blocking network access, and applying the correct detection and protection guardrails. In addition, the SOC analysts need to understand whether malicious changes were made to the operating environment and have enough understanding of the impacted cloud service to guide a cloud operations team effectively through the remediation steps.
So, let’s put this in the context of our example breach use case. The key containment step would be to remove the overly provisioned IAM permissions associated with the managed identity. The offending identity is easily surfaced by Posture Control by drilling down into the attack path and selecting the associated IAM role.
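One simple way to reason about over-provisioning is to compare what an identity has been granted with what it has actually been observed using. The sketch below does exactly that with plain set arithmetic. The permission strings are illustrative values written in the style of Azure action names, and the `GRANTED` and `OBSERVED` lists are made up; in practice these would come from your role definitions and activity logs.

```python
def excess_permissions(granted, used):
    """Return permissions that are granted but were never observed in use."""
    return sorted(set(granted) - set(used))


# Illustrative data only: permissions attached to the managed identity ...
GRANTED = [
    "Microsoft.Storage/storageAccounts/read",
    "Microsoft.Storage/storageAccounts/listKeys/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/delete",
]

# ... and the ones the workload legitimately exercised before the incident.
OBSERVED = [
    "Microsoft.Storage/storageAccounts/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/read",
]

if __name__ == "__main__":
    for permission in excess_permissions(GRANTED, OBSERVED):
        print("candidate for removal:", permission)
```

Anything in the resulting list is a candidate for removal during containment, subject to review with the cloud operations team that owns the workload.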
ZPC provides remediation guidance for detected exposures and continuously tracks asset changes to simplify remediation. The solution also allows us to convert the attack detection and analysis steps into future guardrails that directly feed into the organizational preparedness of being able to detect and mitigate the future occurrence of such an attack.
Zscaler’s integrated CNAPP platform makes network-based eradication a lot simpler. A SOC analyst can simply apply a ZIA/ZPA policy to block malicious communication flows that might be allowing the threat actor access into the operating environment irrespective of the type of cloud workload.
Incident response in the cloud is a complex process that can be significantly simplified by breaking down siloed toolsets and integrating them into the cloud application lifecycle across both build and runtime environments. A CNAPP solution can drive significant value to your security operations team’s capability to assess and respond to cloud threats and significantly reduce mean time to respond (MTTR). Gartner defines Cloud Detection and Response (CDR) as a key function of a CNAPP solution. It is in fact the convergence of the other capabilities (CSPM, CIEM, CWPP) that enables a CNAPP solution to be effective for CDR.
So, what's next? Give us an opportunity to be your mission partner in delivering a secure cloud infrastructure by providing an assessment of your cloud infrastructure today and helping you build security processes that scale to your needs tomorrow.
Learn More about Zscaler Posture Control.
Schedule a demo or try out our platform! Reach out to your friendly Zscaler account manager for more information.