Zscaler Blog

Security Research

When the Scanner Starts Thinking: Learnings from Mythos & GPT 5.5 Cyber in Security Testing

Overview

Frontier AI models like Anthropic Mythos and OpenAI GPT 5.5 Cyber mark a critical inflection point for enterprise security. They unlock transformative potential for security engineers seeking to embed AI into their workflows, but when used by threat actors they also raise the risk for organizations already facing increasingly sophisticated attacks. Mythos and GPT 5.5 Cyber do something fundamentally different from previous models: they reason across attack paths, weigh exploitability, and generate security-relevant workflows. The threat chain remains the same. Attackers will continue to find what’s exposed, break in through a weak point, move laterally, and steal data. What has changed is the expertise required to do so, and the speed and scale at which it can be done.

The question isn't whether these models will impact your security posture; it's whether your team will harness them faster than your attackers. In this blog, we share what we've learned from putting these models to the test at Zscaler: what they can do for your security operations and vulnerability management, and what they mean for your enterprise cyber defenses.

Frontier Model Testing Methodology

To unlock the full potential of frontier AI in security testing, we engineered a purpose-built evaluation framework organized around three core testing harnesses—each designed to mirror real-world attack and defense scenarios.

  1. Think Like an Attacker - Black Box Testing: The model engages the target with zero internal system knowledge, simulating the perspective of a motivated external adversary. Findings validated through this harness are immediately elevated for remediation, given their direct exploitability by malicious actors in the wild.
  2. The Defender's First Take - Artifact & Code Repository Testing: The model conducts deep inspection of source code, compiled binaries, and static files, looking for security weaknesses before they can be weaponized. While this harness yields fewer confirmed findings than its counterparts, we found it uniquely effective at decomposing complex systems and generating high-quality findings for downstream dynamic validation.
  3. The Informed Adversary - Gray Box & White Box Testing: The model conducts its most informed and precise analysis armed with partial or full system context, including threat models, architectural specifications, and results from prior scans. This approach generated the most actionable findings, enabling the model to identify paths to compromise more effectively, although results were heavily influenced by the quality and extent of the context provided.

With this framework in place, we could finally measure what matters. Not whether AI can simply find security issues, but whether frontier AI finds the right ones, faster than any approach before it.

Every run moved through the same pipeline: attack surface mapping, test planning, active testing, dynamic validation, deduplication, triage, ticketing, patching, and validation. We designed the pipeline to capture context at each stage, such as which findings held up under dynamic validation, how severity shifted after deduplication, and how clean the remediation path looked.
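
To make the middle of that pipeline concrete, here is a minimal Python sketch of how findings might flow through dynamic validation, deduplication, and triage. It is an illustration only, not the harness we ran: the `Finding` fields, the severity scale, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass

# Stage names come from the pipeline described above; everything else is hypothetical.
PIPELINE_STAGES = [
    "attack surface mapping", "test planning", "active testing",
    "dynamic validation", "deduplication", "triage",
    "ticketing", "patching", "validation",
]

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    endpoint: str    # where the issue was observed
    weakness: str    # issue class, e.g. "idor"
    severity: str    # one of the SEVERITY_RANK keys
    validated: bool  # did it reproduce under dynamic validation?


def keep_validated(findings):
    """Dynamic validation: drop findings that did not reproduce."""
    return [f for f in findings if f.validated]


def deduplicate(findings):
    """Collapse findings sharing (endpoint, weakness), keeping the highest severity."""
    best = {}
    for f in findings:
        key = (f.endpoint, f.weakness)
        if key not in best or SEVERITY_RANK[f.severity] > SEVERITY_RANK[best[key].severity]:
            best[key] = f
    return list(best.values())


def triage(findings):
    """Order findings from most to least severe."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f.severity], reverse=True)


# Made-up sample data for illustration.
raw = [
    Finding("/api/orders", "idor", "medium", True),
    Finding("/api/orders", "idor", "high", True),   # duplicate of the one above
    Finding("/login", "sqli", "critical", False),   # did not reproduce
    Finding("/export", "ssrf", "high", True),
]
queue = triage(deduplicate(keep_validated(raw)))
for f in queue:
    print(f.severity, f.endpoint, f.weakness)
```

In this toy run the unvalidated "critical" never reaches the queue, and the two reports against the same endpoint collapse into one. That is the kind of severity shift after deduplication the pipeline is meant to capture.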

Figure 1: The three core testing harnesses that we used to evaluate new frontier AI model capabilities.

How Mythos & GPT 5.5 Cyber Models Operate: A Fundamental Shift in Security Reasoning

The defining capability that separates new frontier AI models from conventional security tooling is multi-step reasoning. Rather than returning isolated findings, these models construct complete attack paths—connecting preconditions, privilege states, misconfigurations, and downstream exposures into chains that mirror how real adversaries actually operate.

We pushed these models hard across the full spectrum of security capabilities. Below are the findings:

| Capability | Value to Security Teams |
|---|---|
| Attack Path Analysis | Identifies how separate weaknesses can combine into a viable compromise. |
| Demonstrable Exploitation | Backs findings with working proof-of-concept exploit scripts and independently validates the outcome. |
| Vulnerability Prioritization | Separates theoretical risk from reachable, exploitable exposure so teams focus on what matters. |
| Iterative Analysis | Applies multi-step reasoning across a problem rather than returning pattern-based, one-shot answers. |
| Detection Engineering | Accelerates the creation and refinement of detections, threat hunts, and analytic logic. |
| Investigation Support | Rapidly assists with evidence gathering, summarization, and data analysis for incidents. |
| Remediation Guidance | Recommends controls and corrective actions aligned to likely attacker behavior. |
| Operational Speed | Reduces time from signal to decision, especially in complex environments. |

Of all the capabilities we evaluated, attack chaining and iterative analysis were the most consequential. Frontier models don't just enumerate vulnerabilities; they reason across them, connecting privilege states, misconfigurations, and exposures into plausible, multi-stage attack paths.

The following example illustrates the models' reasoning capabilities.

Multi-Path Attack Chaining: Converging on the Same Objective from Multiple Angles

Mythos and GPT 5.5 Cyber can extend reasoning further than ever before, exploring multiple simultaneous attack paths toward the same adversarial objective. Starting from an initial endpoint mapping, the model branches across independent vulnerability chains, combines vulnerabilities with misconfigurations, preserves intermediate attacker state (credentials, tokens, session data), and converges on a single high-impact outcome.

Figure 2: Three independent paths. One converging outcome. Identified autonomously, with full reasoning chains intact.
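
One way to picture multi-path chaining is as a search over a graph of attacker states, where each edge is a weakness or misconfiguration that moves the attacker from one state to the next. The Python sketch below enumerates every cycle-free chain from an initial endpoint map to a single objective. The graph contents, state names, and the `attack_paths` helper are hypothetical illustrations; they are not output from either model and say nothing about how the models work internally.

```python
# Hypothetical attack graph: each key is an attacker state, and each edge is
# (weakness or misconfiguration used, resulting attacker state).
ATTACK_GRAPH = {
    "endpoint map": [
        ("unauthenticated debug route", "service token"),
        ("weak password reset flow", "user session"),
        ("verbose error reveals internal hostname", "internal host known"),
    ],
    "service token": [("over-privileged service role", "customer data")],
    "user session": [("missing object-level authorization", "admin session")],
    "admin session": [("bulk export without re-authentication", "customer data")],
    "internal host known": [("SSRF in export feature", "cloud credentials")],
    "cloud credentials": [("storage bucket readable by any role", "customer data")],
}


def attack_paths(graph, start, objective):
    """Return every cycle-free chain of steps leading from start to objective."""
    paths = []

    def walk(state, steps, visited):
        if state == objective:
            paths.append(list(steps))
            return
        for weakness, next_state in graph.get(state, []):
            if next_state in visited:
                continue  # never revisit a state, so chains cannot loop
            steps.append((state, weakness, next_state))
            walk(next_state, steps, visited | {next_state})
            steps.pop()

    walk(start, [], {start})
    return paths


paths = attack_paths(ATTACK_GRAPH, "endpoint map", "customer data")
for number, path in enumerate(paths, 1):
    print(f"Path {number}:")
    for state, weakness, next_state in path:
        print(f"  {state} --[{weakness}]--> {next_state}")
```

Each intermediate state (a token, a session, a set of credentials) stands in for the attacker state that is preserved between steps. No single edge in this toy graph reaches the objective on its own; only the chains do.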

Frontier models are better sensors. They detect weaker signals while filtering more noise, and they do it fast. The data was always there. What changed is the ability to resolve it into a complete, actionable picture, something that is difficult or in some cases impossible for a human to do at this scale.

Key Learnings from Testing Mythos & GPT 5.5 Cyber 

Across our benchmarks, frontier models surfaced twice as many high-severity findings, twice as fast as legacy tooling and pen-testing approaches. But the more important outcome is what survived validation. The findings that held up were all actionable, with accurate severity ratings, clear reproduction paths, and remediation guidance grounded in realistic attacker behavior.

Compared to legacy tooling, this was a significant improvement in signal-to-noise ratio.

Key Learnings

  • The differentiator is reasoning depth, not scan speed: Frontier models win by thinking deeper, not scanning faster. They chain isolated, low-severity findings into critical attack paths that legacy tools miss entirely.
  • Context is a double-edged sword: Providing architectural context, threat models, and known weaknesses significantly improved accuracy. But there's a counterintuitive risk: feeding the model examples of previously found issue classes caused it to anchor on those patterns and stop hunting for what hadn't been discovered yet. Ground the model in its environment. Don't lead it to your conclusions.
  • No context inflates severity: Without grounding, models misread dependencies and over-escalate findings. Context-aware reasoning is the minimum bar for meaningful results.
  • Focused, expert-guided workflows outperform broad usage: Untargeted prompting wastes capacity and produces noise. Point the model at specific objectives (vulnerability hunting, code scanning, or targeted analysis) with relevant context. Expert-led, targeted workflows are what separate signal from slop.
  • The harness is the force multiplier: Model quality is table stakes. The real force multiplier is embedding frontier AI into structured, repeatable test harnesses. Our most effective workflows evolved from a core set developed by Product Security and refined by Security Champions across engineering teams.
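
The point about ungrounded severity can be illustrated with a small Python sketch. It assumes a simple rule set: a finding keeps its reported severity only when the flagged code is actually invoked and the component is reachable, and a finding with no context at all is flagged for verification rather than trusted. The class names, fields, and downgrade rules are assumptions made for this example, not the scoring logic used in our testing.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITIES = ["low", "medium", "high", "critical"]


@dataclass
class ReportedFinding:
    title: str
    reported_severity: str  # severity as first reported, before grounding
    component: str


@dataclass
class ComponentContext:
    internet_reachable: bool       # can an external attacker reach the component?
    vulnerable_code_invoked: bool  # is the flagged code path actually called?


def grounded_severity(finding: ReportedFinding, context: Optional[ComponentContext]):
    """Return (severity, note). Hypothetical rules, for illustration only."""
    if context is None:
        # No grounding available: keep the reported value but mark it as unverified.
        return finding.reported_severity, "ungrounded: verify before escalating"
    level = SEVERITIES.index(finding.reported_severity)
    if not context.vulnerable_code_invoked:
        level = 0  # theoretical only: the flagged code is never called
    elif not context.internet_reachable:
        level = max(level - 1, 0)  # exploitable, but not directly reachable
    return SEVERITIES[level], "grounded"


finding = ReportedFinding(
    title="Deserialization flaw in bundled library",
    reported_severity="critical",
    component="report-service",
)
print(grounded_severity(finding, None))
print(grounded_severity(finding, ComponentContext(internet_reachable=False, vulnerable_code_invoked=True)))
print(grounded_severity(finding, ComponentContext(internet_reachable=True, vulnerable_code_invoked=False)))
```

The same reported "critical" lands at three different priorities depending on what is known about the component, which is why context-aware reasoning is the minimum bar for meaningful results.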

How Security Leaders Can Prepare

Frontier AI capability is spreading quickly. The challenge will no longer be access to the models, but instead how to use them defensively before your adversaries use them to attack. Defenders need to prepare for this inevitable crossroads now.

The following high-impact recommendations go beyond active vulnerability management and can start reducing your risk today:

  1. Hide your apps: Reduce your external exposure by moving your applications behind a Zero Trust Architecture like Zscaler Private Access. Attackers can’t breach what they can’t reach.
  2. Understand your assets and associated risks: Establish complete visibility of exposed and internal assets, including AI assets. This is where Zscaler can help with AI Asset Management, Asset Exposure Management, External Attack Surface Management, and Unified Vulnerability Management, powered by AI.
  3. Prioritize deploying proactive defense with Deception: An AI-driven attacker will explore multiple paths to reach the action-on-objective stage and, in the process, is likely to trigger carefully planted decoys in the environment. Zscaler customers can deploy our built-in Deception technology to automatically cut off the offending asset or identity from all real applications while capturing its full activity in the decoy environment.
  4. Prioritize a Zero Trust Everywhere architecture: Apply Zero Trust consistently across remote and on-prem environments. Enforce user-to-application segmentation to prevent lateral propagation and reduce the blast radius from AI-driven attacks.
  5. AI red teaming and guardrails for your production models: Treat your production AI like a real attack surface. Protect it from prompt injection, toxic content, hallucinations, and model drift over time.
  6. AI-Powered Exposure Management: Prioritize remediation and patching using Zscaler Exposure Management Remediation Agent for high-risk areas (applicable to both external and internal assets).
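
The deception recommendation rests on a simple idea: no legitimate workflow touches a decoy, so any touch is a high-confidence signal that justifies containment. The Python sketch below is a toy model of that logic. The `DecoyMonitor` class, resource names, and return values are hypothetical and do not represent the Zscaler Deception API or its actual behavior.

```python
class DecoyMonitor:
    """Toy model: touching a decoy contains the identity that touched it."""

    def __init__(self, real_apps, decoys):
        self.real_apps = set(real_apps)
        self.decoys = set(decoys)
        self.contained = set()  # identities cut off from real applications
        self.decoy_log = []     # (identity, resource) activity captured on decoys

    def request(self, identity, resource):
        """Return 'allow', 'deny', or 'decoy' for an access attempt."""
        if resource in self.decoys:
            # Any decoy touch is treated as hostile: contain and keep recording.
            self.contained.add(identity)
            self.decoy_log.append((identity, resource))
            return "decoy"
        if identity in self.contained or resource not in self.real_apps:
            return "deny"
        return "allow"


monitor = DecoyMonitor(
    real_apps={"payroll", "crm"},
    decoys={"backup-admin", "legacy-hr-db"},
)
results = [
    monitor.request("svc-build", "crm"),           # normal access
    monitor.request("svc-build", "backup-admin"),  # decoy touched
    monitor.request("svc-build", "payroll"),       # contained identity is refused
    monitor.request("alice", "payroll"),           # other identities are unaffected
]
print(results)
```

An attacker that explores many paths in parallel has more chances to touch a decoy, so this kind of tripwire can get more effective as the exploration gets broader.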
     

Conclusion 

AI is moving from a simple assistant to a mission-critical operational capability. That creates both opportunity and urgency. Defenders now have the chance to improve speed, precision, and scalability in ways that were difficult to achieve with human effort alone. At the same time, adversaries will pursue the same advantages.

The organizations that lead in this next phase will be those that combine frontier AI with strong architecture, trusted context, and disciplined enforcement.

At Zscaler, we believe this is where frontier cyber models and Zero Trust naturally converge. The future of cyber defense will not be defined by more alerts or more dashboards. It will be defined by systems that understand exposure, reason across attack paths, and help defenders act faster and more precisely than the adversary. That is the future security teams should be preparing for now.

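Disclaimer: This blog was created by Zscaler for informational purposes only and is provided "as is." No guarantee is made as to the accuracy, completeness, or reliability of its contents. Zscaler assumes no responsibility for any errors or omissions in the information in this blog, or for any actions taken based on that information. Third-party websites and resources linked in this blog are provided for convenience only, and Zscaler assumes no responsibility for their content or operation. All content is subject to change without notice. By accessing this blog, you agree to these terms and acknowledge that you verify and use the information at your own risk.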
