
Anthropic is expanding Project Glasswing to critical infrastructure, alongside OpenAI’s Trusted Access for Cyber programme.

 

The Final Mile Problem: AI Can Find ICS/OT Vulnerabilities Faster Than We Can Safely Fix Them

Frontier AI has changed the economics of vulnerability discovery.

Table of Contents

  1. Discovery is accelerating

  2. Patch development is the first capacity constraint

  3. Triage cannot stop at CVSS

  4. Simulation and testing are where AI can help safely

  5. Scheduling is the hard wall

  6. Installation is the final mile

  7. Verification closes the loop

  8. The fast-follower problem is geopolitical, not only commercial

  9. What needs to change

  10. How DeNexus can help

That shift was the concern behind my previous article on OT technical debt: industrial environments are already carrying a backlog of ageing systems, unpatched vulnerabilities, unsupported software, and legacy dependencies that cannot be replaced just because the cybersecurity team would prefer a cleaner architecture [1]. Rosa Kariger’s article on Mythos and Project Glasswing raised the next question: what happens when AI models can find serious flaws faster than the OT ecosystem can absorb them [2]?

That question has now moved from hypothetical to operational. Anthropic has expanded Project Glasswing "to approximately 150 new organizations across more than 15 countries, covering industries like power, water, healthcare, communications, and hardware" [3].

That sounds reassuring, but it is also imprecise. Sector framing ≠ remediation scope. Critical infrastructure is not a software category. You do not patch “the water sector” or “the power sector”. You patch specific products, from specific suppliers, deployed in specific architectures, running specific versions, under specific operational constraints.

For ICS/OT asset owners, the important question is not whether Glasswing has expanded to critical infrastructure in the abstract. The important question is which product suppliers, maintainers, hardware providers, remote-access vendors, industrial software vendors, and industrial automation suppliers are now seeing AI-discovered vulnerabilities in code that may already be deployed inside plants.

The vendors are not publicly named. That may be necessary. If a supplier is identified too early, asset owners may reasonably ask where the patches are, why they are not ready, and how long the vulnerability has been known. But suppliers also need time to understand the findings, validate them, build fixes, test those fixes, write useful advisories, and support customers through safe deployment.

This is the heart of the issue: Mythos, GPT-5.5-Cyber, and the models that follow them may accelerate vulnerability discovery, but they do not eliminate the operational waterfall that turns discovery into risk reduction.

The vulnerability-management waterfall is the story

For years, vulnerability management has often been discussed as if discovery were the hard part. Find the CVE. Score the CVE. Patch the CVE. Close the ticket.

That works poorly in IT. It works even worse in OT.

In industrial environments, vulnerability management is not a single action. It is a waterfall:

Discovery → Patch Development → Triage / Simulation → Testing → Scheduling → Install → Verification

AI can help in several of these steps. It can accelerate discovery. It can help suppliers understand flawed code. It can support triage, simulation, test-case generation, exploitability analysis, dependency mapping, and documentation. It can make the front half of the process smarter and faster.

But AI cannot replace the humans, engineering judgement, outage planning, vendor accountability, safety analysis, operational discipline, and plant-specific verification required in the back half.

That is where the real bottleneck lives.

1. Discovery is accelerating

Project Glasswing is important because it shows what happens when frontier AI is pointed at real codebases. Anthropic says Project Glasswing partners have identified more than 10,000 high- or critical-severity flaws using Claude Mythos Preview. In its initial update, Anthropic also reported that of 1,752 high- or critical-rated findings carefully assessed, 90.6% were valid true positives and 62.4% were confirmed high or critical [4].
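To give a sense of scale, the percentages above can be turned into approximate counts. This is only back-of-the-envelope arithmetic, and it assumes both percentages use the 1,752 assessed findings as their denominator, which is how the sentence reads but is not spelled out here. The variable names are illustrative.

```python
# Approximate counts implied by the percentages quoted above.
# Assumption: both percentages share the 1,752 assessed findings as denominator.
assessed = 1752
valid_rate = 0.906          # share reported as valid true positives
high_critical_rate = 0.624  # share reported as confirmed high or critical

valid = round(assessed * valid_rate)
high_critical = round(assessed * high_critical_rate)

print(f"valid true positives: ~{valid}")
print(f"confirmed high or critical: ~{high_critical}")
```

Under that assumption, the assessed sample alone contains roughly 1,587 valid findings and roughly 1,093 confirmed high or critical ones, each of which still has to travel the rest of the waterfall.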

OpenAI is moving in the same direction with GPT-5.5-Cyber through its Trusted Access for Cyber programme. OpenAI describes this as limited access for verified defenders, including those responsible for critical infrastructure, with controls intended to enable authorised vulnerability validation while restricting malicious activity [5].

This is good news for defenders. It is also uncomfortable news for everyone responsible for industrial control systems and operational technology.

The issue is not that AI will suddenly make OT vulnerable. OT is already vulnerable. The issue is that AI can reveal those vulnerabilities faster than the OT ecosystem can safely absorb, prioritise, patch, and verify them. OT environments are already very far behind; in real ICS/OT telemetry, I have seen vulnerabilities with patches more than 2,000 days old still outstanding.

More discovery does not automatically create more capacity. It does not create more outage windows. It does not create more control engineers. It does not create more vendor support. It does not create more executive appetite for operational disruption.

It simply adds more work to a system that was already saturated. So, what can we do?

2. Patch development is the first capacity constraint

Once a vulnerability is discovered, the burden shifts to the product supplier, developer, open-source maintainer, or embedded component provider.

They must confirm the issue, reproduce it, understand affected versions, assess exploitability, decide severity, determine whether the vulnerability sits in proprietary code or third-party dependencies, and then create a fix that does not break customer environments.

That is not trivial in industrial systems.

Many OT products are deployed in regulated, safety-sensitive, high-availability environments. A rushed patch can create more operational risk than the vulnerability it attempts to fix. A supplier cannot simply “move fast” when the customer environment may include refineries, power plants, chemical facilities, water treatment systems, pharmaceutical manufacturing, rail networks, pipelines, and large-scale manufacturing lines.

AI can help suppliers review code, generate candidate fixes, identify similar bug patterns, map affected components, and draft disclosure material. But the supplier still owns the engineering decision. The supplier still owns regression risk. The supplier still owns the support burden. The supplier still has to ship something customers can trust.

In the AI era, suppliers may not be constrained by the ability to find bugs. They may be constrained by their ability to safely fix them.

Over time, the better answer is not merely to patch faster after release. It is to integrate AI-assisted vulnerability discovery into the secure development lifecycle so fewer hidden vulnerabilities reach production code in the first place. That will take time. Legacy ICS/OT codebases, compatibility requirements, and long product lifecycles will not be remediated overnight.

3. Triage cannot stop at CVSS

Once a vulnerability or advisory reaches the asset owner, the next question is not simply: “Is this critical?”

The better question is: “Does this vulnerability materially increase risk in our environment?”

For OT, that requires context. Is the vulnerable asset actually present? Is the affected version deployed? Is the vulnerable function enabled? Is the system reachable from an attacker-controlled path? Is there a known exploit? Is it in CISA’s Known Exploited Vulnerabilities catalogue [9]? Does it sit on a Level 3 remote-access pathway, a historian, an engineering workstation, a domain controller, a firewall, or a Level 1 controller? What process does it support? Could compromise lead to downtime, unsafe operation, equipment damage, product loss, environmental release, or injury?
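The first few questions above can be read as an applicability filter that runs before any consequence analysis. The sketch below is one hypothetical way to encode them; the `AssetContext` fields, the `triage_bucket` function, and the bucket labels are all assumptions made for illustration, not part of any standard or product.

```python
from dataclasses import dataclass


@dataclass
class AssetContext:
    """Hypothetical per-asset facts gathered for one advisory."""
    asset_present: bool
    affected_version_deployed: bool
    vulnerable_function_enabled: bool
    reachable_from_untrusted_path: bool
    known_exploit: bool
    in_cisa_kev: bool


def triage_bucket(ctx: AssetContext) -> str:
    """Place one advisory/asset pair into a coarse bucket (illustrative labels)."""
    # No affected asset or version means the advisory does not apply here.
    if not (ctx.asset_present and ctx.affected_version_deployed):
        return "not applicable"
    # Present, but the vulnerable function is switched off: record and monitor.
    if not ctx.vulnerable_function_enabled:
        return "monitor"
    # Reachable and exploited in practice: escalate for consequence analysis.
    if ctx.reachable_from_untrusted_path and (ctx.known_exploit or ctx.in_cisa_kev):
        return "escalate"
    return "assess consequence"
```

The filter deliberately stops short of the process questions (downtime, unsafe operation, equipment damage), because those need engineering input rather than a boolean.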

This is where severity scoring alone is insufficient. NIST’s patch-management guidance recognises patching as a process of identifying, prioritising, acquiring, installing, and verifying updates [6]. In OT, the prioritisation step must also account for process consequence, exposure, outage windows, compensating controls, and recovery confidence.

A CVSS 9.8 vulnerability on an isolated asset with strong compensating controls may be less urgent than a CVSS 7.5 vulnerability on an exposed remote-access system used to reach multiple plants. Likewise, a vulnerability affecting a safety-critical or production-critical process may deserve more attention than its technical score suggests.
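A toy calculation shows how context can invert a CVSS ranking. The function and every weight in it are invented for illustration; it is not a published scoring method, and a real model would be calibrated against process consequence rather than fixed multipliers.

```python
def contextual_priority(cvss: float, exposed: bool,
                        compensating_controls: bool, plants_reachable: int) -> float:
    """Toy score: CVSS scaled by exposure, controls, and reach. Weights are invented."""
    exposure = 1.0 if exposed else 0.2                 # isolated assets are discounted
    controls = 0.5 if compensating_controls else 1.0   # strong controls halve the score
    reach = 1.0 + 0.25 * max(plants_reachable - 1, 0)  # each extra plant adds weight
    return cvss * exposure * controls * reach


# The two cases described above: an isolated, well-controlled asset versus
# an exposed remote-access system that reaches several plants.
isolated = contextual_priority(9.8, exposed=False,
                               compensating_controls=True, plants_reachable=1)
remote_access = contextual_priority(7.5, exposed=True,
                                    compensating_controls=False, plants_reachable=4)

print(f"isolated CVSS 9.8 asset: {isolated:.2f}")
print(f"exposed CVSS 7.5 remote access: {remote_access:.2f}")
```

With these assumed weights, the CVSS 7.5 remote-access system outranks the CVSS 9.8 isolated asset, which is the point of the paragraph: the ordering depends on context, not on the base score alone.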

This is where DeNexus’ DeRISK QVM (Quantified Vulnerability Management) becomes relevant [www.denexus.io/derisk-platform]. The operational question is no longer, “How bad is this CVE?” It is, “How much financial loss potential does this vulnerability contribute to our environment, and how much does that loss potential fall if we remediate it?”

That changes the decision. Instead of ranking vulnerabilities only by technical severity, asset owners can evaluate vulnerabilities based on business impact through Cyber Risk Quantification: annualised loss expectancy, value at risk, tail loss, and expected loss reduction. If patching a vulnerability materially reduces expected financial loss, that patch competes differently for scarce maintenance windows, engineering effort, and management attention.
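For readers less familiar with these metrics, the sketch below computes them from simulated annual losses. It is a generic Monte Carlo illustration, not a description of how DeRISK or any other product works: the exponential loss distribution, the event probabilities, the mean loss, and the function names are all assumptions.

```python
import random


def simulate_annual_losses(event_probability: float, mean_loss: float,
                           years: int = 10_000, seed: int = 7) -> list[float]:
    """One loss figure per simulated year: zero, or an exponentially distributed loss."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_loss) if rng.random() < event_probability else 0.0
            for _ in range(years)]


def risk_metrics(losses: list[float], confidence: float = 0.99) -> dict[str, float]:
    """Summarise simulated losses as ALE, value at risk, and tail loss."""
    ordered = sorted(losses)
    cutoff = int(confidence * len(ordered))
    tail = ordered[cutoff:]
    return {
        "ale": sum(ordered) / len(ordered),  # annualised loss expectancy (mean)
        "var": ordered[cutoff],              # loss at the chosen confidence level
        "tail_loss": sum(tail) / len(tail),  # mean loss across the worst years
    }


# Invented scenario: remediation lowers the yearly event probability from 10% to 4%.
baseline = risk_metrics(simulate_annual_losses(0.10, 2_000_000))
remediated = risk_metrics(simulate_annual_losses(0.04, 2_000_000))
expected_loss_reduction = baseline["ale"] - remediated["ale"]

print(f"expected loss reduction: {expected_loss_reduction:,.0f}")
```

The difference between the two ALE figures is the expected loss reduction that a patch would be credited with when it competes for a maintenance window.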

In OT, the winning prioritisation model is not “patch all criticals first”. It is “reduce the most financially material risk with the least operational disruption”.
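One simple way to express that model is to rank candidate changes by loss reduction per outage hour and fill the available window greedily. The sketch below assumes each candidate has a positive outage estimate; the names and figures are invented, and a greedy pick is a heuristic rather than an optimal schedule.

```python
from dataclasses import dataclass


@dataclass
class Remediation:
    name: str
    loss_reduction: float  # expected annual loss avoided, in currency units
    outage_hours: float    # downtime needed to install and verify (must be > 0)


def plan_window(candidates: list[Remediation], window_hours: float) -> list[Remediation]:
    """Greedy pick: best loss reduction per outage hour first, until the window is full."""
    ranked = sorted(candidates,
                    key=lambda r: r.loss_reduction / r.outage_hours,
                    reverse=True)
    plan, used = [], 0.0
    for item in ranked:
        if used + item.outage_hours <= window_hours:
            plan.append(item)
            used += item.outage_hours
    return plan


# Invented candidates for a single eight-hour maintenance window.
candidates = [
    Remediation("patch remote-access gateway", 900_000, 2),
    Remediation("patch historian", 150_000, 4),
    Remediation("firmware update on PLC", 300_000, 12),
    Remediation("patch engineering workstation", 250_000, 1),
]
plan = plan_window(candidates, window_hours=8)
print([item.name for item in plan])
```

In this example the gateway, workstation, and historian fit into the eight hours, while the PLC firmware update waits for a longer outage even though its absolute loss reduction is the second largest.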

4. Simulation and testing are where AI can help safely

The middle of the waterfall is where AI may provide some of its most useful defensive value.

AI can help convert vulnerability advisories into asset-specific questions. It can summarise vendor bulletins. It can identify affected versions. It can help map dependencies. It can draft test plans. It can generate regression-test ideas. It can compare configurations against recommended mitigations. It can support tabletop exercises and simulation. It can help security and engineering teams ask better questions before a change reaches production.

This matters because OT patching is rarely just a cybersecurity task. It is an engineering change.

Before a patch is installed, teams may need to test whether the HMI still communicates with the PLC; whether the historian still collects data; whether the engineering workstation can still upload and download logic; whether the batch system still runs recipes correctly; whether the remote-access pathway still works for approved vendors; whether the safety system remains independent; and whether backups, images, and rollback plans are valid.

AI can help prepare for that work.

It cannot certify safety.

It cannot know every plant-specific dependency. It cannot take accountability for a process trip. It cannot decide whether the production line should stop on Friday night or wait until the next planned outage. It cannot replace the judgement of operations, process safety, control engineers, reliability teams, and plant leadership.

AI can reduce uncertainty before the outage. It cannot make an unsafe outage safe.
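The pre-install questions in this section can be tracked as a simple checklist whose results feed the change review. The sketch below is a hypothetical structure: the check names come from the list above, but the lambda bodies are placeholders standing in for tests that engineers would run on a test bench. A clean result records evidence for human review; it does not certify safety.

```python
from typing import Callable

# Each check is a named function returning True on pass.
Check = Callable[[], bool]


def run_pre_install_checks(checks: dict[str, Check]) -> list[str]:
    """Run every check and return the names of those that failed."""
    failed = []
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            passed = False  # a check that errors is treated as a failure
        if not passed:
            failed.append(name)
    return failed


# Placeholder results; real checks would exercise the patched test system.
checks = {
    "HMI communicates with PLC": lambda: True,
    "historian collects data": lambda: True,
    "engineering workstation uploads and downloads logic": lambda: True,
    "batch system runs recipes": lambda: True,
    "approved vendor remote access works": lambda: True,
    "safety system remains independent": lambda: True,
    "backups, images, and rollback plan validated": lambda: False,
}
failed = run_pre_install_checks(checks)
ready_for_change_review = not failed
print(failed)
```

Here the unvalidated rollback plan blocks the change from going forward to review, which is the behaviour a plant would want from any such gate.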

5. Scheduling is the hard wall

This is where the waterfall slows down.

In IT, patching often means inconvenience. In OT, patching may mean downtime, lost production, safety review, vendor support, backup validation, rollback planning, and a narrow maintenance window that may only appear quarterly, annually, or during a turnaround.

Some systems cannot reboot casually. Some cannot be changed without vendor presence. Some have no supported patch path because they are end of life. Some run old operating systems because the application or hardware depends on them. Some are reliable precisely because nobody has touched them in years.

That reality is frustrating, but it is real.

It is also why more vulnerability discovery can have a demoralising effect. Many asset owners already have 1,000-day-old patches waiting to be installed. Adding more AI-discovered vulnerabilities to the pile does not automatically create more outage windows, more engineers, more spare parts, more vendor support, or more executive appetite for operational disruption.

At some point, the backlog becomes psychologically unmanageable and cost-prohibitive. It starts to feel like global hunger or climate change: too large to solve, too complex to explain, and too overwhelming to act on.

That is dangerous.

If vulnerability management becomes an infinite queue, asset owners may disengage. The challenge becomes too big, so the organisation stops believing it can make meaningful progress.

That is why prioritisation, quantification, and defensible risk reduction matter. The goal cannot be to patch everything. The goal must be to identify the changes that reduce the most risk and execute them safely. That turns the conversation from “a thousand vulnerabilities are waiting in my patching queue” to “if I fix 2% of my vulnerabilities, I will reduce my financial risk by 76%”. That is a totally different game.
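The arithmetic behind a statement like “2% of vulnerabilities, 76% of risk” is a cumulative sum over vulnerabilities ranked by loss contribution. The sketch below uses an invented, heavily skewed portfolio constructed to mirror that example; the figures are not real data, and the function name is hypothetical.

```python
def smallest_set_for_share(loss_contributions: list[float], target_share: float) -> int:
    """How many top-ranked vulnerabilities are needed to cover target_share of total loss."""
    total = sum(loss_contributions)
    covered = 0.0
    for count, loss in enumerate(sorted(loss_contributions, reverse=True), start=1):
        covered += loss
        if covered >= target_share * total:
            return count
    return len(loss_contributions)


# Invented portfolio: 1,000 vulnerabilities, 20 of which dominate the loss.
contributions = [38_000.0] * 20 + [240.0] * 980
needed = smallest_set_for_share(contributions, target_share=0.76)

print(f"{needed} of {len(contributions)} vulnerabilities "
      f"({needed / len(contributions):.0%}) cover 76% of modelled loss")
```

In this constructed portfolio, 20 of 1,000 vulnerabilities cover the 76% target. Whether a real environment is that skewed is an empirical question that only quantification can answer.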

6. Installation is the final mile

This is the point most AI commentary misses. Nobody is talking enough about how to address an exploding backlog after the vulnerability has been discovered and the patch exists.

A model can find the flaw. A supplier can develop the patch. A security team can triage the risk. An engineer can test the change. A manager can approve the maintenance window.

But someone still has to install it.

That is the final mile problem.

The patch is not done until it is installed, verified, and safe to leave in place.

For asset owners, this is where the real cost appears. Installation requires people. It requires planning. It requires access. It requires backups. It requires rollback. It requires coordination with operations, integrators, test labs, and the vendor. It requires acceptance that something could go wrong.

AI does not remove that burden. It may make the process better prepared, but it does not perform the operational act of changing a live industrial environment without consequence.

This is especially important for Purdue Level 3 and above.

Many OT discussions focus on controllers, sensors, actuators, and production equipment at Levels 0, 1, and 2. Those layers are genuinely difficult to patch. But Level 3 and above contain many of the systems attackers target: firewalls, VPNs, remote-access platforms, jump hosts, Active Directory, virtualisation platforms, backup servers, historians, and engineering workstations.

These systems should not have the same operational constraints as a controller running a physical process. Yet they often carry the same backlog culture.

That needs to change. A lot of ICS/OT infrastructure can be patched because it does not have the same operational outage constraints as the lower levels of the control environment.

As AI accelerates vulnerability discovery, the Level 3+ patching burden will increase. Perimeter and remote-access systems will remain prime targets because they offer scale, reach, and leverage. Ransomware groups will use them to get deeper. Hacktivists will use them to disrupt. State-aligned and military operators will use them to prepare access, persistence, and potential damage.

The days when firewalls and antivirus were considered enough are gone. Perimeter security still matters, but it is no longer a strategy. OT organisations must assume breach and prepare detection, segmentation, containment, backup integrity, incident response, and recovery plans that are realistic enough to survive contact with the plant floor.

7. Verification closes the loop

Verification is often treated as administrative clean-up. In OT, it is much more important.

After installation, teams need to confirm the patch applied correctly. They need to confirm the vulnerable version is gone. They need to confirm compensating controls remain in place. They need to confirm the industrial process still behaves as expected. They need to confirm no new failure mode was introduced. They need to update asset records, vulnerability records, risk models, maintenance logs, and audit evidence.

AI can help here as well. It can compare expected and observed states. It can support evidence collection. It can summarise change records. It can flag inconsistencies. It can help update risk registers and management reporting.

But it cannot replace operational verification.

In OT, “the system came back online” is not the same as “the system is safe, stable, and correctly functioning”.
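Comparing expected and observed states, as mentioned above, can be as simple as a field-by-field difference that is handed to the engineer who signs off the change. The field names and values below are hypothetical; real evidence would come from asset inventory and vendor tooling.

```python
def verification_gaps(expected: dict[str, str],
                      observed: dict[str, str]) -> dict[str, tuple]:
    """Return fields whose observed value is missing or differs from the expected value."""
    return {
        key: (want, observed.get(key))
        for key, want in expected.items()
        if observed.get(key) != want
    }


# Hypothetical post-install record for one asset.
expected = {"firmware_version": "4.2.1", "remote_access": "disabled", "historian_link": "up"}
observed = {"firmware_version": "4.2.1", "remote_access": "enabled", "historian_link": "up"}

gaps = verification_gaps(expected, observed)
print(gaps)
```

Here the patch applied, but a compensating control drifted during the change. A check like this flags the inconsistency; confirming that the process itself behaves correctly remains an operational task.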

8. The fast-follower problem is geopolitical, not only commercial

The near-term concern is not only what Anthropic or OpenAI do.

Both companies are using controlled access models, vetted participants, and safety guardrails. Those controls matter. But they do not end the risk.

The real fast-follower concern is that other actors will develop, steal, buy, or obtain similar cyber-capable models without the same restrictions. That may include hostile states, military cyber units, intelligence services, criminal ecosystems, or loosely governed models outside western regulatory reach.

If western defenders get a temporary head start, they should use it.

The window may be short.

Once similar capabilities are available to less constrained actors, the defender advantage shrinks. Adversaries will not need to discover every vulnerability themselves. They will need to identify the ones that are exposed, unpatched, reachable, and valuable. They will use AI to accelerate exploit development, patch diffing, dependency analysis, targeting, phishing, reconnaissance, and attack-path planning.

For OT, the question is not whether AI will create more vulnerability information. It will.

The question is whether defenders can turn that information into risk reduction before adversaries turn it into access.

9. What needs to change

The answer is not to panic, and it is not to pretend that every new AI-discovered vulnerability can be patched immediately.

Industrial organisations need a more realistic vulnerability-management model for the AI era.


First, suppliers need to prepare for AI-scale discovery. That means stronger product security teams, better SBOMs, clearer affected-version mapping, faster validation, more useful advisories, realistic mitigations, and patch-development capacity that matches the new discovery rate.

Second, suppliers need to bring the same capability upstream into development. At first, suppliers will be on their heels trying to catch up with an explosion of discoveries in existing code. The long-term answer is to make AI-assisted discovery part of the secure development lifecycle: find and fix more vulnerabilities before compile, before release, and before they become an asset-owner problem.

Third, asset owners need better visibility. They need OEMs to make affected-version mapping easier, and they need cyber vendors to be more accurate. You cannot triage what you cannot see. Vulnerability management depends on knowing which assets exist, which versions are installed, which dependencies are present, which communication paths are possible, and which processes could be affected.

Fourth, prioritisation must become consequence-based. CVSS, EPSS, KEV, vendor severity, and exploit availability all matter, but OT needs to connect them to business and physical consequences. Forescout reported 508 CISA ICS advisories covering 2,155 CVEs in 2025, while Dragos’ risk-based “Now, Next, Never” approach concluded that only a small minority of OT vulnerabilities require immediate action [7] [8]. That is exactly the point: OT needs to separate noise from risk, quickly and defensibly. Asset owners need a tool, such as DeRISK QVM, that can prioritise vulnerabilities and assess newly discovered ones quickly enough to keep pace with discovery. Manual assessments and tribal knowledge are no longer an option, if they ever were.

Fifth, Level 3+ environments need to become more patchable. Remote access, identity, virtualisation, backups, network infrastructure, and engineering workstations are often where compromise begins or expands. They should be managed with more IT-like discipline, even when they support OT.

Sixth, recovery must be treated as a risk-reducing control. If an organisation cannot patch quickly, it must be able to detect, contain, restore, and verify quickly. That includes known-good backups, controller logic, golden images, offline recovery procedures, spare parts, vendor support, and incident-response exercises that include operations and engineering.

Finally, the industry needs to talk honestly about the final mile. AI can help find the vulnerabilities. Suppliers can develop the patches. Security teams can prioritise the work. But asset owners still inherit the final-mile problem.

That is where the risk becomes real.

Conclusion

Mythos, GPT-5.5-Cyber, and the models that follow them are not the end of vulnerability management. They are the beginning of a harder version of it.

They will make discovery faster. They will make old code look worse. They will expose open-source dependencies buried inside commercial products. They will increase pressure on suppliers. They will increase pressure on asset owners. They will increase pressure on the already fragile patching process used across industrial environments.

But they will not create more outage windows. They will not create more control engineers. They will not certify safety. They will not reboot a mission-critical system without consequence. They will not turn a 30-year operational architecture into a modern, patchable platform overnight.

The AI era does not eliminate OT patch debt.

It increases the interest rate.

The organisations that succeed will not be the ones that try to patch everything. They will be the ones that can see their exposure clearly, quantify which vulnerabilities matter, reduce the most material risk first, execute changes safely, and do it at the scale and pace set by the speed of discovery.

AI can accelerate the discovery of risk.

It is still up to humans to reduce it.


10. How DeNexus can help

If AI is going to increase the speed and volume of vulnerability discovery, industrial companies need an equally fast way to decide which vulnerabilities actually matter. That is where DeNexus helps.

The bottleneck in OT vulnerability management has never been the absence of findings. It has been the absence of a disciplined way to separate the findings that threaten the business from the ones that do not. As Rosa Kariger noted in her earlier analysis of Project Glasswing, the challenge is not finding problems. It is knowing which problems actually threaten the business — and which fixes are worth funding first.

DeRISK QVM: sorting vulnerabilities by expected loss reduction, not severity score

DeRISK QVM (Quantified Vulnerability Management) translates every CVE in the environment into a dollar value: its estimated financial loss contribution, its Value at Risk impact, and the expected loss reduction if it is remediated. Vulnerabilities are ranked by expected loss reduction — not by CVSS score.

The process works in four steps. CVEs are ingested with their CVSS and EPSS scores and matched against the client's specific asset inventory and network context. Each CVE is cross-referenced against active ICS-CERT and CISA advisories. Then, using a patent-pending AI pipeline, each CVE is automatically mapped to the applicable MITRE ATT&CK for Enterprise and MITRE ATT&CK for ICS techniques — modelling realistic threat paths, not theoretical ones. Finally, the ATT&CK-mapped CVEs are fed into a simulation-twin of the network that computes VaR and estimated financial loss per vulnerability.

CVSS sorts by severity. DeRISK QVM sorts by expected loss reduction. Different question. Different answer. Different outcomes.

This distinction matters most in an AI-accelerated environment. Only 1–2% of vulnerabilities drive 90% of actual risk. When Mythos and comparable tools are surfacing thousands of findings across critical infrastructure codebases, that 1–2% is the only place where a finite maintenance window should be spent. DeRISK QVM is the product that surfaces which 1–2%.

Asset owners can also simulate remediation before committing an outage window — comparing alternative postures (patch this set of CVEs, segment this zone, apply this compensating control) and seeing the quantified financial loss reduction for each option against a baseline. The winning prioritisation model in OT is not "patch all criticals first." It is "reduce the most financially material risk with the least operational disruption."

DeRISK QVM has more than 200 deployments amongst our CRQ customers. It integrates natively with Forescout, Nozomi Networks, Claroty, and Tenable, drawing CVE context directly from the client's existing OT telemetry.

 

The broader DeRISK Platform

DeRISK QVM is one product within the broader DeRISK Platform. DeRISK CRQ extends the same financial quantification engine to security investment decisions at the facility and portfolio level — simulating which controls and projects reduce annualised loss expectancy and Value at Risk the most per dollar spent. Learn more at denexus.io/derisk-platform.

The financial loss conversation your board needs to have

Most CISOs and CFOs approach cyber risk the same way: threat scores, risk ratings, red-amber-green dashboards. Those tools measure the likelihood and severity of an incident in qualitative terms. They do not answer the question a CFO actually needs answered: if this incident happens, what does it cost us — and what would it have cost us to prevent it?

DeNexus changes that conversation. DeRISK QVM identifies which specific vulnerabilities in your environment contribute the most to expected financial loss. DeRISK CRQ identifies which security projects reduce that expected loss the most. Together, they give industrial organisations the ability to walk into a board meeting and say: patching these CVEs reduces our annualised expected loss by $X. Funding this segmentation project reduces our Value at Risk by $Y. Leaving this backlog unaddressed carries a tail-loss exposure of $Z.

That is not a cyber risk conversation. That is a capital allocation conversation — and it is the only framing that competes for budget alongside every other financial decision an industrial company makes.

AI can accelerate discovery. DeNexus translates that discovery into the financial decisions that actually reduce loss.

References

[1] Tindill, Donovan. “We’re All Going to Drown in Tech Debt. Some of Us Just Got Here Sooner.” LinkedIn, 2026. https://www.linkedin.com/pulse/were-all-going-drown-tech-debt-some-us-just-got-here-sooner-tindill-lxxic/

[2] Kariger, Rosa. “Mythos and Project Glasswing: The OT Blind Spots.” DeNexus, 2026. https://www.denexus.io/resources/mythos-and-project-glasswing-the-ot-blind-spots

[3] Anthropic. “Expanding Project Glasswing.” Anthropic, 2 June 2026. https://www.anthropic.com/news/expanding-project-glasswing

[4] Anthropic. “Project Glasswing: An Initial Update.” Anthropic, 22 May 2026. https://www.anthropic.com/research/glasswing-initial-update

[5] OpenAI. “Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber.” OpenAI, 2026. https://openai.com/index/gpt-5-5-with-trusted-access-for-cyber/

[6] National Institute of Standards and Technology. Guide to Enterprise Patch Management Planning: Preventive Maintenance for Technology. NIST Special Publication 800-40 Revision 4, 2022. https://csrc.nist.gov/pubs/sp/800/40/r4/final

[7] Forescout. “ICS Cybersecurity in 2026: Vulnerabilities and the Path Forward.” Forescout, 2026. https://www.forescout.com/blog/ics-cybersecurity-in-2026-vulnerabilities-and-the-path-forward/

[8] Dragos. “Dragos 2026 OT Cybersecurity Year in Review.” Dragos, 2026. https://www.dragos.com/blog/dragos-2026-ot-cybersecurity-year-in-review

[9] Cybersecurity and Infrastructure Security Agency. Known Exploited Vulnerabilities Catalog. CISA. https://www.cisa.gov/known-exploited-vulnerabilities-catalog