By David Johnsen
Tags: AI, Mythos, Glasswing, industry

AI Just Took a Leap. Access Is Becoming a New Advantage.

Anthropic released Mythos Preview to a closed group of organizations. The capability leap is real, but the access model may matter just as much. As AI shifts from an equalizer to a gated advantage, the teams that win will be the ones that can turn that capability into working systems.


I build systems with AI every day, so even small model improvements are obvious. A 5% jump in real-world benchmarks shows up immediately in what I can build and how workflows behave.

SWE-bench Verified (real-world software engineering tasks):
  • Mythos Preview (private frontier): 93.9%
  • Opus 4.6 (public frontier): 80.8%
  • Gemini 3.1 Pro: 80.6%

Mythos Preview is 13 points ahead of the best public models on SWE-bench. That is not a marginal gain. It is the kind of gap where all other models will feel limited by comparison. And Anthropic is unlikely to be alone. OpenAI is rumored to be preparing a similar release this month.

Which makes the next point more important than the benchmarks.

It is not broadly available.

At this level of capability, that is likely the right approach. Systems like this introduce real safety and alignment risks. But gating access also changes the shape of the AI landscape.


The Capability Leap

Mythos Preview is a step change across most benchmarks. The gain is already noticeable in single-step tasks.

AI agent systems are usually a chain of steps that lead to the final output. Small changes in accuracy compound across that chain. A few percentage points at each step can turn into a much larger difference in the final result.

That is why a jump like this matters. It improves more than one response. It improves the behavior of the whole system.
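The arithmetic behind that compounding is worth making concrete. A minimal Python sketch; the ten-step workflow and the per-step accuracy figures are illustrative assumptions, not benchmark numbers:

```python
# End-to-end reliability of a multi-step agent chain:
# every step must succeed, so per-step accuracy compounds.

def chain_success_rate(step_accuracy: float, steps: int) -> float:
    """Probability that all steps in the chain succeed."""
    return step_accuracy ** steps

# A 5-point jump per step, across a hypothetical 10-step workflow:
baseline = chain_success_rate(0.90, 10)
improved = chain_success_rate(0.95, 10)

print(f"baseline: {baseline:.1%}")  # 34.9%
print(f"improved: {improved:.1%}")  # 59.9%
```

Five points per step nearly doubles the odds that the whole chain completes cleanly. That is why a modest single-step gain reads as a large system-level gain.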

  • SWE-bench Verified: 93.9% (+13.1 pts vs Opus 4.6)
  • Firefox zero-days found: 181 (vs 2 for Opus 4.6, roughly 90x)
  • USAMO 2026 math: 97.6% (+55.3 pts vs Opus 4.6)
  • Terminal-Bench 2.0: 82.0% (+16.6 pts vs Opus 4.6)
SWE-bench Verified (real-world software engineering tasks from GitHub):
  • Mythos Preview (private frontier): 93.9%
  • Opus 4.6 (public frontier): 80.8%
  • Gemini 3.1 Pro: 80.6%
  • GPT-5.4: ~80%

SWE-bench Pro (harder tasks with complex dependencies and multi-step reasoning):
  • Mythos Preview (private frontier): 77.8%
  • GPT-5.4 (public frontier): 57.7%
  • Gemini 3.1 Pro: 54.2%
  • Opus 4.6: 53.4%

Terminal-Bench 2.0 (autonomous task completion in a real terminal environment):
  • Mythos Preview (private frontier): 82.0%
  • GPT-5.4 (public frontier): 75.1%
  • Gemini 3.1 Pro: 68.5%
  • Opus 4.6: 65.4%

Current models already reason about code effectively. Mythos Preview extends that to reasoning about entire systems.

It found a 27-year-old flaw in OpenBSD that every security researcher and automated scanner missed. It found a 16-year-old vulnerability in FFmpeg that survived five million automated test runs.

The model builds internal representations of how systems interact, finds edge cases across those interactions, and executes multi-step verification autonomously.

On SWE-bench Pro, which tests real-world engineering tasks with complex dependencies, Mythos Preview scores 77.8%. The next best system scores 57.7%. That 20-point gap is larger than most model-to-model improvements over the last two years combined.

AI systems today already chain actions, call tools, and automate real workflows. But the operator still handles a lot of the orchestration. You break a problem into pieces, manage the sequencing, and intervene when something unexpected comes back.

At this capability level, more of that orchestration moves inside the system. You describe the outcome. The model handles the decomposition, the sequencing, and more of the edge cases without intervention.

Less manual coordination. Fewer places where the chain breaks. More of the workflow is handled end-to-end within the system.


This Changes How Software Works

AI systems today already automate real work. Agents run multi-step workflows, process documents, and push data between systems.

What changes at this capability level is how much of the workflow the system handles without the operator stepping in.

Current systems
  • AI handles parts of the workflow
  • Operator defines the structure
  • Operator manages edge cases
  • Systems require active oversight
At higher capability
  • Systems handle more of the chain
  • Agents monitor and act on their own
  • Workflows execute with less intervention
  • Operator reviews outcomes
  • Fewer manual handoffs

I build workflow automation for companies. Today that means: read this PDF, extract the data, validate it, push it to the right system. Current models already handle each piece well. I still assemble the pipeline and manage the exceptions. These are not hypothetical workflows. They are already running systems.
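A pipeline like that can be sketched in a few lines. Everything here is hypothetical: the function names, fields, and routing rules are stand-ins for the model calls and business logic a real deployment would use. The point is the shape: the operator still owns the structure and the exception paths.

```python
# Hypothetical extract -> validate -> route pipeline.
# In a real system, extract() would be a model call; here it is
# a trivial stand-in that parses "Key: value" lines.

def extract(doc_text: str) -> dict:
    """Pull fields out of document text."""
    fields = {}
    for line in doc_text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return fields

def validate(fields: dict) -> list[str]:
    """Return a list of problems; empty means the record is clean."""
    problems = []
    for required in ("invoice", "amount"):
        if required not in fields:
            problems.append(f"missing field: {required}")
    return problems

def route(fields: dict) -> str:
    """Decide which downstream system receives the record."""
    return "accounting" if "invoice" in fields else "manual_review"

doc = "Invoice: 2041\nAmount: 350.00"
fields = extract(doc)
issues = validate(fields)
destination = route(fields) if not issues else "manual_review"
print(destination)  # accounting
```

Today the operator writes and maintains every one of those seams, including the `manual_review` fallback. At higher capability levels, more of that glue and more of the exception handling moves inside the system.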


At this capability level, more of that pipeline runs without manual orchestration, including more of the exception handling. The operator role shifts toward reviewing outcomes instead of holding the workflow together step by step.

What changes

The gap shows up in how quickly systems come together and how much ongoing effort they require to keep running.


The Access Problem

The capability leap matters. But access to it determines who can actually use it.

Anthropic did not release Mythos Preview to the public. They said it may never be released in its current form. Instead, they deployed it through Project Glasswing, a closed group of twelve founding partners with exclusive access.

That group includes:

AWS
Anthropic
Apple
Broadcom
Cisco
CrowdStrike
Google
JPMorgan Chase
Linux Foundation
Microsoft
NVIDIA
Palo Alto Networks

Plus roughly 40 additional organizations responsible for maintaining critical software infrastructure.

Everyone else waits.

And the clock starts running.

The safety case is real, not performative. During internal testing, Mythos Preview breached its sandbox containment and operated outside its restricted environment. A researcher received an email from the model while away from their workstation. In another test, it posted details about how it bypassed safeguards on publicly accessible platforms. Unprompted.

Sam Bowman, Anthropic's alignment lead, put it this way: the model misbehaves less often than prior models, but the consequences of even rare failures are more significant.

A system that can find 181 Firefox exploits can find exploits in anything. Releasing that broadly is a different risk calculation than releasing a better chatbot. Staged rollout, responsible disclosure, defensive-first deployment. These are reasonable choices.

The partners in Project Glasswing are not just finding security vulnerabilities. They are learning how to operate with a fundamentally more capable system, building institutional knowledge about workflows, architectures, and possibilities that nobody outside the program can develop.

When Mythos-class capabilities become broadly available, these organizations will be months ahead. Not because they had more money or better engineers. Because they had access. Every week with a more capable system produces insights that accelerate the next week's work. For everyone outside the gate, that compounding does not start until they get in.

Anthropic committed $100 million in usage credits to Glasswing partners. That is not a research grant. That is a head start.

A head start in capability only turns into an advantage if it is translated into systems that actually run. That translation is where most teams struggle.

With access:
  • Week 1: Learn the system
  • Week 2: Build workflows
  • Week 4: Compound insights
  • Week 8: Structural advantage
Without access:
  • Weeks 1–4: Waiting
  • Week 8: Start from zero
Takeaway

The advantage is shifting toward access.

Access changes who can reach the frontier. It does not remove the need to turn that capability into working systems.


This Is Not New. But It Feels Different.

Gated advantage is the oldest pattern in technology.

Cloud infrastructure was gated by capital. The first companies on AWS had structural advantages that took years to erode. Enterprise software was gated by price. Salesforce cost more than most startups could afford in 2005. Data was gated by collection infrastructure. Google's real moat was never the algorithm.

AI broke that pattern, at least for a brief window. Access did not matter.

A developer with a $20 API key had access to the same model as a Fortune 500 company. The gap was skill, not infrastructure. That was genuinely new. It was why solo founders and small teams could suddenly compete with organizations a hundred times their size.

Mythos Preview represents the first time that pattern might reverse. Not because the current models are going away. GPT-5.4 and Opus 4.6 are still excellent.

Takeaway

"Excellent" and "best available" are different positions. The gap between them is where advantage lives.

If the gap between public and private AI is large enough, and it appears to be, access becomes the differentiator again. And this time, the cycles are faster. The advantage compounds in weeks, not years.

What changes

Execution still matters. But access compounds faster.


Where do you place your bet?

Your answer changes how you build, partner, and compete.

If access matters more, the strategy changes. It is less about using AI better and more about getting closer to the systems that control it.

Distribution starts to matter as much as skill. Partnerships, industry consortia, early-access programs. These become as important as technical ability. The best prompt engineer in the world loses to a mediocre one with a better model.

This is uncomfortable if you expect merit to be enough. Infrastructure has always shaped who wins. AI just felt like an exception for a while.


What Happens Next

When access becomes the constraint, the effects do not stay contained to model performance. They show up everywhere else.

Strategy shifts

Teams already building with AI start asking a different question: how do we get closer to the next tier?

Partnerships, early access, distribution relationships. These move onto the roadmap alongside product and engineering.

Execution diverges

Two teams start in the same place. Same idea, same talent, same tools.

One gets access to a more capable system.

Within weeks, iteration speed and output quality begin to diverge.

Learning compounds unevenly

The team with access learns faster. Those learnings feed into the next system, which improves the next iteration.

The gap is not static.

It compounds.

Markets move faster

In previous cycles, advantages took years to play out.

Here, the cycle is measured in weeks.

By the time access broadens, the leaders may already be established.

The definition of "keeping up" changes

Most teams are still early in how they use AI. Workflows are fragmented. A lot of value is still left on the table.

The question shifts from "Are we using AI?" to "Is the version we can access keeping pace with what is possible?"

You can already see this in how systems are being built. Teams with better models are building fewer tools and more complete systems.

Takeaway

This is not just a capability shift. It is a compounding advantage.

And compounding advantages are hard to catch once they start.


The Open Question

Is this a temporary phase?

Staged rollouts are normal. GPT-4 was API-only for months. Claude 3 launched with usage limits. New capabilities always arrive unevenly.

But Anthropic did not say Mythos Preview would roll out broadly on a timeline. They said it may never be released in its current form. That is a different statement. That suggests a class of capability that the developers themselves believe should not be generally available.

If other frontier labs reach similar capability levels, and they will, the question gets louder. Do you release it? To whom? Under what terms?

Simon Willison raised an important point: why is only Anthropic gating access when other models may have comparable capabilities? If the safety argument is real, it applies to everyone. If it does not apply to everyone, it starts to look like competitive positioning.

There is probably truth in both readings. This is not just a product decision. It is a policy decision about intelligence distribution.


Where This Leaves Us

The publicly available AI models are remarkable and are still getting better. For most work, they are more than sufficient.

But "sufficient" is a different word than "best." And for the first time, the best is not available to everyone willing to pay for it.

Intelligence itself may now be unevenly distributed.

This is probably the right way to proceed. Systems at this level of capability introduce real safety and alignment risks. Controlled access, staged rollout, and tighter oversight are rational choices.

Takeaway

The question is no longer just what AI can do. It is who gets to use it.

Access shapes the frontier. Execution determines who captures the value.

Most teams will not have early access. They will have some version of these systems. The difference comes from how quickly they turn that into systems that actually run.

That gap does not go away. It becomes more important.

That shift matters whether you think it is temporary or permanent. It changes how you plan, what advantages you invest in, and what "keeping up" means.

Either way, the equalizer era had a good run.

We will see how long the next one lasts.


David Johnsen


Founder, CloudBuddy Solutions

Want to automate a workflow in your business?

Start with a free audit to find your highest-value opportunity.

Request a workflow audit