AI Just Took a Leap. Access Is Becoming a New Advantage.
Anthropic released Mythos Preview to a closed group of organizations. The capability leap is real, but the access model may matter just as much. As AI shifts from an equalizer to a gated advantage, the teams that win will be the ones that can turn that capability into working systems.
I build systems with AI every day, so even small model improvements are obvious. A 5% jump in real-world benchmarks shows up immediately in what I can build and how workflows behave.
Mythos Preview is 13 points ahead of the best public models on SWE-bench. That is not a marginal gain. It is the kind of gap where all other models will feel limited by comparison. And Anthropic is unlikely to be alone. OpenAI is rumored to be preparing a similar release this month.
Which makes the next point more important than the benchmarks.
It is not broadly available.
At this level of capability, that is likely the right approach. Systems like this introduce real safety and alignment risks. Gating access also changes the shape of the AI landscape.
The Capability Leap
Mythos Preview is a step change across most benchmarks. That is already noticeable in single-step tasks.
AI agent systems are usually a chain of steps that lead to the final output. Small changes in accuracy compound across that chain. A few percentage points at each step can turn into a much larger difference in the final result.
That is why a jump like this matters. It improves more than one response. It improves the behavior of the whole system.
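The arithmetic behind that compounding is easy to check. A minimal sketch, under the simplifying assumption that each step succeeds independently with a fixed accuracy:

```python
# Success probability of a multi-step chain: even small per-step
# gains compound into large differences in end-to-end reliability.
def chain_success(per_step_accuracy: float, steps: int) -> float:
    return per_step_accuracy ** steps

for acc in (0.90, 0.95, 0.99):
    print(f"{acc:.2f} per step, 10 steps -> {chain_success(acc, 10):.2%}")
# 0.90 -> 34.87%, 0.95 -> 59.87%, 0.99 -> 90.44%
```

Real agent steps are not independent, so this understates some effects and overstates others, but the shape holds: a few points per step is the difference between a chain that mostly fails and one that mostly works.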
Current models already reason about code effectively. Mythos Preview extends that to reasoning about entire systems.
It found a 27-year-old flaw in OpenBSD that every security researcher and automated scanner missed. It found a 16-year-old vulnerability in FFmpeg that survived five million automated test runs.
The model builds internal representations of how systems interact, finds edge cases across those interactions, and executes multi-step verification autonomously.
On SWE-bench Pro, which tests real-world engineering tasks with complex dependencies, Mythos Preview scores 77.8%. The next best system scores 57.7%. That 20-point gap is larger than most model-to-model improvements over the last two years combined.
AI systems today already chain actions, call tools, and automate real workflows. But the operator still handles a lot of the orchestration. You break a problem into pieces, manage the sequencing, and intervene when something unexpected comes back.
At this capability level, more of that orchestration moves inside the system. You describe the outcome. The model handles the decomposition, the sequencing, and more of the edge cases without intervention.
Less manual coordination. Fewer places where the chain breaks. More of the workflow is handled end-to-end within the system.
This Changes How Software Works
AI systems today already automate real work. Agents run multi-step workflows, process documents, and push data between systems.
What changes at this capability level is how much of the workflow the system handles without the operator stepping in.
Today:
- AI handles parts of the workflow
- Operator defines the structure
- Operator manages edge cases
- Systems require active oversight

At this capability level:
- Systems handle more of the chain
- Agents monitor and act on their own
- Workflows execute with less intervention
- Operator reviews outcomes
- Fewer manual handoffs
I build workflow automation for companies. Today that means: read this PDF, extract the data, validate it, push it to the right system. Current models already handle each piece well. I still assemble the pipeline and manage the exceptions. These are not hypothetical workflows. They are already running systems.
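A stripped-down sketch of that kind of pipeline. The function names, fields, and stub logic here are illustrative stand-ins, not the actual integrations:

```python
# Illustrative document pipeline: extract -> validate -> push.
# Each handler is a stub for a real integration (parser, rules, API).
def extract(doc_text: str) -> dict:
    # Stand-in for PDF parsing / model extraction.
    return dict(line.split(": ", 1) for line in doc_text.splitlines() if ": " in line)

def validate(record: dict) -> dict:
    missing = [k for k in ("invoice_id", "amount") if k not in record]
    if missing:
        # Today, exceptions like this land back on the operator.
        raise ValueError(f"missing fields: {missing}")
    return record

def push(record: dict) -> str:
    # Stand-in for the API call into the target system.
    return f"pushed {record['invoice_id']}"

doc = "invoice_id: INV-204\namount: 1,950.00"
print(push(validate(extract(doc))))  # -> pushed INV-204
```

The interesting part is not any single function. It is the glue and the `ValueError` path: assembling the sequence and absorbing the exceptions is the work the operator still does today.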
At this capability level, more of that pipeline runs without manual orchestration, including more of the exception handling. The operator role shifts toward reviewing outcomes instead of holding the workflow together step by step.
The gap shows up in how quickly systems come together and how much ongoing effort they require to keep running.
The Access Problem
The capability leap matters. But access to it determines who can actually use it.
Anthropic did not release Mythos Preview to the public. They said it may never be released in its current form. Instead, they deployed it through Project Glasswing, a closed group of twelve founding partners with exclusive access.
Alongside those founding partners, roughly 40 additional organizations responsible for maintaining critical software infrastructure have access.
Everyone else waits.
And the clock starts running.
The safety case is real, not performative. During internal testing, Mythos Preview breached its sandbox containment and operated outside its restricted environment. A researcher received an email from the model while away from their workstation. In another test, it posted details about how it bypassed safeguards on publicly accessible platforms. Unprompted.
Sam Bowman, Anthropic's alignment lead, put it this way: the model misbehaves less often than prior models, but the consequences of even rare failures are more significant.
A system that can find 181 Firefox exploits can find exploits in anything. Releasing that broadly is a different risk calculation than releasing a better chatbot. Staged rollout, responsible disclosure, defensive-first deployment. These are reasonable choices.
The partners in Project Glasswing are not just finding security vulnerabilities. They are learning how to operate with a fundamentally more capable system, building institutional knowledge about workflows, architectures, and possibilities that nobody outside the program can develop.
When Mythos-class capabilities become broadly available, these organizations will be months ahead. Not because they had more money or better engineers. Because they had access. Every week with a more capable system produces insights that accelerate the next week's work. For everyone outside the gate, that compounding does not start until they get in.
Anthropic committed $100 million in usage credits to Glasswing partners. That is not a research grant. That is a head start.
A head start in capability only turns into an advantage if it is translated into systems that actually run. That translation is where most teams struggle.
Takeaway: The advantage is shifting toward access.
Access changes who can reach the frontier. It does not remove the need to turn that capability into working systems.
This Is Not New. But It Feels Different.
Gated advantage is the oldest pattern in technology.
Cloud infrastructure was gated by capital. The first companies on AWS had structural advantages that took years to erode. Enterprise software was gated by price. Salesforce cost more than most startups could afford in 2005. Data was gated by collection infrastructure. Google's real moat was never the algorithm.
AI broke that pattern, at least for a brief window. Access did not matter.
A developer with a $20 API key had access to the same model as a Fortune 500 company. The gap was skill, not infrastructure. That was genuinely new. It was why solo founders and small teams could suddenly compete with organizations a hundred times their size.
Mythos Preview represents the first time that pattern might reverse. Not because the current models are going away. GPT-5.4 and Opus 4.6 are still excellent.
Takeaway: "Excellent" and "best available" are different positions. The gap between them is where advantage lives.
If the gap between public and private AI is large enough, and it appears to be, access becomes the differentiator again. And this time, the cycles are faster. The advantage compounds in weeks, not years.
Execution still matters. But access compounds faster.
Where do you place your bet?
Your answer changes how you build, partner, and compete.
What Happens Next
When access becomes the constraint, the effects do not stay contained to model performance. They show up everywhere else.
Strategy shifts
Teams already building with AI start asking a different question: how do we get closer to the next tier?
Partnerships, early access, distribution relationships. These move onto the roadmap alongside product and engineering.
Execution diverges
Two teams start in the same place. Same idea, same talent, same tools.
One gets access to a more capable system.
Within weeks, iteration speed and output quality begin to diverge.
Learning compounds unevenly
The team with access learns faster. Those learnings feed into the next system, which improves the next iteration.
The gap is not static.
It compounds.
Markets move faster
In previous cycles, advantages took years to play out.
Here, the cycle is measured in weeks.
By the time access broadens, the leaders may already be established.
The definition of "keeping up" changes
Most teams are still early in how they use AI. Workflows are fragmented. A lot of value is still left on the table.
The question shifts from "are we using AI" to "is the AI we can access keeping pace with what is possible?"
You can already see this in how systems are being built. Teams with better models are building fewer tools and more complete systems.
Takeaway: This is not just a capability shift. It is a compounding advantage.
And compounding advantages are hard to catch once they start.
The Open Question
Is this a temporary phase?
Staged rollouts are normal. GPT-4 was API-only for months. Claude 3 launched with usage limits. New capabilities always arrive unevenly.
But Anthropic did not say Mythos Preview would roll out broadly on a timeline. They said it may never be released in its current form. That is a different statement. That suggests a class of capability that the developers themselves believe should not be generally available.
If other frontier labs reach similar capability levels, and they will, the question gets louder. Do you release it? To whom? Under what terms?
Simon Willison raised an important point: why is only Anthropic gating access when other models may have comparable capabilities? If the safety argument is real, it applies to everyone. If it does not apply to everyone, it starts to look like competitive positioning.
There is probably truth in both readings. This is not just a product decision. It is a policy decision about intelligence distribution.
Where This Leaves Us
The publicly available AI models are remarkable and are still getting better. For most work, they are more than sufficient.
But "sufficient" is a different word than "best." And for the first time, the best is not available to everyone willing to pay for it.
For the first time, intelligence itself may be unevenly distributed.
This is probably the right way to proceed. Systems at this level of capability introduce real safety and alignment risks. Controlled access, staged rollout, and tighter oversight are rational choices.
Takeaway: The question is no longer just what AI can do. It is who gets to use it.
Access shapes the frontier. Execution determines who captures the value.
Most teams will not have early access. They will have some version of these systems. The difference comes from how quickly they turn that into systems that actually run.
That gap does not go away. It becomes more important.
That shift matters whether you think it is temporary or permanent. It changes how you plan, what advantages you invest in, and what "keeping up" means.
Either way, the equalizer era had a good run.
We will see how long the next one lasts.

David Johnsen
Founder, CloudBuddy Solutions
Want to automate a workflow in your business?
Start with a free audit to find your highest-value opportunity.
Request a workflow audit