Fable 5: The Mythos Model Just Went Public
Anthropic released Fable 5 today, a Mythos-class model anyone with an API key can use. I ran it head-to-head against Opus 4.8 and Sonnet 4.6 on launch day, from quick puzzles to a 150,000-line codebase audit. The frontier difference shows up exactly where you'd expect, and not where you wouldn't.
In April I wrote about Mythos Preview, the model Anthropic said might never be released in its current form. It went to twelve founding partners and a closed group of organizations maintaining critical infrastructure. Everyone else waited, and the compounding clock ran only for the people inside the gate.
Today the gate opened. Anthropic released Claude Fable 5, a Mythos-class model, to everyone with an API key or a paid plan.
I build systems with AI every day, so a release like this changes my week, my clients' systems, and the planning conversations I have with companies. This post covers what Fable 5 is, what it does better in practical terms, and where the access story I wrote about lands now.
What Anthropic Shipped
Fable 5 is the public version of the Mythos-class model. The capability is the headline, but the release mechanism is the interesting part: instead of holding the model back, Anthropic ships it with a safeguard layer. Queries that touch high-risk areas like offensive cybersecurity or biology get answered by Claude Opus 4.8 instead. Everything else runs on Fable 5 directly.
Anthropic reports the fallback triggers in under 5% of sessions. For most work, you are simply using the strongest model they have ever released to the public.
The pricing breaks a pattern, and this part is easy to miss. Every previous Claude release was folded into the existing monthly plans on day one and stayed there. Fable 5 is the first that is not. Paid plans include it free only through June 22. After that it leaves the subscription, and every Fable 5 token is billed at usage rates on top of whatever plan you already pay for.
Mythos 5 itself, without the safeguard layer, stays restricted to Project Glasswing partners and select researchers. The frontier still has a gated tier. But the public tier just jumped to within arm's reach of it.
What It Does Better in Practice
Benchmark deltas are easy to find. The more useful question for anyone running AI in a business is where the gains show up in working systems. Four areas stand out.
1. Long projects finish instead of drifting
The clearest practical gain is long-horizon work. Prior models could run multi-step tasks, but on long chains they drift: they lose the goal, redo finished work, or declare victory early. Fable 5 holds focus across tasks spanning millions of tokens.
The reference point from the launch: Stripe used Fable 5 to migrate a 50-million-line codebase in one day. Their manual estimate for the same work was two months.
Takeaway: The unit of delegation is shifting from a task to a project.
For company workflows, that changes what you can hand off. A document backlog, a data migration, a quarter of reconciliation cleanup: work that used to need a person re-aiming the model every hour can now run as a supervised job with review at the end.
2. Vision you can hand a screenshot
Fable 5 can rebuild the working source code of a web app from screenshots alone. As a stunt, it also completed Pokémon FireRed using nothing but raw game screenshots, with no helper tools reading the game's memory.
The business translation: legacy systems with no documentation, scanned paperwork, dashboard exports, charts in old PDFs. A lot of company knowledge exists only as pixels. Models that read pixels at this level can pull that knowledge into systems that use it.
3. It catches its own mistakes mid-task
Anthropic describes Fable 5's reasoning as operating at senior research scientist grade, with stronger reflection and self-validation. It posted the top score on Hebbia's Finance Benchmark for senior-level reasoning and leads Cognition's FrontierCode evaluation for production code quality.
In April I wrote that agent systems are chains of steps, and small accuracy gains compound across the chain. Self-validation is that effect concentrated: a model that notices a wrong step at step 3 saves the whole run, because errors that survive early steps multiply through everything downstream.
Fewer silent errors per step means longer chains become trustworthy. That is the property agent systems have been waiting on.
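The compounding argument is easy to put numbers on. What follows is a minimal sketch in Python, assuming every step succeeds independently with the same probability. The per-step accuracies are illustrative assumptions, not measured figures for any model.

```python
def chain_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independent steps."""
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Illustrative per-step accuracies, not measurements of any model.
    for accuracy in (0.95, 0.99, 0.999):
        for steps in (10, 50, 200):
            rate = chain_success(accuracy, steps)
            print(f"accuracy={accuracy:.3f} steps={steps:>3} -> full-run success={rate:.3f}")
```

Under these assumptions, moving from 95% to 99% per step lifts a 50-step run from roughly 8% to roughly 60% end-to-end success. Real chains are messier, since errors are correlated and some are recoverable, but the direction of the effect is the same.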
4. Less tuning for the people who build with it
A quieter change for builders: the API surface keeps getting simpler. There are no sampling knobs to tune on Fable 5, and the model decides when and how much to think on its own. You describe the outcome and give it room.
That sounds minor. In practice it removes a whole category of configuration that teams used to maintain, test, and argue about. The model carries more of the judgment that used to live in settings.
I Ran My Own Tests. Then a Harder One.
Launch-day claims deserve testing, so on day one I ran my own comparison: Fable 5 against Opus 4.8 and Sonnet 4.6, the models most teams are using right now. I started with three short, scorable tasks, three trials per model, identical prompts. 27 runs total.
- A planted-error audit: an expense report where exactly one number is wrong, constructed so that checking only the totals points you at the wrong answer. Finding the true error requires verifying every underlying row.
- A production bug hunt: a bank-reconciliation function that intermittently loses real transactions. The root cause is a de-duplication key that collapses distinct same-amount, same-day transactions, with plausible decoy issues nearby.
- A chart with no data labels: a rendered bar chart where the models had to estimate six values from pixels alone.
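The bug class in the second task is worth seeing in miniature. The sketch below is not the test function itself. It is a hypothetical reconstruction, in Python, of how a de-duplication key built from date and amount collapses distinct transactions. The `Txn` type, its field names, and the choice to key on a unique transaction id are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Txn:
    txn_id: str        # hypothetical unique id from the bank feed
    posted: date
    amount_cents: int


def dedupe_buggy(txns: list[Txn]) -> list[Txn]:
    """Keys on (date, amount), so two real same-day, same-amount payments collapse."""
    seen: dict[tuple[date, int], Txn] = {}
    for t in txns:
        seen.setdefault((t.posted, t.amount_cents), t)
    return list(seen.values())


def dedupe_by_id(txns: list[Txn]) -> list[Txn]:
    """Keys on the unique id, so only true repeats of the same record are dropped."""
    seen: dict[str, Txn] = {}
    for t in txns:
        seen.setdefault(t.txn_id, t)
    return list(seen.values())


if __name__ == "__main__":
    day = date(2025, 1, 15)
    feed = [
        Txn("a1", day, 4_500),
        Txn("a2", day, 4_500),  # distinct payment, same day and amount
        Txn("a1", day, 4_500),  # genuine duplicate of the first record
    ]
    print(len(dedupe_buggy(feed)))  # 1: a real transaction is lost
    print(len(dedupe_by_id(feed)))  # 2: only the true duplicate is dropped
```

The loss is intermittent for the same reason it is hard to spot: it only fires when two legitimate transactions happen to share a day and an amount.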
Every model dodged the audit trap in every trial. Every model found the dedup bug and prescribed the same fix. On the chart, Fable 5 and Opus 4.8 read all eighteen values exactly; Sonnet 4.6 read one bar as 24 instead of 23 in two of three trials. Across 27 runs, that single unit was the entire measurable difference.
Takeaway: On short, bounded tasks, the public models have converged. The difference lives in the long tasks.
The quick-question taste test cannot tell these models apart anymore, and that includes Sonnet at a third of the price. So I gave them a harder problem, the kind Anthropic says is Fable 5's actual territory.
The Real Test: Audit a 150,000-Line Codebase
I pointed each model at one of my own production backends: a multi-tenant TypeScript service of about 150,000 lines, far more than any model can read in full. The instruction was the same for all three: audit it read-only, find every defect that could cause incorrect behavior, data loss, or a security problem, report everything, and suggest the highest-value upgrades. One pass each, same prompt, same tools.
This is the task that can't be faked with a clever trick. A model has to decide what to read, hold a map of the system in its head, and reason about how pieces interact across files. And here the three models came apart.
They agreed on a solid floor: all three caught the same handful of real issues, including a customer-facing integration that was stubbed out and silently failing, and in-memory state that quietly resets every time the server restarts. Any of the three would have earned its keep. But each one also surfaced a serious problem the other two completely missed, and the misses fell along clear lines.
- Fable 5: a secret with an unsafe fallback that fails open if misconfigured, plus a regulatory cost cap that was only enforced in one of four calculation paths and a second cap defined but never applied at all.
- Opus 4.8: a concurrency bug where a value was read for safe-write protection but never actually used, letting two parallel jobs silently corrupt each other's results.
- Sonnet 4.6: a security control that was fully built and tested but never wired into a single route, so it protected nothing.
The pattern behind each find:
- Fable read the domain, not just the code: it understood what the rules were supposed to do and traced them across the whole pipeline.
- Opus went deepest into one hard subsystem and found the most insidious bug in the codebase.
- Sonnet was the most thorough at breadth and caught what was built-but-not-connected, with more low-confidence items mixed in.
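The concurrency finding is a pattern that slips through review because the code looks like it is doing the safe thing. Here is a hypothetical sketch in Python (the audited service is TypeScript and its code is withheld): a version is read for an optimistic-concurrency check, but the write never compares it. `Store`, `save_unchecked`, and `save_checked` are invented names, and the two "parallel" jobs are simulated as a fixed interleaving rather than real threads.

```python
class StaleWriteError(Exception):
    """Raised when a writer's snapshot is older than the stored record."""


class Store:
    """In-memory stand-in for a record with a version counter."""

    def __init__(self) -> None:
        self.value: list[str] = []
        self.version = 0

    def read(self) -> tuple[list[str], int]:
        return list(self.value), self.version

    def save_unchecked(self, new_value: list[str], expected_version: int) -> None:
        # Bug pattern: expected_version is accepted but never compared.
        self.value = new_value
        self.version += 1

    def save_checked(self, new_value: list[str], expected_version: int) -> None:
        # Compare-and-set: reject the write if someone else saved first.
        if expected_version != self.version:
            raise StaleWriteError(f"expected v{expected_version}, found v{self.version}")
        self.value = new_value
        self.version += 1


def run_two_jobs(save_name: str) -> list[str]:
    """Two jobs read the same snapshot, then each appends its own result."""
    store = Store()
    save = getattr(store, save_name)
    snap_a, ver_a = store.read()
    snap_b, ver_b = store.read()
    save(snap_a + ["job-a"], ver_a)
    try:
        save(snap_b + ["job-b"], ver_b)
    except StaleWriteError:
        # A real job would re-read and retry; this sketch retries once.
        fresh, ver = store.read()
        save(fresh + ["job-b"], ver)
    return store.value


if __name__ == "__main__":
    print(run_two_jobs("save_unchecked"))  # ['job-b']: job-a's result is silently overwritten
    print(run_two_jobs("save_checked"))    # ['job-a', 'job-b']: both results survive
```

The unchecked version raises no error and leaves no log line, which is what makes this class of bug so hard to find from symptoms alone.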
Takeaway: On the easy tests they tied. On a real codebase, they had different blind spots.
Fable's standouts were the ones I'd least want to ship without: a fail-open secret and a fee rule that was only half-enforced. Both require understanding what the code is supposed to do, not just spotting a suspicious pattern. That is the difference the benchmarks point at, showing up in a real review rather than a leaderboard. Every issue here was caught in a pre-release audit, which is exactly when you want it caught.
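For readers who have not met the term, here is what "fails open" looks like in a hypothetical sketch. This is Python, not the audited TypeScript, and the variable name `SIGNING_SECRET` and the default value are invented for illustration.

```python
class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret is absent."""


def load_secret_fail_open(env: dict[str, str]) -> str:
    # Bug pattern: a missing variable silently becomes a known default,
    # so a misconfigured deployment still "works" with a guessable secret.
    return env.get("SIGNING_SECRET", "dev-secret")


def load_secret_fail_closed(env: dict[str, str]) -> str:
    # Safer pattern: refuse to start rather than run with a fallback.
    secret = env.get("SIGNING_SECRET", "")
    if not secret:
        raise MissingSecretError("SIGNING_SECRET is not set")
    return secret


if __name__ == "__main__":
    misconfigured: dict[str, str] = {}  # stands in for an environment missing the variable
    print(load_secret_fail_open(misconfigured))  # dev-secret
    try:
        load_secret_fail_closed(misconfigured)
    except MissingSecretError as exc:
        print(f"startup aborted: {exc}")
```

Nothing in the fail-open version looks suspicious line by line. The defect only shows once you ask what the secret is for and what happens when it is absent.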
The practical lesson: on a hard, open-ended audit the frontier models have different blind spots, so running more than one is additive, not redundant. The models tie on easy work and diverge on hard work. That is the whole story from both ends.
All runs went through Claude Code with per-run model overrides, so every model received identical prompts inside an identical harness. For the short tests, tool use was forbidden on the two reasoning tasks and a single image read was allowed for the chart. For the codebase audit, each model had read-only file tools and one pass.
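The shape of the short-test matrix can be sketched in a few lines of Python. This is not the actual harness: the task and model labels are placeholders rather than real API identifiers, and `run_one` is a hypothetical stub standing in for one Claude Code invocation with a model override.

```python
from itertools import product

TASKS = ("planted-error-audit", "dedup-bug-hunt", "unlabeled-chart")  # placeholder labels
MODELS = ("fable-5", "opus-4.8", "sonnet-4.6")  # placeholder labels, not API model ids
TRIALS = 3


def run_one(task: str, model: str, trial: int) -> dict[str, object]:
    """Hypothetical stub: a real harness would send the fixed prompt and record the answer."""
    return {"task": task, "model": model, "trial": trial, "answer": None}


def run_matrix() -> list[dict[str, object]]:
    """One run per (task, model, trial) cell, with the same prompt in every cell."""
    return [
        run_one(task, model, trial)
        for task, model, trial in product(TASKS, MODELS, range(1, TRIALS + 1))
    ]


if __name__ == "__main__":
    results = run_matrix()
    print(len(results))  # 27 = 3 tasks x 3 models x 3 trials
```

Keeping the matrix explicit makes it easy to confirm that every cell ran the same number of times before comparing results.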
Caveats: the short tests ran three trials per cell, a small sample; the codebase audit was a single pass per model, so treat the head-to-head as one detailed data point, not a benchmark. The Claude Code harness adds its own system prompt, identical across models but different from a raw API call. Specifics of the audited codebase are withheld; findings are described only in general terms.
The Access Story, Two Months Later
My April post argued that access was becoming the advantage: Glasswing partners were compounding learnings on a model nobody else could touch, and the public might wait a long time.
The capability went public faster than the language suggested. "May never be released in its current form" turned out to mean: released in a modified form, eight weeks later. The safeguard fallback is what made that possible. It let Anthropic ship the capability broadly while keeping the highest-risk uses on a tighter leash.
April, Mythos Preview:
- Twelve founding partners
- Roughly 40 additional organizations
- $100M in credits to insiders
- Everyone else waiting
- Compounding ran inside the gate
Today, Fable 5:
- Anyone with an API key
- Free on paid plans through June 22 only
- High-risk queries fall back to Opus 4.8
- Mythos 5 still gated for partners
- Compounding starts for everyone
The honest read is that both things are true. The equalizer era got an extension: an API key once again buys capability close to the frontier, and for two weeks a $20 subscription does too. And the partners who spent two months operating Mythos-class systems still hold that head start, plus continued access to the unrestricted model.
The gate moved. It did not disappear. But the distance between public and frontier just collapsed from a chasm to a step, and that changes the planning math for every team that was telling itself it could wait.
What I Would Do With the Free Window
Fable 5 is included on paid plans at no extra cost through June 22, and then it is gone from the subscription for good. This is the first Claude model that monthly plans do not cover. You get two weeks to find out what it changes for your work, and after that every run is a line item.
- Skip the quick-question taste test. My head-to-head runs came back nearly identical across Fable 5, Opus 4.8, and Sonnet 4.6 on short tasks. You will not feel the difference there.
- Point it at something big and messy you actually own: a real codebase to audit, a long document set to reconcile, a migration you keep putting off. That is where the models separated for me.
- Hand it something visual: a screenshot of a legacy tool, a scanned form, a chart from a PDF. See what it extracts.
- On a high-stakes review, run more than one model. They have different blind spots, and the second model is cheap insurance against the first one's miss.
Keep Reading
The April piece on Mythos Preview and why access was becoming the new advantage:
AI Just Took a Leap. Access Is Becoming a New Advantage.
And if the free window has you wondering where to point a stronger model first:
AI Workflow Ranking: What to Automate First
David Johnsen
Founder, CloudBuddy Solutions