Why Most People Can't Tell Claude Fable 5 From Opus (and Why That's a Clue)

Claude Fable 5 looks identical to Opus in a chat window. Give it a real job and it isn't. Why the agentic era is where the difference shows.

Jun 13, 2026

Anthropic released Claude Fable 5 on 9 June 2026. It is the first model in their new Mythos class, a capability tier that sits above Opus, and the most capable model they have ever made generally available. TechCrunch called it the public version of a model Anthropic had previously restricted to a small group of vetted organisations.

But the most common reaction I've heard in the three days since launch is some version of "I tried it and it seems about the same as Opus."

Both of those things are true. Fable 5 is a generational leap, and most people genuinely can't see it. The gap between those two facts tells you more about where AI is heading than any benchmark chart.

I shared my own first impressions of Fable 5 after two days with it. This piece is about something else: why so many people are underwhelmed when they shouldn't be.

A chat window hides the difference

If you use AI through a chat box, you are the orchestrator. You ask a question, the model answers, you read it, you decide what to ask next. Every turn is a short, self-contained task, and the model only has to be good for one step at a time.

Opus was already excellent at one step at a time, and so is Fable 5. That is why, on routine single-turn work, the performance gap narrows considerably. Anthropic says it plainly in their own announcement: the longer and more complex the task, the larger Fable 5's lead, and the tasks in question can take hours to complete. The inverse is also true. Make the task short and simple, and the lead nearly vanishes.

Asking a frontier model to draft an email is like hiring a senior engineer to change a lightbulb. You won't learn much about what they can do.

The benchmarks only diverge when the task gets long

Look at where the published numbers actually split:

On SWE-bench Pro, a benchmark of real software engineering tasks, Fable 5 scores 80.3% against 69.2% for Opus 4.8.
On FrontierCode Diamond, which tests the hardest long-horizon coding problems, Fable 5 scores 29.3% against 13.4%. That is more than double.

Both pairs of scores come from the same published comparison. Notice the pattern: the harder and longer the work, the wider the gap. These are tasks that take hours of autonomous effort, hundreds of small decisions, each one building on the last. Small improvements in per-step judgement compound enormously over a long chain. A model that recovers well from its own mistakes finishes jobs that a slightly weaker model abandons halfway.
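To make the compounding argument concrete, here is a small worked sketch. It assumes, purely for illustration, that a long job is a chain of independent steps and that any unrecovered slip ends the job. The per-step success rates (99% and 99.5%) are invented, not published figures for Opus or Fable 5. Under those assumptions, the chance of finishing is the per-step rate raised to the number of steps.

```python
# Illustrative only: per-step success rates are invented, not measured.
# Assumes independent steps and that any unrecovered failure ends the job.


def chain_success(per_step: float, steps: int) -> float:
    """Probability of completing every step in a chain of independent steps."""
    return per_step ** steps


WEAKER = 0.99     # hypothetical per-step success rate
STRONGER = 0.995  # hypothetical, half the per-step failure rate

for steps in (1, 10, 100, 300):
    weak = chain_success(WEAKER, steps)
    strong = chain_success(STRONGER, steps)
    print(f"{steps:>3} steps: {weak:6.1%} vs {strong:6.1%}  (ratio {strong / weak:.1f}x)")
```

With these made-up rates the two are nearly indistinguishable on a single step. At 100 steps, though, the 99.5% chain finishes roughly 61% of the time against roughly 37% for the 99% chain, and at 300 steps it is about 22% against 5%. Real agents are not this simple, since they can retry and recover, but the sketch shows how a small per-step edge can turn into a large gap on long tasks.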

The most striking real-world example came from Stripe, who reported during early testing that Fable 5 completed a codebase-wide migration across 50 million lines of code in a single day. Their estimate for doing it by hand was over two months of team effort.

Phase 2 of AI is agentic

I think about AI adoption in two phases.

Phase 1 was the chatbot era. AI answers. You ask, it responds, you do something with the response. Almost everyone's mental model of AI was formed here, and almost all of the "is it actually better?" judgements are still being made here.

Phase 2 is the agentic era. AI does. You give it a goal, and it plans the work, uses tools, reads and writes files, runs commands, tests its own output, corrects course, and keeps going until the job is finished. It produces a chain of hundreds of actions rather than a single answer.

Fable 5 was built for phase 2, and the design choices show it. Its reasoning is always on rather than optional. A single request can run for many minutes while it works. It delegates dependably to parallel sub-agents. It uses file-based memory unusually well: in one of Anthropic's tests, giving the model persistent notes improved its performance three times more than the same memory helped Opus 4.8. None of those qualities are visible in a four-line chat exchange.

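There is a price signal here too. Fable 5 costs twice as much as Opus per token, yet analysts note it often completes the same work in fewer steps, so the economics favour long jobs over quick answers: on a long job the saved steps can offset the higher price, while on a quick answer there is little to save and you simply pay more. Anthropic priced it for work rather than chat.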
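The pricing argument comes down to simple arithmetic: cost per job is price per token times tokens used. The sketch below uses invented numbers throughout, including a hypothetical $10 per million tokens for Opus and made-up token counts. Only the 2x price ratio comes from the article, and the sketch assumes that fewer steps means fewer tokens.

```python
# Hypothetical cost sketch. Only the 2x price ratio comes from the article;
# the base price and every token count are invented for illustration, and
# "fewer steps" is assumed to mean "fewer tokens".


def job_cost(price_per_million: float, tokens: int) -> float:
    """Dollar cost of a job, given a price per million tokens and tokens used."""
    return price_per_million * tokens / 1_000_000


def break_even_token_ratio(price_ratio: float) -> float:
    """Max fraction of the cheaper model's tokens the pricier model can use and still cost no more."""
    return 1 / price_ratio


OPUS_PRICE = 10.0             # hypothetical $ per million tokens
FABLE_PRICE = 2 * OPUS_PRICE  # "double Opus per token"

# Long agentic job: assume the stronger model needs fewer tokens (invented counts).
opus_long = job_cost(OPUS_PRICE, 4_000_000)
fable_long = job_cost(FABLE_PRICE, 1_800_000)

# Quick answer: assume both models use the same tokens, so price alone decides.
opus_quick = job_cost(OPUS_PRICE, 2_000)
fable_quick = job_cost(FABLE_PRICE, 2_000)

print(f"Long job:     Opus ${opus_long:.2f} vs Fable 5 ${fable_long:.2f}")
print(f"Quick answer: Opus ${opus_quick:.4f} vs Fable 5 ${fable_quick:.4f}")
print(f"Break-even at {break_even_token_ratio(FABLE_PRICE / OPUS_PRICE):.0%} of Opus's tokens")
```

On these hypothetical figures the long job costs $36 on Fable 5 against $40 on Opus, while the quick answer costs twice as much. The break-even rule follows directly: at double the price per token, the pricier model is cheaper per job only if it uses less than half the tokens.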

Where you'll actually feel the difference

In a terminal. In an agent platform. In an automated pipeline. Anywhere the model is given a job instead of a question.

Most of my own AI use has moved out of the chat window. Agents research prospects, draft and check content, monitor systems, and build features end to end while I do something else. On that kind of work, the difference between Fable 5 and Opus is stark: it is the difference between checking in on an agent and finding the job done, and checking in to find it stuck.

If you want to test this yourself, don't ask Fable 5 a clever question. Give it a real multi-step job through an agentic tool like Claude Code: "audit this codebase and fix what you find", "research these ten companies and produce a comparison", "build this feature and test it". Then judge.

The takeaway

If your experience of AI is a chat box, your benchmark for AI progress is stuck in phase 1, and every new model will feel like a minor update from here on. The frontier has moved to a different surface. The organisations compounding an advantage right now are the ones handing AI whole jobs instead of single questions.

That shift, from AI that answers to AI that does, is the one worth getting ahead of. If you'd like to see what agentic AI looks like on real business work, that's exactly what we explore at Bykov-Brett Enterprises, live demos included.

@JamieBykovBrett
