The Autonomy Curve: How Much Freedom Can You Give an AI Agent?

How much autonomy you can give an AI agent comes down to one variable: how long a model can hold a task without drifting. The further a model runs a chain of reasoning and tool calls reliably, the more rope you can hand it in a single pass. We have run an agent harness for nearly two years, from Claude 3.5 Sonnet through the Sonnet and Opus line to Claude Fable 5, and every release pushed that limit a little further. A good harness plus a model that runs long chains reliably is what turns "AI that writes code" into "AI that does the work."

What "autonomy" actually means for an agent

Autonomy is not a feature you toggle. It is how much work you can hand off in one pass before you have to step back in and correct.

A low-autonomy agent gets one small, well-scoped instruction, does it, and stops. You review, you re-prompt, you do it again. A high-autonomy agent gets a goal, plans the steps itself, runs the tools, fixes its own mistakes, and comes back when the whole thing is done. The gap between those two is not the harness alone. It is whether the model can stay on the rails across a long chain of decisions.

That is the single variable. Everything else follows from it.

Two definitions before we go further, since the rest of this post leans on them:

Claude Fable 5 is Anthropic's newest model, built for complex, long-running, autonomous work. It runs at $10 per 1M input tokens and $50 per 1M output tokens, with a 1M-token context window.
Claude Opus 4.8 (released May 2026) is Anthropic's most capable Opus-tier model for everyday coding and agentic work. It runs at $5 per 1M input tokens and $25 per 1M output tokens.

The curve we actually watched climb

We did not theorize this. We lived it. Our harness has been running continuously since Claude 3.5 Sonnet, and each model release let us delete a little more babysitting code and hand the agent a little more rope.

Here is the curve, qualitatively, era by era. No invented benchmarks. Just what each step let us do.

Model era	How much rope we could give it	What that looked like in practice
Claude 3.5 Sonnet	Short, tightly scoped tasks	One file at a time. Heavy human review between steps. The harness did most of the holding.
Sonnet / Opus 4.x line	Medium tasks, fewer check-ins	Multi-file changes in a single pass. The model held a plan across several tool calls before drifting.
Claude Opus 4.8	Long agentic tasks, everyday default	State-of-the-art long-horizon work at a price that makes it the daily driver for coding.
Claude Fable 5	Hand-off-and-walk-away tasks	The longest, hardest runs. More freedom in one pass, and it holds together without drifting.

The shape is the point. Each era did not just get "smarter" in the abstract. It got better at the one property that decides autonomy: running a long chain reliably.
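A rough way to see why this one property dominates is a toy model, which is our simplification and not a measurement of any model: assume every step in a chain succeeds independently with the same probability. The chance that the whole chain finishes cleanly is then that probability raised to the number of steps. The function names and numbers below are hypothetical.

```python
def chain_success(step_reliability: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps succeed independently with the same probability,
    which is a simplification for illustration only.
    """
    if not 0.0 <= step_reliability <= 1.0:
        raise ValueError("step_reliability must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_reliability ** steps


def max_steps(step_reliability: float, target: float) -> int:
    """Longest chain whose end-to-end success stays at or above target."""
    if not 0.0 < step_reliability < 1.0:
        raise ValueError("step_reliability must be strictly between 0 and 1")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    steps = 0
    # Success probability shrinks with every added step, so this terminates.
    while chain_success(step_reliability, steps + 1) >= target:
        steps += 1
    return steps
```

Under this toy model, raising per-step reliability from 0.99 to 0.999 stretches the longest chain that still completes 90% of the time from 10 steps to 105. Small gains per step compound into a lot more rope.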

Why a good harness still matters

More autonomy is not just a model property. It is a harness property too.

A model that can run long chains reliably is wasted if the harness around it cannot give it room. And a great harness wrapped around a model that drifts after three steps just fails faster. The two together decide how far you can go.

Concretely, the harness is what:

Gives the agent the right tools, scoped to what the task needs.
Catches and feeds back errors so the model can self-correct instead of stalling.
Holds the goal steady so the model is not re-deriving what it is supposed to do every turn.
Sets the boundary, so a long autonomous run cannot wander somewhere expensive or destructive.
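Those four jobs can be sketched as a single loop. This is a minimal illustration, not our production harness: `Action`, `ModelStep`, `run_harness`, and the scripted stand-in for the model are all hypothetical names, and a real harness would call a model API where `model_step` is invoked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """One decision from the model. tool=None means the goal is done."""
    tool: Optional[str]
    argument: str = ""


# Hypothetical model interface: given the goal and the history so far,
# return the next action.
ModelStep = Callable[[str, List[str]], Action]
Tool = Callable[[str], str]


def run_harness(goal: str, model_step: ModelStep, tools: Dict[str, Tool],
                max_steps: int = 20) -> List[str]:
    """Run the agent loop and return the history of results."""
    history: List[str] = []
    for _ in range(max_steps):               # boundary: hard cap on run length
        action = model_step(goal, history)   # goal is passed in on every turn
        if action.tool is None:
            history.append("done")
            return history
        tool = tools.get(action.tool)
        if tool is None:                     # scoped tools: refuse anything else
            history.append(f"error: tool {action.tool!r} is not allowed")
            continue
        try:
            history.append(f"ok: {tool(action.argument)}")
        except Exception as exc:             # feed the error back to the model
            history.append(f"error: {exc}")
    history.append("stopped: step limit reached")
    return history


def read_file(name: str) -> str:
    """Toy tool backed by an in-memory file table."""
    files = {"notes.txt": "hello"}
    if name not in files:
        raise FileNotFoundError(f"{name} not found")
    return files[name]


def scripted_model(goal: str, history: List[str]) -> Action:
    """Stand-in for a model: replays a fixed script, one action per turn."""
    script = [Action("delete_repo"), Action("read_file", "missing.txt"),
              Action("read_file", "notes.txt"), Action(None)]
    return script[len(history)]


history = run_harness("summarize notes.txt", scripted_model,
                      {"read_file": read_file})
```

In this run the out-of-scope tool is refused, the missing file comes back as an error the model can react to, and the loop ends when the model reports it is done. Everything the loop does here is work a more reliable model needs less of.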

When the model gets more reliable over long chains, you can move work out of the harness and into the model. That is what every release on the curve let us do. Less hand-holding code. More trust per pass.

This is the same idea we wrote about in "Building is not the bottleneck": the code is rarely the hard part. The hard part is everything around the code that decides whether the work actually ships.

What changes with Claude Fable 5

The practical difference with Claude Fable 5 is not a number on a chart. It is how much room you can give it.

You can hand it a longer task, give it more freedom in a single pass, and it holds together without drifting. For an agent harness, that one property does more than raise the ceiling. Reliability over long chains absorbs part of the QA burden, because a run that does not drift is a run you do not have to babysit and re-verify step by step.

That matters because QA is where most of the cost hides. We made that case in full in "QA is the real AI bottleneck", published the same day as this post. A model that stays on the rails longer is not just more capable. It quietly shrinks the most expensive part of the loop.

The trade-off: when to reach for Fable 5

Fable 5 is not the default. It is the tool you reach for when the task earns it.

At $10 input and $50 output per 1M tokens, it is built for long, hard, autonomous runs, not for every small change. For everyday coding, Claude Opus 4.8 at $5 input and $25 output per 1M tokens is still the better value, and it is genuinely strong at agentic work.
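To put numbers on that, here is a small cost calculation using the list prices quoted above. The token counts in the example are made up for illustration, and the dictionary keys are display names, not API model identifiers.

```python
# Prices in USD per 1M tokens: (input, output), as quoted in this post.
PRICES_PER_MILLION = {
    "Claude Fable 5": (10.0, 50.0),
    "Claude Opus 4.8": (5.0, 25.0),
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one run at list price."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price
            + output_tokens * output_price) / 1_000_000


# Hypothetical long run: 2M input tokens and 200k output tokens.
fable_cost = run_cost("Claude Fable 5", 2_000_000, 200_000)
opus_cost = run_cost("Claude Opus 4.8", 2_000_000, 200_000)
```

For that hypothetical run the list-price cost is $30 on Fable 5 and $15 on Opus 4.8. Both prices are exactly double, so the ratio holds for any mix of input and output tokens.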

Here is the rule we use:

Use Claude Opus 4.8 when you are in the loop. Interactive coding, fast iteration, the daily driver.
Use Claude Fable 5 when you want to hand off a long task and walk away. The runs where reliability over a long chain is worth paying for.
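Written as code, the rule is a two-input decision. This is a sketch of how we read it, with hypothetical parameter names; what counts as a "long" task is a judgment call the function leaves to the caller.

```python
def pick_model(hands_off: bool, long_task: bool) -> str:
    """Return the model name the rule above points to."""
    # Fable 5 only when the run is both long and unattended.
    if hands_off and long_task:
        return "Claude Fable 5"
    # Everything else, including all interactive work, stays on the daily driver.
    return "Claude Opus 4.8"
```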

The honest version: pick the model for the length and stakes of the run, not for the headline. Most of your work does not need Fable 5. The work that does, needs it badly.

FAQ

How much autonomy can you give an AI coding agent?

As much as the model can hold without drifting. The single variable that decides agent autonomy is how reliably a model runs a long chain of reasoning and tool calls in one pass. A good harness sets the boundaries and feeds back errors, but the model's reliability over long chains is what determines how much work you can hand off before you have to step back in.

Is Claude Fable 5 better for agents than Claude Opus 4.8?

For long, hard, autonomous runs, yes. Claude Fable 5 is Anthropic's newest model for complex long-running work ($10 input / $50 output per 1M tokens) and it holds a longer task together without drifting. For everyday interactive coding, Claude Opus 4.8 ($5 input / $25 output per 1M tokens, May 2026) is the better value and still strong at agentic work. Use Fable 5 when you want to hand off and walk away.

What is the difference between a model and a harness in agent autonomy?

The model decides how long a task it can run reliably. The harness decides how much room the model gets to run. A reliable model in a weak harness is starved of room. A great harness around a model that drifts just fails faster. Autonomy is the product of the two, which is why improving either one lets you hand off more work.

Does more autonomy reduce the QA burden?

Yes, indirectly. A model that runs a long chain without drifting produces a run you do not have to verify step by step, so reliability over long chains absorbs part of the QA cost. This is why long-horizon reliability matters more for an agent harness than raw single-step capability.

We watched the autonomy curve climb from Claude 3.5 Sonnet to Claude Fable 5, and the next step will move it again. If you want to see how the model choice fits the rest of the picture, start with the best AI coding model for 2026, or read the specifics on Claude Fable 5 and Claude Opus 4.8. The full lineup is in all models.
