When the agent fails: recovery patterns that don't loop forever

Most agent failures don’t throw exceptions. They produce plausible-looking output that’s wrong, or quietly retry the same broken approach in a slightly different way. Wrapping agents in try/catch is the wrong mental model: the agent didn’t crash, it just kept going in a useless direction. Recovery has to be designed in, not bolted on.

The failure modes that need different recovery

Tool failures — the API returned an error or timed out — are the easiest case: the agent should see the error and try a different approach. Reasoning failures — the agent is confidently wrong about what step comes next — are harder, because the agent doesn’t know it’s wrong. Loop failures — the agent retries the same approach over and over — are the worst, because each iteration looks productive in isolation.
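For the tool-failure case, one way to make sure the agent actually sees the error is to turn every tool exception into an observation rather than letting it propagate. The sketch below is a minimal illustration, not a prescribed design: `Observation`, `call_tool`, and `flaky_search` are hypothetical names invented for this example. The try/except here wraps a single tool call, not the agent itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Observation:
    """What the agent sees after a tool call (hypothetical shape)."""
    ok: bool
    content: str


def call_tool(tool: Callable[..., Any], args: Dict[str, Any]) -> Observation:
    """Run a tool and turn any failure into text the agent can read."""
    try:
        result = tool(**args)
    except Exception as exc:
        # Broad on purpose: every tool failure becomes an observation
        # the agent can react to, instead of ending the run.
        return Observation(ok=False, content=f"{type(exc).__name__}: {exc}")
    return Observation(ok=True, content=str(result))


def flaky_search(query: str) -> str:
    """Stand-in tool that always times out (for illustration only)."""
    raise TimeoutError(f"search timed out for {query!r}")


if __name__ == "__main__":
    obs = call_tool(flaky_search, {"query": "agent recovery"})
    print(obs.ok, obs.content)
```

Feeding `obs.content` back into the next model turn gives the agent a chance to pick a different tool or different arguments. It does nothing for reasoning or loop failures, which need the checks described in the next section.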

Recovery patterns that survive contact with reality

Cap iteration count, always. Detect repetition in the action history — if the agent has called the same tool with similar arguments three times in a row, escalate or abort. For reasoning failures, a separate “is the current plan still right?” check, run periodically by a smaller model on the action log, catches the worst cases. None of this is glamorous, and all of it gets cut from the first version of every agent because it feels paranoid until the first time it fires.
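The first two patterns, the iteration cap and the repetition check, fit in a few lines. The sketch below makes several assumptions: `next_action` is a hypothetical stand-in for the model call that picks the next step, tool execution is left out, and "similar arguments" is approximated as "identical after trimming and lowercasing strings". A real system might need fuzzier matching.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

Action = Tuple[str, Dict[str, Any]]  # (tool name, arguments)


def action_key(action: Action) -> str:
    """Normalize an action so trivially different calls compare equal."""
    name, args = action
    normalized = {
        k: v.strip().lower() if isinstance(v, str) else v
        for k, v in args.items()
    }
    return name + ":" + json.dumps(normalized, sort_keys=True, default=str)


def is_looping(history: List[Action], window: int = 3) -> bool:
    """True if the last `window` actions are the same after normalization."""
    if len(history) < window:
        return False
    return len({action_key(a) for a in history[-window:]}) == 1


def run_agent(
    next_action: Callable[[List[Action]], Optional[Action]],
    max_steps: int = 20,
) -> Tuple[str, List[Action]]:
    """Drive the agent until it finishes, loops, or hits the step cap."""
    history: List[Action] = []
    for _ in range(max_steps):  # hard cap on iterations, always
        action = next_action(history)
        if action is None:  # assumed convention: None means "finished"
            return "done", history
        history.append(action)
        if is_looping(history):
            return "aborted: repeated action", history
    return "aborted: step cap reached", history


if __name__ == "__main__":
    # An agent stuck issuing the same search with cosmetic differences.
    def stuck_agent(history: List[Action]) -> Action:
        return ("search", {"query": " Agent Recovery " if len(history) % 2 else "agent recovery"})

    status, history = run_agent(stuck_agent)
    print(status, len(history))
```

Returning a status string instead of raising keeps the abort path explicit, so the caller decides whether to escalate to a human, retry with a different plan, or stop.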

Agent failure recovery is the part of the system that exists to keep small failures from becoming catastrophic. Skipping it is how you discover that “agentic” and “autonomous” are not the same word.
