Tool selection: when the model should pick, and when you should

Tool-using agents look powerful in demos because the model is choosing what to do next. They look fragile in production for the same reason. The number of available tools grows linearly with features, but the overlaps between them grow quadratically — past about a dozen tools, the model starts conflating their roles and picking based on surface similarity in the tool names.

What goes wrong as tool count grows

Beyond ten or fifteen tools, the descriptions blur together in the model’s representation. The model picks a search tool when a database lookup was correct, because both have “lookup” in their description. It picks the simpler tool when the complex one was needed, because the simpler one matched the user phrasing. None of this shows up in single-call testing — it surfaces when one of the tools quietly handles a request another tool was supposed to handle, and the answer is technically valid but operationally wrong.

Architectural answers, not prompt answers

Group tools by purpose and route each request to a sub-agent that only sees the relevant subset. Surface fewer tools to the top-level model than you actually expose internally — five visible tools with clear purposes outperform twenty undifferentiated ones. For destructive or expensive tools, require an exact, explicitly confirmed name match rather than accepting the model's choice on its own.
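One way to implement the grouping is a small router that holds the full tool inventory but only exposes the subset matching a request's purpose, and that refuses to run destructive tools without an explicit name confirmation. A minimal sketch — the `Tool` and `Router` names, the purpose categories, and the confirmation mechanism are all illustrative, not any particular framework's API:

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    purpose: str          # coarse category used for routing, e.g. "data" or "comms"
    destructive: bool = False

class Router:
    """Expose only the tool subset relevant to a request's purpose,
    instead of handing every tool to the top-level model."""

    def __init__(self, tools):
        self.groups = {}
        for t in tools:
            self.groups.setdefault(t.purpose, []).append(t)

    def visible_tools(self, purpose):
        # The sub-agent handling this purpose sees only its own subset.
        return self.groups.get(purpose, [])

    def invoke(self, purpose, tool_name, confirmed_name=None):
        tools = {t.name: t for t in self.visible_tools(purpose)}
        tool = tools.get(tool_name)
        if tool is None:
            raise LookupError(f"{tool_name!r} is not available for purpose {purpose!r}")
        # Destructive tools require an exact name match supplied by the
        # caller (or a confirmation step), not a model-chosen string.
        if tool.destructive and confirmed_name != tool.name:
            raise PermissionError(f"{tool.name!r} requires explicit confirmation")
        return tool
```

The point of the sketch is the asymmetry: the model still picks among the visible tools, but the set it picks from is decided outside the model, and the dangerous calls need a second, exact signal before they run.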

The number of tools an agent should choose from is much smaller than the number of tools you’d like to give it. Past a threshold, every additional tool makes every other choice worse.
