Tool selection: when the model should pick, and when you should
- William Jacob
- Tools , Agents
- 10 May, 2026
Tool-using agents look powerful in demos because the model is choosing what to do next. They look fragile in production because the model is choosing what to do next. The space of available tools grows linearly with features and quadratically with edge cases — past about a dozen tools, the model starts conflating their roles and picking based on surface similarity in the tool name.
What goes wrong as tool count grows
Beyond ten or fifteen tools, the descriptions blur together in the model’s representation. The model picks a search tool when a database lookup was correct, because both have “lookup” in their description. It picks the simpler tool when the complex one was needed, because the simpler one matched the user’s phrasing. None of this shows up in single-call testing — it surfaces when one tool quietly absorbs a request that belonged to another, and the answer comes back technically valid but operationally wrong.
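One cheap defense is to lint your tool descriptions for exactly this kind of overlap before the model ever sees them. Here is a minimal sketch — the tool names, descriptions, and the 0.25 threshold are all hypothetical, and a real check would compare embeddings rather than raw words:

```python
# Flag pairs of tools whose descriptions share too many content words,
# since those are the pairs the model is likely to conflate.
# All tool names and descriptions below are made up for illustration.

def description_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase content words in two descriptions."""
    stop = {"a", "an", "the", "for", "of", "to", "in", "on", "and", "or", "by"}
    wa = {w for w in a.lower().split() if w not in stop}
    wb = {w for w in b.lower().split() if w not in stop}
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def confusable_pairs(tools: dict[str, str], threshold: float = 0.25):
    """Return tool-name pairs whose descriptions overlap above threshold."""
    names = sorted(tools)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if description_overlap(tools[x], tools[y]) >= threshold
    ]

tools = {
    "web_search": "Lookup information on the web by keyword",
    "db_lookup": "Lookup a record in the customer database by keyword",
    "send_email": "Send an email to a customer",
}
print(confusable_pairs(tools))  # → [('db_lookup', 'web_search')]
```

The flagged pair is the “lookup” collision described above: two tools whose descriptions read almost identically to the model even though their jobs are disjoint.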
Architectural answers, not prompt answers
Group tools by purpose and route each request to a sub-agent that only sees the relevant subset. Surface fewer tools to the top-level model than you actually expose internally — five visible tools with clear purposes outperform twenty undifferentiated ones. For destructive or expensive tools, require the request to name the tool explicitly rather than letting the model choose it.
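The routing-plus-gating pattern above can be sketched in a few lines. Everything here is a hypothetical skeleton — the category names, tool names, and substring check stand in for whatever your router, registry, and policy layer actually do:

```python
# Top-level model sees only coarse categories; each sub-agent sees just
# its own tool subset; destructive tools run only when the request names
# them explicitly. All names below are illustrative placeholders.

DESTRUCTIVE = {"delete_record"}

TOOL_GROUPS = {
    "search": ["web_search", "docs_search"],
    "data": ["db_lookup", "delete_record"],
    "comms": ["send_email"],
}

def route(category: str) -> list[str]:
    """Return the tool subset the sub-agent for this category may see."""
    return TOOL_GROUPS[category]

def allow_call(tool: str, request: str) -> bool:
    """Destructive tools require the request to name them verbatim;
    everything else is left to the sub-agent's own choice."""
    if tool in DESTRUCTIVE:
        return tool in request
    return True
```

A sub-agent routed to `"data"` chooses among two tools instead of five, and `allow_call("delete_record", "remove my account")` fails while `allow_call("delete_record", "run delete_record on id 42")` passes — the model never gets to infer its way into a deletion.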
The number of tools an agent should choose from is much smaller than the number of tools you’d like to give it. Past a threshold, every additional tool makes every other choice worse.