Function calling: when the schema is more important than the prompt

Function calling looks like a free win the first time it works. The model picks the right tool, fills in the right arguments, and your application code handles the rest. Then you ship, and the corner cases arrive: an argument hallucinated out of thin air, a function called with valid types but nonsensical values, or a refusal to call any function when the right answer was obviously to call one.

Where reliability actually comes from

The schema is doing more work than the prompt. Tight enums beat free-form strings. Required fields with descriptive names beat optional fields the model can skip. The most common reliability bug is a function definition with vague parameter descriptions: the model fills in plausible-looking values that match the type but not the intent. Treat the parameter descriptions as part of the prompt, because they are.
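To make that concrete, here is a minimal sketch of what a tight definition might look like, written as a plain JSON-Schema-style dict. The tool name, fields, and enum values are all hypothetical, and the exact envelope a provider expects around this dict varies, so treat it as an illustration of the shape rather than any particular API. The small `check_arguments` helper is also an assumption: a hand-rolled check for required fields and enum membership, not a full schema validator.

```python
# Hypothetical tool definition; names and values are illustrative only.
REFUND_TOOL = {
    "name": "issue_refund",
    "description": "Refund a single order. Use only when the user explicitly asks for a refund.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order ID exactly as shown on the receipt, e.g. 'ORD-10423'. Never guess it.",
            },
            "reason": {
                "type": "string",
                # A closed set instead of a free-form string.
                "enum": ["damaged", "not_delivered", "wrong_item"],
                "description": "Why the refund is being issued.",
            },
            "amount_cents": {
                "type": "integer",
                "description": "Refund amount in cents, not dollars. Must not exceed the order total.",
            },
        },
        # Every field is required, so the model cannot skip one.
        "required": ["order_id", "reason", "amount_cents"],
    },
}


def check_arguments(tool, args):
    """Return a list of problems; an empty list means these basic checks passed."""
    params = tool["parameters"]
    props = params["properties"]
    problems = []
    for name in params["required"]:
        if name not in args:
            problems.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in props:
            problems.append(f"unexpected field: {name}")
            continue
        allowed = props[name].get("enum")
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {allowed}")
    return problems
```

Running the model's arguments through a check like this before executing anything catches skipped fields and out-of-enum values. It does not catch a well-typed but wrong `order_id`, which is why the descriptions still matter.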

When to call your own check

Don’t trust the model’s first choice for irreversible operations. A confirmation step, even a simple “Is this what you meant to do?”, catches the cases where the model picked the right function with the wrong arguments. For multi-tool tasks, log the model’s tool selection rationale; the time you spend reading those logs is the cheapest debugging you’ll do.
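One way to wire this up is a small dispatcher that sits between the model's proposed call and your handlers. The sketch below assumes the call arrives as a dict with `name`, `arguments`, and an optional `rationale` field; that shape, the `IRREVERSIBLE` set, and the `confirm` callback are all hypothetical and would need adapting to whatever your stack actually returns.

```python
import logging

logger = logging.getLogger("tool_calls")

# Hypothetical set of tool names this application treats as irreversible.
IRREVERSIBLE = {"issue_refund", "delete_account"}


def dispatch(call, handlers, confirm):
    """Run a model-proposed tool call, confirming first if it is irreversible.

    call:     dict with "name", "arguments", and optionally "rationale"
    handlers: dict mapping tool name to a Python callable
    confirm:  callable that takes a question string and returns True or False
    """
    name = call["name"]
    args = call["arguments"]

    # Log the selection and any stated rationale before doing anything.
    logger.info("tool=%s args=%s rationale=%s", name, args, call.get("rationale"))

    if name in IRREVERSIBLE:
        question = f"About to run {name} with {args}. Is this what you meant to do?"
        if not confirm(question):
            return {"status": "cancelled", "tool": name}

    result = handlers[name](**args)
    return {"status": "done", "tool": name, "result": result}
```

In an interactive app, `confirm` could show the question to the user; in a batch job it could apply a stricter automated check. Reversible tools skip the gate entirely, so read-only lookups stay fast.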

Function calling shifts the failure mode from “wrong text” to “wrong action.” That’s a different category of bug, and you cannot fix it with prompt tuning alone.
