Self-consistency sampling: cheap reliability when you need the right answer
- William Jacob
- Reliability, Sampling
- 14 May, 2026
Self-consistency sampling sounds like the kind of thing a researcher proposes and a production engineer ignores. Sample the same prompt N times at non-zero temperature, take the majority answer, ship it. It is unreasonably effective for tasks with a discrete correct answer, and it costs roughly N times more — which makes everyone uncomfortable until they price it against the cost of being wrong.
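The whole technique fits in a few lines. A minimal sketch, assuming a `sample_fn` callable that stands in for whatever model call you actually make (here a simulated noisy model, since the real call is not shown in this post):

```python
import random
from collections import Counter

def self_consistency(sample_fn, prompt, n=5):
    """Sample the same prompt n times at non-zero temperature and
    return the majority answer plus the fraction that agreed with it."""
    answers = [sample_fn(prompt) for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n

# Hypothetical stand-in for a real model call: returns the right
# answer 70% of the time and a wrong one otherwise.
def noisy_model(prompt):
    return "42" if random.random() < 0.7 else "41"

random.seed(0)
answer, agreement = self_consistency(noisy_model, "What is 6 * 7?", n=5)
```

A single call to `noisy_model` is wrong 30% of the time; five samples with a majority vote push the error rate well below that, which is the entire trick.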
Where self-consistency pays off
Multiple-choice classification, numerical extraction, structured decisions — anywhere the answer space is small and the correctness criterion is sharp. Five samples typically capture most of the gain; by ten, the curve has flattened. The interesting property is that variance reduction comes from the model’s own uncertainty: when the model is confident, all samples agree and the extra samples buy you nothing. When it isn’t confident, you find out, which is itself useful signal.
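That agreement fraction can be turned into an explicit confidence gate. A sketch under assumed names (`vote_with_confidence` and the `min_agreement` threshold are mine, not from the post): return the majority answer along with a flag the caller can use to escalate — more samples, a bigger model, or a human.

```python
from collections import Counter

def vote_with_confidence(answers, min_agreement=0.8):
    """Majority vote over sampled answers; flag low agreement so the
    caller can escalate instead of silently shipping a coin flip."""
    answer, count = Counter(answers).most_common(1)[0]
    agreement = count / len(answers)
    return answer, agreement, agreement >= min_agreement

# Confident case: all five samples agree.
unanimous = vote_with_confidence(["B", "B", "B", "B", "B"])
# Uncertain case: a 3-2 split is itself useful signal.
split = vote_with_confidence(["B", "C", "B", "C", "B"])
```

The threshold is a knob, not a constant: 0.8 on five samples means "at most one dissenter", which is a reasonable default for tasks where a wrong answer is expensive.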
Where it doesn’t help
Open-ended generation, summarization, anything with a wide answer space — no two samples match verbatim, so the majority vote degenerates into either the most common opening sentence or random noise, neither of which is what you wanted. For those tasks, the engineering effort is better spent on chain-of-thought, retrieval, or fine-tuning.
Self-consistency is the cheapest reliability technique available to LLM engineers. The reason teams skip it is that the math feels wasteful — but the comparison is to being wrong, not to being efficient.