Temperature and top-p: tuning when the answer matters more than novelty
- John Doe
- Prompting, Sampling
- 09 May, 2026
Temperature and top-p are the two sampling parameters every team adjusts and almost none tunes systematically. The default of 0.7 is everyone’s first guess; the second guess is 0, and that’s where most projects stop. The real cost shows up later: classification tasks running at creative-writing temperatures, and creative-writing tasks suffocating at temperature zero.
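Mechanically, temperature rescales the logits before the softmax, and top-p (nucleus sampling) truncates the distribution to the smallest set of tokens whose cumulative probability reaches p, then renormalizes. A toy, self-contained sketch of both knobs — not any particular engine’s implementation:

```python
import math
import random

def sample(logits, temperature=1.0, top_p=1.0, rng=None):
    """Toy temperature + nucleus (top-p) sampling over raw logits."""
    rng = rng or random.Random()
    if temperature == 0.0:
        # Greedy decoding: top_p never enters the picture. With exactly
        # tied logits this picks the first max index; real inference
        # stacks may break such ties differently from run to run.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature (subtract max for numerical stability).
    m = max(logits)
    weights = [math.exp((l - m) / temperature) for l in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Nucleus cut: keep the smallest set of highest-probability tokens
    # whose cumulative mass reaches top_p, then renormalize and sample.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    r = rng.random() * mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

Lowering temperature sharpens the distribution toward the top token; lowering top-p deletes the low-probability tail outright, which is why the two are complements rather than substitutes.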
The decision rule that actually scales
For tasks with a single correct answer — classification, extraction, structured output — temperature should be 0, and top-p doesn’t matter because greedy decoding never consults it. For tasks with many acceptable answers — summarization, rewriting — 0.5 to 0.7 with top-p around 0.9 is a reasonable starting point. For genuinely creative work, 0.8 to 1.0 is the right band, but always with top-p capped to clip the tail of low-probability tokens that causes incoherence.
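One way to make the rule operational is a per-task preset table looked up at call time. The task names and exact values below are illustrative defaults within the bands above, not prescriptions:

```python
# Per-task sampling presets, assuming a chat-completions-style API
# that accepts `temperature` and `top_p`. Values follow the bands
# above; tune each endpoint against its own eval set.
SAMPLING_PRESETS = {
    # Single correct answer: greedy decoding; top_p is irrelevant at 0.
    "classification": {"temperature": 0.0, "top_p": 1.0},
    "extraction":     {"temperature": 0.0, "top_p": 1.0},
    # Many acceptable answers: moderate temperature, nucleus cap.
    "summarization":  {"temperature": 0.6, "top_p": 0.9},
    "rewriting":      {"temperature": 0.6, "top_p": 0.9},
    # Genuinely creative: hotter, but keep the tail clipped.
    "creative":       {"temperature": 0.9, "top_p": 0.95},
}

def sampling_params(task: str) -> dict:
    """Return sampling parameters for a task, defaulting to the safe end."""
    return SAMPLING_PRESETS.get(task, {"temperature": 0.0, "top_p": 1.0})
```

Defaulting unknown tasks to temperature 0 errs toward reliability: an endpoint that is accidentally too boring is easier to notice than one that is accidentally too creative.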
What the defaults hide
Setting temperature to 0 doesn’t make models deterministic — there’s still floating-point noise in tied probabilities. Two identical calls can produce different outputs. If you need true reproducibility, you need to capture the seed too, and not all APIs expose it. Treat temperature 0 as low-variance, not zero-variance, and your tests will stop being flaky.
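In tests, treating temperature 0 as low-variance means asserting on agreement across calls rather than exact string equality against one golden output. A small harness sketch — `call` is a hypothetical zero-argument function that hits your temperature-0 endpoint:

```python
from collections import Counter

def stable_output(call, n=3, normalize=lambda s: s.strip()):
    """Call a temperature-0 endpoint n times and return the majority
    answer plus its agreement rate, treating the endpoint as
    low-variance rather than deterministic."""
    outputs = [normalize(call()) for _ in range(n)]
    (winner, count), = Counter(outputs).most_common(1)
    return winner, count / n
```

A test can then require, say, a 2-of-3 majority instead of three identical strings, which absorbs the residual noise without hiding a genuinely unstable prompt.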
The teams that ship reliable LLM features pick sampling parameters per task, not per project. The default config is the wrong config for half your endpoints.