Token economics: when cheap models cost more
A cheaper model that gets the answer right 80% of the time costs more than a premium model that gets the answer right 95% of the time, once you factor in retries, escalation, and the human-review queue that the cheaper model fills. Everyone knows this in principle and almost no one prices it correctly the first time, because the cheap-model invoice is concrete and the cost-of-being-wrong is diffuse.
The numbers that actually matter
Price per correct answer is the metric, not price per token. Compute it as: model cost per call × calls per task ÷ accuracy. A model at half the price with two-thirds the accuracy still looks about 25% cheaper on that raw formula, but the gap narrows once calls per task includes the retries the cheaper model forces, and it inverts once you add the second-order costs: fallback escalation, human review, support tickets, refunds. The cheap-model column wins on the spreadsheet you started with and loses on the spreadsheet you should have started with.
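As a concrete sketch of that arithmetic: the helper below implements the cost-per-correct-answer formula plus one second-order term. The function name, the overhead_per_failure parameter, and every dollar figure are illustrative assumptions, not measurements from any real deployment.

```python
def cost_per_correct_answer(cost_per_call: float,
                            calls_per_task: float,
                            accuracy: float,
                            overhead_per_failure: float = 0.0) -> float:
    """Expected spend per task the model gets right.

    cost_per_call        -- average price of one model call (USD)
    calls_per_task       -- calls needed per task, including retries
    accuracy             -- fraction of tasks completed correctly
    overhead_per_failure -- second-order cost of a wrong answer
                            (human review, support ticket, refund)
    """
    task_cost = cost_per_call * calls_per_task
    failure_cost = (1 - accuracy) * overhead_per_failure
    return (task_cost + failure_cost) / accuracy


# Made-up numbers: a cheap model at half the per-call price, roughly
# two-thirds the accuracy, more calls per task because of retries, and
# a $0.50 human-review cost whenever it gets the answer wrong.
premium = cost_per_correct_answer(cost_per_call=0.020, calls_per_task=1.0,
                                  accuracy=0.95, overhead_per_failure=0.50)
cheap = cost_per_correct_answer(cost_per_call=0.010, calls_per_task=1.6,
                                accuracy=0.63, overhead_per_failure=0.50)
print(f"premium: ${premium:.3f} per correct answer")   # ≈ $0.047
print(f"cheap:   ${cheap:.3f} per correct answer")     # ≈ $0.319
```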
Where mixed-model architectures pay off
Use the cheap model for tasks where confidence is easy to measure: a separate judge model scores the cheap model's output and escalates to the expensive model only when confidence is low (a sketch of this cascade follows below). Cache aggressively at every tier. Run benchmarks specific to your task before assuming the published evals translate; published numbers are averages, and your task isn't.
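A minimal sketch of that cascade, assuming you supply your own inference and judging calls: the model names, the call_model and judge_confidence callables, and the 0.8 threshold are placeholders, not any particular vendor's API.

```python
import hashlib
from typing import Callable

# Placeholder model identifiers -- swap in whatever models you actually use.
CHEAP_MODEL = "small-model"
PREMIUM_MODEL = "large-model"

cache: dict[str, str] = {}  # tier 0: answer cache keyed by prompt hash


def answer(prompt: str,
           call_model: Callable[[str, str], str],
           judge_confidence: Callable[[str, str], float],
           threshold: float = 0.8) -> str:
    """Cheap-first cascade: serve from cache, then the cheap model,
    escalating to the premium model only when the judge scores the
    cheap draft below the confidence threshold."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:                              # tier 0: cache hit
        return cache[key]

    draft = call_model(CHEAP_MODEL, prompt)       # tier 1: cheap model
    if judge_confidence(prompt, draft) >= threshold:
        cache[key] = draft
        return draft

    final = call_model(PREMIUM_MODEL, prompt)     # tier 2: escalate
    cache[key] = final
    return final
```

The judge can be the cheap model grading itself, a small classifier, or plain heuristics such as schema validation; the escalation threshold is something you tune against your own task-specific benchmark, the same benchmark you should run before trusting published evals.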
The cheapest model is rarely the cheapest deployment. The teams that ship cost-effective LLM features picked their model after running a real cost-per-correct-answer study, not before.