RAG that beats fine-tuning, and the cases where it doesn't

RAG won the early-deployment war for good reasons: it’s cheaper than fine-tuning, the knowledge base updates without retraining, and you can audit what the model saw. For most question-answering and document-grounded tasks, it is still the right architecture. But the failure modes are real, and “just add RAG” has become the same kind of unhelpful advice that “just use a database” was a decade ago.

Where RAG quietly underperforms

Tasks that require synthesis across many documents — where no single retrieved chunk contains the answer — are where RAG hurts most. The retriever returns the chunks that are individually most relevant, which often misses the chunk that bridges them. Tasks that require domain-specific style or terminology rarely benefit from RAG, because the retrieved text is content, not voice. And small corpora — under a few thousand documents — sometimes work better just stuffed into context, with none of the retrieval complexity.
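
To make the last point concrete, here is a minimal sketch of the "just stuff it into context" approach for a small corpus, with no retriever or vector store at all. `call_llm` is a hypothetical stand-in for whatever model client you use; nothing here depends on a specific library.

```python
# Minimal sketch, assuming the corpus is small enough to fit in the context window.
# `call_llm` is a hypothetical stand-in for your model client, not a specific API.

def answer_from_full_corpus(question: str, corpus: list[str], call_llm) -> str:
    """Skip retrieval entirely: concatenate every document into the prompt."""
    context = "\n\n---\n\n".join(corpus)
    prompt = (
        "Answer the question using only the documents below.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return call_llm(prompt)
```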

When fine-tuning wins

If the task requires the model to learn a new format, a new domain vocabulary, or a behavioral pattern that prompting can’t establish, fine-tuning is the lever. RAG cannot teach behavior. It can only retrieve content. Mixing them — fine-tune for behavior, retrieve for content — is more often the right answer than either alone.
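
A hedged sketch of that split is below. `retrieve` and `call_finetuned_llm` are hypothetical stand-ins for your retriever and your fine-tuned model endpoint; the point is only where behavior and content each live.

```python
# Sketch of the hybrid pattern: the fine-tuned model carries the behavior
# (format, voice, domain conventions); retrieval carries the facts.
# `retrieve` and `call_finetuned_llm` are hypothetical stand-ins, not a specific API.

def hybrid_answer(question: str, retrieve, call_finetuned_llm, k: int = 5) -> str:
    passages = retrieve(question, k=k)   # content: top-k chunks from the corpus
    context = "\n\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\n"
    # No style or format instructions in the prompt: the trained weights supply those.
    return call_finetuned_llm(prompt)
```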

RAG is not a solution. It’s a deployment pattern that works for a specific class of problems, and treating it as the answer to every LLM problem is how teams end up debugging vector search instead of solving their actual task.
