Versioning prompts like you version code
- William Jacob
- Engineering, Prompts
- 15 May, 2026
The lifecycle of an LLM project is: someone writes a prompt that works, someone else copies it into the codebase, a third person tweaks a word, and four months later nobody can reproduce the version that everyone agrees was best. Prompts are code, but most teams treat them like configuration — at best — and they pay for it on the day they need to roll back.
What “version control for prompts” actually requires
A canonical store with an ID per prompt, a version per change, and a diff history. The store can be a YAML file in the repo, a database, or a third-party tool — what matters is that there is exactly one source of truth and the production system pulls from it. Every prompt change is reviewed like a code change, with at least one other person reading the diff. The prompt ID is logged with every model call, so when you see a regression in metrics, you can correlate it to a specific prompt version.
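The three ingredients above can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool: the prompt IDs, the `get_prompt`/`call_model` helpers, and the inlined store are all hypothetical, and in practice the dict would be loaded from the repo's YAML file (e.g. with `yaml.safe_load`).

```python
import logging

# Canonical store: one ID per prompt, one entry per version.
# Inlined here to stay self-contained; in practice, loaded from
# a YAML file in the repo (the single source of truth).
PROMPTS = {
    "summarize_ticket": [
        {"version": 1,
         "text": "Summarize the support ticket in two sentences."},
        {"version": 2,
         "text": "Summarize the support ticket in two sentences, "
                 "preserving error codes."},
    ],
}

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("prompts")


def get_prompt(prompt_id, version=None):
    """Return (text, version); the latest version if none is pinned."""
    entries = PROMPTS[prompt_id]
    if version is None:
        entry = max(entries, key=lambda e: e["version"])
    else:
        entry = next(e for e in entries if e["version"] == version)
    return entry["text"], entry["version"]


def call_model(prompt_id, user_input):
    text, version = get_prompt(prompt_id)
    # Log the prompt ID and version with every model call, so a
    # regression in metrics can be correlated to a prompt change.
    log.info("model_call prompt_id=%s prompt_version=%d",
             prompt_id, version)
    # Stand-in for the real model call.
    return f"{text}\n\n{user_input}"
```

Because every call site goes through `get_prompt`, pinning production to an older version on rollback is a one-argument change, and the log line is the audit trail.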
What this prevents
Silent regressions when someone “fixes” a prompt without telling anyone. A/B tests that can’t be analyzed because the traffic mixed two versions invisibly. The classic incident where the “old” prompt is restored but it’s actually the third-most-recent version, and nobody notices the difference until users do. None of this requires heavy tooling — a YAML file plus discipline gets you most of the way.
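The wrong-version rollback in particular is cheap to guard against: record a content hash alongside each version at review time, and verify the deployed text against the hash of the version the rollback claims to restore. A minimal sketch, assuming a hypothetical in-memory version history (in practice this lives next to the prompts in the canonical store):

```python
import hashlib


def prompt_hash(text: str) -> str:
    """Short content hash, recorded alongside each version at review time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


# Hypothetical version history; real entries would be written
# when each prompt change is reviewed and merged.
HISTORY = {
    1: prompt_hash("Summarize the ticket."),
    2: prompt_hash("Summarize the ticket in two sentences."),
    3: prompt_hash("Summarize the ticket in two sentences, "
                   "preserving error codes."),
}


def verify_rollback(deployed_text: str, claimed_version: int) -> bool:
    """True only if what is actually deployed matches the claimed version."""
    return prompt_hash(deployed_text) == HISTORY[claimed_version]
```

A failed check turns "nobody notices until users do" into a deploy-time error.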
Prompts are code. Treat them like code or treat them like a problem you’ll have later.