Caveman: Cut 80% of AI Bloat — the Tool That Makes Your Coding Assistant Shut Up and Ship

You are staring at a terminal while Claude Code produces a 300-word explanation for a question that needed 30. You know exactly what this is costing you — every token is metered, and over half of them are “That’s an excellent question,” “Let me provide additional context,” and “I hope this helps.” You have learned to tune it out. The real question is: what are you actually paying for?

This is not anecdotal. A developer benchmarked it: explaining a React re-render bug took 1,180 tokens in normal mode and 159 tokens in Caveman mode. Same answer. That is an 87% cut. A GitHub project called Caveman has 44.8K stars for making exactly one decision — flipping the mode switch to minimal and rejecting every drop of social lubrication.

Caveman’s approach is brutal to the point of elegance. No model parameters are changed. No architecture is swapped. No fine-tuning. It does one thing: in the system prompt, it tells the AI, “You are not a civilized assistant. You are a caveman. No greetings. No setup. No wrap-up. Just the point.” It sounds like a joke, but the results are real. Research shows that when models are forced into concise output, accuracy does not degrade — it sometimes improves, because the cognitive load of maintaining a “polite persona” is eliminated.

It is not a blunt instrument. Caveman offers three compression levels. Lite strips filler words while preserving full sentences — suitable for formal documentation and external syncs. Full is the default, cutting articles, fragmenting expressions, full caveman style — ideal for daily development and rapid debugging. Ultra goes full telegraph mode with aggressive abbreviations and symbolic mappings for maximum token savings. What matters most: it knows the difference between noise and code. Technical terms, function names, file paths, and code blocks remain untouched. Only the connective tissue of natural language gets pulverized.

Picture a typical afternoon: three bugs, four back-and-forth turns each. Debugging an auth middleware expiration costs 704 tokens. Configuring a PostgreSQL connection pool somehow burns 2,347 tokens. Implementing a React error boundary component hits 3,454 tokens. In Caveman mode, those numbers become 121, 380, and 456. An hour of dense conversation that would clock 100,000 tokens now stays under 20,000. The money saved is the superficial win. The real savings are the waiting time that no longer breaks your flow, the scrolling you no longer do to find the actual answer.

Three scenarios illustrate where it shines. Scenario one: you hit an unfamiliar error and want the root cause and fix — nothing else. Caveman Full gives you a fragmented technical verdict that cuts to the chase in half a sentence. Scenario two: you are reviewing code for security vulnerabilities, and Caveman’s compression ratio drops to about 41%, well below the 80%+ of typical Q&A. Code review requires complete logical chains — the tool recognizes this and preserves what matters. Scenario three: you are drafting team-facing technical documentation that needs to be concise but grammatically complete. Lite mode keeps the syntax and structure intact while stripping only pure filler words and redundant statements.

Caveman has limits. It is not built for interactions that require conversational warmth — onboarding a junior engineer, simulating an interview, brainstorming. Over-compression in complex architectural discussions can drop essential reasoning steps. Ultra mode abbreviations may leave teammates reading your chat logs completely lost. Under the hood, Caveman is a system-prompt-level constraint, not a hard-coded truncator — some models comply better than others. At its core, it is outsourced prompt engineering, and its effectiveness depends on how obedient the underlying model is to its directives.

Installation is a one-liner. Claude Code users install via the plugin marketplace: claude plugin marketplace add JuliusBrussee/caveman then claude plugin install caveman@caveman. Cursor, Copilot, Windsurf, Cline, and others use the universal path: npx skills add JuliusBrussee/caveman. Once installed, type /caveman to enter caveman mode, and stop caveman to return to civilization. Switching compression levels is a single command: /caveman ultra.

Half of what you pay AI for is emotional value. The developers who dare to cut that half are the ones whose efficiency actually turns the page.

GitHub: https://github.com/JuliusBrussee/caveman

Related Posts

Claude Code 2.1.136: When an AI Agent's Safety Gate Switches from Trust to Verify

Claude Code 2.1.136: When an AI Agent's Safety Gate Switches from Trust to Verify

You let Claude Code run a long task in auto mode. When you come back, it has written your AWS creden ...

Vibe-Trading: Turn Trading Ideas Into Backtestable Agent Research

Vibe-Trading: Turn Trading Ideas Into Backtestable Agent Research

Ask a trading question and an LLM can give you a polished answer. The problem is that it has no data ...

Multica: When Coding Agents Multiply, Terminals Stop Scaling

Multica: When Coding Agents Multiply, Terminals Stop Scaling

One coding agent is useful. Ten coding agents start to look like a coordination problem. You give o ...

Your Agent Isn't Blind — You Just Forgot to Give It Eyes

Your agent can recite 400K words of investment theory, spot rug pulls, and write strategy code. But ...

Postiz Agent CLI: Hand Your AI the Keys to 28 Social Platforms

You built an agent that reads RSS feeds, summarizes papers, and generates cover images. Then you hit ...

Five Things Claude Code 102 Teaches Academic Researchers

Five Things Claude Code 102 Teaches Academic Researchers

On May 11, 2026, Mushtaq Bilal, PhD published "Claude Code 102 for Academic Researchers," the second ...