Every Claude Code user has had the same quiet realization at the end of the month: the token bill is not a rounding error. Agent loops are expensive. Long CLAUDE.md files are expensive. Verbose responses you skim for three seconds and forget are the most expensive of all.
Julius Brussee's answer is as stupid as it is effective. Make Claude talk like a caveman.
Caveman is a Claude Code skill (with ports for Codex, Cursor, Windsurf, Gemini CLI, Cline, and GitHub Copilot) that strips out filler language, articles, and conversational padding while preserving technical accuracy. It hit 30,400 GitHub stars in weeks, climbed to #1 on GitHub's trending list, and landed on Product Hunt's April 14, 2026 daily leaderboard.
It is also, obviously, a joke that works.
What Caveman Actually Does
The plugin ships a set of response-shaping instructions that push the model toward short declarative sentences — subject, verb, object, done. No pleasantries, no "Certainly! Here's how…" intros, no recap paragraphs at the end.
The canonical example from the repo is a React question about why a memoized component re-renders. Normal Claude writes ~1,180 tokens explaining referential equality, the role of useMemo, and common pitfalls. Caveman Claude writes:
> Inline obj prop → new ref → re-render. useMemo.
159 tokens. 87% reduction. You still learn the thing.
Across 11 benchmark prompts, Caveman averages a 65% drop in response tokens, with individual task reductions ranging from 22% to 87%. That is the number to anchor on. The author's "~75%" headline figure reflects his personal workload; yours will depend on how verbose your tasks naturally are.
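The arithmetic behind those figures is worth doing yourself before extrapolating. A minimal sketch using the numbers above — the monthly token volume is my illustrative assumption, not a measured value:

```python
# Savings estimate from the repo's benchmark numbers.
# The monthly output volume is a placeholder assumption.

VERBOSE_TOKENS = 1180   # normal Claude answer from the React example
CAVEMAN_TOKENS = 159    # caveman answer for the same prompt
AVG_REDUCTION = 0.65    # average across the 11 benchmark prompts

def reduction(before: int, after: int) -> float:
    """Fractional token reduction between two responses."""
    return (before - after) / before

best_case = reduction(VERBOSE_TOKENS, CAVEMAN_TOKENS)
print(f"best case: {best_case:.0%}")        # the canonical ~87% example

# A month of output tokens under the *average* reduction.
monthly_output_tokens = 5_000_000           # assumed workload
saved = monthly_output_tokens * AVG_REDUCTION
print(f"average saving: {saved:,.0f} tokens")
```

Anchoring on the 65% average rather than the 87% best case keeps the estimate honest for mixed workloads.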
The Mode Ladder
Caveman ships with four intensity levels, and that matters — you rarely want maximum compression for every task.
| Mode | Behavior | When to use |
|---|---|---|
| Lite | Professional terseness, grammar intact | PR reviews, docs, anything a teammate will read |
| Full | Default caveman mode, dropped articles | Day-to-day coding work |
| Ultra | Telegraphic, heavy abbreviation | Repetitive agent loops, cost-sensitive jobs |
| 文言文 | Classical Chinese literary syntax, three levels | Maximum token-per-information ratio |
The Classical Chinese mode is not a gimmick. Ancient literary Chinese is one of the most information-dense writing systems humans have ever produced, and large models tokenize it efficiently. For pure compression it beats English. For readability by your coworkers, obviously, it does not.
The Companion Tools That Matter More
The compression trick on output tokens is clever. The compression trick on input tokens is where the real savings live.
Caveman Compress rewrites your CLAUDE.md file — the memory document Claude reads at the start of every session — into caveman-speak. The author reports roughly 46% fewer input tokens per session, which is the kind of number that actually moves a monthly bill because it multiplies by every single turn of every conversation.
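The multiplication effect is easy to underestimate. A quick sketch of why input-side compression compounds — the file size and turn count are illustrative assumptions; only the 46% figure comes from the author's report:

```python
# The CLAUDE.md memory file is re-sent as input context on every
# turn, so a one-time compression pays off repeatedly.
# File size and turn count below are illustrative assumptions.

CLAUDE_MD_TOKENS = 4_000   # assumed size of an uncompressed CLAUDE.md
COMPRESSION = 0.46         # author's reported input-token reduction
TURNS_PER_SESSION = 30     # assumed agentic back-and-forth

per_turn_saving = CLAUDE_MD_TOKENS * COMPRESSION
per_session_saving = per_turn_saving * TURNS_PER_SESSION

print(f"{per_turn_saving:,.0f} input tokens saved per turn")
print(f"{per_session_saving:,.0f} input tokens saved per session")
```

A saving that scales with turn count is the one that shows up on a monthly bill, which is why this companion tool matters more than the headline trick.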
Two other specialized skills are worth enabling:
- caveman-commit — generates commit messages capped at 50 characters. Forces the discipline git users have been trying and failing to keep since 2005.
- caveman-review — one-line PR comments with emoji severity indicators. The PR review you'll actually read.
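That 50-character discipline can also be enforced locally, independent of the plugin. A minimal commit-msg hook sketch — this is my illustration of the same rule, not part of caveman-commit (save as `.git/hooks/commit-msg` and make it executable):

```python
#!/usr/bin/env python3
# Hypothetical commit-msg hook enforcing a 50-character subject line.
# A local stand-in for the discipline caveman-commit automates;
# not part of the plugin itself.
import sys

LIMIT = 50

def check(msg_path: str) -> int:
    """Return 0 if the first line of the commit message fits, else 1."""
    with open(msg_path) as f:
        subject = f.readline().rstrip("\n")
    if len(subject) > LIMIT:
        print(f"commit subject is {len(subject)} chars (limit {LIMIT})",
              file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(check(sys.argv[1]))
```

Git aborts the commit when the hook exits non-zero, so over-long subjects never land in history.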
Installing It
Pick your agent:
Claude Code:

```shell
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
```

Gemini CLI:

```shell
gemini extensions install https://github.com/JuliusBrussee/caveman
```

Cursor, Windsurf, Cline, or Copilot:

```shell
npx skills add JuliusBrussee/caveman -a cursor
# or windsurf, cline, github-copilot
```

Codex: clone the repo, drop it in /plugins, search for Caveman, install.
The plugin is MIT licensed. No telemetry, no account, no upsell.
The Honest Catch
The 65% output reduction is real. The total cost savings per session are smaller — usually in the 8–10% range once you count the back-and-forth turns where a terser first answer prompts a follow-up question that would have been unnecessary otherwise.
Terse answers are cheaper per response but occasionally expensive per outcome. The Lite mode exists precisely for this reason — use it on anything where a colleague will need to act on the output without a translator.
For code-only loops, where the agent is talking to itself, Full and Ultra are nearly always a win. For anything touching humans, Lite is the sweet spot.
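The gap between a 65% output cut and a single-digit net saving can be modeled directly. A sketch under stated assumptions — the input/output split and follow-up rate are my illustrative guesses, not measured values:

```python
# Why a 65% output reduction shrinks to ~8-10% net: input tokens
# dominate agent sessions, and terser first answers occasionally
# cost an extra clarifying turn. All ratios are illustrative.

INPUT_TOKENS = 80_000     # assumed per-session input (context, re-sent per turn)
OUTPUT_TOKENS = 20_000    # assumed per-session output
OUTPUT_CUT = 0.65         # caveman's average output reduction
FOLLOWUP_OVERHEAD = 0.05  # assumed extra turns from terser first answers

baseline = INPUT_TOKENS + OUTPUT_TOKENS
with_caveman = (INPUT_TOKENS + OUTPUT_TOKENS * (1 - OUTPUT_CUT)) \
               * (1 + FOLLOWUP_OVERHEAD)
net_saving = 1 - with_caveman / baseline
print(f"net session saving: {net_saving:.1%}")
```

Under these assumptions the net comes out to roughly 8.6%, squarely in the 8–10% range the plugin's own honest accounting suggests; shift the input/output split or the follow-up rate and the number moves accordingly.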
Why It Resonates
The interesting thing about Caveman is not the plugin. It's the premise. Modern LLMs have been tuned to be pleasant — to apologize, hedge, recap, and narrate. For a coding tool, that tuning is expensive theater. Caveman is the first skill to treat conversational politeness as the cost center it actually is.
Expect to see more plugins in this shape before summer. Agent tooling has spent two years optimizing for capability. The next year will be spent optimizing for tokens per useful answer, and Caveman just published the first playbook.
The Bottom Line
Caveman is a one-command install that cuts your Claude Code output tokens by roughly 65% on average, with a companion tool that trims input tokens by another 46% per session. It is MIT licensed, trivial to uninstall, and the only honest way to measure its value is to run it on your own workload for a week and check the bill. The entire premise is a joke. The savings are not.
