Every Claude Code user has had the same quiet realization at the end of the month: the token bill is not a rounding error. Agent loops are expensive. Long CLAUDE.md files are expensive. Verbose responses you skim for three seconds and forget are the most expensive of all.
Julius Brussee's answer is as stupid as it is effective. Make Claude talk like a caveman.
Caveman is a Claude Code skill (with ports for Codex, Cursor, Windsurf, Gemini CLI, Cline, and GitHub Copilot) that strips out filler language, articles, and conversational padding while preserving technical accuracy. It hit 30,400 GitHub stars in weeks, climbed to #1 on GitHub's trending list, and landed on Product Hunt's April 14, 2026 daily leaderboard.
It is also, obviously, a joke that works.
What Caveman Actually Does
The plugin ships a set of response-shaping instructions that push the model toward short declarative sentences — subject, verb, object, done. No pleasantries, no "Certainly! Here's how…" intros, no recap paragraphs at the end.
The canonical example from the repo is a React question about why a memoized component re-renders. Normal Claude writes ~1,180 tokens explaining referential equality, the role of useMemo, and common pitfalls. Caveman Claude writes:
> Inline obj prop → new ref → re-render. useMemo.
159 tokens. 87% reduction. You still learn the thing.
Across 11 benchmark prompts, Caveman averages a 65% drop in response tokens, with individual task reductions ranging from 22% to 87%. That is the number to anchor on. The author's "~75%" headline figure reflects his personal workload; yours will depend on how verbose your tasks naturally are.
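The arithmetic behind those figures is worth doing yourself before extrapolating. A minimal sketch using the numbers above — the monthly token volume is my illustrative assumption, not a measured value:

```python
# Savings estimate from the repo's benchmark numbers.
# The monthly output volume is a placeholder assumption.

VERBOSE_TOKENS = 1180   # normal Claude answer from the React example
CAVEMAN_TOKENS = 159    # caveman answer for the same prompt
AVG_REDUCTION = 0.65    # average across the 11 benchmark prompts

def reduction(before: int, after: int) -> float:
    """Fractional token reduction between two responses."""
    return (before - after) / before

best_case = reduction(VERBOSE_TOKENS, CAVEMAN_TOKENS)
print(f"best case: {best_case:.0%}")        # the canonical ~87% example

# A month of output tokens under the *average* reduction.
monthly_output_tokens = 5_000_000           # assumed workload
saved = monthly_output_tokens * AVG_REDUCTION
print(f"average saving: {saved:,.0f} tokens")
```

Anchoring on the 65% average rather than the 87% best case keeps the estimate honest for mixed workloads.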
The Mode Ladder
Caveman ships with four intensity levels, and that matters — you rarely want maximum compression for every task.
| Mode | Behavior | When to use |
|---|---|---|
| Lite | Professional terseness, grammar intact | PR reviews, docs, anything a teammate will read |
| Full | Default caveman mode, dropped articles | Day-to-day coding work |
| Ultra | Telegraphic, heavy abbreviation | Repetitive agent loops, cost-sensitive jobs |
| 文言文 | Classical Chinese literary syntax, three levels | Maximum token-per-information ratio |
The Classical Chinese mode is not a gimmick. Ancient literary Chinese is one of the most information-dense writing systems humans have ever produced, and large models tokenize it efficiently. For pure compression it beats English. For readability by your coworkers, obviously, it does not.
The Companion Tools That Matter More
The compression trick on output tokens is clever. The compression trick on input tokens is where the real savings live.
Caveman Compress rewrites your CLAUDE.md file — the memory document Claude reads at the start of every session — into caveman-speak. The author reports roughly 46% fewer input tokens per session, which is the kind of number that actually moves a monthly bill because it multiplies by every single turn of every conversation.
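The multiplication effect is easy to underestimate. A quick sketch of why input-side compression compounds — the file size and turn count are illustrative assumptions; only the 46% figure comes from the author's report:

```python
# The CLAUDE.md memory file is re-sent as input context on every
# turn, so a one-time compression pays off repeatedly.
# File size and turn count below are illustrative assumptions.

CLAUDE_MD_TOKENS = 4_000   # assumed size of an uncompressed CLAUDE.md
COMPRESSION = 0.46         # author's reported input-token reduction
TURNS_PER_SESSION = 30     # assumed agentic back-and-forth

per_turn_saving = CLAUDE_MD_TOKENS * COMPRESSION
per_session_saving = per_turn_saving * TURNS_PER_SESSION

print(f"{per_turn_saving:,.0f} input tokens saved per turn")
print(f"{per_session_saving:,.0f} input tokens saved per session")
```

A saving that scales with turn count is the one that shows up on a monthly bill, which is why this companion tool matters more than the headline trick.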
Two other specialized skills are worth enabling:
- caveman-commit — generates commit messages capped at 50 characters. Forces the discipline git users have been trying and failing to keep since 2005.
- caveman-review — one-line PR comments with emoji severity indicators. The PR review you'll actually read.
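That 50-character discipline can also be enforced locally, independent of the plugin. A minimal commit-msg hook sketch — this is my illustration of the same rule, not part of caveman-commit (save as `.git/hooks/commit-msg` and make it executable):

```python
#!/usr/bin/env python3
# Hypothetical commit-msg hook enforcing a 50-character subject line.
# A local stand-in for the discipline caveman-commit automates;
# not part of the plugin itself.
import sys

LIMIT = 50

def check(msg_path: str) -> int:
    """Return 0 if the first line of the commit message fits, else 1."""
    with open(msg_path) as f:
        subject = f.readline().rstrip("\n")
    if len(subject) > LIMIT:
        print(f"commit subject is {len(subject)} chars (limit {LIMIT})",
              file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(check(sys.argv[1]))
```

Git aborts the commit when the hook exits non-zero, so over-long subjects never land in history.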
Installing It
Pick your agent:
Claude Code:

```shell
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
```

Gemini CLI:

```shell
gemini extensions install https://github.com/JuliusBrussee/caveman
```

Cursor, Windsurf, Cline, or Copilot:

```shell
npx skills add JuliusBrussee/caveman -a cursor
# or windsurf, cline, github-copilot
```

Codex: clone the repo, drop it in /plugins, search for Caveman, install.
The plugin is MIT licensed. No telemetry, no account, no upsell.
The Honest Catch
The 65% output reduction is real. The total cost savings per session are smaller — usually in the 8–10% range once you count the back-and-forth turns where a terser first answer prompts a follow-up question that would have been unnecessary otherwise.
Terse answers are cheaper per response but occasionally expensive per outcome. The Lite mode exists precisely for this reason — use it on anything where a colleague will need to act on the output without a translator.
For code-only loops, where the agent is talking to itself, Full and Ultra are nearly always a win. For anything touching humans, Lite is the sweet spot.
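The gap between a 65% output cut and a single-digit net saving can be modeled directly. A sketch under stated assumptions — the input/output split and follow-up rate are my illustrative guesses, not measured values:

```python
# Why a 65% output reduction shrinks to ~8-10% net: input tokens
# dominate agent sessions, and terser first answers occasionally
# cost an extra clarifying turn. All ratios are illustrative.

INPUT_TOKENS = 80_000     # assumed per-session input (context, re-sent per turn)
OUTPUT_TOKENS = 20_000    # assumed per-session output
OUTPUT_CUT = 0.65         # caveman's average output reduction
FOLLOWUP_OVERHEAD = 0.05  # assumed extra turns from terser first answers

baseline = INPUT_TOKENS + OUTPUT_TOKENS
with_caveman = (INPUT_TOKENS + OUTPUT_TOKENS * (1 - OUTPUT_CUT)) \
               * (1 + FOLLOWUP_OVERHEAD)
net_saving = 1 - with_caveman / baseline
print(f"net session saving: {net_saving:.1%}")
```

Under these assumptions the net comes out to roughly 8.6%, squarely in the 8–10% range the plugin's own honest accounting suggests; shift the input/output split or the follow-up rate and the number moves accordingly.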
Why It Resonates
The interesting thing about Caveman is not the plugin. It's the premise. Modern LLMs have been tuned to be pleasant — to apologize, hedge, recap, and narrate. For a coding tool, that tuning is expensive theater. Caveman is the first skill to treat conversational politeness as the cost center it actually is.
Expect to see more plugins in this shape before summer. Agent tooling has spent two years optimizing for capability. The next year will be spent optimizing for tokens per useful answer, and Caveman just published the first playbook.
The Bottom Line
Caveman is a one-command install that cuts your Claude Code output tokens by roughly 65% on average, with a companion tool that trims input tokens by another 46% per session. It is MIT licensed, trivial to uninstall, and the only honest way to measure its value is to run it on your own workload for a week and check the bill. The entire premise is a joke. The savings are not.
