# Edgee Codex Compressor: The Rust Gateway That Cuts Codex Costs 35.6%
Every coding agent has the same dirty secret: most of your tokens aren't code. They're noise — directory listings, git logs, build output, test failures — stuffed into the context window over and over as the agent works. Edgee, an open-source Rust-based LLM gateway, is the first tool to take that noise seriously and charge nothing to strip it out.
The numbers it put up for Codex are the kind that make billing dashboards look suspicious in a good way.
## The Benchmark That Broke the Dashboard
In Edgee's own published benchmark, pairing Codex + Edgee against Codex alone produced three measurable wins on the same task:
- 49.5% fewer input tokens consumed
- Cache hit rate improved from 76.1% to 85.4%
- 35.6% lower total session cost
That third number is the headline. A third of your coding-agent bill, gone, without changing the model, the prompt, or the code you ship. The compression is described as lossless from the model's perspective — output quality is preserved, but the prompt arriving at OpenAI is substantially leaner.
Edgee isn't making your model dumber. It's stopping you from paying to send the same `ls` output for the ninth time.
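To see how the two levers compose, here's a back-of-envelope sketch in Rust. The unit price and the cached-token discount are hypothetical assumptions, not figures from the benchmark; only the token reduction and cache hit rates come from the numbers above:

```rust
// Hypothetical pricing model: cached input tokens are billed at a discount.
// The price and discount below are assumptions for illustration only.
fn input_cost(tokens: f64, cache_hit: f64, price: f64, cached_discount: f64) -> f64 {
    let cached = tokens * cache_hit;
    let uncached = tokens - cached;
    uncached * price + cached * price * cached_discount
}

fn main() {
    let price = 1.0;     // arbitrary unit price per input token (assumption)
    let discount = 0.25; // cached tokens billed at 25% of full price (assumption)

    // Codex alone: 100 units of input tokens, 76.1% cache hit rate
    let baseline = input_cost(100.0, 0.761, price, discount);
    // Codex + Edgee: 49.5% fewer input tokens, 85.4% cache hit rate
    let with_edgee = input_cost(50.5, 0.854, price, discount);

    let savings = 1.0 - with_edgee / baseline;
    assert!(savings > 0.5); // input-side savings exceed 50% under these assumptions
    println!("input-side savings: {:.1}%", savings * 100.0);
}
```

Under these assumed prices the input-side savings come out higher than the headline 35.6%, which is consistent: total session cost also includes output tokens, which compression doesn't touch.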
## How Token Compression Actually Works
Edgee sits as a transparent proxy between your coding agent and the upstream LLM provider. Before each request hits OpenAI or Anthropic, it analyzes the tool outputs being injected into the context and removes what the model doesn't actually need to reason about the task:
- File listings get deduped and summarized
- Git logs get trimmed to the commits that matter
- Build output collapses to the failures, not the 4,000 lines of passing steps
- Test results drop the stack traces the model already saw two turns ago
The result is a context window that still carries the signal of what the agent has observed without the repeated noise. Cache hit rates go up because the compressed prompts are more stable across turns — and that's where the second compounding cost reduction comes from.
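A minimal sketch of what deterministic passes like these could look like. This is hypothetical illustration code, not Edgee's actual implementation; the heuristics (which lines count as failures, hashing to detect repeats) are assumptions:

```rust
// Two toy compression passes over tool output, in the spirit described above.
use std::collections::HashSet;

/// Collapse build output to its failing lines, summarizing what was dropped.
fn collapse_build_output(raw: &str) -> String {
    let (failures, rest): (Vec<&str>, Vec<&str>) = raw
        .lines()
        .partition(|l| l.contains("error") || l.contains("FAILED"));
    let mut out = failures.join("\n");
    if !rest.is_empty() {
        out.push_str(&format!("\n[{} passing/neutral lines elided]", rest.len()));
    }
    out
}

/// Drop a tool output entirely if an identical one was already sent this session.
fn dedupe<'a>(seen: &mut HashSet<u64>, output: &'a str) -> Option<&'a str> {
    use std::hash::{Hash, Hasher};
    let mut h = std::collections::hash_map::DefaultHasher::new();
    output.hash(&mut h);
    if seen.insert(h.finish()) { Some(output) } else { None }
}

fn main() {
    let build_log = "compiling a\ncompiling b\nerror[E0308]: mismatched types\ncompiling c";
    let compressed = collapse_build_output(build_log);
    assert!(compressed.contains("error[E0308]"));
    assert!(compressed.contains("3 passing/neutral lines elided"));

    let mut seen = HashSet::new();
    assert!(dedupe(&mut seen, "ls output").is_some()); // first occurrence kept
    assert!(dedupe(&mut seen, "ls output").is_none()); // exact repeat dropped
    println!("{compressed}");
}
```

The key property is that both passes are pure functions of the tool output: no model call, no sampling, so the same input always compresses the same way, which is exactly what keeps cached prompt prefixes stable across turns.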
## Installation
Edgee is written in Rust (96.8% of the codebase) and distributed as a single binary. One line gets it onto your machine:
```bash
# macOS / Linux
curl -fsSL https://edgee.ai/install.sh | bash

# Windows (PowerShell)
irm https://edgee.ai/install.ps1 | iex

# Homebrew
brew install edgee-ai/tap/edgee
```
Wrapping your coding agent is a single subcommand — no config files, no SDK integration, no code changes:
```bash
edgee launch codex     # wraps Codex
edgee launch claude    # wraps Claude Code
edgee launch opencode  # wraps Opencode
```
For anything else that speaks the OpenAI API, `edgee serve` exposes a drop-in proxy endpoint.
## What's Supported Today
| Agent | Status | Command |
|---|---|---|
| Codex | Supported | `edgee launch codex` |
| Claude Code | Supported | `edgee launch claude` |
| Opencode | Supported | `edgee launch opencode` |
| Cursor | Coming soon | — |
| Any OpenAI-compatible client | Supported via proxy | `edgee serve` |
The latest release, v0.2.1, shipped on April 9, 2026. The project is licensed under Apache 2.0, which means the compression logic is fully auditable — an important property if you're going to let a middleman rewrite the prompts you're paying for.
## Why Rust Matters Here
Putting a gateway between your agent and the LLM provider is a latency bet. Every millisecond Edgee spends analyzing tool output is a millisecond your agent is sitting idle. Rust is the right language for this job — the binary is small, startup is instant, and the per-request overhead is low enough that the compression wins dominate cleanly in the benchmark.
It's also why the project can credibly claim "lossless from the model's perspective": the compression passes aren't model-driven, they're deterministic transforms over known tool outputs. You aren't replacing one LLM call with two — you're doing work in microseconds that saves you tokens in seconds.
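A quick sanity check of that scale claim. This is a synthetic illustration, not Edgee's benchmark: it times a simple line-filtering pass over tens of thousands of lines of fake build output:

```rust
// Time a deterministic filtering pass over a large synthetic tool output.
use std::time::Instant;

fn main() {
    // ~40k lines of synthetic build output with one failure buried in the middle
    let mut log = String::new();
    for i in 0..40_000 {
        if i == 20_000 {
            log.push_str("error[E0425]: cannot find value `x`\n");
        }
        log.push_str("   Compiling step ok\n");
    }

    let start = Instant::now();
    let kept: Vec<&str> = log.lines().filter(|l| l.contains("error")).collect();
    let elapsed = start.elapsed();

    assert_eq!(kept.len(), 1);
    // On typical hardware this finishes well under a millisecond;
    // the exact timing varies by machine, so none is asserted here.
    println!("filtered {} lines in {:?}", log.lines().count(), elapsed);
}
```

Compare that to a round trip to an LLM provider, which is measured in hundreds of milliseconds to seconds: the proxy's analysis cost disappears into the noise of the request it's shrinking.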
## The Bottom Line
Edgee is the kind of infrastructure that should have existed a year ago. Coding agents ship with context management that was designed for demos, not for month-three of a real codebase, and the bill reflects it. A transparent Rust proxy that cuts Codex session costs by 35.6% without touching your workflow is the shortest path from "my AI spend is out of control" to "my AI spend is reasonable" that exists today.
Install it, wrap your agent, and watch the same tasks run on half the input tokens. If the benchmark holds on your workload, the install command paid for itself before you finished reading this sentence.
