# Edgee Codex Compressor: The Rust Gateway That Cuts Codex Costs 35.6%
Every coding agent has the same dirty secret: most of your tokens aren't code. They're noise — directory listings, git logs, build output, test failures — stuffed into the context window over and over as the agent works. Edgee, an open-source Rust-based LLM gateway, is the first tool to take that noise seriously and charge nothing to strip it out.
The numbers it put up for Codex are the kind that make billing dashboards look suspicious in a good way.
## The Benchmark That Broke the Dashboard
In Edgee's own published benchmark, pairing Codex + Edgee against Codex alone produced three measurable wins on the same task:
- 49.5% fewer input tokens consumed
- Cache hit rate improved from 76.1% to 85.4%
- 35.6% lower total session cost
That third number is the headline. A third of your coding-agent bill, gone, without changing the model, the prompt, or the code you ship. The compression is described as lossless from the model's perspective — output quality is preserved, but the prompt arriving at OpenAI is substantially leaner.
Edgee isn't making your model dumber. It's stopping you from paying to send the same `ls` output for the ninth time.
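To see how the two levers compose, here's a back-of-envelope sketch in Rust. The unit price and the cached-token discount are hypothetical assumptions, not figures from the benchmark; only the token reduction and cache hit rates come from the numbers above:

```rust
// Hypothetical pricing model: cached input tokens are billed at a discount.
// The price and discount below are assumptions for illustration only.
fn input_cost(tokens: f64, cache_hit: f64, price: f64, cached_discount: f64) -> f64 {
    let cached = tokens * cache_hit;
    let uncached = tokens - cached;
    uncached * price + cached * price * cached_discount
}

fn main() {
    let price = 1.0;     // arbitrary unit price per input token (assumption)
    let discount = 0.25; // cached tokens billed at 25% of full price (assumption)

    // Codex alone: 100 units of input tokens, 76.1% cache hit rate
    let baseline = input_cost(100.0, 0.761, price, discount);
    // Codex + Edgee: 49.5% fewer input tokens, 85.4% cache hit rate
    let with_edgee = input_cost(50.5, 0.854, price, discount);

    let savings = 1.0 - with_edgee / baseline;
    assert!(savings > 0.5); // input-side savings exceed 50% under these assumptions
    println!("input-side savings: {:.1}%", savings * 100.0);
}
```

Under these assumed prices the input-side savings come out higher than the headline 35.6%, which is consistent: total session cost also includes output tokens, which compression doesn't touch.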
## How Token Compression Actually Works
Edgee sits as a transparent proxy between your coding agent and the upstream LLM provider. Before each request hits OpenAI or Anthropic, it analyzes the tool outputs being injected into the context and removes what the model doesn't actually need to reason about the task:
- File listings get deduped and summarized
- Git logs get trimmed to the commits that matter
- Build output collapses to the failures, not the 4,000 lines of passing steps
- Test results drop the stack traces the model already saw two turns ago
The result is a context window that still carries the signal of what the agent has observed without the repeated noise. Cache hit rates go up because the compressed prompts are more stable across turns — and that's where the second compounding cost reduction comes from.
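A minimal sketch of what deterministic passes like these could look like. This is hypothetical illustration code, not Edgee's actual implementation; the heuristics (which lines count as failures, hashing to detect repeats) are assumptions:

```rust
// Two toy compression passes over tool output, in the spirit described above.
use std::collections::HashSet;

/// Collapse build output to its failing lines, summarizing what was dropped.
fn collapse_build_output(raw: &str) -> String {
    let (failures, rest): (Vec<&str>, Vec<&str>) = raw
        .lines()
        .partition(|l| l.contains("error") || l.contains("FAILED"));
    let mut out = failures.join("\n");
    if !rest.is_empty() {
        out.push_str(&format!("\n[{} passing/neutral lines elided]", rest.len()));
    }
    out
}

/// Drop a tool output entirely if an identical one was already sent this session.
fn dedupe<'a>(seen: &mut HashSet<u64>, output: &'a str) -> Option<&'a str> {
    use std::hash::{Hash, Hasher};
    let mut h = std::collections::hash_map::DefaultHasher::new();
    output.hash(&mut h);
    if seen.insert(h.finish()) { Some(output) } else { None }
}

fn main() {
    let build_log = "compiling a\ncompiling b\nerror[E0308]: mismatched types\ncompiling c";
    let compressed = collapse_build_output(build_log);
    assert!(compressed.contains("error[E0308]"));
    assert!(compressed.contains("3 passing/neutral lines elided"));

    let mut seen = HashSet::new();
    assert!(dedupe(&mut seen, "ls output").is_some()); // first occurrence kept
    assert!(dedupe(&mut seen, "ls output").is_none()); // exact repeat dropped
    println!("{compressed}");
}
```

The key property is that both passes are pure functions of the tool output: no model call, no sampling, so the same input always compresses the same way, which is exactly what keeps cached prompt prefixes stable across turns.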
## Installation
Edgee is written in Rust (96.8% of the codebase) and distributed as a single binary. One line gets it onto your machine:
```bash
# macOS / Linux
curl -fsSL https://edgee.ai/install.sh | bash

# Windows (PowerShell)
irm https://edgee.ai/install.ps1 | iex

# Homebrew
brew install edgee-ai/tap/edgee
```
Wrapping your coding agent is a single subcommand — no config files, no SDK integration, no code changes:
```bash
edgee launch codex     # wraps Codex
edgee launch claude    # wraps Claude Code
edgee launch opencode  # wraps Opencode
```
For anything else that speaks the OpenAI API, `edgee serve` exposes a drop-in proxy endpoint.
## What's Supported Today
| Agent | Status | Command |
|---|---|---|
| Codex | Supported | `edgee launch codex` |
| Claude Code | Supported | `edgee launch claude` |
| Opencode | Supported | `edgee launch opencode` |
| Cursor | Coming soon | — |
| Any OpenAI-compatible client | Supported via proxy | `edgee serve` |
The latest release, v0.2.1, shipped on April 9, 2026. The project is licensed under Apache 2.0, which means the compression logic is fully auditable — an important property if you're going to let a middleman rewrite the prompts you're paying for.
## Why Rust Matters Here
Putting a gateway between your agent and the LLM provider is a latency bet. Every millisecond Edgee spends analyzing tool output is a millisecond your agent is sitting idle. Rust is the right language for this job — the binary is small, startup is instant, and the per-request overhead is low enough that the compression wins dominate cleanly in the benchmark.
It's also why the project can credibly claim "lossless from the model's perspective": the compression passes aren't model-driven, they're deterministic transforms over known tool outputs. You aren't replacing one LLM call with two — you're doing work in microseconds that saves you tokens in seconds.
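A quick sanity check of that scale claim. This is a synthetic illustration, not Edgee's benchmark: it times a simple line-filtering pass over tens of thousands of lines of fake build output:

```rust
// Time a deterministic filtering pass over a large synthetic tool output.
use std::time::Instant;

fn main() {
    // ~40k lines of synthetic build output with one failure buried in the middle
    let mut log = String::new();
    for i in 0..40_000 {
        if i == 20_000 {
            log.push_str("error[E0425]: cannot find value `x`\n");
        }
        log.push_str("   Compiling step ok\n");
    }

    let start = Instant::now();
    let kept: Vec<&str> = log.lines().filter(|l| l.contains("error")).collect();
    let elapsed = start.elapsed();

    assert_eq!(kept.len(), 1);
    // On typical hardware this finishes well under a millisecond;
    // the exact timing varies by machine, so none is asserted here.
    println!("filtered {} lines in {:?}", log.lines().count(), elapsed);
}
```

Compare that to a round trip to an LLM provider, which is measured in hundreds of milliseconds to seconds: the proxy's analysis cost disappears into the noise of the request it's shrinking.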
## The Bottom Line
Edgee is the kind of infrastructure that should have existed a year ago. Coding agents ship with context management that was designed for demos, not for month-three of a real codebase, and the bill reflects it. A transparent Rust proxy that cuts Codex session costs by 35.6% without touching your workflow is the shortest path from "my AI spend is out of control" to "my AI spend is reasonable" that exists today.
Install it, wrap your agent, and watch the same tasks run on half the input tokens. If the benchmark holds on your workload, the install command paid for itself before you finished reading this sentence.
