A CLI proxy and MCP server that compresses noisy shell output before it reaches an LLM coding agent, with per-developer token spend tracking and team budget enforcement

AI coding agents (Claude Code, Cursor, Copilot) burn millions of tokens on raw stdout from cargo test, git log, and build tools. Developers report 10M+ tokens wasted per two-week sprint, with no visibility into where the spend went or who on the team is responsible. RTK and Lowfat solve the compression problem for individual developers but ship no team tier, no MCP-native protocol integration, no per-developer budget caps, and no spend dashboard. This product is a single-binary CLI proxy that sits between any dev command and any AI coding agent. It compresses output by 60-90% via pluggable per-command filters and exposes an MCP server endpoint so agents query structured summaries instead of raw dumps. A SaaS control plane lets engineering leads set per-developer token budgets, see real-time spend attribution by developer and repo, and receive alerts before bills spike.
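
The "pluggable per-command filters" idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the filter names, the `FILTERS` registry, and the 4-characters-per-token estimate are hypothetical and are not taken from RTK, Lowfat, or any existing tool.

```python
import re
from typing import Callable, Dict, List

# A filter takes the output lines of a command and returns fewer lines.
Filter = Callable[[List[str]], List[str]]


def drop_blank(lines: List[str]) -> List[str]:
    """Remove empty and whitespace-only lines."""
    return [ln for ln in lines if ln.strip()]


def drop_passing_tests(lines: List[str]) -> List[str]:
    """Remove 'test <name> ... ok' lines, keeping failures and the summary."""
    return [ln for ln in lines if not re.match(r"^test .+ \.\.\. ok$", ln)]


def collapse_repeats(lines: List[str]) -> List[str]:
    """Collapse runs of identical consecutive lines into one line."""
    out: List[str] = []
    for ln in lines:
        if out and out[-1] == ln:
            continue
        out.append(ln)
    return out


# Hypothetical registry: each command maps to an ordered chain of filters.
FILTERS: Dict[str, List[Filter]] = {
    "cargo test": [drop_blank, drop_passing_tests, collapse_repeats],
}
DEFAULT_FILTERS: List[Filter] = [drop_blank, collapse_repeats]


def compress(command: str, output: str) -> str:
    """Run the command's filter chain over its raw output."""
    lines = output.splitlines()
    for f in FILTERS.get(command, DEFAULT_FILTERS):
        lines = f(lines)
    return "\n".join(lines)


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token), not a real tokenizer."""
    return (len(text) + 3) // 4
```

Keeping each filter a plain function over lines means a new command only needs a new registry entry, and the same chain can be measured with `estimate_tokens` before and after to report savings.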

Demand Breakdown

GitHub: 66,924

HN: 156

Social Proof: 3 sources

CLI proxy reducing LLM token consumption 60-90% (GH)

@gh:rtk-ai · 2026-01-22 · 65,989

Pluggable CLI filter for LLM token reduction (GH)

@gh:zdkaster · 2026-05-01 · 935

Show HN: Lowfat - pluggable CLI filter that saved 91.8% of my LLM tokens (HN)

@zdkaster · 2026-06-05 · 156

Gap Assessment

Competitive: multiple tools exist, but differentiation opportunities remain.

Four tools exist (RTK (Rust Token Killer), Lowfat, LiteLLM, Helicone), but gaps remain. RTK: no MCP-native protocol integration so agents query the proxy as a tool, no per-developer spend tracking or budget caps, no real-time team spend dashboard or alerts, and no pluggable per-command filter composition. Lowfat: individual dev tool only, with no team billing layer, no MCP server, no budget enforcement, and no spend observability; the architecture debate is still open in the community.

Features: 7 agent-ready prompts

Per-command output filter pipeline

MCP server endpoint for structured summaries
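
As a rough illustration of what a "structured summary" might contain, the sketch below turns cargo-test-style output into a small dictionary that a tool endpoint could return instead of the raw dump. The function name and the result fields are hypothetical, and the MCP transport itself is deliberately left out; only the summarising step is shown.

```python
import re
from typing import Dict, List


def summarize_test_output(output: str) -> Dict[str, object]:
    """Count passing tests and collect failing test names.

    Assumes lines of the form 'test <name> ... ok' or 'test <name> ... FAILED';
    every other line (headers, blank lines, the final summary) is ignored.
    """
    passed = 0
    failing: List[str] = []
    for line in output.splitlines():
        match = re.match(r"^test (\S+) \.\.\. (ok|FAILED)$", line.strip())
        if match is None:
            continue
        name, status = match.groups()
        if status == "ok":
            passed += 1
        else:
            failing.append(name)
    return {"passed": passed, "failed": len(failing), "failing_tests": failing}
```

An agent that receives this dictionary sees the failing test names in a few dozen characters, where the raw log would carry one line per passing test as well.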

Per-developer token spend tracking

Team budget enforcement and alerts
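
One way per-developer tracking and budget enforcement could fit together is sketched below. The `Budget` and `SpendLedger` names, the 80% alert threshold, and the three return states are assumptions made for illustration; per-repo attribution and persistence are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Budget:
    limit_tokens: int          # hard cap for the budget period
    alert_ratio: float = 0.8   # hypothetical default: alert at 80% of the cap


@dataclass
class SpendLedger:
    budgets: Dict[str, Budget]
    spent: Dict[str, int] = field(default_factory=dict)

    def record(self, developer: str, tokens: int) -> str:
        """Try to record spend; return 'ok', 'alert', or 'blocked'.

        A request that would exceed the cap is rejected and not recorded.
        """
        budget = self.budgets[developer]
        current = self.spent.get(developer, 0)
        if current + tokens > budget.limit_tokens:
            return "blocked"
        self.spent[developer] = current + tokens
        if self.spent[developer] >= budget.alert_ratio * budget.limit_tokens:
            return "alert"
        return "ok"
```

Checking the cap before recording means a blocked request never counts against the developer, so the ledger total always stays at or below the limit.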

Session-level coding cost report

Agent context window awareness

Install, onboarding, and team invite flow

Competitive Landscape

Product	Does	Missing
RTK (Rust Token Killer)	Single Rust binary CLI proxy that intercepts and compresses 100+ dev commands before output reaches LLM context. 60-90% token reduction. 62k GitHub stars. Free tier plus $15/dev/month team plan.	No MCP-native protocol integration so agents query the proxy as a tool. No per-developer spend tracking or budget caps. No real-time team spend dashboard or alerts. No pluggable per-command filter composition.
Lowfat	Pluggable CLI filter with per-command transformation rules. Saved 91.8% of tokens in reported benchmarks. Open source, 935 GitHub stars.	Individual dev tool only. No team billing layer, no MCP server, no budget enforcement, no spend observability. Architecture debate still open in community.
LiteLLM	AI gateway with virtual API keys, per-team and per-user budget caps, cost routing across 100+ LLM providers. Targets platform/infra teams managing model calls.	Operates at the API gateway layer, not the CLI output layer. Does not compress or filter noisy shell output before it hits the agent. No developer-side CLI proxy. Budget enforcement is at the API key level not the coding session level.
Helicone	LLM observability platform with per-call logging, cost attribution, and prompt management. Used by teams to track spend in production apps.	Observability tool, not a compression layer. Records spend after the fact but does not reduce token count at source. No CLI proxy, no MCP integration, no shell output filtering.