An MCP server proxy that enforces per-client token quotas, rate limits, and hard per-task spend ceilings that kill runaway agent loops before they exhaust an API budget

When MCP servers wrap paid APIs like GitHub, Slack, or Jira, a single misconfigured agent can exhaust an entire month's quota in minutes, because MCP has no native mechanism for per-client throttling or budget enforcement. Teams today hard-code throttle logic inside each server individually, and no shipping tool offers per-task spend ceilings: LLM proxies like LiteLLM cap spend at the account or user level, not at the level of an individual agent task or session. This product is a drop-in proxy layer that sits between MCP clients and any MCP server. It enforces per-client token quotas, sliding-window rate limits, and hard per-task spend ceilings that terminate an agent mid-run when a configured budget is hit, optionally checkpointing state so the task can resume.
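
The per-task ceiling described above can be sketched in Python as a small accounting object. This is a minimal illustration under stated assumptions, not the product's implementation: the names `TaskBudget`, `BudgetExceeded`, and `charge`, the use of integer cents as the cost unit, and the idea that the caller passes in the state to checkpoint are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceeded(Exception):
    """Raised when a call would push a task past its spend ceiling."""


@dataclass
class TaskBudget:
    """Hypothetical per-task budget tracker; costs are integer cents."""

    ceiling_cents: int
    spent_cents: int = 0
    checkpoint: Optional[dict] = None  # last state saved when the ceiling was hit

    def charge(self, cost_cents: int, state: Optional[dict] = None) -> None:
        """Record one tool call's cost, or refuse it if it would cross the ceiling."""
        if self.spent_cents + cost_cents > self.ceiling_cents:
            # Optionally keep a copy of the caller's state so the task can resume.
            if state is not None:
                self.checkpoint = dict(state)
            raise BudgetExceeded(
                f"ceiling {self.ceiling_cents} reached: spent {self.spent_cents}, "
                f"next call costs {cost_cents}"
            )
        self.spent_cents += cost_cents


# Usage: two calls fit inside the budget, the third is refused and checkpointed.
budget = TaskBudget(ceiling_cents=100)
killed = False
budget.charge(40)
budget.charge(40)
try:
    budget.charge(40, state={"step": 3})
except BudgetExceeded:
    killed = True
```

The sketch checks the ceiling before recording the cost, so a refused call never counts against the task. A real proxy would also need a cost estimate for each tool call, which this example takes as given.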

Demand Breakdown

312

GitHub

Social Proof (4 sources)

Show HN: Agent Vault - Open-source credential proxy and vault for agents

@dangtony98 · 2026-04-22

212 HN

Usage-based pricing killing your vibe, here's how to roll your own local AI

@n/a · 2026-05-04

90 GH

Bug: Image API error triggers infinite loop, rapidly exhausting session tokens

@KhalidBA23 · 2026-05-18

12 HN

Ask HN: Any enterprises experimenting with AI agents / MCP-style infra?

@schappim · 2025-06-15

Gap Assessment

Competitive: Multiple tools exist but differentiation opportunities remain

Four tools exist (Portkey AI Gateway, LiteLLM Proxy, Azure API Management (MCP support), Alephant AI Gateway), but gaps remain: the LLM gateways control LLM calls only, not downstream MCP tool calls against third-party APIs like GitHub or Jira, and have no MCP protocol awareness; budgets are set at the account, user, or team level against LLM APIs, not per agent task or per session; and none offers a hard mid-run kill with checkpoint when a ceiling is hit.

Features (8 agent-ready prompts)

Per-client quota isolation

Sliding-window rate limiter

Per-task spend ceiling with hard kill

Checkpoint and resume on budget hit

Drop-in proxy with zero MCP server changes

Cost estimation engine

Operator dashboard and audit log

Policy-as-code configuration with per-group overrides
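
Two of the features listed above, per-client quota isolation and the sliding-window rate limiter, can be illustrated together in Python. This is only a sketch under stated assumptions: the class name `SlidingWindowLimiter`, the `allow` method, the in-memory timestamp log, and the injectable `clock` parameter are hypothetical and not taken from the product.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most max_calls per client in any window_seconds span."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._calls = defaultdict(deque)  # client_id -> timestamps of accepted calls

    def allow(self, client_id: str) -> bool:
        """Return True and record the call if the client is under its limit."""
        now = self.clock()
        calls = self._calls[client_id]
        # Drop timestamps that have slid out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


# Usage with a fake clock: client-a is throttled, client-b is unaffected.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_calls=2, window_seconds=10.0, clock=lambda: fake_now[0])
results_a = [limiter.allow("client-a") for _ in range(3)]
result_b = limiter.allow("client-b")
fake_now[0] = 10.0  # the window has passed, so client-a may call again
result_a_later = limiter.allow("client-a")
```

Keeping a separate timestamp queue per client ID is what isolates one client's traffic from another's in this sketch. A shared deployment would likely need persistent or distributed storage instead of an in-process dictionary.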

Competitive Landscape

Product	What it does	What is missing
Portkey AI Gateway	LLM call rate limiting, spend tracking, and observability at the LLM API layer; acquired by Palo Alto Networks in June 2026, signaling enterprise validation; $18M raised prior to acquisition	Controls LLM calls only, not downstream MCP tool calls against third-party APIs like GitHub or Jira; no per-task hard kill that terminates mid-run when a ceiling is hit; no MCP protocol awareness
LiteLLM Proxy	LLM proxy with per-user and per-team spend budgets, rate limiting, and cost tracking across 100+ LLM providers; 25k+ GitHub stars; YC W23	Budgets are set at the account, user, or team level against LLM APIs, not per agent task or per session; no hard mid-run kill with checkpoint; no awareness of the MCP tool call layer
Azure API Management (MCP support)	Enterprise API gateway with MCP server support; per-subscription rate limiting and quota enforcement on MCP tool calls, available from late 2025	Requires full Azure APIM stack deployment; no per-task spend ceiling with checkpoint-and-resume semantics; no lightweight self-hosted option for teams not on Azure; pricing locked to Azure consumption
Alephant AI Gateway	Open-source Rust gateway for real-time LLM API budget guardrails, including per-session monthly spend ceilings with a hard reject when the threshold is crossed	LLM API layer only, not MCP protocol aware; the kill is a full reject, not a checkpoint-and-resume; no per-client quota isolation for shared MCP server deployments; early-stage with limited enterprise adoption