What is the OpenRouter caching feature in OpenClaw v2026.5.4?

OpenClaw can now attach the X-OpenRouter-Cache and X-OpenRouter-Cache-TTL request headers on verified OpenRouter routes. These headers enable OpenRouter's server-side response caching, so repeated prompts can be served with lower latency and cost.
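A minimal sketch of how a client might attach these headers only when caching is switched on and the route is verified. The settings class, its field names, the `"true"` flag value, and the assumption that the TTL is expressed in seconds are all hypothetical; this page does not document the exact header values OpenClaw sends.

```python
from dataclasses import dataclass


@dataclass
class OpenRouterCacheSettings:
    # Hypothetical settings object; OpenClaw's real config keys are not shown here.
    enabled: bool = False   # caching is opt-in, so it defaults to off
    ttl_seconds: int = 300  # assumed unit and default, not confirmed by the source


def cache_headers(settings: OpenRouterCacheSettings, route_verified: bool) -> dict[str, str]:
    """Return extra request headers for an OpenRouter call.

    Headers are attached only when caching is enabled AND the route is a
    verified OpenRouter route; otherwise no cache headers are sent.
    """
    if not (settings.enabled and route_verified):
        return {}
    return {
        "X-OpenRouter-Cache": "true",                         # assumed value format
        "X-OpenRouter-Cache-TTL": str(settings.ttl_seconds),  # assumed: TTL in seconds
    }


print(cache_headers(OpenRouterCacheSettings(), route_verified=True))  # {}
print(cache_headers(OpenRouterCacheSettings(enabled=True, ttl_seconds=600), route_verified=True))
# {'X-OpenRouter-Cache': 'true', 'X-OpenRouter-Cache-TTL': '600'}
```

Gating on both conditions keeps the default behaviour unchanged for users who have not opted in and for routes that are not verified.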

Which models support OpenRouter caching in OpenClaw?

For Anthropic models, OpenClaw injects cache_control on system/developer prompt blocks. For DeepSeek, Moonshot, and zai model refs, it sets contextPruning.mode: cache-ttl, relying on automatic provider-side caching.
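The per-provider behaviour can be sketched as below. This is not OpenClaw's code: the helper names are hypothetical, messages use an OpenAI-style chat format with content parts, and model refs are assumed to look like `provider/model`, optionally prefixed with `openrouter/`. The `{"type": "ephemeral"}` marker follows Anthropic's documented prompt-caching format.

```python
import copy

# Providers that get contextPruning.mode: cache-ttl, per the release notes.
CACHE_TTL_PROVIDERS = ("deepseek", "moonshot", "zai")
# Roles whose blocks receive cache_control, per the release notes.
CACHEABLE_ROLES = ("system", "developer")


def cache_strategy(model_ref: str) -> str:
    """Pick a caching strategy from a model ref (assumed format: [openrouter/]provider/model)."""
    parts = model_ref.lower().split("/")
    if parts[0] == "openrouter" and len(parts) > 1:
        parts = parts[1:]  # hypothetical: strip an OpenRouter routing prefix
    provider = parts[0]
    if provider == "anthropic":
        return "cache_control"
    if provider in CACHE_TTL_PROVIDERS:
        return "cache-ttl"
    return "none"


def inject_cache_control(messages: list[dict]) -> list[dict]:
    """Return a copy of `messages` with an ephemeral cache_control marker on the
    last content part of every system/developer message. Input is not mutated."""
    out = copy.deepcopy(messages)
    for msg in out:
        if msg.get("role") not in CACHEABLE_ROLES:
            continue
        content = msg.get("content")
        if isinstance(content, str):
            # Convert plain-string content into a single text part so it can carry the marker.
            content = [{"type": "text", "text": content}]
            msg["content"] = content
        if content:
            content[-1]["cache_control"] = {"type": "ephemeral"}
    return out
```

Anthropic documents a limit of four cache breakpoints per request, so a production version would likely mark only the final system or developer block rather than every one.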

Is OpenRouter caching enabled by default?

No. Caching is opt-in: you must enable it in your OpenClaw settings before OpenClaw sends the X-OpenRouter-Cache headers.

How much cost reduction does OpenRouter caching provide?

The reduction depends on the workload. Caching is most effective when the same system instructions are sent repeatedly across agent sessions.
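Because the savings depend on the workload, one way to reason about them is a back-of-the-envelope estimate. The sketch below assumes a single flat input price and a fixed discount for cached tokens, and it ignores the cache-write surcharges some providers apply. Every number in the usage lines is an illustrative input, not a measured result.

```python
def estimated_prompt_cost(
    prompt_tokens: int,
    requests: int,
    cached_fraction: float,
    price_per_mtok: float,
    cached_price_multiplier: float,
) -> float:
    """Estimate input-side cost in dollars for `requests` prompts of equal size.

    cached_fraction: share of each prompt's tokens served from cache (0.0 to 1.0).
    price_per_mtok: price per million uncached input tokens (assumed flat).
    cached_price_multiplier: cached-token price as a fraction of the uncached price.
    Cache-write surcharges are ignored.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    base = prompt_tokens * price_per_mtok / 1_000_000
    blended = (1.0 - cached_fraction) + cached_fraction * cached_price_multiplier
    return base * blended * requests


# Illustrative inputs only: a 10,000-token prompt sent 100 times, 80% of it cached,
# $3 per million tokens, cached tokens billed at 10% of the normal rate.
without_cache = estimated_prompt_cost(10_000, 100, 0.0, 3.0, 0.1)
with_cache = estimated_prompt_cost(10_000, 100, 0.8, 3.0, 0.1)
print(round(without_cache, 2), round(with_cache, 2))  # 3.0 0.84
```

The `cached_fraction` term is where repetitive system prompts matter: the larger the share of each request that repeats verbatim, the closer the blended price moves toward the cached rate.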

When was this feature released?

The OpenRouter caching feature shipped in OpenClaw v2026.5.4 on May 5, 2026.


clawsmith.com/signal/openclaw-openrouter-cache-headers-v2026-5-4


OpenClaw v2026.5.4 Ships OpenRouter Server-Side Caching — Reduces Latency and Cost on Repetitive Prompts

v2026.5.4 adds opt-in response caching via X-OpenRouter-Cache and X-OpenRouter-Cache-TTL headers on verified OpenRouter routes. It also injects Anthropic cache_control on system/developer prompt blocks and, for deepseek/moonshot/zai refs, enables contextPruning.mode: cache-ttl for automatic provider-side caching.

Product Idea from this Signal

A background service that enforces per-session token budgets on OpenClaw agents, auto-prunes context when limits approach, and reports cost per task


OpenClaw retains all conversation history by default, growing from 5K tokens in round 1 to 150K by round 10. Users routinely hit $50-100/day in API costs running stock configurations. Memory consumption climbs to 28GB by day three. Existing tools show token usage after the fact or add a kill-switch, but nothing actively manages the context window in real time by pruning low-value messages, enforcing per-session budgets, and attributing cost to individual tasks.

This service sits between OpenClaw and the LLM provider, intercepts every request, enforces configurable token budgets, and auto-prunes context using a relevance scorer before the request hits the API.
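The core of such a service might look like the sketch below. Everything here is hypothetical: `Message`, `relevance`, `prune_to_budget`, and `CostLedger` are illustrative names, token counts are assumed to come from the caller's tokenizer, and the relevance scorer is a simple recency heuristic standing in for the unspecified scorer described above. System prompts are pinned so pruning never removes the instructions that prompt caching depends on.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    tokens: int  # token count from the caller's tokenizer (assumed precomputed)


def relevance(msg: Message, index: int, total: int) -> float:
    """Hypothetical scorer: system prompts are pinned, otherwise newer scores higher."""
    if msg.role == "system":
        return float("inf")
    return index / max(total - 1, 1)


def prune_to_budget(messages: list[Message], budget: int) -> list[Message]:
    """Drop the least relevant non-system messages until the total fits `budget`.

    Original order is preserved. If system messages alone exceed the budget,
    the result can still be over budget; the caller should check.
    """
    n = len(messages)
    total = sum(m.tokens for m in messages)
    order = sorted(range(n), key=lambda i: relevance(messages[i], i, n))
    dropped: set[int] = set()
    for i in order:
        if total <= budget:
            break
        if messages[i].role == "system":
            continue  # pinned: never pruned
        dropped.add(i)
        total -= messages[i].tokens
    return [m for i, m in enumerate(messages) if i not in dropped]


class CostLedger:
    """Accumulates estimated prompt cost per task at a flat per-token price."""

    def __init__(self, price_per_mtok: float) -> None:
        self.price_per_mtok = price_per_mtok
        self.by_task: defaultdict[str, float] = defaultdict(float)

    def record(self, task_id: str, prompt_tokens: int) -> float:
        cost = prompt_tokens * self.price_per_mtok / 1_000_000
        self.by_task[task_id] += cost
        return cost


history = [
    Message("system", "rules", 100),
    Message("user", "first question", 500),
    Message("assistant", "long answer", 500),
    Message("user", "follow-up", 200),
]
kept = prune_to_budget(history, budget=850)
print([m.text for m in kept])  # ['rules', 'long answer', 'follow-up']

ledger = CostLedger(price_per_mtok=3.0)  # illustrative price, not a real quote
ledger.record("task-1", sum(m.tokens for m in kept))
```

Pruning before the request is sent, rather than reporting usage afterwards, is what distinguishes this design from the after-the-fact tools mentioned above.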

Tags: background-service, cost-optimization, open-source, devtool
