How effective is guidance injection against OpenClaw?

Across 52 natural prompts and 6 LLM backends, attacks achieved 16-64% success rates, with most malicious actions executed autonomously without user confirmation.

Can existing scanners detect guidance injection?

No — 94% of malicious skills using guidance injection evaded detection by both static and LLM-based scanners.

What attacks can guidance injection enable?

26 malicious skills spanning 13 categories: credential exfiltration, workspace destruction, privilege escalation, persistent backdoor installation, and content manipulation.

How does guidance injection differ from prompt injection?

Traditional prompt injection uses explicit malicious instructions. Guidance injection redefines what the agent considers routine operations, causing autonomous harmful actions on ambiguous requests.

Who discovered guidance injection?

Researchers from Shanghai Jiao Tong University published the Trojan's Whisper paper on arXiv in March 2026.

Is there a fix for guidance injection?

No specific patch exists. The vulnerability is architectural — it exploits OpenClaw's extensible skill ecosystem lifecycle hooks during agent initialization.

← Back to dashboard

clawsmith.com/signal/trojans-whisper-guidance-injection-94pct-evasion-openclaw

⚠ IssueWide OpenLive

Trojan's Whisper: Guidance Injection Attack Evades 94% of OpenClaw Scanners

Q: What is guidance injection in OpenClaw?

A new attack class where malicious skills embed adversarial narratives into bootstrap guidance files, framing harmful actions as routine best practices.

Academic paper from Shanghai Jiao Tong University reveals guidance injection — a new attack class where malicious OpenClaw skills embed adversarial narratives into bootstrap guidance files. 26 malicious skills across 13 attack categories achieved 16-64% success rates across 6 LLM backends, with 94% evading existing static and LLM-based scanners.

Product Idea from this Signal

A runtime behavioral sandbox that detects guidance injection attacks in OpenClaw skills by observing what agents actually do instead of scanning what skills say

17.6k ▲

Existing OpenClaw skill scanners use static analysis and LLM-based content scanning to flag malicious skills before installation. The Trojan's Whisper paper (March 2026) proved that 94% of guidance injection attacks evade both approaches because the malicious payload is disguised as routine operational guidance, not explicit instructions. Meanwhile 12% of ClawHub's skill registry has been compromised at some point in 2026. The gap is clear. Instead of scanning skill text, this product spins up an isolated OpenClaw instance, installs the skill, runs a battery of natural user prompts, and observes what the agent actually does. Credential access, file writes outside sandbox, network exfiltration, privilege escalation attempts all get flagged as behavioral anomalies regardless of how the skill's guidance file describes them.

CLIOPEN-SOURCESECURITYDEVTOOLRUNTIME-ANALYSIS

CompetitiveView Opportunity →