SkillSieve is a three-layer detection framework that applies progressively deeper analysis to identify malicious AI agent skills, achieving 0.800 F1 score on a 400-skill benchmark.

How does SkillSieve detect malicious skills?

Layer 1 uses regex, AST, and metadata checks to filter 86% of benign skills in under 40ms. Layer 2 sends suspicious skills to an LLM with 4 parallel analysis tasks. Layer 3 uses a multi-LLM jury that votes and debates.

How much does SkillSieve cost per skill?

Average cost is $0.006 per skill, with Layer 1 costing nothing (local processing) and Layers 2-3 incurring LLM API costs only for suspicious skills.

How does SkillSieve compare to ClawVet?

SkillSieve achieves 0.800 F1 compared to ClawVets 0.421 F1 on the same benchmark, nearly doubling detection accuracy.

Is SkillSieve open source?

Yes, the code, data, and benchmark are all open-sourced. The paper is available on arXiv (2604.06550).

← Back to dashboard

clawsmith.com/signal/skillsieve-hierarchical-triage-malicious-agent-skills

📈 TrendsWide OpenLive

SkillSieve: Three-Layer Triage Detects Malicious AI Agent Skills at 0.800 F1 for $0.006/Skill

Three-layer detection framework from academic researchers. Layer 1: regex/AST/metadata XGBoost scorer filters 86% of benign skills in <40ms at zero API cost. Layer 2: LLM analysis with 4 parallel sub-tasks (intent alignment, permission justification, covert behavior detection, cross-file consistency). Layer 3: multi-LLM jury voting with debate. Achieves 0.800 F1 vs ClawVet 0.421 on 400-skill benchmark. Evaluated on 49,592 real ClawHub skills. Runs on $440 ARM SBC. Code and data open-sourced.

Product Idea from this Signal

A pre-install verification gate that formally proves an AI agent skill cannot exceed its declared capabilities before allowing it onto your system

13.0k ▲

26.1% of agent skills across major registries have at least one security vulnerability according to a 42,447-skill empirical study. Snyk found 13.4% of ClawHub skills contain critical issues. Current scanners use pattern matching and heuristics, which miss novel attack vectors. This tool uses formal verification to mathematically prove that a skill's actual behavior matches its declared capability set, blocking installation if the proof fails. It sits as a pre-install gate in the OpenClaw skill lifecycle.

CLIOPEN-SOURCESECURITYDEVTOOLFORMAL-VERIFICATION

CompetitiveView Opportunity →