An SDK that manages on-device LLM caching, updates, and hardware routing across mobile apps

Demand Breakdown

499

Social Proof 2 sources

Show HN: Cactus - Ollama for Smartphones

@HenryNdubuaku · 2025-07-10T00:00:00+00:00

313 HN

Launch HN: Cactus (YC S25)

@HenryNdubuaku · 2025-09-18T00:00:00+00:00

186 HN

Gap Assessment

Competitive

Multiple tools exist but differentiation opportunities remain

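3 tools exist (Qualcomm AI Hub, Apple Core ML / Foundation Models, ExecuTorch), but gaps remain. Qualcomm AI Hub is a compile-time tool, not a runtime model-ops layer, with no cross-app cache, OTA delta, or adaptive routing. Apple Core ML / Foundation Models is sandboxed per app, with no cross-app model sharing or OTA delta updates. ExecuTorch is an inference engine only, with nothing above it for lifecycle, updates, or routing.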
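To make the "cross-app cache" gap concrete, here is a minimal sketch of content-addressed deduplication, where identical weights registered by two apps are stored once. The class `SharedModelCache`, its methods, and the app ids are hypothetical names invented for illustration; how blobs would actually be shared across app sandboxes is platform specific and is not addressed here.

```python
import hashlib


class SharedModelCache:
    """Hypothetical content-addressed store shared by several apps."""

    def __init__(self):
        self._blobs = {}   # digest -> model bytes (stored once)
        self._owners = {}  # digest -> set of app ids referencing the blob

    def put(self, app_id: str, weights: bytes) -> str:
        digest = hashlib.sha256(weights).hexdigest()
        # Keep the bytes only if this content has not been stored yet.
        self._blobs.setdefault(digest, weights)
        self._owners.setdefault(digest, set()).add(app_id)
        return digest

    def release(self, app_id: str, digest: str) -> None:
        owners = self._owners.get(digest, set())
        owners.discard(app_id)
        if not owners:
            # Last reference is gone, so the blob can be evicted.
            self._blobs.pop(digest, None)
            self._owners.pop(digest, None)

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


cache = SharedModelCache()
weights = b"\x00" * 1024          # stand-in for a model file
d1 = cache.put("app.notes", weights)
d2 = cache.put("app.chat", weights)
```

Both apps receive the same digest, and the blob is evicted only once every app that registered it has released it.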

Features

8 agent-ready prompts

Cross-app model cache and deduplication

Model version management and rollback

OTA delta patching for model weights

Hardware-adaptive inference routing

Battery and thermal-aware scheduling

Model registry and signing

A/B model rollout with traffic splitting

Graceful cloud fallback with parity API
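
Three of the features listed above (hardware-adaptive routing, battery and thermal-aware scheduling, and cloud fallback) can be read as one routing decision. The sketch below is one possible policy, not a description of any existing SDK: `DeviceState`, `pick_backend`, the backend labels, and the 20% battery threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    has_npu: bool
    has_gpu: bool
    battery_pct: int         # 0-100
    thermal_throttled: bool
    online: bool


def pick_backend(state: DeviceState, min_battery_pct: int = 20) -> str:
    """Return a hypothetical backend label for the next inference call."""
    constrained = state.thermal_throttled or state.battery_pct < min_battery_pct
    # Offload to the cloud only when the device is constrained and reachable.
    if constrained and state.online:
        return "cloud"
    # Otherwise stay on device, preferring the most specialised accelerator.
    if state.has_npu:
        return "npu"
    if state.has_gpu:
        return "gpu"
    return "cpu"


healthy = DeviceState(has_npu=True, has_gpu=True, battery_pct=80,
                      thermal_throttled=False, online=True)
low_battery = DeviceState(has_npu=True, has_gpu=True, battery_pct=10,
                          thermal_throttled=False, online=True)
offline_hot = DeviceState(has_npu=False, has_gpu=False, battery_pct=50,
                          thermal_throttled=True, online=False)
```

A "parity API" in this reading would mean the caller sees the same request and response shape whichever label is returned, so the fallback stays invisible to app code.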

Competitive Landscape

Product	Does	Missing
Qualcomm AI Hub	Cloud workbench to compile, profile, and deploy models to devices	Compile-time tool, not a runtime model-ops layer; no cross-app cache, OTA delta, or adaptive routing
Apple Core ML / Foundation Models	On-device model runtime per app	Per-app sandbox, no cross-app model sharing or OTA delta updates
ExecuTorch	Mobile inference SDK (Meta), GA Oct 2025	Inference engine only, nothing above it for lifecycle, updates, or routing
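
None of the tools in the table offers OTA delta updates. As a rough illustration of what such an update could look like, the sketch below compares two weight files chunk by chunk at fixed offsets and ships only the chunks that differ. This is a simple block-level diff chosen for brevity; it assumes data keeps its offsets between versions, and the function names, delta layout, and tiny chunk size are all hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for illustration; a real system would use far larger chunks


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size chunk of data."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def make_delta(old: bytes, new: bytes, size: int = CHUNK_SIZE) -> dict:
    """Collect only the chunks of new that differ from old at the same index."""
    old_hashes = chunk_hashes(old, size)
    changed = {}
    for idx, start in enumerate(range(0, len(new), size)):
        chunk = new[start:start + size]
        is_new_chunk = idx >= len(old_hashes)
        if is_new_chunk or hashlib.sha256(chunk).hexdigest() != old_hashes[idx]:
            changed[idx] = chunk
    return {"new_len": len(new), "chunks": changed}


def apply_delta(old: bytes, delta: dict, size: int = CHUNK_SIZE) -> bytes:
    """Rebuild the new file from the old file plus the changed chunks."""
    # Truncate or zero-pad the old bytes to the new length, then overwrite.
    out = bytearray(old[:delta["new_len"]].ljust(delta["new_len"], b"\x00"))
    for idx, chunk in delta["chunks"].items():
        out[idx * size: idx * size + len(chunk)] = chunk
    return bytes(out)


old_weights = b"AAAABBBBCCCC"
new_weights = b"AAAAXXXXCCCC"
delta = make_delta(old_weights, new_weights)
patched = apply_delta(old_weights, delta)
```

Only the middle chunk travels over the air in this example. A production scheme would also need to verify the patched file against a signed digest before swapping it in.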