Agents hallucinate tool calls for non-existent tools and corrupt structured outputs when switching between OpenAI UI and Responses API
After OpenAI shipped the Responses API (replacing Assistants API, August 2026 shutdown deadline), developers hitting the community forum report two compounding failure modes. First, the model generates tool calls invoking tools not defined in the schema (hallucinated 'search' function with valid-looking parameters) while simultaneously producing a useful response -- leaving the handler in an ambiguous state with no clean error path. Second, agents configured identically in the Agent Kit UI vs the Responses API produce dramatically different outputs: zero hallucinations in UI, fabricated UUIDs in API integration. The discrepancy traces to hidden configuration differences (temperature defaults, context window tool-output integration, implicit strictness controls) that are not documented. Strict mode (constrained decoding) exists as an opt-in fix but is not the default, meaning most production deployments silently tolerate type mismatches, missing required fields, and invalid enum values. The earlier Assistants-to-Responses migration CLI signal (already in DB) focused on migration tooling; this signal is distinct: it is about runtime output integrity of already-migrated agents in production -- specifically the UI-vs-API behavioral gap and hallucinated tool invocations post-migration.
Score Breakdown
Social Proof 1 sources
Gap Assessment
Guardrails AI and Instructor add schema enforcement but require wrapping every call and are not aware of the UI/API behavioral gap. No product does: a drop-in Responses API proxy that enforces strict schema on every tool call, detects and surfaces the UI-vs-API config delta before deployment, and logs hallucinated-tool invocations for replay. Narrow focused gap, no funded competitor in this exact slot.