MCP Session Corruption Pitfalls
STATUS: active | OWNER: all agents with MCP tools | CATEGORY: pitfall | ADDED: 2026-03-27 | SEVERITY: critical
THE PROBLEM
When an MCP tool returns image data (base64 PNG, inline screenshots) that the LLM API cannot process, the image stays in the conversation context for the rest of the session. Every subsequent API call re-sends the broken image and is rejected with a 400 error, so the session loops on the same failure indefinitely. The session is unrecoverable; the only fix is killing the process.
Discovered 2026-03-27 during Abby PA dogfood with Alexander + Stitch MCP.
HOW IT HAPPENS
- Agent calls MCP tool that generates/returns image data
- MCP tool returns data inline (base64 or data reference)
- Image enters the LLM conversation message history
- LLM API rejects it (400 Bad Request — malformed/unsupported image)
- Every subsequent turn re-sends same conversation history with broken image
- Infinite 400 loop — tokens burn on each failed retry
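
The sequence above can be reproduced with a toy model. This is a minimal sketch, not the real client: `fake_llm_api`, `Message`, and the `has_bad_image` flag are hypothetical stand-ins for an API that rejects any request whose history contains an unprocessable image.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str
    # Hypothetical flag standing in for an image block the API cannot process.
    has_bad_image: bool = False


def fake_llm_api(history: list) -> int:
    """Stand-in for the real API: 400 if any message carries a bad image."""
    return 400 if any(m.has_bad_image for m in history) else 200


def run_turns(history: list, turns: int) -> list:
    """Send the full history on every turn and collect the status codes."""
    statuses = []
    for i in range(turns):
        status = fake_llm_api(history)
        statuses.append(status)
        if status == 200:
            history.append(Message("assistant", f"reply {i}"))
        # On 400 nothing is removed from history, so the next turn
        # re-sends the same bad image and fails the same way.
    return statuses


history = [
    Message("user", "Render the login screen"),
    Message("tool", "<base64 PNG>", has_bad_image=True),
]
print(run_turns(history, turns=3))  # [400, 400, 400]
```

Because the history is append-only, no number of retries changes the outcome; only discarding the session removes the bad message.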
AFFECTED TOOLS
| Tool | Risk | Reason |
|---|---|---|
| Stitch get_screen_image | HIGH | Returns base64 PNG inline |
| Stitch get_screen_code | LOW | Returns HTML (text, not image) |
| Figma download_figma_images | SAFE | Saves to disk via localPath |
| Figma get_screenshot | HIGH | Returns screenshot inline |
| Playwright browser_screenshot | MEDIUM | Can return inline or save to file |
| Runway | HIGH | Returns video frames inline |
MITIGATION (IMPLEMENTED)
1. Resilience Module (ge_agent/execution/resilience.py)
- ErrorWindow: sliding 30-second window tracking error patterns
- Detects corruption (API 400 + image keywords) or error storms (3+ identical errors)
- Kills session immediately to stop token burn
- Writes ISO/SOC incident to ge-ops/system/incidents/
- Notifies human inbox at ge-ops/system/inbox/pending/
- Records learning to ge-ops/system/learnings/
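
The detection logic can be sketched as follows. This is an illustrative reconstruction from the bullets above, not the code in resilience.py: the class layout, method names, the `IMAGE_KEYWORDS` list, and the injectable `clock` parameter are all assumptions. Only the 30-second window, the "400 + image keywords" rule, and the "3+ identical errors" rule come from the description.

```python
import time
from collections import Counter, deque

WINDOW_SECONDS = 30.0   # sliding window length stated above
STORM_THRESHOLD = 3     # "3+ identical errors"
IMAGE_KEYWORDS = ("image", "base64", "png")  # assumed keyword list


class ErrorWindow:
    """Illustrative sliding error window (hypothetical layout)."""

    def __init__(self, window=WINDOW_SECONDS, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, status_code, message)

    def _prune(self):
        # Drop events that have aged out of the window.
        now = self.clock()
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def record(self, status_code, message):
        self.events.append((self.clock(), status_code, message))
        self._prune()

    def is_corruption(self):
        # API 400 whose message mentions an image-related keyword.
        self._prune()
        return any(
            code == 400 and any(k in msg.lower() for k in IMAGE_KEYWORDS)
            for _, code, msg in self.events
        )

    def is_error_storm(self):
        # Three or more identical (status, message) pairs inside the window.
        self._prune()
        counts = Counter((code, msg) for _, code, msg in self.events)
        return any(n >= STORM_THRESHOLD for n in counts.values())

    def should_kill(self):
        return self.is_corruption() or self.is_error_storm()
```

A caller would invoke `record()` on every failed API call and kill the session as soon as `should_kill()` returns True. Passing a fake `clock` makes the window testable without sleeping.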
2. Recovery Retry (ge_agent/execution/pty_executor.py)
- After corruption kill, starts fresh session WITHOUT MCP config
- Injects recovery instructions: "do not use image tools"
- Carries forward list of files already created
- Uses half the original turn budget (easy work already done)
- If recovery also fails, gives up (no infinite retry)
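
A minimal sketch of how such a recovery plan could be built. `SessionPlan` and `build_recovery_plan` are hypothetical names, and the prompt wording is illustrative; pty_executor.py may structure this differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionPlan:
    # Hypothetical container for the parameters of one agent session.
    prompt: str
    max_turns: int
    mcp_config: Optional[str] = None
    is_recovery: bool = False


def build_recovery_plan(original: SessionPlan, created_files: list) -> Optional[SessionPlan]:
    """Return a one-shot recovery plan, or None if recovery was already tried."""
    if original.is_recovery:
        return None  # recovery already failed once: give up, no infinite retry

    file_list = "\n".join(f"- {p}" for p in created_files) or "- (none)"
    prompt = (
        "RECOVERY: the previous session was killed after image corruption. "
        "Do not use image tools.\n"
        f"Files already created:\n{file_list}\n\n"
        f"Original task:\n{original.prompt}"
    )
    return SessionPlan(
        prompt=prompt,
        max_turns=max(1, original.max_turns // 2),  # half the original budget
        mcp_config=None,                            # fresh session WITHOUT MCP config
        is_recovery=True,
    )
```

Marking the plan with `is_recovery` is one simple way to guarantee that a failed recovery is never itself retried.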
3. Version Pinning
- Stitch MCP pinned to v0.4.0 (v0.5.1 has process.exit bug)
- Blocked versions tracked in mcp_integration.py → MCP_BLOCKED_VERSIONS
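
A version gate along these lines might look like the sketch below. The name `MCP_BLOCKED_VERSIONS` comes from this page, but its shape (server name mapped to a set of version strings) is an assumption, and `MCP_PINNED_VERSIONS` and `check_mcp_version` are hypothetical.

```python
# Assumed shape: server name -> set of blocked version strings.
MCP_BLOCKED_VERSIONS = {"stitch": {"0.5.1"}}
# Hypothetical companion table holding the pinned version per server.
MCP_PINNED_VERSIONS = {"stitch": "0.4.0"}


def check_mcp_version(server: str, version: str) -> None:
    """Raise RuntimeError if the version is blocked or differs from the pin."""
    version = version.lstrip("v")  # accept "v0.4.0" as well as "0.4.0"
    if version in MCP_BLOCKED_VERSIONS.get(server, set()):
        raise RuntimeError(f"{server} {version} is blocked")
    pinned = MCP_PINNED_VERSIONS.get(server)
    if pinned is not None and version != pinned:
        raise RuntimeError(f"{server} must be pinned to {pinned}, got {version}")


check_mcp_version("stitch", "v0.4.0")  # passes silently
```

Running the check before the MCP server is launched keeps a blocked version from ever entering a session.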
PREVENTION RULES FOR AGENTS
- ALWAYS save images to disk — use localPath/output-dir parameters
- NEVER request inline base64 from MCP tools that generate images
- Keep image operations LAST — complete all text work before generating images
- Validate before continuing — check MCP tool response before proceeding
- One image at a time — if generating multiple designs, save each to disk
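
The "validate before continuing" rule can be approximated with a check like the one below. It assumes the tool result is available as a dict with a `content` list of typed blocks (`text`, `image`), as in MCP tool results; `find_inline_images` is a hypothetical helper, and the data-URI pattern is a heuristic that will not catch every embedding.

```python
import re

# Heuristic for base64 images embedded inside text blocks.
DATA_URI = re.compile(r"data:image/[\w.+-]+;base64,")


def find_inline_images(tool_result: dict) -> list:
    """Return a description of every inline image found in a tool result."""
    problems = []
    for i, block in enumerate(tool_result.get("content", [])):
        kind = block.get("type")
        if kind == "image":
            mime = block.get("mimeType", "unknown type")
            problems.append(f"block {i}: image content ({mime})")
        elif kind == "text" and DATA_URI.search(block.get("text", "")):
            problems.append(f"block {i}: data URI embedded in text")
    return problems


result = {"content": [{"type": "text", "text": "Saved to /tmp/login.png"}]}
if find_inline_images(result):
    raise RuntimeError("inline image returned; stop before it enters the history")
```

An empty list means the response is text-only and safe to pass on; anything else should stop the agent before the result is appended to the conversation.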
PREVENTION RULES FOR INTEGRATION AUTHORS
- Prefer file output over inline — MCP tools should write to disk
- Size limit inline responses — truncate/reject base64 > 100KB
- Document image behavior — every MCP tool that returns images must state whether inline or file
- Test with Claude Code — validate the full MCP → Claude → API round-trip
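
One way an integration could enforce the size limit is to spill oversized payloads to disk and return only the path. This is a sketch under stated assumptions: `to_tool_content`, the default file name, and the PNG MIME type are hypothetical, and the 100KB threshold is the one from the rule above.

```python
import base64
from pathlib import Path

MAX_INLINE_BASE64_BYTES = 100 * 1024  # 100KB limit from the rule above


def to_tool_content(b64_data: str, out_dir: Path, name: str = "image.png") -> dict:
    """Return a small image inline; write a large one to disk and return its path."""
    if len(b64_data) <= MAX_INLINE_BASE64_BYTES:
        # Assumes PNG; a real integration would carry the actual MIME type.
        return {"type": "image", "data": b64_data, "mimeType": "image/png"}

    # validate=True raises binascii.Error on malformed base64 instead of
    # silently skipping bad characters.
    raw = base64.b64decode(b64_data, validate=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / name
    path.write_bytes(raw)
    return {"type": "text", "text": f"Image saved to {path}"}
```

Since the first rule prefers file output in all cases, a stricter variant would drop the inline branch and always write to disk.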
RELATED
- Resilience module: ge_agent/execution/resilience.py
- MCP integration framework: ge_agent/execution/mcp_integration.py
- MCP configs: ge-ops/master/mcp-configs/
- Alexander MCP: development/integrations/google-stitch.md
- Figma MCP: development/integrations/figma.md