
Agent System Pitfalls

Hook Loop — Annegreet/Eltjo Token Burn

ISSUE: Post-completion hooks with condition "always" at no_block tier created infinite Annegreet-Eltjo feedback loop
COST: $1000+ in 30 days from uncontrolled token burn
FIX DEPLOYED: hook_origin_depth prevents cross-trigger, per-agent rate limit 20 hooks/hr
CURRENT STATE: All 4 agents RE-ENABLED 2026-02-15 after 3-layer hook loop fix (hook_origin_depth, per-agent rate limit, no_block depth cap). See Hook Loops for details.
RULE: NEVER add post-completion hook with condition "always" at no_block tier
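The first two layers of the fix can be illustrated with a minimal sketch. It assumes that hook_origin_depth is a counter carried on each task and incremented whenever a hook spawns that task, and that the rate limit is a sliding one-hour window. The class name HookGuard and the depth cap value are hypothetical; only the 20 hooks/hr figure comes from this page.

```python
import time
from collections import defaultdict, deque

MAX_HOOK_ORIGIN_DEPTH = 1   # assumed cap; the real value is not stated on this page
MAX_HOOKS_PER_HOUR = 20     # per-agent limit stated above
WINDOW_SECONDS = 3600


class HookGuard:
    """Hypothetical gate deciding whether a post-completion hook may fire."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._fired = defaultdict(deque)  # agent -> timestamps of allowed hooks

    def allow(self, agent: str, hook_origin_depth: int) -> bool:
        # Layer 1: a task that was itself started by a hook must not
        # trigger another hook once the depth cap is reached.
        if hook_origin_depth >= MAX_HOOK_ORIGIN_DEPTH:
            return False

        # Layer 2: sliding one-hour window per agent.
        window = self._fired[agent]
        now = self._now()
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_HOOKS_PER_HOUR:
            return False

        window.append(now)
        return True
```

With a guard like this, a hook fired by Annegreet's completion would reach Eltjo with depth 1, so Eltjo's own completion could not bounce a hook back.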

CronJobs (Active Since 2026-02-15)

HISTORY: All GE CronJobs were suspended Feb 2-15 due to hook loop token burn
STATUS: All unsuspended and active as of 2026-02-15 after hook loop fix deployed
RUNNING: executor-refresh, health-check, zombie-cleanup, learning-backlog-monitor, learning-struggle-detector, learning-wiki-writer
ALWAYS ACTIVE: vault-unseal CronJob (ge-system) and gitlab-toolbox-backup (ge-gitlab) were never suspended

Simulation Anti-Patterns (Fake Proof of Life)

These patterns are DECEPTIVE: they produce correct-looking output through the wrong path:
- Inserting rows directly into PostgreSQL to make Admin UI display a "working" feature
- Agent reporting "I'm on OpenAI" because config says so without verifying CLI works
- Discussion system "works" because test script called API directly instead of triggering real agents
- Billing dashboard shows costs from POSTed test data, not actual agent sessions
- Provider switching "works" because dropdown saves to DB, but executor never invokes selected CLI

Legacy Identity File Confusion (Three Copies Existed)

ISSUE: Before the wiki brain migration, agent identities existed in THREE locations:
1. /home/claude/ge-bootstrap/identities/{name}/IDENTITY.md - oldest, root-level (ARCHIVED)
2. /home/claude/ge-bootstrap/ge-ops/identities/{name}/IDENTITY.md - partial, 9 agents (ARCHIVED)
3. /home/claude/ge-bootstrap/ge-ops/master/agent-configs/{name}/IDENTITY.md - single-file alongside tiered (ARCHIVED)
CURRENT: Only the tiered files are authoritative: IDENTITY-CORE.md, IDENTITY-ROLE.md, IDENTITY-REFERENCE.md
LOCATION: ge-ops/master/agent-configs/{name}/
NOTE: File names are IDENTITY-CORE.md (not CORE.md); the INFRA-OVERVIEW.md had this wrong

LEARNINGS.md Path Mismatch (Fixed 2026-02-16)

AUTHORITATIVE: ge-ops/master/agent-configs/{name}/LEARNINGS.md - where agents write real learnings
STUBS: ge-ops/agents/{name}/LEARNINGS.md - empty 9-line stubs, NOT authoritative
FIXED: Identity loader reads from agent-configs/ (primary), agents/ as fallback. See EVO-2026-0216-007.
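The primary/fallback lookup described above might look roughly like the following. This is a sketch, not the actual identity loader: the function name resolve_learnings and the root argument are hypothetical, and only the two relative paths come from this page.

```python
from pathlib import Path
from typing import Optional


def resolve_learnings(root: Path, name: str) -> Optional[Path]:
    """Return the LEARNINGS.md to load for an agent, or None if neither exists."""
    # Primary: the authoritative copy that agents write to.
    primary = root / "ge-ops" / "master" / "agent-configs" / name / "LEARNINGS.md"
    # Fallback: the stub location, used only when the primary is missing.
    fallback = root / "ge-ops" / "agents" / name / "LEARNINGS.md"

    for candidate in (primary, fallback):
        if candidate.is_file():
            return candidate
    return None
```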

Double Delivery (Fixed)

ISSUE: task-service.ts was XADDing to BOTH triggers.{agent} AND ge:work:incoming, causing 2x execution cost
STATUS: Fixed (ge:work:incoming XADD removed)
RULE: NEVER XADD to both streams for the same task
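One way to catch a regression of this rule in tests is to wrap the stream client and reject a second XADD of the same task to a different stream. The sketch below is in Python for consistency with the other examples on this page, although task-service.ts itself is TypeScript. SingleDeliveryGuard and the task_id field name are hypothetical; the wrapped client only needs an xadd(name, fields) method, which is the call shape redis-py exposes.

```python
from typing import Dict, Mapping


class SingleDeliveryGuard:
    """Hypothetical test-time wrapper that enforces one stream per task."""

    def __init__(self, client):
        self._client = client
        self._delivered: Dict[str, str] = {}  # task_id -> stream it was sent to

    def xadd(self, stream: str, fields: Mapping[str, str]) -> str:
        task_id = fields["task_id"]  # assumed field name
        previous = self._delivered.get(task_id)
        if previous is not None and previous != stream:
            raise RuntimeError(
                f"task {task_id} already delivered to {previous}; "
                f"refusing second delivery to {stream}"
            )
        self._delivered[task_id] = stream
        return self._client.xadd(stream, fields)
```

The guard keeps every task ID in memory, so it is suited to tests and short-lived checks rather than a long-running service.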

Cost Gate Bypass

ISSUE: cost_gate.py enforces $5/session, $10/agent/hr, $100/day limits
RISK: Removing cost_gate imports or bypassing pre-execution checks removes all cost protection
RULE: cost_gate.py MUST remain imported and active in pty_executor.py. NEVER bypass.
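The contents of cost_gate.py are not shown on this page, so the following is only a sketch of what a pre-execution check against the three limits could look like. The names Spend, CostLimitExceeded and check_cost_gate are hypothetical, and treating a limit as reached at exactly the threshold is an assumption.

```python
from dataclasses import dataclass

# Limits stated above.
SESSION_LIMIT_USD = 5.0
AGENT_HOURLY_LIMIT_USD = 10.0
DAILY_LIMIT_USD = 100.0


@dataclass(frozen=True)
class Spend:
    """Hypothetical snapshot of spend at the moment a session is about to start."""
    session_usd: float
    agent_last_hour_usd: float
    day_usd: float


class CostLimitExceeded(RuntimeError):
    pass


def check_cost_gate(spend: Spend) -> None:
    """Raise before execution if any limit has been reached (assumed: >= limit)."""
    checks = (
        ("session", spend.session_usd, SESSION_LIMIT_USD),
        ("agent hourly", spend.agent_last_hour_usd, AGENT_HOURLY_LIMIT_USD),
        ("daily", spend.day_usd, DAILY_LIMIT_USD),
    )
    for label, spent, limit in checks:
        if spent >= limit:
            raise CostLimitExceeded(
                f"{label} spend ${spent:.2f} has reached the ${limit:.2f} limit"
            )
```

Raising an exception rather than returning a flag means a caller that forgets to inspect the result still cannot proceed past the gate.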

Before Re-enabling Disabled Agents

  1. Root cause documented
  2. Fix deployed and verified
  3. Agent's Redis stream drained to 0
  4. Enable with replicas=1 first, monitor 30 minutes
  5. Check billing: agent cost < $2 after 30 minutes
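The checklist above can be encoded as an explicit gate so that no step is skipped under time pressure. This is a sketch with hypothetical names (ReenableCheck, blockers_before_enable, passed_observation); only the thresholds (stream length 0, 30 minutes, $2) come from the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReenableCheck:
    root_cause_documented: bool
    fix_deployed_and_verified: bool
    redis_stream_length: int


def blockers_before_enable(check: ReenableCheck) -> List[str]:
    """Steps 1-3: return the reasons the agent must stay disabled (empty = go)."""
    blockers = []
    if not check.root_cause_documented:
        blockers.append("root cause not documented")
    if not check.fix_deployed_and_verified:
        blockers.append("fix not deployed and verified")
    if check.redis_stream_length != 0:
        blockers.append(f"stream still holds {check.redis_stream_length} entries")
    return blockers


def passed_observation(minutes_observed: float, cost_usd: float) -> bool:
    """Steps 4-5: at replicas=1, require 30 minutes of monitoring and cost under $2."""
    return minutes_observed >= 30 and cost_usd < 2.0
```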

Token Budget Bloat (Fixed 2026-02-15)

Problem

Every API call showed ~40k input tokens. The tiered identity system was supposed to keep sessions lean, but the complexity classifier was broken: almost every task was classified as "complex", which loaded all 3 identity tiers (~10-17k prompt tokens) instead of just CORE + ROLE (~5-9k).

Root Causes

1. Complexity classifier threshold too low
ISSUE: TaskComplexityClassifier.COMPLEX_SCORE_THRESHOLD = 1, so a single keyword match triggered "complex"
KEYWORDS: "fix", "test", "analyze", "create", "configure", "deploy" (present in almost every task description)
IMPACT: Every session loaded IDENTITY-CORE + IDENTITY-ROLE + IDENTITY-REFERENCE + LEARNINGS
FIX: Raised threshold to 3 and removed overly common keywords. Now most tasks classify as "normal".
LOCATION: ge_agent/execution/context.py

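2. LEARNINGS.md loaded from wrong path
ISSUE: Loader read from ge-ops/master/agent-configs/{name}/LEARNINGS.md (stale copies, 400-3000 tokens)
RIGHT PATH: ge-ops/agents/{name}/LEARNINGS.md (canonical, written by learning pipeline)
EXTRA FIX: Learnings capped at 3000 chars (~750 tokens) in the prompt; full learnings browsable in wiki
LOCATION: ge_agent/identity/loader.py
NOTE: This conflicts with the LEARNINGS.md Path Mismatch entry above (fixed 2026-02-16), which names agent-configs/ as authoritative and agents/ as stubs. That entry is the more recent one.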
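The 3000-character cap can be sketched as below. Which end of the file the real loader keeps is not stated here, so this version assumes it keeps the head and backs up to the last complete line. The ~750 token figure corresponds to the common rule of thumb of about 4 characters per token, which is also an assumption. Both function names are hypothetical.

```python
LEARNINGS_CHAR_CAP = 3000   # cap stated above
CHARS_PER_TOKEN = 4         # rough heuristic, assumed


def cap_learnings(text: str, cap: int = LEARNINGS_CHAR_CAP) -> str:
    """Truncate learnings to at most `cap` characters, ending on a whole line."""
    if len(text) <= cap:
        return text
    head = text[:cap]
    cut = head.rfind("\n")
    # If there is no newline to back up to, fall back to a hard cut.
    return head[:cut] if cut > 0 else head


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN
```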

What's In The Prompt (after fix)

| Component | Tokens (normal) | Tokens (simple) | Tokens (complex) |
|---|---|---|---|
| Constitution | ~2,100 | ~2,100 | ~2,100 |
| IDENTITY-CORE | ~1,400-2,400 | ~1,400-2,400 | ~1,400-2,400 |
| IDENTITY-ROLE | ~2,600-5,600 | - | ~2,600-5,600 |
| IDENTITY-REFERENCE | - | - | ~2,700-4,400 |
| LEARNINGS (capped) | ~30-750 | ~30-750 | ~30-750 |
| JIT learnings | ~500 | ~500 | ~500 |
| Task context | ~125 | ~125 | ~125 |
| Our prompt total | ~6,800-11,500 | ~4,200-5,900 | ~9,400-15,800 |
| Claude Code overhead | ~20,000-25,000 | ~20,000-25,000 | ~20,000-25,000 |
| Session total | ~27,000-36,000 | ~24,000-31,000 | ~29,000-41,000 |
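The totals in the table can be re-derived by summing the low and high ends of each component that a classification loads. The snippet below is a worked check using only the figures from the table; the variable and function names are hypothetical.

```python
# (low, high) token estimates taken from the table above.
ALWAYS_LOADED = {
    "Constitution": (2100, 2100),
    "IDENTITY-CORE": (1400, 2400),
    "LEARNINGS (capped)": (30, 750),
    "JIT learnings": (500, 500),
    "Task context": (125, 125),
}
TIER_COMPONENTS = {
    "IDENTITY-ROLE": (2600, 5600),
    "IDENTITY-REFERENCE": (2700, 4400),
}
EXTRA_TIERS = {
    "simple": (),
    "normal": ("IDENTITY-ROLE",),
    "complex": ("IDENTITY-ROLE", "IDENTITY-REFERENCE"),
}
CLAUDE_CODE_OVERHEAD = (20000, 25000)


def prompt_range(classification: str) -> tuple:
    """Sum the (low, high) ranges of every component loaded for a classification."""
    ranges = list(ALWAYS_LOADED.values())
    ranges += [TIER_COMPONENTS[name] for name in EXTRA_TIERS[classification]]
    return (sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges))


def session_range(classification: str) -> tuple:
    """Prompt range plus the Claude Code overhead range."""
    lo, hi = prompt_range(classification)
    return (lo + CLAUDE_CODE_OVERHEAD[0], hi + CLAUDE_CODE_OVERHEAD[1])


for name in ("simple", "normal", "complex"):
    print(name, prompt_range(name), session_range(name))
```

The exact sums are 4,155-5,875 (simple), 6,755-11,475 (normal) and 9,455-15,875 (complex). The simple and normal figures match the table when rounded to the nearest hundred, the complex figures land about 50-75 tokens above the table's ~9,400-15,800, and the session totals match to the nearest thousand.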

How Classification Works Now

| Classification | Requires | Tiers Loaded | Turn Budget |
|---|---|---|---|
| simple | 2+ simple keywords (status, check, list, health) | CORE only | 25 |
| normal | Default (most tasks) | CORE + ROLE | 40 |
| complex | 3+ complex keywords (implement, comprehensive, investigate, multiple) | CORE + ROLE + REFERENCE | 60 |

Source of truth: ge_agent/execution/context.py (classifier), config/agent-execution.yaml (turn budgets)
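The classification rules in the table can be expressed as a short sketch. This is not the code in ge_agent/execution/context.py. It assumes that the keyword lists in the table are complete, that each distinct keyword counts once as a whole word, and that "complex" is checked before "simple". The names classify and PLAN are hypothetical.

```python
import re

SIMPLE_KEYWORDS = ("status", "check", "list", "health")
COMPLEX_KEYWORDS = ("implement", "comprehensive", "investigate", "multiple")
SIMPLE_THRESHOLD = 2
COMPLEX_THRESHOLD = 3

# classification -> (identity tiers loaded, turn budget), from the table above
PLAN = {
    "simple": (("CORE",), 25),
    "normal": (("CORE", "ROLE"), 40),
    "complex": (("CORE", "ROLE", "REFERENCE"), 60),
}


def classify(description: str) -> str:
    """Classify a task description by counting distinct keyword matches."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    complex_score = sum(keyword in words for keyword in COMPLEX_KEYWORDS)
    simple_score = sum(keyword in words for keyword in SIMPLE_KEYWORDS)

    if complex_score >= COMPLEX_THRESHOLD:   # assumed to take precedence
        return "complex"
    if simple_score >= SIMPLE_THRESHOLD:
        return "simple"
    return "normal"


tiers, turn_budget = PLAN[classify("fix the login test")]
print(tiers, turn_budget)
```

Under these rules a description such as "fix the login test" no longer counts as complex, because "fix" and "test" are not in the complex keyword list and fewer than three matches never cross the threshold.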

IDENTITY-ROLE Files Are Oversized

Several agents have ROLE files 2-3x their target or more (e.g. Annegreet: 773 lines against a target of 200, nearly 4x). This is a content debt issue. Each ROLE file should be reviewed and trimmed to its target. Priority: agents that execute most often (koen, urszula, boris, annegreet).