Agent System Pitfalls

Hook Loop: Annegreet/Eltjo Token Burn

ISSUE: Post-completion hooks with condition "always" at no_block tier created an infinite Annegreet-Eltjo feedback loop
COST: $1000+ in 30 days from uncontrolled token burn
FIX DEPLOYED: hook_origin_depth prevents cross-trigger; per-agent rate limit of 20 hooks/hr
CURRENT STATE: All 4 agents RE-ENABLED 2026-02-15 after the 3-layer hook loop fix (hook_origin_depth, per-agent rate limit, no_block depth cap). See Hook Loops for details.
RULE: NEVER add a post-completion hook with condition "always" at no_block tier
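
The first two layers of the fix can be sketched as a single guard function. This is an illustration only, not the deployed code: the function name, the in-memory store, and the depth cap of 1 are assumptions, and hook_origin_depth is assumed to count how many hooks sit in the causal chain of the work that just completed. Only the limit of 20 hooks per hour comes from this page.

```python
import time
from collections import defaultdict, deque

MAX_HOOK_ORIGIN_DEPTH = 1   # assumed cap; the real value is not stated on this page
MAX_HOOKS_PER_HOUR = 20     # per-agent rate limit named in the fix
WINDOW_SECONDS = 3600

# agent name -> timestamps of hooks fired within the last hour (hypothetical store)
_recent_hooks = defaultdict(deque)


def should_fire_hook(agent, hook_origin_depth, now=None):
    """Return True if a post-completion hook may fire for this agent."""
    now = time.time() if now is None else now

    # Layer 1: work that was itself started by a hook must not trigger more hooks.
    if hook_origin_depth >= MAX_HOOK_ORIGIN_DEPTH:
        return False

    # Layer 2: sliding-window rate limit per agent.
    window = _recent_hooks[agent]
    while window and now - window[0] >= WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_HOOKS_PER_HOUR:
        return False

    window.append(now)
    return True
```

The depth check comes first so that a hook-originated task never consumes rate-limit budget.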

CronJobs (Active Since 2026-02-15)

HISTORY: All GE CronJobs were suspended Feb 2-15 due to hook loop token burn
STATUS: All unsuspended and active as of 2026-02-15 after the hook loop fix was deployed
RUNNING: executor-refresh, health-check, zombie-cleanup, learning-backlog-monitor, learning-struggle-detector, learning-wiki-writer
ALWAYS ACTIVE: vault-unseal CronJob (ge-system) and gitlab-toolbox-backup (ge-gitlab) were never suspended

Simulation Anti-Patterns (Fake Proof of Life)

These patterns are DECEPTIVE: they produce correct-looking output through the wrong path.
- Inserting rows directly into PostgreSQL to make the Admin UI display a "working" feature
- An agent reporting "I'm on OpenAI" because the config says so, without verifying that the CLI works
- The discussion system "works" because a test script called the API directly instead of triggering real agents
- The billing dashboard shows costs from POSTed test data, not actual agent sessions
- Provider switching "works" because the dropdown saves to the DB, but the executor never invokes the selected CLI
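
As a contrast to "the config says so", a real proof of life has to exercise the actual path. The sketch below shows the smallest version of that idea for the CLI case: it runs the binary and checks the exit code. The function name is hypothetical, and the command to pass (for example a provider CLI with a version flag) is an assumption, since this page does not name the CLIs or their flags.

```python
import shutil
import subprocess


def cli_responds(command, timeout=10):
    """Return True only if the CLI binary exists and exits 0 when actually run."""
    executable = shutil.which(command[0])
    if executable is None:
        return False  # configured, but not installed or not on PATH
    try:
        result = subprocess.run(
            [executable, *command[1:]],
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A zero exit code only proves the binary starts. Proving that an agent really runs on a provider still requires a session executed end to end through the executor.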

Legacy Identity File Confusion (Three Copies Existed)

ISSUE: Before the wiki brain migration, agent identities existed in THREE locations:
1. /home/claude/ge-bootstrap/identities/{name}/IDENTITY.md (oldest, root-level; ARCHIVED)
2. /home/claude/ge-bootstrap/ge-ops/identities/{name}/IDENTITY.md (partial, 9 agents; ARCHIVED)
3. /home/claude/ge-bootstrap/ge-ops/master/agent-configs/{name}/IDENTITY.md (single file alongside the tiered files; ARCHIVED)
CURRENT: Only the tiered files are authoritative: IDENTITY-CORE.md, IDENTITY-ROLE.md, IDENTITY-REFERENCE.md
LOCATION: ge-ops/master/agent-configs/{name}/
NOTE: File names are IDENTITY-CORE.md (not CORE.md); INFRA-OVERVIEW.md had this wrong

LEARNINGS.md Path Mismatch (Fixed 2026-02-16)

AUTHORITATIVE: ge-ops/master/agent-configs/{name}/LEARNINGS.md (where agents write real learnings)
STUBS: ge-ops/agents/{name}/LEARNINGS.md (empty 9-line stubs, NOT authoritative)
FIXED: Identity loader reads from agent-configs/ (primary), with agents/ as fallback. See EVO-2026-0216-007.
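
The primary-then-fallback lookup can be sketched as follows. This is not the code in the identity loader: the function name and the repo_root parameter are assumptions, and only the two relative paths come from this page.

```python
from pathlib import Path


def resolve_learnings_path(repo_root, agent_name):
    """Return the LEARNINGS.md to load, preferring the authoritative copy."""
    root = Path(repo_root)
    candidates = (
        # Primary: where agents write real learnings.
        root / "ge-ops" / "master" / "agent-configs" / agent_name / "LEARNINGS.md",
        # Fallback: stub location, used only if the primary file is missing.
        root / "ge-ops" / "agents" / agent_name / "LEARNINGS.md",
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning None instead of raising lets the caller decide whether missing learnings should fail the session or just be logged.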

Double Delivery (Fixed)

ISSUE: task-service.ts was XADDing to BOTH triggers.{agent} AND ge:work:incoming, causing 2x execution cost
STATUS: Fixed (ge:work:incoming XADD removed)
RULE: NEVER XADD to both streams for the same task
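
The rule can be pinned down with a small test-style sketch. The real code lives in task-service.ts; this Python version only illustrates the invariant. FakeStreamClient and deliver_task are hypothetical names, the payload field is an assumption, and the fake mirrors the xadd(name, fields) call shape of the redis-py client without talking to Redis.

```python
class FakeStreamClient:
    """In-memory stand-in that records XADD calls (not a real Redis client)."""

    def __init__(self):
        self.calls = []

    def xadd(self, name, fields):
        self.calls.append((name, dict(fields)))
        return f"{len(self.calls)}-0"  # fake entry ID


def deliver_task(client, agent, task_id):
    """Publish a task exactly once, to the agent's trigger stream only."""
    stream = f"triggers.{agent}"
    return client.xadd(stream, {"task_id": task_id})


client = FakeStreamClient()
deliver_task(client, "koen", "task-123")
streams_written = [name for name, _ in client.calls]
```

Asserting on the list of streams written per task in a unit test would catch a reintroduced second XADD before it doubles execution cost.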

Cost Gate Bypass

ISSUE: cost_gate.py enforces $5/session, $10/agent/hr, $100/day limits
RISK: Removing cost_gate imports or bypassing pre-execution checks removes all cost protection
RULE: cost_gate.py MUST remain imported and active in pty_executor.py. NEVER bypass.
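
To show what a pre-execution check over these three limits looks like, here is a minimal sketch. It is not the contents of cost_gate.py: the Spend type, the function name, and the use of ">=" as the comparison are assumptions, and the sketch does not settle whether the daily limit applies per agent or system-wide. Only the three dollar amounts come from this page.

```python
from dataclasses import dataclass

SESSION_LIMIT_USD = 5.0
AGENT_HOURLY_LIMIT_USD = 10.0
DAILY_LIMIT_USD = 100.0


@dataclass(frozen=True)
class Spend:
    """Spend so far, in USD (hypothetical container for this sketch)."""
    session_usd: float
    agent_last_hour_usd: float
    day_usd: float


def check_cost_gate(spend):
    """Return (allowed, reason); reason is empty when execution may proceed."""
    checks = (
        ("session", spend.session_usd, SESSION_LIMIT_USD),
        ("agent/hr", spend.agent_last_hour_usd, AGENT_HOURLY_LIMIT_USD),
        ("day", spend.day_usd, DAILY_LIMIT_USD),
    )
    for label, spent, limit in checks:
        if spent >= limit:
            return False, f"{label} limit reached: ${spent:.2f} >= ${limit:.2f}"
    return True, ""
```

The executor would call a check like this before starting a session and refuse to run when allowed is False, which is why removing the import silently removes the protection.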

Before Re-enabling Disabled Agents

Root cause documented
Fix deployed and verified
Agent's Redis stream drained to 0
Enable with replicas=1 first, monitor 30 minutes
Check billing: agent cost < $2 after 30 minutes

Token Budget Bloat (Fixed 2026-02-15)

Problem

Every API call showed ~40k input tokens. The tiered identity system was supposed to keep sessions lean, but the complexity classifier was broken: almost every task was classified as "complex", which loaded all 3 identity tiers (~10-17k prompt tokens) instead of just CORE + ROLE (~5-9k).

Root Causes

1. Complexity classifier threshold too low
ISSUE: TaskComplexityClassifier.COMPLEX_SCORE_THRESHOLD = 1, so a single keyword match triggered "complex"
KEYWORDS: "fix", "test", "analyze", "create", "configure", "deploy" (present in almost every task description)
IMPACT: Every session loaded IDENTITY-CORE + IDENTITY-ROLE + IDENTITY-REFERENCE + LEARNINGS
FIX: Raised threshold to 3 and removed overly common keywords. Now most tasks classify as "normal".
LOCATION: ge_agent/execution/context.py

2. LEARNINGS.md loaded from wrong path
ISSUE: Loader read from ge-ops/master/agent-configs/{name}/LEARNINGS.md (stale copies, 400-3000 tokens)
RIGHT PATH: ge-ops/agents/{name}/LEARNINGS.md (canonical, written by learning pipeline)
EXTRA FIX: Learnings capped at 3000 chars (~750 tokens) in the prompt; full learnings are browsable in the wiki
LOCATION: ge_agent/identity/loader.py
NOTE: The 2026-02-16 fix described under LEARNINGS.md Path Mismatch reverses the path choice: agent-configs/ is now the primary path and agents/ the fallback.
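
The cap itself is simple enough to sketch. The function names are hypothetical, and two details are assumptions: that the cap keeps the start of the file rather than the most recent entries, and that tokens are estimated at roughly 4 characters each (the ratio implied by "3000 chars (~750 tokens)").

```python
MAX_LEARNINGS_CHARS = 3000  # cap stated above
CHARS_PER_TOKEN = 4         # rough ratio implied by 3000 chars ~ 750 tokens


def cap_learnings(text, limit=MAX_LEARNINGS_CHARS):
    """Truncate learnings to the prompt cap (assumes the head of the file is kept)."""
    if len(text) <= limit:
        return text
    return text[:limit]


def estimate_tokens(text):
    """Very rough token estimate; real tokenizer counts will differ."""
    return len(text) // CHARS_PER_TOKEN
```

With these numbers, a learnings file of any size contributes at most an estimated 750 tokens to the prompt.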

What's In The Prompt (after fix)

Component	Tokens (normal)	Tokens (simple)	Tokens (complex)
Constitution	~2,100	~2,100	~2,100
IDENTITY-CORE	~1,400-2,400	~1,400-2,400	~1,400-2,400
IDENTITY-ROLE	~2,600-5,600	—	~2,600-5,600
IDENTITY-REFERENCE	—	—	~2,700-4,400
LEARNINGS (capped)	~30-750	~30-750	~30-750
JIT learnings	~500	~500	~500
Task context	~125	~125	~125
Our prompt total	~6,800-11,500	~4,200-5,900	~9,400-15,800
Claude Code overhead	~20,000-25,000	~20,000-25,000	~20,000-25,000
Session total	~27,000-36,000	~24,000-31,000	~29,000-41,000
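
The "Our prompt total" row can be reproduced from the component rows. The sketch below sums the low and high ends of each range per classification; the data structure and function name are illustrative, and the numbers are copied from the table.

```python
# (low, high) token estimates per component, taken from the table above
COMPONENTS = {
    "Constitution": (2100, 2100),
    "IDENTITY-CORE": (1400, 2400),
    "IDENTITY-ROLE": (2600, 5600),
    "IDENTITY-REFERENCE": (2700, 4400),
    "LEARNINGS (capped)": (30, 750),
    "JIT learnings": (500, 500),
    "Task context": (125, 125),
}

# Loaded for every session regardless of classification
ALWAYS_LOADED = ["Constitution", "LEARNINGS (capped)", "JIT learnings", "Task context"]

# Identity tiers loaded per classification
TIERS = {
    "simple": ["IDENTITY-CORE"],
    "normal": ["IDENTITY-CORE", "IDENTITY-ROLE"],
    "complex": ["IDENTITY-CORE", "IDENTITY-ROLE", "IDENTITY-REFERENCE"],
}


def prompt_range(classification):
    """Return (low, high) prompt tokens for a classification."""
    names = ALWAYS_LOADED + TIERS[classification]
    low = sum(COMPONENTS[name][0] for name in names)
    high = sum(COMPONENTS[name][1] for name in names)
    return low, high
```

This gives (4155, 5875) for simple, (6755, 11475) for normal, and (9455, 15875) for complex, which matches the table's totals to within rounding.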

How Classification Works Now

Classification	Requires	Tiers Loaded	Turn Budget
simple	2+ simple keywords (status, check, list, health)	CORE only	25
normal	Default (most tasks)	CORE + ROLE	40
complex	3+ complex keywords (implement, comprehensive, investigate, multiple)	CORE + ROLE + REFERENCE	60

Source of truth: ge_agent/execution/context.py (classifier), config/agent-execution.yaml (turn budgets)
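
The rules in the table can be sketched as a small classifier. This is an illustration, not the code in ge_agent/execution/context.py. Three things are assumptions: the keyword lists contain only the examples shown in the table (the real lists may be longer), matching is on whole words, and "complex" is checked before "simple" when both thresholds are met.

```python
import re

SIMPLE_KEYWORDS = ("status", "check", "list", "health")
COMPLEX_KEYWORDS = ("implement", "comprehensive", "investigate", "multiple")
SIMPLE_THRESHOLD = 2
COMPLEX_THRESHOLD = 3

TIERS_LOADED = {
    "simple": ("CORE",),
    "normal": ("CORE", "ROLE"),
    "complex": ("CORE", "ROLE", "REFERENCE"),
}
TURN_BUDGET = {"simple": 25, "normal": 40, "complex": 60}


def classify(description):
    """Classify a task description as simple, normal, or complex."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    complex_hits = sum(1 for keyword in COMPLEX_KEYWORDS if keyword in words)
    simple_hits = sum(1 for keyword in SIMPLE_KEYWORDS if keyword in words)
    if complex_hits >= COMPLEX_THRESHOLD:
        return "complex"
    if simple_hits >= SIMPLE_THRESHOLD:
        return "simple"
    return "normal"  # default for most tasks
```

Under these assumptions, "Fix the failing test" falls through to "normal", whereas the old threshold of 1 with "fix" and "test" as complex keywords would have classified it as "complex".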

IDENTITY-ROLE Files Are Oversized

Several agents have ROLE files at 2-3x their target or more (e.g. Annegreet: 773 lines against a target of 200, nearly 4x). This is a content debt issue. Each ROLE file should be reviewed and trimmed to its target. Priority: agents that execute most often (koen, urszula, boris, annegreet).
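
A line-count check makes the debt measurable before and after trimming. This is a sketch with a hypothetical function name. The default target of 200 lines is taken from the Annegreet example and may not apply to every agent, so it is exposed as a parameter.

```python
from pathlib import Path

DEFAULT_TARGET_LINES = 200  # Annegreet's target from the text; other agents may differ


def role_file_report(path, target_lines=DEFAULT_TARGET_LINES):
    """Compare an IDENTITY-ROLE.md file's line count against its target."""
    line_count = len(Path(path).read_text(encoding="utf-8").splitlines())
    return {
        "lines": line_count,
        "target": target_lines,
        "ratio": round(line_count / target_lines, 2),
        "oversized": line_count > target_lines,
    }
```

Running this over ge-ops/master/agent-configs/{name}/IDENTITY-ROLE.md for the priority agents would give a simple worklist ordered by ratio.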