Multi-Agent Patterns¶
Why Patterns Matter¶
Multi-agent systems fail when agents interact in unstructured ways. Research on 1,642 multi-agent execution traces found failure rates between 41% and 86.7%, with most failures arising from inter-agent interaction rather than individual agent limitations. The cure is not smarter agents — it is better-structured interaction.
GE uses four primary interaction patterns, each suited to a different type of work. Choosing the wrong pattern for a task is a common source of failure.
Pattern 1: Pipeline (Sequential Handoff)¶
What It Is¶
Work flows through a defined sequence of agents. Each agent receives input from the previous stage, performs its specialized function, and passes output to the next stage. No agent skips a stage. No agent operates out of sequence.
When to Use¶
- Work that has a natural sequential dependency (specification must exist before tests, tests before implementation)
- Work where each stage adds a distinct type of value
- Work where the output of one stage is the input to the next
GE Implementation¶
The primary pipeline is the software delivery pipeline:
Aimee (Scoping)
-> Anna (Formal Specification)
-> Antje (Test Generation)
-> Developer (Implementation)
-> Koen (Deterministic Quality Gates)
-> Marije (Integration Testing)
-> Jasper (Reconciliation)
-> Leon (Deployment)
Each handoff is implemented as a Redis Stream message. When Aimee completes scoping, a completion event triggers the orchestrator, which creates a task for Anna. When Anna completes her specification, the orchestrator creates a task for Antje. And so on.
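The completion-driven routing described above can be sketched in a few lines. This is a minimal illustration, assuming redis-py and the stream names used elsewhere on this page (`ge:completions` events feeding `triggers.{agent}` queues); the function names `next_stage` and `route_completion` are hypothetical, not GE's actual API.

```python
# Pipeline order from the diagram above; lowercase names match stream routing.
PIPELINE = ["aimee", "anna", "antje", "developer",
            "koen", "marije", "jasper", "leon"]

def next_stage(agent):
    """Return the agent that receives the handoff, or None at pipeline end."""
    idx = PIPELINE.index(agent)
    return PIPELINE[idx + 1] if idx + 1 < len(PIPELINE) else None

def route_completion(client, event):
    """On a completion event, create a task for the next pipeline stage."""
    successor = next_stage(event["agent"])
    if successor is not None:
        # Per-agent queues are capped (see the MAXLEN rule later on this page).
        client.xadd(f"triggers.{successor}",
                    {"work_item_id": event["work_item_id"],
                     "upstream_agent": event["agent"]},
                    maxlen=100, approximate=True)
```

The key property is that no agent addresses its successor directly: the orchestrator reads completions and decides the next stage, which keeps the sequence enforceable in one place.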
Handoff Protocol¶
Every pipeline handoff includes:
| Field | Description | Example |
|---|---|---|
| `work_item_id` | Unique identifier for the work | `WI-2026-0342` |
| `work_item` | Description of the work | `"User authentication module"` |
| `type` | Work type for routing | `specification` |
| `priority` | Urgency level | `normal` |
| `artifacts` | References to output files | `["/specs/auth-module.md"]` |
| `upstream_agent` | Who produced the input | `aimee` |
RULE: Every handoff must be complete. If an agent cannot produce a complete output, it must not hand off. It must either request clarification or escalate.
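The completeness rule can be enforced mechanically before any message leaves an agent. A minimal sketch, assuming the six fields from the table above; `validate_handoff` and `hand_off` are illustrative names, and the `send`/`escalate` callbacks stand in for the real stream write and escalation path.

```python
# Required fields from the handoff protocol table.
REQUIRED_FIELDS = ("work_item_id", "work_item", "type",
                   "priority", "artifacts", "upstream_agent")

def validate_handoff(message):
    """Return the names of missing or empty fields; an empty list means complete."""
    return [f for f in REQUIRED_FIELDS if not message.get(f)]

def hand_off(message, send, escalate):
    """Forward only complete handoffs; otherwise escalate instead of passing partial work."""
    missing = validate_handoff(message)
    if missing:
        escalate(message.get("work_item_id"), missing)
    else:
        send(message)
```

Validating at the sender, rather than the receiver, stops partial output before it can compound through downstream stages.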
Failure Modes¶
- Incomplete handoff. An agent passes work forward without completing its stage. Downstream agents receive partial input and produce partial output. The error compounds through every subsequent stage.
- Stage skipping. An agent or operator bypasses a pipeline stage ("we don't need a spec for this small change"). The missing stage removes a layer of verification, and errors that stage would have caught propagate forward.
- Bottleneck. One stage takes much longer than the others, creating a queue. GE mitigates this with parallel teams (Alfa and Bravo) for the development stage.
Pattern 2: Review (Parallel Assessment)¶
What It Is¶
Multiple agents independently assess the same artifact. Their assessments are compared. Disagreements trigger investigation. Agreement provides confidence.
When to Use¶
- Quality gates where a single reviewer might miss defects
- Decisions where different perspectives add value (security review + performance review + usability review)
- High-risk changes where the cost of a missed defect exceeds the cost of multiple reviews
GE Implementation¶
Code review in GE is not a single-agent operation. Multiple agents review from different perspectives:
| Agent | Review Focus |
|---|---|
| Koen | Deterministic quality (lint, typecheck, dead code) |
| Eric | Business logic correctness, requirement alignment |
| Ashley | Adversarial testing (what happens if input is malicious?) |
| Victoria | Security vulnerabilities, credential exposure |
| Jasper | Reconciliation (does TDD output match post-implementation reality?) |
Each reviewer operates independently. They do not see each other's reviews. This is deliberate: if one reviewer's output influences another, you lose the benefit of independent assessment.
Oracle Independence¶
RULE: Reviewers must not share context about each other's findings until all reviews are complete. RATIONALE: If Agent A's review is in Agent B's context, Agent B will anchor on Agent A's findings and miss independent issues. This is the same principle behind double-blind peer review.
After all reviews are complete, findings are merged. Conflicts (where reviewers disagree) are escalated to the discussion pattern.
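The merge step above reduces to a small decision rule. A sketch with verdicts simplified to approve/reject; `merge_reviews` is a hypothetical name, and real findings carry more structure than a single verdict.

```python
def merge_reviews(reviews):
    """reviews maps reviewer name -> verdict ('approve' or 'reject').

    Unanimous agreement yields a decision; any disagreement escalates
    to the discussion pattern rather than being resolved silently.
    """
    verdicts = set(reviews.values())
    if verdicts == {"approve"}:
        return "approved"
    if verdicts == {"reject"}:
        return "rejected"
    return "escalate_to_discussion"
```

Note that the merge runs only after every review is in, preserving the independence guarantee: no reviewer's verdict is visible to another during review.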
Failure Modes¶
- Anchoring. A reviewer sees another reviewer's output and anchors on it. Prevented by context isolation.
- Rubber stamping. A reviewer approves without meaningful examination. Detected by tracking review depth (word count, specific findings, time spent).
- Scope confusion. A reviewer checks things outside their expertise. Prevented by explicit review checklists per agent.
Pattern 3: Discussion (Consensus Building)¶
What It Is¶
Multiple agents deliberate on a question, propose answers, vote, and reach consensus. If consensus cannot be reached, the question escalates to a human.
When to Use¶
- Architectural decisions that affect multiple agents or teams
- Ambiguous requirements where the correct interpretation is not obvious
- Conflicting findings from the review pattern
- Any decision with lasting impact that should not be made by a single agent
GE Implementation¶
The discussion model operates through the Discussions API:
Phase 1: Initiation
Any agent can initiate a discussion. The initiator states the question, provides context, and identifies which agents should participate (based on domain expertise).
Discussion: "Should we use WebSocket or SSE for real-time client updates?"
Initiator: Arjan (infrastructure)
Participants: Floris (frontend), Urszula (backend), Nessa (performance), Ron (security)
Phase 2: Deliberation
Each participant reviews the question and submits their position with reasoning. Positions are structured:
- Position: The agent's recommendation (e.g., "SSE")
- Reasoning: Why this position is correct
- Constraints: What conditions would change the recommendation
- Risk assessment: What could go wrong with this choice
Phase 3: Voting
After all participants have submitted positions, a vote is taken. Each participant gets one vote. The outcome is resolved as follows:
- If a clear majority exists (>50%) -> consensus is reached. The majority position becomes the decision.
- If no majority exists -> the discussion continues with a second round of deliberation, incorporating the arguments from the first round.
- If no majority after two rounds -> the question escalates to the human (Dirk-Jan).
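The voting rules above can be sketched as a small resolution function. Positions are simplified to vote strings; `tally` and `resolve` are illustrative names, not the Discussions API.

```python
from collections import Counter

def tally(votes):
    """Return the strict-majority position (>50% of votes), or None."""
    if not votes:
        return None
    position, count = Counter(votes).most_common(1)[0]
    return position if count * 2 > len(votes) else None

def resolve(rounds, max_rounds=2):
    """Apply up to max_rounds of voting; escalate to the human if no majority."""
    for votes in rounds[:max_rounds]:
        winner = tally(votes)
        if winner is not None:
            return winner
    return "escalate_to_human"
```

The hard cap on rounds is what makes the pattern terminate: a deadlocked question costs at most two deliberation rounds before a human decides.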
Phase 4: Decision Capture
The outcome — whether by consensus or human decision — is captured as a learning in the wiki brain. Future agents working in the same domain will receive this decision as JIT knowledge, preventing the same question from being re-debated.
Key Design Choices¶
Asynchronous deliberation. Agents do not "talk" in real-time. Each agent submits their position independently. This prevents groupthink, where early speakers influence later ones.
Structured positions. Free-form discussion between LLMs degrades into verbose, repetitive exchanges. Structured positions force conciseness and comparability.
Finite rounds. Without a round limit, discussions can continue indefinitely. Two rounds is the maximum before human escalation.
Learning capture. The highest-value output of a discussion is not the decision itself but the documented reasoning. This becomes institutional knowledge.
Failure Modes¶
- Groupthink. All agents agree because they share the same training data and arrive at the same (possibly wrong) conclusion. Mitigated by including agents with different model providers (Claude, OpenAI, Gemini) and different roles.
- Decision fatigue. Too many discussions running simultaneously, overwhelming participants. Mitigated by rate limiting (orchestrator controls discussion creation).
- Stale decisions. A decision made months ago may no longer be valid. Mitigated by expiration dates on discussion outcomes and periodic review.
Pattern 4: Escalation (Human-in-the-Loop)¶
What It Is¶
An agent recognizes it cannot or should not make a decision autonomously and escalates to a human or a more senior agent.
When to Use¶
- Decisions that exceed the agent's authority tier
- Situations where the agent's confidence is low
- HALT conditions (see human-in-the-loop.md)
- Disagreements that the discussion pattern cannot resolve
GE Implementation¶
Escalation follows a tiered structure:
| Tier | Who Decides | Examples |
|---|---|---|
| Autonomous | The agent itself | Code formatting, variable naming, test structure |
| Peer escalation | Another agent with relevant expertise | "Is this a security concern?" -> Ron |
| Discussion | Multi-agent consensus | Architectural decisions, technology choices |
| Human escalation | Dirk-Jan | Contract decisions, client communication, agent commissioning |
RULE: An agent must never escalate sideways when it should escalate upward. If the question is about business policy, it goes to the human, not to another agent.
Escalation is implemented through notification files written to ge-ops/notifications/human/ for human escalation, or through Redis Stream messages for peer and discussion escalation.
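A sketch of that routing split, assuming the `ge-ops/notifications/human/` directory and the `triggers.{agent}` streams described on this page; the `escalate` function and `target_agent` field are hypothetical illustrations.

```python
import json
import pathlib
import time

def escalate(tier, payload, client=None,
             human_dir="ge-ops/notifications/human"):
    """Route an escalation: human tier -> notification file, else Redis Stream."""
    if tier == "human":
        # Human escalations are plain files, readable without any tooling.
        path = pathlib.Path(human_dir) / f"escalation-{int(time.time())}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(payload))
        return str(path)
    # Peer and discussion escalations go to the target agent's work queue.
    target = payload["target_agent"]
    client.xadd(f"triggers.{target}", payload, maxlen=100, approximate=True)
    return f"triggers.{target}"
```

Writing human escalations to files rather than a stream keeps them visible even if the messaging layer is down, which matters precisely when things are going wrong.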
Failure Modes¶
- Under-escalation. The agent makes a decision it should not have made. The most dangerous failure mode.
- Over-escalation. The agent escalates routine decisions, overwhelming the human. Creates a bottleneck and defeats the purpose of automation.
- Escalation loops. Agent A escalates to Agent B, who escalates back to Agent A. Prevented by the orchestrator's chain depth limit (max 3).
Communication Infrastructure¶
Redis Streams¶
All inter-agent communication in GE flows through Redis Streams. This is a deliberate architectural choice:
Why Redis Streams, not direct calls:
| Property | Direct Calls | Redis Streams |
|---|---|---|
| Persistence | Lost if receiver is down | Persisted until consumed |
| Auditability | Requires separate logging | Every message is stored |
| Replay | Not possible | Any message can be replayed |
| Decoupling | Sender must know receiver | Sender writes to stream, orchestrator routes |
| Ordering | Depends on timing | Guaranteed by stream ID |
Stream naming convention:
- `triggers.{agent}` — Per-agent work queue. The orchestrator writes here; the executor reads.
- `ge:work:incoming` — System-wide intake. The admin UI and external systems write here.
- `ge:completions` — Completion events. The executor writes here when an agent finishes a task.
RULE: Never XADD to both triggers.{agent} and ge:work:incoming for the same task. This causes double delivery and double execution — a token burn incident that GE learned from directly.
RULE: Every XADD includes MAXLEN (~100 for per-agent, ~1000 for system). Unbounded streams grow until they consume all available memory.
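Both rules can be enforced in a single send helper. A sketch, assuming redis-py; the `send_task` function and the module-level `_sent` set (an in-process stand-in for real deduplication state) are illustrative.

```python
_sent = set()  # work_item_ids already dispatched (illustrative in-process state)

def send_task(client, stream, fields):
    """XADD with a MAXLEN cap; refuse a second delivery of the same work item.

    Sending the same task to both triggers.{agent} and ge:work:incoming
    is exactly the double-delivery case this guard blocks.
    """
    maxlen = 100 if stream.startswith("triggers.") else 1000
    wid = fields["work_item_id"]
    if wid in _sent:
        return False  # would cause double delivery and double execution
    _sent.add(wid)
    # approximate=True emits the efficient MAXLEN ~ N trimming form.
    client.xadd(stream, fields, maxlen=maxlen, approximate=True)
    return True
```

Centralizing every XADD behind one helper means no call site can forget the cap or bypass the delivery guard.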
Consumer Groups¶
Each stream is consumed by a consumer group. This enables:
- Load balancing. Multiple executor instances can consume from the same stream.
- Acknowledgment. Messages are not removed until explicitly acknowledged.
- Recovery. Unacknowledged messages are visible for reprocessing after crashes.
- Exactly-once semantics. Consumer groups alone guarantee at-least-once delivery; combined with `work_item_id` deduplication (5-minute window), processing becomes effectively exactly-once.
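The consume-dedupe-acknowledge loop can be sketched as follows, assuming redis-py's `xreadgroup`/`xack` calls. The in-process `_seen` dictionary stands in for real deduplication state, and `process_once` is an illustrative name.

```python
import time

DEDUP_WINDOW = 300  # seconds; the 5-minute deduplication window
_seen = {}          # work_item_id -> timestamp of last processing (illustrative)

def process_once(client, stream, group, consumer, handle):
    """Read new messages, dedupe by work_item_id, handle, then ACK.

    If handle() raises, the message is never ACKed, so it stays pending
    and becomes visible for reprocessing after a crash (recovery).
    """
    handled = 0
    for _name, entries in client.xreadgroup(group, consumer,
                                            {stream: ">"}, count=10):
        for msg_id, fields in entries:
            wid = fields["work_item_id"]
            now = time.time()
            if now - _seen.get(wid, 0.0) > DEDUP_WINDOW:
                handle(fields)
                _seen[wid] = now
                handled += 1
            # Duplicates inside the window are ACKed but not re-executed.
            client.xack(stream, group, msg_id)
    return handled
```

Note the ordering: the handler runs before the ACK, so a crash mid-task leaves the message pending rather than silently lost.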
DAG Enforcement and Swimming Lanes¶
Work Package Dependencies¶
Complex work (a new feature, a client project) is broken into work packages with dependencies. These dependencies form a Directed Acyclic Graph (DAG):
WP-001 (Specification)
-> WP-002 (Test Generation)
-> WP-003 (Database Schema)
-> WP-004 (Backend Implementation) [depends on WP-002, WP-003]
-> WP-005 (Frontend Implementation) [depends on WP-004]
-> WP-006 (Integration Testing) [depends on WP-004, WP-005]
The orchestrator enforces the DAG. WP-004 cannot start until both WP-002 and WP-003 are complete. This prevents agents from working on tasks whose prerequisites have not been met.
Swimming Lanes¶
Swimming lanes are parallel tracks of work that can proceed independently. In the DAG above, WP-002 and WP-003 are in separate swimming lanes — they have no dependency on each other and can execute in parallel.
GE's orchestrator identifies swimming lanes automatically from the DAG structure and dispatches parallelizable work simultaneously. This maximizes throughput while respecting dependencies.
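Both ideas reduce to one computation over the DAG: the set of work packages whose dependencies are all complete is exactly the set of swimming lanes that can run in parallel. A sketch using the dependencies from the example above; `DEPS` and `ready` are illustrative names.

```python
# Dependency map from the work-package example above.
DEPS = {
    "WP-001": [],
    "WP-002": ["WP-001"],
    "WP-003": ["WP-001"],
    "WP-004": ["WP-002", "WP-003"],
    "WP-005": ["WP-004"],
    "WP-006": ["WP-004", "WP-005"],
}

def ready(done):
    """Work packages whose prerequisites are all complete and which are not yet done.

    Everything in this set can be dispatched simultaneously.
    """
    return {wp for wp, deps in DEPS.items()
            if wp not in done and all(d in done for d in deps)}
```

After WP-001 completes, `ready` returns both WP-002 and WP-003: two swimming lanes. WP-004 does not appear until both are done, which is the diamond-dependency enforcement a linear chain cannot express.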
Why Not Just Chain Agents Directly?¶
Direct chaining (agent A triggers agent B triggers agent C) seems simpler, but it creates three problems:
- No global visibility. Nobody knows where work is in the pipeline. The orchestrator provides a single view of all in-flight work.
- No dependency enforcement. Direct chains are linear. Real work has diamond dependencies (WP-004 depends on both WP-002 and WP-003). Only a DAG-aware orchestrator can handle this.
- No recovery. If agent B crashes in a direct chain, the chain is broken. The orchestrator detects stalled work and can reassign it.
Agent Boundaries¶
The Boundary Principle¶
RULE: Every agent must have clear, non-overlapping boundaries. Overlap creates conflict. Gaps create dropped work.
Overlap example (bad):

- Agent A: "I handle all backend development."
- Agent B: "I handle database-related development."
- Result: Both agents modify the same database access code. Their changes conflict.

Clear boundaries (good):

- Agent A: "I handle backend application logic. I call database functions but never define them."
- Agent B: "I handle database schema, migrations, and query functions. I never write application logic."
- Result: Agent A produces calls, Agent B produces implementations. No overlap.
Boundary Definition Checklist¶
For every agent, the following boundaries must be defined:
- [ ] What files/directories does this agent own?
- [ ] What files/directories must this agent never modify?
- [ ] What decisions can this agent make autonomously?
- [ ] What decisions must this agent escalate?
- [ ] Which other agents does this agent hand off to?
- [ ] Which other agents hand off to this agent?
- [ ] What constitutes "done" for this agent's work?
These boundaries are encoded in the agent's ROLE identity and enforced by the orchestrator's routing configuration.