Multi-Agent Patterns¶
Why Patterns Matter¶
Multi-agent systems fail when agents interact in unstructured ways. Research on 1,642 multi-agent execution traces found failure rates between 41% and 86.7%, with most failures arising from inter-agent interaction rather than individual agent limitations. The cure is not smarter agents — it is better-structured interaction.
GE uses four primary interaction patterns, each suited to a different type of work. Choosing the wrong pattern for a task is a common source of failure.
Pattern 1: Pipeline (Sequential Handoff)¶
What It Is¶
Work flows through a defined sequence of agents. Each agent receives input from the previous stage, performs its specialized function, and passes output to the next stage. No agent skips a stage. No agent operates out of sequence.
When to Use¶
- Work that has a natural sequential dependency (specification must exist before tests, tests before implementation)
- Work where each stage adds a distinct type of value
- Work where the output of one stage is the input to the next
GE Implementation¶
The primary pipeline is the software delivery pipeline:
Aimee (Scoping)
-> Anna (Formal Specification)
-> Antje (Test Generation)
-> Developer (Implementation)
-> Koen (Deterministic Quality Gates)
-> Marije (Integration Testing)
-> Jasper (Reconciliation)
-> Leon (Deployment)
Each handoff is implemented as a Redis Stream message. When Aimee completes scoping, a completion event triggers the orchestrator, which creates a task for Anna. When Anna completes her specification, the orchestrator creates a task for Antje. And so on.
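The completion-driven routing described above can be sketched in a few lines. This is a minimal illustration, assuming redis-py and the stream names used elsewhere on this page (`ge:completions` events feeding `triggers.{agent}` queues); the function names `next_stage` and `route_completion` are hypothetical, not GE's actual API.

```python
# Pipeline order from the diagram above; lowercase names match stream routing.
PIPELINE = ["aimee", "anna", "antje", "developer",
            "koen", "marije", "jasper", "leon"]

def next_stage(agent):
    """Return the agent that receives the handoff, or None at pipeline end."""
    idx = PIPELINE.index(agent)
    return PIPELINE[idx + 1] if idx + 1 < len(PIPELINE) else None

def route_completion(client, event):
    """On a completion event, create a task for the next pipeline stage."""
    successor = next_stage(event["agent"])
    if successor is not None:
        # Per-agent queues are capped (see the MAXLEN rule later on this page).
        client.xadd(f"triggers.{successor}",
                    {"work_item_id": event["work_item_id"],
                     "upstream_agent": event["agent"]},
                    maxlen=100, approximate=True)
```

The key property is that no agent addresses its successor directly: the orchestrator reads completions and decides the next stage, which keeps the sequence enforceable in one place.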
Handoff Protocol¶
Every pipeline handoff includes:
| Field | Description | Example |
|---|---|---|
| `work_item_id` | Unique identifier for the work | `WI-2026-0342` |
| `work_item` | Description of the work | `"User authentication module"` |
| `type` | Work type for routing | `specification` |
| `priority` | Urgency level | `normal` |
| `artifacts` | References to output files | `["/specs/auth-module.md"]` |
| `upstream_agent` | Who produced the input | `aimee` |
RULE: Every handoff must be complete. If an agent cannot produce a complete output, it must not hand off. It must either request clarification or escalate.
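The completeness rule can be enforced mechanically before any message leaves an agent. A minimal sketch, assuming the six fields from the table above; `validate_handoff` and `hand_off` are illustrative names, and the `send`/`escalate` callbacks stand in for the real stream write and escalation path.

```python
# Required fields from the handoff protocol table.
REQUIRED_FIELDS = ("work_item_id", "work_item", "type",
                   "priority", "artifacts", "upstream_agent")

def validate_handoff(message):
    """Return the names of missing or empty fields; an empty list means complete."""
    return [f for f in REQUIRED_FIELDS if not message.get(f)]

def hand_off(message, send, escalate):
    """Forward only complete handoffs; otherwise escalate instead of passing partial work."""
    missing = validate_handoff(message)
    if missing:
        escalate(message.get("work_item_id"), missing)
    else:
        send(message)
```

Validating at the sender, rather than the receiver, stops partial output before it can compound through downstream stages.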
Failure Modes¶
- Incomplete handoff. An agent passes work forward without completing its stage. Downstream agents receive partial input and produce partial output. The error compounds through every subsequent stage.
- Stage skipping. An agent or operator bypasses a pipeline stage ("we don't need a spec for this small change"). The missing stage removes a layer of verification, and errors that stage would have caught propagate forward.
- Bottleneck. One stage takes much longer than the others, creating a queue. GE mitigates this with parallel teams (Alfa and Bravo) for the development stage.
Pattern 2: Review (Parallel Assessment)¶
What It Is¶
Multiple agents independently assess the same artifact. Their assessments are compared. Disagreements trigger investigation. Agreement provides confidence.
When to Use¶
- Quality gates where a single reviewer might miss defects
- Decisions where different perspectives add value (security review + performance review + usability review)
- High-risk changes where the cost of a missed defect exceeds the cost of multiple reviews
GE Implementation¶
Code review in GE is not a single-agent operation. Multiple agents review from different perspectives:
| Agent | Review Focus |
|---|---|
| Koen | Deterministic quality (lint, typecheck, dead code) |
| Eric | Business logic correctness, requirement alignment |
| Ashley | Adversarial testing (what happens if input is malicious?) |
| Victoria | Security vulnerabilities, credential exposure |
| Jasper | Reconciliation (does TDD output match post-implementation reality?) |
Each reviewer operates independently. They do not see each other's reviews. This is deliberate: if one reviewer's output influences another, you lose the benefit of independent assessment.
Oracle Independence¶
RULE: Reviewers must not share context about each other's findings until all reviews are complete. RATIONALE: If Agent A's review is in Agent B's context, Agent B will anchor on Agent A's findings and miss independent issues. This is the same principle behind double-blind peer review.
After all reviews are complete, findings are merged. Conflicts (where reviewers disagree) are escalated to the discussion pattern.
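The merge step above reduces to a small decision rule. A sketch with verdicts simplified to approve/reject; `merge_reviews` is a hypothetical name, and real findings carry more structure than a single verdict.

```python
def merge_reviews(reviews):
    """reviews maps reviewer name -> verdict ('approve' or 'reject').

    Unanimous agreement yields a decision; any disagreement escalates
    to the discussion pattern rather than being resolved silently.
    """
    verdicts = set(reviews.values())
    if verdicts == {"approve"}:
        return "approved"
    if verdicts == {"reject"}:
        return "rejected"
    return "escalate_to_discussion"
```

Note that the merge runs only after every review is in, preserving the independence guarantee: no reviewer's verdict is visible to another during review.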
Failure Modes¶
- Anchoring. A reviewer sees another reviewer's output and anchors on it. Prevented by context isolation.
- Rubber stamping. A reviewer approves without meaningful examination. Detected by tracking review depth (word count, specific findings, time spent).
- Scope confusion. A reviewer checks things outside their expertise. Prevented by explicit review checklists per agent.
Pattern 3: Discussion (Consensus Building)¶
What It Is¶
Multiple agents deliberate on a question, propose answers, vote, and reach consensus. If consensus cannot be reached, the question escalates to a human.
When to Use¶
- Architectural decisions that affect multiple agents or teams
- Ambiguous requirements where the correct interpretation is not obvious
- Conflicting findings from the review pattern
- Any decision with lasting impact that should not be made by a single agent
GE Implementation¶
The discussion model operates through the Discussions API:
Phase 1: Initiation
Any agent can initiate a discussion. The initiator states the question, provides context, and identifies which agents should participate (based on domain expertise).
Discussion: "Should we use WebSocket or SSE for real-time client updates?"
Initiator: Arjan (infrastructure)
Participants: Floris (frontend), Urszula (backend), Nessa (performance), Ron (security)
Phase 2: Deliberation
Each participant reviews the question and submits their position with reasoning. Positions are structured:
- Position: The agent's recommendation (e.g., "SSE")
- Reasoning: Why this position is correct
- Constraints: What conditions would change the recommendation
- Risk assessment: What could go wrong with this choice
Phase 3: Voting
After all participants have submitted positions, a vote is taken. Each participant gets one vote. The outcome is resolved as follows:
- If a clear majority exists (>50%) -> consensus is reached. The majority position becomes the decision.
- If no majority exists -> the discussion continues with a second round of deliberation, incorporating the arguments from the first round.
- If no majority after two rounds -> the question escalates to the human (Dirk-Jan).
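The voting rules above can be sketched as a small resolution function. Positions are simplified to vote strings; `tally` and `resolve` are illustrative names, not the Discussions API.

```python
from collections import Counter

def tally(votes):
    """Return the strict-majority position (>50% of votes), or None."""
    if not votes:
        return None
    position, count = Counter(votes).most_common(1)[0]
    return position if count * 2 > len(votes) else None

def resolve(rounds, max_rounds=2):
    """Apply up to max_rounds of voting; escalate to the human if no majority."""
    for votes in rounds[:max_rounds]:
        winner = tally(votes)
        if winner is not None:
            return winner
    return "escalate_to_human"
```

The hard cap on rounds is what makes the pattern terminate: a deadlocked question costs at most two deliberation rounds before a human decides.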
Phase 4: Decision Capture
The outcome — whether by consensus or human decision — is captured as a learning in the wiki brain. Future agents working in the same domain will receive this decision as JIT knowledge, preventing the same question from being re-debated.
Key Design Choices¶
Asynchronous deliberation. Agents do not "talk" in real-time. Each agent submits their position independently. This prevents groupthink, where early speakers influence later ones.
Structured positions. Free-form discussion between LLMs degrades into verbose, repetitive exchanges. Structured positions force conciseness and comparability.
Finite rounds. Without a round limit, discussions can continue indefinitely. Two rounds is the maximum before human escalation.
Learning capture. The highest-value output of a discussion is not the decision itself but the documented reasoning. This becomes institutional knowledge.
Failure Modes¶
- Groupthink. All agents agree because they share the same training data and arrive at the same (possibly wrong) conclusion. Mitigated by including agents with different model providers (Claude, OpenAI, Gemini) and different roles.
- Decision fatigue. Too many discussions running simultaneously, overwhelming participants. Mitigated by rate limiting (orchestrator controls discussion creation).
- Stale decisions. A decision made months ago may no longer be valid. Mitigated by expiration dates on discussion outcomes and periodic review.
Pattern 4: Escalation (Human-in-the-Loop)¶
What It Is¶
An agent recognizes it cannot or should not make a decision autonomously and escalates to a human or a more senior agent.
When to Use¶
- Decisions that exceed the agent's authority tier
- Situations where the agent's confidence is low
- HALT conditions (see human-in-the-loop.md)
- Disagreements that the discussion pattern cannot resolve
GE Implementation¶
Escalation follows a tiered structure:
| Tier | Who Decides | Examples |
|---|---|---|
| Autonomous | The agent itself | Code formatting, variable naming, test structure |
| Peer escalation | Another agent with relevant expertise | "Is this a security concern?" -> Ron |
| Discussion | Multi-agent consensus | Architectural decisions, technology choices |
| Human escalation | Dirk-Jan | Contract decisions, client communication, agent commissioning |
RULE: An agent must never escalate sideways when it should escalate upward. If the question is about business policy, it goes to the human, not to another agent.
Escalation is implemented through notification files written to ge-ops/notifications/human/ for human escalation, or through Redis Stream messages for peer and discussion escalation.
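A sketch of that routing split, assuming the `ge-ops/notifications/human/` directory and the `triggers.{agent}` streams described on this page; the `escalate` function and `target_agent` field are hypothetical illustrations.

```python
import json
import pathlib
import time

def escalate(tier, payload, client=None,
             human_dir="ge-ops/notifications/human"):
    """Route an escalation: human tier -> notification file, else Redis Stream."""
    if tier == "human":
        # Human escalations are plain files, readable without any tooling.
        path = pathlib.Path(human_dir) / f"escalation-{int(time.time())}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(payload))
        return str(path)
    # Peer and discussion escalations go to the target agent's work queue.
    target = payload["target_agent"]
    client.xadd(f"triggers.{target}", payload, maxlen=100, approximate=True)
    return f"triggers.{target}"
```

Writing human escalations to files rather than a stream keeps them visible even if the messaging layer is down, which matters precisely when things are going wrong.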
Failure Modes¶
- Under-escalation. The agent makes a decision it should not have made. The most dangerous failure mode.
- Over-escalation. The agent escalates routine decisions, overwhelming the human. Creates a bottleneck and defeats the purpose of automation.
- Escalation loops. Agent A escalates to Agent B, who escalates back to Agent A. Prevented by the orchestrator's chain depth limit (max 3).
Communication Infrastructure¶
Redis Streams¶
All inter-agent communication in GE flows through Redis Streams. This is a deliberate architectural choice:
Why Redis Streams, not direct calls:
| Property | Direct Calls | Redis Streams |
|---|---|---|
| Persistence | Lost if receiver is down | Persisted until consumed |
| Auditability | Requires separate logging | Every message is stored |
| Replay | Not possible | Any message can be replayed |
| Decoupling | Sender must know receiver | Sender writes to stream, orchestrator routes |
| Ordering | Depends on timing | Guaranteed by stream ID |
Stream naming convention:
- `triggers.{agent}` — Per-agent work queue. The orchestrator writes here; the executor reads.
- `ge:work:incoming` — System-wide intake. The admin UI and external systems write here.
- `ge:completions` — Completion events. The executor writes here when an agent finishes a task.
RULE: Never XADD to both triggers.{agent} and ge:work:incoming for the same task. This causes double delivery and double execution — a token burn incident that GE learned from directly.
RULE: Every XADD includes MAXLEN (~100 for per-agent, ~1000 for system). Unbounded streams grow until they consume all available memory.
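Both rules can be enforced in a single send helper. A sketch, assuming redis-py; the `send_task` function and the module-level `_sent` set (an in-process stand-in for real deduplication state) are illustrative.

```python
_sent = set()  # work_item_ids already dispatched (illustrative in-process state)

def send_task(client, stream, fields):
    """XADD with a MAXLEN cap; refuse a second delivery of the same work item.

    Sending the same task to both triggers.{agent} and ge:work:incoming
    is exactly the double-delivery case this guard blocks.
    """
    maxlen = 100 if stream.startswith("triggers.") else 1000
    wid = fields["work_item_id"]
    if wid in _sent:
        return False  # would cause double delivery and double execution
    _sent.add(wid)
    # approximate=True emits the efficient MAXLEN ~ N trimming form.
    client.xadd(stream, fields, maxlen=maxlen, approximate=True)
    return True
```

Centralizing every XADD behind one helper means no call site can forget the cap or bypass the delivery guard.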
Consumer Groups¶
Each stream is consumed by a consumer group. This enables:
- Load balancing. Multiple executor instances can consume from the same stream.
- Acknowledgment. Messages are not removed until explicitly acknowledged.
- Recovery. Unacknowledged messages are visible for reprocessing after crashes.
- Exactly-once semantics. Consumer groups alone guarantee at-least-once delivery; combined with `work_item_id` deduplication (5-minute window), processing becomes effectively exactly-once.
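The consume-dedupe-acknowledge loop can be sketched as follows, assuming redis-py's `xreadgroup`/`xack` calls. The in-process `_seen` dictionary stands in for real deduplication state, and `process_once` is an illustrative name.

```python
import time

DEDUP_WINDOW = 300  # seconds; the 5-minute deduplication window
_seen = {}          # work_item_id -> timestamp of last processing (illustrative)

def process_once(client, stream, group, consumer, handle):
    """Read new messages, dedupe by work_item_id, handle, then ACK.

    If handle() raises, the message is never ACKed, so it stays pending
    and becomes visible for reprocessing after a crash (recovery).
    """
    handled = 0
    for _name, entries in client.xreadgroup(group, consumer,
                                            {stream: ">"}, count=10):
        for msg_id, fields in entries:
            wid = fields["work_item_id"]
            now = time.time()
            if now - _seen.get(wid, 0.0) > DEDUP_WINDOW:
                handle(fields)
                _seen[wid] = now
                handled += 1
            # Duplicates inside the window are ACKed but not re-executed.
            client.xack(stream, group, msg_id)
    return handled
```

Note the ordering: the handler runs before the ACK, so a crash mid-task leaves the message pending rather than silently lost.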
DAG Enforcement and Swimming Lanes¶
Work Package Dependencies¶
Complex work (a new feature, a client project) is broken into work packages with dependencies. These dependencies form a Directed Acyclic Graph (DAG):
WP-001 (Specification)
-> WP-002 (Test Generation)
-> WP-003 (Database Schema)
-> WP-004 (Backend Implementation) [depends on WP-002, WP-003]
-> WP-005 (Frontend Implementation) [depends on WP-004]
-> WP-006 (Integration Testing) [depends on WP-004, WP-005]
The orchestrator enforces the DAG. WP-004 cannot start until both WP-002 and WP-003 are complete. This prevents agents from working on tasks whose prerequisites have not been met.
Swimming Lanes¶
Swimming lanes are parallel tracks of work that can proceed independently. In the DAG above, WP-002 and WP-003 are in separate swimming lanes — they have no dependency on each other and can execute in parallel.
GE's orchestrator identifies swimming lanes automatically from the DAG structure and dispatches parallelizable work simultaneously. This maximizes throughput while respecting dependencies.
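Both ideas reduce to one computation over the DAG: the set of work packages whose dependencies are all complete is exactly the set of swimming lanes that can run in parallel. A sketch using the dependencies from the example above; `DEPS` and `ready` are illustrative names.

```python
# Dependency map from the work-package example above.
DEPS = {
    "WP-001": [],
    "WP-002": ["WP-001"],
    "WP-003": ["WP-001"],
    "WP-004": ["WP-002", "WP-003"],
    "WP-005": ["WP-004"],
    "WP-006": ["WP-004", "WP-005"],
}

def ready(done):
    """Work packages whose prerequisites are all complete and which are not yet done.

    Everything in this set can be dispatched simultaneously.
    """
    return {wp for wp, deps in DEPS.items()
            if wp not in done and all(d in done for d in deps)}
```

After WP-001 completes, `ready` returns both WP-002 and WP-003: two swimming lanes. WP-004 does not appear until both are done, which is the diamond-dependency enforcement a linear chain cannot express.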
Why Not Just Chain Agents Directly?¶
Direct chaining (agent A triggers agent B triggers agent C) seems simpler, but it creates three problems:
- No global visibility. Nobody knows where work is in the pipeline. The orchestrator provides a single view of all in-flight work.
- No dependency enforcement. Direct chains are linear. Real work has diamond dependencies (WP-004 depends on both WP-002 and WP-003). Only a DAG-aware orchestrator can handle this.
- No recovery. If agent B crashes in a direct chain, the chain is broken. The orchestrator detects stalled work and can reassign it.
Agent Boundaries¶
The Boundary Principle¶
RULE: Every agent must have clear, non-overlapping boundaries. Overlap creates conflict. Gaps create dropped work.
Overlap example (bad):

- Agent A: "I handle all backend development."
- Agent B: "I handle database-related development."
- Result: Both agents modify the same database access code. Their changes conflict.

Clear boundaries (good):

- Agent A: "I handle backend application logic. I call database functions but never define them."
- Agent B: "I handle database schema, migrations, and query functions. I never write application logic."
- Result: Agent A produces calls, Agent B produces implementations. No overlap.
Boundary Definition Checklist¶
For every agent, the following boundaries must be defined:
- [ ] What files/directories does this agent own?
- [ ] What files/directories must this agent never modify?
- [ ] What decisions can this agent make autonomously?
- [ ] What decisions must this agent escalate?
- [ ] Which other agents does this agent hand off to?
- [ ] Which other agents hand off to this agent?
- [ ] What constitutes "done" for this agent's work?
These boundaries are encoded in the agent's ROLE identity and enforced by the orchestrator's routing configuration.