
DOMAIN:PROJECT_MANAGEMENT:AGILE_AGENTIC

OWNER: faye (Team Alpha PM), sytske (Team Beta PM) UPDATED: 2026-03-24 SCOPE: adapting agile/scrum methodology for multi-agent development PREREQUISITE: work-package-design.md, delivery-lifecycle.md


PRINCIPLE: AGILE_FOR_AGENTS_IS_NOT_AGILE_FOR_HUMANS

RULE: agentic development borrows agile values but reinvents the practices. RULE: ceremonies that exist for human communication are replaced by data streams. RULE: estimation changes fundamentally — agents do not have "good days" and "bad days." REASON: agile was designed for teams of 5-9 humans with meetings, whiteboards, and social dynamics. Agents have none of these. Blindly applying scrum to agents wastes tokens on ceremony and misses the actual leverage points.


TRANSLATION_MAP

| Human Agile Concept      | Agentic Equivalent                | Why Different                     |
|--------------------------|-----------------------------------|-----------------------------------|
| Sprint (2 weeks)         | Continuous flow with gates         | Agents don't need rhythm          |
| Daily standup            | Redis status streams               | Real-time, not once-a-day         |
| Sprint planning          | WP set creation (one-time)         | No recurring planning needed      |
| Sprint review            | Gate verification                  | Automated, not a meeting          |
| Retrospective            | Learning extraction pipeline       | Continuous, not per-sprint        |
| Story points             | WP size (S/M/L) + token estimate   | More precise, less political      |
| Product backlog          | Functional spec (frozen)           | Scope is frozen post-gate         |
| Sprint backlog           | Active WP set                      | No sprints, just the WP queue     |
| Scrum master             | ge-orchestrator                    | Software, not a person            |
| Product owner            | Aimee + client                     | Spec is the authority             |
| Definition of done       | WP acceptance criteria + tests     | Machine-verifiable                |
| Burndown chart           | WP completion rate over time       | Linear, not S-curve               |
| Velocity                 | WPs per agent-hour                 | Deterministic, not estimated      |
| Technical debt           | Learning gaps in wiki brain        | Knowledge debt, not code debt     |
| Code review              | Anti-LLM pipeline                  | Multi-agent automated review      |

CONTINUOUS_FLOW_VS_SPRINTS

WHY_NOT_SPRINTS

ARGUMENT_1: sprints create artificial batching. Agents can start the next WP the moment the current one completes. Waiting for a sprint boundary wastes hours. ARGUMENT_2: sprints create artificial deadlines. A 2-week sprint ending on Friday creates pressure to cut corners. Agents do not respond to deadline pressure — they either complete the WP or fail. ARGUMENT_3: sprint planning is overhead. In agentic dev, the WP set is planned once during Phase 6 and adjusted only on replanning triggers. There is no recurring ceremony. ARGUMENT_4: sprint ceremonies consume tokens. A "standup" implemented as an agent conversation costs $2-5 per agent per day. For 10 agents, that is $20-50/day on status updates. Status streams cost $0.

CONTINUOUS_FLOW_MODEL

                    ┌─────────────────────────────────────────────┐
                    │                ORCHESTRATOR                 │
                    │  (dispatches WPs based on DAG + capacity)   │
                    └─────────┬───────────────────┬───────────────┘
                              │                   │
                        ┌─────▼───────┐     ┌─────▼───────┐
                        │  Agent A    │     │  Agent B    │
                        │  WP-005     │     │  WP-007     │
                        │  (working)  │     │  (working)  │
                        └─────┬───────┘     └─────┬───────┘
                              │                   │
                        ┌─────▼───────┐     ┌─────▼───────┐
                        │  COMPLETE   │     │  COMPLETE   │
                        │  → next WP  │     │  → next WP  │
                        │  dispatched │     │  dispatched │
                        │  immediately│     │  immediately│
                        └─────────────┘     └─────────────┘

RULE: when agent finishes WP, orchestrator immediately dispatches the next eligible WP. RULE: "eligible" means all hard dependencies resolved and agent is in the correct swimming lane. RULE: no batching, no waiting, no ceremonies between WPs. RULE: the only pauses are gates between phases (spec → design → planning → dev → integration).
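These dispatch rules reduce to a small eligibility check. A minimal sketch, assuming hypothetical `WorkPackage` fields (`lane`, `hard_deps`) — the real orchestrator's schema may differ:

```python
from dataclasses import dataclass, field

@dataclass
class WorkPackage:
    wp_id: str
    lane: str                                    # swimming lane this WP belongs to
    hard_deps: list = field(default_factory=list)
    status: str = "BACKLOG"                      # BACKLOG | IN_PROGRESS | DONE

def eligible_wps(wps, done_ids, agent_lane):
    """A WP is eligible when all hard deps are DONE and it matches the agent's lane."""
    return [
        wp for wp in wps
        if wp.status == "BACKLOG"
        and wp.lane == agent_lane
        and all(dep in done_ids for dep in wp.hard_deps)
    ]

def dispatch_next(wps, done_ids, agent_lane):
    """Dispatch immediately: first eligible WP, no batching, no sprint boundary."""
    candidates = eligible_wps(wps, done_ids, agent_lane)
    if not candidates:
        return None
    nxt = candidates[0]
    nxt.status = "IN_PROGRESS"
    return nxt
```

The point of the sketch: there is no sprint clock anywhere — the only inputs are the dependency set and the lane.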

WHEN_FLOW_PAUSES

PAUSE_1: phase gates — development cannot start until planning gate passes. PAUSE_2: dependency waiting — WP-008 cannot start until WP-005 (its hard dep) completes. PAUSE_3: replanning — PM pauses flow to restructure WPs after a trigger fires. PAUSE_4: client input needed — UAT feedback, scope clarification, design approval.


STANDUPS_VS_STATUS_STREAMS

HUMAN_STANDUP_PROBLEMS_FOR_AGENTS

PROBLEM_1: synchronous. All agents must pause work to "attend." Token waste. PROBLEM_2: periodic. Information is stale by the time the standup happens. PROBLEM_3: verbal. Agents communicate via structured data, not conversation.

STATUS_STREAM_MODEL

Every agent publishes to a Redis stream on WP completion:

{
  "agent": "urszula",
  "work_item_id": "WP-005",
  "status": "COMPLETE",
  "duration_seconds": 1847,
  "token_cost_usd": 1.23,
  "files_modified": ["lib/api/routes/orders.ts", "drizzle/migrations/003_orders.sql"],
  "acceptance_criteria_met": ["REQ-006-AC1", "REQ-006-AC2", "REQ-006-AC3"],
  "notes": "Added index on orders.client_id for query performance",
  "timestamp": "2026-03-24T14:32:00Z"
}
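An agent-side helper for emitting this event might look like the following. A sketch only: the stream key `ge:status` and the helper names are assumptions, and the actual redis-py `XADD` call is shown as a comment.

```python
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = {"agent", "work_item_id", "status", "duration_seconds",
                   "token_cost_usd", "acceptance_criteria_met", "timestamp"}

def build_status_event(agent, wp_id, status, duration_seconds, token_cost_usd,
                       acceptance_criteria_met, files_modified=(), notes=""):
    """Build the structured completion event an agent publishes on WP completion."""
    return {
        "agent": agent,
        "work_item_id": wp_id,
        "status": status,
        "duration_seconds": duration_seconds,
        "token_cost_usd": token_cost_usd,
        "files_modified": list(files_modified),
        "acceptance_criteria_met": list(acceptance_criteria_met),
        "notes": notes,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def validate_status_event(event):
    """Reject events missing required fields before they reach the stream."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"status event missing fields: {sorted(missing)}")
    return json.dumps(event)

# With redis-py, publishing would look like (stream key is an assumption):
#   r.xadd("ge:status", {"event": validate_status_event(event)})
```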

PM DASHBOARD aggregates these streams into a real-time view:
- Which agents are working on which WPs right now
- Which WPs completed in the last hour
- Which WPs are blocked and why
- Critical path progress
- Cost accumulation

ADVANTAGE: information is real-time, structured, queryable, and costs zero tokens beyond the completion event.

PM_DAILY_DIGEST

Instead of a standup meeting, the PM generates a daily digest from the status stream:

## Project Status: {project_code} — {date} {time}

### Completed Since Last Digest
- WP-005: Orders API (urszula) — 31 min, $1.23
- WP-007: Login page (floris) — 22 min, $0.87

### In Progress
- WP-006: Invoice API (urszula) — started 14:35, running

### Blocked
- WP-008: Orders UI — waiting on WP-005 (now unblocked, dispatching)

### Metrics
- WPs completed today: 2
- Agent utilization: 78%
- Budget consumed: $12.40 / $85.00 (14.6%)
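Generating that digest from the stream is a straight fold over the day's events. A sketch, assuming the event fields shown earlier:

```python
def render_digest(project_code, date, events):
    """Aggregate status events into the daily digest sections."""
    by_status = {"COMPLETE": [], "IN_PROGRESS": [], "BLOCKED": []}
    for e in events:
        by_status.setdefault(e["status"], []).append(e)

    lines = [f"## Project Status: {project_code} — {date}", "",
             "### Completed Since Last Digest"]
    for e in by_status["COMPLETE"]:
        mins = e["duration_seconds"] // 60
        lines.append(f"- {e['work_item_id']} ({e['agent']}) — {mins} min, "
                     f"${e['token_cost_usd']:.2f}")
    lines += ["", "### In Progress"]
    for e in by_status["IN_PROGRESS"]:
        lines.append(f"- {e['work_item_id']} ({e['agent']})")
    lines += ["", "### Blocked"]
    for e in by_status["BLOCKED"]:
        lines.append(f"- {e['work_item_id']} — {e.get('notes', '')}")
    lines += ["", "### Metrics",
              f"- WPs completed today: {len(by_status['COMPLETE'])}"]
    return "\n".join(lines)
```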

RETROSPECTIVES_VS_LEARNING_EXTRACTION

HUMAN_RETRO_FORMAT

In human agile: team gathers, discusses what went well, what went badly, creates action items. Happens once per sprint. Often stale. Action items frequently forgotten.

AGENTIC_LEARNING_EXTRACTION

In GE: learning extraction happens automatically after every agent session.

LAYER 1: session_summarizer.py
  - Runs inline at session end
  - Extracts structured learnings from session transcript
  - Costs ~$0.01-0.03 per session (Haiku)
  - Stores in session_learnings table

LAYER 2: knowledge_synthesizer.py
  - Runs every 6 hours
  - Cross-session pattern detection
  - Identifies recurring issues, effective solutions, emerging best practices
  - Stores in knowledge_patterns table with confidence scores

LAYER 3: Wiki brain
  - High-confidence patterns are promoted to wiki pages
  - JIT-injected into agent boot context
  - Agents benefit from learnings of ALL other agents

RETRO_EQUIVALENT_FOR_PMS

The PM does not need a ceremony. Instead:

DAILY: review knowledge_patterns flagged in the last 24 hours. WEEKLY: review patterns with confidence > 0.8 that are not yet in wiki. PER_PROJECT: after handoff, review all session_learnings for that project. Identify systematic issues.

ACTION_ITEMS are wiki updates, not meeting notes. Once a learning is in the wiki, it is automatically injected into future agent sessions. No "action item tracking" needed.
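The weekly review reduces to a query over knowledge_patterns. A sketch, assuming hypothetical `confidence` and `in_wiki` fields on each pattern row:

```python
def patterns_to_promote(patterns, min_confidence=0.8):
    """Weekly PM review: high-confidence patterns not yet promoted to the wiki."""
    return [p for p in patterns
            if p["confidence"] > min_confidence and not p.get("in_wiki", False)]
```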


ESTIMATION_IN_AGENTIC_CONTEXT

WHY_STORY_POINTS_FAIL

STORY_POINTS_PROBLEM_1: story points measure relative human effort. Agent effort is determined by task complexity and context window usage, not human intuition. STORY_POINTS_PROBLEM_2: planning poker creates consensus through discussion. Agents do not discuss estimates — they execute. STORY_POINTS_PROBLEM_3: velocity in story points is unstable for humans (sick days, motivation, learning curves). Agent velocity is stable within narrow bounds.

AGENTIC_ESTIMATION_MODEL

DIMENSION_1: WP size category
- S (Small): 15-20 min, 1-3 files, straightforward implementation
- M (Medium): 20-35 min, 4-7 files, moderate complexity
- L (Large): 35-45 min, 8+ files, significant complexity

DIMENSION_2: token cost estimate
- S: $0.30-0.80
- M: $0.80-2.00
- L: $2.00-4.50

DIMENSION_3: session probability
- Single session (expected): 90% of WPs
- Multi-session (needs context rebuild): 8% of WPs
- Needs PM intervention (blocker, spec gap): 2% of WPs

ESTIMATION_TECHNIQUE

STEP_1: count files to create or modify → determines size category. STEP_2: check if similar WPs have been done before → use historical data from session_learnings. STEP_3: multiply size category by standard rate → gives time and cost estimate. STEP_4: add 20% buffer for first-of-kind WPs (no historical data). STEP_5: critical path WPs get 30% buffer (blocking other work if they slip).
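STEP_1 through STEP_5 can be expressed directly. A sketch; the per-size standard rates below are midpoints of the DIMENSION_1 and DIMENSION_2 ranges, not official figures:

```python
# Standard rates per size category: (minutes, dollars). Midpoints of the
# ranges in DIMENSION_1/DIMENSION_2 — illustrative defaults only.
RATES = {"S": (18, 0.55), "M": (28, 1.40), "L": (40, 3.25)}

def size_from_file_count(n_files):
    """STEP_1: file count determines the size category."""
    if n_files <= 3:
        return "S"
    if n_files <= 7:
        return "M"
    return "L"

def estimate_wp(n_files, first_of_kind=False, on_critical_path=False):
    """STEP_3-5: size × standard rate, plus buffers."""
    size = size_from_file_count(n_files)
    minutes, cost = RATES[size]
    buffer = 1.0
    if first_of_kind:
        buffer += 0.20   # STEP_4: no historical data
    if on_critical_path:
        buffer += 0.30   # STEP_5: slippage blocks other work
    return size, round(minutes * buffer, 1), round(cost * buffer, 2)
```

STEP_2 (historical lookup in session_learnings) would replace the static `RATES` table with observed averages once data exists.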

PROJECT_ESTIMATION

Total estimated time = sum of WP estimates on critical path
                     + parallel group overhead (10% per parallel group)
                     + gate processing time (1 hour per gate for phases 3-6)
                     + integration testing (20% of development time)
                     + UAT buffer (3 days minimum for client response time)

Total estimated cost = sum of all WP cost estimates
                     + overhead agents (linting, testing, reconciliation: 30% of dev cost)
                     + scoping and spec cost ($5-15 for Opus sessions)
                     + integration testing cost (20% of development cost)
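The two formulas above, as functions. A sketch: the $10 default spec cost is the midpoint of the $5-15 range, and "development time" is taken to mean the critical-path sum — both assumptions for illustration.

```python
def estimate_project_time_minutes(critical_path_min, n_parallel_groups, n_gates):
    """Working time only; the 3-day UAT buffer is calendar time on top."""
    return (critical_path_min
            + critical_path_min * 0.10 * n_parallel_groups  # parallel group overhead
            + n_gates * 60                                  # 1 hour per gate
            + critical_path_min * 0.20)                     # integration testing

def estimate_project_cost(wp_costs, spec_cost=10.0):
    dev = sum(wp_costs)
    return round(dev
                 + dev * 0.30    # overhead agents: lint, test, reconciliation
                 + spec_cost     # scoping and spec (Opus sessions, $5-15)
                 + dev * 0.20,   # integration testing
                 2)
```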

KANBAN_FOR_AGENT_WORK_QUEUES

WHY_KANBAN_FITS_AGENTIC_WORK

FIT_1: kanban is pull-based. Agents pull work when ready. No sprint commitment. FIT_2: kanban has WIP limits. Agents have hard WIP limit of 1 (one WP at a time). FIT_3: kanban visualizes flow. Status streams feed a real-time board. FIT_4: kanban optimizes throughput. Critical path optimization = throughput optimization.

AGENTIC_KANBAN_BOARD

| BACKLOG | READY  | IN PROGRESS  | REVIEW        | DONE   |
|---------|--------|--------------|---------------|--------|
| WP-015  | WP-010 | WP-006 (urs) | WP-005 (koen) | WP-001 |
| WP-016  | WP-011 | WP-009 (flo) |               | WP-002 |
| WP-017  | WP-012 |              |               | WP-003 |
| WP-018  |        |              |               | WP-004 |
|         |        |              |               | WP-007 |
|         |        |              |               | WP-008 |

COLUMN_DEFINITIONS:
- BACKLOG: WP defined but dependencies not yet resolved
- READY: all dependencies resolved, waiting for agent availability
- IN_PROGRESS: agent is executing this WP right now
- REVIEW: WP complete, going through anti-LLM pipeline (lint → test → reconcile)
- DONE: fully verified, merged

WIP_LIMITS

| Column       | WIP Limit | Reason                                    |
|--------------|-----------|-------------------------------------------|
| BACKLOG      | unlimited | all future work lives here                |
| READY        | 10        | too many ready = agents can't keep up     |
| IN_PROGRESS  | 5         | max parallel per feature branch           |
| REVIEW       | 3         | pipeline is sequential, don't queue too many |
| DONE         | unlimited | completed work stays forever              |
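Enforcing these limits is a one-line capacity check before any column move. A sketch with a plain dict board:

```python
WIP_LIMITS = {"BACKLOG": None, "READY": 10, "IN_PROGRESS": 5,
              "REVIEW": 3, "DONE": None}

def can_move(board, target):
    """A move is allowed only if the target column has capacity (None = unlimited)."""
    limit = WIP_LIMITS[target]
    return limit is None or len(board[target]) < limit

def move(board, wp_id, source, target):
    """Move a WP between columns, refusing moves that would break a WIP limit."""
    if not can_move(board, target):
        raise RuntimeError(f"WIP limit reached on {target}")
    board[source].remove(wp_id)
    board[target].append(wp_id)
```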

FLOW_METRICS

METRIC_1: lead time — time from WP entering BACKLOG to DONE. Target: < 4 hours for M-sized WPs. METRIC_2: cycle time — time from IN_PROGRESS to DONE. Target: < 90 minutes including review. METRIC_3: throughput — WPs reaching DONE per hour. Target: 2-3 per hour with full team. METRIC_4: WIP age — how long a WP has been IN_PROGRESS. Alert if > 60 minutes. METRIC_5: blocked time — how long a WP sits in BACKLOG waiting on deps. Track to optimize DAG.
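Lead and cycle time fall out of the column-entry timestamps already carried by the status stream. A sketch, assuming each WP records ISO timestamps for entering BACKLOG, IN_PROGRESS, and DONE:

```python
from datetime import datetime

def _hours(start_iso, end_iso):
    delta = datetime.fromisoformat(end_iso) - datetime.fromisoformat(start_iso)
    return delta.total_seconds() / 3600

def flow_metrics(wp_timestamps):
    """wp_timestamps: {wp_id: {"backlog": ts, "in_progress": ts, "done": ts}}."""
    lead = [_hours(t["backlog"], t["done"]) for t in wp_timestamps.values()]
    cycle = [_hours(t["in_progress"], t["done"]) for t in wp_timestamps.values()]
    return {
        "lead_time_h": sum(lead) / len(lead),    # METRIC_1: backlog → done
        "cycle_time_h": sum(cycle) / len(cycle), # METRIC_2: in progress → done
    }
```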


ROLES_REDEFINED

PRODUCT_OWNER → AIMEE + CLIENT

Human agile: PO prioritizes backlog, writes stories, accepts deliverables. Agentic: Aimee writes the functional spec (stories equivalent). Client approves at gates. PM tracks but does not reprioritize — the spec defines priority.

DIFFERENCE: in human agile, PO constantly reprioritizes. In agentic, scope is frozen after Phase 3 gate. Reprioritization only happens through formal change requests.

SCRUM_MASTER → ORCHESTRATOR

Human agile: SM removes impediments, facilitates ceremonies, coaches team. Agentic: ge-orchestrator dispatches work, enforces WIP limits, handles blocked WPs. PM handles escalations that the orchestrator cannot resolve (spec gaps, client communication).

DIFFERENCE: the orchestrator is software. It cannot coach, motivate, or read the room. The PM fills this gap by monitoring for systemic issues (repeated failures, cost overruns, blocked agents).

DEVELOPMENT_TEAM → AGENT_POOL

Human agile: self-organizing team picks work, collaborates, pair-programs. Agentic: agents are assigned WPs based on swimming lane. They do not self-organize. They do not pair. They execute independently within their lane.

DIFFERENCE: agents have no social dynamics, no watercooler, no motivation issues. But they also cannot improvise or collaborate in real-time. The PM must pre-plan collaboration via WP dependencies.


CEREMONIES_THAT_SURVIVE

Not everything from agile is discarded. These concepts translate directly:

DEFINITION_OF_DONE

Survives as WP acceptance criteria + anti-LLM pipeline gates. Machine-verifiable, not negotiable.

AGENTIC DEFINITION OF DONE:
[ ] All acceptance criteria from WP met (verified by agent self-check)
[ ] Koen lint check passes (zero errors)
[ ] Antje TDD specs pass (all green)
[ ] Marije integration test passes (if applicable)
[ ] Jasper SSOT reconciliation passes (code matches spec)
[ ] No new TypeScript errors introduced
[ ] No console.log or debug statements
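Because every item is machine-verifiable, "done" can be computed rather than declared. A sketch; the gate names mirror the checklist, and the boolean inputs would come from the pipeline:

```python
DOD_GATES = ["acceptance_criteria", "koen_lint", "antje_tdd",
             "marije_integration", "jasper_reconciliation",
             "no_new_ts_errors", "no_debug_statements"]

def is_done(results):
    """results: gate name → bool. Returns (done, failed_gates) for the PM."""
    failed = [g for g in DOD_GATES if not results.get(g, False)]
    return (not failed, failed)
```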

BACKLOG_REFINEMENT

Survives as PM reviewing upcoming WPs before they enter READY state. The PM checks:
- Is the WP still correctly sized given what we learned from completed WPs?
- Are the dependencies still accurate?
- Are the acceptance criteria still aligned with the spec?

FREQUENCY: before each parallel group starts. Not a ceremony — a PM checklist.

DEMO / SHOWCASE

Survives as UAT (Phase 9). The client sees working software. But it happens once per project (at UAT), not every sprint. For larger projects with multiple features, the PM may arrange intermediate demos at natural milestones.


HANDLING_UNCERTAINTY

IN_HUMAN_AGILE

Uncertainty is managed through short iterations. Build a little, learn, adjust. Embrace change.

IN_AGENTIC_DEVELOPMENT

Uncertainty is front-loaded into phases 1-4 (intake → scoping → specification → design). By the time development starts, uncertainty should be near zero.

STRATEGY: invest more time in specification. A 2-hour spec session that eliminates ambiguity saves 10 hours of rework during development. Agents cannot "ask the PO" mid-sprint — they either have the information in the WP or they fail.

RESIDUAL_UNCERTAINTY_HANDLING

When an agent encounters something not covered by the spec:
1. Agent marks WP as BLOCKED with a specific question
2. Orchestrator pauses dependent WPs
3. PM routes the question: spec gap → Anna, scope gap → Aimee, client question → Dima
4. Answer comes back, PM unblocks WP with additional context
5. Agent resumes

COST_OF_BLOCKING: ~30 minutes average resolution time. This is why specs must be thorough.
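The routing step in that flow is a fixed escalation map. A sketch; the blocker-kind keys are assumptions about how the agent labels its question:

```python
ESCALATION_ROUTES = {
    "spec_gap": "anna",          # missing detail in the technical spec
    "scope_gap": "aimee",        # functional spec does not cover the case
    "client_question": "dima",   # needs client input
}

def route_blocker(kind):
    """Return who resolves a blocked WP's question, per the escalation map."""
    if kind not in ESCALATION_ROUTES:
        raise ValueError(f"unknown blocker kind: {kind!r} — escalate to PM manually")
    return ESCALATION_ROUTES[kind]
```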


METRICS_DASHBOARD

The PM monitors these metrics continuously via admin-ui:

THROUGHPUT:
- WPs completed per hour (team-wide)
- WPs completed per hour (per agent)
- Feature completion rate (features fully done / total features)

QUALITY:
- First-attempt success rate (WPs that pass review on first try)
- Rework rate (WPs that need revision after review)
- Bug rate in UAT (bugs found per feature)

COST:
- Cost per WP (average, by size category)
- Cost per feature (sum of WP costs)
- Cost vs estimate (actual / estimated)
- Budget burn rate (cumulative cost over time)

FLOW:
- Lead time (backlog → done)
- Cycle time (in progress → done)
- Blocked time (total hours WPs spent blocked)
- Queue depth (WPs in READY state)

ANTI_PATTERNS_IN_AGENTIC_AGILE

ANTI_PATTERN_1: "Let's do sprints anyway" WHY_BAD: creates artificial waiting. Agents complete WP-005 on Monday, but next sprint starts Wednesday. 2 days wasted. INSTEAD: continuous flow. Dispatch next WP immediately.

ANTI_PATTERN_2: "Agents should self-organize" WHY_BAD: agents have no social layer. Self-organization requires communication, negotiation, and shared understanding. Agents execute instructions. INSTEAD: PM pre-plans all assignment and sequencing. Orchestrator enforces.

ANTI_PATTERN_3: "Let's estimate in story points for the client" WHY_BAD: story points are a human abstraction. Clients want time and cost. Agents provide deterministic estimates. INSTEAD: estimate in hours and dollars. Report both to client.

ANTI_PATTERN_4: "Daily standup meeting with agents" WHY_BAD: costs $25-50/day in tokens. Produces no information that status streams don't already provide. INSTEAD: PM reads dashboard. Status streams are the standup.

ANTI_PATTERN_5: "Retrospective after every project" WHY_BAD: learning extraction happens continuously. A retrospective ceremony adds no value over what the knowledge pipeline already produces. INSTEAD: PM reviews knowledge_patterns weekly. Promotes high-confidence learnings to wiki.

ANTI_PATTERN_6: "Pair programming between agents" WHY_BAD: two agents working on the same WP doubles the cost with no quality benefit. Agents do not learn from each other in real-time. INSTEAD: the anti-LLM pipeline provides the "second pair of eyes" at a fraction of the cost.