DOMAIN:PROJECT_MANAGEMENT:AGILE_AGENTIC¶
OWNER: faye (Team Alpha PM), sytske (Team Beta PM) UPDATED: 2026-03-24 SCOPE: adapting agile/scrum methodology for multi-agent development PREREQUISITES: work-package-design.md, delivery-lifecycle.md
PRINCIPLE: AGILE_FOR_AGENTS_IS_NOT_AGILE_FOR_HUMANS¶
RULE: agentic development borrows agile values but reinvents the practices. RULE: ceremonies that exist for human communication are replaced by data streams. RULE: estimation changes fundamentally — agents do not have "good days" and "bad days." REASON: agile was designed for teams of 5-9 humans with meetings, whiteboards, and social dynamics. Agents have none of these. Blindly applying scrum to agents wastes tokens on ceremony and misses the actual leverage points.
TRANSLATION_MAP¶
| Human Agile Concept | Agentic Equivalent | Why Different |
|--------------------------|-----------------------------------|-----------------------------------|
| Sprint (2 weeks) | Continuous flow with gates | Agents don't need rhythm |
| Daily standup | Redis status streams | Real-time, not once-a-day |
| Sprint planning | WP set creation (one-time) | No recurring planning needed |
| Sprint review | Gate verification | Automated, not a meeting |
| Retrospective | Learning extraction pipeline | Continuous, not per-sprint |
| Story points | WP size (S/M/L) + token estimate | More precise, less political |
| Product backlog | Functional spec (frozen) | Scope is frozen post-gate |
| Sprint backlog | Active WP set | No sprints, just the WP queue |
| Scrum master | ge-orchestrator | Software, not a person |
| Product owner | Aimee + client | Spec is the authority |
| Definition of done | WP acceptance criteria + tests | Machine-verifiable |
| Burndown chart | WP completion rate over time | Linear, not S-curve |
| Velocity | WPs per agent-hour | Deterministic, not estimated |
| Technical debt | Learning gaps in wiki brain | Knowledge debt, not code debt |
| Code review | Anti-LLM pipeline | Multi-agent automated review |
CONTINUOUS_FLOW_VS_SPRINTS¶
WHY_NOT_SPRINTS¶
ARGUMENT_1: sprints create artificial batching. Agents can start the next WP the moment the current one completes. Waiting for a sprint boundary wastes hours. ARGUMENT_2: sprints create artificial deadlines. A 2-week sprint ending on Friday creates pressure to cut corners. Agents do not respond to deadline pressure — they either complete the WP or fail. ARGUMENT_3: sprint planning is overhead. In agentic dev, the WP set is planned once during Phase 6 and adjusted only on replanning triggers. There is no recurring ceremony. ARGUMENT_4: sprint ceremonies consume tokens. A "standup" implemented as an agent conversation costs $2-5 per agent per day. For 10 agents, that is $20-50/day on status updates. Status streams cost $0.
CONTINUOUS_FLOW_MODEL¶
┌─────────────────────────────────────────────┐
│ ORCHESTRATOR │
│ (dispatches WPs based on DAG + capacity) │
└─────────┬───────────────┬──────────────────┘
│ │
┌─────────▼───┐ ┌──────▼────────┐
│ Agent A │ │ Agent B │
│ WP-005 │ │ WP-007 │
│ (working) │ │ (working) │
└─────────┬───┘ └──────┬────────┘
│ │
┌─────────▼───┐ ┌──────▼────────┐
│ COMPLETE │ │ COMPLETE │
│ → next WP │ │ → next WP │
│ dispatched │ │ dispatched │
│ immediately│ │ immediately │
└─────────────┘ └───────────────┘
RULE: when agent finishes WP, orchestrator immediately dispatches the next eligible WP. RULE: "eligible" means all hard dependencies resolved and agent is in the correct swimming lane. RULE: no batching, no waiting, no ceremonies between WPs. RULE: the only pauses are gates between phases (spec → design → planning → dev → integration).
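The dispatch rules above can be sketched in a few lines. This is a minimal illustration, not the ge-orchestrator's actual data model: the `WP` shape, status names, and `next_eligible` helper are assumptions for the example.

```python
# Hypothetical sketch of the eligibility rule: a WP is dispatchable when all
# hard dependencies are DONE and it sits in the requesting agent's lane.
from dataclasses import dataclass, field

@dataclass
class WP:
    wp_id: str
    lane: str                            # swimming lane, e.g. "backend"
    hard_deps: set = field(default_factory=set)
    status: str = "BACKLOG"              # BACKLOG | IN_PROGRESS | DONE

def next_eligible(wps, agent_lane):
    """Return the first BACKLOG WP whose hard deps are all DONE and whose lane matches."""
    done = {w.wp_id for w in wps if w.status == "DONE"}
    for w in wps:
        if (w.status == "BACKLOG"
                and w.lane == agent_lane
                and w.hard_deps <= done):
            return w
    return None  # nothing eligible: the agent idles until a dependency resolves

wps = [
    WP("WP-005", "backend", status="DONE"),
    WP("WP-008", "frontend", hard_deps={"WP-005"}),
    WP("WP-009", "backend", hard_deps={"WP-006"}),
]
print(next_eligible(wps, "frontend").wp_id)  # WP-008: its hard dep is DONE
```

Note that the backend agent gets nothing here: WP-009's hard dep (WP-006) is unresolved, so the flow pauses exactly as PAUSE_2 below describes.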
WHEN_FLOW_PAUSES¶
PAUSE_1: phase gates — development cannot start until planning gate passes. PAUSE_2: dependency waiting — WP-008 cannot start until WP-005 (its hard dep) completes. PAUSE_3: replanning — PM pauses flow to restructure WPs after a trigger fires. PAUSE_4: client input needed — UAT feedback, scope clarification, design approval.
STANDUPS_VS_STATUS_STREAMS¶
HUMAN_STANDUP_PROBLEMS_FOR_AGENTS¶
PROBLEM_1: synchronous. All agents must pause work to "attend." Token waste. PROBLEM_2: periodic. Information is stale by the time the standup happens. PROBLEM_3: verbal. Agents communicate via structured data, not conversation.
STATUS_STREAM_MODEL¶
Every agent publishes to a Redis stream on WP completion:
{
"agent": "urszula",
"work_item_id": "WP-005",
"status": "COMPLETE",
"duration_seconds": 1847,
"token_cost_usd": 1.23,
"files_modified": ["lib/api/routes/orders.ts", "drizzle/migrations/003_orders.sql"],
"acceptance_criteria_met": ["REQ-006-AC1", "REQ-006-AC2", "REQ-006-AC3"],
"notes": "Added index on orders.client_id for query performance",
"timestamp": "2026-03-24T14:32:00Z"
}
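Publishing this event is a single stream append. The sketch below shows the encoding step only; the stream name `ge:status` is an assumption, and with redis-py the final call would be `redis.Redis().xadd("ge:status", fields)`.

```python
# Redis stream fields must be flat strings, so nested values (lists, numbers)
# are JSON-encoded before the XADD. Stream name "ge:status" is hypothetical.
import json

def encode_event(event: dict) -> dict:
    return {k: json.dumps(v) for k, v in event.items()}

event = {
    "agent": "urszula",
    "work_item_id": "WP-005",
    "status": "COMPLETE",
    "duration_seconds": 1847,
    "token_cost_usd": 1.23,
    "acceptance_criteria_met": ["REQ-006-AC1", "REQ-006-AC2", "REQ-006-AC3"],
    "timestamp": "2026-03-24T14:32:00Z",
}
fields = encode_event(event)
# With redis-py: redis.Redis().xadd("ge:status", fields)
print(fields["work_item_id"])  # '"WP-005"' (JSON-encoded string)
```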
PM DASHBOARD aggregates these streams into a real-time view:
- Which agents are working on which WPs right now
- Which WPs completed in the last hour
- Which WPs are blocked and why
- Critical path progress
- Cost accumulation
ADVANTAGE: information is real-time, structured, queryable, and costs zero tokens beyond the completion event.
PM_DAILY_DIGEST¶
Instead of a standup meeting, the PM generates a daily digest from the status stream:
## Project Status: {project_code} — {date} {time}
### Completed Since Last Digest
- WP-005: Orders API (urszula) — 31 min, $1.23
- WP-007: Login page (floris) — 22 min, $0.87
### In Progress
- WP-006: Invoice API (urszula) — started 14:35, running
### Blocked
- WP-008: Orders UI — waiting on WP-005 (now unblocked, dispatching)
### Metrics
- WPs completed today: 2
- Agent utilization: 78%
- Budget consumed: $12.40 / $85.00 (14.6%)
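The "Completed" section of this digest is a straight fold over the day's completion events. A minimal sketch, assuming a `title` field that the stream example above does not carry:

```python
# Hypothetical event records; "title" is an assumed extra field for display.
events = [
    {"work_item_id": "WP-005", "title": "Orders API", "agent": "urszula",
     "duration_seconds": 1847, "token_cost_usd": 1.23},
    {"work_item_id": "WP-007", "title": "Login page", "agent": "floris",
     "duration_seconds": 1320, "token_cost_usd": 0.87},
]

lines = ["### Completed Since Last Digest"]
for e in events:
    # Round seconds to whole minutes, matching the digest format above.
    lines.append(f"- {e['work_item_id']}: {e['title']} ({e['agent']}) — "
                 f"{round(e['duration_seconds'] / 60)} min, ${e['token_cost_usd']:.2f}")
print("\n".join(lines))
```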
RETROSPECTIVES_VS_LEARNING_EXTRACTION¶
HUMAN_RETRO_FORMAT¶
In human agile: team gathers, discusses what went well, what went badly, creates action items. Happens once per sprint. Often stale. Action items frequently forgotten.
AGENTIC_LEARNING_EXTRACTION¶
In GE: learning extraction happens automatically after every agent session.
LAYER 1: session_summarizer.py
- Runs inline at session end
- Extracts structured learnings from session transcript
- Costs ~$0.01-0.03 per session (Haiku)
- Stores in session_learnings table
LAYER 2: knowledge_synthesizer.py
- Runs every 6 hours
- Cross-session pattern detection
- Identifies recurring issues, effective solutions, emerging best practices
- Stores in knowledge_patterns table with confidence scores
LAYER 3: Wiki brain
- High-confidence patterns are promoted to wiki pages
- JIT-injected into agent boot context
- Agents benefit from learnings of ALL other agents
RETRO_EQUIVALENT_FOR_PMS¶
The PM does not need a ceremony. Instead:
DAILY: review knowledge_patterns flagged in the last 24 hours. WEEKLY: review patterns with confidence > 0.8 that are not yet in wiki. PER_PROJECT: after handoff, review all session_learnings for that project. Identify systematic issues.
ACTION_ITEMS are wiki updates, not meeting notes. Once a learning is in the wiki, it is automatically injected into future agent sessions. No "action item tracking" needed.
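The WEEKLY review above is effectively one filter over knowledge_patterns. A sketch under an assumed record shape (the real table's columns may differ):

```python
# Hypothetical pattern records: select high-confidence patterns (> 0.8)
# that have not yet been promoted to the wiki, per the WEEKLY rule above.
patterns = [
    {"id": "P-101", "confidence": 0.92, "in_wiki": False},
    {"id": "P-102", "confidence": 0.71, "in_wiki": False},  # below threshold
    {"id": "P-103", "confidence": 0.95, "in_wiki": True},   # already promoted
]
to_promote = [p["id"] for p in patterns
              if p["confidence"] > 0.8 and not p["in_wiki"]]
print(to_promote)  # ['P-101']
```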
ESTIMATION_IN_AGENTIC_CONTEXT¶
WHY_STORY_POINTS_FAIL¶
STORY_POINTS_PROBLEM_1: story points measure relative human effort. Agent effort is determined by task complexity and context window usage, not human intuition. STORY_POINTS_PROBLEM_2: planning poker creates consensus through discussion. Agents do not discuss estimates — they execute. STORY_POINTS_PROBLEM_3: velocity in story points is unstable for humans (sick days, motivation, learning curves). Agent velocity is stable within narrow bounds.
AGENTIC_ESTIMATION_MODEL¶
DIMENSION_1: WP size category
- S (Small): 15-20 min, 1-3 files, straightforward implementation
- M (Medium): 20-35 min, 4-7 files, moderate complexity
- L (Large): 35-45 min, 8+ files, significant complexity
DIMENSION_2: token cost estimate
- S: $0.30-0.80
- M: $0.80-2.00
- L: $2.00-4.50
DIMENSION_3: session probability
- Single session (expected): 90% of WPs
- Multi-session (needs context rebuild): 8% of WPs
- Needs PM intervention (blocker, spec gap): 2% of WPs
ESTIMATION_TECHNIQUE¶
STEP_1: count files to create or modify → determines size category. STEP_2: check if similar WPs have been done before → use historical data from session_learnings. STEP_3: multiply size category by standard rate → gives time and cost estimate. STEP_4: add 20% buffer for first-of-kind WPs (no historical data). STEP_5: critical path WPs get 30% buffer (blocking other work if they slip).
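The five steps combine into one small function. This is a sketch only: the per-size rates are midpoints of the S/M/L bands above, not official figures, and the historical-data lookup (STEP_2) is omitted.

```python
# Midpoints of the size bands above (minutes, USD); an assumption, not a
# canonical rate card.
RATES = {"S": (17.5, 0.55), "M": (27.5, 1.40), "L": (40.0, 3.25)}

def size_for(file_count: int) -> str:
    """STEP_1: file count determines the size category."""
    if file_count <= 3:
        return "S"
    if file_count <= 7:
        return "M"
    return "L"

def estimate(file_count, first_of_kind=False, on_critical_path=False):
    """STEP_3-5: size rate, plus 20% first-of-kind and 30% critical-path buffers."""
    minutes, cost = RATES[size_for(file_count)]
    buffer = 1.0
    if first_of_kind:
        buffer += 0.20   # STEP_4: no historical data to anchor on
    if on_critical_path:
        buffer += 0.30   # STEP_5: slippage here blocks other work
    return minutes * buffer, cost * buffer

m, c = estimate(5, first_of_kind=True)
print(f"{m:.1f} min, ${c:.2f}")  # 33.0 min, $1.68
```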
PROJECT_ESTIMATION¶
Total estimated time = sum of WP estimates on critical path
+ parallel group overhead (10% per parallel group)
+ gate processing time (1 hour per gate for phases 3-6)
+ integration testing (20% of development time)
+ UAT buffer (3 days minimum for client response time)
Total estimated cost = sum of all WP cost estimates
+ overhead agents (linting, testing, reconciliation: 30% of dev cost)
+ scoping and spec cost ($5-15 for Opus sessions)
+ integration testing cost (20% of development cost)
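The two roll-up formulas can be sketched as one function. Assumptions: critical-path time is used as the base for both the 10% parallel-group overhead and the 20% integration-testing term, and the UAT buffer is taken as a flat 72 hours.

```python
def estimate_project(cp_minutes, parallel_groups, dev_cost_usd,
                     spec_cost_usd=10.0, gates=4, uat_buffer_h=72):
    """Roll up the time and cost formulas above (bases are assumptions)."""
    base_h = sum(cp_minutes) / 60
    time_h = base_h * (1 + 0.10 * parallel_groups)  # parallel group overhead
    time_h += gates * 1.0                           # 1 hour per gate (phases 3-6)
    time_h += base_h * 0.20                         # integration testing: 20% of dev time
    time_h += uat_buffer_h                          # UAT buffer (3 days minimum)

    cost = dev_cost_usd * (1 + 0.30)                # overhead agents: 30% of dev cost
    cost += spec_cost_usd                           # scoping/spec ($5-15 Opus sessions)
    cost += dev_cost_usd * 0.20                     # integration testing cost
    return time_h, cost

t, c = estimate_project([30, 30, 60], parallel_groups=2, dev_cost_usd=20.0)
print(f"{t:.1f} h, ${c:.2f}")
```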
KANBAN_FOR_AGENT_WORK_QUEUES¶
WHY_KANBAN_FITS_AGENTIC_WORK¶
FIT_1: kanban is pull-based. Agents pull work when ready. No sprint commitment. FIT_2: kanban has WIP limits. Agents have hard WIP limit of 1 (one WP at a time). FIT_3: kanban visualizes flow. Status streams feed a real-time board. FIT_4: kanban optimizes throughput. Critical path optimization = throughput optimization.
AGENTIC_KANBAN_BOARD¶
| BACKLOG | READY | IN PROGRESS | REVIEW | DONE |
|--------------|-------------|--------------|--------------|-------------|
| WP-015 | WP-010 | WP-006 (urs) | WP-005 (koen)| WP-001 |
| WP-016 | WP-011 | WP-009 (flo) | | WP-002 |
| WP-017 | WP-012 | | | WP-003 |
| WP-018 | | | | WP-004 |
| | | | | WP-007 |
| | | | | WP-008 |
COLUMN_DEFINITIONS:
- BACKLOG: WP defined but dependencies not yet resolved
- READY: all dependencies resolved, waiting for agent availability
- IN_PROGRESS: agent is executing this WP right now
- REVIEW: WP complete, going through anti-LLM pipeline (lint → test → reconcile)
- DONE: fully verified, merged
WIP_LIMITS¶
| Column | WIP Limit | Reason |
|--------------|-----------|-------------------------------------------|
| BACKLOG | unlimited | all future work lives here |
| READY | 10 | too many ready = agents can't keep up |
| IN_PROGRESS | 5 | max parallel per feature branch |
| REVIEW | 3 | pipeline is sequential, don't queue too many |
| DONE | unlimited | completed work stays forever |
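Enforcing the table above is a one-line check before each column move. The board representation and `can_move` helper are hypothetical; the limits are the ones in the table.

```python
# WIP limits from the table above; None means unlimited.
WIP_LIMITS = {"BACKLOG": None, "READY": 10, "IN_PROGRESS": 5,
              "REVIEW": 3, "DONE": None}

def can_move(board: dict, column: str) -> bool:
    """board maps column name -> list of WP ids in that column."""
    limit = WIP_LIMITS[column]
    return limit is None or len(board[column]) < limit

board = {
    "BACKLOG": ["WP-015", "WP-016"],
    "READY": ["WP-010"],
    "IN_PROGRESS": ["WP-006", "WP-009", "WP-011", "WP-012", "WP-013"],
    "REVIEW": [],
    "DONE": ["WP-001"],
}
print(can_move(board, "IN_PROGRESS"))  # False: already at the limit of 5
print(can_move(board, "READY"))        # True
```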
FLOW_METRICS¶
METRIC_1: lead time — time from WP entering BACKLOG to DONE. Target: < 4 hours for M-sized WPs. METRIC_2: cycle time — time from IN_PROGRESS to DONE. Target: < 90 minutes including review. METRIC_3: throughput — WPs reaching DONE per hour. Target: 2-3 per hour with full team. METRIC_4: WIP age — how long a WP has been IN_PROGRESS. Alert if > 60 minutes. METRIC_5: blocked time — how long a WP sits in BACKLOG waiting on deps. Track to optimize DAG.
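METRIC_1 and METRIC_2 fall out of the column-transition timestamps already in the status stream. A sketch with a hypothetical per-WP transition log:

```python
from datetime import datetime

def hours_between(a: str, b: str) -> float:
    """Difference in hours between two ISO-8601 UTC timestamps."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (datetime.strptime(b, fmt) - datetime.strptime(a, fmt)).total_seconds() / 3600

transitions = {  # hypothetical transition log for one M-sized WP
    "BACKLOG": "2026-03-24T09:00:00Z",
    "IN_PROGRESS": "2026-03-24T11:15:00Z",
    "DONE": "2026-03-24T12:30:00Z",
}
lead_time = hours_between(transitions["BACKLOG"], transitions["DONE"])       # 3.5 h
cycle_time = hours_between(transitions["IN_PROGRESS"], transitions["DONE"])  # 1.25 h
print(lead_time < 4 and cycle_time < 1.5)  # True: within the M-sized targets
```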
ROLES_REDEFINED¶
PRODUCT_OWNER → AIMEE + CLIENT¶
Human agile: PO prioritizes backlog, writes stories, accepts deliverables. Agentic: Aimee writes the functional spec (stories equivalent). Client approves at gates. PM tracks but does not reprioritize — the spec defines priority.
DIFFERENCE: in human agile, PO constantly reprioritizes. In agentic, scope is frozen after Phase 3 gate. Reprioritization only happens through formal change requests.
SCRUM_MASTER → ORCHESTRATOR¶
Human agile: SM removes impediments, facilitates ceremonies, coaches team. Agentic: ge-orchestrator dispatches work, enforces WIP limits, handles blocked WPs. PM handles escalations that the orchestrator cannot resolve (spec gaps, client communication).
DIFFERENCE: the orchestrator is software. It cannot coach, motivate, or read the room. The PM fills this gap by monitoring for systemic issues (repeated failures, cost overruns, blocked agents).
DEVELOPMENT_TEAM → AGENT_POOL¶
Human agile: self-organizing team picks work, collaborates, pair-programs. Agentic: agents are assigned WPs based on swimming lane. They do not self-organize. They do not pair. They execute independently within their lane.
DIFFERENCE: agents have no social dynamics, no watercooler, no motivation issues. But they also cannot improvise or collaborate in real-time. The PM must pre-plan collaboration via WP dependencies.
CEREMONIES_THAT_SURVIVE¶
Not everything from agile is discarded. These concepts translate directly:
DEFINITION_OF_DONE¶
Survives as WP acceptance criteria + anti-LLM pipeline gates. Machine-verifiable, not negotiable.
AGENTIC DEFINITION OF DONE:
[ ] All acceptance criteria from WP met (verified by agent self-check)
[ ] Koen lint check passes (zero errors)
[ ] Antje TDD specs pass (all green)
[ ] Marije integration test passes (if applicable)
[ ] Jasper SSOT reconciliation passes (code matches spec)
[ ] No new TypeScript errors introduced
[ ] No console.log or debug statements
BACKLOG_REFINEMENT¶
Survives as PM reviewing upcoming WPs before they enter READY state. The PM checks:
- Is the WP still correctly sized given what we learned from completed WPs?
- Are the dependencies still accurate?
- Are the acceptance criteria still aligned with the spec?
FREQUENCY: before each parallel group starts. Not a ceremony — a PM checklist.
DEMO / SHOWCASE¶
Survives as UAT (Phase 9). The client sees working software. But it happens once per project (at UAT), not every sprint. For larger projects with multiple features, the PM may arrange intermediate demos at natural milestones.
HANDLING_UNCERTAINTY¶
IN_HUMAN_AGILE¶
Uncertainty is managed through short iterations. Build a little, learn, adjust. Embrace change.
IN_AGENTIC_DEVELOPMENT¶
Uncertainty is front-loaded into phases 1-4 (intake → scoping → specification → design). By the time development starts, uncertainty should be near zero.
STRATEGY: invest more time in specification. A 2-hour spec session that eliminates ambiguity saves 10 hours of rework during development. Agents cannot "ask the PO" mid-sprint — they either have the information in the WP or they fail.
RESIDUAL_UNCERTAINTY_HANDLING¶
When an agent encounters something not covered by the spec: 1. Agent marks WP as BLOCKED with a specific question 2. Orchestrator pauses dependent WPs 3. PM routes the question: spec gap → Anna, scope gap → Aimee, client question → Dima 4. Answer comes back, PM unblocks WP with additional context 5. Agent resumes
COST_OF_BLOCKING: ~30 minutes average resolution time. This is why specs must be thorough.
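Step 3's routing rule is a plain lookup table. The gap categories and owners are from the steps above; the fallback to the PM for unclassified blockers is an assumption.

```python
# Routing table from step 3: spec gap -> Anna, scope gap -> Aimee,
# client question -> Dima. PM fallback is an assumption for the sketch.
ROUTES = {"spec_gap": "Anna", "scope_gap": "Aimee", "client_question": "Dima"}

def route_blocker(kind: str) -> str:
    return ROUTES.get(kind, "PM")

print(route_blocker("spec_gap"))  # Anna
```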
METRICS_DASHBOARD¶
The PM monitors these metrics continuously via admin-ui:
THROUGHPUT:
- WPs completed per hour (team-wide)
- WPs completed per hour (per agent)
- Feature completion rate (features fully done / total features)
QUALITY:
- First-attempt success rate (WPs that pass review on first try)
- Rework rate (WPs that need revision after review)
- Bug rate in UAT (bugs found per feature)
COST:
- Cost per WP (average, by size category)
- Cost per feature (sum of WP costs)
- Cost vs estimate (actual / estimated)
- Budget burn rate (cumulative cost over time)
FLOW:
- Lead time (backlog → done)
- Cycle time (in progress → done)
- Blocked time (total hours WPs spent blocked)
- Queue depth (WPs in READY state)
ANTI_PATTERNS_IN_AGENTIC_AGILE¶
ANTI_PATTERN_1: "Let's do sprints anyway" WHY_BAD: creates artificial waiting. Agents complete WP-005 on Monday, but next sprint starts Wednesday. 2 days wasted. INSTEAD: continuous flow. Dispatch next WP immediately.
ANTI_PATTERN_2: "Agents should self-organize" WHY_BAD: agents have no social layer. Self-organization requires communication, negotiation, and shared understanding. Agents execute instructions. INSTEAD: PM pre-plans all assignment and sequencing. Orchestrator enforces.
ANTI_PATTERN_3: "Let's estimate in story points for the client" WHY_BAD: story points are a human abstraction. Clients want time and cost. Agents provide deterministic estimates. INSTEAD: estimate in hours and dollars. Report both to client.
ANTI_PATTERN_4: "Daily standup meeting with agents" WHY_BAD: costs $20-50/day in tokens for a 10-agent team. Produces no information that status streams don't already provide. INSTEAD: PM reads dashboard. Status streams are the standup.
ANTI_PATTERN_5: "Retrospective after every project" WHY_BAD: learning extraction happens continuously. A retrospective ceremony adds no value over what the knowledge pipeline already produces. INSTEAD: PM reviews knowledge_patterns weekly. Promotes high-confidence learnings to wiki.
ANTI_PATTERN_6: "Pair programming between agents" WHY_BAD: two agents working on the same WP doubles the cost with no quality benefit. Agents do not learn from each other in real-time. INSTEAD: the anti-LLM pipeline provides the "second pair of eyes" at a fraction of the cost.