GE TDD Workflow¶
Pipeline Overview¶
GE's TDD pipeline is a six-stage process with explicit handoffs between specialized agents. Each stage has a single owner, defined inputs, defined outputs, and clear escalation paths.
Stage 1: SPECIFY → Anna (Formal Specification)
Stage 2: TEST → Antje (Test Generation)
Stage 3: IMPLEMENT → Developer Agents (Code to Pass Tests)
Stage 4: GATE → Koen (Deterministic Quality Checks)
Stage 5: INTEGRATE → Marije + Judith (Integration/E2E)
Stage 6: RECONCILE → Jasper (TDD vs Post-Impl Gap Analysis)
Stage 1: SPECIFY — Anna Produces Formal Specification¶
OWNER: Anna
INPUT: Aimee's scope document (functional specification)
OUTPUT: Formal YAML specification with invariants, edge cases, pre/post-conditions
TRIGGER: functional.spec.stored via Redis Stream
What Anna Does¶
- Reads Aimee's functional specification
- Extracts behavioral requirements
- Identifies every edge case
- Defines at least 3 invariants per function/feature
- Maps pre-conditions and post-conditions
- Verifies constitution compliance
- Publishes formal spec to formal.spec.created
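The canonical spec schema is defined in the Formal Specification Workflow; as an illustration only, a fragment of what Anna might publish (all field names and values here are hypothetical, not the canonical schema) could look like:

```yaml
# Hypothetical formal spec fragment — field names are illustrative,
# not the canonical schema.
spec_id: withdraw-funds-001
source_event: functional.spec.stored   # Aimee's scope document trigger
function: withdraw
pre_conditions:
  - amount > 0
  - amount <= account.balance
post_conditions:
  - account.balance == old(account.balance) - amount
invariants:                            # at least 3 per function/feature
  - account.balance >= 0
  - transaction log grows by exactly 1
  - account currency is unchanged
edge_cases:
  - amount == account.balance          # drains the account to exactly 0
  - smallest representable amount
```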
Decision Tree: Anna¶
CHECK: Is the functional spec unambiguous?
IF: Yes
THEN: Proceed with formalization
IF: No
THEN: Escalate to Aimee with specific ambiguous elements
IF: Aimee cannot resolve
THEN: Aimee escalates to client via Faye/Sytske
THEN: Anna WAITS. Does not guess.
CHECK: Does the feature require multiple formal specs?
IF: Spec exceeds 15,000 tokens
THEN: Split into one spec per function/feature
THEN: Maintain lineage tracking between split specs
IF: Spec is within budget
THEN: Publish as single spec
ANTI_PATTERN: Anna guessing at ambiguous requirements. FIX: Every ambiguity MUST be resolved through the escalation chain, never assumed.
Handoff to Stage 2¶
Anna publishes to formal.spec.created. Annegreet stores the spec and notifies downstream consumers. Antje receives the notification and begins test generation.
RULE: The handoff is AUTOMATIC. Anna does not coordinate with Antje directly.
RULE: Antje must NOT begin test generation until the formal spec is stored and published.
Stage 2: TEST — Antje Generates Test Suite from Spec¶
OWNER: Antje
INPUT: Anna's formal specification (YAML)
OUTPUT: Complete test suite (unit tests + integration contracts)
TRIGGER: formal.spec.created or formal.spec.updated via Redis Stream
What Antje Does¶
- Reads Anna's formal specification
- Maps each invariant to one or more test assertions
- Maps each edge case to a dedicated test
- Maps each pre-condition to a guard test (verify rejection on violation)
- Maps each post-condition to a verification test
- Generates integration contract tests for cross-boundary behavior
- Ensures every test FAILS (no implementation exists yet)
- Publishes test suite
Mapping Rules¶
| Spec Element | Test Type | Example |
|---|---|---|
| Invariant | Property assertion | `expect(account.balance).toBeGreaterThanOrEqual(0)` |
| Pre-condition | Guard/rejection test | `expect(() => withdraw(-1)).toThrow()` |
| Post-condition | State verification | `expect(order.status).toBe('confirmed')` after `confirmOrder()` |
| Edge case | Boundary test | `expect(search('')).toEqual([])` |
| Error condition | Negative test | `expect(login('wrong-pass')).rejects.toThrow('InvalidCredentials')` |
| State transition | Sequence test | `draft → submitted → approved` with assertions at each step |
Decision Tree: Antje¶
CHECK: Can every spec element be mapped to a test?
IF: Yes
THEN: Generate complete test suite
IF: No — spec element is untestable as written
THEN: Escalate to Anna with specific element
THEN: Anna revises spec to make it testable
THEN: Antje regenerates affected tests
CHECK: Does the test require external dependencies?
IF: Test needs a database
THEN: Use test database with known seed data
IF: Test needs an external API
THEN: Use contract test (verify request shape, mock response)
THEN: Flag for Marije/Judith to verify with real API in integration
IF: Test needs Redis
THEN: Use real Redis in test environment (not mocks)
CHECK: Are there tests that pass before implementation?
IF: Yes — test verifies existing functionality
THEN: Verify this is intentional (feature extension, not new feature)
IF: Yes — test is trivially true
THEN: DELETE the test. Rewrite with meaningful assertion.
ANTI_PATTERN: Antje reading existing code to inform test design. FIX: Antje reads ONLY the formal specification. Never the codebase.
Handoff to Stage 3¶
Antje commits the test suite to the repository. The test files are the developer's work order.
RULE: The test suite IS the specification for the developer. No additional instructions needed.
RULE: Developer agents receive the test file paths, not prose descriptions of what to build.
Stage 3: IMPLEMENT — Developers Write Code to Pass Tests¶
OWNER: Team developer agents (assigned by Faye/Sytske)
INPUT: Failing test suite from Antje
OUTPUT: Implementation that passes all tests
TRIGGER: Work package assignment via orchestrator
What Developers Do¶
- Read the failing test suite
- Run tests to confirm they all fail (RED)
- Implement the minimum code to make the first test pass
- Run tests — verify that test passes (GREEN)
- Implement the next test
- Repeat until all tests pass
- Refactor for code quality while keeping tests green
- Commit implementation
Decision Tree: Developer¶
CHECK: Does a test seem incorrect?
IF: Test assertion contradicts another test
THEN: STOP. Escalate to Antje. Do NOT modify the test.
IF: Test seems to test the wrong behavior
THEN: STOP. Escalate to Antje. Do NOT modify the test.
IF: Test requires a design decision not covered in the spec
THEN: STOP. Escalate to Antje → Anna → Aimee chain.
CHECK: All tests pass. Is the implementation complete?
IF: All tests pass and code is clean
THEN: Commit and hand off to Koen
IF: All tests pass but code has obvious gaps
THEN: Flag for Jasper (Stage 6) — possible spec gap
THEN: Still commit and proceed
RULE: Developers MUST NOT write new tests during implementation.
RATIONALE: If a developer discovers an untested edge case, they report it to Antje. Antje decides whether to add a test. This preserves oracle independence.
ANTI_PATTERN: Developer modifying a test to make their implementation pass. FIX: This is a CRITICAL VIOLATION. The test reflects the spec. If the test seems wrong, escalate. Never silently change it.
Handoff to Stage 4¶
Developer commits code. Koen's quality gates trigger automatically.
Stage 4: GATE — Koen Runs Deterministic Quality Checks¶
OWNER: Koen
INPUT: Developer's committed code
OUTPUT: Pass/fail report for each quality dimension
TRIGGER: Commit or PR event
What Koen Checks¶
- Lint — Code style, formatting, import ordering
- TypeCheck — Static type analysis (TypeScript strict, Python mypy)
- Build — Compilation succeeds without warnings
- Dead Code — No unreachable code, no unused exports
- Mutation Testing — Tests are meaningful (mutations cause failures)
- Test Coverage — All spec-derived paths are covered
Decision Tree: Koen¶
CHECK: Do all deterministic gates pass?
IF: Yes
THEN: Approve and forward to Marije/Judith
IF: Lint/TypeCheck/Build fails
THEN: Reject back to developer with specific errors
THEN: Developer fixes and resubmits
IF: Mutation testing reveals surviving mutants
THEN: Escalate to Antje — test suite has gaps
THEN: Antje reviews surviving mutants against spec
THEN: Antje adds tests or confirms spec does not require them
IF: Dead code detected
THEN: Reject back to developer — remove dead code
RULE: Koen's checks are fully deterministic. No LLM judgment. Pass or fail.
RULE: Mutation testing is mandatory for all business logic. UI-only code is exempt.
Handoff to Stage 5¶
Koen's approval triggers integration testing.
Stage 5: INTEGRATE — Marije + Judith Run Integration/E2E Tests¶
OWNER: Marije (integration), Judith (E2E)
INPUT: Koen-approved code
OUTPUT: Integration test results, E2E test results
TRIGGER: Quality gate approval
What Marije Does (Integration)¶
- Tests cross-service communication (API → DB → Redis → response)
- Tests contract compliance between services
- Tests database migration compatibility
- Tests real network paths (not mocks)
- Tests error propagation across boundaries
What Judith Does (E2E)¶
- Tests complete user workflows
- Tests UI rendering with real data
- Tests authentication/authorization flows
- Tests deployment configuration (environment variables, secrets)
- Tests rollback scenarios
Decision Tree: Marije/Judith¶
CHECK: Do integration/E2E tests pass?
IF: Yes
THEN: Feature is deployment-ready. Forward to Jasper for reconciliation.
IF: Integration test fails — contract mismatch
THEN: Escalate to developer — implementation violates integration contract
IF: E2E test fails — workflow broken
THEN: Escalate to developer — feature works in isolation but breaks in context
IF: New integration behavior discovered (not in spec)
THEN: Report to Jasper for Stage 6 analysis
RULE: Integration tests use real services, real databases, real Redis. Never mocks.
RULE: E2E tests run in a staging environment that mirrors production.
Handoff to Stage 6¶
Marije/Judith results feed into Jasper's reconciliation.
Stage 6: RECONCILE — Jasper Finds Gaps Between TDD and Post-Impl¶
OWNER: Jasper
INPUT: Antje's original TDD test suite + Marije/Judith's integration/E2E tests + developer code
OUTPUT: Gap analysis report
TRIGGER: Stage 5 completion
What Jasper Does¶
- Compares Antje's TDD tests against Marije/Judith's integration tests
- Identifies behaviors tested in integration but NOT in unit tests
- Identifies unit tests that pass but fail in integration (environment assumptions)
- Identifies spec elements with no corresponding test at any level
- Identifies tests at any level with no corresponding spec element
- Produces gap report
Decision Tree: Jasper¶
CHECK: Are there behaviors tested in integration but not in TDD tests?
IF: Yes — integration test covers something Antje missed
THEN: Report to Antje — potential spec gap
THEN: Antje traces back to Anna's spec
THEN: If spec is incomplete → Anna revises → Antje adds TDD test
THEN: If spec is complete but Antje missed a mapping → Antje adds test
CHECK: Are there TDD tests that pass in isolation but fail in integration?
IF: Yes — environmental assumption in TDD test
THEN: Report to Antje — test needs to account for real environment
THEN: Developer may need to adjust implementation
CHECK: Are there spec elements with no test at ANY level?
IF: Yes — coverage gap
THEN: CRITICAL — report to Antje and Marije/Judith
THEN: Tests must be added before deployment
CHECK: Are there tests with no corresponding spec element?
IF: Yes — speculative test
THEN: Review whether this reveals a spec gap or is an unnecessary test
THEN: If spec gap → Anna adds to spec → Antje regenerates test
THEN: If unnecessary → remove test
RULE: Jasper does not write tests or modify code. Jasper reports. Others act.
RULE: Zero spec coverage gaps are allowed at deployment time.
When Tests Need Changing¶
Tests are not immutable, but they change through a controlled process:
Trigger: Bug Found in Production¶
- Jasper documents the gap — which spec element was not adequately tested
- Anna reviews whether the spec covered this case
- IF spec covered it: Antje adds missing test. Developer was wrong.
- IF spec did NOT cover it: Anna revises spec. Antje generates new test. Developer implements fix.
Trigger: Requirement Changed by Client¶
- Aimee updates the scope document
- Anna revises the formal specification
- Antje regenerates affected tests
- Developer modifies implementation to pass new tests
- Old tests that no longer apply are removed by Antje (not by developer)
Trigger: Surviving Mutant Found by Koen¶
- Koen reports the surviving mutant (code change that did not break any test)
- Antje reviews the mutant against the spec
- IF the mutation violates a spec requirement: Antje adds a test that kills it
- IF the mutation is semantically equivalent: Antje documents it as acceptable
Timing and SLAs¶
| Stage | Owner | Target Duration | Escalation After |
|---|---|---|---|
| Specify | Anna | 30 minutes | 2 hours |
| Test | Antje | 45 minutes | 3 hours |
| Implement | Developer | 2 hours | 8 hours |
| Gate | Koen | 10 minutes | 30 minutes |
| Integrate | Marije/Judith | 30 minutes | 2 hours |
| Reconcile | Jasper | 20 minutes | 1 hour |
RULE: If any stage exceeds its escalation time, the team PM (Faye/Sytske) is notified.
RULE: Total pipeline time from spec to deployment-ready should be under 4 hours for standard features.
Related Documentation¶
- TDD Philosophy — Why TDD is mandatory in GE
- TDD Patterns — Domain-specific testing patterns
- Agentic TDD — AI-specific TDD considerations
- TDD Pitfalls — Common mistakes and anti-patterns
- Formal Specification Workflow — How specs are produced
- Spec-Driven Testing — How specs feed tests