
GE TDD Workflow

Pipeline Overview

GE's TDD pipeline is a six-stage process with explicit handoffs between specialized agents. Each stage has a single owner, defined inputs, defined outputs, and clear escalation paths.

Stage 1: SPECIFY    → Anna (Formal Specification)
Stage 2: TEST       → Antje (Test Generation)
Stage 3: IMPLEMENT  → Developer Agents (Code to Pass Tests)
Stage 4: GATE       → Koen (Deterministic Quality Checks)
Stage 5: INTEGRATE  → Marije + Judith (Integration/E2E)
Stage 6: RECONCILE  → Jasper (TDD vs Post-Impl Gap Analysis)

Stage 1: SPECIFY — Anna Produces Formal Specification

OWNER: Anna
INPUT: Aimee's scope document (functional specification)
OUTPUT: Formal YAML specification with invariants, edge cases, and pre/post-conditions
TRIGGER: functional.spec.stored via Redis Stream

What Anna Does

  1. Reads Aimee's functional specification
  2. Extracts behavioral requirements
  3. Identifies every edge case
  4. Defines at least 3 invariants per function/feature
  5. Maps pre-conditions and post-conditions
  6. Verifies constitution compliance
  7. Publishes formal spec to formal.spec.created
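Step 4 is mechanical enough to check before publishing. The formal spec's YAML schema is not given here, so this sketch assumes a hypothetical parsed shape with one entry per function; the field and function names are illustrative only.

```typescript
// Hypothetical parsed form of Anna's YAML spec. Field names are illustrative,
// not GE's actual schema.
interface FunctionSpec {
  name: string;
  invariants: string[];
  edgeCases: string[];
  preconditions: string[];
  postconditions: string[];
}

interface FormalSpec {
  feature: string;
  functions: FunctionSpec[];
}

// Step 4: at least 3 invariants per function/feature.
const MIN_INVARIANTS = 3;

// Returns one message per violation; an empty array means the check passed.
function validateFormalSpec(spec: FormalSpec): string[] {
  return spec.functions
    .filter((fn) => fn.invariants.length < MIN_INVARIANTS)
    .map(
      (fn) =>
        `${fn.name}: ${fn.invariants.length} invariant(s), need at least ${MIN_INVARIANTS}`,
    );
}
```

For a `withdraw` function that lists only `balance >= 0`, the check returns `["withdraw: 1 invariant(s), need at least 3"]`, and Anna would keep formalizing rather than publish.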

Decision Tree: Anna

CHECK: Is the functional spec unambiguous?
  IF: Yes
    THEN: Proceed with formalization
  IF: No
    THEN: Escalate to Aimee with specific ambiguous elements
      IF: Aimee cannot resolve
        THEN: Aimee escalates to client via Faye/Sytske
        THEN: Anna WAITS. Does not guess.
CHECK: Does the feature require multiple formal specs?
  IF: Spec exceeds 15,000 tokens
    THEN: Split into one spec per function/feature
    THEN: Maintain lineage tracking between split specs
  IF: Spec is within budget
    THEN: Publish as single spec
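The size check and lineage tracking might be sketched as follows. The 15,000-token threshold comes from the tree above; the 4-characters-per-token estimate, the `SpecDraft`/`PlannedSpec` shapes, and the `<root>.<n>` child ID format are assumptions for illustration.

```typescript
const TOKEN_BUDGET = 15000; // threshold from the decision tree

interface SpecDraft {
  feature: string; // one function/feature
  body: string;    // its serialized YAML
}

interface PlannedSpec {
  id: string;
  features: string[];      // features covered by this published spec
  parentId: string | null; // lineage link to the combined spec it was split from
}

// Rough heuristic (assumption): about 4 characters per token. A real pipeline
// would count with the consuming model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function planPublication(rootId: string, parts: SpecDraft[]): PlannedSpec[] {
  const total = parts.reduce((sum, p) => sum + estimateTokens(p.body), 0);
  if (total <= TOKEN_BUDGET) {
    // Within budget: publish as a single spec.
    return [{ id: rootId, features: parts.map((p) => p.feature), parentId: null }];
  }
  // Over budget: one spec per function/feature, each linked back to rootId.
  return parts.map((p, i) => ({
    id: `${rootId}.${i + 1}`,
    features: [p.feature],
    parentId: rootId,
  }));
}
```

Keeping `parentId` on every split spec is what lets later stages (and Jasper) trace a child spec back to the original feature.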

ANTI_PATTERN: Anna guessing at ambiguous requirements.
FIX: Every ambiguity MUST be resolved through the escalation chain, never assumed.

Handoff to Stage 2

Anna publishes to formal.spec.created. Annegreet stores the spec and notifies downstream consumers. Antje receives the notification and begins test generation.

RULE: The handoff is AUTOMATIC. Anna does not coordinate with Antje directly.
RULE: Antje must NOT begin test generation until the formal spec is stored and published.


Stage 2: TEST — Antje Generates Test Suite from Spec

OWNER: Antje
INPUT: Anna's formal specification (YAML)
OUTPUT: Complete test suite (unit tests + integration contracts)
TRIGGER: formal.spec.created or formal.spec.updated via Redis Stream

What Antje Does

  1. Reads Anna's formal specification
  2. Maps each invariant to one or more test assertions
  3. Maps each edge case to a dedicated test
  4. Maps each pre-condition to a guard test (verify rejection on violation)
  5. Maps each post-condition to a verification test
  6. Generates integration contract tests for cross-boundary behavior
  7. Ensures every test FAILS (no implementation exists yet)
  8. Publishes test suite

Mapping Rules

Spec Element     | Test Type            | Example
Invariant        | Property assertion   | expect(account.balance).toBeGreaterThanOrEqual(0)
Pre-condition    | Guard/rejection test | expect(() => withdraw(-1)).toThrow()
Post-condition   | State verification   | expect(order.status).toBe('confirmed') after confirmOrder()
Edge case        | Boundary test        | expect(search('')).toEqual([])
Error condition  | Negative test        | await expect(login('wrong-pass')).rejects.toThrow('InvalidCredentials')
State transition | Sequence test        | draft → submitted → approved, with assertions at each step
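The mapping table can be read as a function from spec element kind to test type. The sketch below plans one test per element (the text allows several assertions per invariant) and keeps the spec element ID on every planned test so later stages can trace tests back to the spec; the `SpecElement` shape and ID format are hypothetical.

```typescript
type ElementKind =
  | "invariant"
  | "precondition"
  | "postcondition"
  | "edgeCase"
  | "errorCondition"
  | "stateTransition";

// Hypothetical parsed spec element; `id` is an assumed stable identifier.
interface SpecElement {
  id: string;
  kind: ElementKind;
  description: string;
}

interface PlannedTest {
  specElementId: string; // traceability back to Anna's spec
  testType: string;
  name: string;
}

// One row per line of the mapping table.
const TEST_TYPE: Record<ElementKind, string> = {
  invariant: "Property assertion",
  precondition: "Guard/rejection test",
  postcondition: "State verification",
  edgeCase: "Boundary test",
  errorCondition: "Negative test",
  stateTransition: "Sequence test",
};

function planTests(elements: SpecElement[]): PlannedTest[] {
  return elements.map((el) => ({
    specElementId: el.id,
    testType: TEST_TYPE[el.kind],
    name: `[${el.id}] ${el.description}`,
  }));
}
```

Because `TEST_TYPE` is typed as a `Record` over every `ElementKind`, adding a new element kind without a test type fails to compile, which mirrors the "every spec element maps to a test" check in the tree below.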

Decision Tree: Antje

CHECK: Can every spec element be mapped to a test?
  IF: Yes
    THEN: Generate complete test suite
  IF: No — spec element is untestable as written
    THEN: Escalate to Anna with specific element
    THEN: Anna revises spec to make it testable
    THEN: Antje regenerates affected tests
CHECK: Does the test require external dependencies?
  IF: Test needs a database
    THEN: Use test database with known seed data
  IF: Test needs an external API
    THEN: Use contract test (verify request shape, mock response)
    THEN: Flag for Marije/Judith to verify with real API in integration
  IF: Test needs Redis
    THEN: Use real Redis in test environment (not mocks)
CHECK: Are there tests that pass before implementation?
  IF: Yes — test verifies existing functionality
    THEN: Verify this is intentional (feature extension, not new feature)
  IF: Yes — test is trivially true
    THEN: DELETE the test. Rewrite with meaningful assertion.
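Step 7 and the last branch of the tree amount to running the new suite before any implementation exists and sorting the results. A minimal sketch, assuming a hypothetical `TestResult` shape in which Antje marks tests that extend existing functionality; treating every passing test on a new feature as trivially true is a simplification of the tree.

```typescript
// Hypothetical record of one test run against the not-yet-implemented feature.
interface TestResult {
  name: string;
  passed: boolean;
  extendsExistingFeature: boolean; // set by Antje when the spec extends existing behavior
}

type Verdict = "red-ok" | "verify-intentional" | "rewrite";

function classifyPreImplementation(r: TestResult): Verdict {
  if (!r.passed) return "red-ok"; // expected: nothing implemented yet
  // Passing before implementation is acceptable only for existing behavior.
  return r.extendsExistingFeature ? "verify-intentional" : "rewrite";
}

// Lists the tests that need Antje's attention before handoff.
function needsAttention(results: TestResult[]): string[] {
  return results
    .map((r) => ({ name: r.name, verdict: classifyPreImplementation(r) }))
    .filter((x) => x.verdict !== "red-ok")
    .map((x) => `${x.name}: ${x.verdict}`);
}
```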

ANTI_PATTERN: Antje reading existing code to inform test design.
FIX: Antje reads ONLY the formal specification. Never the codebase.

Handoff to Stage 3

Antje commits the test suite to the repository. The test files are the developer's work order.

RULE: The test suite IS the specification for the developer. No additional instructions needed.
RULE: Developer agents receive the test file paths, not prose descriptions of what to build.


Stage 3: IMPLEMENT — Developers Write Code to Pass Tests

OWNER: Team developer agents (assigned by Faye/Sytske)
INPUT: Failing test suite from Antje
OUTPUT: Implementation that passes all tests
TRIGGER: Work package assignment via orchestrator

What Developers Do

  1. Read the failing test suite
  2. Run tests to confirm they all fail (RED)
  3. Implement the minimum code to make the first test pass
  4. Run tests — verify that test passes (GREEN)
  5. Move to the next failing test and implement the code that makes it pass
  6. Repeat until all tests pass
  7. Refactor for code quality while keeping tests green
  8. Commit implementation
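As a worked example, here is a plausible end state after the RED, GREEN, refactor loop for the `withdraw` rows of the mapping table: the pre-condition test (`withdraw(-1)` throws) and the balance invariant. The `Account` class is hypothetical, and treating a zero amount as invalid is an assumption the spec would have to confirm.

```typescript
// Hypothetical Account, written test-first against two rows of the mapping
// table: the guard test and the "balance never negative" invariant.
class Account {
  private balanceValue: number;

  constructor(openingBalance: number) {
    this.balanceValue = openingBalance;
  }

  get balance(): number {
    return this.balanceValue;
  }

  withdraw(amount: number): void {
    // Pre-condition (guard test): reject non-positive and NaN amounts.
    if (!(amount > 0)) throw new RangeError("amount must be positive");
    // Invariant (property assertion): never let the balance go below zero.
    if (amount > this.balanceValue) throw new Error("InsufficientFunds");
    this.balanceValue -= amount;
  }
}
```

With `new Account(100)`, `withdraw(30)` leaves the balance at 70, `withdraw(-1)` throws a `RangeError`, and `withdraw(500)` throws while leaving the balance at 70.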

Decision Tree: Developer

CHECK: Does a test seem incorrect?
  IF: Test assertion contradicts another test
    THEN: STOP. Escalate to Antje. Do NOT modify the test.
  IF: Test seems to test the wrong behavior
    THEN: STOP. Escalate to Antje. Do NOT modify the test.
  IF: Test requires a design decision not covered in the spec
    THEN: STOP. Escalate to Antje → Anna → Aimee chain.
CHECK: All tests pass. Is the implementation complete?
  IF: All tests pass and code is clean
    THEN: Commit and hand off to Koen
  IF: All tests pass but code has obvious gaps
    THEN: Flag for Jasper (Stage 6) — possible spec gap
    THEN: Still commit and proceed
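The developer decision tree is small enough to express as a lookup. The concern labels below are hypothetical names for the branches above; what the types make explicit is that no branch lets the developer modify a test.

```typescript
// Hypothetical labels for the branches of the developer decision tree.
type Concern =
  | "contradictory-assertions"
  | "wrong-behavior-tested"
  | "design-decision-not-in-spec"
  | "tests-pass-but-gaps";

interface Action {
  stopWork: boolean;    // STOP and wait for resolution
  escalateTo: string[]; // escalation chain, in order
  mayModifyTest: false; // never true: developers do not edit tests
}

function routeConcern(c: Concern): Action {
  switch (c) {
    case "contradictory-assertions":
    case "wrong-behavior-tested":
      return { stopWork: true, escalateTo: ["Antje"], mayModifyTest: false };
    case "design-decision-not-in-spec":
      return { stopWork: true, escalateTo: ["Antje", "Anna", "Aimee"], mayModifyTest: false };
    case "tests-pass-but-gaps":
      // Flag a possible spec gap for Jasper (Stage 6), but still commit.
      return { stopWork: false, escalateTo: ["Jasper"], mayModifyTest: false };
  }
}
```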

RULE: Developers MUST NOT write new tests during implementation.
RATIONALE: If a developer discovers an untested edge case, they report it to Antje. Antje decides whether to add a test. This preserves oracle independence.

ANTI_PATTERN: Developer modifying a test to make their implementation pass.
FIX: This is a CRITICAL VIOLATION. The test reflects the spec. If the test seems wrong, escalate. Never silently change it.

Handoff to Stage 4

Developer commits code. Koen's quality gates trigger automatically.


Stage 4: GATE — Koen Runs Deterministic Quality Checks

OWNER: Koen
INPUT: Developer's committed code
OUTPUT: Pass/fail report for each quality dimension
TRIGGER: Commit or PR event

What Koen Checks

  1. Lint — Code style, formatting, import ordering
  2. TypeCheck — Static type analysis (TypeScript strict, Python mypy)
  3. Build — Compilation succeeds without warnings
  4. Dead Code — No unreachable code, no unused exports
  5. Mutation Testing — Tests are meaningful (mutations cause failures)
  6. Test Coverage — All spec-derived paths are covered

Decision Tree: Koen

CHECK: Do all deterministic gates pass?
  IF: Yes
    THEN: Approve and forward to Marije/Judith
  IF: Lint/TypeCheck/Build fails
    THEN: Reject back to developer with specific errors
    THEN: Developer fixes and resubmits
  IF: Mutation testing reveals surviving mutants
    THEN: Escalate to Antje — test suite has gaps
    THEN: Antje reviews surviving mutants against spec
    THEN: Antje adds tests or confirms spec does not require them
  IF: Dead code detected
    THEN: Reject back to developer — remove dead code

RULE: Koen's checks are fully deterministic. No LLM judgment. Pass or fail.
RULE: Mutation testing is mandatory for all business logic. UI-only code is exempt.
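Because every gate is deterministic, the decision tree can be written as a pure function of the gate results. This is a sketch with hypothetical types. Two choices are assumptions not stated above: hard failures (lint, type check, build, dead code) are reported before surviving mutants, and the coverage gate is left out because the tree does not say where coverage failures are routed.

```typescript
// Hypothetical summary of one gate run.
interface GateResults {
  lint: boolean;
  typeCheck: boolean;
  build: boolean;
  deadCodeFree: boolean;
  survivingMutants: number;
  isUiOnly: boolean; // UI-only code is exempt from mutation testing
}

type GateVerdict =
  | { kind: "approve"; next: "Marije/Judith" }
  | { kind: "reject"; to: "developer"; reasons: string[] }
  | { kind: "escalate"; to: "Antje"; survivingMutants: number };

function evaluateGates(r: GateResults): GateVerdict {
  const reasons: string[] = [];
  if (!r.lint) reasons.push("lint");
  if (!r.typeCheck) reasons.push("typecheck");
  if (!r.build) reasons.push("build");
  if (!r.deadCodeFree) reasons.push("dead code");
  if (reasons.length > 0) return { kind: "reject", to: "developer", reasons };

  // Surviving mutants point at test-suite gaps, so they go to Antje.
  if (!r.isUiOnly && r.survivingMutants > 0) {
    return { kind: "escalate", to: "Antje", survivingMutants: r.survivingMutants };
  }
  return { kind: "approve", next: "Marije/Judith" };
}
```

The same inputs always produce the same verdict, which is the property the "no LLM judgment" rule asks for.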

Handoff to Stage 5

Koen's approval triggers integration testing.


Stage 5: INTEGRATE — Marije + Judith Run Integration/E2E Tests

OWNER: Marije (integration), Judith (E2E)
INPUT: Koen-approved code
OUTPUT: Integration test results, E2E test results
TRIGGER: Quality gate approval

What Marije Does (Integration)

  1. Tests cross-service communication (API → DB → Redis → response)
  2. Tests contract compliance between services
  3. Tests database migration compatibility
  4. Tests real network paths (not mocks)
  5. Tests error propagation across boundaries

What Judith Does (E2E)

  1. Tests complete user workflows
  2. Tests UI rendering with real data
  3. Tests authentication/authorization flows
  4. Tests deployment configuration (environment variables, secrets)
  5. Tests rollback scenarios

Decision Tree: Marije/Judith

CHECK: Do integration/E2E tests pass?
  IF: Yes
    THEN: Feature is deployment-ready. Forward to Jasper for reconciliation.
  IF: Integration test fails — contract mismatch
    THEN: Escalate to developer — implementation violates integration contract
  IF: E2E test fails — workflow broken
    THEN: Escalate to developer — feature works in isolation but breaks in context
  IF: New integration behavior discovered (not in spec)
    THEN: Report to Jasper for Stage 6 analysis

RULE: Integration tests use real services, real databases, real Redis. Never mocks.
RULE: E2E tests run in a staging environment that mirrors production.

Handoff to Stage 6

Marije/Judith results feed into Jasper's reconciliation.


Stage 6: RECONCILE — Jasper Finds Gaps Between TDD and Post-Impl

OWNER: Jasper
INPUT: Antje's original TDD test suite + Marije/Judith's integration/E2E tests + developer code
OUTPUT: Gap analysis report
TRIGGER: Stage 5 completion

What Jasper Does

  1. Compares Antje's TDD tests against Marije/Judith's integration and E2E tests
  2. Identifies behaviors tested in integration but NOT in unit tests
  3. Identifies unit tests that pass but fail in integration (environment assumptions)
  4. Identifies spec elements with no corresponding test at any level
  5. Identifies tests at any level with no corresponding spec element
  6. Produces gap report
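Steps 1 to 5 reduce to set comparisons once every test is tagged with the spec element IDs it exercises. That tagging is an assumption of this sketch (the document does not say how traceability is recorded), as are the `TracedTest` and `GapReport` shapes. Step 3 is approximated per spec element: an element whose TDD tests pass but whose integration or E2E tests fail is flagged as a likely environmental assumption.

```typescript
// Hypothetical traceability record for one test at any level.
interface TracedTest {
  name: string;
  level: "tdd" | "integration" | "e2e";
  specElementIds: string[];
  passed: boolean;
}

interface GapReport {
  integrationOnly: string[];        // step 2: covered above the unit level only
  environmentAssumptions: string[]; // step 3: TDD passes, integration/E2E fails
  uncoveredSpecElements: string[];  // step 4: CRITICAL, no test at any level
  unspecifiedTests: string[];       // step 5: tests with no known spec element
}

// Collects the spec element IDs exercised by tests matching `pred`.
function coveredIds(tests: TracedTest[], pred: (t: TracedTest) => boolean): Set<string> {
  const ids = new Set<string>();
  tests.filter(pred).forEach((t) => t.specElementIds.forEach((id) => ids.add(id)));
  return ids;
}

function reconcile(specIds: string[], tests: TracedTest[]): GapReport {
  const known = new Set(specIds);
  const tdd = coveredIds(tests, (t) => t.level === "tdd");
  const upper = coveredIds(tests, (t) => t.level !== "tdd");
  const tddPassing = coveredIds(tests, (t) => t.level === "tdd" && t.passed);
  const upperFailing = coveredIds(tests, (t) => t.level !== "tdd" && !t.passed);

  return {
    integrationOnly: Array.from(upper).filter((id) => !tdd.has(id)),
    environmentAssumptions: Array.from(tddPassing).filter((id) => upperFailing.has(id)),
    uncoveredSpecElements: specIds.filter((id) => !tdd.has(id) && !upper.has(id)),
    unspecifiedTests: tests
      .filter((t) => t.specElementIds.length === 0 || t.specElementIds.some((id) => !known.has(id)))
      .map((t) => t.name),
  };
}
```

Jasper only produces the report; acting on each list stays with Antje, Anna, and the developers, as the decision tree describes.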

Decision Tree: Jasper

CHECK: Are there behaviors tested in integration but not in TDD tests?
  IF: Yes — integration test covers something Antje missed
    THEN: Report to Antje — potential spec gap
    THEN: Antje traces back to Anna's spec
    THEN: If spec is incomplete → Anna revises → Antje adds TDD test
    THEN: If spec is complete but Antje missed a mapping → Antje adds test

CHECK: Are there TDD tests that pass in isolation but fail in integration?
  IF: Yes — environmental assumption in TDD test
    THEN: Report to Antje — test needs to account for real environment
    THEN: Developer may need to adjust implementation

CHECK: Are there spec elements with no test at ANY level?
  IF: Yes — coverage gap
    THEN: CRITICAL — report to Antje and Marije/Judith
    THEN: Tests must be added before deployment

CHECK: Are there tests with no corresponding spec element?
  IF: Yes — speculative test
    THEN: Review whether this reveals a spec gap or is an unnecessary test
    THEN: If spec gap → Anna adds to spec → Antje regenerates test
    THEN: If unnecessary → remove test

RULE: Jasper does not write tests or modify code. Jasper reports. Others act.
RULE: Zero spec coverage gaps are allowed at deployment time.


When Tests Need Changing

Tests are not immutable, but they change through a controlled process:

Trigger: Bug Found in Production

  1. Jasper documents the gap — which spec element was not adequately tested
  2. Anna reviews whether the spec covered this case
  3. IF spec covered it: Antje adds the missing test. The implementation was wrong, and the developer fixes it until the new test passes.
  4. IF spec did NOT cover it: Anna revises spec. Antje generates new test. Developer implements fix.

Trigger: Requirement Changed by Client

  1. Aimee updates the scope document
  2. Anna revises the formal specification
  3. Antje regenerates affected tests
  4. Developer modifies implementation to pass new tests
  5. Old tests that no longer apply are removed by Antje (not by developer)

Trigger: Surviving Mutant Found by Koen

  1. Koen reports the surviving mutant (code change that did not break any test)
  2. Antje reviews the mutant against the spec
  3. IF the mutation violates a spec requirement: Antje adds a test that kills it
  4. IF the mutation is semantically equivalent: Antje documents it as acceptable
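A surviving mutant is easiest to see on a toy example. The sketch below hand-rolls a single mutation instead of using a mutation-testing tool; `original`, `mutant`, and both suites are hypothetical, and the spec is assumed to state that ages 18 and over qualify.

```typescript
type Impl = (age: number) => boolean;

const original: Impl = (age) => age >= 18;
const mutant: Impl = (age) => age > 18; // relational-operator mutation: >= becomes >

// Each test returns true when it passes against the given implementation.
const weakSuite: Array<(f: Impl) => boolean> = [
  (f) => f(30) === true,
  (f) => f(5) === false,
];
// Antje's added boundary test, derived from the "18 and over" requirement.
const strongSuite = [...weakSuite, (f: Impl) => f(18) === true];

// A mutant is killed if some test that passes on the original fails on it.
function isKilled(suite: Array<(f: Impl) => boolean>, m: Impl): boolean {
  return suite.some((test) => test(original) && !test(m));
}
```

With only the 30 and 5 cases, changing `>=` to `>` breaks no test, so the mutant survives (step 1). Because the spec puts the boundary at 18, the mutation violates a requirement (step 3), and the boundary test kills it. A semantically equivalent mutation could not be killed by any test, which is why step 4 documents it instead.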

Timing and SLAs

Stage     | Owner         | Target Duration | Escalation After
Specify   | Anna          | 30 minutes      | 2 hours
Test      | Antje         | 45 minutes      | 3 hours
Implement | Developer     | 2 hours         | 8 hours
Gate      | Koen          | 10 minutes      | 30 minutes
Integrate | Marije/Judith | 30 minutes      | 2 hours
Reconcile | Jasper        | 20 minutes      | 1 hour

RULE: If any stage exceeds its escalation time, the team PM (Faye/Sytske) is notified.
RULE: Total pipeline time from spec to deployment-ready should be under 4 hours for standard features.
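The targets can be checked against the 4-hour rule with a little arithmetic. Reading "deployment-ready" as the end of Stage 5 (Marije/Judith mark the feature deployment-ready before forwarding to Jasper), the first five targets sum to 30 + 45 + 120 + 10 + 30 = 235 minutes, just under 240; adding Reconcile's 20 minutes gives 255. The sketch below encodes the SLA table and an escalation check; the `StageSla` type and function names are hypothetical.

```typescript
// SLA table, in minutes.
interface StageSla {
  stage: string;
  owner: string;
  targetMin: number;
  escalateAfterMin: number;
}

const SLAS: StageSla[] = [
  { stage: "Specify", owner: "Anna", targetMin: 30, escalateAfterMin: 120 },
  { stage: "Test", owner: "Antje", targetMin: 45, escalateAfterMin: 180 },
  { stage: "Implement", owner: "Developer", targetMin: 120, escalateAfterMin: 480 },
  { stage: "Gate", owner: "Koen", targetMin: 10, escalateAfterMin: 30 },
  { stage: "Integrate", owner: "Marije/Judith", targetMin: 30, escalateAfterMin: 120 },
  { stage: "Reconcile", owner: "Jasper", targetMin: 20, escalateAfterMin: 60 },
];

// Sum of targets up to and including Integrate (deployment-ready).
function targetToDeploymentReady(slas: StageSla[]): number {
  const end = slas.findIndex((s) => s.stage === "Integrate");
  return slas.slice(0, end + 1).reduce((sum, s) => sum + s.targetMin, 0);
}

// Stages whose elapsed time exceeds the escalation threshold; the team PM
// (Faye/Sytske) would be notified for each one.
function overdue(elapsedMin: Record<string, number>, slas: StageSla[]): string[] {
  return slas
    .filter((s) => (elapsedMin[s.stage] ?? 0) > s.escalateAfterMin)
    .map((s) => s.stage);
}
```

For example, `overdue({ Implement: 500, Gate: 10 }, SLAS)` returns `["Implement"]`, since 500 minutes exceeds the 8-hour threshold while Gate is within its 30 minutes.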