GE TDD Workflow

Pipeline Overview

GE's TDD pipeline is a six-stage process with explicit handoffs between specialized agents. Each stage has a single owner, defined inputs, defined outputs, and clear escalation paths.

Stage 1: SPECIFY    → Anna (Formal Specification)
Stage 2: TEST       → Antje (Test Generation)
Stage 3: IMPLEMENT  → Developer Agents (Code to Pass Tests)
Stage 4: GATE       → Koen (Deterministic Quality Checks)
Stage 5: INTEGRATE  → Marije + Judith (Integration/E2E)
Stage 6: RECONCILE  → Jasper (TDD vs Post-Impl Gap Analysis)

Stage 1: SPECIFY — Anna Produces Formal Specification

OWNER: Anna
INPUT: Aimee's scope document (functional specification)
OUTPUT: Formal YAML specification with invariants, edge cases, pre/post-conditions
TRIGGER: functional.spec.stored via Redis Stream

What Anna Does

Reads Aimee's functional specification
Extracts behavioral requirements
Identifies every edge case
Defines at least 3 invariants per function/feature
Maps pre-conditions and post-conditions
Verifies constitution compliance
Publishes formal spec to formal.spec.created
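
The invariant rule in the list above can be checked mechanically before publishing. The following is a minimal sketch, assuming the YAML has already been parsed into an object; the `FormalSpec` shape, its field names, and `checkInvariantCount` are hypothetical, since this page does not define the schema.

```typescript
// Hypothetical shape of one parsed formal spec. Field names are illustrative;
// the actual YAML schema is not shown in this document.
interface FormalSpec {
  feature: string;
  invariants: string[];
  edgeCases: string[];
  preConditions: string[];
  postConditions: string[];
}

// "At least 3 invariants per function/feature" (from the list above).
const MIN_INVARIANTS = 3;

// Returns human-readable problems; an empty array means the check passed.
function checkInvariantCount(spec: FormalSpec): string[] {
  if (spec.invariants.length >= MIN_INVARIANTS) {
    return [];
  }
  return [
    `${spec.feature}: has ${spec.invariants.length} invariant(s), needs at least ${MIN_INVARIANTS}`,
  ];
}

// Illustrative spec content, not taken from a real GE project.
const withdrawSpec: FormalSpec = {
  feature: "withdraw",
  invariants: [
    "account.balance >= 0",
    "balance after = balance before - amount",
    "a rejected withdrawal leaves the balance unchanged",
  ],
  edgeCases: ["amount equals the full balance"],
  preConditions: ["amount > 0"],
  postConditions: ["a successful withdrawal is recorded"],
};

const problems = checkInvariantCount(withdrawSpec); // []
```

A check like this only counts invariants. Whether each invariant is meaningful still depends on Anna's reading of the functional specification.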

Decision Tree: Anna

CHECK: Is the functional spec unambiguous?
  IF: Yes
    THEN: Proceed with formalization
  IF: No
    THEN: Escalate to Aimee with specific ambiguous elements
      IF: Aimee cannot resolve
        THEN: Aimee escalates to client via Faye/Sytske
        THEN: Anna WAITS. Does not guess.

CHECK: Does the feature require multiple formal specs?
  IF: Spec exceeds 15,000 tokens
    THEN: Split into one spec per function/feature
    THEN: Maintain lineage tracking between split specs
  IF: Spec is within budget
    THEN: Publish as single spec
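
The split rule above can be sketched as a small planning function. This is an illustration under stated assumptions: token counts are supplied per feature by some external counter, and the `parentId` field is one hypothetical way to keep lineage between split specs. None of these names come from the GE codebase.

```typescript
// Budget from the decision tree above.
const TOKEN_BUDGET = 15000;

// Hypothetical draft: one entry per function/feature with a precomputed token count.
interface DraftSpec {
  id: string;
  features: { name: string; tokens: number }[];
}

// Hypothetical published form. parentId links a split spec back to its origin.
interface PublishedSpec {
  id: string;
  parentId: string | null;
  features: string[];
}

function planPublication(draft: DraftSpec): PublishedSpec[] {
  const total = draft.features.reduce((sum, f) => sum + f.tokens, 0);

  // Within budget: publish as a single spec.
  if (total <= TOKEN_BUDGET) {
    return [
      { id: draft.id, parentId: null, features: draft.features.map((f) => f.name) },
    ];
  }

  // Over budget: one spec per function/feature, each pointing at the original id.
  return draft.features.map((f, i) => ({
    id: `${draft.id}.${i + 1}`,
    parentId: draft.id,
    features: [f.name],
  }));
}

const small = planPublication({
  id: "SPEC-1",
  features: [{ name: "login", tokens: 4000 }],
});
const large = planPublication({
  id: "SPEC-2",
  features: [
    { name: "checkout", tokens: 9000 },
    { name: "refund", tokens: 8000 },
  ],
});
// small has 1 entry with parentId null; large has 2 entries with parentId "SPEC-2"
```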

ANTI_PATTERN: Anna guessing at ambiguous requirements.
FIX: Every ambiguity MUST be resolved through the escalation chain, never assumed.

Handoff to Stage 2

Anna publishes to formal.spec.created. Annegreet stores the spec and notifies downstream consumers. Antje receives the notification and begins test generation.

RULE: The handoff is AUTOMATIC. Anna does not coordinate with Antje directly.
RULE: Antje must NOT begin test generation until the formal spec is stored and published.

Stage 2: TEST — Antje Generates Test Suite from Spec

OWNER: Antje
INPUT: Anna's formal specification (YAML)
OUTPUT: Complete test suite (unit tests + integration contracts)
TRIGGER: formal.spec.created or formal.spec.updated via Redis Stream

What Antje Does

Reads Anna's formal specification
Maps each invariant to one or more test assertions
Maps each edge case to a dedicated test
Maps each pre-condition to a guard test (verify rejection on violation)
Maps each post-condition to a verification test
Generates integration contract tests for cross-boundary behavior
Ensures every test FAILS (no implementation exists yet)
Publishes test suite

Mapping Rules

Spec Element	Test Type	Example
Invariant	Property assertion	`expect(account.balance).toBeGreaterThanOrEqual(0)`
Pre-condition	Guard/rejection test	`expect(() => withdraw(-1)).toThrow()`
Post-condition	State verification	`expect(order.status).toBe('confirmed')` after `confirmOrder()`
Edge case	Boundary test	`expect(search('')).toEqual([])`
Error condition	Negative test	`await expect(login('wrong-pass')).rejects.toThrow('InvalidCredentials')`
State transition	Sequence test	`draft → submitted → approved` with assertions at each step
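
The mapping table can be expressed as a lookup so that every spec element yields a planned test carrying the element's id. This is a sketch only: `SpecElement`, `PlannedTest`, and `planTests` are hypothetical names, and the ids in the example are invented for illustration.

```typescript
type SpecElementKind =
  | "invariant"
  | "pre-condition"
  | "post-condition"
  | "edge-case"
  | "error-condition"
  | "state-transition";

// One entry per row of the mapping table above.
const TEST_TYPE: Record<SpecElementKind, string> = {
  "invariant": "Property assertion",
  "pre-condition": "Guard/rejection test",
  "post-condition": "State verification",
  "edge-case": "Boundary test",
  "error-condition": "Negative test",
  "state-transition": "Sequence test",
};

interface SpecElement {
  id: string; // hypothetical identifier assigned in the formal spec
  kind: SpecElementKind;
  statement: string;
}

interface PlannedTest {
  specElementId: string; // kept so each test can be traced back to the spec
  testType: string;
  title: string;
}

function planTests(elements: SpecElement[]): PlannedTest[] {
  return elements.map((el) => ({
    specElementId: el.id,
    testType: TEST_TYPE[el.kind],
    title: `[${el.id}] ${el.statement}`,
  }));
}

const plan = planTests([
  { id: "INV-1", kind: "invariant", statement: "balance is never negative" },
  { id: "PRE-1", kind: "pre-condition", statement: "amount must be positive" },
]);
// plan[0].testType is "Property assertion"; plan[1].testType is "Guard/rejection test"
```

Keeping the spec element id on every planned test is a design choice made here so that tests and spec elements can later be compared by id.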

Decision Tree: Antje

CHECK: Can every spec element be mapped to a test?
  IF: Yes
    THEN: Generate complete test suite
  IF: No — spec element is untestable as written
    THEN: Escalate to Anna with specific element
    THEN: Anna revises spec to make it testable
    THEN: Antje regenerates affected tests

CHECK: Does the test require external dependencies?
  IF: Test needs a database
    THEN: Use test database with known seed data
  IF: Test needs an external API
    THEN: Use contract test (verify request shape, mock response)
    THEN: Flag for Marije/Judith to verify with real API in integration
  IF: Test needs Redis
    THEN: Use real Redis in test environment (not mocks)

CHECK: Are there tests that pass before implementation?
  IF: Yes — test verifies existing functionality
    THEN: Verify this is intentional (feature extension, not new feature)
  IF: Yes — test is trivially true
    THEN: DELETE the test. Rewrite with meaningful assertion.
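
Two branches of the tree above are mechanical enough to sketch: choosing a strategy per external dependency, and triaging tests that pass before any implementation exists. The types and function names are hypothetical, and the `coversExistingBehavior` flag is assumed to be set by Antje during review, since "trivially true" cannot be detected automatically.

```typescript
type ExternalDependency = "database" | "external-api" | "redis";

interface TestStrategy {
  approach: string;
  useMock: boolean;
  flagForIntegration: boolean; // Marije/Judith verify against the real service
}

function strategyFor(dep: ExternalDependency): TestStrategy {
  switch (dep) {
    case "database":
      return { approach: "test database with known seed data", useMock: false, flagForIntegration: false };
    case "external-api":
      return { approach: "contract test: verify request shape, mock response", useMock: true, flagForIntegration: true };
    case "redis":
      return { approach: "real Redis in the test environment", useMock: false, flagForIntegration: false };
  }
}

// Result of running the new suite before any implementation exists.
interface PreImplResult {
  testId: string;
  passed: boolean;
  coversExistingBehavior: boolean; // assumption: recorded by Antje, not inferred
}

type Triage = "ok-fails-as-expected" | "confirm-intentional" | "delete-and-rewrite";

function triage(result: PreImplResult): Triage {
  if (!result.passed) return "ok-fails-as-expected";
  return result.coversExistingBehavior ? "confirm-intentional" : "delete-and-rewrite";
}

const apiStrategy = strategyFor("external-api");
const suspicious = triage({ testId: "T-7", passed: true, coversExistingBehavior: false });
// apiStrategy.flagForIntegration is true; suspicious is "delete-and-rewrite"
```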

ANTI_PATTERN: Antje reading existing code to inform test design.
FIX: Antje reads ONLY the formal specification. Never the codebase.

Handoff to Stage 3

Antje commits the test suite to the repository. The test files are the developer's work order.

RULE: The test suite IS the specification for the developer. No additional instructions needed.
RULE: Developer agents receive the test file paths, not prose descriptions of what to build.

Stage 3: IMPLEMENT — Developers Write Code to Pass Tests

OWNER: Team developer agents (assigned by Faye/Sytske)
INPUT: Failing test suite from Antje
OUTPUT: Implementation that passes all tests
TRIGGER: Work package assignment via orchestrator

What Developers Do

Read the failing test suite
Run tests to confirm they all fail (RED)
Implement the minimum code to make the first test pass
Run tests — verify that test passes (GREEN)
Implement the minimum code to make the next failing test pass
Repeat until all tests pass
Refactor for code quality while keeping tests green
Commit implementation
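
The red/green loop above can be illustrated with a toy example. The sketch below uses a tiny hand-rolled runner instead of a real test framework so that it runs on its own. The `withdraw` function, the `Account` type, and both tests are hypothetical; the `tests` array stands in for a suite delivered by Antje and is not modified between steps.

```typescript
interface Account {
  balance: number;
}
type Withdraw = (account: Account, amount: number) => Account;

// Stub: nothing is implemented yet, so every test should fail (RED).
let withdraw: Withdraw = () => {
  throw new Error("not implemented");
};

// Two spec-derived tests. Each throws on failure.
const tests: { name: string; run: () => void }[] = [
  {
    name: "reduces balance by amount",
    run: () => {
      if (withdraw({ balance: 100 }, 30).balance !== 70) {
        throw new Error("expected balance 70");
      }
    },
  },
  {
    name: "rejects negative amount",
    run: () => {
      try {
        withdraw({ balance: 100 }, -1);
      } catch (e) {
        if (e instanceof RangeError) return; // the guard fired as specified
        throw e;
      }
      throw new Error("expected RangeError");
    },
  },
];

// Runs the suite against the current `withdraw` and returns failing test names.
function failingTests(): string[] {
  return tests
    .filter((t) => {
      try {
        t.run();
        return false;
      } catch {
        return true;
      }
    })
    .map((t) => t.name);
}

const red = failingTests(); // both tests fail

// Minimum code for the first test only.
withdraw = (account, amount) => ({ balance: account.balance - amount });
const afterFirst = failingTests(); // only "rejects negative amount" fails

// Minimum code for the next failing test.
withdraw = (account, amount) => {
  if (amount < 0) throw new RangeError("amount must not be negative");
  return { balance: account.balance - amount };
};
const green = failingTests(); // no failures
```

Only `withdraw` changes between the three runs. The tests stay fixed.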

Decision Tree: Developer

CHECK: Does a test seem incorrect?
  IF: Test assertion contradicts another test
    THEN: STOP. Escalate to Antje. Do NOT modify the test.
  IF: Test seems to test the wrong behavior
    THEN: STOP. Escalate to Antje. Do NOT modify the test.
  IF: Test requires a design decision not covered in the spec
    THEN: STOP. Escalate to Antje → Anna → Aimee chain.

CHECK: All tests pass. Is the implementation complete?
  IF: All tests pass and code is clean
    THEN: Commit and hand off to Koen
  IF: All tests pass but code has obvious gaps
    THEN: Flag for Jasper (Stage 6) — possible spec gap
    THEN: Still commit and proceed

RULE: Developers MUST NOT write new tests during implementation.
RATIONALE: If a developer discovers an untested edge case, they report it to Antje. Antje decides whether to add a test. This preserves oracle independence.

ANTI_PATTERN: Developer modifying a test to make their implementation pass.
FIX: This is a CRITICAL VIOLATION. The test reflects the spec. If the test seems wrong, escalate. Never silently change it.

Handoff to Stage 4

Developer commits code. Koen's quality gates trigger automatically.

Stage 4: GATE — Koen Runs Deterministic Quality Checks

OWNER: Koen
INPUT: Developer's committed code
OUTPUT: Pass/fail report for each quality dimension
TRIGGER: Commit or PR event

What Koen Checks

Lint — Code style, formatting, import ordering
TypeCheck — Static type analysis (TypeScript strict, Python mypy)
Build — Compilation succeeds without warnings
Dead Code — No unreachable code, no unused exports
Mutation Testing — Tests are meaningful (mutations cause failures)
Test Coverage — All spec-derived paths are covered

Decision Tree: Koen

CHECK: Do all deterministic gates pass?
  IF: Yes
    THEN: Approve and forward to Marije/Judith
  IF: Lint/TypeCheck/Build fails
    THEN: Reject back to developer with specific errors
    THEN: Developer fixes and resubmits
  IF: Mutation testing reveals surviving mutants
    THEN: Escalate to Antje — test suite has gaps
    THEN: Antje reviews surviving mutants against spec
    THEN: Antje adds tests or confirms spec does not require them
  IF: Dead code detected
    THEN: Reject back to developer — remove dead code

RULE: Koen's checks are fully deterministic. No LLM judgment. Pass or fail.
RULE: Mutation testing is mandatory for all business logic. UI-only code is exempt.
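
Because the gate is deterministic, its routing can be written as a pure function of the check results. The sketch below is illustrative: the type and function names are hypothetical, and routing a coverage failure to Antje is an assumption, since the decision tree above does not name a recipient for that gate.

```typescript
type Gate = "lint" | "typecheck" | "build" | "dead-code" | "mutation" | "coverage";
type Recipient = "developer" | "antje";

const ALL_GATES: Gate[] = ["lint", "typecheck", "build", "dead-code", "mutation", "coverage"];

interface GateResult {
  gate: Gate;
  passed: boolean;
  details: string;
}

interface GateDecision {
  approved: boolean; // true: forward to Marije/Judith
  rejections: { gate: Gate; to: Recipient; details: string }[];
}

function recipientFor(gate: Gate): Recipient {
  switch (gate) {
    case "lint":
    case "typecheck":
    case "build":
    case "dead-code":
      return "developer";
    case "mutation":
      return "antje"; // surviving mutants mean the test suite has gaps
    case "coverage":
      // Assumption: not specified in the decision tree; treated as a test-suite gap.
      return "antje";
  }
}

function decide(results: GateResult[]): GateDecision {
  const rejections = results
    .filter((r) => !r.passed)
    .map((r) => ({ gate: r.gate, to: recipientFor(r.gate), details: r.details }));

  // Approve only when every gate reported and none failed.
  const allReported = ALL_GATES.every((g) => results.some((r) => r.gate === g));
  return { approved: allReported && rejections.length === 0, rejections };
}

const allGreen = decide(ALL_GATES.map((g) => ({ gate: g, passed: true, details: "" })));
const mixed = decide(
  ALL_GATES.map((g) => ({
    gate: g,
    passed: g !== "lint" && g !== "mutation",
    details: g === "lint" ? "unused import" : g === "mutation" ? "2 surviving mutants" : "",
  }))
);
// allGreen.approved is true; mixed routes lint to the developer and mutation to Antje
```

The same inputs always produce the same decision, which is the property the first rule above requires.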

Handoff to Stage 5

Koen's approval triggers integration testing.

Stage 5: INTEGRATE — Marije + Judith Run Integration/E2E Tests

OWNER: Marije (integration), Judith (E2E)
INPUT: Koen-approved code
OUTPUT: Integration test results, E2E test results
TRIGGER: Quality gate approval

What Marije Does (Integration)

Tests cross-service communication (API → DB → Redis → response)
Tests contract compliance between services
Tests database migration compatibility
Tests real network paths (not mocks)
Tests error propagation across boundaries

What Judith Does (E2E)

Tests complete user workflows
Tests UI rendering with real data
Tests authentication/authorization flows
Tests deployment configuration (environment variables, secrets)
Tests rollback scenarios

Decision Tree: Marije/Judith

CHECK: Do integration/E2E tests pass?
  IF: Yes
    THEN: Feature is deployment-ready. Forward to Jasper for reconciliation.
  IF: Integration test fails — contract mismatch
    THEN: Escalate to developer — implementation violates integration contract
  IF: E2E test fails — workflow broken
    THEN: Escalate to developer — feature works in isolation but breaks in context
  IF: New integration behavior discovered (not in spec)
    THEN: Report to Jasper for Stage 6 analysis

RULE: Integration tests use real services, real databases, real Redis. Never mocks.
RULE: E2E tests run in a staging environment that mirrors production.

Handoff to Stage 6

Marije/Judith results feed into Jasper's reconciliation.

Stage 6: RECONCILE — Jasper Finds Gaps Between TDD and Post-Impl

OWNER: Jasper
INPUT: Antje's original TDD test suite + Marije/Judith's integration/E2E tests + developer code
OUTPUT: Gap analysis report
TRIGGER: Stage 5 completion

What Jasper Does

Compares Antje's TDD tests against Marije/Judith's integration tests
Identifies behaviors tested in integration but NOT in unit tests
Identifies unit tests that pass but fail in integration (environment assumptions)
Identifies spec elements with no corresponding test at any level
Identifies tests at any level with no corresponding spec element
Produces gap report

Decision Tree: Jasper

CHECK: Are there behaviors tested in integration but not in TDD tests?
  IF: Yes — integration test covers something Antje missed
    THEN: Report to Antje — potential spec gap
    THEN: Antje traces back to Anna's spec
    THEN: If spec is incomplete → Anna revises → Antje adds TDD test
    THEN: If spec is complete but Antje missed a mapping → Antje adds test

CHECK: Are there TDD tests that pass in isolation but fail in integration?
  IF: Yes — environmental assumption in TDD test
    THEN: Report to Antje — test needs to account for real environment
    THEN: Developer may need to adjust implementation

CHECK: Are there spec elements with no test at ANY level?
  IF: Yes — coverage gap
    THEN: CRITICAL — report to Antje and Marije/Judith
    THEN: Tests must be added before deployment

CHECK: Are there tests with no corresponding spec element?
  IF: Yes — speculative test
    THEN: Review whether this reveals a spec gap or is an unnecessary test
    THEN: If spec gap → Anna adds to spec → Antje regenerates test
    THEN: If unnecessary → remove test

RULE: Jasper does not write tests or modify code. Jasper reports. Others act.
RULE: Zero spec coverage gaps are allowed at deployment time.
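
Three of the four checks in Jasper's tree reduce to comparing sets of identifiers. The sketch below assumes every test records the spec element ids it covers, which this page does not confirm; `TestRecord`, `GapReport`, and `reconcile` are hypothetical names. The fourth check (TDD tests that pass in isolation but fail in integration) needs run results and is left out.

```typescript
interface TestRecord {
  id: string;
  level: "tdd" | "integration" | "e2e";
  specElementIds: string[]; // assumption: each test is tagged with the elements it covers
}

interface GapReport {
  untestedSpecElements: string[]; // no test at ANY level: critical
  integrationOnlyElements: string[]; // covered in integration/E2E but not by TDD tests
  speculativeTests: string[]; // tests with no corresponding spec element
}

function contains(list: string[], value: string): boolean {
  return list.indexOf(value) !== -1;
}

function reconcile(specElementIds: string[], tests: TestRecord[]): GapReport {
  const coveredByTdd: string[] = [];
  const coveredByIntegration: string[] = [];
  const speculativeTests: string[] = [];

  for (const test of tests) {
    const linked = test.specElementIds.filter((id) => contains(specElementIds, id));
    if (linked.length === 0) {
      speculativeTests.push(test.id);
    }
    const bucket = test.level === "tdd" ? coveredByTdd : coveredByIntegration;
    for (const id of linked) {
      if (!contains(bucket, id)) bucket.push(id);
    }
  }

  return {
    untestedSpecElements: specElementIds.filter(
      (id) => !contains(coveredByTdd, id) && !contains(coveredByIntegration, id)
    ),
    integrationOnlyElements: specElementIds.filter(
      (id) => contains(coveredByIntegration, id) && !contains(coveredByTdd, id)
    ),
    speculativeTests,
  };
}

const report = reconcile(
  ["INV-1", "PRE-1", "POST-1"],
  [
    { id: "unit-1", level: "tdd", specElementIds: ["INV-1"] },
    { id: "int-1", level: "integration", specElementIds: ["PRE-1"] },
    { id: "e2e-1", level: "e2e", specElementIds: [] },
  ]
);
// report.untestedSpecElements    is ["POST-1"]
// report.integrationOnlyElements is ["PRE-1"]
// report.speculativeTests        is ["e2e-1"]
```

The function only produces a report, in keeping with the rule that Jasper reports and others act.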

When Tests Need Changing

Tests are not immutable, but they change through a controlled process:

Trigger: Bug Found in Production

Jasper documents the gap — which spec element was not adequately tested
Anna reviews whether the spec covered this case
IF spec covered it: Antje adds missing test. Developer was wrong.
IF spec did NOT cover it: Anna revises spec. Antje generates new test. Developer implements fix.

Trigger: Requirement Changed by Client

Aimee updates the scope document
Anna revises the formal specification
Antje regenerates affected tests
Developer modifies implementation to pass new tests
Old tests that no longer apply are removed by Antje (not by developer)

Trigger: Surviving Mutant Found by Koen

Koen reports the surviving mutant (code change that did not break any test)
Antje reviews the mutant against the spec
IF the mutation violates a spec requirement: Antje adds a test that kills it
IF the mutation is semantically equivalent: Antje documents it as acceptable
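
A toy example may help show what a surviving mutant looks like and how an added test kills it. The sketch below does not use a mutation testing tool. The mutant is written by hand, and `canWithdraw` with its spec requirement (withdrawing the full balance is allowed) is a hypothetical example.

```typescript
type CanWithdraw = (balance: number, amount: number) => boolean;

// Implementation under test: withdrawing up to the full balance is allowed.
const original: CanWithdraw = (balance, amount) => amount <= balance;

// Hand-written mutant: `<=` changed to `<`.
const mutant: CanWithdraw = (balance, amount) => amount < balance;

// A test takes an implementation and returns true when it passes.
type TestCase = (impl: CanWithdraw) => boolean;

// Suite with a gap: no test at the boundary amount === balance.
const weakSuite: TestCase[] = [
  (impl) => impl(100, 30) === true,
  (impl) => impl(100, 200) === false,
];

// Test added after reviewing the mutant against the spec.
const boundaryTest: TestCase = (impl) => impl(100, 100) === true;

// A mutant survives when every test in the suite still passes against it.
function survives(candidate: CanWithdraw, suite: TestCase[]): boolean {
  return suite.every((test) => test(candidate));
}

const strongSuite = weakSuite.concat([boundaryTest]);

const survivedWeak = survives(mutant, weakSuite); // true: the gap lets the mutant through
const survivedStrong = survives(mutant, strongSuite); // false: the boundary test kills it
const originalStillPasses = survives(original, strongSuite); // true
```

Here the mutation changes behavior the spec requires, so a test is added. A semantically equivalent mutant would pass any spec-derived test and is documented instead.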

Timing and SLAs

Stage	Owner	Target Duration	Escalation After
Specify	Anna	30 minutes	2 hours
Test	Antje	45 minutes	3 hours
Implement	Developer	2 hours	8 hours
Gate	Koen	10 minutes	30 minutes
Integrate	Marije/Judith	30 minutes	2 hours
Reconcile	Jasper	20 minutes	1 hour

RULE: If any stage exceeds its escalation time, the team PM (Faye/Sytske) is notified.
RULE: Total pipeline time from spec to deployment-ready should be under 4 hours for standard features.
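
The SLA table can be encoded as data to drive the escalation rule. This is a minimal sketch with hypothetical names (`StageSla`, `shouldNotifyPm`), and elapsed time is assumed to be measured in minutes by the orchestrator. Summing the target column as listed gives 255 minutes (4 h 15 min), so meeting the 4-hour goal for standard features implies at least one stage finishing ahead of its target.

```typescript
interface StageSla {
  stage: string;
  owner: string;
  targetMin: number;
  escalateAfterMin: number;
}

// Values copied from the table above, converted to minutes.
const SLAS: StageSla[] = [
  { stage: "Specify", owner: "Anna", targetMin: 30, escalateAfterMin: 120 },
  { stage: "Test", owner: "Antje", targetMin: 45, escalateAfterMin: 180 },
  { stage: "Implement", owner: "Developer", targetMin: 120, escalateAfterMin: 480 },
  { stage: "Gate", owner: "Koen", targetMin: 10, escalateAfterMin: 30 },
  { stage: "Integrate", owner: "Marije/Judith", targetMin: 30, escalateAfterMin: 120 },
  { stage: "Reconcile", owner: "Jasper", targetMin: 20, escalateAfterMin: 60 },
];

// True when the PM (Faye/Sytske) should be notified for this stage.
function shouldNotifyPm(stage: string, elapsedMin: number): boolean {
  for (const sla of SLAS) {
    if (sla.stage === stage) {
      return elapsedMin > sla.escalateAfterMin;
    }
  }
  throw new Error(`unknown stage: ${stage}`);
}

const totalTargetMin = SLAS.reduce((sum, sla) => sum + sla.targetMin, 0); // 255

const gateLate = shouldNotifyPm("Gate", 31); // true
const gateOnLimit = shouldNotifyPm("Gate", 30); // false: the limit is not yet exceeded
```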

TDD Philosophy — Why TDD is mandatory in GE
TDD Patterns — Domain-specific testing patterns
Agentic TDD — AI-specific TDD considerations
TDD Pitfalls — Common mistakes and anti-patterns
Formal Specification Workflow — How specs are produced
Spec-Driven Testing — How specs feed tests