Skip to content

DOMAIN:TESTING — TDD_METHODOLOGY

OWNER: antje
ALSO_USED_BY: jasper (reconciliation), marije, judith (cross-reference)
UPDATED: 2026-03-24
SCOPE: test-driven development for all GE client projects, pre-implementation phase


TDD_IN_GE_CONTEXT

Antje writes tests BEFORE any implementation code exists.
This is NOT optional. This is not "write tests alongside code."
Tests come FIRST. Implementation comes SECOND.

INPUT: Anna's formal specification (inputs, outputs, constraints, invariants, edge cases)
OUTPUT: executable test suite that defines correct behavior
CONSTRAINT: Antje has ZERO access to implementation code, developer plans, or architectural decisions
PURPOSE: create an independent test oracle — if code passes these tests, it meets the specification

WHY_TWO_PHASE:
- TDD tests verify the SPEC is met (what SHOULD happen)
- Post-impl tests verify the CODE works (what DOES happen)
- Reconciliation (Jasper) finds gaps between the two
- If TDD and post-impl disagree, the spec arbitrates
- This catches bugs that single-phase testing misses


RED_GREEN_REFACTOR_CYCLE

1_RED: WRITE A FAILING TEST

Write the smallest test that describes one behavior from the spec.
The test MUST fail because no implementation exists yet.
If the test passes without implementation, the test is WRONG.

2_GREEN: DEVELOPER WRITES MINIMUM CODE

NOT Antje's job — the developer writes the minimum code to make the test pass.
No more, no less. No gold plating. No "while I'm here" additions.

3_REFACTOR: CLEAN UP WITHOUT CHANGING BEHAVIOR

Developer improves code structure while all tests remain green.
Antje's tests act as a safety net — refactoring that breaks tests = refactoring that broke behavior.

RULE: Antje only does step 1. Steps 2-3 belong to the developer.
RULE: Antje writes ALL tests for a feature before any developer touches it.
RULE: tests are committed to __tests__/tdd/ directory — separate from post-impl tests.


WRITING_TESTS_FROM_SPECS

ANNA_SPEC_FORMAT

Anna produces specifications with these sections:
- INPUTS: what the system receives (types, formats, ranges)
- OUTPUTS: what the system produces (types, formats)
- CONSTRAINTS: business rules, validation rules, invariants
- EDGE_CASES: boundary conditions, error states, empty states
- ACCEPTANCE_CRITERIA: specific scenarios that must work

TRANSLATING_SPEC_TO_TESTS

STEP 1: Extract every ACCEPTANCE_CRITERION — each becomes at least one test
STEP 2: Extract every CONSTRAINT — each becomes a validation test
STEP 3: Extract EDGE_CASES — each becomes a boundary test
STEP 4: Derive IMPLICIT behaviors (what happens with null? empty string? max int?)
STEP 5: Add property-based tests for algorithmic constraints

EXAMPLE_SPEC:

FEATURE: calculateDiscount
INPUTS: orderTotal (number, >= 0), customerTier (string: "bronze" | "silver" | "gold")
OUTPUTS: discountAmount (number, >= 0, <= orderTotal)
CONSTRAINTS:
  - bronze: 0% discount
  - silver: 10% discount
  - gold: 20% discount
  - orders > 1000: additional 5% discount (stacks)
  - discount cannot exceed orderTotal
EDGE_CASES:
  - orderTotal = 0
  - orderTotal = negative (should throw)
  - customerTier = invalid string (should throw)

TRANSLATED_TESTS:

import { describe, it, expect } from 'vitest';
import { fc } from '@fast-check/vitest';

// NOTE: We import only the type signature, not the implementation
// The function doesn't exist yet — that's the point of TDD
type CalculateDiscount = (orderTotal: number, customerTier: string) => number;

describe('calculateDiscount', () => {
  // From acceptance criteria
  describe('tier discounts', () => {
    it('applies 0% discount for bronze tier', () => {
      expect(calculateDiscount(100, 'bronze')).toBe(0);
    });

    it('applies 10% discount for silver tier', () => {
      expect(calculateDiscount(100, 'silver')).toBe(10);
    });

    it('applies 20% discount for gold tier', () => {
      expect(calculateDiscount(100, 'gold')).toBe(20);
    });
  });

  // From constraints
  describe('bulk order bonus', () => {
    it('adds 5% for orders over 1000', () => {
      expect(calculateDiscount(2000, 'bronze')).toBe(100);  // 0% + 5%
      expect(calculateDiscount(2000, 'silver')).toBe(300);   // 10% + 5%
      expect(calculateDiscount(2000, 'gold')).toBe(500);     // 20% + 5%
    });

    it('does not add bonus for orders exactly 1000', () => {
      expect(calculateDiscount(1000, 'silver')).toBe(100);  // 10% only
    });

    it('adds bonus for orders at 1001', () => {
      expect(calculateDiscount(1001, 'silver')).toBeCloseTo(150.15);  // 15%
    });
  });

  // From edge cases
  describe('edge cases', () => {
    it('returns 0 for zero order total', () => {
      expect(calculateDiscount(0, 'gold')).toBe(0);
    });

    it('throws for negative order total', () => {
      expect(() => calculateDiscount(-1, 'silver')).toThrow();
    });

    it('throws for invalid tier', () => {
      expect(() => calculateDiscount(100, 'platinum')).toThrow();
    });
  });

  // From constraint: discount cannot exceed orderTotal
  describe('discount ceiling', () => {
    it('never returns more than the order total', () => {
      // Property-based test — holds for ALL valid inputs
      fc.assert(
        fc.property(
          fc.float({ min: 0, max: 1_000_000, noNaN: true }),
          fc.constantFrom('bronze', 'silver', 'gold'),
          (total, tier) => {
            const discount = calculateDiscount(total, tier);
            return discount >= 0 && discount <= total;
          }
        )
      );
    });
  });
});


TEST_ORACLE_INDEPENDENCE

The most critical principle of GE's TDD approach:
Tests must NOT know how the code works. They must only know what it should DO.

WHAT_ORACLE_INDEPENDENCE_MEANS

GOOD: expect(result).toBe(42) — I know the answer from the spec
BAD: expect(service.cache.get('key')).toBeDefined() — I'm testing internal caching
GOOD: expect(output).toHaveLength(3) — spec says 3 results
BAD: expect(mockDb.query).toHaveBeenCalledWith('SELECT ...') — I'm testing the SQL

HOW_TO_MAINTAIN_INDEPENDENCE

  1. NEVER import implementation modules in TDD tests (they don't exist yet)
  2. NEVER reference internal data structures, caches, or state
  3. NEVER verify HOW something happens, only WHAT the result is
  4. Test through the PUBLIC interface only
  5. Derive expected values from the SPEC, not by running the code

ANTI_PATTERN: writing tests that effectively reimplement the algorithm

// BAD — this test mirrors the implementation
it('calculates tax', () => {
  const price = 100;
  const taxRate = 0.21;
  const expected = price * taxRate;  // This IS the implementation
  expect(calculateTax(price)).toBe(expected);
});

FIX: use KNOWN values from the spec

// GOOD — expected value comes from spec/domain knowledge
it('calculates 21% tax on 100', () => {
  expect(calculateTax(100)).toBe(21);
});


PROPERTY_BASED_TESTING

Property-based testing with fast-check generates hundreds of random inputs
and verifies that PROPERTIES (invariants) always hold.

WHEN_TO_USE_PBT

  • Mathematical properties (commutativity, associativity, idempotency)
  • Invariants from the spec ("output is always positive", "list is always sorted")
  • Round-trip properties (serialize then deserialize = identity)
  • Relationships between operations (create then find = same entity)

FAST_CHECK_PATTERNS

import { fc } from '@fast-check/vitest';
import { it } from 'vitest';

// Invariant: sorting is idempotent
it.prop([fc.array(fc.integer())])('sort is idempotent', (arr) => {
  const once = sort(arr);
  const twice = sort(sort(arr));
  expect(twice).toEqual(once);
});

// Invariant: sorted output has same elements as input
it.prop([fc.array(fc.integer())])('sort preserves elements', (arr) => {
  const sorted = sort(arr);
  expect(sorted).toHaveLength(arr.length);
  for (const item of arr) {
    expect(sorted).toContain(item);
  }
});

// Invariant: sorted output is actually sorted
it.prop([fc.array(fc.integer())])('sort produces ordered output', (arr) => {
  const sorted = sort(arr);
  for (let i = 1; i < sorted.length; i++) {
    expect(sorted[i]).toBeGreaterThanOrEqual(sorted[i - 1]);
  }
});

// Round-trip: encode then decode = identity
it.prop([fc.string()])('JSON round-trip', (str) => {
  expect(JSON.parse(JSON.stringify(str))).toBe(str);
});

// Domain-specific: discount is always within bounds
it.prop([
  fc.float({ min: 0, max: 100000, noNaN: true }),
  fc.constantFrom('bronze', 'silver', 'gold'),
])('discount within bounds', (total, tier) => {
  const discount = calculateDiscount(total, tier);
  expect(discount).toBeGreaterThanOrEqual(0);
  expect(discount).toBeLessThanOrEqual(total);
});

FAST_CHECK_ARBITRARIES

// Built-in arbitraries
fc.integer()                           // any integer
fc.integer({ min: 0, max: 100 })       // bounded integer
fc.float({ noNaN: true })              // float without NaN
fc.string()                            // any string
fc.string({ minLength: 1 })            // non-empty string
fc.boolean()                           // true or false
fc.date()                              // any Date object
fc.array(fc.integer())                 // array of integers
fc.constantFrom('a', 'b', 'c')        // one of specific values

// Custom arbitraries for domain objects
const emailArb = fc.tuple(
  fc.stringOf(fc.char().filter(c => /[a-z0-9]/.test(c)), { minLength: 1 }),
  fc.constantFrom('example.com', 'test.org', 'mail.nl'),
).map(([local, domain]) => `${local}@${domain}`);

const userArb = fc.record({
  name: fc.string({ minLength: 1, maxLength: 100 }),
  email: emailArb,
  age: fc.integer({ min: 18, max: 120 }),
});

RULE: property-based tests SUPPLEMENT example-based tests — they don't replace them
RULE: if fast-check finds a failure, it SHRINKS the input to the minimal failing case — use that as a regression test
RULE: always set noNaN: true for floats — NaN behavior is rarely what you're testing


BOUNDARY_VALUE_ANALYSIS

Test at the BOUNDARIES of input ranges — this is where bugs cluster.

METHOD

For every input range in the spec:
1. Test the MINIMUM valid value
2. Test just BELOW minimum (should fail/reject)
3. Test the MAXIMUM valid value
4. Test just ABOVE maximum (should fail/reject)
5. Test at transition points (where behavior changes)

EXAMPLE

SPEC: age must be 18-120, integer

describe('age validation', () => {
  // Minimum boundary
  it('accepts age 18', () => {
    expect(validateAge(18)).toBe(true);
  });
  it('rejects age 17', () => {
    expect(validateAge(17)).toBe(false);
  });

  // Maximum boundary
  it('accepts age 120', () => {
    expect(validateAge(120)).toBe(true);
  });
  it('rejects age 121', () => {
    expect(validateAge(121)).toBe(false);
  });

  // Type boundaries
  it('rejects non-integer', () => {
    expect(validateAge(18.5)).toBe(false);
  });
  it('rejects negative', () => {
    expect(validateAge(-1)).toBe(false);
  });
  it('rejects zero', () => {
    expect(validateAge(0)).toBe(false);
  });
});

EQUIVALENCE_PARTITIONING

Divide the input space into CLASSES where all values in a class should produce the same behavior.
Test ONE representative from each class — if one works, all should work.

METHOD

  1. Identify input partitions from the spec
  2. For each partition, pick one REPRESENTATIVE value
  3. Test the representative
  4. Combine with boundary analysis for partition edges

EXAMPLE

SPEC: gradeScore(score: number) → "A" | "B" | "C" | "D" | "F"
PARTITIONS:
- 90-100 → "A" (representative: 95)
- 80-89 → "B" (representative: 85)
- 70-79 → "C" (representative: 75)
- 60-69 → "D" (representative: 65)
- 0-59 → "F" (representative: 30)
- < 0 → invalid (representative: -1)
- > 100 → invalid (representative: 101)

describe('gradeScore', () => {
  // One test per partition (representative values)
  it.each([
    [95, 'A'],
    [85, 'B'],
    [75, 'C'],
    [65, 'D'],
    [30, 'F'],
  ])('returns %s for score %i', (score, grade) => {
    expect(gradeScore(score)).toBe(grade);
  });

  // Boundaries between partitions
  it.each([
    [90, 'A'], [89, 'B'],
    [80, 'B'], [79, 'C'],
    [70, 'C'], [69, 'D'],
    [60, 'D'], [59, 'F'],
  ])('boundary: score %i returns %s', (score, grade) => {
    expect(gradeScore(score)).toBe(grade);
  });

  // Invalid partitions
  it('throws for score below 0', () => {
    expect(() => gradeScore(-1)).toThrow();
  });
  it('throws for score above 100', () => {
    expect(() => gradeScore(101)).toThrow();
  });
});

WRITING_UNFALSIFIABLE_TESTS

A core problem with LLM-generated code: scaffolding that passes weak tests.
Antje's tests must be IMPOSSIBLE to satisfy with trivial implementations.

THE_SCAFFOLDING_PROBLEM

BAD_TEST: expect(add(2, 3)).toBe(5)
TRIVIAL_IMPL: function add() { return 5; } — passes the test!

FIX_1: Multiple examples

it.each([
  [2, 3, 5],
  [0, 0, 0],
  [-1, 1, 0],
  [100, 200, 300],
])('add(%i, %i) = %i', (a, b, expected) => {
  expect(add(a, b)).toBe(expected);
});

FIX_2: Property-based testing

it.prop([fc.integer(), fc.integer()])('add is commutative', (a, b) => {
  expect(add(a, b)).toBe(add(b, a));
});

it.prop([fc.integer()])('add(x, 0) = x', (x) => {
  expect(add(x, 0)).toBe(x);
});

FIX_3: Test relationships, not just values

it('add(x, 1) = x + 1 for any x', () => {
  for (const x of [0, 1, -1, 42, -999, Number.MAX_SAFE_INTEGER - 1]) {
    expect(add(x, 1)).toBe(x + 1);
  }
});

TRIANGULATION_TECHNIQUE

Use multiple test cases that TOGETHER force the correct implementation.
No single test proves correctness — the COMBINATION does.

describe('isPrime', () => {
  // These together force a real implementation
  it('returns false for 1', () => expect(isPrime(1)).toBe(false));
  it('returns true for 2', () => expect(isPrime(2)).toBe(true));
  it('returns true for 3', () => expect(isPrime(3)).toBe(true));
  it('returns false for 4', () => expect(isPrime(4)).toBe(false));
  it('returns true for 97', () => expect(isPrime(97)).toBe(true));
  it('returns false for 100', () => expect(isPrime(100)).toBe(false));
  it('returns true for 7919', () => expect(isPrime(7919)).toBe(true));
  it('returns false for 7920', () => expect(isPrime(7920)).toBe(false));
});

RULE: if your test can be satisfied by a lookup table with < 5 entries, add more cases
RULE: if your test can be satisfied by hardcoding, add property-based tests
RULE: every test should ELIMINATE at least one incorrect implementation


SPEC_AMBIGUITY_HANDLING

WHEN spec is ambiguous:
1. Document the ambiguity in a test comment
2. Write tests for BOTH interpretations
3. Mark the ambiguous tests with it.todo or a clear label
4. Escalate to Anna for clarification via discussion
5. NEVER guess — an ambiguous test is worse than no test

// AMBIGUITY: spec says "round to nearest cent" but doesn't specify
// rounding direction for exactly .5 — banker's rounding or always up?
describe('price rounding (AMBIGUOUS — awaiting Anna clarification)', () => {
  it('rounds 10.124 to 10.12', () => {
    expect(roundPrice(10.124)).toBe(10.12);
  });
  it('rounds 10.126 to 10.13', () => {
    expect(roundPrice(10.126)).toBe(10.13);
  });
  // TODO: clarify with Anna — is 10.125 → 10.12 (banker's) or 10.13 (ceil)?
  it.todo('rounds 10.125 — awaiting spec clarification');
});

CROSS_REFERENCES

VITEST: domains/testing/vitest-patterns.md — Vitest mechanics and patterns
RECONCILIATION: domains/testing/test-reconciliation.md — how TDD tests are compared with post-impl tests
PITFALLS: domains/testing/pitfalls.md — TDD-specific anti-patterns
MUTATION: domains/testing/mutation-testing.md — verifying TDD test quality