Skip to content

Spec-Driven Testing

Overview

This page documents how Anna's formal specification feeds Antje's test generation. This is the bridge between specification and verification — the mechanism that ensures every requirement is testable and every test traces to a requirement.

Anna (Formal Spec)          Antje (Test Suite)
─────────────────           ─────────────────
Invariant INV-1      ───►   Test assertion(s)
Pre-condition PC-1   ───►   Guard test(s)
Post-condition PO-1  ───►   Verification test(s)
Edge case EC-1       ───►   Boundary test(s)
State transition ST-1───►   Sequence test(s)
Error condition ER-1 ───►   Negative test(s)
Example EX-1         ───►   Smoke test

RULE: Every spec element produces at least one test. No spec element is untested. RULE: Every test traces to at least one spec element. No test exists without justification.


Spec Assertions to Test Assertions

Invariants

An invariant is a property that must ALWAYS hold, regardless of system state or operation. Invariants produce the strongest tests.

Spec:

invariants:
  - id: "INV-1"
    statement: "Account balance must never be negative"
    rationale: "Financial integrity  overdrafts are not permitted"

Antje's Tests:

// INV-1: Account balance must never be negative

// Direct assertion — balance starts non-negative
it('[INV-1] new account has zero balance', () => {
  const account = createAccount();
  expect(account.balance).toBeGreaterThanOrEqual(0);
});

// Invariant holds after deposits
it('[INV-1] balance remains non-negative after deposit', () => {
  const account = createAccount();
  account.deposit(100);
  expect(account.balance).toBeGreaterThanOrEqual(0);
});

// Invariant holds under attack — withdrawal exceeding balance
it('[INV-1] withdrawal exceeding balance is rejected', () => {
  const account = createAccount();
  account.deposit(50);
  expect(() => account.withdraw(51)).toThrow();
  expect(account.balance).toBe(50); // Unchanged
});

// Invariant holds at boundary — withdraw exact balance
it('[INV-1] withdrawing exact balance leaves zero, not negative', () => {
  const account = createAccount();
  account.deposit(100);
  account.withdraw(100);
  expect(account.balance).toBe(0);
});

RULE: Each invariant produces at least 3 tests: 1. Invariant holds in initial state 2. Invariant holds after valid operations 3. Invariant holds when operations attempt to violate it


Pre-conditions

A pre-condition defines what must be true BEFORE an operation can execute. Pre-conditions produce guard tests that verify rejection when conditions are not met.

Spec:

pre_conditions:
  - "user must be authenticated"
  - "user must have 'invoice.create' permission"
  - "invoice.lineItems must contain at least one item"

Antje's Tests:

// PC-1: User must be authenticated
it('[PC-1] rejects unauthenticated request', async () => {
  const res = await request(app).post('/api/invoices').send(validInvoice);
  expect(res.status).toBe(401);
});

// PC-2: User must have invoice.create permission
it('[PC-2] rejects user without invoice.create permission', async () => {
  const res = await authedRequest('viewer').post('/api/invoices').send(validInvoice);
  expect(res.status).toBe(403);
});

// PC-3: Invoice must have at least one line item
it('[PC-3] rejects invoice with empty lineItems array', async () => {
  const res = await authedRequest('admin')
    .post('/api/invoices')
    .send({ ...validInvoice, lineItems: [] });
  expect(res.status).toBe(400);
  expect(res.body.error).toContain('lineItems');
});

it('[PC-3] rejects invoice with no lineItems field', async () => {
  const { lineItems, ...noItems } = validInvoice;
  const res = await authedRequest('admin').post('/api/invoices').send(noItems);
  expect(res.status).toBe(400);
});

RULE: Each pre-condition produces at least one test that violates the condition and verifies rejection. RULE: Pre-condition tests verify the system rejects invalid requests BEFORE performing any side effects.


Post-conditions

A post-condition defines what must be true AFTER an operation succeeds. Post-conditions produce verification tests.

Spec:

post_conditions:
  - "invoice is persisted to database with status 'draft'"
  - "invoice.id matches format INV-{YYYYMMDD}-{seq}"
  - "audit log entry created with action 'invoice.created'"
  - "response contains the full invoice object"

Antje's Tests:

// PO-1: Invoice persisted with status 'draft'
it('[PO-1] persists invoice to database with draft status', async () => {
  const res = await authedRequest('admin').post('/api/invoices').send(validInvoice);
  const dbRecord = await db.query.invoices.findFirst({
    where: eq(invoices.id, res.body.id),
  });
  expect(dbRecord).toBeDefined();
  expect(dbRecord.status).toBe('draft');
});

// PO-2: Invoice ID format
it('[PO-2] generates ID in format INV-{YYYYMMDD}-{seq}', async () => {
  const res = await authedRequest('admin').post('/api/invoices').send(validInvoice);
  const today = new Date().toISOString().slice(0, 10).replace(/-/g, '');
  expect(res.body.id).toMatch(new RegExp(`^INV-${today}-\\d+$`));
});

// PO-3: Audit log entry
it('[PO-3] creates audit log entry for invoice creation', async () => {
  const res = await authedRequest('admin').post('/api/invoices').send(validInvoice);
  const auditEntry = await db.query.auditLog.findFirst({
    where: and(
      eq(auditLog.action, 'invoice.created'),
      eq(auditLog.entityId, res.body.id),
    ),
  });
  expect(auditEntry).toBeDefined();
});

// PO-4: Response contains full invoice
it('[PO-4] returns full invoice object in response', async () => {
  const res = await authedRequest('admin').post('/api/invoices').send(validInvoice);
  expect(res.body).toMatchObject({
    id: expect.any(String),
    status: 'draft',
    amount: validInvoice.lineItems.reduce((sum, li) => sum + li.amount * li.quantity, 0),
    createdAt: expect.any(String),
  });
});

RULE: Post-condition tests verify actual system state (database, logs, events), not just the response body.


Edge Cases to Boundary Tests

Edge cases are the most critical test source. LLMs are weakest at edge case handling because training data overrepresents happy paths.

Spec:

edge_cases:
  - case: "invoice with single line item of minimum amount"
    input: { lineItems: [{ amount: 0.01, quantity: 1 }] }
    expected: "Invoice created with total 0.01"
  - case: "invoice with maximum allowed line items"
    input: { lineItems: "100 items, each with amount 100.00" }
    expected: "Invoice created with total 10000.00"
  - case: "invoice with line item amount at maximum"
    input: { lineItems: [{ amount: 999999.99, quantity: 1 }] }
    expected: "Invoice created"
  - case: "invoice with line item amount exceeding maximum"
    input: { lineItems: [{ amount: 1000000.00, quantity: 1 }] }
    expected: "Rejected with 400 and error indicating amount exceeds maximum"
  - case: "invoice with floating point arithmetic edge"
    input: { lineItems: [{ amount: 0.1, quantity: 3 }, { amount: 0.2, quantity: 1 }] }
    expected: "Total is exactly 0.50, not 0.5000000000000001"

Antje's Tests:

// EC-1: Minimum amount
it('[EC-1] accepts invoice with minimum line item amount 0.01', async () => {
  const res = await authedRequest('admin').post('/api/invoices').send({
    ...validInvoice,
    lineItems: [{ description: 'Min', amount: 0.01, quantity: 1 }],
  });
  expect(res.status).toBe(201);
  expect(res.body.amount).toBe(0.01);
});

// EC-2: Maximum line items
it('[EC-2] accepts invoice with 100 line items', async () => {
  const items = Array.from({ length: 100 }, (_, i) => ({
    description: `Item ${i + 1}`, amount: 100.00, quantity: 1,
  }));
  const res = await authedRequest('admin').post('/api/invoices').send({
    ...validInvoice,
    lineItems: items,
  });
  expect(res.status).toBe(201);
  expect(res.body.amount).toBe(10000.00);
});

// EC-4: Amount exceeding maximum
it('[EC-4] rejects line item amount exceeding 999999.99', async () => {
  const res = await authedRequest('admin').post('/api/invoices').send({
    ...validInvoice,
    lineItems: [{ description: 'Over', amount: 1000000.00, quantity: 1 }],
  });
  expect(res.status).toBe(400);
  expect(res.body.error).toContain('amount');
});

// EC-5: Floating point precision
it('[EC-5] handles floating point arithmetic correctly', async () => {
  const res = await authedRequest('admin').post('/api/invoices').send({
    ...validInvoice,
    lineItems: [
      { description: 'A', amount: 0.1, quantity: 3 },
      { description: 'B', amount: 0.2, quantity: 1 },
    ],
  });
  expect(res.body.amount).toBe(0.50); // Not 0.5000000000000001
});

RULE: Every edge case in the spec produces exactly one test. The test input and expected output match the spec exactly.


State Transitions to Sequence Tests

State transitions from Anna's state machine specs produce ordered sequence tests.

Spec:

state_machine:
  transitions:
    - from: draft
      to: sent
      action: send
      guard: "amount > 0 AND recipient IS NOT NULL"
      side_effects: ["email sent", "sent_at set"]

Antje's Tests:

// ST-1: Valid transition draft → sent
describe('Invoice state: draft → sent', () => {
  it('[ST-1] transitions draft to sent when guard conditions met', async () => {
    const invoice = await createInvoice({ status: 'draft', amount: 100, recipient: 'a@b.com' });
    const result = await sendInvoice(invoice.id);
    expect(result.status).toBe('sent');
  });

  it('[ST-1-guard] rejects send when amount is zero', async () => {
    const invoice = await createInvoice({ status: 'draft', amount: 0, recipient: 'a@b.com' });
    await expect(sendInvoice(invoice.id)).rejects.toThrow();
  });

  it('[ST-1-guard] rejects send when recipient is null', async () => {
    const invoice = await createInvoice({ status: 'draft', amount: 100, recipient: null });
    await expect(sendInvoice(invoice.id)).rejects.toThrow();
  });

  it('[ST-1-effect] sets sent_at timestamp on transition', async () => {
    const invoice = await createInvoice({ status: 'draft', amount: 100, recipient: 'a@b.com' });
    const result = await sendInvoice(invoice.id);
    expect(result.sentAt).toBeInstanceOf(Date);
  });

  it('[ST-1-effect] sends email to recipient on transition', async () => {
    const invoice = await createInvoice({ status: 'draft', amount: 100, recipient: 'a@b.com' });
    await sendInvoice(invoice.id);
    expect(emailSpy).toHaveBeenCalledWith(
      expect.objectContaining({ to: 'a@b.com', template: 'invoice-sent' }),
    );
  });
});

// ST-invalid: Invalid transitions
it.each([
  ['draft', 'pay'],
  ['sent', 'send'],
  ['paid', 'send'],
  ['cancelled', 'send'],
])('[ST-invalid] rejects invalid transition %s → %s', async (fromStatus, action) => {
  const invoice = await createInvoice({ status: fromStatus, amount: 100, recipient: 'a@b.com' });
  await expect(performAction(invoice.id, action)).rejects.toThrow('Invalid transition');
});

RULE: Every valid transition produces tests for: 1. Successful transition (happy path) 2. Each guard condition violation (rejection) 3. Each side effect (verification)

RULE: Every invalid transition is tested exhaustively. Use it.each for the matrix.


Error Conditions to Negative Tests

Error conditions from the spec produce tests that verify correct failure behavior.

Spec:

error_conditions:
  - condition: "Database unavailable during invoice creation"
    expected_behavior: "Return 503 with retry-after header"
    data_safety: "No partial writes  transaction rolled back"
  - condition: "Concurrent modification of same invoice"
    expected_behavior: "Return 409 with current version"
    data_safety: "Optimistic locking prevents lost updates"

Antje's Tests:

// ER-1: Database unavailable
it('[ER-1] returns 503 when database is unavailable', async () => {
  await simulateDatabaseDown();
  const res = await authedRequest('admin').post('/api/invoices').send(validInvoice);
  expect(res.status).toBe(503);
  expect(res.headers['retry-after']).toBeDefined();
  await restoreDatabase();
});

it('[ER-1-safety] no partial writes when database fails mid-transaction', async () => {
  await simulateDatabaseFailAfterInsert();
  await authedRequest('admin').post('/api/invoices').send(validInvoice);
  const orphanedRecords = await db.query.invoices.findMany({
    where: eq(invoices.status, 'incomplete'),
  });
  expect(orphanedRecords).toHaveLength(0);
  await restoreDatabase();
});

// ER-2: Concurrent modification
it('[ER-2] returns 409 on concurrent modification', async () => {
  const invoice = await createInvoice(validInvoice);
  // Simulate concurrent update
  await db.update(invoices).set({ version: 2 }).where(eq(invoices.id, invoice.id));
  const res = await authedRequest('admin')
    .patch(`/api/invoices/${invoice.id}`)
    .set('If-Match', '1') // Original version
    .send({ amount: 200 });
  expect(res.status).toBe(409);
  expect(res.body.currentVersion).toBe(2);
});


The Formal Spec as Test Oracle

In testing theory, the "test oracle" determines whether a test output is correct. In GE, Anna's formal specification IS the test oracle.

This means:

  1. Antje does not decide what is correct. The spec decides. Antje translates.
  2. If a test fails, the spec determines who is wrong — the test (Antje's translation error) or the code (developer's implementation error).
  3. If the spec is silent on a behavior, there is no test. Untested behavior means unspecified behavior means a spec gap.

CHECK: Does every test reference a spec element ID? IF: Yes — traceability is maintained IF: No — the test is orphaned. Either link it to a spec element or remove it.

RULE: Test names include spec element IDs (e.g., [INV-1], [PC-2], [EC-5]). This enables automated traceability checking.


Coverage Matrix

Antje maintains a coverage matrix for every spec:

| Spec Element | Type | Test File | Test Name | Status |
|-------------|------|-----------|-----------|--------|
| INV-1 | Invariant | create-invoice.test.ts | [INV-1] balance non-negative | COVERED |
| PC-1 | Pre-cond | create-invoice.test.ts | [PC-1] rejects unauthed | COVERED |
| PC-2 | Pre-cond | create-invoice.test.ts | [PC-2] rejects no permission | COVERED |
| EC-5 | Edge case | create-invoice.test.ts | [EC-5] floating point | COVERED |
| ST-1 | State trans | invoice-state.test.ts | [ST-1] draft to sent | COVERED |

CHECK: Are all rows COVERED? IF: Any row is UNCOVERED — this is a gap. Antje must add the missing test. IF: Any test exists without a row — this is an orphan. Investigate.


When Spec Changes After Tests Are Written

IF: Anna revises the spec after Antje has generated tests THEN: Antje must: 1. Read the spec diff (what changed) 2. Identify which tests are affected 3. Update affected tests to match the new spec 4. Add new tests for new spec elements 5. Remove tests for removed spec elements 6. Re-verify that all spec elements are covered

RULE: Spec revision triggers test revision. Tests must always reflect the current spec version. RULE: Test revision does NOT trigger code revision directly. The developer re-runs the (now possibly failing) tests and adjusts their implementation.