DOMAIN:TESTING — RECONCILIATION_CALIBRATION
OWNER: jasper
ALSO_USED_BY: antje (TDD source), marije (post-impl source), anna (spec arbitration)
UPDATED: 2026-03-24
SCOPE: calibration examples for TDD-vs-post-impl test reconciliation — JIT injected before every reconciliation task
PURPOSE: ensure Jasper consistently resolves test conflicts by tracing back to the specification, and correctly triages coverage gaps
HOW_TO_USE_THIS_PAGE
Read these examples BEFORE reconciling any test pair.
Jasper's job: compare TDD tests (Antje, pre-implementation) against post-impl tests (Marije, post-implementation) and resolve discrepancies.
RECONCILIATION_PRINCIPLES:
- The SPEC is the ultimate authority, not either test suite
- When TDD and post-impl disagree, ask: "What does Anna's spec say?"
- If the spec is ambiguous, escalate to Anna for clarification — do NOT guess
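- Coverage gaps are only blocking if they fall in spec-required behavior or in error paths with real (financial or security) impact
- Cosmetic gaps (a thin logging wrapper, dev utilities) are noted but not blocking, unless the uncovered code carries logic the spec or the business depends on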
DECISION_FRAMEWORK:
TDD says X, Post-impl says Y
├─ Spec says X → TDD is right, post-impl needs update
├─ Spec says Y → Post-impl is right, TDD was based on wrong assumption
├─ Spec says both X and Y are valid → Not a conflict, both pass
├─ Spec says neither X nor Y → Escalate to Anna
└─ Spec is silent on this behavior → Escalate to Anna
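The decision tree can be encoded as a lookup so that every possible spec verdict maps to exactly one action. This is a minimal TypeScript sketch; the verdict and action names are hypothetical labels, not part of any existing tooling.

```typescript
// Hypothetical labels for what the spec says when TDD expects X and post-impl expects Y.
type SpecVerdict =
  | 'supports_tdd'       // spec says X
  | 'supports_post_impl' // spec says Y
  | 'allows_both'        // spec accepts X and Y
  | 'allows_neither'     // spec says neither X nor Y
  | 'silent';            // spec does not cover this behavior

type Action = 'update_post_impl' | 'update_tdd' | 'no_conflict' | 'escalate_to_anna';

// Record<SpecVerdict, Action> makes the compiler reject a verdict without an action.
const DECISION: Record<SpecVerdict, Action> = {
  supports_tdd: 'update_post_impl',
  supports_post_impl: 'update_tdd',
  allows_both: 'no_conflict',
  allows_neither: 'escalate_to_anna',
  silent: 'escalate_to_anna',
};

function resolveConflict(verdict: SpecVerdict): Action {
  return DECISION[verdict];
}

console.log(resolveConflict('silent')); // escalate_to_anna
```

Using a `Record` instead of an if/else chain means a new verdict added later cannot silently fall through: the code will not compile until it has an action.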
EXAMPLE_1: TDD CONTRADICTS POST-IMPL — TDD IS RIGHT
RESOLUTION: KEEP TDD, UPDATE POST-IMPL
SCENARIO
Feature: User registration with email validation.
Anna's spec states: "Registration MUST reject email addresses without a TLD (e.g., user@localhost is invalid)."
Antje's TDD test:
it('should reject email without TLD', async () => {
  const result = await registerUser({ email: 'admin@localhost', password: 'Str0ng!Pass' });
  expect(result.success).toBe(false);
  expect(result.error).toContain('invalid email');
});
Marije's post-impl test:
it('should accept valid email formats', async () => {
  // Developer's regex accepts user@hostname (no TLD required)
  const result = await registerUser({ email: 'admin@localhost', password: 'Str0ng!Pass' });
  expect(result.success).toBe(true);
});
RECONCILIATION_ANALYSIS
CONFLICT: TDD expects rejection of admin@localhost, post-impl expects acceptance.
SPEC_CHECK: Anna's spec explicitly says "MUST reject email addresses without a TLD."
VERDICT: TDD is correct. The developer's implementation is too permissive.
ROOT_CAUSE: The developer used a regex that accepts user@hostname as valid. The spec requires rejecting any domain without a TLD, which at minimum means requiring a dot in the domain part followed by a non-empty final label.
ACTION_ITEMS:
- Developer must update email validation to reject addresses without TLD
- Marije must update post-impl test to expect rejection for admin@localhost
- Add further rejection tests for malformed domains: user@.com, user@com., user@-domain.com
BLOCKING: YES — spec violation, security-relevant (localhost could bypass email verification flows)
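A domain check that meets the spec's TLD requirement and rejects the malformed addresses listed in the action items could look like the following. This is a sketch, not the project's validator: `hasDomainWithTld` is a hypothetical name, it inspects only the domain part, and the rule that a TLD is at least two ASCII letters is an assumption the spec does not state.

```typescript
// Hypothetical helper: checks only the domain part of an email address.
// Assumption: the TLD is the last dot-separated label and must be 2+ ASCII letters.
function hasDomainWithTld(email: string): boolean {
  const at = email.lastIndexOf('@');
  // Require a non-empty local part and a non-empty domain.
  if (at <= 0 || at === email.length - 1) return false;

  const labels = email.slice(at + 1).split('.');
  // "localhost" yields a single label: no dot, therefore no TLD.
  if (labels.length < 2) return false;

  // Each label: 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
  // Empty labels (user@.com, user@com.) and user@-domain.com fail this pattern.
  const labelPattern = /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;
  if (!labels.every((label) => labelPattern.test(label))) return false;

  return /^[A-Za-z]{2,}$/.test(labels[labels.length - 1]);
}

for (const email of ['admin@localhost', 'user@.com', 'user@com.', 'user@-domain.com', 'user@example.com']) {
  console.log(email, hasDomainWithTld(email));
}
```

Because the label pattern is ASCII-only, internationalized domains pass only in their punycode form; whether that is acceptable is a question for Anna's spec, not for the reconciler.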
EXAMPLE_2: TDD CONTRADICTS POST-IMPL — POST-IMPL IS RIGHT
RESOLUTION: UPDATE TDD, KEEP POST-IMPL
SCENARIO
Feature: Search results pagination.
Anna's spec states: "Search endpoint returns paginated results. Default page size is configurable."
Antje's TDD test:
it('should return 10 results per page by default', async () => {
  await seedProducts(25);
  const result = await searchProducts({ query: 'test' });
  expect(result.items).toHaveLength(10);
  expect(result.totalPages).toBe(3);
});
Marije's post-impl test:
it('should return default page size from config', async () => {
  await seedProducts(25);
  // Config sets default page size to 20
  const result = await searchProducts({ query: 'test' });
  expect(result.items).toHaveLength(20);
  expect(result.totalPages).toBe(2);
});
RECONCILIATION_ANALYSIS
CONFLICT: TDD assumes default page size is 10, post-impl says it's 20.
SPEC_CHECK: Anna's spec says "default page size is configurable." It does NOT specify the default value.
IMPLEMENTATION_CHECK: The config file sets DEFAULT_PAGE_SIZE=20.
VERDICT: Post-impl is correct. TDD hardcoded an assumption (10) that the spec did not mandate. The spec said "configurable" — the configured value is 20.
ROOT_CAUSE: Antje assumed a common convention (10 per page) because the spec did not state a default value. This is not Antje's fault: the spec left the exact default unspecified.
ACTION_ITEMS:
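- Antje must update the TDD test to derive its expectations from config rather than hardcoding the configured value (20), which would break again the next time the config changes
- Flag to Anna: spec should explicitly state the default page size to prevent future ambiguity
- Config-aware assertions: expect(result.items).toHaveLength(Math.min(25, config.defaultPageSize)) and expect(result.totalPages).toBe(Math.ceil(25 / config.defaultPageSize))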
BLOCKING: NO — not a code defect. The spec delegates the default to configuration, so the implementation behavior is correct; only the missing explicit value needs a spec follow-up.
FOLLOW_UP: File spec clarification request to Anna. This prevents the same class of ambiguity in future features.
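The config-aware version of Antje's assertions can be made concrete with a helper that derives the expected first-page length and page count from the page size. This is a sketch: `expectedFirstPage` is a hypothetical helper, `config.defaultPageSize` is the name used in the action items, and how the config object is loaded is not shown.

```typescript
// Hypothetical helper: expected shape of page 1 for a seeded item count and page size.
function expectedFirstPage(totalItems: number, pageSize: number): { length: number; totalPages: number } {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError(`pageSize must be a positive integer, got ${pageSize}`);
  }
  return {
    length: Math.min(totalItems, pageSize), // page 1 is short only if there are fewer items than one page
    totalPages: Math.ceil(totalItems / pageSize),
  };
}

// 25 seeded products: Antje's assumed size (10) versus the configured size (20).
console.log(expectedFirstPage(25, 10)); // { length: 10, totalPages: 3 }
console.log(expectedFirstPage(25, 20)); // { length: 20, totalPages: 2 }

// Inside the Jest test, replacing the hardcoded numbers:
//   const expected = expectedFirstPage(25, config.defaultPageSize);
//   expect(result.items).toHaveLength(expected.length);
//   expect(result.totalPages).toBe(expected.totalPages);
```

Both of the original suites' numbers fall out of the same formula, which is the point: the disagreement was about the input (page size), not the pagination logic.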
EXAMPLE_3: COVERAGE GAP THAT MATTERS
RESOLUTION: BLOCKING — ADD TESTS BEFORE RELEASE
SCENARIO
Feature: Payment processing with Stripe.
Anna's spec states: "Payment must handle: success, card declined, insufficient funds, network timeout, and Stripe outage."
TDD tests cover: success, card declined, insufficient funds.
Post-impl tests cover: success, card declined, network timeout.
Missing coverage: insufficient funds (post-impl), network timeout (TDD), and Stripe outage (both suites).
RECONCILIATION_ANALYSIS
COVERAGE_MATRIX:
| Scenario | TDD (Antje) | Post-Impl (Marije) | Spec Required |
|---|---|---|---|
| Success | YES | YES | YES |
| Card declined | YES | YES | YES |
| Insufficient funds | YES | NO | YES |
| Network timeout | NO | YES | YES |
| Stripe outage | NO | NO | YES |
GAPS_IDENTIFIED:
- Stripe outage: UNTESTED BY BOTH — spec-required error path in payment flow
- Insufficient funds: only TDD tested — post-impl should verify implementation handles it
- Network timeout: only post-impl tested — TDD should have caught this from spec
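The coverage matrix and the gap list can be derived mechanically from the spec-required scenarios and the scenarios each suite exercises. A minimal sketch, assuming scenario names are normalized to identical strings in both suites; `findGaps` and `GapReport` are hypothetical names.

```typescript
interface GapReport {
  untestedByBoth: string[]; // spec-required, covered by neither suite
  onlyTdd: string[];        // post-impl should add these
  onlyPostImpl: string[];   // TDD should add these
}

// Hypothetical helper: compares each suite's covered scenarios against the spec.
function findGaps(specRequired: string[], tdd: Set<string>, postImpl: Set<string>): GapReport {
  const report: GapReport = { untestedByBoth: [], onlyTdd: [], onlyPostImpl: [] };
  for (const scenario of specRequired) {
    const inTdd = tdd.has(scenario);
    const inPostImpl = postImpl.has(scenario);
    if (!inTdd && !inPostImpl) report.untestedByBoth.push(scenario);
    else if (inTdd && !inPostImpl) report.onlyTdd.push(scenario);
    else if (!inTdd && inPostImpl) report.onlyPostImpl.push(scenario);
  }
  return report;
}

// Data from the coverage matrix in this example.
const spec = ['Success', 'Card declined', 'Insufficient funds', 'Network timeout', 'Stripe outage'];
const tddCovered = new Set(['Success', 'Card declined', 'Insufficient funds']);
const postImplCovered = new Set(['Success', 'Card declined', 'Network timeout']);

console.log(findGaps(spec, tddCovered, postImplCovered));
```

Running it on the matrix data reproduces the three gaps identified above: Stripe outage in `untestedByBoth`, insufficient funds in `onlyTdd`, network timeout in `onlyPostImpl`.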
WHY_THIS_MATTERS:
- Payment flows are among the highest-risk code paths in a client project
- Stripe outage is not hypothetical — Stripe has had 4 incidents in the past 12 months
- If the app doesn't handle Stripe outage gracefully, users see a blank screen or a 500 error
- The user's payment may have been charged but the order not created (worst case)
BLOCKING: YES — Stripe outage is untested spec-required behavior in a financial flow.
ACTION_ITEMS:
- Antje: Add TDD test for network timeout scenario
- Marije: Add post-impl tests for insufficient funds and Stripe outage
- Priority: Stripe outage test is the most critical gap — covers a scenario with real financial risk
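The Stripe outage test does not need a live outage: a test double whose payment call fails is enough to check that checkout degrades to a clear, retryable error instead of a 500. Everything below is hypothetical: `PaymentGateway`, `checkout`, and the outcome kinds are illustrative names, not Stripe SDK types, and the sketch assumes outages and timeouts surface as thrown errors.

```typescript
// Illustrative types only; these are not Stripe SDK types.
type ChargeResult =
  | { status: 'succeeded' }
  | { status: 'declined'; reason: 'card_declined' | 'insufficient_funds' };

interface PaymentGateway {
  charge(amountCents: number): Promise<ChargeResult>;
}

type CheckoutOutcome =
  | { ok: true }
  | { ok: false; kind: 'card_declined' | 'insufficient_funds' | 'provider_unavailable'; retryable: boolean };

async function checkout(gateway: PaymentGateway, amountCents: number): Promise<CheckoutOutcome> {
  try {
    const result = await gateway.charge(amountCents);
    if (result.status === 'succeeded') return { ok: true };
    return { ok: false, kind: result.reason, retryable: false };
  } catch {
    // Assumption: outages surface as thrown errors. The user gets a retryable error
    // instead of a blank screen. Retrying after a timeout must not charge twice,
    // which requires idempotent payment requests; that is outside this sketch.
    return { ok: false, kind: 'provider_unavailable', retryable: true };
  }
}

// Test double that simulates a provider outage.
const outageGateway: PaymentGateway = {
  charge: async () => {
    throw new Error('503 Service Unavailable');
  },
};

checkout(outageGateway, 1000).then((outcome) => console.log(outcome));
```

The same double pattern covers the other gaps: a gateway that resolves to `{ status: 'declined', reason: 'insufficient_funds' }` gives Marije her insufficient-funds case without touching Stripe.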
EXAMPLE_4: COVERAGE GAP THAT IS COSMETIC
RESOLUTION: NOT BLOCKING — NOTE AND MOVE ON
SCENARIO
Feature: Application logging utility.
Anna's spec does NOT mention logging requirements (logging is infrastructure, not feature behavior).
TDD tests: None (Antje correctly skipped — logging is not in the spec).
Post-impl tests: None (Marije did not test the logging utility).
Code coverage tool flags: lib/utils/logger.ts has 0% coverage.
RECONCILIATION_ANALYSIS
COVERAGE_CHECK: logger.ts has no tests.
SPEC_CHECK: The spec does not mention logging. Logging is an internal utility, not user-facing behavior.
WHY_THIS_DOES_NOT_MATTER:
- The logger is a thin wrapper around pino — testing it would test the pino library, not our code
- Logger failures do not affect user-facing behavior (fire-and-forget)
- The logger has no branching logic — it formats and outputs. There is nothing to "get wrong"
- Adding tests here would be test theater (see testing/calibration-examples.md, Example 5)
WHEN_LOGGING_GAPS_WOULD_MATTER:
- If the logger contained PII filtering logic — that MUST be tested
- If the logger wrote to a database (audit trail) — the write must be tested
- If the logger had conditional output (log level routing) — the routing must be tested
- If the spec explicitly required "all API calls must be logged" — coverage is required
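To illustrate the first case: once the logging path strips sensitive fields before writing, that redaction is our logic and needs its own test. A minimal sketch; `redactForLog` and the key list are hypothetical, and only top-level fields are handled. If the project relies on pino's built-in `redact` option instead, the configured redaction paths are what the test should cover.

```typescript
// Hypothetical list of keys that must never reach the logs.
const SENSITIVE_KEYS = new Set(['email', 'password', 'cardNumber']);

// Hypothetical redaction step applied before a record is handed to the logger.
// Only top-level keys are checked; nested objects pass through unchanged.
function redactForLog(record: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    safe[key] = SENSITIVE_KEYS.has(key) ? '[REDACTED]' : value;
  }
  return safe;
}

console.log(redactForLog({ event: 'registration', email: 'user@example.com', userId: 42 }));
```

This function has a branch that can be wrong (a missing key leaks PII), which is exactly what separates it from the thin pino wrapper in this example.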
BLOCKING: NO — cosmetic gap. Logging utility has no spec requirement and no business logic.
ACTION_ITEMS:
- Note in reconciliation report: "logger.ts untested — no spec requirement, no business logic, acceptable"
- If the coverage gate's threshold fails because of this file, exclude lib/utils/logger.ts from the coverage calculation (not from the codebase)
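If the project runs its tests with Jest (an assumption: the `it`/`expect` style in these examples is also valid Vitest, which configures exclusions differently), the exclusion belongs in the coverage settings rather than in the code. A sketch of the relevant `jest.config.ts` fragment:

```typescript
// jest.config.ts (fragment). Assumes Jest; the path is the file flagged in this example.
const config = {
  // Regular expressions matched against each file's full path.
  // Setting this option replaces Jest's default ['/node_modules/'], so that entry is kept.
  coveragePathIgnorePatterns: ['/node_modules/', '<rootDir>/lib/utils/logger\\.ts$'],
};

export default config;
```

Excluding the single file keeps the threshold meaningful for code that does carry logic; the exclusion should be removed if the logger later gains PII filtering, a database write, or level routing.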
RECONCILIATION_DECISION_TABLE
| Situation | Blocking? | Action |
|---|---|---|
| TDD and post-impl agree | No | Confirm and move on |
| TDD and post-impl disagree, spec is clear | Yes | Spec wins, update the wrong suite |
| TDD and post-impl disagree, spec is ambiguous | Yes | Escalate to Anna for clarification |
| Coverage gap in spec-required behavior | Yes | Add tests before release |
| Coverage gap in error path with financial/security impact | Yes | Add tests before release |
| Coverage gap in internal utility with no spec requirement | No | Note and move on |
| Coverage gap in cosmetic feature (tooltips, animations) | No | Note, low-priority ticket |
| Both suites test the same thing differently but equivalently | No | Keep both — independent verification has value |
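The coverage-gap rows of the table reduce to a small triage rule. A sketch with hypothetical field names; deciding whether a path has financial or security impact stays a human judgment, and the function only records the outcome of that judgment.

```typescript
// Hypothetical description of one uncovered code path.
interface CoverageGap {
  path: string;
  specRequired: boolean;              // behavior the spec explicitly requires
  financialOrSecurityImpact: boolean; // error path with money or security at stake
  userFacing: boolean;                // e.g. tooltips, animations
}

type Triage = 'add_tests_before_release' | 'low_priority_ticket' | 'note_and_move_on';

function triageGap(gap: CoverageGap): Triage {
  if (gap.specRequired || gap.financialOrSecurityImpact) return 'add_tests_before_release'; // blocking
  if (gap.userFacing) return 'low_priority_ticket'; // cosmetic feature
  return 'note_and_move_on'; // internal utility with no spec requirement
}

console.log(triageGap({ path: 'payments: Stripe outage', specRequired: true, financialOrSecurityImpact: true, userFacing: true })); // add_tests_before_release
console.log(triageGap({ path: 'lib/utils/logger.ts', specRequired: false, financialOrSecurityImpact: false, userFacing: false })); // note_and_move_on
```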
ESCALATION_RULES
ESCALATE_TO_ANNA when:
- Spec is ambiguous and both interpretations are reasonable
- Spec is missing a scenario that both test suites assumed differently
- A behavior exists in code that the spec never mentioned
ESCALATE_TO_MARIJE/ANTJE when:
- Their test has a bug (wrong assertion, wrong setup)
- Their test is flaky (passes sometimes, fails sometimes)
- Their test is redundant with the other suite (consolidation opportunity)
ESCALATE_TO_MARTA when:
- Reconciliation reveals a pattern of spec gaps across multiple features
- The same class of conflict keeps recurring (systemic issue)
- Coverage gap is in a security-critical path and release is imminent