
Testing Standards

TDD Approach

RULE: Write the test before the implementation. The test defines what "done" means.

RATIONALE: Prevents scaffolding without integration, i.e. code that is written but never wired into a real call path. The test is the first call site for any new code.
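A minimal sketch of that order in plain Python, using a hypothetical `parse_duration` helper. The test exists first and fails for lack of an implementation (red), then passes once just enough code is written (green). In a real repo the test would live under `tests/` and be exercised by the `tdd:red-gate` and `tdd:green-gate` CI jobs.

```python
# Step 1 (red): the test is written first. It is the first call site of
# parse_duration (a hypothetical helper) and defines what "done" means.
def test_parse_duration():
    assert parse_duration("90s") == 90
    assert parse_duration("2m") == 120
    assert parse_duration("1h") == 3600


# Running the test now fails with NameError: nothing has been built yet.
try:
    test_parse_duration()
except NameError:
    print("red: parse_duration is not implemented")

# Step 2 (green): write just enough implementation to satisfy the test.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert strings like '90s', '2m' or '1h' into seconds."""
    value, unit = text[:-1], text[-1:]
    if unit not in _UNITS or not value.isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return int(value) * _UNITS[unit]


test_parse_duration()
print("green: test passes")
```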

Test Levels

Unit Tests (supplementary)

  • Test individual functions in isolation
  • Fast, numerous, focused
  • NOT sufficient for feature verification
  • NOT proof of life

Integration Tests (required for features)

  • Test the actual system path end-to-end
  • Exercise real network calls, real database operations, real Redis streams
  • This IS proof of life
  • Every new feature must have at least one integration test

Regression Tests (scheduled)

  • Run on schedule (post-deployment or daily)
  • Exercise all existing features through actual system paths
  • A failing regression test blocks new feature work (Principle 10)
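One way a scheduled job could enforce that rule mechanically, sketched under the assumption that the regression run emits a JUnit XML report (pytest's `--junitxml` option produces one). The function name and the sample report are hypothetical.

```python
import xml.etree.ElementTree as ET


def regression_blocks_feature_work(junit_xml: str) -> bool:
    """Return True if a regression report contains any failure or error.

    Hypothetical helper: where the report is stored and how the result is
    wired into the pipeline are left open here.
    """
    root = ET.fromstring(junit_xml)
    # root.iter() also yields the root itself, so this handles both a bare
    # <testsuite> and a <testsuites> wrapper.
    for suite in root.iter("testsuite"):
        failures = int(suite.get("failures", 0))
        errors = int(suite.get("errors", 0))
        if failures or errors:
            return True
    return False


report = """<testsuites>
  <testsuite name="regression" tests="3" failures="1" errors="0">
    <testcase classname="regression" name="test_login"/>
    <testcase classname="regression" name="test_export"><failure message="boom"/></testcase>
    <testcase classname="regression" name="test_search"/>
  </testsuite>
</testsuites>"""

if regression_blocks_feature_work(report):
    print("Regression failing: fix it before starting new feature work")
```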

What Tests Must Prove

For a new API endpoint:

  • The route is registered and reachable (curl from outside the pod)
  • The handler processes a real request
  • The response matches the contract schema
  • The side effects (DB writes, Redis publishes) actually occurred

For a new agent capability:

  • The trigger reaches the executor via Redis Stream
  • The executor spawns a real CLI session (Claude/Codex/Gemini)
  • The CLI produces real output (verified via PTY capture)
  • The completion file is written to ge-ops/system/completions/
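A sketch of the last check only: a polling helper that waits for the completion file. The `<task_id>.json` file name and JSON body are assumptions; the Redis Stream, CLI spawn, and PTY checks need the real system, so they are not shown.

```python
import json
import tempfile
import time
from pathlib import Path


def wait_for_completion(completions_dir: Path, task_id: str,
                        timeout: float = 30.0, poll_interval: float = 0.5) -> dict:
    """Poll the completions directory until the task's completion file appears.

    Assumption (hypothetical layout): the executor writes '<task_id>.json'
    containing a JSON object.
    """
    path = completions_dir / f"{task_id}.json"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            try:
                return json.loads(path.read_text())
            except json.JSONDecodeError:
                pass  # file may still be mid-write; retry on the next poll
        time.sleep(poll_interval)
    raise TimeoutError(f"no completion file for {task_id} within {timeout}s")


# Usage with a simulated executor, so the sketch runs without a cluster:
with tempfile.TemporaryDirectory() as tmp:
    completions = Path(tmp) / "ge-ops" / "system" / "completions"
    completions.mkdir(parents=True)
    (completions / "demo-123.json").write_text(json.dumps({"status": "done"}))
    print(wait_for_completion(completions, "demo-123", timeout=2))
```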

NOT ACCEPTABLE: Tests that only verify the function body without verifying it's callable through the real path.

ENFORCEMENT: Marije/Judith run test suites. Koen/Eric verify test coverage in code review.

Testing Tools

webapp-testing Skill (Playwright)

The webapp-testing skill (installed in .claude/skills/, source: anthropics/skills) provides Playwright-based web app testing patterns. Marije and Judith use it to author E2E and integration tests during Phase 8 (Integration).

The skill auto-activates when agents work on test files targeting web applications. It provides structured patterns for:

  • Page object models
  • Test fixtures and setup/teardown
  • Assertion patterns for UI state
  • Network interception and mocking
  • Visual regression testing

See also: Playwright integration, Anthropic skills and plugins

CI Pipeline Testing Standards

General Rules

  • All tests must pass in CI — no allow_failure on test stages
  • Test paths must use dynamic GE_ROOT detection via tests.conftest.GE_ROOT_PATH — never hardcode /home/claude/ge-bootstrap in test files
  • Test fixtures must be self-contained and clean up after themselves
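A sketch of how `GE_ROOT_PATH` could be derived and how a fixture can clean up after itself. Assumptions: a `GE_ROOT` environment variable overrides detection, and the repo root is the nearest ancestor containing `.git`; the real `tests/conftest.py` may use a different rule.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


def find_ge_root(start: Path, marker: str = ".git") -> Path:
    """Locate the repo root instead of hardcoding /home/claude/ge-bootstrap.

    Hypothetical rule: an explicit GE_ROOT environment variable wins;
    otherwise walk up from `start` to the first directory containing `marker`.
    """
    override = os.environ.get("GE_ROOT")
    if override:
        return Path(override).resolve()
    resolved = start.resolve()
    for candidate in (resolved, *resolved.parents):
        if (candidate / marker).exists():
            return candidate
    raise RuntimeError(f"no {marker!r} found above {start}")


# In tests/conftest.py this might read:
# GE_ROOT_PATH = find_ge_root(Path(__file__).parent)


@contextmanager
def isolated_workspace():
    """Self-contained fixture body: create a scratch dir and always remove it."""
    path = Path(tempfile.mkdtemp(prefix="ge-test-"))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


# Wrapped as a pytest fixture:
# @pytest.fixture
# def workspace():
#     with isolated_workspace() as path:
#         yield path
```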

Mutation Testing

  • Mutation testing threshold: 80% on new code (enforced by test:mutation CI stage)
  • Mutation testing threshold: 60% on existing code (tracked, not yet blocking)
  • Tools: Stryker (TypeScript), mutmut (Python)
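To make the thresholds concrete: the mutation score is killed mutants divided by total mutants. The toy below only flips comparison operators in a hypothetical `classify` function and scores two test suites against the 80% new-code threshold; Stryker and mutmut apply many more mutation operators and run the project's real test suite.

```python
import ast

# Hypothetical code under test.
SOURCE = '''
def classify(age):
    if age < 0:
        raise ValueError("negative age")
    return "adult" if age >= 18 else "minor"
'''

# Each swap turns one comparison into a plausible off-by-one bug.
SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt,
         ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}


def _sites(tree):
    return [n for n in ast.walk(tree)
            if isinstance(n, ast.Compare) and type(n.ops[0]) in SWAPS]


def mutants(source):
    """Yield one compiled module per mutation site, each with a single swap."""
    for index in range(len(_sites(ast.parse(source)))):
        tree = ast.parse(source)  # fresh tree so mutations don't accumulate
        node = _sites(tree)[index]
        node.ops[0] = SWAPS[type(node.ops[0])]()
        yield compile(ast.fix_missing_locations(tree), "<mutant>", "exec")


def mutation_score(source, tests):
    """Fraction of mutants killed (the test suite fails against them)."""
    killed = total = 0
    for code in mutants(source):
        total += 1
        namespace = {}
        exec(code, namespace)
        try:
            tests(namespace)
        except Exception:  # any test failure or crash kills the mutant
            killed += 1
    return killed / total if total else 1.0


def basic_tests(ns):
    classify = ns["classify"]
    assert classify(30) == "adult" and classify(5) == "minor"
    assert classify(18) == "adult"  # kills the >= -> > mutant


def boundary_tests(ns):
    basic_tests(ns)
    assert ns["classify"](0) == "minor"  # kills the < -> <= mutant


THRESHOLD = 0.80  # the new-code gate
for name, suite in [("basic", basic_tests), ("with boundaries", boundary_tests)]:
    score = mutation_score(SOURCE, suite)
    print(f"{name}: {score:.0%} {'pass' if score >= THRESHOLD else 'fail'}")
```

The basic suite scores 50% and fails the gate because nothing exercises `age == 0`; adding that one boundary case kills the surviving mutant.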

Adversarial Testing

  • Property-based testing with Hypothesis (Python) and fast-check (TypeScript) in the test:adversarial CI stage
  • Fuzz testing on condition evaluator and critical path functions
  • All 7 attack categories (type confusion, boundary, resource exhaustion, injection, concurrency, precision, unicode) must be covered
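A dependency-free sketch of the fuzzing idea against a hypothetical condition evaluator. It uses a seeded random loop in place of Hypothesis or fast-check (which add generation strategies and input shrinking), and the condition format and payload lists are illustrative assumptions. The invariant checked is that the evaluator either returns a `bool` or raises `ValueError`.

```python
import math
import operator
import random
import threading

# Hypothetical evaluator under test: {"field", "op", "value"} -> bool.
# Contract being fuzzed: it returns a bool or raises ValueError, nothing else.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}
MAX_FIELD_LEN = 256


def evaluate_condition(condition, context) -> bool:
    if not isinstance(condition, dict) or not isinstance(context, dict):
        raise ValueError("condition and context must be dicts")
    field, op = condition.get("field"), condition.get("op")
    expected = condition.get("value")
    if not isinstance(field, str) or len(field) > MAX_FIELD_LEN:
        raise ValueError("invalid field")
    # The isinstance check matters: `[] in OPS` would raise TypeError.
    if not isinstance(op, str) or op not in OPS:
        raise ValueError("unsupported operator")
    actual = context.get(field)
    for v in (actual, expected):
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise ValueError("only numeric comparisons are supported")
        if isinstance(v, float) and math.isnan(v):
            raise ValueError("NaN is not comparable")
    return bool(OPS[op](actual, expected))


# Payloads per attack category (concurrency is covered by fuzz_concurrently).
ATTACKS = {
    "type_confusion": [None, "80", [80], {"v": 80}, True],
    "boundary": [0, -1, 2**63, -(2**63) - 1, float("inf"), -0.0, float("nan")],
    "resource_exhaustion": ["x" * 1_000_000, 10**400],
    "injection": ["__import__('os').system('id')", "'; DROP TABLE t;--"],
    "precision": [0.1 + 0.2, 0.3, 5e-324, 1.7976931348623157e308],
    "unicode": ["\u202e>", "cpu\u0301", "\x00", "\U0001F600", ">\u200b"],
}
POOL = [payload for payloads in ATTACKS.values() for payload in payloads]
VALID = {"field": ["cpu"], "op": list(OPS), "value": [80, 80.5]}


def fuzz(iterations: int = 2000, seed: int = 1234) -> int:
    rng = random.Random(seed)  # seeded, so any failure can be replayed
    for _ in range(iterations):
        # Mix valid values with hostile ones for every key.
        condition = {key: rng.choice(VALID[key] + POOL) for key in VALID}
        context = {"cpu": rng.choice([42, 80.0] + POOL)}
        try:
            result = evaluate_condition(condition, context)
        except ValueError:
            continue  # a clean rejection is allowed by the contract
        assert isinstance(result, bool), (condition, context, result)
    return iterations


def fuzz_concurrently(workers: int = 4) -> None:
    # The evaluator is pure today; this guards against shared state added later.
    failures = []

    def worker(seed: int) -> None:
        try:
            fuzz(iterations=500, seed=seed)
        except Exception as exc:  # surface thread failures to the caller
            failures.append(exc)

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert not failures, failures


print(fuzz(), "cases passed")
fuzz_concurrently()
```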

CI Job Reference

| CI Job | Stage | What it verifies |
|---|---|---|
| tdd:red-gate | TDD | All TDD tests are red before implementation |
| tdd:green-gate | TDD | All TDD tests turn green after implementation |
| tdd:oracle-check | TDD | Oracle independence: tests don't import the implementation |
| build:backend | Build | Implementation compiles and builds |
| lint:python | Quality | Ruff linting (zero errors) |
| lint:secrets | Quality | Gitleaks secret detection |
| security:bandit | Security | Python SAST |
| security:semgrep | Security | Multi-language static analysis |
| security:dependency-scan | Security | Dependency vulnerability audit |
| test:unit:backend | Testing | Backend unit test suite |
| test:integration | Testing | Full integration test suite |
| test:reconciliation | Testing | TDD vs post-implementation test suite comparison |
| test:adversarial | Testing | Fuzz and property-based tests |
| test:contract | SSOT | API contract verification + verify_ssot.sh |
| test:mutation | Quality | Mutation testing thresholds |
| review:gate | Merge | Manual merge approval (future: automated scoring) |
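A sketch of the kind of static check `tdd:oracle-check` describes, assuming the implementation lives in a package named `ge_backend` (a hypothetical name). It parses test files with `ast` rather than importing them, and it does not catch dynamic imports such as `importlib.import_module`.

```python
import ast


def forbidden_imports(test_source: str, implementation_packages: set[str]) -> list[str]:
    """Return implementation modules that a test file imports directly.

    `implementation_packages` (e.g. {"ge_backend"}) is a hypothetical setting;
    the real job's rules may differ.
    """
    hits = []
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # relative imports stay inside tests/
        else:
            continue
        for name in names:
            if name.split(".")[0] in implementation_packages:
                hits.append(name)
    return hits


TEST_FILE = '''
import json
import requests
from ge_backend.scheduler import next_run   # reaches into the implementation
'''
print(forbidden_imports(TEST_FILE, {"ge_backend"}))  # ['ge_backend.scheduler']
```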